You know how current text-to-image models work, right? They start with noise and gradually transform it into an image.
I wonder what would happen if I did the same thing with text, or something along those lines.
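
To make the noise-to-image idea concrete, here's a rough toy sketch of "start with noise, then refine it step by step." It is not how a real model is implemented: a real model uses a trained network to predict what to remove at each step, while this stand-in simply cheats by knowing the target. The names `toy_denoise`, `target`, `steps`, and `seed` are all made up for illustration.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Start from random noise and nudge it toward `target` over several steps.

    Assumption: the "denoiser" here already knows the target, which a real
    model does not. It only illustrates the iterative refinement loop.
    """
    rng = random.Random(seed)
    # Begin with pure Gaussian noise, one value per target element.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for step in range(steps):
        remaining = steps - step
        # Close an equal share of the remaining gap at every step,
        # so the final step lands on the target.
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
    return x


target = [0.0, 0.5, 1.0, 0.5]  # stand-in for "pixel" values
result = toy_denoise(target)
```

Doing the same with text would mean deciding what "noise" is for words or tokens, which this sketch does not attempt to answer.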