Generative AI is pretty impressive in terms of its fidelity these days, as viral memes like Balenciaga Pope would suggest. The latest systems can conjure up scenescapes from city skylines to cafes, creating images that appear startlingly realistic, at least at first glance.
But one of the longstanding weaknesses of text-to-image AI models is, ironically, text. Even the best models struggle to generate images with legible logos, much less text, calligraphy or fonts.
But that might change.
Last week, DeepFloyd, a research group backed by Stability AI, unveiled DeepFloyd IF, a text-to-image model that can “smartly” integrate text into images. Trained on a data set of more than a billion images and text, DeepFloyd IF, which requires a GPU with at least 16GB of RAM to run, can create an image from a prompt like “a teddy bear wearing a shirt that reads ‘Deep Floyd’” and can optionally render it in a range of styles.