Generating Images from Text with AI

Recently, I discovered a fascinating artificial intelligence (AI) technology that has allowed me to generate realistic images from text. The tool I used is built on a diffusion model, a generative AI technique. Popular text-to-image systems include Stable Diffusion (developed by Stability AI), DALL-E (from OpenAI), and Midjourney. Stable Diffusion and DALL-E 2 are based on diffusion models; Midjourney's architecture is not publicly documented, although it is generally understood to work in a similar way.

What is Stable Diffusion?

Stable Diffusion is an open-source latent diffusion model (LDM) developed by Stability AI in collaboration with the CompVis group and Runway. Unlike DALL-E, which is proprietary to OpenAI, Stable Diffusion was released openly, so any developer can use it and modify it to fit their needs.

How to Generate Images from Text

The process of generating images from text is truly fascinating. To get started, you need suitable hardware, ideally a dedicated GPU, because generation is slow without enough processing power. If you do not have powerful hardware, you can use cloud tools such as Google Colab, which provide that computing power for you.
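If you are not sure what hardware you have, a quick check like the one below can help. It is only a minimal sketch: it assumes PyTorch (which this post does not name, but which most Stable Diffusion tooling runs on), and pick_device is a hypothetical helper name. If PyTorch is not installed, it simply falls back to the CPU.

```python
def pick_device() -> str:
    """Return the best available compute device as a string."""
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        # Without PyTorch we cannot probe for a GPU, so report CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU, e.g. on a Google Colab GPU runtime

    # Apple Silicon GPUs are exposed through the "mps" backend in recent PyTorch.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(f"Running on: {pick_device()}")
```

If this prints "cpu", generation will most likely still work but will be much slower, which is when a cloud notebook becomes worth considering.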

Next, I will show you an example of how I generated images using this technology:

Step 1: Preparing the Dataset

To generate images of my little dog named Manchita, I collected a set of photos of her in different poses. However, the results were not satisfactory, as the generated images were not very realistic.

You can see the images: the ones above are the real ones, and the ones below are the generated ones.

[Figure: Images of my dog Manchita]

Then, I decided to try with the dataset of Manchita's sister, named Yeye. This time, the results were much better.
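For either dog, preparing the dataset comes down to gathering the photos in one folder and checking what is actually in it. The sketch below illustrates that idea with the Python standard library only. The function name collect_photos, the list of accepted file extensions, and the folder name in the usage comment are all assumptions made for illustration, not part of the notebook I used.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to match your own photos.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def collect_photos(folder: str) -> List[Path]:
    """Return the image files found directly inside `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Usage (hypothetical folder name):
# photos = collect_photos("yeye_photos")
# print(f"Found {len(photos)} training photos")
```

Listing the files first makes it easy to spot stray non-image files or an unexpectedly small set of photos before spending time on training.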

The following are real images of Yeye:

Step 2: Generating Images from Text

Using the Stable Diffusion model, I entered different texts or "prompts" to generate images of Yeye. Below are the prompts and the generated images:

"Photo of my_yeye, digital painting"
[Figure: Generated images of Yeye]

"Photo of my_yeye"
[Figure: Generated images of Yeye]

"my_yeye with flowers" - "my_yeye is blue with hat" - "my_yeye with hat"
[Figure: Generated images of Yeye]

"my_yeye is a tattoo"
[Figure: Generated images of Yeye]

"my_yeye is brown" - "my_yeye is green" - "my_yeye is blue" - "my_yeye is white"
[Figure: Generated images of Yeye]

"my_yeye in the pool"
[Figure: Generated images of Yeye]

"A portrait of an anthropomorphic cyberpunk greyhound my_yeye eating a donut, cyberpunk!, fantasy, elegant, digital painting, artstation, concept art, matte, sharp focus, illustration, art by josan gonzalez"
[Figure: Generated images of Yeye]

These images showcase the incredible power of AI to generate visual content from text.
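If you want to try prompts like the ones above yourself, the following is a rough sketch of how they could be fed to a fine-tuned model. It assumes the Hugging Face diffusers library and PyTorch are installed, and that a model trained on your photos has been saved to a local folder. The folder name "./my_yeye_model", the helper names output_name and generate, and the step and guidance values are illustrative assumptions, not the exact settings I used.

```python
from pathlib import Path

# A few of the prompts from this post; my_yeye is the token the model was trained on.
PROMPTS = [
    "Photo of my_yeye, digital painting",
    "my_yeye with flowers",
    "my_yeye in the pool",
]


def output_name(prompt: str, index: int) -> str:
    """Build a filesystem-friendly file name from a prompt."""
    slug = "".join(c if c.isalnum() else "_" for c in prompt.lower()).strip("_")
    return f"{index:02d}_{slug[:40]}.png"


def generate(model_dir: str, out_dir: str = "generated") -> None:
    """Generate one image per prompt and save it to out_dir."""
    # Heavy imports are kept inside the function so the helper above
    # can be used even where torch and diffusers are not installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision saves GPU memory; CPUs generally need float32.
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionPipeline.from_pretrained(model_dir, torch_dtype=dtype)
    pipe = pipe.to(device)

    Path(out_dir).mkdir(exist_ok=True)
    for i, prompt in enumerate(PROMPTS):
        # Assumed values: 50 denoising steps and a guidance scale of 7.5.
        image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
        image.save(Path(out_dir) / output_name(prompt, i))


# Usage (hypothetical model folder):
# generate("./my_yeye_model")
```

Saving each image under a name derived from its prompt makes it easier to compare results later, especially when trying many small variations such as the color prompts.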

Conclusions

The technology for generating images from text using AI has advanced significantly in recent years. Thanks to tools like Stable Diffusion, it is possible to create realistic, high-quality images with just a few text commands. This has great potential for various applications, such as generating dynamic content for websites, product images, digital portraits, and much more.

If you are interested in learning more about this technology and how to implement it in your projects, I recommend exploring the references provided below.

References:

DreamBooth_Stable_Diffusion.ipynb. Notebook that generates images from text using a dataset of images that you supply, with every step explained.
OpenAI (2022). DALL-E
REVISTA BBVA (2022). Link

Image Generation with AI: Technological Advancement in the Field of Artificial Intelligence