Does AI understand the term diversity?

Generative imaging models continue to make news, New York Times article published An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy.

Covering the repercussions of graphic designer Jason Allen winning an award for digital art or digitally manipulated photography using an image generated by the AI ​​Midjourney model as the basis.

The claim that the winning image had been generated by a model sparked a wave of controversy over whether there is a place in art for Artificial Intelligence, or whether that means, again, that art is dead. Allen on his side argued that using a model to implement his idea does not imply that there is no creative challenge. Underlining the importance of choosing the words necessary to generate the image and then there is a job as a digital artist on it to improve it.

In June of this year the news came out that Cosmo generated for the first time in the world a magazine cover using artificial intelligence, using the Dalle-2 model, going as far as dimensioning on the very lid that only took 20 seconds to make.

In our previous blog What would be the ideal Uruguay alternative jersey design? Pablo Molina used 4 imaging models, including Midjourney and Dalle-2, to reimagine Uruguay’s alternate jersey.

Applications to facilitate graphic design from image to text are undeniable and are gaining ground in industries. Some of them can be to generate images without copyrights, to reimagine logos, to illustrate stories, to change the style to images and Why not visualize possible products from sketches?.

These models dazzle us with their potential and are making aggressive competition in the world of graphic design. However, you have to understand their limitations: they work from short sentences, written in English, to generate the images. Allen was somewhat right that knowing how to convey what we want is the unavoidable part of the creative process.

This task of generating the correct text leads us to the eternal question that has powered Natural Language Processing (NLP) since its origins, how well do these models understand what I ask of them? How can I communicate optimally to get the image I want?

At the end of the year and commemorating the 30th anniversary of the first diversity march, I wanted to take the opportunity to test these models’ understanding of poetry, not by asking them directly for an image about diversity, but by passing them 4 poems written around my feelings about diversity, to explore what concept they grasp of something as subjective as the poetic text.

DreamStudio – Stable Difussion

Crayon – Dall-e mini

MidJourney

DreamStudio – Stable Difussion

Crayon – Dall-e mini

MidJourney

DreamStudio – Stable Difussion

Crayon – Dall-e mini

MidJourney

DreamStudio – Stable Difussion

Crayon – Dall-e mini

MidJourney

An added difficulty of this experiment was that the poems are written in Spanish and in the translation they also lose their poetic structure a bit. In any case, we saw how each model made their interpretation.

We see how Mini Dall-e a more literal model is, it could be more appropriate to generate images when we have a precise idea. For cases in which we can capture in a few words, since we see how it focuses on known words. For example, in the poem “celebrate our uniqueness/ collectively/ win spaces/ where the rare is the norm” he uses the word space to generate the images.

Stable diffusion in DreamStudio I consider it appropriate for creative tasks, its interpretation of the poems is highly interesting. You even have the option of building the image on a base image, helping us to improve the design or adapt it to our idea to, for example, work on style changes or generate varieties of an image.

While Midjourney, also managed to generate his own interpretation of the poems which I think is close to a generic concept of the poems. Its images have a slightly more hyper-realistic look that can really help the illustration. Additionally, because the interaction is through a discord server, you can see the images generated by the community in real time. Generating a catalog of examples of possible images from which to draw inspiration.

Finally, I close this blog with a poem that I could not include in the models because it is too long, as a final reflection on what diversity means to me and why I celebrate it.

Camila Palomeque
Data & Analytics Consultant