Does using text-to-image models like Stable Diffusion in the creative process affect what people make in the physical world?
Last fall, Media Lab PhD students Hope Schroeder and Ziv Epstein ran a workshop with Amy Smith, a visiting student in the Viral Communications research group, as part of AI Alchemy Lab. At a trash-themed social event, 30 participants opted into a research study in which they visualized ideas for a sculpture using a generative text-to-image model. The pilot study sought to understand whether using a text-to-image model as part of the ideation process informs the design of objects in the physical world. The paper describing the findings is being presented today, 2/13, in Washington, DC at AAAI as part of the first Creative AI Across Modalities workshop.
The study found that seeing AI-generated images before making a sculpture did inform what people created: 23 of 30 participants reported that the images they generated influenced their design. Here, generated images informed a sculpture of a building.
Here, a participant created a bottle robot sculpture after seeing some generated images:
We noticed that participants varied in how much conceptual exploration they did through “prompting” the model, with some using the images to explore new ideas and others using them to refine existing ones. Three prompting styles emerged:
“Refiners” made minor edits to a main idea through prompting:
“Rephrasers” had a main concept but reworded it between prompts:
“Explorers” gave largely unrelated prompts, showing a high degree of conceptual exploration across prompts:
We created a computational measure of conceptual exploration over a participant’s prompting journey by taking the average cosine distance between prompt embeddings. The image below shows an example of each of the three styles that emerged:
The average semantic distance a participant traveled during the visualization activity was lower for participants who had a sculpture idea at the start of the activity than for those who did not. This suggests that participants who began with ideas used image generation as an opportunity to “exploit” or refine those ideas, traveling less semantic distance on average than those who were unsure what to build and used the images to explore. To better support creators, text-to-image tools could track a user’s semantic distance traveled over a prompting session and offer hints suited to their current design stage.
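For illustration, here is a minimal sketch of how such a measure might be computed. It is not the paper’s implementation: it assumes the sentence-transformers library and the all-MiniLM-L6-v2 model as a stand-in for whatever embedding model the study used, takes cosine distances between consecutive prompts, and uses hypothetical prompting journeys.

```python
# Sketch of a "semantic distance traveled" measure over a prompting journey.
# Assumptions (not from the post): sentence-transformers with all-MiniLM-L6-v2
# for embeddings, and distances taken between consecutive prompts.
import numpy as np
from sentence_transformers import SentenceTransformer


def semantic_distance_traveled(prompts: list[str]) -> float:
    """Average cosine distance between consecutive prompt embeddings."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(prompts)  # shape: (n_prompts, embedding_dim)
    distances = []
    for a, b in zip(embeddings, embeddings[1:]):
        cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        distances.append(1.0 - cosine_similarity)
    return float(np.mean(distances))


# Hypothetical journeys: a "refiner" stays close to one concept,
# an "explorer" moves between largely unrelated concepts.
refiner_journey = [
    "a robot made of plastic bottles",
    "a shiny robot made of plastic bottles",
    "a tall shiny robot made of plastic bottles",
]
explorer_journey = [
    "a robot made of plastic bottles",
    "a coral reef built from crushed cans",
    "a cardboard cathedral",
]
print(semantic_distance_traveled(refiner_journey))   # relatively low
print(semantic_distance_traveled(explorer_journey))  # relatively high
```

A tool computing a running value like this during a session could, for example, surface divergent inspiration to users whose distance stays low and refinement suggestions to users whose distance is high, along the lines suggested above.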
Participants in the activity had some great Media Lab fun along the way, using play to interrogate new technologies and gain scientific insight in the process.
This effort was a collaboration between Amy Smith (QMUL/IGGI), Hope Schroeder, Ziv Epstein, and Andy Lippman at the Media Lab, Mike Cook at King’s College London, and Simon Colton at QMUL.