An exploration of how advances in deep learning and generative models can be used to help us synthesize our ideas.
There are several software tools that help us stay productive while working remotely, but few that help us be creative in remote environments. Shared virtual whiteboarding environments and multi-user CAD applications are collaboration tools that partially serve this need: they allow participants to explore open-ended ideas together. However, these tools require explicit definition from users to build a working model for shared understanding. For example, effective whiteboarding requires some drawing ability, while 3D modeling has an even steeper learning curve - every spline of a design must be defined.
Our work on Computer-Aided Synthesis draws on advances in deep learning and generative models to extend Seymour Papert’s ideas around Tools to Think With and explores the creation of Tools to Synthesize With - tools that allow users to implicitly synthesize complex ideas. This is an early prototype, built in two weeks, to demonstrate the concept.
To clarify, the goal is not for the computer to do the creating, but rather to augment the ability of humans by offering inspiration and a starting point for further (human) refinement.
The video above shows a webpage capable of supporting multi-user sessions for collaboration. At the center of the page is the output of a generative neural network, StyleGAN2, trained on a large dataset of car images to generate realistic-looking cars that don’t actually exist. For this proof of concept, we assume a team of industrial designers collaborating on the synthesis of a new car design.
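For readers curious about the mechanics, here is a minimal sketch of sampling one synthetic car from a pretrained StyleGAN2 generator. It assumes the PyTorch port (NVlabs stylegan2-ada-pytorch) is on the Python path and a car checkpoint is available; the checkpoint filename is hypothetical, and the prototype's actual serving code may differ.

```python
# A minimal sketch of sampling one synthetic car, assuming the NVlabs
# stylegan2-ada-pytorch code (dnnlib, torch_utils) is importable and a car
# checkpoint is available; the filename below is hypothetical.
import pickle
import torch

with open('stylegan2-cars.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()   # generator (exponential-moving-average weights)

z = torch.randn([1, G.z_dim]).cuda()     # a random point in the latent space
img = G(z, None)                         # NCHW float tensor, dynamic range roughly [-1, 1]

# Convert to an 8-bit image for display in the web UI.
img = (img.clamp(-1, 1) * 127.5 + 127.5).to(torch.uint8)
img = img[0].permute(1, 2, 0).cpu().numpy()
```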
The generative model lets the team quickly generate a starting point as the output design, which serves as a working model the entire team can visualize and immediately build a shared understanding around. The UI allows each user to upload images of existing cars as a means of implicit definition, to guide the output design. The uploaded images are projected into the latent space of the generative model, a low-dimensional space where each car image is represented as a vector of numbers - a set of coordinates where similar-looking cars sit near one another. These uploaded images appear in the UI as targets that team members can select to “steer” the output towards a car with characteristics similar to the selection. As the team iteratively chooses different targets, the output becomes a synthesis of these selections. The influence of each target on the synthesized design is shown by the width of the line connecting the target to the output - a wider line corresponds to a stronger influence.
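Conceptually, each steering action is a small step in latent space from the current output toward the selected target, with the accumulated pull of each target tracked for display as line width. The sketch below illustrates this idea; the function and variable names are illustrative, not the prototype's actual code.

```python
import torch

influence = {}  # target id -> accumulated pull on the output design (drawn as line width)

def steer(current: torch.Tensor, target: torch.Tensor, target_id: str,
          step: float = 0.1) -> torch.Tensor:
    """Move the output latent one small step toward a selected target latent."""
    delta = step * (target - current)
    influence[target_id] = influence.get(target_id, 0.0) + delta.norm().item()
    return current + delta

# The updated latent is then decoded back to an image with the generator
# (the exact call depends on whether codes live in the Z or W latent space).
```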
When users simultaneously select different targets to steer the design toward, the output converges towards a point in latent space that is equidistant from all selections. Users also have the option to enable an additional computer input, “wandering”, which results in a less direct path towards the selection - a scenic route with additional output variations along the way.
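One simple way to realize this behavior is to step toward a compromise point between the selected latents (approximated here by their centroid) and, when wandering is enabled, add a small random detour at each step. This sketch is illustrative only; the step and noise scales are assumptions.

```python
import torch

def steer_toward_selections(current, targets, step=0.1, wander=0.0):
    """Combine simultaneous selections into one update of the output latent.

    current: latent code of the current output design
    targets: list of latent codes selected by different users this round
    wander:  optional noise scale for the "wandering" scenic route
    """
    goal = torch.stack(targets).mean(dim=0)    # compromise point between all selections
    update = step * (goal - current)
    if wander > 0:
        update = update + wander * torch.randn_like(current)  # take a less direct path
    return current + update
```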
The current prototype demonstrates an interaction method that enables a team to use a generative model for implicitly synthesizing a new visual design. We are now generalizing this demonstration to support different generative models and synthesized subjects. We are also exploring several other directions to extend this prototype, including: allowing users to provide input via sketch or voice descriptions, outputting 3D models instead of images, and disentangling our generative model(s) so users can control which specific characteristics of target images are reflected in the output design. If you have any questions or comments, or if you have a use case that could benefit from extending this work, please send an email to Kevin Dunnell (dunnell@media.mit.edu).