One of the most interesting achievements of AI research in 2021 is DALL·E, OpenAI’s deep learning system that can generate images from text descriptions. A 12-billion-parameter version of the GPT-3 deep neural network, it shows some unexpected abilities: it can design new objects by combining unrelated concepts, draw cartoonish, humanized versions of objects and pets, produce funny chimeras, render text, apply different textures and styles to 3D objects, and even infer visual features of a scene that the prompt never mentions.
GPT-3, OpenAI’s most famous project, is a deep neural network capable of generating natural-looking text from instructions written in human language. DALL·E uses the same kind of network to generate and modify images based on natural-language prompts. Trained on a vast, unspecified dataset of image-caption pairs (probably scraped from the web), it takes as input a caption of up to 256 text tokens and, optionally, an incomplete image with an empty region toward its bottom-right corner, combined into a single token stream, and generates the rest of the picture as a 32×32 grid of image tokens.
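To make the single-token-stream idea concrete, here is a minimal, purely illustrative sketch (not OpenAI’s code): the caption’s text tokens are padded to a fixed length, then any already-known image tokens are appended, and the model would autoregressively predict the remaining image tokens. The constants and `PAD_ID` are assumptions for the sketch.

```python
# Illustrative sketch of a DALL·E-style input stream (not the real implementation):
# caption tokens and discrete image tokens are concatenated into one sequence.

TEXT_LEN = 256     # maximum caption length in tokens
IMAGE_GRID = 32    # the image is represented as a 32x32 grid of image tokens
PAD_ID = 0         # hypothetical padding token id

def build_token_stream(text_tokens, image_tokens=()):
    """Pad the caption to TEXT_LEN, then append any known image tokens.
    An autoregressive model would then predict the remaining
    IMAGE_GRID * IMAGE_GRID - len(image_tokens) image tokens."""
    if len(text_tokens) > TEXT_LEN:
        raise ValueError("caption exceeds the maximum token length")
    padded_text = list(text_tokens) + [PAD_ID] * (TEXT_LEN - len(text_tokens))
    return padded_text + list(image_tokens)

# A caption alone: the model must generate all 1024 image tokens.
stream = build_token_stream([17, 42, 99])
print(len(stream))  # 256: the padded caption only

# A caption plus half a known image: 512 image tokens remain to be generated.
stream = build_token_stream([17, 42, 99], image_tokens=[7] * 512)
print(IMAGE_GRID * IMAGE_GRID - 512)  # number of tokens left to generate
```

This also shows why image completion falls out of the same mechanism as generation from scratch: a partial image is just a longer prefix of the stream.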
However, the team is concerned about the ethical aspects and societal impact of its research. From endangering certain jobs, to producing biased outputs, to long-term effects on the economy and politics, OpenAI appears to be anticipating and analyzing the ethical challenges its work may create before making it accessible for public use. The model is not publicly available, and no paper explaining the work has been published yet. For now, we have to be content with the very interesting results available on OpenAI’s website, where you can try different variations of pre-generated prompts. The laboratory states that although the results on the website were selected from the model’s outputs using automatic scoring, they were not manually cherry-picked.
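The automatic-scoring step can be pictured as a simple rerank: generate many candidates, score each one with some caption-image scoring model, and display only the top few. The sketch below is a generic stand-in, not OpenAI’s pipeline; the dictionary-based candidates and the `score_fn` are hypothetical placeholders for a real scorer.

```python
# Illustrative rerank-by-score sketch (a stand-in, not OpenAI's code):
# keep only the best-scoring candidates from a pool of generated samples.

def rerank(candidates, score_fn, keep=8):
    """Return the `keep` highest-scoring candidates, best first."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:keep]

# Toy stand-in: pretend each candidate already carries its score.
samples = [{"id": i, "score": s} for i, s in enumerate([0.2, 0.9, 0.5, 0.7])]
best = rerank(samples, score_fn=lambda c: c["score"], keep=2)
print([c["id"] for c in best])  # ids of the two highest-scoring candidates
```

Selecting by an automatic score like this is different from manual cherry-picking: the same procedure is applied uniformly to every prompt.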
In a series of tests, DALL·E was prompted to alter the features of certain objects. In most cases it can acceptably reshape, recolor, repeat, and change the texture of a specified subject. It can mold everyday objects into familiar polygons and polyhedra, although it falters when the shape is too “odd” for the object, e.g. a pentagonal manhole cover. In such cases, repeating or rewording the prompt yields higher-quality outputs.
DALL·E renders alternative textures on common objects fairly reliably. It can illustrate multiple objects when the count, the relative positioning and arrangement (such as a stack, a grid, or a batch of something), and the qualities of each one are clearly described. This, however, is one of the more troublesome tasks for DALL·E: when asked to draw more than three objects, or when the wording is slightly opaque, very few of the results show the expected scene. Controlling which properties attach to which objects is another challenge; the model shows some success at first (though fragile with respect to the wording of the prompt), but fails more and more as the complexity of the task increases.
DALL·E can also manipulate the 3D representation and perspective of objects, animals, and plants. In many cases, when showing the subject from a new viewpoint, it infers parts that are absent from the image prompt. It can render the subject in different styles as prompted, sometimes even plausibly posing an animal in accordance with its actual bone structure (which is fantastic), and it can plausibly rebuild hidden parts of a human head. Inferring cross-sectional and macroscopic views of subjects is another strong capability advertised in the reports. Contextual details, such as the effects of time period, drawing style, and medium, are also estimated well by the deep network.
By combining unrelated concepts and conditions, DALL·E shows something like creativity. In fact, some of the pictures it produces are absolutely adorable, like this cute, brilliant illustration of a walking radish:
OpenAI claims its newest project shows significant potential for later use in the area of design. The report contains samples of garments and interior designs generated by DALL·E that look very promising. As with GPT-3, more innovative use cases for this new AI tool will be discovered over time once it is released. We will have to wait until the scientists finally publish a paper to learn more about the inner workings of DALL·E and the dataset used to train it. Many AI enthusiasts, including our data scientists at Artin, would love to take a closer look at the designer of these beautiful living rooms.