On Friday, Nvidia researchers announced Magic3D, an AI model that can generate 3D models from textual descriptions. After entering a prompt such as “Blue poison dart frog sitting on a waterlily”, Magic3D generates a 3D mesh model, complete with color texture, in about 40 minutes. With modifications, the resulting model can be used in video games or CGI art scenes.
In its academic paper, Nvidia frames Magic3D as a response to DreamFusion, a text-to-3D model that Google researchers announced in September. Similar to how DreamFusion uses a text-to-image model to generate a 2D image that is then optimized into volumetric NeRF (Neural Radiance Field) data, Magic3D uses a two-stage process that takes a coarse model generated at low resolution and optimizes it to a higher resolution. According to the paper's authors, the resulting Magic3D method can generate 3D objects twice as fast as DreamFusion.
Magic3D can also perform prompt-based editing of 3D meshes. Given a low-resolution 3D model and a base prompt, it is possible to alter the text to change the resulting model. The authors of Magic3D also demonstrate preserving the same subject across several generations (a concept often called coherence) and applying the style of a 2D image (such as a cubist painting) to a 3D model.
Nvidia did not release any Magic3D code along with its academic paper.
The ability to generate 3D from text feels like a natural evolution of today’s diffusion models, which use neural networks to synthesize new content after intensive training on a body of data. In 2022 alone, we have seen the emergence of capable text-to-image models such as DALL-E and Stable Diffusion, along with rudimentary text-to-video generators from Google and Meta. Google also debuted the aforementioned DreamFusion text-to-3D model two months ago, and since then people have adapted similar techniques into an open-source model based on Stable Diffusion.
As for Magic3D, the researchers behind it hope it will allow anyone to create 3D models without the need for special training. Once refined, the resulting technology could accelerate the development of video games (and VR) and perhaps eventually find applications in special effects for film and television. Near the end of their paper, they write, “We hope with Magic3D, we can democratize 3D synthesis and open up everyone’s creativity in 3D content creation.”