Zero-Shot Text-Guided Object Generation with Dream Fields

Abstract

We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as those in ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
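The objective described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the embeddings are placeholder vectors standing in for the outputs of CLIP's image and text encoders, and the clamped form of the transmittance regularizer, the `target` of 0.88, and the `weight` of 0.5 are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_loss(image_embedding, text_embedding):
    """Negative similarity: lower means the rendered view matches the caption better."""
    return -cosine_similarity(image_embedding, text_embedding)


def ray_transmittance(densities, step_sizes):
    """Fraction of light passing through a whole ray: exp(-sum(sigma_i * delta_i))."""
    optical_depth = sum(s * d for s, d in zip(densities, step_sizes))
    return math.exp(-optical_depth)


def transmittance_loss(transmittances, target=0.88):
    """Reward average transmittance (empty space) up to `target`.

    Assumed form: clamping at `target` stops the reward once the scene is
    sparse enough, so the object itself is not optimized away.
    """
    mean_t = sum(transmittances) / len(transmittances)
    return -min(target, mean_t)


def total_loss(image_embedding, text_embedding, transmittances,
               weight=0.5, target=0.88):
    """CLIP term plus weighted sparsity term (weight and target are assumptions)."""
    return (clip_loss(image_embedding, text_embedding)
            + weight * transmittance_loss(transmittances, target))
```

In a full pipeline, the image embedding would come from encoding a view rendered at a randomly sampled camera pose, and the loss would be minimized with respect to the radiance field's parameters by gradient descent.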

Compositional generation

The compositional nature of language allows users to combine concepts in novel ways and control generation. A template prompt describing a primary object (an armchair or a teapot) is stylized with 16 materials: avocado, glacier, orchid, pikachu, brain coral, gourd, peach, rubik's cube, doughnut, hibiscus, peacock, sardines, fossil, lotus root, pig, or strawberry. These prompt templates are sourced from DALL-E.
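As a rough sketch of how such a prompt grid could be assembled, the snippet below fills two templates ("in the shape of" and "imitating") with each object and material. The helper names and the simple a/an rule are illustrative assumptions, not part of the original pipeline, and the article rule makes no attempt to handle plurals such as "sardines".

```python
MATERIALS = [
    "avocado", "glacier", "orchid", "pikachu", "brain coral", "gourd",
    "peach", "rubik's cube", "doughnut", "hibiscus", "peacock", "sardines",
    "fossil", "lotus root", "pig", "strawberry",
]
OBJECTS = ["armchair", "teapot"]
TEMPLATES = ["{obj} in the shape of {mat}.", "{obj} imitating {mat}."]


def with_article(noun):
    """Prefix a noun with 'a' or 'an' using a simple first-letter rule (assumption)."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def build_prompts():
    """Return every object x template x material combination as a caption."""
    prompts = []
    for obj in OBJECTS:
        for template in TEMPLATES:
            for material in MATERIALS:
                prompts.append(
                    template.format(obj=with_article(obj),
                                    mat=with_article(material))
                )
    return prompts
```

Calling `build_prompts()` yields 64 captions (2 objects, 2 templates, 16 materials), each of which could be used as a separate optimization target.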

Related publications

Overview of DietNeRF's semantic consistency loss.

Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis

Ajay Jain, Matthew Tancik, Pieter Abbeel. ICCV 2021 (International Conference on Computer Vision).

DietNeRF regularizes Neural Radiance Fields with a CLIP-based loss to improve 3D reconstruction. Given only a few images of an object or scene, we reconstruct its 3D structure and render novel views using prior knowledge contained in large image encoders.

Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields

Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, Pratul Srinivasan. ICCV 2021 (International Conference on Computer Vision).

NeRF is aliased, but we can anti-alias it by casting cones and prefiltering the positional encoding function. Dream Fields combine mip-NeRF's integrated positional encoding with Fourier features.
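To make the prefiltering idea concrete, here is a minimal sketch of an integrated positional encoding for a single scalar coordinate. It assumes the coordinate is modeled as a Gaussian with mean `mean` and variance `var`, in which case the expected value of sin(s x) is sin(s * mean) * exp(-0.5 * s^2 * var), and likewise for cosine. The function name and the power-of-two frequency schedule are illustrative choices, and the combination with Fourier features used by Dream Fields is not shown.

```python
import math


def integrated_pos_enc(mean, var, num_freqs):
    """Expected sin/cos features of a Gaussian-distributed coordinate.

    High frequencies are damped when the variance is large, which is what
    removes aliasing for wide (distant or coarse) cone samples.
    """
    features = []
    for level in range(num_freqs):
        scale = 2.0 ** level
        # E[sin(scale * x)] and E[cos(scale * x)] share this damping factor.
        damping = math.exp(-0.5 * (scale ** 2) * var)
        features.append(math.sin(scale * mean) * damping)
        features.append(math.cos(scale * mean) * damping)
    return features
```

With `var=0` this reduces to the ordinary positional encoding, while a large variance drives the features toward zero so the network cannot rely on frequencies finer than the sample footprint.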

Citation

Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole. Zero-Shot Text-Guided Object Generation with Dream Fields. arXiv, 2021.

@article{jain2021dreamfields,
author = {Jain, Ajay and Mildenhall, Ben and Barron, Jonathan T. and Abbeel, Pieter and Poole, Ben},
title = {Zero-Shot Text-Guided Object Generation with Dream Fields},
journal = {CVPR},
year = {2022},
}

CVPR 2022 and AI4CC 2022 (Best Poster)

Ajay Jain

UC Berkeley, Google Research

Ben Mildenhall

Google Research

Jonathan T. Barron

Google Research

Pieter Abbeel

UC Berkeley

Ben Poole

Google Research

Example generated objects

bouquet of flowers sitting in a clear glass vase.

a sculpture of a rooster.

a robotic dog. a robot in the shape of a dog.

matte painting of a castle made of cheesecake surrounded by a moat made of ice cream; trending on artstation; unreal engine. [ref]

a beautiful epic wonderous fantasy painting of the ocean. [ref]

matte painting of a bonsai tree; trending on artstation.

a cluster of pine trees are in a barren area.

a boat on the water tied down to a stake.

a small green vase displays some small yellow blooms.

a bus covered with assorted colorful graffiti on the side of it.

a pile of crab is seasoned and well cooked.

a tray that has meat and carrots on a table.

a snowboard standing upright in a snow bank.

Compositional generation

an armchair in the shape of a ____.
an armchair imitating a ____.

a teapot in the shape of a ____.
a teapot imitating a ____.
