Prior-based reconstruction of neural fields

This is part of my journey of learning NeRF.

2.4. Prior-based reconstruction of neural fields

This sounds like a one-shot task: instead of fitting and optimizing a separate neural field for each scene, we learn a prior distribution over neural fields. Then, given a specific scene, the model adapts the neural field in just a single forward pass.

image-20221211234430727
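A minimal sketch of the idea (PyTorch assumed; the `ConditionedField` module below is hypothetical): one MLP is shared across all scenes, and a per-scene latent code `z` selects which field it represents.

```python
import torch
import torch.nn as nn

class ConditionedField(nn.Module):
    """A neural field conditioned on a global per-scene latent code z."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # rgb (3) + density (1)
        )

    def forward(self, x, z):
        # x: (N, 3) query points, z: (latent_dim,) scene code shared by all points
        z = z.expand(x.shape[0], -1)
        out = self.mlp(torch.cat([x, z], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3:])  # rgb, density
```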

What does the latent code look like?

image-20221211234923290
  • Global: a single small latent code represents the entire neural field.
    • Main limitation: it can only represent very simple (single-object) scenes, because with multiple objects in a scene the degrees of freedom grow non-linearly.
    • What about using natural language descriptions as conditioning?
  • Local: the latent code depends on where in the scene you are, so you need a 3D data structure to store the latent codes (see the sketch after this list).
    • 3D point clouds -> grids -> triplane interpolation
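As a minimal sketch of local conditioning (PyTorch assumed; `feature_grid` stands in for a latent volume produced by some upstream encoder, and the function name is hypothetical), the per-point latent code is just a trilinear interpolation into a dense grid; a triplane does the same thing with three 2D lookups:

```python
import torch
import torch.nn.functional as F

def local_latent_codes(feature_grid, points):
    """Trilinearly interpolate per-point latent codes from a dense 3D grid.

    feature_grid: (1, C, D, H, W) latent volume (e.g. output of an encoder).
    points:       (N, 3) query coordinates, assumed normalized to [-1, 1].
    Returns:      (N, C) local latent code for each query point.
    """
    # grid_sample expects a (1, 1, 1, N, 3) sampling grid in (x, y, z) order
    grid = points.view(1, 1, 1, -1, 3)
    sampled = F.grid_sample(feature_grid, grid, align_corners=True)  # (1, C, 1, 1, N)
    return sampled.reshape(feature_grid.shape[1], -1).t()            # (N, C)

# usage (shapes only): z_local = local_latent_codes(grid_feats, xyz); rgb, sigma = field(xyz, z_local)
```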

Convolutional Occupancy Networks

Autodecoder instead of Encoder-decoder

image-20221212005012783
  • In the encoder-decoder setup, the encoder is a 2D CNN.

  • With an autodecoder, however, backpropagating through the forward map (i.e., the neural renderer) feeds 3D structural information into the latent code directly: \[ \hat z = \arg\min_z \big\|\text{Render}(\Phi_z) - \text{g.t.}\big\| \]

image-20221212004946040

Instead of trying to build an encoder, sometimes just backpropagating through the forward map is enough.
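A minimal sketch of that idea (PyTorch assumed; `field` and `render` are hypothetical placeholders for the shared conditioned field and a differentiable renderer):

```python
import torch

def fit_latent_code(field, render, gt_images, latent_dim=128, steps=500, lr=1e-2):
    """Autodecoder-style inference: no encoder, just optimize the latent code.

    Gradients flow from the image loss back through the renderer into z,
    so the latent code receives 3D structural information directly.
    """
    z = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = render(field, z)                 # images rendered from the field conditioned on z
        loss = (pred - gt_images).abs().mean()  # || Render(Phi_z) - g.t. ||
        loss.backward()
        opt.step()
    return z.detach()
```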

Light field networks -- Don't need to render anymore

image-20221212005908926

Instead of learning a NeRF, where a neural renderer integrates over many sample points along each ray, you can learn a network that directly outputs the color of a ray. The query is no longer a 3D coordinate but a ray.
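A minimal sketch of a light field network (PyTorch assumed; the 6D Plücker-style ray encoding here is one common choice, not necessarily the exact parameterization used in any particular paper):

```python
import torch
import torch.nn as nn

class LightFieldNet(nn.Module):
    """Maps a ray directly to a color -- no point sampling, no volume rendering."""

    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # rgb
        )

    def forward(self, origins, dirs):
        # origins, dirs: (N, 3); one MLP evaluation per ray (i.e. per pixel)
        dirs = dirs / dirs.norm(dim=-1, keepdim=True)
        # Plucker coordinates (d, o x d) identify a line regardless of where the origin sits on it
        plucker = torch.cat([dirs, torch.cross(origins, dirs, dim=-1)], dim=-1)
        return torch.sigmoid(self.mlp(plucker))
```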

But this does not work on complicated scenes yet.

image-20221212010316991

Outlook

  • You shouldn't need 600 images of a scene to reconstruct it. Can we synthesize images to fill the gap?
  • Keep an open mind: are there other ways to skip the expensive forward map (e.g., the light field)?
  • Understanding the scene the way humans do: disentangling different objects
  • Local conditioning methods: regular grids are easy to handle, but point clouds / factorized representations are harder
  • Transformers: they look like a form of local conditioning