Prior-based reconstruction of neural fields

This is part of my journey of learning NeRF.

2.4. Prior-based reconstruction of neural fields

This sounds like a one-shot task: instead of fitting and optimizing a separate neural field for each scene, we learn a prior distribution over neural fields. Then, given a specific scene, the model adapts the neural field in just a single forward pass.

image-20221211234430727
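A minimal sketch of the idea (PyTorch assumed; the `ConditionedField` module below is hypothetical): one MLP is shared across all scenes, and a per-scene latent code `z` selects which field it represents.

```python
import torch
import torch.nn as nn

class ConditionedField(nn.Module):
    """A neural field conditioned on a global per-scene latent code z."""

    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # rgb (3) + density (1)
        )

    def forward(self, x, z):
        # x: (N, 3) query points, z: (latent_dim,) scene code shared by all points
        z = z.expand(x.shape[0], -1)
        out = self.mlp(torch.cat([x, z], dim=-1))
        return torch.sigmoid(out[:, :3]), torch.relu(out[:, 3:])  # rgb, density
```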

What does the latent code look like?

image-20221211234923290
  • Global: a single small latent code represents the entire neural field.
    • Main limitation: it can only represent very simple (single-object) scenes, because with multiple objects in a scene the degrees of freedom grow non-linearly.
    • What about using natural language descriptions as conditioning?
  • Local: the latent code depends on where in the scene you are, so you need a 3D data structure to store the latent codes (see the sketch after this list).
    • 3D point clouds -> grids -> triplane interpolation
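As a minimal sketch of local conditioning (PyTorch assumed; `feature_grid` stands in for a latent volume produced by some upstream encoder, and the function name is hypothetical), the per-point latent code is just a trilinear interpolation into a dense grid; a triplane does the same thing with three 2D lookups:

```python
import torch
import torch.nn.functional as F

def local_latent_codes(feature_grid, points):
    """Trilinearly interpolate per-point latent codes from a dense 3D grid.

    feature_grid: (1, C, D, H, W) latent volume (e.g. output of an encoder).
    points:       (N, 3) query coordinates, assumed normalized to [-1, 1].
    Returns:      (N, C) local latent code for each query point.
    """
    # grid_sample expects a (1, 1, 1, N, 3) sampling grid in (x, y, z) order
    grid = points.view(1, 1, 1, -1, 3)
    sampled = F.grid_sample(feature_grid, grid, align_corners=True)  # (1, C, 1, 1, N)
    return sampled.reshape(feature_grid.shape[1], -1).t()            # (N, C)

# usage (shapes only): z_local = local_latent_codes(grid_feats, xyz); rgb, sigma = field(xyz, z_local)
```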

Convolutional Occupancy Networks

Autodecoder instead of Encoder-decoder

image-20221212005012783
  • In the encoder-decoder setup, the encoder is a 2D CNN.

  • With an autodecoder, however, backpropagating through the forward map (i.e., the neural renderer) feeds 3D structural information into the latent code directly: \[ \hat z = \arg\min_z \big\|\text{Render}(\Phi_z) - \text{g.t.}\big\| \]

image-20221212004946040

Instead of trying to build an encoder, sometimes just backpropagating through the forward map is enough.
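A minimal sketch of that idea (PyTorch assumed; `field` and `render` are hypothetical placeholders for the shared conditioned field and a differentiable renderer):

```python
import torch

def fit_latent_code(field, render, gt_images, latent_dim=128, steps=500, lr=1e-2):
    """Autodecoder-style inference: no encoder, just optimize the latent code.

    Gradients flow from the image loss back through the renderer into z,
    so the latent code receives 3D structural information directly.
    """
    z = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = render(field, z)                 # images rendered from the field conditioned on z
        loss = (pred - gt_images).abs().mean()  # || Render(Phi_z) - g.t. ||
        loss.backward()
        opt.step()
    return z.detach()
```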

Light field networks -- Don't need to render anymore

image-20221212005908926

Instead of learning a NeRF, where a neural renderer integrates over many sample points along each ray, you can learn a network that directly outputs the color of a ray. The query is no longer a 3D coordinate but a ray.
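A minimal sketch of a light field network (PyTorch assumed; the 6D Plücker-style ray encoding here is one common choice, not necessarily the exact parameterization used in any particular paper):

```python
import torch
import torch.nn as nn

class LightFieldNet(nn.Module):
    """Maps a ray directly to a color -- no point sampling, no volume rendering."""

    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # rgb
        )

    def forward(self, origins, dirs):
        # origins, dirs: (N, 3); one MLP evaluation per ray (i.e. per pixel)
        dirs = dirs / dirs.norm(dim=-1, keepdim=True)
        # Plucker coordinates (d, o x d) identify a line regardless of where the origin sits on it
        plucker = torch.cat([dirs, torch.cross(origins, dirs, dim=-1)], dim=-1)
        return torch.sigmoid(self.mlp(plucker))
```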

But this does not work on complicated scenes yet.

image-20221212010316991

Outlook

  • You shouldn't need 600 images of a scene to reconstruct it. Can we synthesize images to fill the gap?
  • Keep an open mind: are there other ways to skip the expensive forward map (e.g., the light field)?
  • Understanding the scene the way humans do: disentangling different objects
  • Local conditioning methods: regular grids are easy to handle, but point clouds / factorized representations are harder
  • Transformers: they look like a form of local conditioning