NeRF Hybrid representations

This is part of my journey of learning NeRF.

2.2. Hybrid representations

Tradeoffs of choosing a proper representation

image-20221208172055153
image-20221208172209556

Choose the representation that fits your own application.

1. Grid

image-20221205195659841

The raw input is huge, and feeding it in directly would require a huge neural network. Grid interpolation therefore acts like a "position encoding": it encodes low-dimensional features into high-dimensional ones.
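To make the grid-interpolation idea concrete, here is a minimal sketch of a trilinear lookup in a dense feature grid. The function name `trilinear_lookup` and the nested-list layout `grid[i][j][k]` are my own assumptions for illustration, not taken from any particular paper or library.

```python
import math

def trilinear_lookup(grid, x, y, z):
    """Interpolate a feature vector at a continuous position.

    grid[i][j][k] is a list of floats: the feature stored at vertex (i, j, k).
    Coordinates are in grid-index units; each axis needs at least 2 vertices.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    dim = len(grid[0][0][0])
    cell, frac = [], []
    for coord, n in ((x, nx), (y, ny), (z, nz)):
        coord = min(max(coord, 0.0), n - 1.0)      # clamp into the grid
        base = min(int(math.floor(coord)), n - 2)  # keep the +1 neighbour valid
        cell.append(base)
        frac.append(coord - base)
    (i0, j0, k0), (tx, ty, tz) = cell, frac

    out = [0.0] * dim
    # Blend the 8 corner features of the enclosing cell.
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                w = wx * wy * wz
                feat = grid[i0 + di][j0 + dj][k0 + dk]
                for c in range(dim):
                    out[c] += w * feat[c]
    return out
```

A 3D coordinate goes in and a feature vector of whatever width the grid stores comes out, which is what lets a small network sit on top of the grid.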

image-20221208162026398

NeRFusion (CVPR 2022): online!

2. Point cloud

image-20221208162541770

Cons:

  1. To access local points, you need a specifically designed data structure. Otherwise, each lookup is O(n)!
  2. You have to choose a kernel to retrieve nearby points' features. Oftentimes you assume it is a local kernel.
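Both points can be sketched together. This is a rough sketch under my own assumptions: a uniform cell index is used as the acceleration structure (a k-d tree would be another option) and a Gaussian as the local kernel. The names `build_cell_index` and `query_feature`, and the kernel width `radius / 2`, are hypothetical choices for illustration.

```python
import math
from collections import defaultdict

def build_cell_index(points, cell_size):
    """Bucket point indices by the integer cell that contains each point."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(math.floor(c / cell_size)) for c in p)
        cells[key].append(idx)
    return cells

def query_feature(q, points, feats, cells, cell_size, radius):
    """Gaussian-weighted average of the features of points within `radius` of q.

    Returns None when no point is close enough.
    """
    reach = int(math.ceil(radius / cell_size))  # how many cells the ball spans
    base = tuple(int(math.floor(c / cell_size)) for c in q)
    sigma = radius / 2.0                        # assumed kernel width
    total_w = 0.0
    out = [0.0] * len(feats[0])
    # Visit only the cells that can overlap the query ball, not all n points.
    for dx in range(-reach, reach + 1):
        for dy in range(-reach, reach + 1):
            for dz in range(-reach, reach + 1):
                key = (base[0] + dx, base[1] + dy, base[2] + dz)
                for idx in cells.get(key, ()):
                    d2 = sum((a - b) ** 2 for a, b in zip(q, points[idx]))
                    if d2 > radius * radius:
                        continue
                    w = math.exp(-d2 / (2.0 * sigma * sigma))
                    total_w += w
                    for c in range(len(out)):
                        out[c] += w * feats[idx][c]
    if total_w == 0.0:
        return None
    return [v / total_w for v in out]
```

Without the cell index, every query would have to measure the distance to every point, which is the O(n) cost mentioned above.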

image-20221208163050867

3. Mesh

Unstructured grids. Compared with point clouds, meshes have connectivity info.
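To show what the connectivity info buys you, here is a small sketch, assuming a triangle mesh stored as a list of vertex-index triples with one feature vector per vertex. The helper names `vertex_neighbors` and `barycentric_feature` are made up for this example.

```python
def vertex_neighbors(faces):
    """Derive each vertex's neighbours directly from the face list."""
    nbrs = {}
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    return nbrs

def barycentric_feature(faces, feats, face_id, bary):
    """Interpolate per-vertex features at a point inside one triangle.

    bary = (b0, b1, b2) are barycentric weights that sum to 1.
    """
    i, j, k = faces[face_id]  # connectivity: which vertices form this face
    b0, b1, b2 = bary
    dim = len(feats[0])
    return [b0 * feats[i][c] + b1 * feats[j][c] + b2 * feats[k][c]
            for c in range(dim)]
```

With a point cloud you would have to search for neighbours; with a mesh they can be read off the faces.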

image-20221208163526289
image-20221208163746237

4. Multiplanar Images

Something like projecting a 3D grid along an axis to get a stack of planes.

image-20221208164038729

Pros:

  1. Compact
  2. Very efficient, because hardware and software are heavily optimized for these 2D operations, such as bilinear interpolation.

Cons:

  1. Resolution bias along the plane axis, because the representation is discrete between planes.

This is not very wise in my opinion. It is just a temporary tradeoff given today's technology, because everything will be 3D in the future.

image-20221208165534056

Generate 2D images from different camera views (perhaps). The key point is the tri-plane representation of 3D features.
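A minimal sketch of a tri-plane lookup, as I understand it: project the 3D point onto the XY, XZ, and YZ planes, sample each plane bilinearly, and combine the three results. Summing the three features is an assumption here (concatenation would be another choice), and the names `bilinear` and `triplane_feature` are hypothetical.

```python
def bilinear(plane, u, v):
    """Bilinearly interpolate plane[row][col] (a feature list) at (u, v).

    Coordinates are in pixel-index units; each axis needs at least 2 entries.
    """
    h, w = len(plane), len(plane[0])
    u = min(max(u, 0.0), h - 1.0)
    v = min(max(v, 0.0), w - 1.0)
    r0 = min(int(u), h - 2)
    c0 = min(int(v), w - 2)
    tu, tv = u - r0, v - c0
    dim = len(plane[0][0])
    return [
        (1 - tu) * (1 - tv) * plane[r0][c0][c]
        + (1 - tu) * tv * plane[r0][c0 + 1][c]
        + tu * (1 - tv) * plane[r0 + 1][c0][c]
        + tu * tv * plane[r0 + 1][c0 + 1][c]
        for c in range(dim)
    ]

def triplane_feature(planes, x, y, z):
    """Sample the three axis-aligned planes and sum their features."""
    fxy = bilinear(planes["xy"], x, y)
    fxz = bilinear(planes["xz"], x, z)
    fyz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(fxy, fxz, fyz)]
```

Storage grows with three 2D planes instead of one 3D volume, and every lookup is a plain 2D bilinear sample.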

5. Multiresolution grids

image-20221208165714329

Pros:

  1. Stable, because you do need both low- and high-resolution info.
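To illustrate, here is a 1D sketch (for brevity) of a multiresolution lookup: the same coordinate is interpolated in a coarse and a fine grid, and the features are concatenated. Concatenation and the names `lerp_lookup` and `multires_feature` are my own assumptions.

```python
def lerp_lookup(level, t):
    """Linearly interpolate a 1D feature grid `level` at t in [0, 1].

    level[i] is the feature list at vertex i; at least 2 vertices are needed.
    """
    n = len(level)
    pos = min(max(t, 0.0), 1.0) * (n - 1)
    i0 = min(int(pos), n - 2)
    w = pos - i0
    return [(1 - w) * a + w * b for a, b in zip(level[i0], level[i0 + 1])]

def multires_feature(levels, t):
    """Concatenate the interpolated features from every resolution level."""
    out = []
    for level in levels:
        out.extend(lerp_lookup(level, t))
    return out
```

The coarse level changes slowly across the domain while the fine level carries detail, so the downstream network sees both at once.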

6. Hash grids

image-20221208170131069

\[ [x,y,z]\text{ coordinates} \rightarrow \text{Hash function()} \rightarrow \text{Fixed size codebook} \]

Pros:

  1. No matter how big the original data is, you can use a fixed-size codebook as the input feature.
  2. Can be online!

Cons:

  1. May still need large codebooks
  2. Features are not spatially local. I don't think the hash grid is a good idea as long as this drawback exists. But isn't there a simple way to generate features that keep local information?
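The pipeline above can be sketched in a few lines. I am assuming a spatial hash that XORs the integer coordinates multiplied by large primes. The specific constants, the nearest-vertex lookup (no interpolation, to keep it short), and the names `hash_index` and `lookup` are all assumptions for illustration.

```python
# Assumed hash constants; any set of large, distinct primes would illustrate the idea.
PRIMES = (1, 2654435761, 805459861)

def hash_index(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a fixed-size table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size

def lookup(codebook, x, y, z, resolution):
    """Fetch the feature of the grid vertex nearest to (x, y, z).

    Coordinates are assumed to be non-negative and in units where one
    unit spans `resolution` grid cells.
    """
    ix, iy, iz = (int(round(c * resolution)) for c in (x, y, z))
    return codebook[hash_index(ix, iy, iz, len(codebook))]
```

The table size never changes with the scene, which is the first pro. Once there are more grid vertices than slots, some vertices must share a slot, and neighbouring vertices generally land in unrelated slots, which is the locality problem.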

7. Codebook grids

image-20221208170955887

Instead of storing features of points in grids, store (an index to) a code in a codebook. The size of the codebook is fixed, so the overall size can be kept much smaller.

Cons:

  1. To make the indexing operation differentiable, the computational complexity rises.
  2. The point of hashing is to get rid of complex data structures, but the indices bring them back.
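To see why differentiable indexing costs more, compare a hard lookup with one possible soft relaxation. This is only a sketch under my own assumptions: each grid vertex stores a vector of logits over the codebook, and the soft lookup is a softmax-weighted sum. The function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def hard_lookup(codebook, index):
    """Plain indexing: O(1), but the integer index has no gradient."""
    return codebook[index]

def soft_lookup(codebook, logits):
    """Softmax-weighted blend of all codes: differentiable in the logits,
    but it touches every codebook entry, so it costs O(codebook size)."""
    weights = softmax(logits)
    dim = len(codebook[0])
    return [sum(w * code[c] for w, code in zip(weights, codebook))
            for c in range(dim)]
```

When one logit dominates, the soft lookup approaches the hard one, so a model could in principle train with the soft version and store only the winning index afterwards.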

8. Bounding Volume Hierarchies

image-20221208171806113

A commonly used method in computer graphics.

9. Others (voxel)

image-20221208173124734
  • For dynamic NeRFs, is there any better hybrid representation? Sure.
  • Is there any explicit bias in these hybrid representations that we can discover and then design regularization for? Sure.