Introduction to NeRF

This is part of my journey of learning NeRF.

1. Introduction to NeRF

What is NeRF

Reference: Original NeRF paper; an online article

Given a set of captures of a scene from known viewpoints (the captured images together with each image's intrinsic and extrinsic camera parameters), synthesize images of the scene from a new viewpoint.

NeRF aims to do exactly this: without an intermediate 3D reconstruction step, it synthesizes images from novel viewpoints using only the camera poses, intrinsics, and images. To this end, NeRF introduces the notion of a radiance field, a central concept in graphics. For reference, the rendering equation is defined as:

\[ L_o(\bold x,\omega_o)=L_e(\bold x,\omega_o)+\int_\Omega f_r(\bold x,\omega_i,\omega_o)\, L_i(\bold x,\omega_i)\,(\omega_i\cdot \bold n)\, d\omega_i \]

So how are radiance and color related? In short: light is electromagnetic radiation, i.e. an oscillating electromagnetic field, with a wavelength and a frequency satisfying \(\text{wavelength}\times \text{frequency}=\text{speed of light}\). The color of light is determined by its frequency. Most electromagnetic radiation is invisible; the range visible to the human eye is called the visible spectrum, and its frequencies correspond to what we perceive as colors.


MLP Structure

  1. The network is constrained to be multi-view consistent by restricting the prediction of \(\sigma\) to be independent of the viewing direction.
  2. The color \(\bold c\), in contrast, depends on both the viewing direction and the in-scene coordinate.

How is this implemented?

The MLP is designed in two stages:

  1. \(F_{\theta_1}(\bold x) = (\sigma, \text{<256 dim features>})\)
  2. \(F_{\theta_2}(\text{<256 dim features>}, \bold d)=\bold c\)
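A minimal NumPy sketch of this two-stage split, with random untrained weights. The layer sizes, variable names, and single-layer stages here are illustrative only; the real network is a deeper MLP with skip connections and positionally encoded inputs. The point is purely structural: \(\sigma\) is computed before the direction \(\bold d\) is ever seen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random weights: stage 1 maps position x -> (sigma, 256-dim features),
# stage 2 maps (features, direction d) -> RGB color c.
W1 = rng.normal(size=(3, 256)) * 0.1      # position -> features
w_sigma = rng.normal(size=(256,)) * 0.1   # features -> density
W2 = rng.normal(size=(256 + 3, 3)) * 0.1  # (features, direction) -> color

def stage1(x):
    """F_{theta_1}: density depends only on the position x, not on d."""
    feat = np.maximum(x @ W1, 0.0)           # ReLU
    sigma = np.maximum(feat @ w_sigma, 0.0)  # density is non-negative
    return sigma, feat

def stage2(feat, d):
    """F_{theta_2}: color depends on the features (i.e. position) and direction d."""
    h = np.concatenate([feat, d], axis=-1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid -> RGB in [0, 1]

x = np.array([0.1, -0.2, 0.3])   # 3D point
d = np.array([0.0, 0.0, 1.0])    # viewing direction
sigma, feat = stage1(x)
c = stage2(feat, d)
```

Because `stage1` never receives `d`, changing the viewing direction can change the color but not the density, which is exactly the multi-view-consistency constraint above.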

Novel view synthesis

For each pixel, sample points along the camera ray through this pixel;

For each sampling point, compute local color and density;

Using volume rendering, an integral along the camera ray through the pixel is evaluated: \[ C(\bold r)=\int_{t_1}^{t_2} T(t)\cdot \sigma (\bold r(t))\cdot \bold c(\bold r(t),\bold d)\cdot dt \\ T(t)=\exp \Big(-\int_{t_1}^t \sigma(\bold r(u))\cdot du\Big) \] This gives the color \(C\) of the pixel.

In practice the integral is approximated by numerical quadrature over the sampled points:

\[ \hat C(\bold r)=\sum_{i=1}^N T_i\alpha_i\bold c_i \\ T_i=\exp \Big(-\sum_{j=1}^{i-1}\sigma_j\delta_j\Big) \\ \alpha_i=1-\exp(-\sigma_i\delta_i)\\ \delta_i=\text{distance between sampling points } i \text{ and } i+1 \]
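The quadrature can be implemented in a few lines. This is a sketch for a single ray with made-up densities and colors; in a real pipeline \(\sigma_i\) and \(\bold c_i\) come from the MLP evaluated at the sampled points.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Discrete volume rendering: C_hat = sum_i T_i * alpha_i * c_i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)   # opacity of each segment
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance before sample i
    trans = np.exp(-np.concatenate([[0.0], np.cumsum((sigmas * deltas)[:-1])]))
    weights = trans * alphas                  # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0), weights

# Toy example: 4 samples along one ray
sigmas = np.array([0.0, 2.0, 5.0, 1.0])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
deltas = np.full(4, 0.25)
C_hat, weights = render_ray(sigmas, colors, deltas)
```

Note that a sample with zero density gets zero weight, and the weights along a ray sum to at most 1 (the remainder is light transmitted past the far bound).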

  • The loss is simply the L2 error on the rendered pixel colors:

\[ L=\sum_{r\in R}\| \hat C(\bold r)-C_{gt}(\bold r)\|^2_2 \]
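Written out for a toy batch (the color values below are arbitrary):

```python
import numpy as np

# Toy batch: predicted vs. ground-truth colors for |R| = 2 rays
C_hat = np.array([[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]])
C_gt  = np.array([[0.4, 0.6, 0.5], [1.0, 0.0, 0.2]])

# L = sum over rays of the squared L2 norm of the color error
loss = np.sum(np.sum((C_hat - C_gt) ** 2, axis=-1))  # 0.01 + 0.01 + 0.04 = 0.06
```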

Depth regularization

Analogously to the formulas above, an expected depth along each ray can also be computed, and it can be used to regularize depth smoothness.
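One common form of expected depth reuses the per-sample weights \(w_i = T_i\alpha_i\) from the rendering quadrature, replacing the color \(\bold c_i\) with the sample depth \(t_i\). This is a sketch; the smoothness penalty built on top of it varies between methods.

```python
import numpy as np

def expected_depth(sigmas, deltas, t_mids):
    """Expected ray-termination depth: D_hat = sum_i T_i * alpha_i * t_i."""
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.exp(-np.concatenate([[0.0], np.cumsum((sigmas * deltas)[:-1])]))
    weights = trans * alphas
    return np.sum(weights * t_mids)

# Density concentrated near t = 1.4, so the expected depth lands close to it
sigmas = np.array([0.0, 10.0, 10.0, 0.0])
deltas = np.full(4, 0.4)
t_mids = np.array([1.0, 1.4, 1.8, 2.2])
d = expected_depth(sigmas, deltas, t_mids)
```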

Positional encoding

It is required to recover fine detail in the results; without it, the rendered images come out overly smooth.

There are many other positional encoding techniques, including trainable parametric, integral, and hierarchical variants.
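The original sinusoidal encoding maps each coordinate \(p\) to \(\gamma(p)=(\sin(2^0\pi p),\cos(2^0\pi p),\dots,\sin(2^{L-1}\pi p),\cos(2^{L-1}\pi p))\). A sketch (the \(2^k\pi\) frequency convention follows the paper; some implementations drop the \(\pi\) or use different \(L\) for positions and directions):

```python
import numpy as np

def positional_encoding(p, L=10):
    """Map each coordinate to 2L sinusoids of geometrically increasing frequency."""
    p = np.asarray(p, dtype=float)
    freqs = 2.0 ** np.arange(L) * np.pi   # 2^0 pi, 2^1 pi, ..., 2^{L-1} pi
    angles = p[..., None] * freqs         # shape (..., dim, L)
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1) # shape (..., dim * 2L)

x = np.array([0.5, -0.25, 1.0])
gamma = positional_encoding(x, L=10)      # 3 coords * 2 * 10 = 60 values
```

The high-frequency terms are what let the downstream MLP fit sharp changes in density and color, counteracting its spectral bias toward smooth functions.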

SDF - Signed Distance Function


Compared with classic 3D representations such as point clouds, voxels, and meshes, an SDF has a fixed mathematical form, focuses on the object's surface information, and has a controllable computational cost.
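A minimal example of that "fixed mathematical form": the SDF of a sphere. The sign tells inside from outside, and the zero level set is the surface (the radius and query points below are arbitrary illustrations):

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(np.asarray(p, dtype=float) - center, axis=-1) - radius

inside  = sphere_sdf([0.0, 0.0, 0.0])   # center of the unit sphere: -1.0
surface = sphere_sdf([1.0, 0.0, 0.0])   # on the surface: 0.0
outside = sphere_sdf([2.0, 0.0, 0.0])   # one unit outside: 1.0
```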

Features of NeRF

  • Representations can be discrete or continuous, but a discrete representation becomes very large as the number of dimensions grows, e.g., 3 dimensions.
    • Plenoxels, for example, stores the fields in 3D grids: fast, but memory-hungry.
  • Neural fields have advantages:
    1. Compactness: the whole field is stored in a small set of network weights
    2. Regularization: the network itself acts as an inductive bias, which makes learning easier
    3. Domain agnostic: adding a dimension is cheap
  • There are also problems:
    • Editability / Manipulability
    • Computational Complexity
    • Spectral Bias

Problem Formulation

  • Input: multiview images
  • Output: 3D Geometry and appearance
  • Objective:

\[ \arg \min_x\|y-F(x)\|+\lambda P(x) \]

where \(y\) is the multiview images, \(F\) is the forward mapping, and \(x\) is the desired 3D reconstruction.

If \(F\) is differentiable, the reconstruction can be supervised directly through it.
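A toy version of this objective with a differentiable forward map, solved by gradient descent. Everything here is illustrative: \(F\) is a simple linear map standing in for the renderer, and \(P(x)=\|x\|^2\) stands in for the prior.

```python
import numpy as np

# Toy problem: recover x from observations y = F(x), with differentiable F.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])  # forward mapping F(x) = A @ x
x_true = np.array([1.0, -0.5])
y = A @ x_true                                      # the "multiview images"
lam = 1e-3                                          # weight of the prior P(x) = ||x||^2

x = np.zeros(2)
for _ in range(500):
    # gradient of ||y - F(x)||^2 + lam * P(x)
    grad = 2 * A.T @ (A @ x - y) + 2 * lam * x
    x -= 0.05 * grad                                # gradient-descent step
```

The loop recovers `x` close to `x_true`; the same structure, with `F` replaced by a differentiable volume renderer and `x` by the field parameters, is the NeRF training loop.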

  • The network itself already acts as a kind of constraint, so you do not need to add many handcrafted constraints.