This is part of my journey of learning NeRF.

# 1. Introduction to NeRF

## What is NeRF

Reference: the original NeRF paper; an online article

NeRF sets out to do the following: given only camera poses, intrinsics, and images, and without any intermediate 3D reconstruction step, directly synthesize images from novel viewpoints. To this end, NeRF introduces the concept of a radiance field, a very important concept in computer graphics, closely tied to the rendering equation.
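For reference (this equation is not spelled out in the note above), the rendering equation in its standard Kajiya form describes the outgoing radiance at a surface point $\bold x$ in direction $\omega_o$:

$L_o(\bold x,\omega_o)=L_e(\bold x,\omega_o)+\int_\Omega f_r(\bold x,\omega_i,\omega_o)\,L_i(\bold x,\omega_i)\,(\omega_i\cdot \bold n)\,d\omega_i$

Here $L_e$ is emitted radiance, $f_r$ the BRDF, $L_i$ incoming radiance, and $\bold n$ the surface normal.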

### Implementation

#### MLP Structure

1. The network is constrained to be multi-view consistent by restricting the prediction of the density $$\sigma$$ to be independent of the viewing direction,
2. while the color $$\bold c$$ depends on both the viewing direction and the in-scene coordinate.

How is this implemented?

The MLP is designed in two stages:

1. $$F_{\theta_1}(\bold x) = (\sigma, \text{<256 dim features>})$$
2. $$F_{\theta_2}(\text{<256 dim features>}, \bold d)=\bold c$$
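The two-stage split can be sketched as follows. This is a minimal NumPy sketch, not the paper's actual architecture (which uses 8 hidden layers and a skip connection); the weights here are untrained random placeholders, and only the 256-dim feature width follows the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 3-D position, 3-D view direction, 256-wide features.
D_X, D_DIR, W = 3, 3, 256

W1 = rng.normal(size=(D_X, W)) * 0.1        # stage-1 weights (placeholder)
W_SIGMA = rng.normal(size=(W, 1)) * 0.1     # density head
W2 = rng.normal(size=(W + D_DIR, 3)) * 0.1  # stage-2 color head

def relu(a):
    return np.maximum(a, 0.0)

def stage1(x):
    """F_{theta_1}: position -> (sigma, 256-dim feature); no view direction."""
    feat = relu(x @ W1)
    sigma = relu(feat @ W_SIGMA)  # ReLU keeps the density non-negative
    return sigma, feat

def stage2(feat, d):
    """F_{theta_2}: (feature, view direction) -> RGB color in [0, 1]."""
    h = np.concatenate([feat, d], axis=-1)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid squashes color to [0, 1]
```

Because `stage1` never sees `d`, the predicted density is view-independent by construction, which is exactly what enforces multi-view consistency.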

#### Novel view synthesis

For each pixel, sample points along the camera ray through this pixel;

For each sampling point, compute local color and density;

Then use volume rendering: an integral along the camera ray through the pixel gives the pixel color $C$:

$C(\bold r)=\int_{t_1}^{t_2} T(t)\,\sigma (\bold r(t))\,\bold c(\bold r(t),\bold d)\,dt,\qquad T(t)=\exp \left(-\int_{t_1}^t \sigma(\bold r(u))\,du\right)$

In practice the integral is approximated by sampling points along the ray (the paper uses stratified sampling).

Now everything can be approximated by numerical quadrature:

$\hat C(\bold r)=\sum_{i=1}^N T_i\,\alpha_i\,\bold c_i,\qquad T_i=\exp \Big(-\sum_{j=1}^{i-1}\sigma_j\delta_j\Big),\qquad \alpha_i=1-\exp(-\sigma_i\delta_i),\qquad \delta_i=t_{i+1}-t_i$
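The quadrature above can be sketched for a single ray in a few lines of NumPy; the function name and argument layout are my own, not from the paper:

```python
import numpy as np

def volume_render(sigmas, colors, ts):
    """Discrete volume rendering along one ray.

    sigmas: (N,) densities at the N sample points
    colors: (N, 3) RGB colors at the sample points
    ts:     (N+1,) sample depths along the ray (so all deltas are defined)
    """
    deltas = np.diff(ts)                     # delta_i = t_{i+1} - t_i
    alphas = 1.0 - np.exp(-sigmas * deltas)  # opacity of each segment
    # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance up to sample i
    trans = np.exp(-np.concatenate([[0.0], np.cumsum(sigmas * deltas)[:-1]]))
    weights = trans * alphas                 # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)  # estimated pixel color
```

A sanity check on the behavior: a fully opaque first sample should return that sample's color, and an empty ray (all densities zero) should return black.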

• The loss is simply an L2 penalty on the rendered pixel colors:

$L=\sum_{\bold r\in R}\| \hat C(\bold r)-C_{gt}(\bold r)\|^2_2$
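As a direct transcription of that formula (function name is my own), over a batch of $R$ rays:

```python
import numpy as np

def photometric_loss(pred, gt):
    """Sum of squared L2 color errors over a batch of rays; inputs are (R, 3)."""
    return np.sum(np.linalg.norm(pred - gt, axis=-1) ** 2)
```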

#### Depth regularization

Similar to the formulas above, the expected depth along each ray can also be computed, and can be used to regularize depth smoothness.
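One standard form of this expected-depth estimate (not spelled out in the note above) reuses the same per-sample weights $T_i\alpha_i$ from the color quadrature, but averages sample depths instead of colors:

$\hat D(\bold r)=\sum_{i=1}^N T_i\,\alpha_i\,t_i$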

#### Positional encoding

It is required to recover fine detail: a plain MLP on raw coordinates is biased toward low-frequency functions, so the inputs are first mapped to a higher-dimensional space with sine and cosine functions of increasing frequency.

There are many other positional encoding techniques, including trainable parametric, integral, and hierarchical variants.
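The original NeRF encoding $\gamma(p)=(\sin(2^0\pi p),\cos(2^0\pi p),\dots,\sin(2^{L-1}\pi p),\cos(2^{L-1}\pi p))$ can be sketched as follows (the interleaving order of sin/cos terms here is a convenience choice, not necessarily the paper's exact layout):

```python
import numpy as np

def positional_encoding(p, L=10):
    """NeRF-style encoding: sin/cos of p at frequencies 2^0 * pi .. 2^(L-1) * pi.

    p: (..., D) raw coordinates; returns (..., 2 * L * D).
    """
    freqs = 2.0 ** np.arange(L) * np.pi  # 2^k * pi for k = 0 .. L-1
    angles = p[..., None] * freqs        # broadcast to shape (..., D, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)
```

With $L=10$ a 3-D position expands to 60 dimensions, which is what lets the MLP represent high-frequency variation in geometry and appearance.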

### SDF - Signed Distance Function

An SDF is a distance-defining function from computer graphics. It gives the distance from a point in space to an implicit surface, with the sign determined by whether the point lies inside or outside the surface.
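A minimal concrete example, using a sphere as the implicit surface (the sign convention here, negative inside and positive outside, is the common one):

```python
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=1.0):
    """Signed distance from point(s) p to a sphere: negative inside, positive outside."""
    return np.linalg.norm(p - center, axis=-1) - radius
```

Points exactly on the surface evaluate to zero, which is why the surface itself is the zero level set of the SDF.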

## Features of NeRF

• The representation can be discrete or continuous, but a discrete representation grows very large as dimensionality increases, e.g., a dense 3D grid.
• Plenoxels, for instance, stores the fields in 3D grids: fast, but very memory-hungry.
• Advantages of a continuous neural representation:
1. Compactness: the field is stored in network weights rather than a dense grid
2. Regularization: the network itself acts as an inductive bias, which makes the field easy to learn
3. Domain agnostic: adding a dimension is cheap
• But there are also problems:
  • Editability / manipulability
  • Computational complexity
  • Spectral bias

## Problem Formulation

• Input: multiview images
• Output: 3D Geometry and appearance
• Objective:

$\arg \min_x\|y-F(x)\|+\lambda P(x)$

$y$ is the multiview images, $F$ is the forward mapping (rendering), $x$ is the desired 3D reconstruction, and $P$ is a prior/regularizer weighted by $\lambda$.

If $F$ is differentiable, the objective can be optimized directly with gradient descent, supervised only by the images.

• The network itself already acts as a kind of constraint, so you don't need to add many handcrafted constraints.
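The objective above can be sketched with a toy differentiable forward map. Everything here is a made-up stand-in: $F$ is a random linear map $F(x)=Ax$ and $P(x)=\|x\|^2$ is a simple Tikhonov prior, whereas a real NeRF replaces $F$ with differentiable volume rendering and $x$ with network weights.

```python
import numpy as np

# Toy instance of argmin_x ||y - F(x)||^2 + lambda * P(x).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 4))   # placeholder linear forward map F(x) = A @ x
y = A @ rng.normal(size=4)    # "observations" rendered from a hidden ground truth
lam, lr = 1e-3, 0.01          # prior weight and step size (hand-picked)

x = np.zeros(4)
for _ in range(2000):
    grad = 2 * A.T @ (A @ x - y) + 2 * lam * x  # gradient of the full objective
    x -= lr * grad                              # plain gradient descent
```

The point of the sketch is only that once $F$ is differentiable, the reconstruction falls out of standard gradient-based optimization with image supervision alone.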