@inproceedings{mildenhall2020nerf,
  title = {NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
  author = {Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
  booktitle = {ECCV},
  year = {2020},
}
@software{jaxnerf2020github,
  author = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
  title = {JaxNeRF: an efficient JAX implementation of NeRF},
  url = {https://github.com/google-research/google-research/tree/master/jaxnerf},
  version = {0.0},
  year = {2020},
}
Ray casting is the step of a ray-tracing algorithm that shoots one or more rays from the camera (eye position) through each pixel of the image plane.
image source: https://developer.nvidia.com/discover/ray-tracing
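The ray-generation step above can be sketched as follows. This is a minimal NumPy version; the function name, the pinhole-camera model, and the "camera looks down its $-z$ axis" convention are assumptions, matching common NeRF codebases:

```python
import numpy as np

def get_rays(height, width, focal, c2w):
    """Generate one ray per pixel: origin and unit direction in world space.

    c2w is a 3x4 camera-to-world matrix (rotation | translation); the camera
    looks down its -z axis (a common NeRF convention, assumed here).
    """
    i, j = np.meshgrid(np.arange(width, dtype=np.float32),
                       np.arange(height, dtype=np.float32), indexing="xy")
    # Direction from the eye through each pixel, in camera coordinates.
    dirs = np.stack([(i - width * 0.5) / focal,
                     -(j - height * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate directions into world space and normalize them.
    rays_d = dirs @ c2w[:3, :3].T
    rays_d = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    # All rays share the camera center as their origin.
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d
```

For an identity pose, the ray through the image center points straight down the $-z$ axis, as expected.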
NeRF's input is a set of 2D images along with their corresponding camera poses. NeRF can also use the sparse points produced by Structure-from-Motion.
| Concept | Description |
|---|---|
| Ground Truth | Pixels in the training images. |
| Training Samples | 3D points sampled along each ray. |
| Loss Function | How closely does the rendered color match the ground-truth pixel? All 3D points sampled along a ray contribute to the final color of that pixel. |
| Objective | Overfit the network as much as possible to match the colors in the training images. Test images can be used to evaluate how the network performs on angles that are not present in the training set. |
image source: https://graphics.stanford.edu/courses/cs348n-22-winter/LectureSlides/FinalSlides/leo_class_nerf_2022.pdf
image source: https://jaminfong.cn/neusample/
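The loss described in the table above can be sketched as follows: per-sample colors along a ray are alpha-composited into one pixel color, and training minimizes the squared error against the ground-truth pixel. This is a simplified NumPy sketch of NeRF-style volume rendering; the function names are illustrative:

```python
import numpy as np

def composite_rgb(rgb, sigma, t_vals):
    """Alpha-composite per-sample colors along one ray into a pixel color.

    rgb:    (num_samples, 3) color at each 3D sample along the ray
    sigma:  (num_samples,)   volume density at each sample
    t_vals: (num_samples,)   distance of each sample along the ray
    """
    delta = np.diff(t_vals, append=1e10)                   # segment lengths
    alpha = 1.0 - np.exp(-sigma * delta)                   # per-segment opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))   # transmittance
    weights = alpha * trans                                # sample contributions
    return (weights[:, None] * rgb).sum(axis=0)

def photometric_loss(pred_rgb, gt_rgb):
    """Mean squared error between rendered and ground-truth pixel colors."""
    return np.mean((pred_rgb - gt_rgb) ** 2)
```

If the first sample is fully opaque, it occludes everything behind it and the pixel takes its color, which matches the intuition that all samples contribute, weighted by density and visibility.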
Fourier features let networks learn high-frequency functions in low-dimensional domains.
image source: https://bmild.github.io/fourfeat/
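A NeRF-style positional encoding that produces such Fourier features can be sketched as below. The exact frequency scaling varies between implementations; this version uses $\sin(2^k \pi x)$ and $\cos(2^k \pi x)$, with `num_freqs=10` matching the paper's setting for 3D positions:

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map coordinates to Fourier features [sin(2^k * pi * x), cos(2^k * pi * x)].

    x: (..., dim) array of input coordinates.
    Returns an array of shape (..., dim * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # 2^k * pi for k = 0..L-1
    angles = x[..., None] * freqs                   # (..., dim, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)
```

A 3D point thus maps to a 60-dimensional feature vector (3 dims × 2 functions × 10 frequencies), which the MLP can fit much more sharply than the raw coordinates.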
PSNR is commonly used to quantify reconstruction quality for images and video subject to lossy compression. In NeRF, PSNR instead compares a training image with an image of the radiance field rendered from the same camera angle:

$PSNR = 10 \cdot \log_{10}\left(\frac{{MAX}_I^2}{MSE}\right), \qquad MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2$

where:
$I(i,j)$ is the rendered image.
$K(i,j)$ is the training image.
${MAX}_I$ is the maximum possible pixel value of the image.
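The PSNR computation above can be sketched directly from its definition. This assumes pixel values in $[0, {MAX}_I]$; NeRF evaluations typically use floats in $[0, 1]$, so `max_val` defaults to 1.0:

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """PSNR between a rendered image I and a training image K."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return np.inf               # identical images: infinite PSNR
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better: halving the RMS error raises PSNR by about 6 dB.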
Given two windows $x$ and $y$ of size $N \times N$, SSIM is calculated as:
$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$
where: