Mashaan blog

NeRF: Neural Radiance Fields

Code releases

Original algorithm:
- TF repository: The one released with the paper.
- JAX repository: The more recent implementation, and the one referred to most often in this tutorial.
Follow-up works:
- Mip-NeRF: This implementation is written in JAX, and is a fork of Google’s JaxNeRF implementation.
- MultiNeRF: The code release for three CVPR 2022 papers: Mip-NeRF 360, Ref-NeRF, and RawNeRF. This implementation is written in JAX, and is a fork of Mip-NeRF.

References

@inproceedings{mildenhall2020nerf,
title     = {NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
author    = {Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
booktitle = {ECCV},
year      = {2020},
}

@software{jaxnerf2020github,
author    = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
title     = {JaxNeRF: an efficient JAX implementation of NeRF},
url       = {https://github.com/google-research/google-research/tree/master/jaxnerf},
version   = {0.0},
year      = {2020},
}

Ray casting

Ray casting is the process in a ray tracing algorithm that shoots one or more rays from the camera (eye position) through each pixel in an image plane.

image source: https://developer.nvidia.com/discover/ray-tracing
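For NeRF, ray casting means generating one ray (an origin and a direction) per pixel of a posed camera. Below is a minimal sketch, assuming a pinhole camera with the focal length given in pixels, the camera convention of the original NeRF code (the camera looks down -z and +y is up), and a 3×4 camera-to-world matrix `c2w` stored as nested lists. Shooting the ray through the pixel center (the +0.5 offset) is also an assumption; some implementations omit it. The function names `pixel_ray` and `cast_rays` are hypothetical.

```python
def pixel_ray(i, j, width, height, focal, c2w):
    """Cast one ray through the center of pixel (column i, row j).

    Assumes a pinhole camera looking down -z with +y up, and a 3x4
    camera-to-world matrix c2w = [[r00, r01, r02, tx], [...], [...]].
    Returns (origin, direction) in world coordinates.
    """
    # Direction in camera coordinates through the pixel center (+0.5 offset).
    x = (i + 0.5 - 0.5 * width) / focal
    y = -(j + 0.5 - 0.5 * height) / focal  # image rows grow downward, camera y grows upward
    d_cam = (x, y, -1.0)
    # Rotate into world coordinates: d_world = R @ d_cam.
    d_world = tuple(sum(c2w[r][k] * d_cam[k] for k in range(3)) for r in range(3))
    # All rays of this camera start at the camera center (translation column).
    origin = tuple(c2w[r][3] for r in range(3))
    return origin, d_world


def cast_rays(width, height, focal, c2w):
    """Return one (origin, direction) pair per pixel, in row-major order."""
    return [pixel_ray(i, j, width, height, focal, c2w)
            for j in range(height) for i in range(width)]


identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
print(pixel_ray(1, 1, 3, 3, 1.0, identity))  # center pixel looks straight down -z
```

The direction is left unnormalized, so the z-component is always -1 in camera space; this keeps sample depths along the ray easy to interpret.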

NeRF input

Volumetric formulation for NeRF

image source: https://graphics.stanford.edu/courses/cs348n-22-winter/LectureSlides/FinalSlides/leo_class_nerf_2022.pdf
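In the NeRF paper, the color of a ray $\mathbf{r}$ is estimated from $N$ samples with the quadrature rule $\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i$, where $T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)$ and $\delta_i = t_{i+1} - t_i$. A minimal single-ray sketch follows. It assumes the sample distances are already in world units (for example, a unit-length ray direction) and, for the last sample, which has no $t_{i+1}$, uses a very large $\delta$ as the original NeRF code does. The name `render_ray` is hypothetical.

```python
import math


def render_ray(sigmas, colors, t_vals, far_delta=1e10):
    """Alpha-composite samples along one ray with NeRF's quadrature rule.

    sigmas: density sigma_i at each sample
    colors: (r, g, b) color c_i at each sample
    t_vals: sorted sample distances t_i along the ray (world units)
    Returns the composited color and the per-sample weights T_i * alpha_i.
    """
    n = len(t_vals)
    # delta_i = t_{i+1} - t_i; the last interval is treated as very long.
    deltas = [t_vals[k + 1] - t_vals[k] for k in range(n - 1)] + [far_delta]
    rgb = [0.0, 0.0, 0.0]
    weights = []
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        w = transmittance * alpha
        weights.append(w)
        for c in range(3):
            rgb[c] += w * color[c]
        transmittance *= 1.0 - alpha  # multiply by exp(-sigma_i * delta_i)
    return tuple(rgb), weights


# A dense first sample hides everything behind it.
print(render_ray([1e3, 1.0], [(1.0, 0.5, 0.25), (0.0, 0.0, 1.0)], [0.0, 1.0]))
```

The returned weights are worth keeping: they are what hierarchical sampling uses to decide where to place the fine samples.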

Hierarchical volume sampling

image source: https://jaminfong.cn/neusample/
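In NeRF's hierarchical scheme, a coarse network is first evaluated at $N_c$ stratified samples. Its compositing weights $w_i$ are normalized into a piecewise-constant PDF along the ray, $N_f$ more samples are drawn from that PDF by inverse transform sampling, and the fine network is evaluated at the union of both sets. The sketch below shows only the inverse-transform step. It assumes bin edges with one more entry than the weights, a small `eps` added to every weight so an all-zero ray still yields valid samples, and random (not stratified) uniforms; the name `sample_pdf` is illustrative.

```python
import bisect
import random


def sample_pdf(bin_edges, weights, n_samples, rng=None, eps=1e-5):
    """Inverse-transform sampling from a piecewise-constant PDF along a ray.

    bin_edges: sorted distances t, one more entry than weights
    weights:   coarse-network weights w_i per bin (need not sum to 1)
    Returns n_samples sorted distances, concentrated where weights are large.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed only so the sketch is reproducible
    # Normalize weights into a PDF; eps avoids dividing by zero.
    padded = [w + eps for w in weights]
    total = sum(padded)
    cdf = [0.0]
    for w in padded:
        cdf.append(cdf[-1] + w / total)
    cdf[-1] = 1.0  # guard against round-off
    samples = []
    for _ in range(n_samples):
        u = rng.random()  # uniform in [0, 1)
        # Find bin k with cdf[k] <= u < cdf[k + 1].
        k = bisect.bisect_right(cdf, u) - 1
        k = min(max(k, 0), len(weights) - 1)
        # Invert the CDF linearly inside that bin.
        span = cdf[k + 1] - cdf[k]
        frac = (u - cdf[k]) / span if span > 0 else 0.0
        samples.append(bin_edges[k] + frac * (bin_edges[k + 1] - bin_edges[k]))
    return sorted(samples)


# Almost all fine samples land in the middle bin [1, 2), where the weight is.
fine = sample_pdf([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0], 8)
print(fine)
```

Sorting the output matters because the fine samples are merged with the coarse ones and composited front to back.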

Positional encoding

Fourier features let networks learn high frequency functions in low dimensional domains

image source: https://bmild.github.io/fourfeat/
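NeRF's positional encoding is one instance of such Fourier features. The paper maps each input coordinate $p$ separately to $\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right)$, with $L = 10$ for the position $\mathbf{x}$ and $L = 4$ for the viewing direction $\mathbf{d}$, after normalizing coordinates to $[-1, 1]$. A minimal sketch of that formula follows; the function name is hypothetical. As far as I know, the released code bases differ slightly from the paper's formula (they drop the $\pi$ factor and also keep the raw input), so treat this as the paper's version only.

```python
import math


def positional_encoding(p, num_freqs):
    """Paper-style encoding gamma(p) of a coordinate vector p.

    Each coordinate x becomes
    (sin(2^0 pi x), cos(2^0 pi x), ..., sin(2^(L-1) pi x), cos(2^(L-1) pi x))
    with L = num_freqs, so a D-dimensional input yields 2 * L * D features.
    """
    features = []
    for x in p:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            features.append(math.sin(freq * x))
            features.append(math.cos(freq * x))
    return features


print(len(positional_encoding((0.1, 0.2, 0.3), 10)))  # 3 coords * 10 freqs * 2 = 60
print(len(positional_encoding((0.0, 0.0, 1.0), 4)))   # viewing direction: 24
```

The higher frequencies are what let the MLP represent fine detail; with $L = 0$ the network only sees the smooth raw coordinates.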

Evaluation Metrics

Peak signal-to-noise ratio (PSNR)

PSNR is commonly used to quantify the reconstruction quality of images and video subject to lossy compression. In NeRF, it is instead used to compare a training image with an image rendered from the radiance field. The rendered image is taken from the same camera pose as the training image.

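$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2$

$PSNR = 10 \cdot \log_{10}\left(\frac{{MAX}_I^2}{MSE}\right)$

where:

- $I(i,j)$ is the rendered image.
- $K(i,j)$ is the training image.
- $m \times n$ is the size of both images, in pixels.
- ${MAX}_I$ is the maximum possible pixel value of the image (e.g. 255 for 8-bit images, 1 for images scaled to $[0, 1]$).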
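PSNR can be computed directly from its standard definition, $PSNR = 10 \log_{10}({MAX}_I^2 / MSE)$, where $MSE$ is the mean squared difference between the rendered and training images. Below is a minimal sketch, assuming images are given as nested lists (grayscale or per-channel color) and defaulting to ${MAX}_I = 1$ for images scaled to $[0, 1]$. With ${MAX}_I = 1$ the formula reduces to $-10 \log_{10}(MSE)$, which, as far as I know, is how the NeRF code bases turn the training loss into a PSNR value. Both function names are hypothetical.

```python
import math


def _flatten(img):
    """Yield every scalar pixel value from nested lists/tuples."""
    for v in img:
        if isinstance(v, (list, tuple)):
            yield from _flatten(v)
        else:
            yield v


def psnr(rendered, target, max_val=1.0):
    """PSNR in dB between two equally sized images (nested lists)."""
    a = list(_flatten(rendered))
    b = list(_flatten(target))
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_to_psnr(mse):
    """Shortcut for MAX_I = 1: PSNR = -10 * log10(MSE)."""
    return -10.0 * math.log10(mse)


print(psnr([[0.0, 0.0]], [[0.1, 0.1]]))  # MSE = 0.01, so about 20 dB
```

Because PSNR is a log of the MSE, each 10 dB gain corresponds to a tenfold reduction in mean squared error.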

Structural similarity index measure (SSIM)

Given two windows $x$ and $y$ of size $N \times N$, SSIM is calculated as:

$SSIM = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$

where:

- $\mu_x$ is the pixel sample mean of $x$
- $\mu_y$ is the pixel sample mean of $y$
- $\sigma_x^2$ is the variance of $x$
- $\sigma_y^2$ is the variance of $y$
- $\sigma_{xy}$ is the covariance of $x$ and $y$
- $c_1=(k_1L)^2$ and $c_2=(k_2L)^2$ are two variables that stabilize the division when the denominator is weak (close to zero)
- $L$ is the dynamic range of the pixel values (typically $2^{\text{bits per pixel}} - 1$)
- $k_1=0.01$ and $k_2=0.03$ by default

source: PSNR, SSIM
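The SSIM formula for a single pair of windows translates almost line by line into code. Below is a minimal sketch, assuming each window is a flat list of pixel values and using population ($1/N$) statistics; the function name is hypothetical. This is a simplification: the SSIM values reported for NeRF are, to my understanding, computed over many local Gaussian-weighted windows (JaxNeRF's `compute_ssim` defaults to an 11×11 window with $\sigma = 1.5$) and averaged over the image.

```python
def ssim_window(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM between two equally sized windows given as flat lists of pixels.

    data_range is L (e.g. 255 for 8-bit images, 1.0 for [0, 1] images).
    Uses population (1/N) means, variances, and covariance.
    """
    if len(x) != len(y) or not x:
        raise ValueError("windows must be non-empty and the same size")
    n = len(x)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


window = [0.1, 0.5, 0.9, 0.3]
print(ssim_window(window, window))                     # identical windows: 1.0
print(ssim_window(window, [v + 0.2 for v in window]))  # brightness shift lowers SSIM
```

Unlike PSNR, SSIM separates luminance (the $\mu$ terms) from contrast and structure (the $\sigma$ terms), so a uniform brightness shift only affects the first factor.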
