Mashaan blog

NeRF: Neural Radiance Fields

Code releases

References

@inproceedings{mildenhall2020nerf,
title     = {NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
author    = {Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
year      = {2020},
booktitle = {ECCV},
}
@software{jaxnerf2020github,
author  = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
title   = {JaxNeRF: an efficient JAX implementation of NeRF},
url     = {https://github.com/google-research/google-research/tree/master/jaxnerf},
version = {0.0},
year    = {2020},
}

Ray casting

Ray casting is the process in a ray tracing algorithm that shoots one or more rays from the camera (eye position) through each pixel in an image plane.
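A minimal sketch of casting one ray per pixel, assuming a pinhole camera placed at the origin and looking down the -z axis. The function names `pixel_ray` and `cast_rays`, and the parameters `width`, `height`, and `focal`, are illustrative and do not come from any NeRF code release.

```python
import math

def pixel_ray(i, j, width, height, focal):
    """Return (origin, unit direction) of the ray through pixel column i, row j.

    Assumes a pinhole camera at the origin looking down the -z axis,
    with x pointing right and y pointing up (an illustrative convention).
    """
    # Offset of the pixel center from the image center, in focal-length units.
    x = (i + 0.5 - width / 2.0) / focal
    y = -(j + 0.5 - height / 2.0) / focal  # image rows grow downward
    z = -1.0
    norm = math.sqrt(x * x + y * y + z * z)
    return (0.0, 0.0, 0.0), (x / norm, y / norm, z / norm)

def cast_rays(width, height, focal):
    """Cast one ray through the center of every pixel, in row-major order."""
    return [pixel_ray(i, j, width, height, focal)
            for j in range(height) for i in range(width)]
```

For a posed camera, each direction would additionally be rotated by the camera-to-world rotation and the origin moved to the camera position.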

image

image source: https://developer.nvidia.com/discover/ray-tracing

NeRF input

The input to NeRF is a set of 2D images along with their corresponding camera poses. NeRF can also use the sparse points produced by Structure-from-Motion.

drawings-01 002

| Concept | Description |
| --- | --- |
| Ground Truth | Pixels in the training images. |
| Training Samples | 3D points along the ray. |
| Loss Function | How close the reconstructed color is to the ground-truth pixel. All 3D points along the ray contribute to the final color of the pixel. |
| Objective | Overfit the network as closely as possible to the colors in the training images. Test images measure how the network performs from viewpoints that are not present in the training set. |
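To make the loss concrete, here is a minimal sketch of how the samples along one ray can be blended into a pixel color and compared with the ground-truth pixel. It assumes the alpha-compositing quadrature from the NeRF paper; the names `composite_ray` and `mse_loss` are illustrative, and in practice the densities and colors would come from the network rather than being passed in by hand.

```python
import math

def composite_ray(sigmas, colors, deltas):
    """Blend per-sample densities and RGB colors into one pixel color.

    sigmas: volume density at each sample along the ray
    colors: (r, g, b) at each sample
    deltas: distance from each sample to the next one
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still reaches this sample
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel)

def mse_loss(rendered, target):
    """Mean squared error between rendered and ground-truth RGB values."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)
```

Every sample receives a non-negative weight, so each 3D point along the ray can influence the pixel color, while samples hidden behind dense regions receive weights close to zero.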

Volumetric formulation for NeRF

image

image source: https://graphics.stanford.edu/courses/cs348n-22-winter/LectureSlides/FinalSlides/leo_class_nerf_2022.pdf

Hierarchical volume sampling

image

image source: https://jaminfong.cn/neusample/

Positional encoding

Fourier features let networks learn high frequency functions in low dimensional domains
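A minimal sketch of the positional encoding from the NeRF paper, which maps each input coordinate $p$ to $(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p))$. The function names and the `num_freqs` parameter (the paper's $L$) are illustrative; this version also omits the raw coordinate, which some implementations append to the encoding.

```python
import math

def positional_encoding(p, num_freqs):
    """Map a scalar p to [sin(2^k pi p), cos(2^k pi p)] for k = 0..num_freqs-1."""
    features = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

def encode_point(point, num_freqs=10):
    """Encode each coordinate of a 3D point and concatenate the results."""
    encoded = []
    for coordinate in point:
        encoded.extend(positional_encoding(coordinate, num_freqs))
    return encoded
```

With `num_freqs=10`, a 3D point becomes a vector of 3 × 2 × 10 = 60 features, which gives the network direct access to high-frequency variation in the input.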

image

image source: https://bmild.github.io/fourfeat/

Evaluation metrics

Peak signal-to-noise ratio (PSNR)

PSNR is commonly used to quantify the reconstruction quality of images and video subject to lossy compression. In NeRF, it is instead used to compare a training image with an image rendered from the radiance field at the same camera pose.

Screenshot 2025-02-06 at 9 29 49 PM

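In its standard form, $PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$, where:

- $MAX_I$ is the maximum possible pixel value (255 for 8-bit images, 1 for images normalized to $[0, 1]$)
- $MSE$ is the mean squared error between the rendered image and the training image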
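A minimal sketch of the PSNR computation, using the standard definition of 10 times the base-10 logarithm of the squared peak value over the mean squared error. The function name `psnr` and the `max_value` parameter are illustrative, and images are assumed to be flattened into equal-length sequences of pixel values.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the rendered image is closer to the reference. For 8-bit images, pass `max_value=255`.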

Structural similarity index measure (SSIM)

Given two windows $x$ and $y$ of size $N \times N$, SSIM is calculated as:

$SSIM = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$

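where:

- $\mu_x$ and $\mu_y$ are the mean pixel values of $x$ and $y$
- $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$
- $\sigma_{xy}$ is the covariance of $x$ and $y$
- $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ stabilize the division when the denominator is small, with $L$ the dynamic range of the pixel values and, by default, $k_1 = 0.01$ and $k_2 = 0.03$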

source: PSNR, SSIM
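A minimal sketch of the SSIM formula above for a single pair of windows. It assumes the windows are flattened into equal-length sequences, uses population (divide-by-$N$) variance and covariance, and takes the commonly used defaults $k_1 = 0.01$ and $k_2 = 0.03$. The name `ssim_window` and its parameters are illustrative.

```python
def ssim_window(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """SSIM between two equal-length flat windows of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    # Population variance and covariance (an assumption of this sketch).
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Constants that stabilize the division when the denominator is small.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

A full-image score is usually obtained by sliding the window over both images and averaging the per-window values.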

image

Screenshot 2025-01-20 at 6 49 40 AM