@inproceedings{mildenhall2020nerf,
  title = {NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
  author = {Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
  booktitle = {ECCV},
  year = {2020},
}
@software{jaxnerf2020github,
  author = {Boyang Deng and Jonathan T. Barron and Pratul P. Srinivasan},
  title = {JaxNeRF: an efficient JAX implementation of NeRF},
  url = {https://github.com/google-research/google-research/tree/master/jaxnerf},
  version = {0.0},
  year = {2020},
}
Ray casting is the step of a ray-tracing algorithm that shoots one or more rays from the camera (eye position) through each pixel of the image plane.
image source: https://developer.nvidia.com/discover/ray-tracing
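The ray-generation step above can be sketched as follows. This is a minimal NumPy version; the function name, the pinhole-camera model, and the "camera looks down its $-z$ axis" convention are assumptions, matching common NeRF codebases:

```python
import numpy as np

def get_rays(height, width, focal, c2w):
    """Generate one ray per pixel: origin and unit direction in world space.

    c2w is a 3x4 camera-to-world matrix (rotation | translation); the camera
    looks down its -z axis (a common NeRF convention, assumed here).
    """
    i, j = np.meshgrid(np.arange(width, dtype=np.float32),
                       np.arange(height, dtype=np.float32), indexing="xy")
    # Direction from the eye through each pixel, in camera coordinates.
    dirs = np.stack([(i - width * 0.5) / focal,
                     -(j - height * 0.5) / focal,
                     -np.ones_like(i)], axis=-1)
    # Rotate directions into world space and normalize them.
    rays_d = dirs @ c2w[:3, :3].T
    rays_d = rays_d / np.linalg.norm(rays_d, axis=-1, keepdims=True)
    # All rays share the camera center as their origin.
    rays_o = np.broadcast_to(c2w[:3, 3], rays_d.shape)
    return rays_o, rays_d
```

For an identity pose, the ray through the image center points straight down the $-z$ axis, as expected.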
NeRF's input is a set of 2D images along with their corresponding camera poses. NeRF can also use the sparse points produced by Structure-from-Motion.
| Concept | Description |
|---|---|
| Ground Truth | Pixels in the training images. |
| Training Samples | 3D points sampled along each ray. |
| Loss Function | How closely does the rendered color match the ground-truth pixel? All 3D points sampled along a ray contribute to the final color of that pixel. |
| Objective | Overfit the network as much as possible to match the colors in the training images. Test images can be used to evaluate how the network performs on angles that are not present in the training set. |
image source: https://graphics.stanford.edu/courses/cs348n-22-winter/LectureSlides/FinalSlides/leo_class_nerf_2022.pdf
image source: https://jaminfong.cn/neusample/
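The loss described in the table above can be sketched as follows: per-sample colors along a ray are alpha-composited into one pixel color, and training minimizes the squared error against the ground-truth pixel. This is a simplified NumPy sketch of NeRF-style volume rendering; the function names are illustrative:

```python
import numpy as np

def composite_rgb(rgb, sigma, t_vals):
    """Alpha-composite per-sample colors along one ray into a pixel color.

    rgb:    (num_samples, 3) color at each 3D sample along the ray
    sigma:  (num_samples,)   volume density at each sample
    t_vals: (num_samples,)   distance of each sample along the ray
    """
    delta = np.diff(t_vals, append=1e10)                   # segment lengths
    alpha = 1.0 - np.exp(-sigma * delta)                   # per-segment opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))   # transmittance
    weights = alpha * trans                                # sample contributions
    return (weights[:, None] * rgb).sum(axis=0)

def photometric_loss(pred_rgb, gt_rgb):
    """Mean squared error between rendered and ground-truth pixel colors."""
    return np.mean((pred_rgb - gt_rgb) ** 2)
```

If the first sample is fully opaque, it occludes everything behind it and the pixel takes its color, which matches the intuition that all samples contribute, weighted by density and visibility.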
Fourier features let networks learn high-frequency functions in low-dimensional domains.
image source: https://bmild.github.io/fourfeat/
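A NeRF-style positional encoding that produces such Fourier features can be sketched as below. The exact frequency scaling varies between implementations; this version uses $\sin(2^k \pi x)$ and $\cos(2^k \pi x)$, with `num_freqs=10` matching the paper's setting for 3D positions:

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map coordinates to Fourier features [sin(2^k * pi * x), cos(2^k * pi * x)].

    x: (..., dim) array of input coordinates.
    Returns an array of shape (..., dim * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi   # 2^k * pi for k = 0..L-1
    angles = x[..., None] * freqs                   # (..., dim, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)
```

A 3D point thus maps to a 60-dimensional feature vector (3 dims × 2 functions × 10 frequencies), which the MLP can fit much more sharply than the raw coordinates.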
PSNR is commonly used to quantify reconstruction quality for images and video subject to lossy compression. In NeRF, PSNR instead compares a training image with an image of the radiance field rendered from the same camera angle:

$PSNR = 10 \cdot \log_{10}\left(\frac{{MAX}_I^2}{MSE}\right), \qquad MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2$

where:
$I(i,j)$ is the rendered image.
$K(i,j)$ is the training image.
${MAX}_I$ is the maximum possible pixel value of the image.
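The PSNR computation above can be sketched directly from its definition. This assumes pixel values in $[0, {MAX}_I]$; NeRF evaluations typically use floats in $[0, 1]$, so `max_val` defaults to 1.0:

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """PSNR between a rendered image I and a training image K."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return np.inf               # identical images: infinite PSNR
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher is better: halving the RMS error raises PSNR by about 6 dB.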
Given two windows $x$ and $y$ of size $N \times N$, SSIM is calculated as:
$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$
where: