NERF

By Javier Cantón Ferrero, Chief Technology Innovation Officer at Plain Concepts

 

What is NeRF technology about?

NeRF, or Neural Radiance Field, technology was introduced in 2020 and represents an innovative approach in the field of computer vision and 3D graphics, used to create 3D representations of environments from a limited set of photographs.

This technology is a great advance over previous techniques such as photogrammetry, which had been used to generate 3D models of an element or environment using only 2D images.

The core idea behind NeRF is that, given a set of 2D images of an object or environment captured from various points of view, it can infer what that object or environment looks like from any new, uncaptured point of view.

Until now, 3D environments have always been built from polygons that represent the geometry of each element and textures that give them colour. This means that, as you approach an element of a scene and need a photorealistic result, an immense number of polygons and very high-resolution textures are required, which implies dealing with huge files.

NeRF does not follow that strategy. Instead, it traces rays for each pixel of the input images (much like the swords a magician pushes through the box holding their assistant), and with that information it fits mathematical functions that encode, for each point in space, the colour that point shows from every viewing angle. This means that a point in space can return a different colour depending on where you look at it from, which is useful for representing certain physical effects found in nature, such as reflections on metals, changes in lighting, or refraction through transparent objects.

All that colour information is not stored explicitly but encoded in a mathematical function: evaluating the function with different inputs, which represent different viewing angles, returns different colours. This achieves remarkable information compression compared with 3D generation based on textures and polygons.

All these mathematical functions are encoded within an artificial intelligence (AI) model based on neural networks. Therefore, once the model has been trained on the set of photos of the original environment, we can move a virtual 3D camera to points in the scene from which no photo was ever taken and ask the AI model to generate a photorealistic image from that new point of view, one that also reproduces all the physical lighting and reflection effects mentioned above.
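To make this more concrete, here is a minimal, purely illustrative Python sketch of that query-and-composite process. The toy radiance_field function merely stands in for the trained neural network, and every value in it is invented for illustration; this is not the actual NeRF implementation.

```python
# Conceptual sketch of how a NeRF-style model turns one camera ray into one
# pixel colour. The "network" here is a stand-in for a trained neural network:
# it maps a 3D position and a viewing direction to a colour and a density.
import numpy as np

def radiance_field(position, view_direction):
    """Stand-in for the trained network: returns (rgb, density).
    In a real NeRF this is a multilayer perceptron learned from the photos."""
    rgb = np.clip(0.5 + 0.5 * np.sin(position + view_direction), 0.0, 1.0)
    density = float(np.exp(-np.linalg.norm(position)))  # fake occupancy value
    return rgb, density

def render_pixel(ray_origin, ray_direction, near=0.1, far=4.0, n_samples=64):
    """Sample points along the ray and alpha-composite them into a colour."""
    ts = np.linspace(near, far, n_samples)          # depths along the ray
    delta = ts[1] - ts[0]                           # spacing between samples
    colour = np.zeros(3)
    transmittance = 1.0                             # how much light still passes
    for t in ts:
        point = ray_origin + t * ray_direction
        rgb, density = radiance_field(point, ray_direction)
        alpha = 1.0 - np.exp(-density * delta)      # opacity of this segment
        colour += transmittance * alpha * rgb
        transmittance *= (1.0 - alpha)
    return colour

# One ray per pixel: a novel viewpoint is just a new set of rays,
# so no polygons or textures are ever needed.
pixel = render_pixel(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]))
print(pixel)
```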

It seems like magic, but it works, and companies like NVIDIA were among the first to act on it, launching a library called Instant-NGP that can be integrated into graphics engines so that this type of technology can be used in real time.

At Plain Concepts we became interested in this technology at a very early stage and, taking advantage of the fact that we have been working for 10 years on our own graphics engine, Evergine (which we use to create products for companies), we decided to integrate NVIDIA's technology. We achieved amazing real-time results, helping companies accelerate the creation of industrial Digital Twins.

 

What uses can it have in audiovisual media?

To help imagine possible uses of this technology in audiovisual media, let's bear in mind its key advantages: the ability to capture an environment with photorealistic quality from a set of images, incredible data compression, and the generation of images from points of view that were not in the original set. The latter is possible because the colour information is stored in mathematical functions defined from the colour of each pixel in the input images and interpolated between all those pixels to obtain continuous functions. That is, you can request any new point of view, and the model will always return a colour for every pixel on the screen.

This means that, for example, we can generate videos in which locations are recreated from a set of photos, or from a video whose frames we extract to build that set. Sometimes a news reporter covers a story and takes some photos or videos of a location with their phone, but that content may not be suitable for broadcast, either because it is too short, because it was recorded in portrait orientation, because the frame rate is not adequate or because the footage is not stabilized. Using this content as input, we can create a new video of that location with whatever resolution and aspect ratio we need, regardless of the original. It can last as long as needed and feature camera movements with perfect stabilization. In addition, we can generate as many videos of the space as we want, trying different camera movements until we find the one that best suits our needs. All of this is possible because, once the space is captured, the NeRF model can generate frames from any point of view.

One of the most interesting free projects for checking the results this technology can offer in video generation is NerfStudio, which is very easy to install and configure. The project is also being used by the research community as a test bed for each new advance in NeRF technology.
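As an illustration, a typical NerfStudio workflow looks roughly like the sketch below, driven from Python here for convenience (the same steps are normally run directly on the command line). Command names and flags can change between NerfStudio versions, and the folder names are hypothetical, so treat this as an outline rather than an exact recipe.

```python
# Rough outline of the usual NerfStudio pipeline: process the input photos,
# train a model, then render a video along a chosen camera path.
import subprocess

# 1. Estimate camera poses from the input photos or extracted video frames.
subprocess.run(["ns-process-data", "images",
                "--data", "photos/",            # hypothetical folder of stills
                "--output-dir", "scene/"], check=True)

# 2. Train a NeRF model (nerfacto is the default method) on the processed scene.
subprocess.run(["ns-train", "nerfacto", "--data", "scene/"], check=True)

# 3. Render a video along a camera path designed in the interactive viewer.
subprocess.run(["ns-render", "camera-path",
                "--load-config", "outputs/scene/nerfacto/config.yml",  # path depends on the run
                "--camera-path-filename", "camera_path.json",
                "--output-path", "novel_view_video.mp4"], check=True)
```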

 

 

The use of NeRF in Virtual Productions

The popularity of virtual sets increased a lot during the pandemic, as they became the solution that allowed many major production companies, such as Netflix or HBO, to keep things running.

Instead of the green or blue backgrounds better known as chroma key, where sequences are recorded and the environments are added later in post-production, these sets use backgrounds made of gigantic LED screens that project a dynamic environment, one that changes and adapts to the camera's perspective, creating a sense of real three-dimensionality.

This technology is behind large productions such as Disney+'s "The Mandalorian", where it made shooting much easier, since the main character wears a metallic suit that reflects the entire environment. If it had been shot against a chroma background, not only would the environment have had to be replaced in every sequence, but all the green or blue reflections on the character's costume would also have had to be removed. With LED screens projecting a virtual environment, however, the suit simply reflects the light of that environment, eliminating many hours of post-production. And because these screens cast light on the characters, different lighting or weather conditions can be simulated quickly, without having to wait for the post-production phase to add all this.

It is important to note that, as a result, the pre-production and post-production phases blend together, and in these virtual sets the environments must be ready before the takes are shot. The latest advances in video game engines are currently used to render these backgrounds, which means creating 3D environments based on polygons and textures and devoting long development times to achieving photorealistic results that fit well into the scenes.

This is where NeRF technology can help immensely. If you need to recreate familiar places in a city, or unusual locations like the Amazon rainforest or the Sahara desert, you no longer need to send someone to take photos and videos, pass that material to a graphics production team, and spend weeks modeling and texturing a 3D environment to get a photorealistic look. It is enough to send a person to the place to take the photographs needed to build the NeRF model, which can be generated in a matter of hours. Then, on one of these virtual sets, we can move the camera through that space while shooting the scenes, all with full photorealism, greatly reducing the time required to build the backgrounds for each scene. I am convinced that large production companies will begin to create their own catalogues of environments captured with this technology for reuse in different productions, further reducing its cost. In addition, NeRF visualizations can be mixed, as if they were layers, with real-time polygon-based rendering, so this technology is not about replacing others; the key lies instead in getting the best out of each of them to improve production times.

 

Challenges facing this technology

NeRF is a recent technology, but since its emergence in 2020 it has attracted attention from many different sectors because of its great potential. It is based on neural networks that require high computing power for real-time inference. The original paper relies on CUDA technology, which only works on NVIDIA cards, and to achieve the best possible results it needs very well stabilized, high-resolution input images. Finally, training requires a significant amount of VRAM (video memory), an amount more readily found in the latest generation of graphics cards.

All this is essential if we want to make the most of this technology in real time, which is key in some sectors such as Virtual Productions in the audiovisual industry or Interactive Digital Twins in the construction and engineering industry.

Although important advances are taking place around this technology, such as the paper presented in 2023 called Zip-NeRF, which allows us to capture larger and larger spaces with increased sharpness, achieving these results in real time remains a major computational challenge with the hardware currently available.

 

3D Gaussian Splatting

We are living in a fast-paced world where we all want breakthroughs, and we want them immediately, without being willing to wait for the hardware that makes real-time NeRF possible. It reminds me of Ian Malcolm's line in "Jurassic Park": "Life finds a way." In 2023, a new paper called "3D Gaussian Splatting for Real-time Radiance Field Rendering" appeared. It does not introduce a new AI-based NeRF algorithm; instead, it uses the advances made in the NeRF field to build photorealistic scenes from Gaussian splats, particles used like brush strokes.

This introduces a new scene rasterization system. In traditional 3D, the basic unit used to represent objects is the triangle: each 3D element is made up of small triangles to which pieces of texture are applied to give the element its colours. With the 3D Gaussian Splatting technique, by contrast, scenes are built from small particles, like brush strokes in a painting, each with its own size, position and transparency. The colour of each particle is also stored in mathematical functions called spherical harmonics which, as in NeRF, can return a different colour depending on the angle from which the particle is viewed. Remember that this is important for simulating physical properties of matter, such as the reflection of metals or the refraction of glass.
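The following is a minimal, purely illustrative Python sketch of what one such particle might hold and of how low-order spherical harmonics give it a view-dependent colour. All names and values are invented for illustration, sign conventions are simplified, and a real renderer (such as the reference 3D Gaussian Splatting implementation) projects and blends millions of these particles on the GPU.

```python
# Illustrative sketch of one "splat" and of how degree-0/1 spherical harmonics
# make its colour depend on the viewing direction.
import numpy as np

C0 = 0.28209479  # constant (degree-0) spherical-harmonic basis term
C1 = 0.48860251  # degree-1 basis coefficient (sign conventions vary by implementation)

class Splat:
    def __init__(self, position, scale, opacity, sh_coeffs):
        self.position = np.asarray(position)   # centre of the Gaussian particle
        self.scale = np.asarray(scale)         # size along each axis
        self.opacity = opacity                 # transparency used when blending
        self.sh = np.asarray(sh_coeffs)        # shape (4, 3): SH coefficients per RGB channel

    def colour(self, view_direction):
        """Evaluate the view-dependent colour for a given viewing direction."""
        x, y, z = view_direction / np.linalg.norm(view_direction)
        basis = np.array([C0, C1 * y, C1 * z, C1 * x])   # degree-0 and degree-1 basis
        return np.clip(basis @ self.sh + 0.5, 0.0, 1.0)  # offset keeps values near mid-grey

# The same particle returns slightly different colours from different angles,
# which is what lets splats mimic reflections and other view-dependent effects.
splat = Splat(position=[0, 0, 0], scale=[0.01, 0.02, 0.01], opacity=0.8,
              sh_coeffs=np.random.default_rng(0).normal(0, 0.1, size=(4, 3)))
print(splat.colour(np.array([0.0, 0.0, 1.0])))   # seen from the front
print(splat.colour(np.array([1.0, 0.0, 0.0])))   # seen from the side
```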

The great advantage of this technique is that, although it achieves similar visual results, the associated computational needs are much lower. It is as if we had experienced a 4-year leap in the ability to visualize NeRF in real time, thus bringing that future to our present.

 

Conclusions

NeRF technology has brought about a major breakthrough in the computer representation of scenes, achieving amazing automatic results that far exceed those of previous techniques, such as photogrammetry, which entailed significant difficulties in representing materials such as metals or glass. At the same time, it has greatly increased productivity compared with manually recreating real environments, and the latest advances in a rasterization technique called 3D Gaussian Splatting allow us to leap several years ahead in the possibilities of representing photorealistic scenes and creating videos.

The real challenge, as usual, lies in our ability to combine this technology with existing techniques and make the most of its possibilities. There is an important technological leap ahead in the digitization of elements and spaces thanks to these technologies. 
