Until now, GPUs were designed primarily for rasterization, with all other uses secondary. But as GPUs grow more capable and gain support for compute APIs such as CUDA, DirectCompute and OpenCL, new fields of application and new game engine algorithms are gradually appearing.
The GF100 architecture is designed to efficiently execute a wide range of algorithms and general-purpose computing tasks that can be parallelized. For example, shared memory is of little help in ray tracing, physics and AI algorithms, whose memory accesses are hard to predict, and that is where the cache comes to the rescue. Up to 48KB of L1 cache per Streaming Multiprocessor, together with the global L2 cache, can boost the performance of many algorithms.
An improved scheduler is another important change introduced in GF100. G80 and GT200 were built to execute large programs and took a relatively long time to switch context between tasks. This is fine for pure computing tasks involving large volumes of data. But games run several smaller tasks within each frame: cloth simulation, fluid physics, postprocessing, etc. GF100 can perform such tasks in parallel and keep the GPU busy.
In games that use compute shaders, context is switched every frame, so fast switching is key to keeping frame rates high. GF100 cuts context switching time considerably, to about 20µs, which makes multiple fast switches between tasks within a single frame practical.
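To put the 20µs figure in perspective, here is a rough back-of-the-envelope sketch of how much of a frame's time budget context switches would consume. The function name, the frame rates and the switch counts are our own illustrative assumptions; only the 20µs value comes from the text above.

```python
# Assumption: each context switch costs 20 microseconds (figure quoted in the text).
SWITCH_TIME_US = 20.0


def switch_overhead(fps: float, switches_per_frame: int) -> float:
    """Return the fraction of one frame's time budget spent on context switches."""
    frame_time_us = 1_000_000.0 / fps  # time available per frame, in microseconds
    return switches_per_frame * SWITCH_TIME_US / frame_time_us


if __name__ == "__main__":
    # Hypothetical workloads: a few, some, and many switches per frame at 60 fps.
    for n in (2, 10, 50):
        print(f"{n:>3} switches at 60 fps: {switch_overhead(60, n):.2%} of the frame")
```

Under these assumptions, even ten switches per frame at 60 fps take only about 1% of the frame time, so the cost stays small until the number of switches grows very large.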
Compute algorithms can solve many tasks in games. These include new hybrid rendering algorithms, in which ray tracing is used to render correct reflections and refractions, as well as voxel rendering for realistic display of volumetric data. They also include complex image postprocessing: advanced HDR rendering, complex antialiasing filters and simulation of optical effects (depth of field and bokeh). Besides, games already feature physical effects that can be taken further by adding fluid dynamics, turbulence for particle effects (smoke, liquids) and the like.
There are a number of real examples as well. Just Cause 2 uses CUDA on NVIDIA GPUs to create realistic water surfaces, and Aliens vs Predator, Metro 2033 and DiRT 2 use DirectCompute for postprocessing.
To unlock the full potential of the new products, NVIDIA released CUDA Toolkit 3.0. Naturally, it supports GF100-based cards, and it adds support for C++ and ECC, BLAS and LAPACK libraries, the CUDA-GDB debugger and the Visual Profiler.
NVIDIA is also releasing a convenient suite for 3D developers: Parallel Nsight, also known as Nexus. It helps develop GPU-accelerated applications in Visual Studio 2008 and includes debuggers, profilers and analyzers of GPU code and its performance, all integrated into Visual Studio. It supports CUDA C, OpenCL, DirectCompute, Direct3D and OpenGL. We are sure that developers will appreciate this suite.
Ray tracing is often used in 3D graphics, but it has been too resource-hungry for real-time use. Future applications may be able to use ray tracing together with rasterization. Ray tracing is hard to perform efficiently on a GPU, because rays travel in unpredictable directions. Calculating them requires access to random memory addresses, while GPUs usually read data from memory in linear blocks.
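The difference between the two access patterns can be illustrated with a small sketch that counts how many contiguous memory blocks a group of 32 reads touches. The 128-byte block size, the 16MB address range and the helper name are our assumptions for illustration; this is a toy model, not a description of the actual memory controller.

```python
import random

BLOCK_BYTES = 128  # assumed size of one contiguous memory transaction


def blocks_touched(addresses, block_bytes=BLOCK_BYTES):
    """Count the distinct aligned blocks that a group of byte addresses falls into."""
    return len({addr // block_bytes for addr in addresses})


# 32 reads of consecutive 4-byte values, as in a rasterization-style access.
linear = [i * 4 for i in range(32)]

# 32 reads of 4-byte values at scattered addresses, as incoherent rays might produce.
rng = random.Random(0)  # fixed seed so the run is repeatable
scattered = [rng.randrange(0, 1 << 24, 4) for _ in range(32)]

print("linear reads touch", blocks_touched(linear), "block(s)")
print("scattered reads touch", blocks_touched(scattered), "block(s)")
```

The linear reads all fall into a single block, while the scattered reads spread over many blocks, each of which has to be fetched separately. That is the cost a cache can help hide.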
The GF100 architecture differs from its predecessors in that it was designed with the requirements of ray tracing in mind. GF100 is the first GPU to support recursion in hardware, which makes such tasks much easier to implement. Besides, the two-level cache architecture considerably improves ray tracing efficiency by speeding up memory access. The L1 cache improves memory "locality" for adjacent rays, while the L2 cache increases the effective bandwidth of VRAM access.
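Why recursion matters for ray tracing is easiest to see in code: a ray that hits a reflective surface spawns another ray, whose result feeds back into the first. Below is a deliberately simplified sketch in which a "ray" only carries a brightness value. The function, its parameters and the constants are hypothetical and stand in for a real intersection and shading routine.

```python
def trace(depth, max_depth, local=0.5, reflectivity=0.5):
    """Return the brightness seen along a ray that keeps hitting mirror-like surfaces.

    Each hit contributes `local` light plus `reflectivity` times whatever
    the reflected ray sees, down to a fixed recursion limit.
    """
    if depth == max_depth:
        # Recursion limit reached: stop spawning reflected rays.
        return local
    return local + reflectivity * trace(depth + 1, max_depth, local, reflectivity)


if __name__ == "__main__":
    for bounces in range(4):
        print(f"{bounces} bounce(s): brightness {trace(0, bounces)}")
```

Without recursion support, the same logic has to be unrolled into loops with a manually managed stack, which is what GPU ray tracers had to do before.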
GF100 can also efficiently execute advanced global illumination algorithms, including path tracing. This technique is similar to ray tracing and uses a large number of rays to collect data on indirect scene lighting. In this algorithm, GF100 is 3.5-4 times faster than GT200.
But these methods are still too complex to be used widely in games. What developers can do is combine rasterization and ray tracing. This is called hybrid rendering. For example, rasterization can be used in the first rendering pass, while ray tracing calculates reflections for only some of the pixels during the next pass. Such hybrid schemes are a good way to get high performance at very high quality.
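The two-pass idea can be sketched as follows. Everything here is a toy stand-in: the scene is a flat list of pixels, `rasterize` just copies per-pixel data into a G-buffer, and `trace_reflection` returns a constant instead of tracing a real ray. All names are hypothetical.

```python
def rasterize(scene):
    """First pass: produce one G-buffer entry (base color, reflective flag) per pixel."""
    return [(px["color"], px["reflective"]) for px in scene]


def trace_reflection(pixel_index):
    """Stand-in for an expensive ray-traced reflection lookup (hypothetical)."""
    return 0.25


def hybrid_render(scene):
    """Second pass: add ray-traced reflections only where the G-buffer asks for them."""
    gbuffer = rasterize(scene)
    image, rays_cast = [], 0
    for i, (color, reflective) in enumerate(gbuffer):
        if reflective:
            color += trace_reflection(i)
            rays_cast += 1
        image.append(color)
    return image, rays_cast


scene = [
    {"color": 0.5, "reflective": False},
    {"color": 0.5, "reflective": True},
    {"color": 0.25, "reflective": False},
    {"color": 0.25, "reflective": True},
]

image, rays_cast = hybrid_render(scene)
print(image, "rays cast:", rays_cast)
```

The point of the scheme is visible in the ray counter: only the pixels flagged as reflective pay the ray tracing cost, while the rest are finished by the cheap rasterization pass.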
To demonstrate what the company's products can do, NVIDIA has created a special Design Garage demo, in which car models are rendered in an adjustable scene with global illumination calculated by the NVIDIA OptiX technology. The application is available for all NVIDIA cards, but it will run quite slowly on any graphics card released before GTX 470 and GTX 480.
We'd very much like to see this feature in a racing game. It could provide very high quality images in the photo or gallery modes, far beyond what these modes can produce today.
NVIDIA 3D Vision Surround
With the rollout of the GTX 400 series, NVIDIA introduced a technology capable of outputting stereo images to three monitors simultaneously. Obviously, this innovation was prompted by the competitor's Eyefinity.
The technology uses wireless active-shutter glasses and NVIDIA 3D Vision drivers. With two GTX 400 graphics cards working in an SLI configuration, 3D Vision Surround can produce high-quality stereo images on three monitors simultaneously.
Output to three monitors is supported in the stereo mode at 1920x1080 per monitor, or in regular 2D at 2560x1600. 3D Vision Surround can also compensate for monitor bezels. When this feature is enabled, the parts of the picture that fall behind the monitor frames are not shown. This improves image continuity, which is especially important in the stereo mode, where even a slight mismatch between images on adjacent monitors can ruin the 3D effect.
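As a rough sketch of how such compensation can work, the picture can be rendered slightly wider than the three panels combined, with the columns behind the frames simply not displayed. The 100-pixel gap below is an arbitrary assumption, as are the function and parameter names; the actual driver logic is not described in this article.

```python
def surround_layout(panel_width, panels, bezel_px):
    """Return (virtual_width, visible), where visible lists the (start, end)
    column range of the virtual image that each panel actually displays."""
    # The virtual image includes the hidden columns behind each inner bezel gap.
    virtual_width = panels * panel_width + (panels - 1) * bezel_px
    visible = []
    for i in range(panels):
        start = i * (panel_width + bezel_px)
        visible.append((start, start + panel_width))
    return virtual_width, visible


# Three 1920-pixel-wide panels with an assumed 100-pixel gap between them.
width, visible = surround_layout(1920, 3, 100)
print("virtual width:", width)
print("visible column ranges:", visible)
```

With these numbers, the rendered image is 5960 columns wide, and columns 1920-2019 and 3940-4039 are never shown, so a straight line crossing a bezel continues at the right height and position on the next monitor.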
Note that 3D Vision Surround is a purely software solution, and it only works with two or more GPUs in an SLI configuration. It won't work with a single card, because one card doesn't have enough active video outputs. However, the feature should also work with older GTX 200 graphics cards. Support for 3D Vision Surround should be added to the drivers in April.
Conclusions on the architecture
NVIDIA GF100 clearly has a completely new architecture. The new GPU has much better graphics and general-purpose computing capabilities. It has become more versatile and can now compete with CPUs in the field of high-performance computing.
GF100 introduces important changes to the graphics pipeline. The new GPU features 16 tessellators and 4 raster engines, which are very useful for DirectX 11 graphics. Tessellation and displacement mapping are the key innovations of this API, and together they considerably improve image quality. Besides, both GTX 470 and GTX 480 should offer high geometry processing performance.
However, the new architecture offers more than changes to the graphics pipeline. The new products have advanced general-purpose computing capabilities. These are the first graphics solutions to support C++, recursion, and read/write caching. These innovations give developers the tools to accomplish a lot, including ray tracing, global illumination, complex physical effects, AI and more.
The new architecture also fixes certain drawbacks of the previous GPUs. For example, ROPs were significantly improved, and fullscreen AA got better in terms of both quality and performance.
Compared with the previous GPU, GF100 has twice as many stream processors and somewhat higher memory bandwidth thanks to GDDR5 support, even though the memory bus itself was narrowed from 512-bit to 384-bit. One possible weak spot is the number of TMUs: GF100 has fewer Texture Mapping Units than GT200.
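How a narrower bus can still deliver more bandwidth follows from simple arithmetic: peak bandwidth is the bus width in bytes multiplied by the effective data rate. The sketch below uses reference data rates for GTX 285 and GTX 480 that are our assumptions and are not stated in this article, so treat the exact results as approximate.

```python
def bandwidth_gb_s(bus_width_bits, effective_rate_mt_s):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * effective_rate_mt_s * 1e6 / 1e9


# Assumed reference specifications (not taken from this article).
gtx_285 = bandwidth_gb_s(512, 2484)  # GT200-based card, GDDR3
gtx_480 = bandwidth_gb_s(384, 3696)  # GF100-based card, GDDR5

print(f"GTX 285: {gtx_285:.1f} GB/s")
print(f"GTX 480: {gtx_480:.1f} GB/s")
print(f"ratio:   {gtx_480 / gtx_285:.2f}x")
```

Under these assumptions, the higher data rate of GDDR5 more than makes up for the loss of a quarter of the bus width, but the gain is modest compared with the doubling of stream processors.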
Also note that in some cases GTX 480 may perform on a par with, or even slower than, previous-generation solutions from both NVIDIA and AMD, especially in older applications that don't use Gather4 and SSAO. We can see two possible explanations for that. Either NVIDIA's engineers deliberately improved computing capabilities at the expense of graphics, assuming that games will move towards computing, or the actual TMU clock rates turned out to be lower than initially planned.
As we know, different parts of NVIDIA's GPUs can work at different clock rates. So perhaps the TMUs were meant to run faster, which would explain why there are fewer of them than in GT200. Perhaps NVIDIA failed to raise TMU clock rates as planned because of TSMC's process technology issues, and this hurt peak texturing performance.
If the first explanation is correct and NVIDIA has deliberately shifted focus to math capabilities at the expense of texturing, we wonder if this is justified, considering that most recent games have targeted multiple platforms and only a few have been demanding. Then again, perhaps this drawback will be offset in real applications by the improved cache architecture.
Another disadvantage is that all of the first GF100-based products have some stream processors disabled because of issues with the 40nm process technology. As we have already mentioned, of the 512 stream processors available, GTX 480 uses only 480 and GTX 470 only 448. As a result, an architecture that looks very good on paper may fail to show equally good results in some real applications.
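To quantify how much of the chip is switched off, the stream processor counts can be converted into active Streaming Multiprocessors and a share of the full GPU. The sketch assumes 32 stream processors per SM, a figure that is not stated in this section; the function name is ours.

```python
TOTAL_SPS = 512
SPS_PER_SM = 32  # assumed: stream processors per Streaming Multiprocessor in GF100


def enabled(sps):
    """Return (active SMs, fraction of the full chip) for a given SP count."""
    return sps // SPS_PER_SM, sps / TOTAL_SPS


for name, sps in (("GTX 480", 480), ("GTX 470", 448)):
    sms, share = enabled(sps)
    print(f"{name}: {sms} of {TOTAL_SPS // SPS_PER_SM} SMs active, {share:.2%} of the chip")
```

Under this assumption, GTX 480 runs with one SM disabled and GTX 470 with two, so the first cards give up roughly 6% and 12.5% of the peak shader throughput of a full GF100.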
But that's what we're going to find out in the next sections of the review. We'll see how the new cards perform in synthetic tests against NVIDIA's previous-generation solutions, as well as competing products from AMD.