NVIDIA GeForce FX, or "Cinema show started".

Characters:

Main hero - NV30 starring as a GPU of the new generation GeForce FX.
Secondary main hero - R300 as RADEON 9700 PRO.
128 bit DDR II memory clocked at 1 GHz as a main villain.
DirectX 9.0 shaders and cinematographic effects as arguments.
Progressive technologies of memory optimization and full-screen anti-aliasing in the science-fiction role of defenders of the main hero.

Popular synthetic actors from finished and upcoming full-length 3D movies also take part in this movie. We will return to the scenario a bit later; first, as an overture to today's review, some general information on what happened at the presentation...

Let's sum up what we have added to the original article:

The peak texture sampling and filtering rate is 8 per clock.
Pixel shader instructions executed per clock cycle: 2 integer and 1 floating-point, or 2 (!) texture access instructions. The latter is possible because, while the shader's preceding computational operations are running, the texture units can sample texture values whose coordinates are already known and store them in special temporary registers, of which there are 16. That is, the texture units can sample no more than 8 textures per clock, but the pixel shader can receive up to 16 results per clock.
Like the previous generation of chips, the GeForce FX works with two types of MSAA blocks: 2x diagonal and 2x2 grid. The 6xS and 8x AA modes are hybrids that average several base blocks according to one pattern or another.
Frame buffer compression works only in the MSAA-based FSAA modes, and only at the level of MSAA blocks. Hence the compression is lossless: about 4:1 in modes with the 4x base MSAA block and 2:1 with 2x blocks.
The chip supports an activity control scheme that adjusts how hard the cooling system works depending on the chip's load and temperature.
The chip doesn't incorporate DVI or TV-Out controllers, like all of NVIDIA's earlier top solutions. Integrated controllers are used mainly in mass-market products.
Mass production of the second revision of the chip, the one to be used on production cards, is about to start.
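
The 4:1 and 2:1 figures in the list above follow naturally if compression happens per MSAA block. Here is a minimal sketch of one way such a lossless scheme could work, assuming a block is stored as a single color whenever all of its samples are equal (a pixel lying fully inside a triangle). This is an illustration only, not NVIDIA's actual algorithm; the names compress_block, decompress_block, stored_colors and best_case_ratio are hypothetical, and the bookkeeping flag per block is ignored.

```python
def compress_block(samples):
    """Losslessly compress the color samples of one MSAA block.

    Assumption for illustration: if every sample holds the same color,
    store that color once; otherwise keep all samples unchanged.
    """
    if all(s == samples[0] for s in samples):
        return ("uniform", samples[0])
    return ("raw", tuple(samples))


def decompress_block(block, sample_count):
    """Restore the full list of samples from a compressed block."""
    kind, payload = block
    if kind == "uniform":
        return [payload] * sample_count
    return list(payload)


def stored_colors(block):
    """Number of color values the compressed block actually stores."""
    kind, payload = block
    return 1 if kind == "uniform" else len(payload)


def best_case_ratio(sample_count):
    """Compression ratio for a block whose samples are all identical."""
    interior = [(200, 120, 40)] * sample_count
    return sample_count / stored_colors(compress_block(interior))


if __name__ == "__main__":
    print(best_case_ratio(4))  # 4x base MSAA block
    print(best_case_ratio(2))  # 2x base MSAA block
```

Under this assumption a pixel on a polygon edge, where samples differ, is stored in full, which would explain why the ratio is quoted as "about" 4:1 rather than as a guaranteed figure.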

Around the scenario: "Run, Industry, run"!

NVIDIA's GPU codenamed NV30, long-awaited and one of the most controversial GPUs, was announced on November 18, 2002, together with a new marketing name: GeForce FX. At first NVIDIA intended to launch a new trademark to emphasize the chip's importance, but it turned out that the company could not give up the existing brand. According to public-opinion polls, the GeForce brand is known to far more people than the company name NVIDIA. It's like Pentium. The parallel with Intel, a recognized locomotive of the IT industry, is apt here: at the moment NVIDIA is the flagship on the 3D graphics ocean, as 3dfx, the creator of the first really successful hardware 3D solution, was in its time. Symbolically, while working on the NV30 the developers used a lot of ideas from 3dfx's project codenamed Mojo, which was never completed.

It's no secret that the GeForce FX embodies a so-called flexibly programmable graphics architecture, i.e. a graphics processor. The chip therefore deserves to be called a GPU; on the other hand, this term was already used for the less flexible accelerators of the previous generation (let's call it the DX8 generation: NV2x, R200 etc.). Let's glance at the transition from a fixed architecture to flexibly programmable ones:

DX7/NV1x/R100 brought configurable pixel stages.
DX8/NV2x/R200 brought vertex shaders and limited pixel shaders built on the same configurable stages.
DX9/NV3x/R300 added full-fledged vertex shaders with controllable command flow and software-controlled pixel shaders with arbitrarily ordered instructions.

This is an "evolutionary revolution" whose current stage is not completed yet. One or maybe several more stages are coming in the near future:

DX9.1/NV4x/R400 - vertex and pixel shaders with totally controlled command streams without any limitations in size.
DX10 - possible generation of new vertices in a shader and expanded capabilities of the shader instruction set.
A fully symmetric scheme of a programmable GPU as an array of full-featured shader processors, without any predetermined architectural division into pixel and vertex shaders, tessellating units or texture-sampling units.

Well, we'll see. But the last two steps may well take far more than two DX versions, thanks both to Microsoft and to the guileful intent of graphics chip makers.

On the other hand, while the current stage looks expected within such an evolutionary layout, users and programmers may see it as a revolutionary period, since it triggers a switchover to actually using the flexible programmability of accelerators. Big flexible pixel shaders, even without command flow control or with simplified control at the predicate level, can bring a previously unreachable visual level to the PC, a much greater jump than the first attempts of the previous generation, which were constrained by awkward shader assembly code and a limited number of pixel instructions. Quality rather than quantity can win this time, and the epoch of DX9 accelerators can become as significant as the arrival of the 3dfx Voodoo. If you remember, the Voodoo wasn't conceptually the first. But it did provoke a quantitative jump in accelerators, which then turned into a qualitative jump in the development of games for them. I hope this time the industry will get a powerful spur from the ability to write complex vertex and pixel shaders in higher-level languages.

Let's leave aside the question of how revolutionary DX9 solutions are, a topic sales managers talk about so much. I will only say that while the revolutionary character of the solution as a whole is yet to be proved, the revolutionary character of individual GeForce FX technologies is beyond doubt.

The accelerators are gradually approaching common general-purpose processors in several aspects:

Considerable increase in clock speeds.
Brute force is being replaced by fine optimization algorithms and approaches.
The computational aspect is at the forefront.
A developed system of general-purpose graphics commands.
Support for several universal data formats (types).
Possible superscalar and speculative execution.
Program complexity and flexibility are becoming less limited.

The accelerators are striding toward CPUs and have already outpaced average general-purpose processors in transistor count or peak computational power. Convergence is only a matter of time and flexibility. CPUs are moving toward accelerators too, increasing their performance, especially in vector operations, and soon they will be able to handle yesterday's graphics acceleration tasks. Moreover, the degree of brute-force parallelism in CPUs is growing as well: just think of Hyper-Threading or multicore CPUs. A direct confrontation is not close, but it will definitely take place, and primarily between the trendsetters of each sphere rather than between the classes of devices (the outcome will be called a CPU, or a CGPU :). For now the fight is for users' wallets, and here graphics is not losing to CPUs.

Before going further I recommend that you read (if you haven't yet) the following key theoretical materials:

Now let's finish the digression and turn to our main hero - GeForce FX.

GeForce FX: leading role in focus

Straight away, the key specifications of the new GPU:

0.13 micron fabrication process, copper connections.
125M transistors
3 geometry processors (each exceeds the specs of DX9 VS 2.0)
8 pixel processors (markedly exceed the specs of DX9 PS 2.0)
Flexibly configurable array of 8 pipelined texture filtering units that calculates up to 8 sampled and filtered results per clock.
AGP 3.0 (8x) system interface
128 bit (!) DDR II interface of the local memory
Efficient 4-channel memory controller with a crossbar
Developed optimization techniques for the local memory: full frame buffer compression covering both color data (for the first time, at a ratio of 4:1, in MSAA modes only) and depth (Z buffer compression)
Tile optimization: caching, compression and Early Cull HSR
Support for precise integer formats (10/16 bits per component) and precise floating-point formats (16 and 32 bits per component, also known as 64 and 128 bit color) for textures and the frame buffer
End-to-end precision in all operations: 32 bit floating-point arithmetic
When activated, the new optimized anisotropic filtering algorithm reduces the performance drop (fps) without serious quality degradation
Anisotropic quality up to 8x that of the usual bilinear texture, i.e. up to 128 discrete samples per texture fetch
New hybrid AA modes - 8x (DirectX and OpenGL) and 6xS (DirectX only)
Frame buffer compression makes it possible to reduce the performance drop markedly with FSAA enabled
Two integrated 10 bit 400 MHz RAMDACs
Integrated interface for external TV-Out chip
TV-Out integrated into the GPU
Three TMDS interfaces for external DVI interface chips
The current drawn by the GeForce FX chip, built on the 0.13 micron process, is comparable to the requirements of the AGP 3.0 specification. Therefore, it's possible to make cards without external power.
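
A couple of the figures in this list can be checked with simple arithmetic. The sketch below assumes that the "1 GHz" quoted for the DDR II memory is the effective data rate (transfers per second per pin) and that a color has four components; both are assumptions made for illustration, and the function names are hypothetical.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, effective_rate_hz):
    """Theoretical peak memory bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits // 8 * effective_rate_hz


def bits_per_pixel(bits_per_component, components=4):
    """Width of one color value, assuming four components (e.g. R, G, B, A)."""
    return bits_per_component * components


# Figures taken from the specification list; the rate is an assumed effective rate.
BUS_WIDTH_BITS = 128
ASSUMED_EFFECTIVE_RATE_HZ = 1_000_000_000

if __name__ == "__main__":
    bandwidth = peak_bandwidth_bytes_per_s(BUS_WIDTH_BITS, ASSUMED_EFFECTIVE_RATE_HZ)
    print(bandwidth / 1e9, "GB/s peak")            # 16 bytes per transfer at 1 GHz
    print(bits_per_pixel(16), "bit color (FP16)")  # 4 x 16
    print(bits_per_pixel(32), "bit color (FP32)")  # 4 x 32
```

Under these assumptions the bus peaks at 16 GB/s, and the 16 and 32 bit per component formats come out at the 64 and 128 bit color mentioned above. The real achievable bandwidth depends on the memory controller and access patterns, which this arithmetic does not model.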

And now look at the block diagram of the GeForce FX:

Functions of the blocks:

Cache controller, Memory controller, Crossbar - this block controls the exchange and caching of data coming from the GPU's local memory and the AGP system bus.
Vertex Processors - geometry processors that execute vertex shaders and emulate fixed-function T&L. They perform geometric transformations and prepare parameters for rendering and for the pixel processors.
Pixel Processors - execute pixel shaders and emulate pixel stages. They shade pixels and issue requests to the texture fetch units.
Texture Fetch & Filtering & Decompression Units - they fetch the specific values from the specific textures that the pixel processors request.
Texture & Color Interpolators - interpolators of the texture coordinates and color values calculated as output parameters in the vertex processor. For each pixel processor, these units calculate its unique input parameters according to the position of the pixel it is shading.
Frame Buffer Logic - this unit controls work with the frame buffer, including Frame Buffer Compression & Decompression, caching, Tile HSR Logic (the so-called Early Cull HSR), MSAA Allocation and post-processing of samples, i.e. the final filtering in FSAA modes (FSAA post-processor).
2D Core
Two display controllers, two RAMDACs and interfaces
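
To make the role of the Texture & Color Interpolators more concrete, here is a minimal sketch of how per-vertex outputs can be turned into per-pixel inputs. It uses plain screen-space barycentric interpolation over one triangle and deliberately omits perspective correction; it is a textbook illustration under those assumptions, not a description of how the NV30 hardware does it. The function names edge, barycentric and interpolate are hypothetical.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p) in screen space."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(v0, v1, v2, p):
    """Barycentric weights of point p with respect to triangle (v0, v1, v2)."""
    area = edge(v0, v1, v2)
    if area == 0:
        raise ValueError("degenerate triangle")
    w0 = edge(v1, v2, p) / area
    w1 = edge(v2, v0, p) / area
    w2 = edge(v0, v1, p) / area
    return w0, w1, w2


def interpolate(vertex_attrs, weights):
    """Blend per-vertex attribute tuples (texture coordinates, colors) with the weights."""
    size = len(vertex_attrs[0])
    return tuple(
        sum(w * attr[i] for w, attr in zip(weights, vertex_attrs))
        for i in range(size)
    )


if __name__ == "__main__":
    # Screen positions of the three vertices and their texture coordinates.
    positions = [(0, 0), (4, 0), (0, 4)]
    texcoords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    weights = barycentric(*positions, (1, 1))
    print(weights)
    print(interpolate(texcoords, weights))
```

Each pixel gets its own weights from its position, so every pixel processor receives a unique set of texture coordinates and colors even though the vertex processor produced only three values per attribute.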

Let's go through the main units of the new chip in approximately the order the usual data flow follows. Along the way we will comment on their functions and architecture (both the baseline required by DirectX 9.0 and the additional capabilities of the GeForce FX) and make small lyrical digressions.


Aleksander Medvedev (unclesam@ixbt.com)
