Characters:
- Main hero - NV30 starring as a GPU of the new generation
GeForce FX.
- Secondary main hero - R300 as RADEON 9700 PRO.
- 128 bit DDR II memory clocked at 1 GHz as a main
villain.
- DirectX 9.0 shaders and cinematographic effects as arguments.
- Progressive technologies of memory optimization and full-screen
anti-aliasing in the science-fiction role of defenders
of the main hero.
Popular synthetic actors of finished and scheduled full-length
3D movies are also taking part in this movie. We will
return to the scenario a bit later; first, some general
information on what happened at the presentation,
as an overture to today's review...
Let's sum up what we have added to the original article:
- The peak texture sampling and filtering rate
is 8 results per clock.
- Pixel shader instructions executed per clock
cycle: 2 integer and 1 floating-point, or 2 (!)
texture access instructions. The latter option is
possible because, while the shader's preceding
computational operations run, the texture units can
fetch in advance the texture values whose coordinates
are already known and save them in special temporary
registers, of which there are 16. I.e. the texture
units can sample no more than 8 textures per clock,
but the pixel shader can receive up to 16 results
per clock.
- Like the previous generation of chips, the
GeForce FX works with two types of MSAA blocks: 2x
diagonal and 2x2 grid. The 6xS and 8x AA modes are
hybrid modes that average several base blocks
according to one pattern or another.
- Frame buffer compression works only in the
MSAA-based FSAA modes, and only at the level of MSAA
blocks. Hence the compression is lossless: about 4:1
in the modes with the 4x base MSAA block, and 2:1
for 2x blocks.
- The chip supports an activity control scheme
which adjusts how hard the cooling system works
depending on the load and temperature of the chip.
- Like all earlier top NVIDIA solutions, the chip
doesn't incorporate DVI or TV-Out controllers. Integrated
controllers are used mainly in mass-market products.
- Mass production of the second revision of
the chip, which will be used for production cards,
is about to start.
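To make the compression figures above more tangible, here is a minimal Python sketch. It rests on an assumption of ours, not on anything NVIDIA has published: we suppose that an MSAA block whose samples are all identical is stored as a single sample, while any other block is stored as is. The function names and the test frame are hypothetical. Under that reading, the ratio approaches 4:1 for 4x blocks and 2:1 for 2x blocks only when almost all pixels lie inside polygons.

```python
def compress_block(samples):
    """Losslessly compress one MSAA block.

    Assumption: a block whose samples are all identical is stored as
    one sample; any other block is stored uncompressed.
    """
    if all(s == samples[0] for s in samples):
        return [samples[0]]
    return list(samples)


def decompress_block(stored, block_size):
    """Expand a stored block back to block_size samples."""
    if len(stored) == 1:
        return stored * block_size
    return list(stored)


def compression_ratio(blocks):
    """Ratio of raw sample count to stored sample count for a frame."""
    raw = sum(len(b) for b in blocks)
    stored = sum(len(compress_block(b)) for b in blocks)
    return raw / stored


# Interior pixel: all four samples of the 4x block carry the same color.
interior = [(255, 0, 0)] * 4
# Edge pixel: a polygon edge splits the block between two colors.
edge = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 255)]

# Hypothetical frame: 95 interior pixels and 5 edge pixels.
frame = [interior] * 95 + [edge] * 5
ratio = compression_ratio(frame)
print(f"compression ratio: {ratio:.2f}:1")
```

Edge pixels are what keep the real ratio below the ideal one, which would explain why the figure is quoted as "about 4:1".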
Around the scenario: "Run, Industry, run"!
NVIDIA's GPU codenamed NV30, long awaited and one of the most
hotly debated GPUs, was announced on November 18, 2002, together
with a new marketing name, GeForce FX. At first NVIDIA was going
to give the chip an entirely new brand name to emphasize its
importance, but then it turned out that giving up the GeForce
mark was impossible. According to public-opinion polls, the
GeForce brand is known to far more people than the company
name NVIDIA. It's like with Pentium. The parallel with Intel,
a recognized locomotive of the IT industry, fits well here, as
at the moment NVIDIA is the flagship on the 3D graphics ocean,
as 3dfx, the creator of the first really successful hardware
3D solution, was in its time. Symbolically, while working on
the NV30 the developers used a lot of ideas from 3dfx's
unfinished project codenamed Mojo.
It's no secret that the GeForce FX is an incarnation of a so-called
flexibly programmable graphics architecture, i.e. a graphics
processor. This chip therefore deserves to be called a GPU, but
the term was already used for the less flexible accelerators
of the previous generation (let's call it the DX8 generation:
NV2x, R200 etc.). Let's glance at the transition from a fixed
architecture to flexibly programmable ones:
- DX7/NV1x/R100 brought configurable pixel stages.
- DX8/NV2x/R200 brought vertex shaders and limited pixel
shaders based on the same stages.
- DX9/NV3x/R300 added full-fledged vertex shaders with controllable
command streams and software-controlled pixel shaders with
arbitrarily ordered commands.
This is an "evolutionary revolution" whose current
stage is not completed yet. One or maybe several more
stages are coming in the near future:
- DX9.1/NV4x/R400 - vertex and pixel shaders with fully controllable command
streams and no size limitations.
- DX10 - possible generation of new vertices in a shader and expanded capabilities
of the shader instruction set.
- A fully symmetric scheme of a programmable GPU as an array of full-featured
shader processors without any predetermined architectural division into pixel
and vertex shaders, tessellation units or texture fetch units.
Well, we'll see what we will see. But the last two steps may
well take far more than two DX versions, thanks both to
Microsoft and to the crafty designs of graphics chip makers.
On the other hand, while the current stage looks expected within this evolutionary
layout, users and programmers may perceive it as a revolutionary period, since
it triggers a switch to actually using the flexible programmability of
accelerators. Big flexible pixel shaders, even without command stream control
or with simplified control at the predicate level, can bring a previously
unreachable visual level to the PC. That is a much greater jump than the first
attempts of the previous generation, which were hampered by awkward shader
assembler code and a limited number of pixel instructions. Quality, rather than
quantity, can win this time, and the epoch of DX9 accelerators can become as
significant as the arrival of the 3dfx Voodoo. If you remember, the Voodoo wasn't
conceptually the first. But it did provoke a quantitative jump in accelerators,
which then turned into a qualitative jump in the development of games for them.
I hope this time the industry will get a powerful spur from the possibility of
writing complex vertex and pixel shaders in higher-level languages.
Let's leave aside the question of the revolutionary nature
of DX9 solutions, which sales managers talk about so much.
We must only say that while the revolutionary character
of the solution as a whole is yet to be proved, the revolutionary
character of the individual technologies of the GeForce FX is
beyond doubt.
The accelerators are gradually approaching common general-purpose
processors in several aspects:
- Considerable increase of clock speeds.
- Brute force is now giving way to fine optimization
algorithms and approaches.
- The computational aspect is in the forefront.
- A well-developed set of general-purpose graphics commands.
- Support of several universal formats (types) of data.
- Possible superscalar and speculative execution.
- Complexity and flexibility of programs are getting less
limited.
The accelerators are striding toward CPUs and have already
outpaced average general-purpose processors in the number
of transistors and in peak computational power. Convergence
is only a matter of time and flexibility. CPUs are also closing
in from their side by raising performance, especially in vector
operations, and soon they will be able to handle yesterday's
tasks of graphics acceleration. Moreover, the degree of brute-force
parallelism in CPUs is growing as well - just remember
HT or multicore CPUs. The direct confrontation is not
close, but it will definitely take place, and primarily between
trendsetters in one sphere or another rather than between
classes of devices (the outcome will be called a CPU, or
a CGPU :). Users' wallets are already being fought for, and
in this fight graphics isn't losing to CPUs.
Before going further I recommend that you read (if you haven't
yet) the following key theoretical materials:
Now let's end the digression and turn to our main
hero - the GeForce FX.
GeForce FX: leading role in focus
First, the key specifications of the new GPU:
- 0.13 micron fabrication process, copper connections.
- 125M transistors
- 3 geometrical processors (each exceeds the specs of the DX9 VS 2.0)
- 8 pixel processors (markedly exceed the specs of the DX9 PS 2.0)
- Flexibly configurable array of 8 pipelined texture filtering units that calculates
up to 8 sampled and filtered results per clock.
- AGP 3.0 (8x) system interface
- 128 bit (!) DDR II interface of the local memory
- Effective 4-channel memory controller with a crossbar
- Advanced optimization techniques for the local memory: full frame buffer
compression covering both color data (for the first time, with a compression
ratio of 4:1, in the MSAA modes only) and depth (Z buffer compression)
- Tile optimization: caching, compression and Early Cull HSR
- Support of precise integer-valued formats (10/16 bit per component) and precise
floating-point formats (16 and 32 bits per component - also known as 64 and 128
bit color) for the textures and frame buffer
- End-to-end precision of all operations - 32 bit floating-point arithmetic
- When enabled, the new optimized anisotropic filtering algorithm reduces
the performance drop (fps) without serious quality degradation
- Anisotropic quality up to 8x of the usual bilinear texture, i.e. up
to 128 discrete samples per one texture fetch
- New hybrid AA modes 8x (DirectX and OpenGL) and 6xS (only DirectX)
- Frame buffer compression makes it possible to markedly reduce the performance
drop with FSAA enabled
- Two integrated 10 bit 400 MHz RAMDACs
- Integrated interface for external TV-Out chip
- TV-Out integrated into the GPU
- Three TMDS interfaces for external DVI interface chips
- The current drawn by the GeForce FX chip, built on the 0.13 micron process,
is comparable to what the AGP 3.0 specification allows. Therefore, it's possible
to make cards without external power.
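A couple of figures from this list can be checked with simple arithmetic. The Python sketch below is only an illustration: it assumes that the 1 GHz quoted for the DDR II memory is the effective data rate (transfers per second) rather than the physical clock, that a pixel has four components (RGBA), and it uses a hypothetical 1024x768 resolution. The function names are ours.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_rate_hz):
    """Peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_hz / 1e9


def bits_per_pixel(bits_per_component, components=4):
    """Pixel size for an RGBA format with equal-width components."""
    return bits_per_component * components


def frame_buffer_bytes(width, height, pixel_bits):
    """Size of one uncompressed color buffer without anti-aliasing."""
    return width * height * pixel_bits // 8


# 128 bit bus, 1 GHz effective data rate (our assumption, see above).
bandwidth = memory_bandwidth_gb_s(128, 1_000_000_000)

# 16 and 32 bits per component give the "64 and 128 bit color" formats.
fp16_pixel = bits_per_pixel(16)
fp32_pixel = bits_per_pixel(32)

# Hypothetical 1024x768 frame in 128 bit color.
buffer_size = frame_buffer_bytes(1024, 768, fp32_pixel)

print(bandwidth, fp16_pixel, fp32_pixel, buffer_size)
```

Under these assumptions the peak bandwidth comes to 16 GB/s, and a single 128 bit color buffer at 1024x768 already takes 12 MB, which shows why the memory optimization techniques matter so much with a 128 bit bus.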
And now look at the block diagram of the GeForce FX:
Functions of the blocks:
- Cache controller, Memory controller, Crossbar - the block
controls exchange and caching of data coming from the
local memory of GPU and AGP system bus.
- Vertex Processors - geometrical processors which execute
vertex shaders and emulate a fixed T&L. They fulfill geometric
transformations and prepare parameters for rendering and
for pixel processors.
- Pixel Processors - execute pixel shaders and emulate
pixel stages. They shade pixels and issue requests to
the texture fetch units.
- Texture Fetch & Filtering & Decompression Units. They
fetch the specific values of specific textures
requested by the pixel processors.
- Texture & Color Interpolators - they are interpolators
of texture coordinates and color values calculated as
output parameters in the vertex processor. These units
calculate for each pixel processor its unique input parameters
according to the position of a pixel it is shading.
- Frame Buffer Logic - the unit controls all work with
the frame buffer, including Frame Buffer Compression &
Decompression, caching, Tile HSR Logic (the so-called Early
Cull HSR), MSAA Allocation and post-processing of samples,
i.e. the final filtering in FSAA modes (FSAA post-processor)
- 2D Core
- Two display controllers, two RAMDACs and interfaces
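The job of the Texture & Color Interpolators described above can be illustrated with a short Python sketch. It is not a description of how the NV30 hardware does it: we assume plain affine (not perspective-correct) interpolation with barycentric weights in screen space, and all names and test values are hypothetical. Given the parameters output by the vertex processor for the three vertices of a triangle, it computes the unique input parameters for a pixel at a given position.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def barycentric(v0, v1, v2, p):
    """Weights of the three vertices for the screen position p."""
    area = edge(v0, v1, v2)
    w0 = edge(v1, v2, p) / area
    w1 = edge(v2, v0, p) / area
    w2 = edge(v0, v1, p) / area
    return w0, w1, w2


def interpolate(attrs, weights):
    """Blend per-vertex attribute tuples with the given weights."""
    size = len(attrs[0])
    return tuple(
        sum(w * a[i] for w, a in zip(weights, attrs)) for i in range(size)
    )


# Screen positions of the triangle vertices and of the shaded pixel.
v0, v1, v2 = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
pixel = (1.0, 1.0)

# Vertex processor outputs: texture coordinates and colors per vertex.
uvs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

weights = barycentric(v0, v1, v2, pixel)
pixel_uv = interpolate(uvs, weights)
pixel_color = interpolate(colors, weights)
print(weights, pixel_uv, pixel_color)
```

The same weights serve every interpolated parameter, so texture coordinates and colors for one pixel cost a single weight computation plus one blend per parameter.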
Let's go through the main units of the new chip in roughly
the same order as the usual data flow. Along the way
we will comment on their functions and architecture (both the
basic one required by DirectX 9.0 and the additional capabilities
of the GeForce FX) and make small lyrical digressions.