iXBT Labs - Computer Hardware in Detail


NVIDIA GeForce FX, or "The Cinema Show Has Started"






Characters:

  • Leading man: NV30, starring as the GPU of the new generation, GeForce FX.
  • Second lead: R300, as the RADEON 9700 PRO.
  • The main villain: 128-bit DDR II memory clocked at 1 GHz.
  • DirectX 9.0 shaders and cinematographic effects as the arguments.
  • Progressive memory optimization and full-screen anti-aliasing technologies in the science-fiction role of the main hero's defenders.

Popular synthetic actors from finished and upcoming full-length 3D movies also take part. We will return to the script a bit later; for now, as an overture to today's review, some general information on what happened at the presentation...

Let's sum up what we have added to the original article: 

  1. The peak texture sampling and filtering rate is up to 8 texels per clock.
  2. Number of pixel shader instructions executed per clock cycle: 2 integer and 1 floating-point, or 2 (!) texture access instructions. The latter option is possible because, during the shader's preceding computational operations, the texture units can sample texture values with already-known coordinates ahead of time and store them in special temporary registers, of which there are 16. That is, the texture units can fetch no more than 8 textures per clock, but the pixel shader can receive up to 16 results per clock.
  3. Like the previous generation of chips, the GeForce FX works with two types of MSAA blocks: 2x diagonal and 2x2 grid. The 6xS and 8x AA modes are hybrid modes based on averaging several base blocks in one pattern or another.
  4. Frame buffer compression works only in the MSAA flavors of FSAA, and only at the level of MSAA blocks. Hence lossless compression of about 4:1 in modes with the 4x base MSAA block, and 2:1 for 2x blocks.
  5. The chip supports an activity control scheme that adjusts the intensity of the cooling system depending on the load and temperature of the chip.
  6. Like all of NVIDIA's earlier top solutions, the chip does not incorporate DVI or TV-Out controllers; integrated controllers are used mainly in mass-market products.
  7. Mass production of the second revision of the chip, the one that will be used for production cards, is about to start.
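Point 4 above, lossless frame buffer compression at the MSAA block level, can be illustrated with a toy model (my own sketch of the general idea, not NVIDIA's actual hardware scheme): with multisampling, every sub-sample of a pixel that is not crossed by a polygon edge holds the same color, so a 4-sample block collapses to a single value (4:1) and expands back exactly.

```python
def compress_block(samples):
    """Store one value if all MSAA samples in the block agree, else keep them raw."""
    if all(s == samples[0] for s in samples):
        return ("compressed", samples[0])   # 4:1 for a 4-sample block
    return ("raw", list(samples))

def decompress_block(block, n_samples=4):
    """Exact (lossless) reconstruction of the original sample block."""
    tag, payload = block
    return [payload] * n_samples if tag == "compressed" else list(payload)

interior = [0xFF8040] * 4                        # pixel fully inside a triangle
edge = [0xFF8040, 0xFF8040, 0x102030, 0x102030]  # pixel on a polygon edge

assert decompress_block(compress_block(interior)) == interior
assert decompress_block(compress_block(edge)) == edge
```

Real hardware operates on fixed-size tiles and tracks compression state per block, but the key property is the same: the compression is exact, which is why it never degrades image quality.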

Around the script: "Run, Industry, Run!"




NVIDIA's GPU codenamed NV30, long-awaited and one of the most debated, was announced on November 18, 2002, together with its new marketing name: GeForce FX. At first NVIDIA intended to launch an entirely new trademark to emphasize the chip's importance, but it turned out to be impossible to abandon the old mark: according to public-opinion polls, the GeForce brand is known to far more people than the company name NVIDIA itself. It's like with Pentium. The parallel with Intel, a recognized locomotive of the IT industry, fits well here, since at the moment NVIDIA is the flagship on the ocean of 3D graphics, as 3dfx, creator of the first truly successful hardware 3D solution, was in its time. Symbolically, while working on the NV30 the developers used many ideas from 3dfx's project codenamed Mojo, which was never completed.

It is no secret that the GeForce FX embodies a so-called flexibly programmable graphics architecture, i.e. it is a graphics processor. Strictly speaking, this chip deserves the name GPU, although the term was earlier applied to the less flexible solutions of the previous generation of accelerators (call it the DX8 generation: NV2x, R200, etc.). Let's glance at the transition from fixed architectures to flexibly programmable ones:

  • DX7/NV1x/R100 brought configurable pixel stages.
  • DX8/NV2x/R200 brought vertex shaders and limited pixel shaders built on those same pixel stages.
  • DX9/NV3x/R300 added full vertex shaders with controllable command flow and program-controlled pixel shaders with arbitrarily ordered commands.

This is an "evolutionary revolution" whose current stage is not yet complete. One, or perhaps several, more stages are coming in the near future:

  • DX9.1/NV4x/R400 - vertex and pixel shaders with fully controlled command flow and no limits on program size.
  • DX10 - possible generation of new vertices in a shader and an extended shader instruction set.
  • A fully symmetric scheme of a programmable GPU as an array of full-featured shader processors, without any predetermined architectural division into pixel shaders, vertex shaders, tessellation units or texture fetch units.

Well, we'll see what we will see. But the two latter steps may well span far more than two DX versions, thanks both to Microsoft and to the guileful intentions of the graphics chip makers.

On the other hand, while the current stage looks expected in such an evolutionary layout, users and programmers may perceive it as revolutionary, since it triggers a switchover to the flexible programming capabilities of accelerators. Large, flexible pixel shaders, even without command flow control, or with simplified predicate-level control, can bring to the PC a previously unreachable visual level - a much greater jump than the first attempts of the previous generation, constrained by awkward shader assembly code and a limited number of pixel instructions. Quality, rather than quantity, can win this time, and the epoch of DX9 accelerators may become as significant as the arrival of the 3dfx Voodoo. If you remember, the Voodoo wasn't conceptually the first, but it did provoke a quantitative jump in accelerators, which then turned into a qualitative jump in the development of games for them. I hope this time the industry will again be given a powerful spur, by the possibility of writing complex vertex and pixel shaders in higher-level languages.

Let's leave aside the question of the revolutionary nature of DX9 solutions, so much talked about among sales managers. Suffice it to say that while the revolutionary character of the solution as a whole is yet to be proved, the revolutionary character of several individual technologies in the GeForce FX is beyond doubt.

The accelerators are gradually approaching common general-purpose processors in several aspects:

  • A considerable increase in clock speeds
  • Brute force giving way to fine optimization algorithms and approaches
  • The computational aspect coming to the forefront
  • A developed system of general-purpose (within graphics) commands
  • Support of several universal data formats (types)
  • Possible superscalar and speculative execution
  • Ever fewer limits on the complexity and flexibility of programs

Accelerators are striding toward CPUs, and in transistor count and peak computational power they have already outpaced average general-purpose processors. The question of convergence is only one of time and flexibility. CPUs are also closing in by increasing their performance, especially in vector operations, and will soon be able to handle yesterday's graphics acceleration tasks. Moreover, the degree of brute-force parallelism in CPUs is growing as well - just recall HyperThreading or multi-core CPUs. A direct confrontation is not imminent, but it will definitely take place, and primarily between the trendsetters in each sphere rather than between the classes of devices (the outcome will be called CPU, or CGPU :). For now the fight is over users' wallets, and there graphics is not losing to CPUs.

Before going further I recommend that you read (if you haven't yet) the following key theoretical materials:

Now let's finish the digression and turn to our main hero, the GeForce FX.

GeForce FX: leading role in focus


    

Straight away, the key specifications of the new GPU:

  • 0.13-micron fabrication process, copper interconnects
  • 125M transistors
  • 3 geometry processors (each exceeds the DX9 VS 2.0 specs)
  • 8 pixel processors (markedly exceeding the DX9 PS 2.0 specs)
  • A flexibly configurable array of 8 pipelined texture filtering units, computing up to 8 sampled and filtered results per clock
  • AGP 3.0 (8x) system interface
  • 128-bit (!) DDR II local memory interface
  • An effective 4-channel memory controller with a crossbar
  • Developed local memory optimization techniques: full frame buffer compression, including color data (for the first time; the compression ratio is up to 4:1, only in MSAA modes) and depth (Z-buffer compression)
  • Tile optimization: caching, compression and Early Cull HSR
  • Support of extended-precision integer formats (10/16 bits per component) and floating-point formats (16 and 32 bits per component - also known as 64- and 128-bit color) for textures and the frame buffer
  • End-to-end precision of all operations: 32-bit floating-point arithmetic
  • When activated, a new optimized anisotropic filtering algorithm reduces the performance drop (fps) without serious quality degradation
  • Anisotropy up to 8x of the usual bilinear texture, i.e. up to 128 discrete samples per texture fetch
  • New hybrid AA modes - 8x (DirectX and OpenGL) and 6xS (DirectX only)
  • Frame buffer compression makes it possible to markedly reduce the performance drop with FSAA enabled
  • Two integrated 400 MHz 10-bit RAMDACs
  • An integrated interface for an external TV-Out chip
  • TV-Out integrated into the GPU
  • Three TMDS interfaces for external DVI interface chips
  • The current drawn by the GeForce FX chip on the 0.13-micron process is comparable to the limits of the AGP 3.0 specification, so cards without external power are possible
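A quick back-of-the-envelope check of the memory interface listed above (my own sketch, assuming the 1 GHz figure is the effective double-data-rate transfer rate):

```python
# Peak bandwidth of a 128-bit memory bus at an effective 1 GHz transfer rate.
bus_bits = 128
effective_rate_hz = 1_000_000_000        # 1 GHz effective (DDR)
peak_bytes_per_s = bus_bits // 8 * effective_rate_hz
assert peak_bytes_per_s == 16_000_000_000  # 16 GB/s peak
```

This is why the memory optimization techniques in the list matter so much: a narrow 128-bit bus has to rely on compression, caching and tile optimization to keep its pipelines fed.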

And now look at the block diagram of the GeForce FX:




Functions of the blocks:

  • Cache controller, Memory controller, Crossbar - these units control the exchange and caching of data coming from the GPU's local memory and the AGP system bus.
  • Vertex Processors - geometry processors that execute vertex shaders and emulate the fixed T&L. They perform geometric transformations and prepare parameters for rendering and for the pixel processors.
  • Pixel Processors - execute pixel shaders and emulate pixel stages. They shade pixels and issue requests to the texture fetch units.
  • Texture Fetch & Filtering & Decompression Units - fetch the particular values of the particular textures requested by the pixel processors.
  • Texture & Color Interpolators - interpolate the texture coordinates and color values computed as output parameters in the vertex processors. For each pixel processor they compute its unique input parameters according to the position of the pixel it is shading.
  • Frame Buffer Logic - controls all work with the frame buffer, including Frame Buffer Compression & Decompression, caching, Tile HSR Logic (the so-called Early Cull HSR), and MSAA sample allocation and post-processing - the final filtering in FSAA modes (FSAA post-processor).
  • 2D Core
  • Two display controllers, two RAMDACs and interfaces
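The job of the Texture & Color Interpolators can be shown with a minimal sketch (my own illustration of the general principle, not NVIDIA's implementation): linearly interpolating a per-vertex output across a scanline span to produce the unique per-pixel inputs that each pixel processor receives.

```python
def lerp(a, b, t):
    """Linear interpolation between two scalar values."""
    return a + (b - a) * t

def interpolate_span(left_color, right_color, width):
    """Per-pixel color inputs along one scanline between two edge values."""
    if width == 1:
        return [left_color]
    return [tuple(lerp(l, r, x / (width - 1)) for l, r in zip(left_color, right_color))
            for x in range(width)]

# RGB ramps smoothly from black at the left edge to orange at the right edge.
span = interpolate_span((0.0, 0.0, 0.0), (1.0, 0.5, 0.0), 5)
assert span[0] == (0.0, 0.0, 0.0)     # left edge value
assert span[-1] == (1.0, 0.5, 0.0)    # right edge value
assert span[2] == (0.5, 0.25, 0.0)    # midpoint is the average
```

Real hardware interpolates with perspective correction and in parallel for many pixels, but the principle is the same: vertex shader outputs become smoothly varying pixel shader inputs.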

Let's go through the main units of the new chip in roughly the order of the usual data flow, commenting along the way on their functions and architecture (the DirectX 9.0 baseline and the additional capabilities of the GeForce FX), with small lyrical digressions.

 

Aleksander Medvedev (unclesam@ixbt.com)






Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.