NVIDIA GeForce GTX 280 1024MB
Introduction
All 3D graphics fans have been looking forward to a real update of the G80 architecture. There were many rumors about the next generation of GPUs, and some of them were confirmed. But in 2007 we got only a minor architectural update in the form of G92-based solutions. They were good cards for their market segments: these GPUs made it possible to drop prices on powerful solutions, consumed less power, and dissipated less heat. However, enthusiasts were waiting for a true update.
Meanwhile, NVIDIA confused everybody with its codenames and name changes. At different times, future cards and/or GPUs were supposed to be named G100, G200, GT200, and D10U. We are speaking of information leaks, of course, and deliberate confusion may be one of the ways to fight them. Or it may have the same reasons as the confusion with the names of existing cards.
As a result, the company announced two solutions based on a GPU codenamed G100 or GT200 (the name differs from source to source), which only complicates things. NVIDIA itself traditionally refers to its GPUs by card names: GeForce GTX 260 and GeForce GTX 280. In this article we'll refer to the GPU as GT200, and the finished cards will be called what NVIDIA calls them.
We are happy to write a full-fledged theoretical chapter about new products at last. The new GPU features architectural changes, even if they are not as big as those that came with the rollout of G80. Before you read this article, you may want to take a look at the baseline theoretical articles that describe various aspects of modern graphics cards and the architectural peculiarities of products from NVIDIA and AMD.
These articles predicted the current situation with GPU architectures, and many of our assumptions about future solutions have been confirmed. Detailed information about the NVIDIA G8x/G9x unified architecture is provided in these articles:
In architectural terms, GT200 has a lot in common with G8x/G9x: the new GPU inherited their best features and gained many improvements. Now let's analyze the new solutions, starting with theoretical information about the updated architecture.
GeForce GTX 280
- Code name: GT200
- Process technology: 65 nm
- 1.4 billion (!) transistors
- Unified architecture with an array of common processors for streaming processing of vertices and pixels, as well as other data
- Hardware support for DirectX 10, including Shader Model 4.0, geometry generation, and stream output
- 512-bit memory bus, eight independent 64-bit controllers
- Core clock: 602 MHz (GeForce GTX 280)
- ALUs operate at more than twice the core frequency (1.296 GHz for GeForce GTX 280)
- 240 scalar floating-point ALUs (integer and floating-point formats, support for FP32 and FP64 according to IEEE 754(R), two MAD+MUL per cycle - read the details below)
- 80 texture address and filtering units (as in G84/G86 and G92), support for FP16 and FP32 components in textures
- Dynamic branching in pixel and vertex shaders
- 8 wide ROPs (32 pixels) supporting antialiasing with up to 16 samples per pixel, including with an FP16 or FP32 frame buffer. Each unit consists of an array of flexibly configurable ALUs and is responsible for Z generation and comparison, MSAA, and blending. Peak performance of the entire subsystem is up to 128 MSAA samples (+ 128 Z) per cycle, or 256 samples per cycle in Z-only mode
- Multiple render targets (up to 8 buffers)
- All interfaces (2 x RAMDAC, 2 x Dual DVI, HDMI, DisplayPort, HDTV) are integrated into a separate chip.
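As a rough sanity check on the figures in this list, the peak arithmetic rate can be estimated from the ALU count, the ALU clock, and the MAD+MUL issue. The Python sketch below assumes that a MAD counts as two floating-point operations and a MUL as one, which gives three per ALU per cycle. The function name `peak_gflops` is ours and purely illustrative.

```python
# Back-of-the-envelope peak FP32 throughput for GT200 (GeForce GTX 280).
# Assumption (ours): MAD = 2 flops, MUL = 1 flop, so MAD+MUL = 3 flops
# per ALU per cycle. Real shaders rarely sustain this theoretical peak.


def peak_gflops(alus: int, alu_clock_ghz: float, flops_per_cycle: int = 3) -> float:
    """Theoretical peak in GFLOPS: ALUs x clock (GHz) x flops per cycle."""
    return alus * alu_clock_ghz * flops_per_cycle


# Figures taken from the specification list above.
gtx280_gflops = peak_gflops(alus=240, alu_clock_ghz=1.296)
bus_width_bits = 8 * 64          # eight independent 64-bit controllers
clock_ratio = 1296 / 602         # ALU clock relative to core clock

print(f"Peak FP32 rate: {gtx280_gflops:.1f} GFLOPS")
print(f"Memory bus width: {bus_width_bits} bits")
print(f"ALU-to-core clock ratio: {clock_ratio:.2f}")
```

Under these assumptions the peak comes out at roughly 933 GFLOPS. The eight 64-bit controllers add up to the stated 512-bit bus, and the clock ratio confirms that the ALUs run at a little more than twice the core frequency.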
Reference GeForce GTX 280 specifications
- Core clock: 602 MHz
- Frequency of unified processors: 1296 MHz
- Unified processors: 240
- 80 texture units, 32 blending units
- Effective memory frequency: 2.2 GHz (2 x 1100 MHz)
- Memory type: GDDR3
- Memory: 1024 MB
- Memory bandwidth: 141.7 GB/s
- Maximum theoretical fill rate: 19.3 gigapixels per second
- Theoretical texture sampling rate: up to 48.2 gigatexels per second
- 2 x DVI-I Dual Link, 2560x1600 video output
- Double SLI connector
- PCI Express 2.0
- TV-Out, HDTV-Out, DisplayPort (optional)
- Power consumption: up to 236 W
- Dual-slot design
- Recommended price: $649
Reference GeForce GTX 260 specifications
- Core clock: 576 MHz
- Frequency of unified processors: 1242 MHz
- Unified processors: 192
- 64 texture units, 28 blending units
- Effective memory frequency: 2.0 GHz (2 x 1000 MHz)
- Memory type: GDDR3
- Memory: 896 MB
- Memory bandwidth: 111.9 GB/s
- Maximum theoretical fill rate: 16.1 gigapixels per second
- Theoretical texture sampling rate: up to 36.9 gigatexels per second
- 2 x DVI-I Dual Link, 2560x1600 video output
- Double SLI connector
- PCI Express 2.0
- TV-Out, HDTV-Out, DisplayPort (optional)
- Power consumption: up to 182 W
- Dual-slot design
- Recommended price: $399
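The theoretical rates in both reference lists follow from unit counts and clocks. The Python sketch below reproduces them under a few assumptions: each blending unit writes one pixel per core clock, each texture unit samples one texel per core clock, and GeForce GTX 260 uses a 448-bit memory bus. The bus width of the GTX 260 is not stated in the lists above, and the `Card` class and its method names are ours.

```python
# Theoretical rates derived from the reference specifications above.
# Assumption (not in the spec lists): GeForce GTX 260 has a 448-bit bus.
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    core_mhz: float            # core clock
    blending_units: int        # one pixel per unit per clock (assumed)
    texture_units: int         # one texel per unit per clock (assumed)
    mem_effective_ghz: float   # effective (double data rate) memory clock
    bus_bits: int              # memory bus width

    def fill_rate_gpix(self) -> float:
        """Peak fill rate in gigapixels per second."""
        return self.blending_units * self.core_mhz / 1000.0

    def texel_rate_gtex(self) -> float:
        """Peak texture sampling rate in gigatexels per second."""
        return self.texture_units * self.core_mhz / 1000.0

    def bandwidth_gbs(self) -> float:
        """Peak memory bandwidth in GB/s: bytes per transfer x transfers."""
        return self.bus_bits / 8 * self.mem_effective_ghz


cards = [
    Card("GeForce GTX 280", 602, 32, 80, 2.2, 512),
    Card("GeForce GTX 260", 576, 28, 64, 2.0, 448),
]

for card in cards:
    print(
        f"{card.name}: {card.fill_rate_gpix():.1f} Gpix/s, "
        f"{card.texel_rate_gtex():.1f} Gtex/s, "
        f"{card.bandwidth_gbs():.1f} GB/s"
    )
```

The fill and texture rates match the listed values after rounding. The bandwidth computed from the rounded memory clocks (140.8 and 112.0 GB/s) differs slightly from the listed 141.7 and 111.9 GB/s, presumably because the listed figures are based on the exact memory clocks rather than the rounded ones.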
GPU complexity is the first thing that attracts attention. At 1.4 billion transistors, GT200 is the most complex GPU to date. Because it is manufactured on the same 65 nm process as G9x chips, the new GPU is rather big, consumes a lot of power, and dissipates a lot of heat. Its operating frequencies have apparently been reduced relative to previous solutions. The newer 55 nm process will reduce manufacturing costs, but for now TSMC's 55 nm production facilities are probably used for other chips.
GT200 brings many quantitative and qualitative changes: the graphics processor used in GeForce GTX 200 cards features numerous improvements and modifications relative to previous GPUs.
NVIDIA engineers had to solve several tasks when designing these solutions:
- Design a GPU twice as fast as the previous top solution (we mean GeForce 8800 GTX based on G80)
- Rebalance the architecture for future 3D applications, which use more complex shaders (reinforce the arithmetic part and enlarge built-in caches)
- Raise architecture efficiency: performance per watt and per square millimeter of die area
- Remove some bottlenecks of previous chips that were exposed by geometry shaders and stream out
- Significantly raise GPU computing performance for CUDA applications and physics, and add expanded functionality (FP64)
- Reduce power consumption in idle mode
These problems were solved in GT200 (compared to G8x/G9x). Let's enumerate the key architectural improvements in the new GPU versus G80, NVIDIA's first unified-architecture GPU:
- Almost twice as many streaming processors
- 2.5 times as many simultaneously executed computing threads
- Simultaneous execution of two instructions (dual issue) by streaming processors
- Double-precision floating-point operations in compliance with IEEE 754R
- Updated, more efficient control logic
- A register file twice as large, which accelerates execution of complex shaders
- Architectural improvements to accelerate stream out and geometry shaders
- A gigabyte of frame buffer memory and a 512-bit memory bus
- Full-speed blending by ROPs (versus half-speed operation in G80)
- Improved compression algorithms and Z-cull technology, and more efficient use of memory bandwidth
- Hardware support for 10-bit output via DisplayPort.
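The "almost twice" and "2.5 times" ratios at the top of this list can be checked with simple arithmetic. In the Python sketch below, only the GT200 count of 240 streaming processors comes from this article. The G80 figures and the GT200 thread count are commonly published values that we assume here for illustration.

```python
# Ratios behind "almost twice as many streaming processors" and
# "2.5 times as many simultaneously executed threads".
# Assumption: apart from GT200's 240 streaming processors, the figures
# below are commonly published values, not taken from this article.

G80 = {"streaming_processors": 128, "threads_in_flight": 12_288}
GT200 = {"streaming_processors": 240, "threads_in_flight": 30_720}

# Ratio of each GT200 figure to the corresponding G80 figure.
ratios = {key: GT200[key] / G80[key] for key in G80}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.3f}x")
```

With these inputs the processor count grows by a factor of 1.875 and the thread count by exactly 2.5, which agrees with the wording of the list.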