It is no wonder that both major GPU manufacturers have recently paid special attention to non-graphical computations on graphics cards. GPU computing has gradually found its way into scientific fields and into regular software used for everyday tasks. For example, updated versions of Adobe Photoshop, Adobe Premiere, and Cyberlink PowerDirector have already been released or will be soon. Hardware acceleration of physics computations in games is already provided by NVIDIA PhysX.
To raise the performance and flexibility of parallel non-graphical computations, several changes have been introduced in RV770:
- Faster double-precision floating-point computations (FP64). Peak performance of current RV770-based solutions reaches 240 gigaflops, five times that of the fastest quad-core CPU. Computational accuracy meets the requirements of the IEEE 754 standard.
- Increased performance of random read/write operations (MemExport/MemImport). Scatter and gather operations are executed at double speed relative to RV670. Maximum performance is sixteen 64-bit export operations or eight 128-bit operations per cycle.
- Fast creation of computing threads, which reduces the overhead of parallel computing.
- Data exchange between computing threads. Each SIMD has dedicated memory, separate from the texture cache. Global data exchange between all SIMD units is also possible. According to the company, these changes resulted in a seven-fold performance gain in fast Fourier transforms (FFT).
- Fast bit-shift operations available to all SPs. Here the new solution is 12.5 times as fast as the previous generation. This change accelerates video processing and encoding tasks, as well as compression and encryption algorithms.
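To make the scatter/gather terminology and the export figures in the list above more concrete, here is a minimal Python sketch. The `gather` and `scatter` helpers are illustrative names only, not part of any AMD API; they simply model what random-access reads and writes mean. The last lines check that the two quoted export modes describe the same per-cycle width.

```python
def gather(src, indices):
    """Random-access read: out[i] = src[indices[i]]."""
    return [src[i] for i in indices]


def scatter(dst_size, indices, values, fill=0):
    """Random-access write: out[indices[i]] = values[i]."""
    out = [fill] * dst_size
    for i, v in zip(indices, values):
        out[i] = v
    return out


# Illustrative data, not taken from the article.
data = [10, 20, 30, 40]
gathered = gather(data, [3, 0, 2])
scattered = scatter(len(data), [3, 0, 2], gathered)

# Export width per cycle, using only the figures quoted in the text.
bits_64 = 16 * 64    # sixteen 64-bit exports per cycle
bits_128 = 8 * 128   # eight 128-bit exports per cycle
bytes_per_cycle = bits_64 // 8

print(gathered, scattered, bits_64, bits_128, bytes_per_cycle)
```

Both export modes come to the same width per cycle, so the choice between them affects the granularity of each write rather than the total amount of data exported.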
AMD has published a graph of relative performance (RV670 versus RV770) in synthetic computing tasks. On average, the new generation is 2.5-3 times as fast, which matches the increase in the number of streaming processors. The largest gain is in the FFT algorithm, where the difference reaches seven times. AES encryption runs almost four times as fast. These above-average gains are the effect of the architectural changes.
RV770 has a built-in second-generation Unified Video Decoder (UVD 2). It decodes video in all popular formats: H.264, VC-1, and MPEG2. This unit can decode two full-resolution (1080p) streams simultaneously. It also offers better video post-processing: upscaling of DVD video to HD resolutions and dynamic contrast adjustment have been added.
Other important innovations include support for 24- and 30-bit displays (up to 2560x1600 via DisplayPort). HDMI output is also supported through DVI-to-HDMI adapters, up to 1920x1080. The new audio controller is a real improvement. It now supports an uncompressed stereo stream at a sampling rate of 48 kHz or an eight-channel (7.1) stream in AC3 format at a bitrate of up to 6.144 Mbps.
There is a new surge of interest in hardware-assisted video encoding and transcoding from one format into another. AMD calls its solution Accelerated Video Transcoding (AVT). It supports H.264 and MPEG2. The company claims that 1080p video is re-encoded at more than 30 FPS, that is, faster than real time.
In fact, the speed of GPU-assisted encoding is similar to that of NVIDIA solutions: it takes a Core 2 Duo E8500 almost ten hours to re-encode a one-hour 1080p video clip, while a RADEON HD 4800 does it in 32 minutes. That is, the new GPUs are nearly 19 times as fast as this dual-core processor. In addition, Cyberlink PowerDirector 7, which will support GPU-assisted encoding, is due for release soon.
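As a quick check of the figures quoted above, the arithmetic can be written out in a few lines of Python. The CPU time is taken as a full ten hours, which is an upper bound because the text says "almost"; the variable names are ours.

```python
# Figures quoted in the text.
cpu_minutes = 10 * 60   # Core 2 Duo E8500: "almost ten hours" (upper bound)
gpu_minutes = 32        # RADEON HD 4800
clip_minutes = 60       # length of the 1080p source clip

# How many times faster the GPU finishes the same job.
speedup = cpu_minutes / gpu_minutes

# A factor above 1.0 means the clip is processed faster than it plays back.
realtime_factor = clip_minutes / gpu_minutes

print(f"GPU speed-up over CPU: {speedup:.2f}x")
print(f"Real-time factor:      {realtime_factor:.3f}x")
```

A real-time factor above 1.0 is consistent with the claim that 1080p video is transcoded faster than it plays back.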
PowerPlay power management
ATI PowerPlay, a dynamic power management technology, originated in GPUs for notebooks and has since been improved. A dedicated control circuit in the GPU monitors its load and selects an optimal operating mode by adjusting GPU and memory clock rates, voltages, and other parameters, which reduces power consumption and heat output. For example, voltages and frequencies (as well as fan speed) are minimized under light 2D load. Under moderate 3D load all parameters are set to medium levels. And when the GPU works at full capacity, voltages and clock rates are set to maximum.
A dedicated microcontroller is integrated into the GPU for power management. It constantly monitors temperatures and bus activity, both internal and PCI Express. The driver controls everything: GPU and memory clock rates, voltages, and fan speed. It can also disable idle GPU units. Owing to the updated power management technology and other modifications, the new GPU is twice as efficient in terms of performance per watt.
So, we've covered the theoretical aspects of the new RV770 GPU. The next part of the article will be devoted to performance tests. We'll learn how RV770 fares against previous solutions from AMD and competing solutions from NVIDIA. The most interesting question is how the above-mentioned architectural changes in the RV770 affect its performance.
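The three operating modes described above can be pictured as a simple load-to-state mapping. The Python sketch below is only an illustration: the thresholds, scale factors, and names (`PowerState`, `select_state`) are hypothetical and do not come from AMD documentation, and real hardware also takes temperature and bus activity into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    name: str
    clock_scale: float    # fraction of maximum clock rate (hypothetical value)
    voltage_scale: float  # fraction of maximum voltage (hypothetical value)


# Hypothetical states mirroring the low / medium / maximum modes in the text.
LOW = PowerState("low", 0.3, 0.8)
MEDIUM = PowerState("medium", 0.7, 0.9)
HIGH = PowerState("high", 1.0, 1.0)

# Hypothetical load thresholds; load is a fraction between 0.0 and 1.0.
LOW_THRESHOLD = 0.2
HIGH_THRESHOLD = 0.8


def select_state(load: float) -> PowerState:
    """Pick an operating mode from the measured GPU load."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    if load < LOW_THRESHOLD:
        return LOW       # light 2D work: minimum clocks and voltages
    if load < HIGH_THRESHOLD:
        return MEDIUM    # moderate 3D work: medium settings
    return HIGH          # full load: maximum clocks and voltages


for load in (0.05, 0.5, 0.95):
    print(load, select_state(load).name)
```

A real controller would likely add hysteresis so that the GPU does not flip between states when the load hovers near a threshold; that is omitted here for brevity.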