Another major change in the GPU is the upgrade from the widely advertized ring bus, which however was a failure, to the architecture with a central hub. Memory controllers in the RV770 are on the periphery of the chip, near the main traffic consumers. And the hub distributes data flows between the above mentioned units, as well as PCI Express, CrossFireX, UVD2, output controllers.
According to the company's estimates, the new design of the memory interface significantly raised efficiency of bandwidth utilization. Besides, the controller supports new GDDR5 memory modules, operating at the frequency effectively quadrupled to 3.6-4 GHz and higher, which gives us memory bandwidth of up to 120 GB/sec with an inexpensive 256-bit interface.
Prime costs are important, because a 512-bit bus is required to raise GDDR3 bandwidth, which leads to a larger GPU and its package. Cards themselves become bigger and more complex, consuming more power. Upgrading to GDDR5 allows to raise performance by 2-3 times and keep GPUs smaller, consuming less power.
AMD was taking an active part in the development of GDDR5 standard together with all major memory manufacturers (Hynix, Qimonda, and Samsung) and JEDEC. This memory type took three years to develop, from the very beginning to the final specs. AMD spent even more time on this project. GDDR5 memory is planned to operate at up to 7 GHz of effective (quadrupled) frequency. The first 0.5-2 Gbit chips supporting 1.5 V (unlike 2.0 V for GDDR3) can operate at up to 1000*4=4.0 GHz.
To conclude the architectural part of the review, we should publish the GPU chart that shows GPU surface areas occupied by various units. As we can see, most part of the GPU is occupied by 800 streaming processors, a significant part of the chip is taken by texture processors and various controllers. To all appearances, the remaining part is occupied by ROPs, L2 Caches, and other auxiliary logic units.
Just like in the GT200, the RV770 offers higher execution efficiency of geometry shaders, to be more exact - higher speed of generating vertex data. The GPU supports four times as many geometry execution threads, lots of generated vertices can be stored in the GPU.
The tessellator is also said to be modified. Now it's compatible with DirectX 10 and 10.1, it also supports instancing. We shall have a look at the speed of executing geometry shaders in our synthetic tests in the next part of the article. The tessellator was described in the baseline article about the R600.
Drawing a bottom line, we can say that the GPU can be called RV-"correction-of-mistakes"-770. All apparent bugs in the RV670 design were fixed. The number of ROPs hasn't changed, but they are twice as fast now, doing twice as much work per cycle in most cases. As a result, algorithms of MSAA 2x and 4x for RADEON HD 4800 comes almost free of ROP charge, although memory bandwidth limitations may still reduce performance.
These are not all changes. TMUs were also overhauled. The dedicated TMU pool was removed. The new solution includes TMUs into each of ten SIMD cores. Another major change is the upgrade of the ring bus to the hub architecture. It's interesting to note that almost all changes bring the RV770 GPU closer to what we see in NVIDIA solutions. It seems that ATI(AMD) have learned to admit their mistakes and fix them. Good of them. All we have to do now is to benchmark the new solutions.
Write a comment below. No registration needed!