DirectX.Update: Graphics Accelerators: Half a Step Forward

Warning: In this article I speculate freely about what is happening now on the 3D accelerator market (most likely, in the author's view) and what will happen there in the near future. Most of the considerations here are the author's assumptions or a creative reworking of rumours circulating on the Internet. However, the hypotheses and rumours have been carefully sifted using experience, thorough analysis of trends, and the other tools available to the author for separating the wheat from the chaff. Read it at your own risk; I hope you find it enjoyable and interesting.

The problem of half-generations

So, we'll start by analyzing the most likely announcements of graphics chips, and of solutions based on them, that we shall probably witness in the near future. In alphabetical order:

A hero in a red cloak, positive in every respect, and Shaders 3.0

ATI is about to complete the development of a new chip, code-named R520. To all appearances, ATI specialists, who previously stated that the time of Shader Model 3.0 had not yet come, have studied the situation and worked out a new position. They have taken into account the gradual replacement of assembler code in shaders by high-level HLSL code, the noticeable progress of shader compilation technologies in DirectX (especially in the latest SDK and Update), the increased interest of developers, and even the release of programs that can take advantage of Shaders 3.0. So the time has probably come, ATI's specialists decided, and they started designing R520, which can execute Shaders 3.0 (we've heard both rumours and statements by ATI employees to this effect).

What will support for the new shader model bring? Firstly, a noticeable increase in pixel pipeline complexity. Secondly, the balance of power will tip toward the NVIDIA architecture: compilation and drivers will have a somewhat greater effect. To what extent is ATI ready for this? Practical tests in the near future will tell, but for now we can make several assumptions:

Scenario 1: The chip contains 16 pipelines (like the previous generation), but it has a higher core clock.

Scenario 2: The chip contains 24 pipelines and has a similar core clock.

Scenario 2-bis: The chip has 24 pipelines, eight of which are hybrid (that is, they can execute both pixel and vertex shaders!) and perform one function or the other depending on current needs. The distribution is evidently handled by the drivers rather than fully dynamically at the hardware level, though the second option, which is more effective and more advanced, is also possible.

The scenario with several texture units and 16 pipelines is unlikely, so we shall not discuss it.

Scenario 2-bis is the most interesting: in the final analysis, it is one of the most likely and most promising directions for the future development of video accelerators in general. It allows (in theory) a more effective use of chip resources, both computing and caching.
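To make the idea of hybrid pipelines more tangible, here is a minimal sketch of how a driver might split the eight hybrid units between pixel and vertex work. Everything in it is an assumption made for illustration: the function name, the abstract "load" figures, and the rule of minimizing the slower of the two stages are hypothetical and say nothing about how R520 or its driver actually works.

```python
def split_hybrid_units(pixel_load, vertex_load, fixed_pixel=16, hybrid=8):
    """Return (pixel_units, vertex_units) for one frame.

    Hypothetical model: 16 dedicated pixel pipelines plus 8 hybrid units
    that can do either job. Loads are abstract amounts of work; the frame
    time is set by whichever stage finishes last.
    """
    best = None
    for vertex_units in range(hybrid + 1):
        pixel_units = fixed_pixel + hybrid - vertex_units
        if vertex_units == 0 and vertex_load > 0:
            continue  # vertex work needs at least one unit
        pixel_time = pixel_load / pixel_units
        vertex_time = vertex_load / vertex_units if vertex_units else 0.0
        frame_time = max(pixel_time, vertex_time)
        if best is None or frame_time < best[0]:
            best = (frame_time, pixel_units, vertex_units)
    return best[1], best[2]


# A pixel-heavy scene keeps most hybrid units on pixel work,
# a geometry-heavy scene moves them to vertex work.
print(split_hybrid_units(pixel_load=90, vertex_load=30))
print(split_hybrid_units(pixel_load=0, vertex_load=50))
```

A fixed 16+8 design would be stuck with one split for every scene; the sketch shows why a per-frame (or finer) redistribution can, in theory, keep more of the chip busy.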

From the strategic, programming point of view, there will be no significant architectural changes except for Shaders 3.0 support. But that is quite enough. Let's see how efficient texture access from vertex shaders turns out to be; its implementation in NVIDIA chips leaves much to be desired so far. Note that in scenario 2-bis a vertex shader gets all the interpolation and texture fetch capabilities available to a pixel shader, which seems the optimal option to me (if it is implemented, of course).

There is another interesting point: symmetric support for floating point formats. As you may know, most NV4X chips (except NV44) can not only fetch floating point textures and write to a floating point frame buffer, but also perform FP16 blending in the floating point buffer (for multipass rendering). After Shaders 3.0, this feature seems the most likely candidate for implementation in R520.

So: fetching and interpolating FP16 textures at a minimum, plus blending in an FP16 frame buffer. In addition, an extra anisotropy level will probably be added, along with minor but not fundamental changes in antialiasing technology. ATI's flexible programmable MSAA patterns have long been up to the mark, and there are no radically new developments in this area.
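Why does FP16 blending matter for multipass rendering? The sketch below is only an illustration under stated assumptions: it emulates an 8-bit fixed-point buffer and a half-precision buffer in software (using Python's `struct` format `e`), and the three light contributions are made-up numbers, not measurements from any chip.

```python
import struct


def to_fp16(x):
    """Round x to the nearest half-precision (IEEE 754 binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm8(x):
    """Clamp to [0, 1] and quantize to 8 bits, like a fixed-point buffer."""
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * 255) / 255


def accumulate(passes, store):
    """Additively blend contributions, storing the buffer after each pass."""
    value = 0.0
    for contribution in passes:
        value = store(value + contribution)
    return value


# Hypothetical contributions of three light passes to one pixel.
lights = [0.6, 0.5, 0.4]
print("8-bit buffer:", accumulate(lights, to_unorm8))
print("FP16 buffer: ", accumulate(lights, to_fp16))
```

The 8-bit buffer saturates at 1.0 after the second pass and loses everything above it, while the FP16 buffer keeps the overbright sum (at reduced precision) for later tone mapping. That is the practical value of blending in a floating point frame buffer.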

And the last point: the MMU. All PCI-Express chips (R520 will be a PCI-Express-only solution) will gradually be equipped with paged MMUs, which will take over resource management and the automatic (to some extent, together with the driver) uploading of resources into the accelerator's local memory as needed. This evolution is described in the article "DX Next". It is the straight road to the new requirements of the Longhorn driver model. Various engineering steps will evidently be taken to reduce the CPU resources spent on transferring and preparing video data, as well as to reduce the delays involved in changing rendering contexts (textures, frame buffers, shaders, etc). These parameters are so critical that they often hold modern applications back, leaving them CPU-bound in the driver and preventing them from using the accelerator's hardware capabilities to the full. However, all these innovations are limited in many respects by the current DirectX, which will not change fundamentally until Longhorn.

As for the memory bus and its type: current architectures and applications depend increasingly on shader performance and on the delivery of source data from the CPU, so local memory is no longer the critical bottleneck it was two generations ago. Up-to-date GDDR3 chips (1.4 ns and even 1.2 ns parts will be available) can provide synchronous memory bus operation at 600 MHz and higher. Judging by the core complexity, we will hardly see core frequencies higher than 600-650 MHz. On the contrary, slower scenarios are possible. How much slower is impossible to guess; only mass production will show. As the previous year demonstrated, even specialists from NVIDIA or ATI can only make guesses here, while actual mass production of chips tends to yield quite different figures.
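The cycle times quoted for GDDR3 translate into clock rates by simple arithmetic (frequency is the reciprocal of the cycle time). A small sketch, assuming the rating is the nominal minimum cycle time of the memory chip; the function name is made up for illustration:

```python
def max_clock_mhz(cycle_time_ns):
    """Nominal maximum clock in MHz for a chip with the given cycle time."""
    return 1000.0 / cycle_time_ns


for ns in (1.4, 1.2):
    clock = max_clock_mhz(ns)
    print(f"{ns} ns -> about {clock:.0f} MHz "
          f"({2 * clock:.0f} MHz effective DDR rate)")
```

So both ratings leave headroom above a 600 MHz memory clock, which is why a synchronous 600 MHz core and memory configuration looks feasible from the memory side.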

The new chip will be manufactured using the 90nm process technology. That is rather risky, given the high complexity of the design and a "greenhorn" technology not yet tried by graphics chip manufacturers. However, nothing ventured, nothing gained: if successful, this technology may secure real benefits and enable effective price competition. But only if it is a success: remember all the problems NVIDIA had when it moved to the 130nm technology.

So we get the following most likely picture in the long run:

Code name: R520
Probable name of the product line: RADEON 900 or RADEON 10000
Process: 90nm (already known)
Probable time of the announcement: May 2005
Probable number of pipelines: 16+8 or 24+8 or 16+8 universal
Shader Model 3.0
MMU, storing textures and rendering in PCI-Express memory when necessary
Likely features: fetching and interpolating FP textures, FP blending
512 MB of memory in desktop solutions (already known), 256 bit GDDR3 memory, typical operating frequency (600-750MHz)*2
Core clock is unknown, but it will hardly exceed 650 MHz anyway
Rather high price – the first 512 MB video cards may cost over $600
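
For reference, the memory figures in the list above imply the following peak bandwidth. This is a back-of-the-envelope sketch: the formula is the standard bus width times clock times transfers per clock, the helper name is hypothetical, and real sustained bandwidth will be lower.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


# 256-bit GDDR3 at the two ends of the 600-750 MHz range (double data rate).
for mhz in (600, 750):
    print(f"{mhz} MHz: {peak_bandwidth_gb_s(256, mhz):.1f} GB/s")
```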

Cautionary tale about NVIDIA, ATI and the PCI-Express <=> AGP bridge

Time has shown that NVIDIA's strategy of using PCI-Express bridges was the better choice. The transition to PCI-Express is proceeding at a normal pace, without the stunning leaps some manufacturers planned for. It is the traditional phasing out of old platforms as they grow outdated and their replacement with new models as new computers are bought. In my experience, this evolutionary platform replacement has always required, and still requires, at least 1.5 or even two years before the volume of accelerators bought for PCI-Express platforms equals the volume of AGP accelerators. All the more so as decent processors and cost-efficient memory are still available for the old platform, which encourages buying it for new PCs.

There are other fine points as well, such as the general lack of motivation to replace typical 2.X GHz CPUs (they are still quite sufficient for many users) and the existence of a more reasonable upgrade path: upgrading the accelerator alone. Moving to a top AGP solution provides, at the same cost, a larger gain in games than replacing the entire platform and buying a new mid-range PCI-Express solution.

PCI-Express is undoubtedly cool and will certainly prevail in the future, but the fact remains: a noticeable shortage of AGP solutions based on new-generation chips has formed. NVIDIA jumped at this chance and increased the production volume of bridged video cards and of native AGP cards based on NV40/41, which sold like hot cakes. ATI, by contrast, could not offer its latest architecture on the AGP market in several price segments and thus lost a considerable number of customers. It also had to keep manufacturing some outdated solutions and to increase the number of new chips to meet current demand, which was certainly unprofitable. For example, the launch of R481 could have been avoided if a bridge had been available.

So, it is already known that ATI had to create its own bridge (RIALTO) anyway, which is similar in many respects to NVIDIA's HSI bridge. In the near future we can expect a whole series of announcements of solutions based on the latest PCI-Express chips from ATI paired with the AGP bridge, to make up for the shortage of models and satisfy demand on the AGP market. This move will allow ATI to curtail production of old-architecture chip variants, simplifying and tidying up its product line and making it more uniform in features.

Barely a year has passed, and time has already shown which approach was right and which was wrong. IT has never been so exciting and interesting for gamblers and forecasters, has it?

There will evidently be a pause after the spring announcement of R520, and then (probably at the beginning of autumn) low-end solutions for the masses based on this architecture will be announced.

Agenda – what's on the menu from the best NVIDIA cooks: today and tomorrow.

A 90 nm chip awaits us in the long and fair prospect of the end of 2005. It will comply with WGF 2.0 (Windows Graphics Foundation 2.0, whose main part will be the so-called DirectX.Next, aka DirectX 10) and the Longhorn driver model. A new version of shaders, a new driver model, new principles of interaction between API and hardware, new applications: all of this will appear in late spring 2006, not before. But the trouble is that the hardware must be ready by the end of 2005. To all appearances, NVIDIA will pioneer a WGF 2.0 solution (we'll see how fast ATI steps in). Such fundamental architectural changes come at a cost, so we have to wait. We are not going to see any architectural innovations from NVIDIA until late autumn 2005.

There will be NV48 instead: a 110 nm optimized version of NV45 with a built-in PCI-Express interface. The number of pipelines will most likely stay the same. But if everything goes well, the clock speed will grow and 512 MB cards will probably appear. There is no reason to worry about lagging behind ATI: the NV4x architecture was a step ahead of ATI anyway, as it already includes Shaders 3.0 and fetching/blending of FP textures. What remains is a matter of speed, memory size, and probably such trifles as hardware support for 3Dc (if ATI licenses it to Microsoft or NVIDIA) and optimized access to textures from vertex shaders (which will probably happen only in NV50).

NV4X is in its element in other market segments and may dominate them for a long time as it gets faster memory, production gets smoother, and yields get higher.

Interestingly, NVIDIA has finally obtained a license from Intel for the Pentium 4 CPU bus. Now the very good nForce chipsets will come to this (I will not shy away from the word) main platform of the modern IT industry. On the one hand, they will compete with Intel's own chipsets; on the other hand, Intel is compensated by the disappearance of one more significant argument in favour of AMD. But we are interested in another question: having got a license for the CPU bus, will NVIDIA take up CPU manufacturing in the future? A company that can create such a complex accelerator can try its hand at building CPUs, so why not? On the other hand, high frequencies and the specifics of CPUs require close cooperation directly with the fabs and thorough work on chip manufacturing technologies. NVIDIA has neither the opportunity nor, more importantly, the experience, while Intel and AMD have been practising in this field for decades. In any case, the appearance of a strong and active player on the Pentium 4 chipset market is certainly good news, both in the near future and in the long run. SiS and VIA have not indulged us with innovations of late. They have been scaling down their presence on this market, focusing on cheap and not very capable solutions.

And the key question: when will NVIDIA start a fight with Intel for a share of the growing integrated video market, which so far has only been eating away at the potential customers of NVIDIA and ATI?

Have no doubt: the fight will break out.

Bonus: the problem of competition and the lack of competitors as a problem of tomorrow and of the new OS in the light of the problem of limited requirements and their growth (quite incomprehensible).

In conclusion, a few words about subtler matters. 2006 will bring a leap in opportunities connected with the new OS: a new graphics architecture and shader model, a new generation of accelerators, adoption of new PCI-Express features, LaGrande and Vanderpool technologies, multi-core processors on every desk. We can expect the needs of regular PC users to grow this year; those needs "froze" at the beginning of 2004, which led to a fair amount of conservatism among regular buyers of PC components. The last serious breakthrough in end-user capabilities, DVD recording, was rather long ago. For now the main development trends are in mobile solutions (mass purchases of notebooks, rapid quantitative growth of the PDA/smartphone sector), but in the near future we can expect noticeable progress on the PC market as well.

It is obvious that only new PC usage models (the digital home, for example), a new OS, and new software will shake off this local stagnation. As in any conservative country, we have at most two leading parties so far (ATI and NVIDIA, Intel and AMD), which are either in balance (two scenarios: 50/50 parity and 80/20 opposition) or in transition from one stable state to the other :-)

While this little stagnation goes on, the launch rate of new graphics architectures has also slowed down. Without rapid growth in software capabilities and, correspondingly, in hardware technologies, new manufacturers cannot force their way into the market and occupy a noticeable place there (remember the failed XGI attempts, as well as many other examples). Only with the launch of Longhorn, say in 2007, can we hope for the appearance of new players or a changed alignment of forces. But these hopes are also feeble: the hi-tech boom is over, and personal computers are steadily and logically turning into consumer electronics. One way or another, a monolithic PC in the form of a notebook, a small music center, or a DVD player is the natural end of this evolution, except for specific areas of application (servers, etc).

Content rules everything (content and access to it come first); devices for its creation or playback (modern PCs) are secondary. Content, be it software (games) or other forms (movies on demand from network servers, TV on demand, etc), drives the development of hardware infrastructure and of new models for its creation and consumption. It's no secret that an average user consumes much more content than he creates: creating is more difficult.

So these days large companies such as Intel have only two ways to pursue PC market expansion: to teach the masses to create, and to teach the masses new methods of consumption. Only then will people appear who will buy something with a processor and an OS inside. That's what Internet acceleration is :-)

Alexander Medvedev (unclesam@ixbt.com)

March 18, 2005
