Foreword

In this article I would like to look at the current situation with the second version of pixel shaders in DirectX9. To begin, let's have a look at the history.

John Carmack

As early as 2000, John Carmack pointed out that graphics cards needed floating-point numbers in the pixel pipeline, in addition to those already used in the geometry pipeline. Let's quote his words:
The full text of Carmack's plan can be found here: http://www.bluesnews.com/cgibin/finger.pl?id=1&time=20000429013039

The point of the passage quoted above is the need for floating-point precision operations in the pixel pipeline. A bit later I'll refer to other parts of Carmack's plan.
Microsoft

In February 2001 Microsoft presented its vision of the DirectX9 architecture (very close to what we have finally got in the ATi R300). At the presentation they announced that the next pixel shader version in DirectX9, known as "PS 2.0", would operate on single-precision floating-point numbers and would be functionally closer to vertex shaders. Floating-point representation of numbers in PS 2.0 was implemented in DirectX9, released in December 2002.

What are floating-point numbers?

There are several ways to represent real numbers on computers.

1) Fixed point places a radix point somewhere in the middle of the digits and is equivalent to using integers that represent portions of a unit. For example, one may represent 1/100ths of a unit; with four decimal digits you could represent 10.82 or 00.01.
2) Rational representation stores a number as a ratio of two integers.
3) Floating-point representation, the most common solution, basically represents reals in scientific notation, such as 1.45 x 10^19. We will look at it more closely below.

Floating-point representation

Scientific notation represents a number as a base number and an exponent. For example, 123.456 can be represented as 1.23456 x 10^2. In hexadecimal, the number 123.abc can be represented as 1.23abc x 16^2.

Floating-point representation solves a number of problems. Fixed-point numbers have a fixed range, which prevents them from representing very large or very small numbers. Fixed-point numbers may also lose precision when two large numbers are divided. Floating-point numbers, on the other hand, employ a kind of "sliding window" of precision that follows the scale of the number. This easily allows representing numbers from 1,000,000,000,000 down to 0.0000000000000001.

In this article I will focus only on the main differences between integer and floating-point numbers, range and precision, and compare the implementations currently available in CPUs with those in GPUs.
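To make the contrast between a fixed range and a "sliding window" concrete, here is a minimal Python sketch. The four-digit, hundredths-based fixed-point format follows the example in the list above; the names SCALE, MAX_RAW, to_fixed and from_fixed are illustrative only. math.frexp and math.ulp (Python 3.9+) are standard-library functions.

```python
import math

# Illustrative fixed-point format from the example above: four decimal digits
# holding hundredths of a unit, i.e. values 00.00 .. 99.99 in steps of 0.01.
SCALE = 100
MAX_RAW = 9999

def to_fixed(x: float) -> int:
    """Store x as a scaled integer, rounding to the nearest hundredth."""
    raw = round(x * SCALE)
    if not 0 <= raw <= MAX_RAW:
        raise OverflowError(f"{x} is outside the range 00.00..99.99")
    return raw

def from_fixed(raw: int) -> float:
    return raw / SCALE

print(from_fixed(to_fixed(10.82)))   # 10.82
print(from_fixed(to_fixed(0.004)))   # 0.0 (smaller than the fixed step, lost)

# Floating point: the gap to the next representable double (math.ulp)
# grows and shrinks with the magnitude of the number.
for x in (1e-16, 1.0, 1e12):
    print(f"{x:g}: step {math.ulp(x):g}")

# Base-2 scientific notation: x == m * 2**e with 0.5 <= |m| < 1.
m, e = math.frexp(123.456)
print(m, e)
```

The fixed-point step is always 0.01 regardless of the value, while the floating-point step tracks the magnitude of the number, which is exactly the trade-off described above.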
But now a bit of history again.

Intel's way to do floating-point operations

Today the IEEE 754 floating-point standard is the most common representation of real numbers on computers, including Intel-based PCs, Macintoshes, and most Unix platforms. But how did it come about? In 1976 Intel began to design a floating-point coprocessor for its i8086/8 and i432 microprocessors. Dr. John Palmer (manager of Intel's floating-point effort), who had met William Kahan at Stanford ten years earlier, recruited him as a consultant for the upcoming i8087 coprocessor for the i8086/8. Rumors about the i8087 subsequently spread through Silicon Valley, and other developers were worried enough that a committee was formed to work on a standard for floating-point arithmetic in microprocessors. In 1977, after several committee meetings, Professor Kahan, his student Jerome Coonen at U.C. Berkeley, and visiting Professor Harold Stone prepared a draft specification in the format of an IEEE standard and brought it back to the IEEE p754 meeting. This draft was called "KCS" until p754 adopted it. By 1985, when IEEE Standard 754 was officially adopted, it had already become a de facto standard. Modern x86-compatible microprocessors support 32-, 64- and 80-bit floating-point formats.
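Storage Layout

IEEE floating-point numbers have three basic components: the sign, the exponent, and the mantissa. The mantissa is composed of a fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and doesn't need to be stored. The layouts of single (32-bit), double (64-bit), quadruple (128-bit) and extended (80-bit) precision floating-point values are listed below; the number of bits in each field is given, with bit ranges in square brackets:

Single (32 bit): sign 1 [31], exponent 8 [30-23], fraction 23 [22-00]
Double (64 bit): sign 1 [63], exponent 11 [62-52], fraction 52 [51-00]
Quadruple (128 bit): sign 1 [127], exponent 15 [126-112], fraction 112 [111-00]
Extended (80 bit): sign 1 [79], exponent 15 [78-64], mantissa 64 [63-00] (the leading bit is stored explicitly)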
A common shorthand for floating-point formats is "sXXeYY", where XX is the number of mantissa bits and YY the number of exponent bits. In this notation single precision is s23e8, double s52e11, extended s64e15, and quadruple s112e15.
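As an illustration of the layout and the sXXeYY shorthand, the following Python sketch splits a number into its sign, stored exponent and fraction fields using the standard struct module. The helper name split_fields is our own; only single (s23e8) and double (s52e11) are covered, because those are the formats struct packs natively.

```python
import struct

# (struct format, fraction bits, exponent bits) for the two native formats
FORMATS = {
    "single": (">f", 23, 8),    # s23e8
    "double": (">d", 52, 11),   # s52e11
}

def split_fields(x: float, fmt: str = "single") -> tuple[int, int, int]:
    """Return (sign, stored_exponent, fraction) for x in the given format."""
    code, frac_bits, exp_bits = FORMATS[fmt]
    bits = int.from_bytes(struct.pack(code, x), "big")
    fraction = bits & ((1 << frac_bits) - 1)
    stored_exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (frac_bits + exp_bits)
    return sign, stored_exponent, fraction

print(split_fields(5.0))             # (0, 129, 2097152)
print(split_fields(-1.0, "double"))  # (1, 1023, 0)
```

Packing big-endian (">") keeps the sign in the most significant bit of the resulting integer, so the shifts match the bit ranges listed above.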
[Figure: how the bits are ordered in memory]
Let's see what is stored in these fields.

The Sign Bit

There are two possible values: 0 denotes a positive number, 1 a negative number.

The Exponent

The exponent field must represent both positive and negative exponents. To do this, a bias is added to the actual exponent to get the stored exponent. For IEEE single-precision floats the bias is 127. Thus an exponent of zero means that 127 is stored in the exponent field, and a stored value of 200 indicates an exponent of (200 - 127), or 73.

The Mantissa

The mantissa represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To find the value of the implicit leading bit, note that any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:

5.00 x 10^0
0.05 x 10^2
5000 x 10^-3
In order to maximize the quantity of representable numbers, floating-point numbers are stored in normalized form, which basically puts the radix point after the first nonzero digit. In normalized form, five is represented as 5.0 x 10^0. A nice little optimization is available in base two, since the only possible nonzero digit is 1. Thus we can simply assume a leading digit of 1 and do not need to store it.
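Putting the bias and the implicit leading 1 together, a normalized single-precision number has the value (-1)^sign x 1.fraction x 2^(stored exponent - 127). The hypothetical helper decode_normal below evaluates that formula for normalized numbers only (stored exponents 1 to 254; zero, denormals, infinities and NaNs use the reserved values 0 and 255 and are not handled here) and checks the result against the bit pattern produced by struct.

```python
import struct

BIAS = 127          # single-precision exponent bias
FRACTION_BITS = 23

def decode_normal(sign: int, stored_exponent: int, fraction: int) -> float:
    """Value of a normalized IEEE single: (-1)^s * 1.f * 2^(e - 127)."""
    if not 1 <= stored_exponent <= 254:
        raise ValueError("only normalized numbers are handled here")
    mantissa = 1 + fraction / 2 ** FRACTION_BITS   # implicit leading 1
    return (-1) ** sign * mantissa * 2.0 ** (stored_exponent - BIAS)

# Five is 1.01 (binary) x 2^2: exponent 2 is stored as 2 + 127 = 129,
# and the fraction bits .01 (= 0.25) become 0.25 * 2^23 = 0x200000.
five = decode_normal(0, 129, 0x200000)
print(five)                                     # 5.0
print(struct.pack(">f", five).hex())            # 40a00000

# The example from the text: a stored exponent of 200 means 200 - 127 = 73.
print(decode_normal(0, 200, 0) == 2.0 ** 73)    # True
```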
Ranges and Precision of Floating-Point Numbers
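One way to see the range and precision of the native formats is to decode their extreme bit patterns directly. The Python sketch below (the helper names as_single and as_double are our own) builds the largest finite value, the smallest normalized value, the smallest denormalized value, and the gap between 1.0 and the next representable number for single and double precision.

```python
import struct

def as_single(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

def as_double(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE double-precision value."""
    return struct.unpack(">d", bits.to_bytes(8, "big"))[0]

single = {
    "max":          as_single(0x7F7FFFFF),        # exponent 254, fraction all ones
    "min normal":   as_single(0x00800000),        # exponent 1, fraction 0
    "min denormal": as_single(0x00000001),        # exponent 0, lowest fraction bit
    "epsilon":      as_single(0x3F800001) - 1.0,  # next value after 1.0, minus 1.0
}
double = {
    "max":          as_double(0x7FEFFFFFFFFFFFFF),
    "min normal":   as_double(0x0010000000000000),
    "min denormal": as_double(0x0000000000000001),
    "epsilon":      as_double(0x3FF0000000000001) - 1.0,
}

for name, props in (("single (s23e8)", single), ("double (s52e11)", double)):
    print(name)
    for key, value in props.items():
        print(f"  {key:>12}: {value:.6e}")
```

The stored exponent value 0 is reserved for zero and denormals and the all-ones value for infinities and NaNs, which is why the largest finite number uses exponent 254 (single) rather than 255.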
A closer look at the PS 2.0 standard and current PS 2.0 capable hardware

At the presentation of PS 2.0, and later with the release of the first beta of DirectX9, Microsoft established unified requirements for the minimum range and precision of floating-point numbers used in PS 2.0. Ideally, floating-point arithmetic should have the precision of s23e8 (32-bit single precision) numbers. Later, evidently after some lobbying from NVIDIA, the PS 2.0 standard was extended with "partial precision" execution of floating-point operations. Note that the "partial precision" flag on a PS operation is only a hint to the video card's driver that the operation does not need a fully precise result; the driver can ignore the flag and execute the PS instruction in normal (full) precision mode. Below is the part of the current PS 2.0 specification concerning floating-point precision:
As we can see, only the r#, c# and t# registers require a high-precision representation, while colors (both diffuse and specular) can be represented in the same fixed-point registers as in DirectX8 PS 1.x.
So we have reached the central part of this article: determining the precision of the floating-point numbers used in the current generation of video chips. For this purpose we developed a special test utility. It stores the test results in a log file formatted as follows:
The log file contains six values reflecting the precision of floating-point numbers in a video chip: one for each register type in each of the two operation execution modes. Below is a summary of the results obtained on NVIDIA and ATI video cards. The link to the test utility can be found at the end of the article. The package also contains the pixel shader templates used to determine the precision of the registers.
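The article does not describe how the test utility probes precision inside a pixel shader, so the following is only a sketch of one classic approach, run on the CPU: keep halving a small increment until 1.0 + increment rounds back to 1.0. The number of halvings that still change the result equals the number of stored fraction bits. Here s23e8 and s10e5 are emulated with struct's "f" and "e" (half-precision) codes, and s16e7, which has no native type, with a hypothetical round_to_fraction_bits helper that ignores exponent-range limits. All function names are our own.

```python
import math
import struct

def as_s23e8(x: float) -> float:
    """Round x to IEEE single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def as_s10e5(x: float) -> float:
    """Round x to IEEE half precision and back (struct code "e")."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def round_to_fraction_bits(x: float, m: int) -> float:
    """Round x to m stored fraction bits plus the implicit leading 1.
    Ties go to even as in IEEE 754; exponent range limits are ignored."""
    if x == 0.0 or not math.isfinite(x):
        return x
    f, e = math.frexp(x)          # x == f * 2**e with 0.5 <= |f| < 1
    scale = 2 ** (m + 1)          # m fraction bits + implicit bit
    return math.ldexp(round(f * scale) / scale, e)

def measure_fraction_bits(to_format) -> int:
    """Count how many halvings of the increment still change 1.0."""
    n = 0
    while to_format(1.0 + 2.0 ** -(n + 1)) != 1.0:
        n += 1
    return n

print(measure_fraction_bits(as_s23e8))                                 # 23
print(measure_fraction_bits(as_s10e5))                                 # 10
print(measure_fraction_bits(lambda x: round_to_fraction_bits(x, 16)))  # 16
```

A shader-based test has to do the same thing with shader arithmetic and write the outcome to a render target, but the principle of comparing 1 + 2^-n against 1 carries over.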
If it were not for the NV35's results, the numbers in the table wouldn't vary so much, would they? It's well known that ATi chips use 24-bit floating-point numbers internally in the R300 core, and that this precision is not affected by the partial precision modifier. But it's interesting that NVIDIA's chips other than the NV35 use 16-bit floating-point numbers regardless of the precision requested(!), even though the partial precision term was introduced at NVIDIA's request, NV3x GPUs support 32-bit floating-point precision under the OpenGL NV_fragment_program extension, and NVIDIA advertised its new-generation video chips as capable of TRUE 32-bit floating-point rendering! The NV35 shows the most varied, and the most correct, behavior among NVIDIA's video chips. Calculations are performed with 32-bit precision in the standard mode, in line with the Microsoft specification, but when partial precision is indicated, temporary and constant registers use 16-bit precision while texture registers use 32-bit precision, although according to the Microsoft specification texture registers could also use 16-bit precision. Note that the NV3x results were obtained with WHQL-certified drivers, and I regret that Microsoft does not keep control over the implementation of its own DirectX specifications. Also note that the 16-bit floating-point format used by NVIDIA is identical to the one suggested by John Carmack in 2000.

Let's analyze the results. Below you can see the properties of 16- and 24-bit floating-point numbers, with 32-bit numbers as the reference:
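Because the 16- and 24-bit formats have no native CPU type, their properties are easiest to compute from the sXXeYY parameters. The sketch below assumes IEEE-style conventions for all three formats (bias 2^(e-1) - 1, top exponent value reserved for infinities and NaNs); for s16e7 in particular this is an assumption, since the R300's exact handling of special values is not covered here. The function name format_properties is our own.

```python
import math

def format_properties(m: int, e: int) -> dict:
    """Range and precision of a binary format sMeE (m fraction bits,
    e exponent bits), assuming IEEE 754-style bias and reserved top exponent."""
    bias = 2 ** (e - 1) - 1
    return {
        "max": (2 - 2.0 ** -m) * 2.0 ** bias,        # largest finite value
        "min normal": 2.0 ** (1 - bias),              # smallest normalized value
        "epsilon": 2.0 ** -m,                         # gap between 1.0 and next value
        "decimal digits": (m + 1) * math.log10(2),    # implicit bit included
    }

for name, (m, e) in {"s10e5": (10, 5), "s16e7": (16, 7), "s23e8": (23, 8)}.items():
    p = format_properties(m, e)
    print(f"{name}: max {p['max']:.4g}, min normal {p['min normal']:.4g}, "
          f"epsilon {p['epsilon']:.4g}, ~{p['decimal digits']:.1f} decimal digits")
```

Under these assumptions the largest s10e5 value is 65504, slightly below 2^16, and s10e5 carries roughly 3.3 significant decimal digits against about 7.2 for s23e8.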
It's clear that the s10e5 floating-point format falls behind all the other formats in most respects. It may look like a paradox, but it's more appropriate to compare s10e5 numbers with the fixed-point numbers used in PS 1.x. The precision of numbers in PS 1.x, even on the NV30, is 12 bits, which equals the precision of s10e5 FP numbers (counting the sign bit and the implicit leading bit). And the advantage of the s10e5 format shows precisely in comparison with fixed-point numbers: much larger absolute values (65536, compared with 1, 2 or 8 depending on the chip) and, at the same time, much smaller absolute values. If you remember, John Carmack named the area where he would like to use s10e5 numbers: lighting. The extended range allows overbright lighting, when one needs to emulate very bright light sources without details getting lost in shadows. But s10e5 precision is the area where programmers must be very careful. Obviously, 16-bit precision is not enough for a correct ray tracer like the one ATi demonstrated, and even calculating texture coordinates in pixel shaders may lead to undesirable results. The precision of s10e5 numbers doesn't even allow correct addressing of textures larger than 1024 pixels in one dimension with bilinear filtering enabled. NVIDIA understands these limitations perfectly well and has already started training game developers to find the areas where the insufficient precision of s10e5 numbers leads to incorrect results. NVIDIA also pushes all high-precision calculations into vertex shaders.

What's next?

In this article I've described floating-point numbers, the formats currently used in microprocessors, and the kind of floating-point support video chip companies provide today. I must say that 16-bit floating-point numbers are not sufficient for general mathematical computation.
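The texture-addressing limit mentioned earlier in this section can be illustrated on the CPU. The sketch below is a simplified model, not a description of how any particular chip computes texture addresses: it rounds texture coordinates to s10e5 with struct's half-precision "e" code and counts how many distinct coordinate values land inside a single texel near u = 0.75 (coordinates in [0.5, 1) have the coarsest s10e5 spacing within [0, 1), namely 2^-11). Bilinear filtering derives its blend weights from the position within the texel, so the fewer distinct values there are, the coarser the weights become. The function names are our own.

```python
import struct

def as_s10e5(x: float) -> float:
    """Round x to IEEE half precision (s10e5) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def coords_per_texel(size: int, u: float = 0.75, samples: int = 1024) -> int:
    """Distinct s10e5 coordinates among evenly spaced samples across the
    texel that starts at coordinate u of a texture `size` texels wide."""
    texel = 1.0 / size
    return len({as_s10e5(u + texel * i / samples) for i in range(samples)})

for size in (256, 1024, 2048, 4096):
    print(size, coords_per_texel(size))
```

In this model a 2048-texel texture leaves only two distinct coordinates per texel near u = 0.75 and a 4096-texel texture only one, so the bilinear weights there collapse to a couple of fixed values.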
But I hope that NVIDIA will let game developers choose when 32-bit floating-point numbers should be used and when the 16-bit version with its limited precision is enough. Moreover, that choice should be available not only on NVIDIA's flagship, the NV35, but also on the other members of the GeForce FX family. Probably all next-generation video chips supporting pixel shaders 3.0 will also support full-precision 32-bit floating-point numbers. But programmers who use floating-point numbers in their work know well that one must be very careful even with 32-bit single-precision numbers, since range overflow and precision loss happen quite often. So what? Should we wait for the next step, double-precision (64-bit) floating-point numbers? It seems they won't come soon. Here is one more quote, regarding the use of floating-point numbers in the RenderMan rendering software package:
The original quote can be found at: http://groups.google.com.ru/groups?hl=ru&lr=&ie=UTF8&oe=UTF8&selm=87a9n5%2482b%241%40sherman.pixar.com

What does this mean for us? Probably we should not expect much benefit from double-precision numbers in DirectX. And when they are finally introduced, they won't be the basic type but just an additional type for programmers to use. Single-precision (32-bit) floating-point numbers will remain the basic type of the DirectX API for a long time yet.

Links to Pixel Shader precision test utility
Alexey Barkovoy (clootie@ixbt.com)
Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.