iXBT Labs - Computer Hardware in Detail

FarCry v.1.2 Tests Continued, Shader 2.b

July 25, 2004



Contents

  1. Overview, video cards' features
  2. Testbed configurations
  3. Test results
  4. Conclusions


We continue our FarCry 1.2 tests. As is our tradition, we begin the review with a list of articles about the cards under test:

Theoretical materials and reviews of video cards, which concern functional properties of the GPU ATI RADEON X800 (R420) and NVIDIA GeForce 6800 (NV40)

    Let's briefly introduce our contenders:

    Video cards



    ATI RADEON X800 XT (Gigabyte)
    NVIDIA GeForce 6800 Ultra (Leadtek)




    As you probably remember, both cards have 256 MB of GDDR3 memory with 1.6 ns access time. The memory runs at 550 (1100) MHz on NV40 and at 575 (1150) MHz on R420. Chip clocks: 400 and 450 MHz for NV40, 525 MHz for R420. Both GPUs contain 16 pixel and 6 vertex pipelines. In addition, NV40 supports Shaders 3.0, while R420 supports 3Dc, a proprietary technology for compressing normal maps. That's it in brief.

    Installation and drivers

    Testbed configurations:

    • Athlon 64 3200+ based computer:
      • AMD Athlon 64 3200+ (L2=1024K) CPU
      • ASUS K8V SE Deluxe (VIA K8T800) mainboard
      • 1 GB DDR SDRAM PC3200
      • Seagate Barracuda 7200.7 80GB SATA HDD
    • Operating system Windows XP SP1; DirectX 9.0b;
    • Monitors: ViewSonic P810 (21") and Mitsubishi Diamond Pro 2070sb (21").
    • ATI drivers v6.458 (CATALYST 4.7beta); NVIDIA v61.45.

    VSync is disabled.

    Test results

    We used the following test applications:

    • FarCry 1.2 (Crytek/UbiSoft), DirectX 9.0, multitexturing (the game is started with the -DEVMODE switch); all test settings are set to Very High.

    Note that the patch includes premade demo-scripts, which can be started either via console or from the command prompt. Here is a sample BAT-file:

    echo Running Research...
    Bin32\FarCry.exe -DEVMODE "#demo_num_runs=1" "#demo_quit=1" "map research" "demo research" "r_sm30path 1"
    echo Running Regulator...
    Bin32\FarCry.exe -DEVMODE "#demo_num_runs=1" "#demo_quit=1" "map regulator" "demo regulator" "r_sm30path 1"
    echo Running Training...
    Bin32\FarCry.exe -DEVMODE "#demo_num_runs=1" "#demo_quit=1" "map training" "demo training" "r_sm30path 1"
    echo Running Volcano...
    Bin32\FarCry.exe -DEVMODE "#demo_num_runs=1" "#demo_quit=1" "map volcano" "demo volcano" "r_sm30path 1"

    You can set the resolution from the command prompt with the variables #r_Width=1600 and #r_Height=1200. The use of Shaders 3.0 is controlled by the r_sm30path variable: when it is set to 1, SM 3.0 is enabled; when it is set to 0, SM 3.0 is disabled and rendering falls back to Shaders 2.0 at most.
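
    The command lines above can also be generated rather than typed by hand. The sketch below is only an illustration: the function names are ours, and it assumes that the resolution variables are passed as separate quoted "#name=value" arguments in the same way as the demo variables in the BAT file.

```python
def build_farcry_args(demo, width=None, height=None, sm30=True):
    """Return the argument list for one demo run, mirroring the sample BAT file."""
    args = [
        r"Bin32\FarCry.exe",
        "-DEVMODE",
        "#demo_num_runs=1",
        "#demo_quit=1",
    ]
    # Assumption: resolution is passed as two extra "#name=value" arguments.
    if width is not None and height is not None:
        args += [f"#r_Width={width}", f"#r_Height={height}"]
    # r_sm30path 1 enables the SM 3.0 path, 0 disables it.
    args += [f"map {demo}", f"demo {demo}", f"r_sm30path {1 if sm30 else 0}"]
    return args


def to_bat_line(args):
    """Quote every argument after the executable and -DEVMODE, as in the BAT file."""
    head, tail = args[:2], args[2:]
    return " ".join(head + [f'"{a}"' for a in tail])


if __name__ == "__main__":
    for demo in ("research", "regulator", "training", "volcano"):
        print(to_bat_line(build_farcry_args(demo, 1600, 1200)))
```

    Calling build_farcry_args("research") without a resolution reproduces the first command of the BAT file above.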

    Disclaimer: Most of this review was written before the final patch was released. At the end of the article you will find additional information about the differences between the patch version we initially got hold of and the official version.

    The first part of the article provoked a lot of responses, both enthusiastic and skeptical. The enthusiasts expected to see FarCry running in the new rendering mode (the so-called sm30path) on all video cards supporting pixel shaders 2.x. The skeptics insisted that this was utterly impossible. After a week of experimenting with different FarCry 1.2 modifications on video cards supporting only the second version of shaders, I can say that both groups were partly correct.

    I shall not keep you in suspense: I confirm that our lab has a version of FarCry 1.2 that runs in the so-called "sm30path" mode (more precisely, its pixel shader part) on video cards supporting pixel shaders 2.x, and that provides image quality identical to that on NV40. We shall refer to these shaders as "long".

    It must be noted that geometry instancing has almost no impact on the final result in our tests. This does not mean that the function is useless, but it shows its full benefit only when the CPU is the bottleneck of the rendering procedure, that is, when it cannot cope with setting a great number of extra parameters for scene rendering. This does not happen with the standard FarCry tests on NV40, partly because FarCry was initially optimized for the existing video cards. In future the situation may change. For example, on video cards with geometry instancing support, developers will be able to model real flora geometry over large open distances without impostors.
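
    Why instancing helps only in CPU-bound scenes can be seen in a toy model. This is a deliberately simplified sketch, not a measurement: it assumes that CPU submission and GPU rendering overlap completely, so the slower side determines the frame time, and all numbers in the example are hypothetical.

```python
def frame_time_ms(draw_calls, cpu_ms_per_call, gpu_ms):
    """Toy model: the slower of CPU submission and GPU rendering sets the frame time."""
    cpu_ms = draw_calls * cpu_ms_per_call
    return max(cpu_ms, gpu_ms)


def instancing_gain(objects, instances_per_call, cpu_ms_per_call, gpu_ms):
    """Speed-up factor from batching `instances_per_call` objects into one draw call."""
    before = frame_time_ms(objects, cpu_ms_per_call, gpu_ms)
    batched_calls = -(-objects // instances_per_call)  # ceiling division
    after = frame_time_ms(batched_calls, cpu_ms_per_call, gpu_ms)
    return before / after


if __name__ == "__main__":
    # GPU-bound scene (hypothetical numbers): fewer draw calls change nothing.
    print(instancing_gain(200, 50, 0.05, 20.0))
    # CPU-bound scene (hypothetical numbers): batching removes the bottleneck.
    print(instancing_gain(2000, 50, 0.05, 20.0))
```

    In this model the gain stays at 1.0 until the draw-call cost exceeds the GPU time, which matches the behaviour described above for the standard demos.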

    Technical details

    When preparing the first part of the article we found that, in theory, the sm30path mode can be limited to shaders of the second version instead of Shaders 3.0. That is why we attempted to make FarCry use the rendering mode with "long" shaders, originally written for the third version, on video cards supporting only Shaders 2.x.

    Here is the resulting step-by-step description of our work:

    1. FarCry was tricked into using the same rendering mode as on NV40
    2. The rendering procedure was made to use Shaders 2.0 in place of the equivalent Shaders 3.0
    3. We checked whether the resulting images are identical

    The first stage is relatively easy: using 3DAnalyze (we mentioned this utility in the previous part) we substituted the VendorID/DeviceID combination, which identifies the manufacturer and the specific product, and we also set the Shaders 3.0 support flags. As a result, FarCry recognizes any video card as NV40 and allows rendering multiple light sources in one pass with the help of "long" shaders.
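
    The idea of the first stage can be sketched as follows. This is not how 3DAnalyze is implemented, and the check inside FarCry is not known to us; AdapterCaps, spoof_as and takes_long_shader_path are hypothetical names, and the device IDs are placeholders. Only the two PCI vendor IDs are real values.

```python
from dataclasses import dataclass, replace

VENDOR_ATI = 0x1002     # PCI vendor ID of ATI
VENDOR_NVIDIA = 0x10DE  # PCI vendor ID of NVIDIA

# Placeholder device IDs for illustration only, not the real product IDs.
PLACEHOLDER_R420_ID = 0x0001
PLACEHOLDER_NV40_ID = 0x0002


@dataclass(frozen=True)
class AdapterCaps:
    vendor_id: int
    device_id: int
    pixel_shader_version: tuple   # (major, minor)
    vertex_shader_version: tuple  # (major, minor)


def spoof_as(real, vendor_id, device_id, shader_version=(3, 0)):
    """Return the capabilities a wrapper would report to the game instead of the real ones."""
    return replace(real, vendor_id=vendor_id, device_id=device_id,
                   pixel_shader_version=shader_version,
                   vertex_shader_version=shader_version)


def takes_long_shader_path(caps):
    """Hypothetical stand-in for the game's check that selects the NV40 rendering mode."""
    return caps.vendor_id == VENDOR_NVIDIA and caps.pixel_shader_version >= (3, 0)


if __name__ == "__main__":
    real = AdapterCaps(VENDOR_ATI, PLACEHOLDER_R420_ID, (2, 0), (2, 0))
    fake = spoof_as(real, VENDOR_NVIDIA, PLACEHOLDER_NV40_ID)
    print(takes_long_shader_path(real), takes_long_shader_path(fake))
```

    Spoofing the identity only changes which path the game selects; the card still cannot execute Shaders 3.0, which is why the second stage is needed.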

    We succeeded in the second stage (replacing Shaders 3.0 with their Shaders 2.0 counterparts) partly thanks to the FarCry engine itself. The fact is, FarCry cannot compile its shaders, written in HLSL/Cg, on its own, so it uses the external compiler from Microsoft: fxc.exe. To achieve our goal we replaced this compiler with another program, which performs all the operations required of the original compiler, introduces the necessary corrections into the shaders, and collects shader statistics.
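
    The core of such a compiler stand-in can be sketched in a few lines. This is not the program we used, only an illustration of the idea under stated assumptions: the game is assumed to pass the target profile to fxc.exe as "/T <profile>" in two separate tokens, and the names DOWNGRADE, profile_stats and rewrite_fxc_args are ours.

```python
from collections import Counter

# Assumed downgrade map: profiles requested by the game -> profiles the card can run.
# ps_2_b is the profile mentioned in this article for R420.
DOWNGRADE = {"ps_3_0": "ps_2_b", "vs_3_0": "vs_2_0"}

# Statistics: how many shaders were requested for each resulting profile.
profile_stats = Counter()


def rewrite_fxc_args(args):
    """Return a copy of the fxc argument list with Shader 3.0 targets downgraded.

    Only the separate-token form "/T <profile>" is handled.
    """
    out = list(args)
    for i in range(len(out) - 1):
        if out[i].upper() == "/T":
            requested = out[i + 1]
            out[i + 1] = DOWNGRADE.get(requested, requested)
            profile_stats[out[i + 1]] += 1
    return out


if __name__ == "__main__":
    original = ["/T", "ps_3_0", "/E", "main", "/Fo", "out.pso", "shader.hlsl"]
    print(rewrite_fxc_args(original))
    print(dict(profile_stats))
```

    A complete stand-in would then launch the genuine compiler with the rewritten arguments and would also have to patch the shader source where the 2.x profiles reject it; both parts are omitted here.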

    As we were pressed for time to process and check the results, the data below is based on the four levels used in the demos that NVIDIA provided together with FarCry patch 1.2 (Research, Regulator, Training and Volcano) and on the results of these demos.

    First, the general statistics. To render the four above-mentioned levels, FarCry prepares about 3500 pixel shaders and 3500 vertex shaders of all available versions, from 1.1 to 3.0. Most of these are Shaders 3.0 permutations for various values of the initial parameters. All vertex shaders 3.0 compile into Shaders 2.0 without problems. When we compile pixel shaders 3.0 as shaders 2.0 we get 1700 shaders 2.0 and 1900 shaders 2.x (we used the ps_2_b profile for ATI R420 chips, but all these shaders can certainly be compiled with the ps_2_a profile as well). The largest compiled shader consists of 123 instructions: 9 texture and 114 arithmetic. You can see that more than half of the shaders have requirements beyond the base version 2.0. But compiling a shader does not mean that it is used in a real game situation. For example, the NVIDIA demos use one Shader 2.x each on the Regulator and Training levels, and no shaders 2.x at all on the Research and Volcano levels. At the same time, the number of base version 2.0 shaders used in these demos is around 70.
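
    The proportions quoted above are easy to check. The figures come from the paragraph; the variable names are ours.

```python
ps20 = 1700  # pixel shaders that fit the base 2.0 profile
ps2x = 1900  # pixel shaders that need an extended 2.x profile

total = ps20 + ps2x
share_2x = ps2x / total

# Instruction count of the largest compiled shader: texture + arithmetic.
largest = 9 + 114

print(f"{total} pixel shaders, {share_2x:.1%} need 2.x, largest has {largest} instructions")
```

    The total of 3600 is consistent with the rounded "about 3500" quoted above, and the 2.x share is indeed slightly more than half.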

    So, we gradually approached the third stage of our analysis: comparing whether the resulting images are identical. This stage proved the most difficult, because after the first two stages were completed, it turned out that the screenshots rendered with "long" shaders often don't match the ones obtained with Shaders 3.0. But you cannot say that they are wrong - just a little different.

    The reason is that the FarCry developers used one of the Shader 3.0 features: 10 interpolation registers holding floating-point data, which carry vertex shader results to the pixel shader. The Shaders 1.x-2.x specifications provide only eight such registers of standard precision. The two remaining registers carry colors, and their precision may be as low as 8 bits per color channel (standard 32-bit color). More importantly, when passed from a vertex shader to a pixel shader, the values of these "color" registers are clamped to the [0..1] range. That is, any value written in a vertex shader that is greater than 1 or less than 0 will inevitably be distorted when it reaches the pixel shader. Note that this behaviour is required by the DirectX specification, so even if a video card with pixel shaders 2.x support can interpolate more than 8 registers with high precision, it must emulate the behaviour described above.
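
    The effect of routing data through a "color" register can be shown with a small sketch. It models only the clamping and the minimum 8-bit precision described above and ignores interpolation across the triangle; the function names and the sample vector are hypothetical.

```python
def through_texcoord(value):
    """A standard-precision interpolator passes floating-point data unchanged."""
    return tuple(float(c) for c in value)


def through_color(value, bits=8):
    """A color interpolator clamps each component to [0, 1]; precision may be as low as 8 bits."""
    levels = (1 << bits) - 1
    clamped = (min(max(c, 0.0), 1.0) for c in value)
    return tuple(round(c * levels) / levels for c in clamped)


if __name__ == "__main__":
    light_vector = (1.5, -0.25, 0.5)  # hypothetical unnormalized vertex shader output
    print(through_texcoord(light_vector))
    print(through_color(light_vector))
```

    The first two components come out as 1.0 and 0.0, so the direction of the vector is lost, and the third is only approximately preserved because of quantization.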

    I mentioned above that the developers used a Shader 3.0 feature (10 interpolation registers), but in reality they use only 9 of the 10 registers to render lighting from four sources in one pass (four light sources is the maximum currently supported by the FarCry engine). This looks more like an artificial limitation: had they wished to support shaders 2.0, the FarCry developers could have limited one-pass rendering to three light sources for this class of video cards. But as we had neither the opportunity nor the desire to modify the game code, we had to look for a workaround.

    There are at least three workarounds. Here is a simplified description of each.

    1. Use the 8 available high-precision registers when rendering fewer than four light sources, and with four light sources pass one of the registers through a color register. This is the simplest workaround and it has no additional cost. In the end we chose this option. The images rendered with such shaders may differ in some places of the game, but we haven't found any such places.
    2. Normalize the register values after their final calculation in the vertex shader, before they are passed to the pixel shader. This workaround works because these registers are normalized in the pixel shader anyway.
    3. None of the standard-precision interpolation registers in Far Cry uses all four components of the register (x, y, z, w) at once. Some registers use only two components (x and y), some only three. So when using "long" shaders of the second version, the data for the "missing" registers can be packed into the vacant components of the available registers.
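
    The packing idea from workaround (3) can be illustrated with a simple first-fit scheme. This is a sketch, not the procedure we actually applied: the component counts in the example are hypothetical, each logical register is kept whole, and the vacant components are assumed to be contiguous at the end of a register.

```python
def pack_registers(component_counts, slots=8, width=4):
    """First-fit packing of logical registers into `slots` physical four-component registers.

    component_counts[i] is the number of components logical register i uses.
    Returns a list of (slot, first_component) placements, or None if it does not fit.
    """
    free = [width] * slots
    placement = []
    for count in component_counts:
        for slot in range(slots):
            if free[slot] >= count:
                placement.append((slot, width - free[slot]))
                free[slot] -= count
                break
        else:
            return None  # no physical register has enough vacant components
    return placement


if __name__ == "__main__":
    # Hypothetical usage: nine logical registers, seven with xyz and two with xy only.
    print(pack_registers([3, 3, 3, 3, 3, 3, 3, 2, 2]))
```

    In this hypothetical layout the two two-component registers share the eighth physical register, so nine logical registers fit into the eight that Shaders 2.x provide.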

    Using a combo of Methods (1) and (3) we can get "long" shaders of the second version, which will be 100% identical to the original shaders used in FarCry to demonstrate the NV40 features.

    Results

    So, we have come to the part of the review that few readers skip.

    Let's see what we get when using "long" shaders on ATI Radeon X800 XT. For our tests we used the same settings as when studying the different rendering modes on GeForce 6800 Ultra.



    When using the "long" shaders, Radeon X800 XT obviously features the same performance gain as GeForce 6800 Ultra: again the Research demo demonstrates more than 20% gain, decent gain in the Volcano demo, and little gain in the Training and Regulator demos.

    Let's compare the results of both video cards in one diagram.



    Well, quite natural results: total leadership of NV40 in the demos prepared by NVIDIA is gone; in some parts Radeon shoots ahead again.

    Modifications in the official release of Patch 1.2

    On July 21, the official version of Patch 1.2 was released at last. I am glad that all the conclusions made in the first part of the article proved true, and that I will not have to complain much about pure marketing moves by the Far Cry developers. I'll return to that in the conclusion.

    So, ATI fans may rejoice: FarCry 1.2 supports the special rendering route for shaders 2.0b as well as the textures compressed with 3Dc.

    By analogy with NVIDIA, the special rendering method for ATI R420 is controlled by two variables: either "r_sm2bPath" or "r_noPs2b". To enable it, type "\r_sm2bPath 1" in the console, or pass the parameter "r_sm2bPath 1" on the command line (the quotes are mandatory). When r_sm2bPath is enabled, future R420 drivers will activate geometry instancing, just as on NV40. The FarCry developers most likely took the path of least effort and limited the number of light sources to three per pass for the method using shaders 2.x. That's what I suggested above :-). In this case the maximum number of interpolation registers used equals the maximum available: 8.
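
    The cost of the three-light limit can be estimated with a one-line formula. This sketch assumes that light sources beyond the per-pass limit are simply rendered in additional passes, which we have not verified in the game code; the function name is ours.

```python
import math


def passes_needed(num_lights, max_lights_per_pass):
    """Number of lighting passes when one pass handles at most max_lights_per_pass sources."""
    if num_lights <= 0:
        return 0
    return math.ceil(num_lights / max_lights_per_pass)


if __name__ == "__main__":
    for lights in range(1, 5):
        # 4 lights per pass on the SM 3.0 path, 3 per pass on the 2.x path.
        print(lights, passes_needed(lights, 4), passes_needed(lights, 3))
```

    Under this assumption the two paths differ only in scenes lit by four sources, where the 2.x path needs a second pass.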

    We promptly benchmarked the system with the new patch. Unfortunately, the testbed had already been changed since the previous tests. In particular, it didn't have Service Pack 2 installed (it is not required for enabling sm2bPath). So we cannot compare the results frame for frame. Nevertheless, the results are obvious:



    Conclusion

    By the time this conclusion was written, the FarCry developers had confirmed the above material themselves with the official release of Patch 1.2.

    Thus, let's summarize the main theses:

    1. The current version of Far Cry uses only the tiniest part of the Shaders 3.0 features.
    2. Pixel Shaders 2.x can easily be used instead of Shaders 3.0 to implement the new "accelerated" rendering method.
    3. The "accelerated" rendering method can be implemented even for cards supporting only the base version of Shaders 2.0. This is shown, if nothing else, by the fact that the maximum gain from "accelerated" rendering was obtained in demos that did not use shaders above the base version 2.0.

    So, developers have a lot to do to optimize rendering for a huge number of the existing video cards, which support only pixel Shaders 2.0. On the other hand, developers are heading towards HDR (high dynamic range) rendering, which will lead to further lengthening of pixel shaders. We shall probably learn what has a higher priority for Crytek with the release of the next patch.

    Alexey Barkovoy (clootie@ixbt.com)

    July 25, 2004


Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.