We continue our FarCry 1.2 tests. As is our tradition, we begin the review with a list of articles about the cards under test:
Theoretical materials and reviews of video cards, which concern functional properties of the GPU
Let's briefly introduce our contenders:
As you probably remember, both cards have 256 MB of GDDR3 memory with 1.6 ns access time; the memory runs at 550 (1100) MHz on NV40 and at 575 (1150) MHz on R420. Chip clocks: 400 and 450 MHz for NV40, 525 MHz for R420. Both GPUs contain 16 pixel and 6 vertex pipelines. NV40 also supports Shaders 3.0, while R420 supports 3Dc, a proprietary technology for compressing normal maps. That's it in brief.
Installation and drivers
VSync is disabled.
We used the following test applications:
Note that the patch includes premade demo-scripts, which can be started either via console or from the command prompt. Here is a sample BAT-file:
Disclaimer: Most of this review was written before the final patch was released. At the end of the article you will find additional information about the differences between the patch version we initially got hold of and the official version.
The first part of the article provoked a lot of responses, both enthusiastic and skeptical. The enthusiasts expected to see FarCry running in the new rendering mode (the so-called sm30path) on all video cards supporting pixel shaders v2. The skeptics insisted that this was utterly impossible. Having experimented for a week with different FarCry 1.2 modifications on video cards that support only the second version of shaders, I can say that both groups were partly correct.
I won't keep you in suspense: I confirm that our lab has a version of FarCry 1.2 that runs in the so-called "sm30path" mode (more exactly, in its pixel shader part) on video cards supporting pixel shaders 2.x and provides image quality identical to that on NV40. We shall refer to these shaders as "long".
It must be noted that geometry instancing has almost no impact on the final results in our tests. This does not mean that the function is useless, but it shows its full benefit only when the CPU is the rendering bottleneck, that is, when it cannot cope with setting a great number of extra parameters for scene rendering. This does not happen in the standard FarCry tests on NV40, partly because FarCry was initially optimized for the existing video cards. The situation may change in the future. For example, on video cards with geometry instancing support, developers will be able to render real flora geometry at long distances in open areas without impostors.
When preparing the first part of the article we found out that, theoretically, the sm30path mode can be restricted to shaders of the second version instead of Shaders 3.0. That's why we attempted to make FarCry use the rendering mode with "long" shaders, originally written for the third version, on video cards supporting only Shaders 2.x.
Here is the resulting step-by-step description of our work:
The first stage is relatively easy: using 3D-Analyze (we already mentioned this utility in the previous part) we substituted the VendorID/DeviceID combination, which identifies the manufacturer and the specific product, and we also set the Shaders 3.0 support flags. As a result, FarCry recognizes any video card as NV40 and allows rendering multiple light sources in one pass with the help of "long" shaders.
We succeeded in the second stage (replacing Shaders 3.0 with their Shaders 2.0 counterparts) partly thanks to the FarCry engine itself. The fact is that FarCry does not compile its shaders, written in HLSL/Cg, on its own; it calls the external compiler from Microsoft, fxc.exe. So, to achieve our goal, we replaced this compiler with another program that performs all the operations required of the original compiler, introduces the necessary corrections into the shaders, and collects shader statistics.
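The compiler substitution described above can be illustrated with a minimal sketch. It is not the program we used: it only shows the idea of a stand-in that intercepts the compiler call, lowers the target profile, logs the request, and forwards it to the real compiler. The file names `fxc_original.exe` and `shader_log.txt`, the profile mapping, and the assumption that the game passes the profile as a separate argument after `/T` are all hypothetical; corrections to the shader source itself are not shown.

```python
import subprocess

# Hypothetical name: the original fxc.exe renamed so the stand-in can call it.
REAL_COMPILER = "fxc_original.exe"

# Assumed mapping for illustration; ps_2_b is the profile used for R420.
PROFILE_MAP = {"ps_3_0": "ps_2_b", "vs_3_0": "vs_2_0"}


def rewrite_args(args):
    """Return a copy of the argument list with the /T profile lowered."""
    out = list(args)
    for i in range(len(out) - 1):
        if out[i].upper() == "/T":
            # Replace the profile only if it is one we want to lower.
            out[i + 1] = PROFILE_MAP.get(out[i + 1], out[i + 1])
    return out


def main(argv):
    """Log the request, then forward it to the real compiler."""
    new_args = rewrite_args(argv[1:])
    # Hypothetical log file used to gather shader statistics later.
    with open("shader_log.txt", "a") as log:
        log.write(" ".join(new_args) + "\n")
    return subprocess.call([REAL_COMPILER] + new_args)
```

A script built on this sketch would end with `sys.exit(main(sys.argv))`; the call is left out here so that the functions can be imported and checked without launching a compiler.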
As we were pressed for time to process and check the results, the data below is based on the four levels used in the demos that NVIDIA provided together with FarCry patch 1.2 (Research, Regulator, Training and Volcano) and on the results of those demos.
First, the general statistics. To render the four above-mentioned levels, FarCry prepares about 3500 pixel shaders and 3500 vertex shaders of all available versions, from 1.1 to 3.0. Most of these are Shaders 3.0 permutations for various values of the initial parameters. All vertex shaders 3.0 compile easily into Shaders 2.0. When we compile pixel shaders 3.0 as shaders 2.0 we get 1700 shaders 2.0 and 1900 shaders 2.x (we used the ps_2_b profile for ATI R420 chips, but all these shaders can certainly be compiled with the ps_2_a profile as well). The largest of the compiled shaders consists of 123 instructions: 9 texture and 114 arithmetic. You can see that more than half of the shaders have requirements beyond the base version 2.0.
But compiling shaders does not mean using them in a real game situation. For example, the NVIDIA demos use one Shader 2.x each on the Regulator and Training levels, and no shaders 2.x on the Research and Volcano levels. Meanwhile, the number of base version 2.0 shaders used in these demos is around 70.
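To make these figures concrete, here is a small sketch that classifies a shader by its instruction counts and computes the share of shaders 2.x. It is a simplification under stated assumptions: it looks only at the Direct3D 9 instruction limits (64 arithmetic plus 32 texture instructions for ps_2_0, 512 instruction slots for ps_2_a and ps_2_b), while a real shader may also need an extended profile for other reasons, such as the number of temporary registers. The function name is hypothetical.

```python
# Direct3D 9 instruction limits for pixel shader profiles.
PS_2_0_MAX_TEXTURE = 32
PS_2_0_MAX_ARITHMETIC = 64
PS_2_X_MAX_TOTAL = 512  # ps_2_a and ps_2_b instruction slots


def required_profile(texture_instr, arithmetic_instr):
    """Pick the lowest profile class that fits, judging by instruction counts only."""
    if (texture_instr <= PS_2_0_MAX_TEXTURE
            and arithmetic_instr <= PS_2_0_MAX_ARITHMETIC):
        return "ps_2_0"
    if texture_instr + arithmetic_instr <= PS_2_X_MAX_TOTAL:
        return "ps_2_x"
    return "beyond ps_2_x"


# The largest compiled shader: 9 texture and 114 arithmetic instructions.
largest = required_profile(9, 114)

# Share of shaders 2.x among the recompiled pixel shaders (1700 + 1900).
share_2x = 1900 / (1700 + 1900)
```

With 114 arithmetic instructions the largest shader exceeds the ps_2_0 limit of 64 but fits comfortably within 512 slots, and 1900 out of 3600 shaders is roughly 53%.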
So we came to the third stage of our analysis: checking whether the resulting images are identical. This stage proved the most difficult, because after the first two stages were completed, it turned out that the screenshots rendered with "long" shaders often don't match the ones obtained with Shaders 3.0. You cannot call them wrong, though; they are just a little different.
The reason was that the FarCry developers used one of the features of third-version shaders: 10 interpolation registers storing floating-point data, which pass vertex shader results to a pixel shader. The Shaders 1.x-2.x specifications provide only eight such registers of standard precision. The two remaining registers are used to pass colors, and their precision is at least 8 bits per color channel (standard 32-bit color). Another, more important point: when passed from a vertex shader to a pixel shader, the values of these "color" registers are clamped to the [0..1] range. That is, any value written in a vertex shader that is greater than 1 or less than 0 will inevitably be distorted when passed to a pixel shader. It's important to note that this requirement is dictated by the DirectX specification, so even if a video card with pixel shaders 2.x support can interpolate more than 8 registers with high precision, it must emulate the above-mentioned behaviour.
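The effect of the "color" registers can be modelled with a few lines of code. This is a simplified sketch, not a description of any particular GPU: it assumes exactly 8 bits per channel (the specification only sets a minimum) and ignores interpolation across the triangle. The function names are hypothetical.

```python
def through_color_register(value, bits=8):
    """Model a Shaders 2.x color register: clamp to [0, 1], then quantize."""
    clamped = min(1.0, max(0.0, value))
    levels = (1 << bits) - 1  # 255 steps for an 8-bit channel
    return round(clamped * levels) / levels


def through_float_register(value):
    """Model one of the eight floating-point registers: the value passes unchanged."""
    return float(value)


# A value outside [0, 1] survives in a floating-point register
# but is distorted in a color register.
float_result = through_float_register(1.5)
color_result = through_color_register(1.5)
negative_result = through_color_register(-0.25)
```

Here 1.5 comes out as 1.0 and -0.25 as 0.0 after the color register, while values inside the range lose only precision.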
I have already mentioned that the developers used a shader feature (10 interpolation registers), but in reality they use only 9 registers out of 10 to render lighting from four sources in one pass (four light sources is the maximum currently supported by the FarCry engine). This looks more like an artificial limitation: if the FarCry developers had wished to support shaders 2.0, they could have limited one-pass rendering to three light sources for this class of video cards. But as we had neither the opportunity nor the desire to modify the game code, we had to look for a workaround.
There are at least three workarounds; a simplified description of each follows.
Using a combo of Methods (1) and (3) we can get "long" shaders of the second version, which will be 100% identical to the original shaders used in FarCry to demonstrate the NV40 features.
So, we have come to the part of the review that most readers would hardly skip.
Let's see what we get when using "long" shaders on the ATI Radeon X800 XT. For our tests we used the same settings as for researching different rendering modes on the GeForce 6800 Ultra.
With the "long" shaders, the Radeon X800 XT clearly shows the same performance gain as the GeForce 6800 Ultra: again the Research demo demonstrates a gain of more than 20%, the Volcano demo a decent gain, and the Training and Regulator demos little gain.
Let's compare the results of both video cards in one diagram.
Well, quite natural results: total leadership of NV40 in the demos prepared by NVIDIA is gone; in some parts Radeon shoots ahead again.
Modifications in the official release of Patch 1.2
On July 21st the official version of Patch 1.2 was released at last. I'm pleased that all the conclusions made in the first part of the article proved to be true, and that I will not have to complain much about pure marketing moves by the Far Cry developers. I'll return to that later in the conclusion.
So, ATI fans may rejoice: FarCry 1.2 supports a special rendering path for shaders 2.0b as well as textures compressed with 3Dc.
The special rendering method for ATI R420 is activated, by analogy with NVIDIA, with one of two keys: "r_sm2bPath" or "r_noPs2b". To do this, type "\r_sm2bPath 1" in the console, or pass the parameter ""r_sm2bPath 1"" on the command line (the quotation marks are obligatory). When r_sm2bPath is enabled, future R420 drivers will also activate geometry instancing, as on NV40. The FarCry developers most likely followed the path of least effort and limited the number of light sources to three per pass for the method using shaders 2.x. That's what I suggested above :-). In this case the maximum number of interpolation registers used equals the maximum available: 8.
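The cost of the three-light limit can be estimated with a trivial calculation. The sketch below assumes that each pass handles up to the per-pass maximum and that the remaining lights go to additional passes; how the engine actually groups lights is not known to us, and the function name is hypothetical.

```python
import math


def passes_needed(num_lights, max_lights_per_pass):
    """Number of lighting passes if every pass takes as many lights as allowed."""
    if num_lights <= 0:
        return 0
    return math.ceil(num_lights / max_lights_per_pass)


# Four lights on one surface: one pass with four lights per pass (sm30path),
# two passes with three lights per pass (sm2bPath).
passes_sm30 = passes_needed(4, 4)
passes_sm2b = passes_needed(4, 3)
```

Under this assumption the two paths differ only for surfaces lit by exactly four sources; with three or fewer lights both need a single pass.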
We promptly benchmarked the system with the new patch. Unfortunately, the testbed had already been changed compared to the previous tests. In particular, it didn't have Service Pack 2 installed (it is not necessary for enabling sm2bPath).
So we cannot compare results with a frame-to-frame precision. Nevertheless the results are obvious:
By the time this conclusion was written, the FarCry developers had themselves confirmed the above findings with the official release of Patch 1.2.
Thus, let's summarize the main points:
So, the developers have a lot to do to optimize rendering for the huge number of existing video cards that support only pixel Shaders 2.0. On the other hand, they are heading towards HDR (high dynamic range) rendering, which will make pixel shaders even longer. We shall probably learn which has the higher priority for Crytek when the next patch is released.
Alexey Barkovoy (firstname.lastname@example.org)
July 25, 2004