By Andrey Vorobiev and SomeBody Else
Much has been said and written since NVIDIA announced the GeForce256 chip about technologies that substantially improve rendering quality: Hardware Transform and Lighting (HW T&L), Cube Environment Mapping (CEM), and DOT3 bump mapping (DOT3).
And even though the ATI RADEON card supports all these functions as well, they have hardly been used in games yet, though the first examples are starting to appear. In some demo programs the rendering speed of GeForce chips is poor, so I decided to find out how almost the entire range of GeForce cards behaves in the games of the near future, i.e. games with HW T&L, CEM and DOT3 support, but excluding DirectX 8 features such as vertex and pixel shaders. These can be games aimed at DirectX 7 or at OpenGL with extensions from videocard manufacturers, in particular from NVIDIA. Let me say right away that this article contains no data on chipsets such as the ATI RADEON and S3 Savage2000, which also have HW T&L. The reason is that cards on these GPUs do not work with a number of the test tools used here (for example X-Isle and Gothic Chapel).
I am mainly interested in HW T&L speed as one of the most important characteristics for future games with an average polygon count and with the new CEM, DOT3 and other effects. Once this metric is known, it is possible to roughly estimate the geometrical scene complexity of future games.
Let me say at once that, as of this writing, I consider 30,000-55,000 polygons per scene to be an average polygon count; everything below that threshold is low (almost all current games), and 60,000-90,000 polygons per frame is high. This is a rather subjective approach, but it makes geometrical scene complexity easier to discuss. Games with very large triangle counts (100,000 and above) should appear later, but not before the end of the year.
I shall not pay much attention to pixel and vertex shaders, since the main API for shaders is DX8 and there are few games or demos exercising these features (and those that exist usually fall back to the reference rasterizer, i.e. render the frame in software). The major minus of the GF chips is that they do not support pixel shaders in the complete version 1.0, which is considered the minimum for DX8. So with the release of games using DX8 shaders, the GF line (as well as all other current chips, including the competing ATi Radeon) will probably be pushed out of the 3D industry. But there is still a year, or perhaps 8-9 months, until that can happen. By then the X-Box will be released, along with new videocards from NVIDIA and its competitors based on next-generation chips, and only then should DX8 get the wide distribution it deserves.
If it becomes possible to test the performance of DX8 applications, I shall try to update the article. In due course, testing videocards on the next NVIDIA chip (NV20) under the same conditions is also possible, but only after these chips appear on real videocards.
The theory will be brief, because actual speed almost always falls short of theoretical speed, and each case should be examined separately. The reason is not only some overstating of chip performance by manufacturers (though that probably happens), but the impossibility of creating the "perfect" conditions needed to reach maximum performance in real applications. In some synthetic tests it is possible to approach the peak speed of the chip, but that performance has nothing in common with actual game situations. I have therefore set myself the task of finding out the real performance, using the latest developments in game engines, and of estimating the likely geometrical complexity of this year's games. It makes no sense to measure T&L speed in modern games: at best the latest titles reach about 20 thousand polygons, and the average is 4-8 thousand triangles per frame. It was therefore necessary to use benchmarks and demo programs.
Knowing the theoretical speed of the transform and lighting block in all NVIDIA chips (see the table), we can assume that the average performance of the GeForce chip is roughly half that of the newer products (GF2 GTS, GF2 Pro, GF2 Ultra), depending on the chip. So the actual speed of the GeForce256 should be approximately half that of the best representatives of the newer generation.
Since support for HW T&L, CEM, DOT3, and vertex and pixel shaders (partially, via OpenGL extensions) appeared for the first time on the GeForce, we have every right to expect these features to be on the slow side. But for the games of the near future that may be quite enough, as highly detailed geometry is still only being planned. Moreover, many users still own videocards that do not support the new features, which slows any innovation in games. This year should push us slightly toward better geometrical detail and wider use of effects such as DOT3 and CEM. The first examples already appeared last year: Sacrifice, Giants and others. Almost all OpenGL games use HW T&L, and the majority of recently released DX7 games also know how to use it. But, I repeat, game worlds remain low in detail, with a few very rare exceptions, so HW T&L brings little benefit for now.
I am sceptical about the need for highly detailed graphics in networked 3D shooters such as Quake III and Unreal Tournament, and I do not consider >=60 FPS vital for all other games, though in network games high speed really is necessary for the serious player. For the majority of 3D arcades, non-network FPS and TPS titles, RPGs, strategies, and car and flight simulators, about 40 FPS on average with a minimum above 25 is quite enough. This speed will suit most people, apart from a few performance-mad :) or wealthy gamers. It is from this position that I shall examine how NVIDIA's last- and current-generation videocards handle the future games of 2001.
What can videocards on GeForce chips lack in the near future?
The major factor limiting the speed of the GeForce (and other chips too) in future games, assuming sufficient CPU speed, will be the ratio between HW T&L and actual fillrate capabilities. I shall therefore devote the tests to determining the actual HW T&L block speed of GeForce cards, making allowance for adverse conditions.
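The interplay between these two limits can be sketched as a simple back-of-the-envelope model: each frame costs some geometry work and some fill work, and the slower of the two sets the FPS ceiling. All figures in the sketch below are assumptions chosen for illustration, not measured specifications of any card.

```python
# Illustrative sketch: which resource caps the framerate of a HW T&L card.
# The rates and the overdraw factor are assumed example values only.

def fps_ceiling(tl_rate, fillrate, polys_per_frame, pixels_per_frame, overdraw=2.0):
    """Return (binding FPS limit, geometry-imposed FPS, fill-imposed FPS)."""
    fps_geometry = tl_rate / polys_per_frame               # HW T&L block limit
    fps_fill = fillrate / (pixels_per_frame * overdraw)    # effective fillrate limit
    return min(fps_geometry, fps_fill), fps_geometry, fps_fill

# Assumed: 2 million transformed polygons/s, 300 Mpixel/s effective fill,
# a 55,000-polygon scene rendered at 1024x768.
fps, by_geom, by_fill = fps_ceiling(2_000_000, 300_000_000, 55_000, 1024 * 768)
print(round(by_geom, 1), round(by_fill, 1), round(fps, 1))
```

With these assumed numbers the scene is geometry-bound (about 36 FPS from T&L against about 190 FPS from fill); raising the resolution or the color depth shifts the balance toward the fill limit, which is exactly the effect the 32-bit tests below demonstrate.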
I selected the 800X600, 1024X768 and 1280X1024 resolutions for testing, as they are frequently used in games. Color depth was both 16 and 32 bits where possible.
The following demos and benchmarks were selected for testing:
The following configurations were used to calculate speed metrics:
Main system based on Pentium III
The following videocards were tested with this configuration:
The additional system based on Intel Celeron
Now, let's get down to business at last.
I have moved X-Isle, as the main benchmark, to the end of the article; I hope no one minds :). Let's begin with an almost synthetic benchmark. "Almost", because it is still closer to reality than purely synthetic tests such as those in 3DMark2000.
I have selected three different settings for testing:
Testing was conducted only in 16-bit color due to TreeMark's complete (and strange) refusal to run in TrueColor on the main test system.
These conditions are quite acceptable for all chips of the series: an average number of polygons and a rather small number of light sources. In all cases the framerate exceeds 35-40 FPS, and only the GeForce256 videocard with SDR memory cannot handle the 1280X1024 resolution. The GeForce2 MX even beats the GF256 DDR. But keep the artificiality of the test in mind and don't trust it completely ;).
This one is more complex: more than 60,000 polygons per frame, and only cards of mid-level and above passed the test without noticeable effort. The "people's" card on the GF2 MX chip noticeably overtakes the GeForce256 with DDR memory in this test, which points to the greater performance of its T&L block. Still, it could only pass the 35 FPS barrier in modes below 1280X1024. The cards of the previous generation can no longer handle such geometrical complexity: they barely approach 35 FPS even in the lowest resolution tested. The "full" versions of GeForce2 show no problems.
Here we see that the GeForce's T&L engine lacks the power for such complex conditions. The number of light sources is twice as large, which seriously affects performance, and there are more than 100,000 polygons. In this mode the speed of all cards is clearly insufficient. The limit of the actual maximum performance of GeForce2 lies somewhere here: less than 100,000 triangles per frame with 6-8 hardware light sources.
We take 1024X768 as the main resolution, as it is the most acceptable: both widespread and fast enough in many cases. The actual HW T&L speed according to this benchmark:
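The derivation behind these figures is straightforward: the sustained T&L rate is the average framerate multiplied by the polygon count of the scene. A minimal sketch (the 40 FPS and 55,000-polygon inputs are placeholders, not the measured results from the table):

```python
# Polygons transformed and lit per second = average FPS x polygons per frame.
def tl_rate(avg_fps, polys_per_frame):
    return avg_fps * polys_per_frame

# Placeholder example: 40 FPS on a 55,000-polygon scene.
print(tl_rate(40, 55_000))  # -> 2200000 polygons per second
```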
You can see that adding 4 light sources and globally increasing scene complexity in the third configuration brings an appreciable drop in speed. The rather low number of T&L-ed polygons per second in the first case can be explained as follows: when the polygon count is below average, fillrate and memory bandwidth begin to play the major role and scene complexity matters less.
Doubling the number of light sources brings a 1.5X decrease in speed. In the third configuration fillrate hardly limits the speed at all, and as a result the GF2 MX considerably outperforms both representatives of the first generation of chips, whose speeds differ from each other only insignificantly. Likewise, the speeds of the "elder" cards are proportional to the clock rates of their GPUs.
Alien Technology demo - Gothic Chapel
This demo program provides no adjustments, so it was used "as is". The engine from FUN labs gives no option to use 16-bit color, so we are limited to 32-bit.
This demo has no built-in benchmark (for average FPS), so it was necessary to use the Intel Graphics Performance Toolkit utility. With its help the instantaneous FPS was measured during one demo cycle, and the average framerate was then calculated. Keep in mind that FPS measurement with Intel GPT lowers the results by some 5-10%. The measurements were conducted in the single most popular resolution: 1024X768X32.
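The averaging described here can be sketched in a few lines. The sample values below are hypothetical stand-ins for the GPT log, and the correction uses the midpoint of the 5-10% undercount range as an assumed value:

```python
# Average FPS from instantaneous samples, as done with the Intel GPT log.
# The sample readings and the 7.5% correction midpoint are illustrative.

def average_fps(samples, undercount=0.075):
    """Mean of instantaneous FPS samples, corrected upward for the tool's
    roughly 5-10% measurement overhead (midpoint assumed here)."""
    raw = sum(samples) / len(samples)
    return raw / (1 - undercount)

samples = [28, 31, 25, 19, 33, 30]   # hypothetical per-second readings
print(round(average_fps(samples), 1))
```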
The darkest column in the diagram displays the minimum framerate, the lightest the maximum, and the one in the middle the average FPS.
It seems that the 3D engine from FUN labs is insufficiently optimized, as the results it shows are surprisingly low. More or less sufficient speed is delivered only by the GeForce2 Ultra, the most powerful card of today! And even then the minimum FPS is very low. There is no point discussing the other candidates. Most likely this demo was meant only to show off the practical capabilities of GeForce chips, and FUN labs never intended to optimize the engine.
Based on the FPS data, I have also plotted a graph of the framerate averaged over each ten seconds of the demo run.
You can see the slight drop in speed at the moment of approaching the figure of the angel, rendered with cubic environment mapping. Almost all the videocards show identical FPS trends; only the GeForce2 MX falls out of this orderly pattern. Probably this is due to the novelty of the chip and its still "raw" drivers. Despite all this, the card on the now most widespread and rather inexpensive chip has beaten its competitor, the expensive GeForce256 DDR with its much faster memory.
Do not forget that the data is underestimated by 5-10%.
The influence of actual fillrate can be seen in the maximum FPS: the GeForce256 with DDR memory and the GeForce2 GTS differ by approximately the same factor as their memory bandwidths. Meanwhile the GeForce2 MX, which showed the best average result compared with the GeForce256 DDR, has a lower maximum achievable framerate.
The rather low overall performance could be explained by the use of CEM, but the refraction and reflection effects were applied only to the figure of the angel, whose polygons make up a small part of the total. Hence my claim about the poor optimization of this demo's 3D engine.
Besides, the number of polygons in the demo is almost independent of time, which, together with the very different average and maximum FPS, indicates that the speed is limited not by the HW T&L block performance of the tested videocards, but by something else, such as fillrate constrained by memory bandwidth.
I selected the most demanding settings for the videocards for testing; only with them is the HW T&L block actually loaded. High Detail mode was used for both game tests.
Test 1: 53,000 polygons, 5 light sources, 2.8 MBytes of textures
Test 2: 37,000 polygons, 8 light sources, 3.4 MBytes of textures
The fact that the card on the GeForce2 MX lost to every other except the GeForce256 SDR, together with the considerable improvement in speed after lowering the test resolution, indicates that in this test performance is seriously limited by memory bandwidth.
Acceptable speed for the "low" cards is reached at the low resolution of 800X600, and for the videocards with DDR memory at 1024X768. The GeForce2 Ultra had no problems.
Compared with 16 bits, performance has fallen by 1.5 times on average, which again confirms my point about the small influence of HW T&L block performance and the considerable influence of fillrate. The level of playable resolutions has fallen below 800X600 for the lowest models, and only DDR memory helped the GeForce256 DDR avoid that fate. The GeForce2 GTS and GeForce2 Ultra show more cheerful results: 800X600 and 1024X768 respectively.
And now to the second game test.
It is interesting that all the videocards show close results, except at the 1280X1024 resolution. Adding that testing on the "alternate" configuration with the Celeron 450 processor gave little more than 20 frames per second, almost independent of resolution, we can say that speed is limited by the CPU. No special influence of the HW T&L block was noticed. The speed is sufficient everywhere and always. The GeForce2 MX slightly outperforms its GeForce256 DDR competitor thanks to the greater speed of its hardware geometry calculations.
Increasing the color depth to 32 bits reduces the GeForce2 Ultra results only very slightly, while for the lower cards the speed drop reaches 1.5X. The insufficient local video memory bandwidth also takes its toll. Here the GeForce256 DDR pulls ahead of its main contender, the card based on the GeForce2 MX.
1024X768X16 and 1024X768X32 were taken as the test resolutions, being average and the most acceptable for this test. The actual speed of the HW T&L engine according to 3DMark2000:
Test 1 - 1024X768X16
Test 1 - 1024X768X32
Test 2 - 1024X768X16
Test 2 - 1024X768X32
I think the overall performance drop in the second test compared with the first is connected with the increase in light sources. It is also interesting that the smaller number of polygons per scene in the second test does not allow the full capabilities of hardware T&L to show.
From the 3DMark2000 results we can say that even at the greatest geometrical complexity possible in these tests, HW T&L performance does not play such an important role. Therefore all GeForce chips, and the cards based on them, will cope quite well with average geometrical detail. The T&L block will not limit speed even with an average of 8 hardware light sources per scene. The point is that in the game tests of this benchmark the light sources do not illuminate all the scene's triangles at once but have a restricted range, unlike in the majority of synthetic tests.
And the most interesting test at last.
There were no special adjustments for the version 1.02 demo (that is, before my patch appeared), other than the choice of screen resolution and a few other trivialities.
First, the average FPS for the whole test cycle as calculated by the program:
Once more I must mention the main limitation: fillrate. But the most interesting thing is that memory bandwidth does not affect performance as seriously as in the previous tests! How else would you explain the close results of the GeForce256 SDR and GeForce256 DDR? And the results of the "younger" GeForce2 differ from them for the better. Something else is at work here. We shall look at it more closely later; for now, the results in 16-bit color:
Only the GeForce2 Ultra could show normal speed in all tested resolutions. Its younger brother, the GeForce2 GTS, failed to sustain only 1280X1024. The videocards on the GeForce256 chip showed awfully low speed, completely unfit for normal play. The GeForce2 MX barely endured 800X600X16. Now that is what I call a game engine of the new generation!
Nothing much to add here: the cards of the lowest level "sank" deeper, and the "people's" GeForce2 MX joined them with results below 30 FPS on average. The "elder" representatives suffered losses as well: their minimum playable resolutions were pushed a step back down, to 800X600 and 1024X768 for the GF2 GTS and GF2 Ultra respectively.
At first sight these are very low results, hardly acceptable for normal play. But let us not draw conclusions yet; first we shall calculate the preliminary T&L performance figures:
The calculations (again for the average resolution of 1024X768):
That seems too little for the GeForce256, considering the previous tests where it was always close to the GeForce2 MX. Let's try to clear things up.
During testing I was surprised by one interesting fact: on systems with GeForce256 chips the speed was usually quite decent, but as soon as T-Rexes with reflection and refraction effects appeared, the speed dropped below 10 FPS at once. Having noticed that the moments using cubic texturing occupy almost half of the benchmark's running time, I decided to plot the instantaneous FPS for all cards to reveal the pattern of the GeForce256 drops.
Voila! At first the speed of the GeForce2 MX was even lower than that of the GeForce256 SDR. And only then, after the first T-Rex with the CEM effect, did the cards based on the GeForce256 chip struggle at those 10 FPS. In the intervals between CEM-textured T-Rexes their speed returned to normal.
I cannot say precisely what causes these results. It is hardly a driver defect: the GF256 has existed for more than a year, and such an oversight would have been corrected. Maybe something in CryENGINE is responsible, but the accompanying text file itself warns of speed drops on GeForce256 cards, so most likely the chips of the GeForce2 line were modified by NVIDIA to increase CEM speed.
To measure HW T&L performance it was necessary to calculate the average scene complexity. The benchmark itself gives no triangle counts, so they had to be obtained some other way. Under Intel GPT the demo ran very slowly with many graphic artefacts, so I decided to measure the average geometrical complexity "manually":
Approximately every 10 seconds I wrote down the number of polygons in the frame and later calculated the average. These measurements are admittedly a little rough, but there is nothing better, so they will do. I got about 58,000 polygons per scene on average.
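The averaging itself is trivial; the polygon counts below are made-up stand-ins for the hand-logged readings, chosen so that the mean matches the 58,000 figure:

```python
# Mean scene complexity from readings taken roughly every 10 seconds.
# These counts are illustrative substitutes for the hand-logged values.
samples = [41_000, 56_000, 62_000, 35_000, 88_000, 66_000]
avg_polys = sum(samples) // len(samples)
print(avg_polys)  # -> 58000
```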
And here is the approximate graph of geometrical scene complexity:
The number of polygons during the test varies within the limits of 35-90 thousand per frame, with an average of about 58,000.
There were few sharp swings; the geometrical complexity is well distributed throughout the benchmark. Probably this is due to a good implementation of geometrical Level Of Detail (LOD). The high peak in the middle of the graph correlates well with the speed in the previous graph: when the polygon count drops to 35 thousand, the speed increases considerably (the high peak for all videocards).
Having the possibility to compare HW T&L performance with and without CEM separately, in order to obtain the actual T&L speed and its drop when CEM is enabled, I compiled the following table (using the second, Celeron-based system at 800X600X16):
Now everything is clear. You can see that the speed of the GeForce256 SDR in normal modes was quite acceptable, at 30-45 FPS, while in the moments with CEM reflection and refraction on the T-Rexes it dropped to 9 frames per second, which is absolutely unacceptable for any player. From the calculation it is evident that the average FPS, disregarding the CEM effects, was 37 FPS, which is even above the lower boundary I selected. Thus the speed of the HW T&L block is barely more than 2,000,000 polygons per second, and in modes with CEM it drops to 500,000 polygons per second, i.e. by a factor of 4.
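The 4X CEM penalty follows directly from the framerates: at the roughly 58,000 polygons per frame measured above, 37 FPS and 9 FPS translate into T&L rates matching the quoted figures of about 2 million and 500 thousand polygons per second.

```python
# T&L throughput with and without the CEM T-Rexes, at the measured
# average scene complexity of ~58,000 polygons per frame.
polys_per_frame = 58_000
rate_plain = 37 * polys_per_frame   # without CEM effects
rate_cem = 9 * polys_per_frame      # during CEM reflection/refraction
print(rate_plain, rate_cem, round(rate_plain / rate_cem, 1))  # -> 2146000 522000 4.1
```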
To see the speed drops and the average FPS with and without allowance for CEM, look at this graph:
The legend: the instantaneous framerate is the red line, the average framerate with allowance for CEM is black, and without this effect dark blue.
Hidden X-Isle settings
During testing with the X-Isle benchmark I found some adjustments hidden from users by the developers for some unknown reason. Probably they put them away until better times, i.e. for the following version. After some changes in the executable file, I "opened" these additional settings and decided to test the performance gains they offer.
You can get the patch that opens these developer-hidden adjustments in Crytek X-Isle 1.02 directly from here.
But do not forget that these settings are unofficial, that some of them do not work at all and even prevent the test from starting (they are marked with an exclamation sign), and that some simply show no visible effect (question mark). Consider yourself warned.
The settings that do work allow changing or disabling certain graphic features. The additional settings were not varied during the regular tests.
This test was conducted in 800X600X16 resolution on the second system based on Celeron CPU.
All the visibly inoperable settings produce identical results, but the working ones are rather interesting:
And the last item is the set of adjustments that supports high, almost maximum speed: it changes the settings that positively influence speed, to the detriment of quality. With this optimization the average speed on my system increased by 3.7 times, reaching almost 84 FPS. It is possible to raise the FPS further, but that I leave to you for your own experiments :). I can say, however, that settings providing maximum performance for network players have been found for this engine.
On the basis of the T&L speed data for videocards covering almost all chips of the GeForce line, I can conclude that mid- and high-level cards perform well, and that the cards of the lowest price category are acceptable for the games of 2001.
The summary table of average results
* - the average number of polygons per frame was calculated assuming an average of 40 FPS.
** - performance relative to the GeForce256 SDR
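The starred column can be reproduced from the measured T&L rates: the polygon budget a card can sustain at the 40 FPS target is its throughput divided by 40. The 2-million rate below is a placeholder taken from the GeForce256 estimate above, not a value from the table itself.

```python
# How the starred column is derived: polygons per frame sustainable at
# the 40 FPS target equals the measured T&L rate divided by 40.
def sustainable_polys(tl_rate, target_fps=40):
    return tl_rate // target_fps

# Placeholder rate of 2 million polygons/s, as estimated for the GeForce256.
print(sustainable_polys(2_000_000))  # -> 50000 polygons per frame
```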
With no more than 40,000-50,000 polygons per frame on average and 4-5 hardware light sources, the speed of videocards based on the GeForce256 and GeForce2 MX GPUs will be sufficient in the future games of this intermediate period for most genres. With new effects such as reflection and refraction via Cube Environment Mapping, or active use of pixel shaders in OpenGL, the speed of the first GeForce and of the "people's" GeForce2 MX can fall seriously and may only be acceptable at 25,000-30,000 polygons per frame or even lower, depending on the number of effects applied. Some technologies can be applied without fear: DOT3 bump mapping, for example, has a smaller influence on performance and allows quite high scene complexity.
The T&L performance of the GeForce2 chips is 1.5-2 times greater both theoretically and practically, so they are capable of processing roughly 1.5-2 times more polygons per frame. That gives about 75,000-90,000 polygons with light use of HW T&L capabilities and some new effects, and 40,000-50,000 polygons with their wide application. But at polygon counts approaching 100 thousand, other system limitations can show up: CPU speed, which may simply be unable to prepare data for HW T&L, and the limited AGP bandwidth through which this geometrical data is sometimes transmitted.
Poor video memory bandwidth seriously limits the capabilities of all GeForce chips. This limitation was clearly noticeable in almost all cases, but especially in 32-bit modes, where performance is limited to a greater degree by effective fillrate than by the speed of the GPU's HW T&L block.
Thus, the actual average speed of the HW T&L engine on the GeForce256 in modern 3D game engines is no more than 1.5-2 million really textured, lit and clipped polygons per second. With many light sources and different effects the speed drops almost twofold (8 light sources compared with 4-5), and in the case of CEM, fourfold.
The actual T&L speed of the GeForce2 reaches 2.5-4 million polygons per second. Major drops like the CEM one on the GeForce256 were not observed. But the limitations of fillrate and CPU speed, and the small triangle counts even in the latest applications, sometimes do not allow the HW T&L of the GeForce2 to show its full power.
The GeForce2 MX stands noticeably apart from the GeForce2 line: the cards based on it are the cheapest of all those tested, and the rather low bandwidth of their local video memory is an additional brake, so their performance is much closer to the GeForce256 DDR and GeForce256 SDR than to the cards on GeForce2 chips. On average, though, a small advantage over the older chips is still present.
I would assume that a large part of the games of the intermediate period, from the beginning of 2001 until the wide adoption of DX8 shaders, will treat GeForce256 and GeForce2 MX performance as the required minimum.
The GeForce2 chips, starting with the GTS, will show themselves better in these conditions and will spare the player from feeling temporary speed drops in gameplay.
Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.