iXBT Labs - Computer Hardware in Detail

Problems of Testing 3D Performance with FRAPS, Part 2

More analysis of 3D benchmarking problems.

May 27, 2008



Conclusions

Let's sum up another round of FRAPS research. Running just four tests on each computer without processing the results is too crude: the measurement error may reach 10-15%, which is unacceptable. Even though the frame rate in many games is relatively stable and the graphs show similar peaks, the difference in average FPS values is sometimes too large. It makes no sense to compare graphics card performance with such tests, as the performance difference between some cards may itself be less than 10%. Even if you run 4-10 tests, follow the test procedure carefully, and process the results by discarding anomalous readings, you cannot be certain of the resulting figures.

To say nothing of the huge difference between the highest and the lowest FPS results. Games generate slightly different situations each time; you cannot get identical 3D scenes. For example, the frame rate in many racing games is affected by many factors: changing weather, the time of day, traffic density (if there is any), the varying behavior of computer opponents, and the player's skill. Each run presents the tester with a different situation, so the frame rate varies accordingly. The only solution here is to run multiple tests and discard anomalous results.

As for modern first- and third-person shooters, the situation is even worse. It is even harder to reproduce the original gaming situation there: each attempt to play through a level brings something new. Enemies in modern games have some AI and use complex behavior scripts, so they act differently each time. If testers use the above-mentioned method of loading a saved game and playing through part of a level to measure performance, it's safe to say the run cannot be repeated identically several times if the test is at all long.

Another option is to turn around on the spot without going anywhere, or simply to read the instantaneous FPS value right after loading a saved game. Such a test makes some sense, though it is much worse than proper tests in games with convenient built-in benchmarks. Still, it's better than nothing.

So here are the main conclusions, which agree with the bottom line of the previous article. It is possible to benchmark graphics card performance in 3D games with FRAPS, but with the following reservations:

  • This method works with games that don't have built-in benchmarking tools but do allow recording and playing back demos. In this case the measurement error may be close to that of games with built-in benchmarks.
  • FRAPS can also be used with games that can play back identical animations on the game engine. But these tests won't reflect gaming performance; they merely show the performance of rendering scripted scenes. That said, the method works well for interactive games that consist mostly of such scenes.
  • If a game has no such features, it must be analyzed thoroughly to determine whether it can be used for benchmarking at all. Play through a level a couple of dozen times to find the maximum measurement error; if it falls within 3%, the game can be used for benchmarking with FRAPS.
  • But even in all the above cases, FRAPS results should be treated with caution. To obtain reliable results, a test should be repeated at least 4-5 times, and the results processed afterwards: analyze each pass thoroughly, find and discard anomalous values, and average the rest. Only then can you count on reliable figures.
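The repeat-discard-average procedure described above can be sketched in code. This is a minimal illustration, not a FRAPS feature: the 10% outlier threshold and the sample FPS values are assumptions chosen for the example.

```python
def summarize_runs(fps_runs, outlier_threshold=0.10):
    """Average FPS over repeated runs, discarding anomalous readings.

    Returns (mean of kept runs, discarded runs, relative spread of kept runs).
    The 10% deviation-from-median threshold is an illustrative assumption.
    """
    if len(fps_runs) < 4:
        raise ValueError("repeat the test at least 4-5 times")
    median = sorted(fps_runs)[len(fps_runs) // 2]
    # Discard runs deviating from the median by more than the threshold.
    kept = [f for f in fps_runs if abs(f - median) / median <= outlier_threshold]
    discarded = [f for f in fps_runs if abs(f - median) / median > outlier_threshold]
    mean = sum(kept) / len(kept)
    # Relative spread of the kept runs; above ~3% the game is a poor
    # candidate for FRAPS benchmarking, per the criterion above.
    spread = (max(kept) - min(kept)) / mean
    return mean, discarded, spread

# Five hypothetical passes through the same level; the fourth is anomalous.
mean, dropped, spread = summarize_runs([61.2, 60.8, 59.9, 74.5, 60.4])
# The 74.5 FPS run is discarded; the rest average to about 60.6 FPS.
```

The same helper doubles as the suitability check from the list above: run it over a couple of dozen passes and reject the game if the spread exceeds 3%.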

All of the above makes FRAPS testing very time consuming. Add the lack of automation (apart from limiting a test to a certain length), which drags the tests out even further, plus a significant error even with multiple repetitions, and the credibility of this method of measuring 3D performance is too low for i3DSpeed.

FRAPS tests in games that do not allow demo playback may contain too many tester mistakes, and the spread of results may be too big: 9-10% is an unacceptable value. Of course, we can think of ways to soften the human factor and the differences between game scenes. For example, in a racing game we can choose a circular (or even straight) track without traffic or opponents; in an FPS we can find a spot without enemies and walk in a straight line for a short time. But will such results reflect real gaming? Such synthetic tests are no better than 3DMark.

Once again we call on game developers to heed the word of people interested in 3D graphics and high-tech games. If modern games had convenient tools to benchmark performance, and those tools could be automated, testers would have far fewer problems, and this article wouldn't have been necessary.

As for our own tests, we shall stick to applications with built-in benchmarks. The methods described in this article can be used with caution in individual cases, repeating the tests many times and thoroughly verifying all results for validity. If you use such tests, you must publish a detailed test procedure.





Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.