The performance tests we ran some time ago on Intel's and AMD's desktop
processors in video encoding, where XviD showed very poor results on
the Pentium 4, left an unpleasant impression. Different code can certainly
run differently on different architectures, but the results shouldn't
differ by a factor of two! The Athlon XP 3200+ and the Pentium 4 3.2 GHz
have comparable potential, so when one turns out to be twice as fast as
the other, it looks abnormal. That should not happen with correctly written
software, so something must be wrong with the software itself.
We decided to study this problem, and the first hypothesis we had
to check was that the problem lay in the source file used for
encoding. Video compression is a complicated process, and a particular
combination of source video and processor architecture could well
produce such a failure. This was easy to check: we took our standard
test suite and simply replaced TEST.MPG with TEST.VOB in the VirtualDubMod
script, and then compressed the other source file with the
same XviD version and the same codec parameters:
As you can see, the situation doesn't change much: with the standard
MPG file (MPEG2) the Pentium 4 3.2 GHz was half as fast as the Athlon
XP 3200+, and now their scores differ by a factor of 2.1. The difference
is insignificant, so the problem is not in the source file.
The next hypothesis was that the problem lay in the particular XviD
version. This was also easy to verify: we swapped the codec version within
the same standard test procedure, set the same parameters (where possible)
and took measurements. We used two builds: an earlier one
from the same builder (Koepi build 24.06.2003) and the latest one from
Nic (Nic's build 16.07.2003). The latter was compiled with the Intel C
Compiler 7.1, which should benefit performance on Intel's
processors. By the way, we couldn't establish whether Koepi and Nic are
actually XviD developers. Koepi
said that he did some work for XviD, but his name isn't on the
list of developers. The situation with Nic
is even vaguer. But since they are key suppliers of binaries (especially
Koepi, whose builds are often found in codec packs), we decided that
they have the right to represent XviD in our review. Let's take a look at
the scores.
Nic's build is indeed friendlier to the Pentium 4. Koepi's builds, on
the contrary, are much harsher on the Intel CPU: while the Athlon XP gets
faster with the new build, the Pentium 4 stays at the same level. In
any case, we got the impression that something is wrong with Koepi's
builds, and that the problem is not only in the code.
That is why we decided to check one more possibility: what if the
optimization parameters are selected incorrectly? You might remember that
the Microsoft Windows Media Encoder tried to detect SSE support by
checking the CPU vendor (it assumed that a non-Intel CPU couldn't
support SSE). Thankfully, XviD lets you select SIMD instruction sets
manually in its settings instead of relying on automatic detection. It
supports MMX, 3DNow!, 3DNow! 2 (the developer probably means the Extended
3DNow! of K7-based processors), SSE, Integer SSE (?) and SSE2. Since the
Athlon XP fully supports SSE and 3DNow! 2, we combined the 3DNow! and
3DNow! 2 options under the name 3DNow!, and Integer SSE and SSE under the
name SSE. First of all, let's test the latest XviD build from Koepi
(1.0 beta 2, 05.12.2003) with different manually selected optimizations
on the Athlon XP 3200+.
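As an aside, the vendor-check pitfall mentioned above can be illustrated
with a minimal sketch. The function names and the way CPU features are
passed in are our own assumptions for illustration; this is not code from
XviD or the Windows Media Encoder. Only the vendor strings "GenuineIntel"
and "AuthenticAMD" are the real CPUID identifiers.

```python
# Hypothetical sketch: function names and data layout are illustrative only.

KNOWN_SIMD = {"MMX", "3DNow!", "3DNow! 2", "SSE", "SSE2"}


def simd_by_vendor(vendor, features):
    """Flawed approach: trust SSE only if the vendor string is Intel's."""
    enabled = {"MMX"} & features
    if vendor == "GenuineIntel" and "SSE" in features:
        enabled.add("SSE")
    return enabled


def simd_by_features(vendor, features):
    """Safer approach: enable what the CPU itself reports, whoever made it."""
    return KNOWN_SIMD & features


# Feature set of an Athlon XP as described in the text (SSE plus both 3DNow! sets).
athlon_xp = ("AuthenticAMD", {"MMX", "3DNow!", "3DNow! 2", "SSE"})

print(sorted(simd_by_vendor(*athlon_xp)))
print(sorted(simd_by_features(*athlon_xp)))
```

With the vendor check, the Athlon XP is left with MMX alone even though it
reports SSE; checking the reported features avoids that.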
It's clear that the MMX optimization has the greatest effect; it's
the determining factor for the Athlon XP. The 3DNow! optimization has a
much weaker effect than SSE, and even the two of them together lose to
MMX. What kind of optimization is it if the outdated MMX beats 3DNow!
+ SSE? It's strange... It shows that there is a problem with the optimized
code. But this is only the beginning.
On the Pentium 4, MMX does have an effect, but only when all other
optimizations are disabled! If we enable MMX together with, say, SSE,
the performance of the Pentium 4 based system drops considerably, by a
factor of 1.5! We can also see that the codec has no noticeable
optimization for SSE2: compare the columns named "No Optimization"
and "SSE2", as well as "MMX" and "MMX+SSE2", "SSE" and "SSE+SSE2". The
SSE optimization, on the other hand, is good: it is just weaker than MMX on
the Athlon XP, and on the Pentium 4 it stands out more clearly against
the rest. At least on this processor, SSE is the only SIMD set
that brings any benefit. So the performance of the Pentium 4 with the
XviD 1.0 beta 2 (Koepi) codec is artificially reduced when the parameters
are configured automatically. The reduction reaches a factor of 1.6!
To put this in perspective: even if we assume that the performance of
the Pentium 4 grows in proportion to its clock speed, the clock would have
to be raised to about 5 GHz to match the results obtained with correctly
selected optimization parameters! Now let's take a look at the same data
from a different standpoint.
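The 5 GHz figure above is simple arithmetic. As a minimal sketch, under
the article's own assumption that performance scales linearly with clock
speed (the function name is ours, chosen for illustration):

```python
def equivalent_clock_ghz(clock_ghz, slowdown_factor):
    """Clock needed to offset a slowdown, assuming linear scaling with clock speed."""
    return clock_ghz * slowdown_factor


# 3.2 GHz Pentium 4, slowed down by a factor of 1.6 by the automatic settings
print(f"{equivalent_clock_ghz(3.2, 1.6):.2f} GHz")
```

Real processors rarely scale perfectly with clock speed, so this is a
lower bound rather than a prediction.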
So, the subject in question is the SIMD optimization in the XviD 1.0 beta
2 Koepi build. The greatest effect for the Athlon XP comes from the oldest
SIMD set, MMX (which at the same time kills the Pentium 4). SSE is
implemented well: the gain is almost twofold on the Athlon XP compared to
running with no optimizations at all, and 1.9 times on the
Pentium 4. SSE2 is listed as well, but we saw no trace of its effect. And
3DNow! merely lets the Athlon XP catch up with a Pentium 4 running without
any optimizations!
Could it be just a peculiarity of this particular XviD version? To find
out, we returned to Koepi build 24.06.2003 and Nic's build 16.07.2003.
Let's see whether disabling the MMX optimization in other codec versions
produces the same magical effect on Pentium 4 based systems.
Well, such a crippled version is probably the exception. Moreover, the
MMX optimization that hurts the Pentium 4 so badly can be considered
an exclusive feature of XviD 1.0 beta 2 from Koepi. However, the two
bottom lines (Koepi build 24.06.2003) indicate that this trend started
half a year ago. At the same time, Nic's build 16.07.2003 shows normal
results: with MMX disabled performance gets worse, with MMX enabled it
improves.
Summary
- An additional instruction set does not guarantee any performance gain. In
  "skillful" hands :) one and the same optimization method can become an
  excellent tool for achieving the opposite effect. Moreover, this can
  happen only on certain processor architectures, which makes the problem
  harder to trace.
- The automatic optimization parameters set by XviD 1.0 beta 2 (Koepi
  build) have a devastating effect on Pentium 4 performance. I strongly
  recommend that all users of systems based on this CPU set the optimization
  parameters manually and disable MMX.
- At the same time, the latest XviD build from Koepi
  delivers the best compression speed even on Pentium 4 based systems, but
  only with manually selected parameters!
- The terrible defeat of the Pentium 4 in XviD compression revealed in the
  recent tests is actually not that awful: the gap is 32%, not 100%. That's
  not small, but it's not fatal either.
- In the fight between megahertz and the clumsy hands of programmers (or
  builders), the programmers always win. End users end up losing because
  they can't afford to spend a couple of days hunting for bugs in multiple
  beta versions.
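For readers who do want to find the best manual setting, the approach used
in this article (time every combination of SIMD options and compare) can
be sketched as follows. This is only an illustration: `measure_fps` is a
hypothetical callback that you would have to write yourself to run an
encode with the given options and return the measured speed. It is not
part of XviD or VirtualDubMod.

```python
from itertools import combinations


def all_flag_sets(flags):
    """Yield every subset of the given SIMD flags, from none to all of them."""
    for size in range(len(flags) + 1):
        for combo in combinations(flags, size):
            yield frozenset(combo)


def fastest_flag_set(flags, measure_fps):
    """Benchmark every subset with the caller-supplied measure_fps callback.

    Returns a (best_flags, best_fps) pair. measure_fps is hypothetical:
    it must run a real encode with the given flags and return its speed.
    """
    results = {flag_set: measure_fps(flag_set) for flag_set in all_flag_sets(flags)}
    best = max(results, key=results.get)
    return best, results[best]
```

With the five options XviD exposes, that is 32 encoding runs, which is
exactly the couple of days most end users cannot spare.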