Probably no architectural innovation in the x86 line has been discussed
and disputed as much as MMX, 3D Now!, SSE and the like. MMX itself,
however, caused little argument: the only question was whether these
instructions were necessary at all. That is because all manufacturers at
the time were in the same boat, introducing the new instructions more or
less simultaneously (though AMD and Cyrix were a little late). Later each
company went its own way and, of course, started promoting its own SIMD
instructions. Between the releases of the K6-2 and the Pentium III, Intel's
fans said that floating point SIMD instructions were of little use. Their
opinion changed with the arrival of SSE: its advantages were acknowledged,
unlike those of 3D Now!. AMD then integrated SSE support into its new
processors. And after SSE, SSE2 became a sort of lifesaver. AMD's fans,
for their part, stated, for example, that it didn't matter that the K6-2
had such a weak coprocessor, as it would definitely win thanks to 3D Now!
support. Frankly speaking, for a long time it wasn't possible to compare
the efficiency of instructions from different manufacturers, as the
processors differed in much more than their instruction sets. Today the
new AMD processors support everything except SSE2, so the comparison can
be carried out on a single platform. There is not enough software for SSE2
yet, so we will put it off and look at the benefits of SIMD instructions
using one important task as an example: MP3 encoding of music.
Tests
I used the GOGO codec. It is a real tester's dream, as it incorporates
optimizations for MMX, 3D Now!, Enhanced 3D Now!, SSE and multiprocessor
configurations, all of which can be enabled or disabled manually. It also
reports the time compression takes and the speed in x's (1x=150 KBytes/s);
here this figure is the ratio of playback time to encoding time. Along with
high speed, the codec provides high quality and excellent compatibility
with various decoders. That is why I have been using GOGO 2.39c and WinGOGO
for a long time and am not going to switch.
First of all, I ran the test with everything disabled to get a baseline
result. After that I enabled only MMX. It is commonly held that integer
instructions do not affect music compression speed, but I wanted to check.
Besides, a lot of codecs are optimized only for these instructions. All
results from this one onward were obtained with MMX enabled. The third
line is the original 3D Now! instructions. The fourth adds Enhanced
3D Now!/MMX2, an extension of the base set that came with the first
Athlons. The fifth is Intel's SSE. Maybe it is not quite correct to test
Intel's instructions on an AMD processor, but there is no other way.
Besides, SSE support is said to be implemented better in the Athlon
XP/Duron processors than in Intel's own. I don't know whether that is
true, so I proceed from the assumption that the level is approximately
the same. The last variant has everything enabled.
It was interesting to compare the results not only within one program
but also against a poorly optimized codec. For this purpose I chose
Lame 3.89 (GOGO, as you know, is based on Lame). Lame doesn't offer much
to adjust: only optimization for speed, optimization for quality, or
neither. The quality of files obtained with GOGO at the default settings
is better than with Lame's quality optimization and worse than without it.
We could also disable psychoacoustics in GOGO, but the results wouldn't
be of great interest.
As the test material I used a collection of pop music ripped with EAC
into a single WAV file of 751 MBytes. It was then compressed into MP3 at
bitrates of 128 and 320 Kbit/s in stereo mode. Each experiment was carried
out three times, and the figures were averaged and rounded to whole
seconds and tenths of an x.
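The relation between the "Time" and "Speed" columns in the tables can be sketched as follows. This is a minimal illustration, not the codec's own calculation: it assumes the WAV file holds standard CD audio (44.1 kHz, 16-bit, stereo) and that "751 MBytes" means binary megabytes, neither of which the article states explicitly. The function names are made up for the example.

```python
# Assumption: the source WAV is standard CD audio (44.1 kHz, 16-bit, stereo).
BYTES_PER_SECOND = 44_100 * 2 * 2     # 176,400 bytes of PCM per second of music
# Assumption: "751 MBytes" is counted in binary megabytes.
WAV_SIZE_BYTES = 751 * 1024 * 1024


def playback_seconds(size_bytes=WAV_SIZE_BYTES):
    """Playing time of the source file (the small WAV header is ignored)."""
    return size_bytes / BYTES_PER_SECOND


def parse_time(text):
    """Convert a 'min:s' string such as '4:00' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def speed_x(encode_time):
    """Speed in x's: playback time divided by encoding time."""
    return playback_seconds() / parse_time(encode_time)


def average_time(runs):
    """Average several 'min:s' timings and round to whole seconds."""
    return round(sum(parse_time(r) for r in runs) / len(runs))


print(f"Playing time: {playback_seconds() / 60:.1f} min")
print(f"4:00 at 128 Kbit/s -> {speed_x('4:00'):.1f}x")
print(f"2:51 at 320 Kbit/s -> {speed_x('2:51'):.1f}x")
```

Under these assumptions the file is about 74 minutes of music, and the non-optimized GOGO times of 4:00 and 2:51 correspond to 18.6x and 26.1x, which matches the tables. Other rows can differ by a tenth or two because the published times are rounded to whole seconds.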
Now a little about the test system: an AMD Athlon XP 1800+, a Soltek
SL-75DRV (KT266) mainboard, 256 MBytes of PC2100 DDR SDRAM and a Fujitsu
MPG3204AH-E HDD (20 GBytes, UDMA100). The disc was ripped with a Plextor
PlexWriter 1210TS SCSI CD-RW drive. The speed was quite good: about 20x
at the beginning and 26x at the end. All this ran under Windows 2000
Professional.
Now come the test results.
128 Kbit/s
Coder | Mode | Time, min:s | Speed, x
GOGO | No optimization | 4:00 | 18.6
GOGO | MMX | 3:50 | 19.4
GOGO | MMX+3D Now! | 2:46 | 26.9
GOGO | MMX+Enhanced 3D Now!/MMX2 | 2:36 | 28.7
GOGO | MMX+SSE | 2:40 | 27.9
GOGO | Full optimization | 2:36 | 28.7
Lame | Speed | 2:12 | -
Lame | No optimization | 3:30 | -
Lame | Quality | 7:39 | -
Well, the benefit from the integer instructions is rather small, both
for MMX and for Enhanced 3D Now!/MMX2. But judging by the results of the
original 3D Now! and SSE, even this small benefit matters for AMD: these
three sets (3D Now!, SSE and Enhanced 3D Now!) rank in the same order by
efficiency as by date of appearance. The rumors that SSE is more efficient
than Enhanced 3D Now! are exaggerated. Users of AMD processors need SSE
only if a program they use doesn't support 3D Now! (which happens quite
often). Besides, with all optimizations enabled the result coincided with
that of the faster Enhanced 3D Now!, not with that of the slower SSE.
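To put numbers on "rather small" and "a bit better", the gains can be expressed relative to the non-optimized run. The sketch below only re-computes figures from the 128 Kbit/s table; the dictionary and function names are illustrative.

```python
# GOGO encoding times at 128 Kbit/s, copied from the table (min:s).
TIMES_128 = {
    "No optimization": "4:00",
    "MMX": "3:50",
    "MMX+3D Now!": "2:46",
    "MMX+Enhanced 3D Now!/MMX2": "2:36",
    "MMX+SSE": "2:40",
    "Full optimization": "2:36",
}


def to_seconds(text):
    """Convert a 'min:s' string into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def speedups(times, baseline="No optimization"):
    """Speedup factor of every mode relative to the baseline mode."""
    base = to_seconds(times[baseline])
    return {mode: base / to_seconds(t) for mode, t in times.items()}


for mode, factor in speedups(TIMES_128).items():
    print(f"{mode:28s} {factor:.2f}x the baseline speed")
```

On these times MMX alone gives about 4%, the original 3D Now! about 45%, SSE 50% and Enhanced 3D Now! about 54%, so the gap between SSE and Enhanced 3D Now! is only a few percent.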
Lame's results are much worse. It manages to win only with speed
optimization enabled, and only at the expense of quality. GOGO reaches the
same result at this bitrate with psychoacoustics off, but it needs only
50 seconds to compress the file. With optimization disabled, Lame gives
poorer quality and a lower speed than GOGO. As for quality optimization,
SSE/3D Now! support could give an excellent gain in absolute terms: with
such a low compression speed the time saved would be much more noticeable.
Besides, if you are going to compress music on the fly with codecs not
optimized for floating point SIMD instructions, the processor will be the
bottleneck. Even the Athlon XP 1800+ cannot keep up with a fast drive if
the whole load falls on the x87 FPU. Pentium 4 users need SSE optimization
for MP3 encoding like air to breathe! On the K6-2, K6-III or K6-2+,
non-optimized codecs are like weights on your legs, while 3D Now! improves
the results.
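The on-the-fly argument can be checked against the figures already given: the drive rips at about 20x to 26x, and the 128 Kbit/s table lists the encoding speeds. The sketch below assumes that both kinds of "x" are multiples of real-time playback, which the article implies but does not spell out; the function name and verdict strings are illustrative.

```python
# Drive speed quoted in the article: about 20x at the start of the disc, 26x at the end.
RIP_START_X, RIP_END_X = 20.0, 26.0

# GOGO encoding speeds at 128 Kbit/s, copied from the table (in x's).
GOGO_128_X = {
    "No optimization": 18.6,
    "MMX": 19.4,
    "MMX+3D Now!": 26.9,
    "MMX+SSE": 27.9,
    "MMX+Enhanced 3D Now!/MMX2": 28.7,
}


def on_the_fly_verdict(encode_x):
    """Compare an encoding speed with the drive's ripping speed range."""
    if encode_x >= RIP_END_X:
        return "encoder keeps up with the drive"
    if encode_x >= RIP_START_X:
        return "encoder keeps up only at the start of the disc"
    return "processor is the bottleneck"


for mode, x in GOGO_128_X.items():
    print(f"{mode:28s} {x:5.1f}x  {on_the_fly_verdict(x)}")
```

With x87 only (18.6x) or MMX only (19.4x) the encoder falls below even the slowest ripping speed, while every floating point SIMD mode stays above 26x.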
The most important thing is that optimization for floating point SIMD
instructions provides the same quality along with a considerable
acceleration of encoding; this is what distinguishes it from the speed
optimization in Lame and similar software.
320 Kbit/s
Coder | Mode | Time, min:s | Speed, x
GOGO | No optimization | 2:51 | 26.1
GOGO | MMX | 2:41 | 27.8
GOGO | MMX+3D Now! | 1:58 | 38.0
GOGO | MMX+Enhanced 3D Now!/MMX2 | 1:51 | 40.1
GOGO | MMX+SSE | 1:54 | 39.4
GOGO | Full optimization | 1:51 | 40.1
Lame | Speed | 2:24 | -
Lame | No optimization | 2:29 | -
Lame | Quality | 4:34 | -
Compressing the data about 4 times is easier than compressing it 11 times,
which is why the results improved in almost every row. Besides, there will
be no problems with on-the-fly compression (on powerful processors only,
of course). The trends are the same with one exception: speed optimization
in Lame gives practically no gain at high bitrates, and the coder loses
everywhere. Its quality is now a little better, but Lame still needs good
optimization.
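The "four times" and "11 times" figures are the compression ratios of the two bitrates. A quick check, assuming the source is CD audio at 44.1 kHz, 16 bits and two channels (an assumption, though a natural one for a disc ripped with EAC):

```python
# Assumption: source is CD audio, 44.1 kHz x 16 bits x 2 channels.
CD_AUDIO_KBITS = 44_100 * 16 * 2 / 1000   # 1411.2 Kbit/s


def compression_ratio(mp3_kbits):
    """How many times smaller the MP3 stream is than the PCM source."""
    return CD_AUDIO_KBITS / mp3_kbits


for bitrate in (128, 320):
    print(f"{bitrate} Kbit/s: about {compression_ratio(bitrate):.1f} times")
```

This gives roughly 11.0 for 128 Kbit/s and 4.4 for 320 Kbit/s.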
By the way, GOGO with psychoacoustics disabled completed this task in
the same 50 seconds as at the lower bitrate. In this case the disk
subsystem is the bottleneck. At first glance 15 MBytes/s seems too little
for UDMA100, but... Half a year ago the ITC published a performance
comparison in real applications for two hard drive operating modes,
UDMA100 and DMA2. The results were the same in both cases, i.e. DMA2 with
its bandwidth of 16 MBytes/s suffices for modern 7200 rpm drives with
2 MBytes of cache memory. They tested an IBM DTLA drive on an i815E based
mainboard, and I obtained a similar result with the Fujitsu (a drive of
the same class) on the KT266 board.
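The 15 MBytes/s figure follows directly from the file size and the encoding time. The sketch below counts only the reading of the source WAV and ignores the much smaller MP3 output written at the same time; the names are illustrative.

```python
WAV_SIZE_MBYTES = 751     # size of the source file, from the article
ENCODE_SECONDS = 50       # GOGO with psychoacoustics disabled
DMA2_MBYTES_PER_S = 16    # DMA2 bandwidth quoted in the article


def read_rate(size_mbytes, seconds):
    """Average rate at which the source file must be read, in MBytes/s."""
    return size_mbytes / seconds


rate = read_rate(WAV_SIZE_MBYTES, ENCODE_SECONDS)
print(f"Required read rate: {rate:.1f} MBytes/s "
      f"({rate / DMA2_MBYTES_PER_S:.0%} of the DMA2 limit)")
```

At about 15 MBytes/s the encoder is already consuming data at close to the 16 MBytes/s that the ITC comparison found sufficient for drives of this class, which fits the observation that the time no longer depends on the bitrate.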
Conclusion
Of course, we shouldn't generalize these results to all tasks, as the
situation may differ in other applications. But optimized applications
will never lose: at worst, the benefit will be just a few percent. And
the speed doesn't grow at the expense of quality. In general, the
conclusions are that floating point SIMD instructions are needed, that
software optimization for them is also necessary, and that the competing
sets provide almost the same performance gain, with 3D Now! being a bit
better. The third conclusion may turn out to be unimportant, however, as
AMD has decided to provide SSE support, and the number of programs written
for these instructions may end up greater than for 3D Now!.