iXBT Labs - Computer Hardware in Detail


SIMD instructions in MP3 Encoding



Probably no architectural innovation in the x86 line has been discussed and disputed as much as MMX, 3D Now!, SSE and the like. MMX, however, caused fewer arguments: the only question was whether these instructions were needed at all. That is because at the time all manufacturers were in the same boat, adopting the new instructions simultaneously (although AMD and Cyrix lagged slightly behind). Later each company went its own way and, of course, started promoting its own SIMD instructions. Between the releases of the K6-2 and the Pentium III, Intel's fans claimed that floating-point SIMD instructions were of little use, but their opinion changed with the arrival of SSE, whose advantages, unlike those of 3D Now!, were acknowledged. AMD then added SSE support to its new processors. And after SSE came SSE2, which became something of a lifesaver. AMD's fans, for their part, claimed, for example, that it did not matter that the K6-2 had such a weak floating-point unit, since it would certainly win thanks to 3D Now! support.

Frankly speaking, for a long time it was impossible to compare the efficiency of different manufacturers' instruction sets, because the processors differed in much more than that. Today's new AMD processors support all of these sets except SSE2, so the comparison can be carried out on a single platform. There is not yet enough software for SSE2, so we will put it aside and look at the benefits of SIMD instructions using one important task as an example: encoding music to MP3.

Tests

I used the GOGO codec. It is a real tester's dream, as it includes optimizations for MMX, 3D Now!, Enhanced 3D Now!, SSE and multiprocessor configurations, each of which can be enabled or disabled manually. It reports the encoding time and the speed in "x", which here is the ratio of playback time to encoding time, so 1x means real time (for 16-bit stereo audio at 44.1 kHz that corresponds to 176.4 KBytes/s, not the 150 KBytes/s of a 1x CD-ROM drive). Besides high speed, the codec offers high quality and excellent compatibility with various decoders, which is why I have been using GOGO 2.39c and WinGOGO for a long time and have no plans to switch.
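Since the speed in x is the ratio of playback time to encoding time, it can be derived from the size of the WAV file: 16-bit stereo audio at 44.1 kHz plays back at 44,100 × 2 × 2 = 176,400 bytes per second. A minimal sketch; the function names are illustrative, and the small WAV header is ignored as negligible.

```python
# CD audio parameters: 44.1 kHz sample rate, 2 channels, 16-bit (2-byte) samples.
SAMPLE_RATE = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2

# Bytes of PCM data consumed per second of real-time playback (176,400).
REALTIME_BYTES_PER_S = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE


def playback_seconds(pcm_bytes: int) -> float:
    """Playback time of raw CD-quality PCM data (WAV header ignored)."""
    return pcm_bytes / REALTIME_BYTES_PER_S


def speed_x(playback_s: float, encode_s: float) -> float:
    """Encoding speed in 'x': how many times faster than real time."""
    return playback_s / encode_s


if __name__ == "__main__":
    # A hypothetical one-minute track encoded in 4 seconds runs at 15x.
    one_minute = 60 * REALTIME_BYTES_PER_S
    print(speed_x(playback_seconds(one_minute), 4.0))
```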

First, I ran the test with everything disabled to get a baseline. Then I enabled only MMX. Integer instructions are generally believed not to affect music compression speed, but I wanted to check that; besides, many codecs are optimized only for these instructions. All subsequent results were obtained with MMX enabled. The third line adds the original 3D Now! instructions. The fourth adds Enhanced 3D Now!/MMX2, the extension of the base set that came with the first Athlons. The fifth adds Intel's SSE. Testing Intel's instructions on an AMD processor may not be entirely fair, but there is no other way. Besides, SSE support is said to be implemented better in the Athlon XP/Duron than in Intel's own processors. I don't know whether that is true, so I assume the level is approximately the same. The last variant has everything enabled.

It was also interesting to compare the results not only within one program but also against a poorly optimized codec. For this purpose I chose Lame 3.89 (GOGO, as you know, is based on Lame). Lame offers little to adjust here: speed optimization, quality optimization, or neither. The quality of files produced by GOGO with default settings is better than that of Lame without quality optimization and worse than that of Lame with it. GOGO also allows disabling the psychoacoustic model, but those results would be of little interest.

As test material I used a collection of pop music ripped into a single 751-MByte WAV file with EAC (Exact Audio Copy). It was then compressed to MP3 at 128 and 320 Kbit/s in stereo mode. Each test was run three times, and the results were averaged and rounded to whole seconds and tenths of x.
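The measurement procedure reduces to a short calculation: average three encoding times, round to whole seconds, and compute x from the playback time of the 751-MByte file. The sketch assumes "MByte" means 2^20 bytes (as Windows reports file sizes), which gives roughly 74.4 minutes of audio; the three run times are hypothetical values averaging to 240 s, the "no optimization" time at 128 Kbit/s.

```python
from statistics import mean

REALTIME_BYTES_PER_S = 44_100 * 2 * 2   # 16-bit stereo at 44.1 kHz
WAV_BYTES = 751 * 1024 * 1024           # assumption: "MBytes" = 2**20 bytes


def summarize(runs_s: list[float]) -> tuple[int, float]:
    """Average repeated runs; return (time rounded to whole seconds, x rounded to tenths)."""
    avg = mean(runs_s)
    playback_s = WAV_BYTES / REALTIME_BYTES_PER_S
    return round(avg), round(playback_s / avg, 1)


def fmt_min_s(seconds: int) -> str:
    """Format whole seconds as min:s, as in the result tables."""
    return f"{seconds // 60}:{seconds % 60:02d}"


if __name__ == "__main__":
    # Hypothetical timings of three runs, in seconds.
    t, x = summarize([239.0, 240.0, 241.0])
    print(fmt_min_s(t), x)
```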

Now a little about the test system: AMD Athlon XP 1800+, Soltek SL-75DRV (KT266), 256 MBytes of PC2100 DDR SDRAM, and a Fujitsu MPG3204AH-E hard drive (20 GBytes, UDMA100). The disc was ripped with a SCSI Plextor PlexWriter 1210TS CD-RW drive at a quite decent speed: about 20x at the start and 26x at the end. Everything ran under Windows 2000 Professional.

Now for the test results.

128 Kbit/s

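Coder | Mode | Time, min:s | Speed, x
GOGO | No optimization | 4:00 | 18.6
GOGO | MMX | 3:50 | 19.4
GOGO | MMX+3D Now! | 2:46 | 26.9
GOGO | MMX+Enhanced 3D Now!/MMX2 | 2:36 | 28.7
GOGO | MMX+SSE | 2:40 | 27.9
GOGO | Full optimization | 2:36 | 28.7
Lame | Speed | 2:12 | n/a
Lame | No optimization | 3:30 | n/a
Lame | Quality | 7:39 | n/a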
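The relative gains are easier to compare as percentages. A minimal sketch that parses the min:s times of the GOGO rows in the 128 Kbit/s table and computes each configuration's speedup over the unoptimized baseline:

```python
def to_seconds(min_s: str) -> int:
    """Convert a 'min:s' string from the table into seconds."""
    minutes, seconds = min_s.split(":")
    return int(minutes) * 60 + int(seconds)


# GOGO encoding times at 128 Kbit/s, copied from the table.
GOGO_128 = {
    "No optimization": "4:00",
    "MMX": "3:50",
    "MMX+3D Now!": "2:46",
    "MMX+Enhanced 3D Now!/MMX2": "2:36",
    "MMX+SSE": "2:40",
    "Full optimization": "2:36",
}


def speedups(times: dict[str, str], baseline: str) -> dict[str, float]:
    """Percentage speed gain of each mode relative to the baseline mode, to 0.1%."""
    base = to_seconds(times[baseline])
    return {mode: round((base / to_seconds(t) - 1) * 100, 1) for mode, t in times.items()}


if __name__ == "__main__":
    for mode, gain in speedups(GOGO_128, "No optimization").items():
        print(f"{mode:28s} +{gain}%")
```

Run on these times, MMX alone adds only about 4%, while the floating-point SIMD sets add roughly 45 to 54%.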

The benefit from the integer instructions is rather small, both for MMX and for the Enhanced 3D Now!/MMX2 extension. But judging by the results of the original 3D Now! and SSE, even this small benefit matters for AMD: the three sets rank in the same order by efficiency as by time of appearance. Claims that SSE is more efficient than Enhanced 3D Now! turn out to be exaggerated. Users of AMD processors need SSE only if a program they use does not support 3D Now! (which happens quite often). Moreover, with all optimizations enabled the result matched that of the faster Enhanced 3D Now!, not that of the slower SSE.
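One plausible reading of the "Full optimization" result is that, for each routine, the codec uses just one floating-point SIMD path, chosen in a fixed order of preference among the sets that are both supported and enabled. GOGO's actual selection logic is not described in this article; the sketch below is hypothetical, and its preference order is an assumption chosen to match the measurements.

```python
# Hypothetical preference order for the floating-point SIMD path
# (an assumption, not GOGO's documented behavior): fastest measured set first.
FP_PREFERENCE = ["Enhanced 3D Now!", "SSE", "3D Now!"]


def choose_fp_path(supported: set[str], enabled: set[str]) -> str:
    """Return the first preferred FP SIMD set that the CPU supports and the user enabled.

    Falls back to plain x87 FPU code when no SIMD set is usable.
    """
    usable = supported & enabled
    for isa in FP_PREFERENCE:
        if isa in usable:
            return isa
    return "x87"


if __name__ == "__main__":
    # Athlon XP: supports all three FP SIMD sets (but not SSE2).
    athlon_xp = {"MMX", "3D Now!", "Enhanced 3D Now!", "SSE"}
    print(choose_fp_path(athlon_xp, athlon_xp))
    print(choose_fp_path(athlon_xp, {"MMX", "SSE"}))
    print(choose_fp_path(athlon_xp, {"MMX"}))
```

Under this assumption, enabling everything on an Athlon XP selects the Enhanced 3D Now! path, which would explain why the two rows show identical times.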

Lame's results are much worse. It beats GOGO only with speed optimization enabled, and only at the expense of quality. GOGO achieves a comparable result at this bitrate with psychoacoustics turned off, yet takes only 50 seconds to compress this file. With optimization disabled, Lame gives both poorer quality and lower speed than GOGO. As for Lame's quality optimization, SSE/3D Now! support could give an excellent absolute gain there: with compression this slow, the acceleration would be much more noticeable.

Besides, if you are going to compress music on the fly with codecs that are not optimized for floating-point SIMD instructions, the processor will be the bottleneck. Even the Athlon XP 1800+ cannot keep up with a fast drive if the whole load falls on the x87 FPU. Pentium 4 users need SSE optimization for MP3 encoding like air to breathe! For the K6-2, K6-III and K6-2+, codecs not optimized for these processors are like weights on your legs, while 3D Now! support noticeably improves the results. Most importantly, optimization for floating-point SIMD instructions keeps the quality the same while considerably accelerating encoding; this is what distinguishes it from the speed optimization in Lame and similar software.
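The on-the-fly argument is simple arithmetic: when ripping and encoding run in one pass, throughput is limited by the slower stage. A minimal sketch using figures from this test (the drive read at about 20x to 26x; GOGO encoded at 18.6x with no optimization and 28.7x with full optimization at 128 Kbit/s). It ignores buffering and any overhead of running both stages at once, which is a simplifying assumption.

```python
def pipeline_speed(rip_x: float, encode_x: float) -> float:
    """Throughput of a rip-and-encode pipeline: the slower stage sets the pace."""
    return min(rip_x, encode_x)


def bottleneck(rip_x: float, encode_x: float) -> str:
    """Name the limiting stage (the drive when both are equally fast)."""
    return "CPU (encoder)" if encode_x < rip_x else "drive"


if __name__ == "__main__":
    for label, enc_x in [("no optimization", 18.6), ("full optimization", 28.7)]:
        for rip_x in (20.0, 26.0):
            print(f"{label}, drive at {rip_x}x: {pipeline_speed(rip_x, enc_x)}x, "
                  f"limited by {bottleneck(rip_x, enc_x)}")
```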

320 Kbit/s

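Coder | Mode | Time, min:s | Speed, x
GOGO | No optimization | 2:51 | 26.1
GOGO | MMX | 2:41 | 27.8
GOGO | MMX+3D Now! | 1:58 | 38.0
GOGO | MMX+Enhanced 3D Now!/MMX2 | 1:51 | 40.1
GOGO | MMX+SSE | 1:54 | 39.4
GOGO | Full optimization | 1:51 | 40.1
Lame | Speed | 2:24 | n/a
Lame | No optimization | 2:29 | n/a
Lame | Quality | 4:34 | n/a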

Compressing about 4 times is easier than compressing 11 times, so the results improved in almost every line. Besides, on-the-fly compression poses no problem here (only on powerful processors, of course). The trends are the same, with one exception: Lame's speed optimization gives practically no gain at the high bitrate, and Lame loses to GOGO in every mode. Its Quality mode now fares a little better, but Lame still needs good optimization.
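The "4 times versus 11 times" figures follow from the bitrates: uncompressed CD audio runs at 44,100 × 16 × 2 = 1,411,200 bit/s, i.e. about 1411 Kbit/s (taking 1 Kbit = 1000 bits, as MP3 bitrates do). A minimal sketch of the arithmetic:

```python
# 16-bit stereo at 44.1 kHz, in Kbit/s (1 Kbit = 1000 bits): 1411.2
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000


def compression_ratio(mp3_kbps: int) -> float:
    """How many times smaller the MP3 stream is than uncompressed CD audio."""
    return CD_BITRATE_KBPS / mp3_kbps


if __name__ == "__main__":
    for kbps in (128, 320):
        print(f"{kbps} Kbit/s: {compression_ratio(kbps):.1f}:1")
```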

By the way, GOGO with psychoacoustics disabled handled this task in the same 50 seconds as at the lower bitrate. In this case the disk subsystem becomes the bottleneck. On the one hand, 15 MBytes/s seems far below what UDMA100 should deliver, but... Half a year ago ITC published results comparing performance in real applications for two hard drive modes, UDMA100 and DMA2. The results were the same in both cases, i.e. DMA2, with a bandwidth of 16 MBytes/s, suffices for modern 7200 rpm drives with a 2-MByte cache. They tested an IBM DTLA drive on an i815E-based mainboard, and I obtained a similar result with the Fujitsu (a drive of the same class) on the KT266 board.
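The 15 MBytes/s figure is simply the file size divided by the encoding time: 751 MBytes in 50 seconds. A minimal sketch comparing that rate with the nominal interface bandwidths mentioned in the text (16 MBytes/s for DMA2, 100 MBytes/s for UDMA100):

```python
FILE_MBYTES = 751   # size of the test WAV file
ENCODE_S = 50       # GOGO with psychoacoustics disabled, at either bitrate

# Nominal interface bandwidths in MBytes/s, as quoted in the text.
INTERFACES = {"DMA2": 16.0, "UDMA100": 100.0}


def throughput(mbytes: float, seconds: float) -> float:
    """Average data rate the disk must sustain, in MBytes/s."""
    return mbytes / seconds


if __name__ == "__main__":
    rate = throughput(FILE_MBYTES, ENCODE_S)
    print(f"required: {rate:.1f} MBytes/s")
    for name, bw in INTERFACES.items():
        print(f"{name}: {rate / bw:.0%} of nominal bandwidth")
```

The required rate comes close to the DMA2 ceiling but uses only about a seventh of UDMA100's nominal bandwidth.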

Conclusion

Of course, we should not generalize these results to all tasks, as the situation may differ in other applications. But optimized applications never lose: at worst, the gain will be just a few percent. And the speed gain does not come at the expense of quality. In general, the conclusions are that floating-point SIMD instructions are needed, software optimization for them is also necessary, and the different sets give almost the same performance gain, with Enhanced 3D Now! slightly ahead. However, the third conclusion may turn out to be unimportant, since AMD has adopted SSE support, and there may well be more programs optimized for SSE than for 3D Now!.



Copyright © Byrds Research & Publishing, Ltd., 1997–2011. All rights reserved.