Mp3 compression

By Andrei Gule

Introduction

This article is devoted to some of the finer points that come up when using the MPEG I/II Layer 3 (mp3) audio compression standard. It is not a comprehensive work such as a test of coders or mp3 players; I have simply tried to share what I know about the standard.

The material is presented on the assumption that you already have some knowledge of the topic.

Strategic issues

In this part I consider some issues that do not directly concern the compression process itself.

Does it make sense to use lossy compression?

I am sure it does not make sense to build audio archives (sample libraries, record libraries, etc.) in mp3. The same applies to MiniDisc, since it also uses lossy coding, and to other lossy formats. Sound stored this way cannot be processed further: many digital processing methods produce noticeable distortions on it, so you should not, for example, keep samples in mp3. There is no way to restore the discarded data or improve the sound afterwards. It is a one-way ticket.

As for me, I have chosen to keep my record library in wav files. You could also use CD-DA: it offers better compatibility, but its reliability is lower, so it doesn't suit me. Another alternative is lossless compression, either with ordinary archivers (ZIP, RAR) or with specialized programs such as RKA and MonkeyAudio. This approach causes a lot of trouble when working with the files: wav is played by most players, but exotic formats like RKA are another matter. I know an RKA plug-in for WinAmp exists, but WinAmp is not the only player. What about other software players? Hardware players? mp3-CD players? I am not willing to be tied to a single coder/player pair. Just to share some files with my friends, I would have to persuade them to install a new player.

Keeping wav files means that if a new, higher-quality algorithm appears (call it mp6), I can quickly convert all my records from exact copies of the originals into the new format. Remember that something similar happened when MPEG-4 began its triumphal march and MPEG-1/2 came to be regarded as archaic.

By the way, do you know how to burn an audio disc that has been converted to mp3 back onto an audio CD without any gaps or clicks? You don't? Read, for example, www.r3mix.net. It is a real mess. I know that in theory it can be done without the slightest gap, but it is not worth the time.

Which lossy compression format is better: mp3, LQT, WMA, MP+ or something else?

So far none of the alternatives to mp3 has come close to it in quality and compatibility at the same time.

Some formats provide even better quality than mp3, for example LQT AAC, which is often called mp4. But its bitrate is limited to 192 kbps, which looks awful to fans of mp3 at 256/320 kbps. Besides, it places higher demands on hardware than mp3, although that problem is temporary. Nevertheless, nobody doubts that mp3 is still the leader in compatibility.

In my opinion it will take a long time to push mp3 out of the market. Look at CD-DA, which was supposed to die soon after mp3 appeared, or at MiniDisc and others.

Mp3 has its own field of application. When hard disks hold tens of gigabytes, it is inconvenient to put a new disc into the drive every hour; it is more comfortable to store mp3 songs on a hard disk or CD-ROM and listen to them from there. Portable mp3 players, mp3-CD players and car stereos with mp3 support are appearing. And remember downloading mp3 from the Internet. I myself keep about 5 GB of mp3, mostly at 128 kbps, not because I am foolish but because 90% of these files were not created by me.
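The storage figures above can be checked with simple arithmetic. The sketch below is only an estimate: it assumes a constant bitrate and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), ignores tags and frame headers, and the helper names are made up for illustration.

```python
# Rough storage arithmetic for constant-bitrate audio (decimal units assumed).
# CD-DA: 44,100 samples/s x 16 bits x 2 channels = 1,411,200 bit/s.
CD_DA_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbit/s

def megabytes_per_minute(kbps: float) -> float:
    """Size of one minute of audio at the given bitrate, in MB (10**6 bytes)."""
    return kbps * 1000 / 8 * 60 / 1_000_000

def hours_in(gigabytes: float, kbps: float) -> float:
    """Hours of audio that fit into the given storage (10**9 bytes per GB)."""
    seconds = gigabytes * 1_000_000_000 * 8 / (kbps * 1000)
    return seconds / 3600

for kbps in (128, 256, 320):
    print(f"{kbps} kbit/s: {megabytes_per_minute(kbps):.2f} MB/min, "
          f"ratio vs CD-DA {CD_DA_KBPS / kbps:.1f}:1")
print(f"5 GB at 128 kbit/s = about {hours_in(5, 128):.0f} hours")
```

Under these assumptions 128 kbps works out to 0.96 MB per minute (roughly 11:1 against CD-DA), and 5 GB holds about 87 hours of music.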

What influences the choice of compression parameters?

In my opinion there are two compression modes: "acceptable quality with maximum compression" and "no audible change in quality with some degree of compression". Note, though, that the threshold for each mode depends on the person; for me they are 128 and 256 kbps respectively.

The point is that the psychoacoustic model is designed for an average listener, which is why individual deviations are common.

So the simplest and most reliable way is to do everything yourself: carry out experiments once to determine your own parameters, and then simply stick to them.

Which mp3 player is better?

Among software players, nearly everything built on the Fraunhofer code is good: some versions of WinAmp (1.6, 2.20, 2.21, 2.22, 2.666, 2.7; the others use NullSoft's own decoder), WinPlay, AudioActive, Microsoft Windows Media Player... X-Audio, and everything built on its code, is also worth mentioning. There are also many players built on the ISO code; the best of them are MPG123 and Apollo. Everything built on the Xing code (Xing player, FreeAmp) is considered the worst: these players boost high frequencies to compensate for the highs that Xing's coders discard.

I know little about hardware mp3 players. Still, hardware usually uses the same algorithms as software players, and some players even contain flash chips so that the mp3 decoder can be updated. In any case, pay attention to which code the decoding algorithm is based on. As a rule, everything based on Fraunhofer is good; ISO and X-Audio depend on the implementation; Xing is bad.

LAME: is it ISO code or not?

LAME was originally written as a patch (corrections or changes to some files) for the original ISO code, with particular emphasis on fixing errors and improving the algorithm (for example, using short blocks). But about half a year ago, with version 3.6, it turned out that the entire ISO code had been rewritten, and LAME now compiles without the original ISO sources (everything needed for compilation is included in the patch). Today LAME competes with Fraunhofer-based coders in both quality and speed.

Which is better: LAME or Fraunhofer-based coders?

Well... All I can say here is that there is no point in using any ISO-based coders, let alone Xing-based ones. LAME has absorbed the best of ISO and has gone on to catch up with Fraunhofer. So some people prefer it, while others still use only Fraunhofer: LAME releases new versions nearly every day, and errors keep being found in older versions. But is Fraunhofer any better? It has errors too, and they go uncorrected for years. By the way, the LAME project coordinator has announced stable releases, that is, versions without new features and with only old errors fixed. The current stable version is 3.70.

Data preparation before compression

This part contains some recommendations on preparing digital audio data for compression.

Is it necessary to lower the signal level?

Yes, if the peak level of the source signal is close to 0 dB; otherwise you may get distortion after encoding. Because the coding is lossy, the decoder reconstructs the source signal only approximately, so on segments with peak amplitude the reconstructed signal may exceed the maximum level, which causes distortion (clipping). The amount of distortion depends on the coder and the bitrate (the higher the bitrate, the less distortion). That is why the level of the source signal should be lowered before compression.
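Why can a signal that never exceeds 0 dB clip after decoding? One simple way to see the effect, though not the mp3 algorithm itself, is to drop the upper harmonics of a full-scale square wave: the truncated waveform overshoots the original peak (the Gibbs phenomenon). The sketch treats harmonic truncation as a crude stand-in for the spectral detail a lossy coder throws away; real coders remove and requantize content differently, so the amount of overshoot here is only illustrative.

```python
import math

def bandlimited_square(t: float, harmonics: int) -> float:
    """Unit square wave's Fourier series, truncated to the first `harmonics` odd terms."""
    return 4 / math.pi * sum(
        math.sin((2 * k + 1) * t) / (2 * k + 1) for k in range(harmonics)
    )

# The ideal square wave never leaves [-1, 1], i.e. it peaks exactly at 0 dB full scale.
# Keeping only 10 odd harmonics (discarding the rest) makes the peak overshoot.
samples = [bandlimited_square(2 * math.pi * i / 4000, 10) for i in range(4000)]
peak = max(abs(s) for s in samples)
print(f"peak after truncation: {peak:.3f} "
      f"({20 * math.log10(peak):+.2f} dB relative to full scale)")  # about 1.18, +1.4 dB
```

A 16-bit decoder has no room above full scale, so any such overshoot is clipped; lowering the source level first leaves headroom for it.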

How much should you lower it? It depends. Keep in mind that lowering the level means redigitizing (requantizing) the samples, which can itself distort the source signal. Halving the level (about -6 dB) certainly produces far less requantization distortion, but that is too large a drop in volume. Some prefer whole numbers of dB, e.g. -3 dB.
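The requantization step can be sketched as follows. This is a minimal illustration, assuming 16-bit integer PCM and plain rounding without dither (an audio editor may well add dither); the function name is hypothetical. A gain of -3 dB means multiplying by about 0.708, and halving corresponds to about -6.02 dB.

```python
def attenuate(samples: list[int], gain_db: float) -> list[int]:
    """Scale 16-bit PCM samples by gain_db and round back to integers.

    The rounding is the requantization ("redigitization") step: each output
    differs from the exact scaled value by at most half a quantization step.
    Plain rounding, no dither (an assumption of this sketch).
    """
    factor = 10 ** (gain_db / 20)  # -3 dB -> ~0.708, -6.02 dB -> ~0.5
    return [max(-32768, min(32767, round(s * factor))) for s in samples]

pcm = [32767, -32768, 12345, -7, 1]
quieter = attenuate(pcm, -3.0)
factor = 10 ** (-3.0 / 20)
worst = max(abs(q - s * factor) for q, s in zip(quieter, pcm))
print(quieter, f"max rounding error {worst:.3f} LSB")
```

The rounding error introduced here is exactly the kind of small distortion the paragraph warns about; it is why repeated level changes are best avoided.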

Since the distortion caused by exceeding the peak level depends on the bitrate and the coder, here are some results obtained by a friend of mine: "at 320 kbps with LAME, 98% is fine, and at 128 kbps, 85-88% of the maximum level (100% = 0 dB)."
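These figures are given as percentages of full scale; converting them to dB makes them comparable with the -3 dB rule of thumb. The conversion uses the standard amplitude relation dB = 20 * log10(ratio); the function names are illustrative.

```python
import math

def percent_to_dbfs(percent: float) -> float:
    """Peak level as a percentage of full scale (100% = 0 dB) converted to dB."""
    return 20 * math.log10(percent / 100)

def dbfs_to_percent(db: float) -> float:
    """Inverse conversion: dB relative to full scale back to a percentage."""
    return 100 * 10 ** (db / 20)

for p in (98, 88, 85):
    print(f"{p}% of full scale = {percent_to_dbfs(p):.2f} dB")
print(f"-3 dB = {dbfs_to_percent(-3):.1f}% of full scale")
```

This prints -0.18 dB, -1.11 dB and -1.41 dB for the three percentages, and 70.8% for -3 dB, so the usual -3 dB leaves more headroom than these measurements suggest is strictly needed.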

Is normalization necessary?

Practically never. Moreover, in light of the previous question, normalizing to a very high level (often 98% or even 100%) looks foolish.

So normalization is practically never appropriate for data from audio CDs; with other recordings it can be, but only when the signal level is too low, and only for the whole album at once.

In that case it is better to raise the level by a whole number of dB, since normalization is nothing but redigitization at a new signal level.
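Following this advice (normalize only quiet material, by a whole number of dB, and for the whole album), a gain calculation might look like the sketch below. It assumes 16-bit PCM tracks held as integer lists; the 90% default target and the function name are assumptions chosen for illustration, not recommendations from the article. Rounding down keeps the loudest album peak at or below the target.

```python
import math

FULL_SCALE = 32767  # 16-bit PCM

def album_gain_db(tracks: list[list[int]], target_percent: float = 90.0) -> int:
    """Whole-dB gain for the entire album: one value for all tracks, so their
    relative levels are preserved. Rounded down so the loudest album peak stays
    at or below target_percent of full scale; 0 if no boost is possible."""
    peak = max((abs(s) for track in tracks for s in track), default=0)
    if peak == 0:
        return 0
    headroom_db = 20 * math.log10(target_percent / 100 * FULL_SCALE / peak)
    return max(0, math.floor(headroom_db))

quiet_album = [[8000, -100, 2500], [4000, -6000]]
print(album_gain_db(quiet_album))  # 11
```

Computing one gain from the album's overall peak, rather than normalizing each track separately, is what keeps quiet and loud songs in their original proportion.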

Details of compression process

As the name suggests, this part covers the choice of coder parameters, bitrate and so on.

Is it necessary to disable psychoacoustics (-f)?

I don't think so. Look how fast LAME is developing: I tested version 3.24, while the current version today is 3.87. In the old version I really did hear a difference between a file encoded with psychoacoustics enabled and one with it disabled, and I liked the latter more. According to mp3 coding theory that should not happen, since psychoacoustics is an essential part of the compression algorithm, so it may have been a bug in those versions. The bug has since been fixed. But it is still up to you to decide which is better.

Which stereo coding mode is better: stereo or joint stereo?

It depends. Note that some recordings have a phase shift between the channels, which makes joint stereo unsuitable. However, there is special software that corrects such a shift.

Note that recent versions of LAME can automatically choose stereo or joint stereo for each frame.
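Joint stereo in mp3 can code the channels as mid and side, M = (L+R)/sqrt(2) and S = (L-R)/sqrt(2). The sketch shows why a phase shift between channels hurts: an in-phase tone leaves the side channel empty, while the same tone shifted by 90 degrees puts as much energy into side as into mid, so there is nothing to gain. The per-frame rule and its 0.25 threshold are hypothetical and are not LAME's actual decision logic; 1152 is the number of samples per channel in an MPEG-1 Layer III frame.

```python
import math

def mid_side(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Mid/side transform as used by MPEG-1 Layer III M/S stereo."""
    r2 = math.sqrt(2)
    mid = [(lv + rv) / r2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / r2 for lv, rv in zip(left, right)]
    return mid, side

def energy(x: list[float]) -> float:
    return sum(v * v for v in x)

def choose_mode(left: list[float], right: list[float], threshold: float = 0.25) -> str:
    """Hypothetical per-frame rule (not LAME's): use joint (M/S) stereo only
    when the side channel carries little energy relative to mid."""
    mid, side = mid_side(left, right)
    e_mid = energy(mid)
    if e_mid == 0.0:
        return "stereo"
    return "joint" if energy(side) / e_mid < threshold else "stereo"

N = 1152  # samples per channel in one MPEG-1 Layer III frame
phase = [2 * math.pi * 440 * i / 44100 for i in range(N)]
left = [math.sin(p) for p in phase]
right_in_phase = [math.sin(p) for p in phase]
right_shifted = [math.sin(p + math.pi / 2) for p in phase]  # 90 degrees out of phase

print(choose_mode(left, right_in_phase))  # joint: side channel is empty
print(choose_mode(left, right_shifted))   # stereo: side as strong as mid
```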

Should variable bitrate (VBR) be used?

The problem is that VBR requires its own psychoacoustic model to control how the bitrate changes. Earlier coders used CBR (Constant BitRate) and followed the principle "provide maximum quality while packing the data into a stream of fixed width". VBR means something different: "provide a given quality level using a stream of minimum width", which is why compression algorithms for VBR have to be designed anew. LAME is the most developed prototype of such an algorithm. Neither Xing nor Fraunhofer has achieved proper results with VBR: their algorithms vary the bitrate within only 10-15% of the base value, which is not satisfactory.
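The difference between the two principles can be illustrated with made-up numbers. Assume, purely hypothetically, that for each frame we know the bitrate needed to reach the target quality (a real coder has no such exact figure, which is the difficulty described in the next paragraph). CBR spends the same bitrate on every frame; VBR picks the smallest standard MPEG-1 Layer III bitrate that meets each frame's need. The bit reservoir and other details of real coders are ignored.

```python
# Standard MPEG-1 Layer III bitrates in kbit/s (free format not included).
BITRATES = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]

def cbr_frames(needed_kbps: list[float], kbps: int) -> list[int]:
    """CBR: every frame gets the same bitrate, whatever it needs."""
    return [kbps] * len(needed_kbps)

def vbr_frames(needed_kbps: list[float]) -> list[int]:
    """VBR: each frame gets the smallest standard bitrate that meets its
    (hypothetical) need, capped at the highest one."""
    return [next((b for b in BITRATES if b >= need), BITRATES[-1])
            for need in needed_kbps]

# Hypothetical per-frame needs: near-silence, speech, dense passages.
needed = [20, 70, 150, 300, 90]
cbr = cbr_frames(needed, 128)
vbr = vbr_frames(needed)
print(vbr)  # [32, 80, 160, 320, 96]
starved = [n for n, b in zip(needed, cbr) if b < n]
print(f"CBR 128 under-serves {len(starved)} of {len(needed)} frames")
print(f"VBR average: {sum(vbr) / len(vbr):.1f} kbit/s")
```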

The difficulty is that there is still no exact mathematical model of the human ear; psychoacoustic algorithms are developed experimentally. That makes it hard to build an algorithm that takes a "quality level" as input, since it is still unclear how that quality level depends on the other algorithm parameters, including bitrate. Of course, for someone far from this technology it is easier to deal with a parameter that directly sets the quality level than with internal algorithm parameters.

An alternative, ABR (Average BitRate), is implemented in LAME. In essence it is VBR built on the improved old CBR algorithm: when the quality drops below a certain threshold, the bitrate rises; when the signal being coded is fairly simple, the bitrate falls, so that the specified bitrate is maintained on average. The result is an ordinary VBR file. As the parameter you specify not an abstract quality level but the average bitrate you want to obtain, and many people used to mp3 CBR will find this the most convenient.
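The ABR idea, steering toward an average bitrate rather than a quality level, can be caricatured in a few lines. This is not LAME's ABR algorithm: the per-frame "complexity" values are hypothetical stand-ins for a coder's own quality estimates. They are scaled so their mean hits the requested average, and each frame is then snapped to the nearest standard bitrate, so in general the average is met only approximately.

```python
BITRATES = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]

def abr_frames(complexity: list[float], target_kbps: float) -> list[int]:
    """Toy ABR (not LAME's algorithm): scale hypothetical per-frame complexity
    so its mean equals target_kbps, then snap each frame to the nearest
    standard bitrate."""
    mean = sum(complexity) / len(complexity)
    if mean <= 0:
        raise ValueError("complexity values must have a positive mean")
    wanted = [c / mean * target_kbps for c in complexity]
    return [min(BITRATES, key=lambda b, w=w: abs(b - w)) for w in wanted]

frames = abr_frames([0.5, 1.0, 1.5, 1.0], 128)
print(frames, sum(frames) / len(frames))  # [64, 128, 192, 128] 128.0
```

The user-facing parameter here is the familiar average bitrate, which is exactly what makes ABR comfortable for people used to CBR.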

It should also be noted that some players cannot play VBR correctly. For example, NAD, considered a very high-quality player, produces horrible distortions on such files, because when it was created no coder used VBR. Some hardware mp3 players have the same problem.

Nevertheless, it is clear that the principle underlying VBR is much more convenient in practice. All we are waiting for now is hardware implementations of the algorithms that create VBR files.

Where can I get LAME for Windows?

See "References".

How do you encode sound yourself?

With LAME, at 128 kbps CBR with psychoacoustics enabled.

References

In this part I list the Internet sites that I consider most important. Some of them contain extensive lists of links as well.

Apollo (mp3, mp2 and wav player).

Lossless audio compression programs: RKA, MonkeyAudio.

Windows front end for the LAME coder: RazorLame.

What is new on the Net on this topic?

Digital sound quality measurements...
An interesting site devoted mainly to LAME, but it also contains a lot of information on mp3 in general.

Author: why I decided to test coders

A couple of years ago I had the idea of building a record library on CD-R in mp3. From then on, coding technology fascinated me. After reading hundreds of kilobytes of articles on the Net and talking with other enthusiasts, it seemed to me there was too little information to choose a coder and a bitrate for myself, for my collection. So I immediately set out to experiment. For my own needs it was enough to try a few coders based on the Fraunhofer code and a few based on ISO. As for Xing, I quickly figured out what it was: listening once to what it did at 320 kbps was enough. I wanted to find a decent coder based on Xing code. I failed :)

Then another question arose: how to test? At first I wanted to subtract one wav from another (the wav decoded from mp3 and the wav taken from the reference CD), but I was afraid of problems with offset. So I settled on comparing frequency response curves, averaged over 30 seconds, which should compensate for differences caused by offset. I used the first 30 seconds of each composition, which should have shown how a coder works over a wide dynamic range. Of course, this kind of averaging only reveals certain tendencies; a proper comparison is not possible this way.

The next question was: the frequency response of which stereo channel? I decided to use all of them: the right channel, the left one and even their average. Then came the technical work.
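The measurement described above (frequency response averaged over the first 30 seconds, per channel and for the average of the two) can be sketched as follows. This is not the author's program, which worked on CoolEdit graphs. It assumes the reference and decoded signals are already available as lists of float samples per channel at the same sample rate (file loading and mp3 decoding are not shown), and uses a Hann window with a small radix-2 FFT to stay dependency-free; all function names are hypothetical. Averaging magnitudes over many frames is what makes the result insensitive to small offsets between the two files.

```python
import cmath
import math

def fft(a: list[complex]) -> list[complex]:
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def average_spectrum_db(x: list[float], frame: int = 1024) -> list[float]:
    """Hann-windowed magnitude spectrum averaged over all whole frames of x, in dB."""
    n_frames = len(x) // frame
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    acc = [0.0] * (frame // 2)
    for f in range(n_frames):
        spec = fft([x[f * frame + i] * window[i] for i in range(frame)])
        for k in range(frame // 2):
            acc[k] += abs(spec[k])
    return [20 * math.log10(a / n_frames + 1e-12) for a in acc]

def response_difference(ref, dec, rate=44100, seconds=30, frame=1024):
    """Per-bin dB difference (decoded minus reference) over the first `seconds`."""
    n = rate * seconds
    r = average_spectrum_db(ref[:n], frame)
    d = average_spectrum_db(dec[:n], frame)
    return [dv - rv for rv, dv in zip(r, d)]

def compare_stereo(ref_l, ref_r, dec_l, dec_r, **kw):
    """Difference curves for the left channel, the right channel, and their average."""
    left = response_difference(ref_l, dec_l, **kw)
    right = response_difference(ref_r, dec_r, **kw)
    avg = [(a + b) / 2 for a, b in zip(left, right)]
    return left, right, avg
```

A curve near 0 dB means the coder preserved that band; strongly negative values at the top of the spectrum show where highs were cut. Bin k corresponds to the frequency k * rate / frame.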

I measured the frequency response with CoolEdit.

And what did I do with thousands of graphics files with frequency response curves? Well, I wrote a program that processed those images.

So that was only the beginning of my review of software mp3 coders.

And my next test...

I don't think it will appear in the near future, especially now that I am keen on keeping audio without losses. I have become more of an analyst than a tester: I collect information on other people's tests and their opinions, then analyze it and draw my own conclusions.
