Details of MP3 audio
Because MP3 is a lossy format, it can offer a range of options for its "bit
rate", that is, the number of bits of encoded data used to represent each second
of audio. Rates typically chosen are between 128 and 320 kilobits per second
(kbit/s). By contrast, uncompressed audio as stored on a compact disc has a bit
rate of about 1400 kbit/s.
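The figure of about 1400 kbit/s for compact disc audio can be checked with a
short calculation. The sketch below assumes the usual CD parameters (44.1 kHz
sampling, 16 bits per sample, two channels); these values and the function name
are supplied here for illustration and are not stated in the text above.

```python
# Assumed CD audio parameters (not given in the surrounding text).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo


def pcm_bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Return the bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_rate = pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(f"CD audio: {cd_rate:.1f} kbit/s")
```

Under these assumptions the result is 1411.2 kbit/s, which the text rounds to
about 1400 kbit/s.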
MP3 files encoded with a lower bit rate will generally play back at a lower
quality. With too low a bit rate, "compression artifacts" (i.e., sounds that
were not present in the original recording) may appear in the reproduction. A
good demonstration of compression artifacts is provided by the sound of
applause: it is hard to compress because it is random, so the failings of the
encoder are more obvious and are audible as ringing.
As well as the bit rate of the encoded file, the quality of MP3 files depends on
the quality of the encoder and the difficulty of the signal being encoded. For
average signals with good encoders, many listeners accept an MP3 bit rate of
128 kbit/s as near enough to compact disc quality, which gives a compression
ratio of approximately 11:1. CD audio properly compressed at this ratio is far
superior in quality to FM radio and cassette tape audio; to merely match the
quality of those media, MP3 files could be compressed at a ratio greater than
20:1. However, listening tests show that with a bit of practice many listeners
can reliably distinguish 128 kbit/s MP3s from CD originals, in many cases to the
point where they consider the MP3 audio to be of unacceptably low quality. Yet
other listeners, and the same listeners in other environments (such as in a
noisy moving vehicle or at a party), will consider the quality acceptable.
Imperfections in an MP3 encode will be much less apparent on low-end computer
speakers than on a good stereo system connected to a computer or, especially,
on high-quality headphones.
Fraunhofer Gesellschaft (FhG) publish on their official webpage the following
compression ratios and data rates for MPEG-1 Layer 1, 2 and 3, intended for
comparison:
Layer 1: 384 kbit/s, compression 4:1
Layer 2: 192...256 kbit/s, compression 8:1...6:1
Layer 3: 112...128 kbit/s, compression 12:1...10:1
The differences between the layers are caused by the different psychoacoustic
models they use; the Layer 1 algorithm is typically substantially simpler, so a
higher bit rate is needed for transparent encoding. However, as
different encoders use different models, it is difficult to draw absolute
comparisons of this kind.
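The compression ratios quoted in the Fraunhofer list can be approximately
reproduced by dividing the bit rate of CD audio by the encoded bit rate. The
following is a minimal sketch; the reference rate of 1411.2 kbit/s assumes
44.1 kHz, 16-bit stereo audio, and the function name is purely illustrative.

```python
# Assumed reference: CD audio at 44.1 kHz x 16 bit x 2 channels.
CD_BIT_RATE_KBPS = 1411.2


def compression_ratio(encoded_kbps: float, source_kbps: float = CD_BIT_RATE_KBPS) -> float:
    """Return how many times smaller the encoded stream is than the source."""
    return source_kbps / encoded_kbps


# Bit rates taken from the list above.
quoted_rates = {
    "Layer 1": [384],
    "Layer 2": [192, 256],
    "Layer 3": [112, 128],
}

for layer, rates in quoted_rates.items():
    parts = ", ".join(f"{r} kbit/s -> {compression_ratio(r):.1f}:1" for r in rates)
    print(f"{layer}: {parts}")
```

The computed values land close to the rounded ratios in the list, for example
roughly 11:1 for Layer 3 at 128 kbit/s.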
Many people consider these quoted rates as being heavily skewed in favour of
Layer 2 and Layer 3 recordings. They would contend that more realistic rates
would be as follows:
Layer 1: excellent at 384 kbit/s
Layer 2: excellent at 256...384 kbit/s, very good at 224...256 kbit/s, good at
192...224 kbit/s
Layer 3: excellent at 224...320 kbit/s, very good at 192...224 kbit/s, good at
128...192 kbit/s
When comparing compression schemes, it is important to use encoders that are of
equivalent quality. Tests may be biased against older formats in favour of new
ones by using older encoders based on out-of-date technologies, or even buggy
encoders for the old format. Because lossy encoding discards information, MP3
algorithms work hard to ensure that the discarded parts cannot be detected by
human listeners. They do this by modeling the general characteristics of human
hearing (e.g., noise masking). Different encoders may achieve this with varying
degrees of success.
A few possible encoders:
LAME: first created by Mike Cheng in early 1998. In contrast to the others, it
is a fully LGPL-licensed MP3 encoder with excellent speed and quality, rivaling
even MP3's technological successors.
Fraunhofer Gesellschaft: Some encoders are good, some have bugs.
Many early encoders that are no longer widely used:
ISO dist10 reference code
Xing
BladeEnc
ACM Producer Pro.
Good encoders produce acceptable quality at 128 to 160 kbit/s and
near-transparency at 160 to 192 kbit/s, while low-quality encoders may never
reach transparency, not even at 320 kbit/s. It is therefore misleading to speak
of 128 kbit/s or 192 kbit/s quality, except in the context of a particular
encoder or of the best available encoders. A 128 kbit/s MP3 produced by a good
encoder might sound better than a 192 kbit/s MP3 file produced by a bad encoder.
It is important to note that quality of an audio signal is subjective. A given
bit rate suffices for some listeners but not for others. Individual acoustic
perception may vary, so it is not evident that a certain psychoacoustic model
can give satisfactory results for everyone. Merely changing the conditions of
listening, such as the audio playing system or environment, can expose unwanted
distortions caused by lossy compression. The numbers given above are rough
guidelines that work for many people, but in the field of lossy audio
compression the only true measure of the quality of a compression process is to
listen to the results.
If your aim is to archive sound files with no loss of quality (or to work on the
sound files in a studio, for example), then you should use lossless compression
algorithms such as Lossless Audio (LA), Apple Lossless, FLAC, Windows Media
Audio 9 Lossless (WMA) and Monkey's Audio, among others. These are currently
capable of compressing 16-bit PCM audio to about 38% of its original size while
leaving the audio identical to the original. Lossless formats are strongly
preferred for material that will be
edited, mixed, or otherwise processed because the perceptual assumptions made by
lossy encoders may not hold true after processing. The losses produced by
multiple stages of coding may also compound each other, becoming more evident
when the signal is reencoded after processing. Lossless formats produce the best
possible result, at the expense of a lower compression ratio.
Some simple editing operations, such as cutting sections of audio, may be
performed directly on the encoded MP3 data without necessitating reencoding. For
these operations, the concerns mentioned above are not necessarily relevant, as
long as appropriate software (such as mp3DirectCut and MP3Gain) is used to
prevent extra decoding-encoding steps.
Bit rate
The bit rate of an MP3 file can be chosen at encoding time. The general rule is
that the higher the bit rate, the more information from the original sound file
is retained, and thus the higher the quality during playback. In the early days
of MP3 encoding, a fixed bit rate was used for the entire file.
Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128,
160, 192, 224, 256 and 320 kbit/s, and the available sample frequencies are 32,
44.1 and 48 kHz. 44.1 kHz is almost always used (it coincides with the sampling
rate of compact discs), and 128 kbit/s has become the de facto "good enough"
standard, although 192 kbit/s is becoming increasingly popular over peer-to-peer
file sharing networks. MPEG-2 and the unofficial MPEG-2.5 include some
additional bit rates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144 and
160 kbit/s.
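The bit rate and sample frequency tables above lend themselves to a simple
lookup. The sketch below checks whether a setting is standard for MPEG-1
Layer 3 and finds the closest standard bit rate to a requested value. The
function names are hypothetical and do not correspond to any particular
encoder's API.

```python
# Values copied from the lists in the text above.
MPEG1_LAYER3_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)
MPEG2_LAYER3_KBPS = (8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160)
MPEG1_SAMPLE_RATES_HZ = (32_000, 44_100, 48_000)


def is_standard_mpeg1_layer3(bit_rate_kbps: int, sample_rate_hz: int) -> bool:
    """True if both values appear in the MPEG-1 Layer 3 tables."""
    return bit_rate_kbps in MPEG1_LAYER3_KBPS and sample_rate_hz in MPEG1_SAMPLE_RATES_HZ


def nearest_standard_bit_rate(requested_kbps: float, table=MPEG1_LAYER3_KBPS) -> int:
    """Return the table entry closest to the requested rate.

    On an exact tie, the lower of the two candidates is returned.
    """
    return min(table, key=lambda rate: abs(rate - requested_kbps))


print(is_standard_mpeg1_layer3(128, 44_100))
print(is_standard_mpeg1_layer3(144, 44_100))
print(nearest_standard_bit_rate(150))
```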
Variable bit rates (VBR) are also possible. Audio in MP3 files is divided into
frames, each of which has its own bit rate, so it is possible to change the bit
rate dynamically as the file is encoded. Although not implemented in early
encoders, VBR is in extensive use today. This technique makes it possible to use
more bits for parts of the sound with higher dynamics (more sound movement) and
fewer bits for parts with lower dynamics, further increasing quality and
decreasing storage space. The method is comparable to a sound-activated tape
recorder that reduces tape consumption by not recording silence. Some encoders
utilize this technique to a great extent.
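Because every frame carries its own bit rate, the size of each frame and the
average bit rate of a VBR file can be estimated from the per-frame rates. The
sketch below assumes MPEG-1 Layer 3 frames of 1152 samples and the commonly
used frame length formula of 144 x bit rate / sample rate bytes (plus an
optional padding byte). The frame sequence is invented for illustration.

```python
SAMPLES_PER_FRAME = 1152  # assumed: MPEG-1 Layer 3


def frame_length_bytes(bit_rate_kbps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Length of one MPEG-1 Layer 3 frame in bytes, header included."""
    return 144 * bit_rate_kbps * 1000 // sample_rate_hz + padding


def average_bit_rate_kbps(frame_bit_rates_kbps) -> float:
    """Average bit rate of a stream.

    Every frame holds the same number of samples and therefore lasts the
    same time, so the plain mean of the per-frame rates is the average.
    """
    rates = list(frame_bit_rates_kbps)
    return sum(rates) / len(rates)


# Hypothetical VBR stream: quiet passage, then a busy one.
hypothetical_frames = [32, 32, 128, 192, 320, 128]
total_bytes = sum(frame_length_bytes(r, 44_100) for r in hypothetical_frames)
print(f"average: {average_bit_rate_kbps(hypothetical_frames):.1f} kbit/s")
print(f"total size: {total_bytes} bytes")
```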
Non-standard bit rates up to 640 kbit/s can be achieved with the LAME encoder
and its --freeformat option; however, only a few MP3 players can play those
files.
Design limitations of MP3
There are several limitations inherent to the MP3 format that cannot be overcome
by using a better encoder.
Newer audio compression formats such as Vorbis and AAC no longer have these
limitations.
In technical terms, MP3 is limited in the following ways:
Bitrate is limited to a maximum of 320 kbit/s
Time resolution can be too low for highly transient signals
No scale factor band for frequencies above 15.5/15.8 kHz
Joint stereo is done on a frame-to-frame basis
Encoder/decoder overall delay is not defined, which means there is no official
provision for gapless playback
Nevertheless, a well-tuned MP3 encoder can perform competitively even with these
restrictions.
Encoding of MP3 audio
The MPEG-1 standard does not include a precise specification for an MP3 encoder.
The decoding algorithm and file format, by contrast, are well defined.
Implementers of the standard were supposed to devise their own algorithms
suitable for removing parts of the information in the raw audio (or rather its
MDCT representation in the frequency domain). During encoding, 576 time-domain
samples are taken and transformed into 576 frequency-domain samples. If there
is a transient, 192 samples are taken instead of 576. This is done to limit the
temporal spread of quantization noise accompanying the transient.
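The effect of switching from 576 to 192 samples can be illustrated by
converting the block lengths to durations. This is only a rough sketch: it
assumes a 44.1 kHz sample rate and ignores the overlap between adjacent
transform windows, so the numbers indicate scale rather than exact noise
spread.

```python
LONG_BLOCK_SAMPLES = 576    # from the text above
SHORT_BLOCK_SAMPLES = 192   # used when a transient is detected


def block_duration_ms(samples: int, sample_rate_hz: int = 44_100) -> float:
    """Time spanned by a block of samples, in milliseconds."""
    return samples / sample_rate_hz * 1000


long_ms = block_duration_ms(LONG_BLOCK_SAMPLES)
short_ms = block_duration_ms(SHORT_BLOCK_SAMPLES)
print(f"long block:  {long_ms:.2f} ms")
print(f"short block: {short_ms:.2f} ms")
```

At 44.1 kHz a long block spans about 13 ms and a short block about 4.4 ms, so
quantization noise around a transient is confined to roughly a third of the
time.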
Deciding which parts of the information can be removed is the domain of
psychoacoustics: the study of human acoustic perception (in both the ear and
the brain).
As a result, there are many different MP3 encoders available, each producing
files of differing quality. Comparisons are widely available, so it is easy for
a prospective user of an encoder to research the best choice. It must be kept in
mind that an encoder that is proficient at encoding at higher bitrates (such as
LAME, which is in widespread use for encoding at higher bitrates) is not
necessarily as good at other, lower bitrates.
Decoding of MP3 audio
Decoding, on the other hand, is carefully defined in the standard. Most decoders
are "bitstream compliant", meaning that the uncompressed output they produce
from a given MP3 file will be the same (within a specified degree of rounding
tolerance) as the output specified mathematically in the standard document. An
MP3 file consists of a sequence of frames, each holding 384, 576, or 1152
samples (depending on MPEG version and layer). Every frame has associated
header information (32 bits) and side information (9, 17, or 32 bytes, depending
on MPEG version and stereo/mono). The header and side information help the
decoder to decode the associated Huffman-encoded data correctly.
Therefore, comparison of decoders is almost exclusively based on how
computationally efficient they are (i.e., how much memory or CPU time they use
in the decoding process).
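To make the 32-bit frame header more concrete, the following sketch decodes a
few of its fields for the MPEG-1 Layer 3 case only. The bit layout and lookup
tables follow the commonly documented MPEG audio header format, which is not
spelled out in the text above. The function name is hypothetical, and free
format streams and other MPEG versions or layers are deliberately rejected.

```python
import struct

# Index 0 means "free format" and index 15 is invalid; neither is handled here.
BITRATES_KBPS = (None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None)
SAMPLE_RATES_HZ = (44_100, 48_000, 32_000, None)
CHANNEL_MODES = ("stereo", "joint stereo", "dual channel", "mono")


def parse_mpeg1_layer3_header(header: bytes) -> dict:
    """Decode selected fields of a 4-byte MPEG-1 Layer 3 frame header."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    (word,) = struct.unpack(">I", header)
    if word >> 21 != 0x7FF:                      # 11 sync bits, all set
        raise ValueError("frame sync not found")
    if (word >> 19) & 0b11 != 0b11:              # version bits: 11 = MPEG-1
        raise ValueError("not MPEG-1")
    if (word >> 17) & 0b11 != 0b01:              # layer bits: 01 = Layer 3
        raise ValueError("not Layer 3")
    bit_rate = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(word >> 10) & 0b11]
    if bit_rate is None or sample_rate is None:
        raise ValueError("unsupported bit rate or sample rate index")
    padding = (word >> 9) & 1
    mode = CHANNEL_MODES[(word >> 6) & 0b11]
    return {
        "bit_rate_kbps": bit_rate,
        "sample_rate_hz": sample_rate,
        "channel_mode": mode,
        # MPEG-1 side information: 17 bytes for mono, 32 bytes otherwise.
        "side_info_bytes": 17 if mode == "mono" else 32,
        "frame_bytes": 144 * bit_rate * 1000 // sample_rate + padding,
    }


info = parse_mpeg1_layer3_header(bytes.fromhex("fffb9000"))
print(info)
```

A real decoder would also search the stream for the sync pattern and verify
that consecutive headers are consistent; this sketch only decodes one header
that is already aligned.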
ID3 and other tags
A "tag" is data stored in an MP3 file (as well as in other formats) that
contains metadata such as the title, artist, album, track number or other
information about the file. The most widespread standard tag formats are
currently the ID3v1 and ID3v2 tags, and the more recent APEv2 tag.
APEv2 was originally developed for the MPC file format (see the APEv2
specification). APEv2 can coexist with ID3 tags in the same file, but it can
also be used by itself.
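As an illustration of how simple a tag can be, the sketch below reads an ID3v1
tag, the oldest of the formats mentioned. It assumes the widely documented
ID3v1 layout (a 128-byte block at the end of the file, starting with "TAG" and
followed by fixed-width fields), which is not described in the text above. The
function name is hypothetical, and the ID3v1.1 track number extension is not
handled.

```python
from typing import Optional


def parse_id3v1(tag: bytes) -> Optional[dict]:
    """Parse a 128-byte ID3v1 block; return None if it is not a valid tag."""
    if len(tag) != 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes (or sometimes spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],          # index into a predefined genre list
    }


# Build an example tag in memory rather than reading a real file.
example = (
    b"TAG"
    + b"Example Title".ljust(30, b"\x00")
    + b"Example Artist".ljust(30, b"\x00")
    + b"Example Album".ljust(30, b"\x00")
    + b"2004"
    + b"".ljust(30, b"\x00")
    + bytes([12])
)
print(parse_id3v1(example))
```

To read a tag from a file, the last 128 bytes of the file would be passed to
the function.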
Volume normalization
As compact discs and other sources are recorded and mastered at different
volumes, it is useful to store volume information about a file in the tag so
that the volume can be adjusted dynamically at playback time.
A few standards for encoding the gain of an MP3 file have been proposed. The
idea is to normalize the volume (not the volume peaks) of audio files, so that
the volume does not change between consecutive tracks.
The most popular and widely used solution for storing replay gain is known
simply as "Replay Gain". Typically, the average volume and clipping information
about the audio track is stored in the metadata tag.
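At playback time, a stored gain value can be applied by scaling the decoded
samples. The sketch below assumes the gain is stored in decibels and the peak
as a fraction of full scale, with samples represented as floats between -1.0
and 1.0. These conventions and the function names are assumptions made for
illustration, not a description of any specific player.

```python
def gain_to_scale(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def apply_replay_gain(samples, gain_db: float, peak: float = 0.0):
    """Scale samples by the stored gain.

    If a peak value is supplied and the gain would push that peak above
    full scale (1.0), the gain is reduced so the peak lands exactly at 1.0.
    """
    scale = gain_to_scale(gain_db)
    if peak > 0 and scale * peak > 1.0:
        scale = 1.0 / peak
    return [s * scale for s in samples]


# Hypothetical values: a quiet track that should be raised by 6 dB,
# but whose peak of 0.8 limits how far it can go without clipping.
adjusted = apply_replay_gain([0.5, -0.5, 0.8], gain_db=6.0, peak=0.8)
print(adjusted)
```

Limiting the gain by the stored peak is one simple way to use the clipping
information; a player could instead apply a limiter or leave the choice to the
listener.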
