Over just the past decade or so, there has been explosive growth in the number of portable devices that incorporate miniature speakers: cell phones, MP3 players, GPS systems, laptops and notebook computers, tablet computers, gaming devices, toys, and so forth. As consumer tastes become more refined, the demand for higher-quality sound reproduction in these devices is growing as well.
This presents product manufacturers with ever-increasing challenges in coaxing louder, higher-quality sound out of these small, lightweight, inexpensive speakers. This paper addresses some of the methods portable-electronics makers can and do employ to meet these goals.
Equalization
The ideal loudspeaker (of any size) would have a “flat” frequency response; that is, it would transmit sound to the surrounding air at the same level at all frequencies from 20Hz to 20kHz, without peaks or dips in its amplitude response. In practice, no speaker achieves this, and the harder one tries to flatten a speaker’s response, the greater the cost and complexity. Figure 1 shows the response of a cell-phone speaker.
Because the speakers used in portable devices are both low in cost and simple in construction, they lack the sophistication needed for a truly flat response, and they inherently show fairly large variations in output level across the audio spectrum, most notably at frequencies below a few hundred Hz.
To compensate for this unevenness, a speaker’s spectral response can be measured and characterized, then corrected with equalization (filtering) circuitry whose frequency response complements it. At frequencies where the speaker attenuates the audio signal, the equalizer boosts the signal proportionately; likewise, where the speaker’s response has audible peaks, the equalizer attenuates the signal. The result is a flatter perceived output.
There are at least two drawbacks to equalization. First, it adds complexity to the system: the more uneven the speaker’s response, the more involved the equalization scheme. DSP can implement the equalization curve effectively, but at some cost in silicon area and power consumption.
Second, physical limitations in the speaker make flatness across the entire audio spectrum impossible, even with equalization. A single-element transducer such as a cell-phone speaker, whose effective driver diameter is often one inch or less, cannot deliver useful, audible energy across the full audio band. (This is why two-way and three-way speakers are common in home and automotive stereo gear and in public-address systems.)
This is most obvious at low frequencies. A diaphragm this small simply cannot couple low-frequency energy to the air effectively, and boosting the signal at low frequencies to compensate would push the speaker past its physical and thermal limits. Thus, even with equalization, low-frequency (bass) response in portable electronics is generally lacking.
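The complementary-response idea can be sketched as a per-band calculation. This is a minimal illustration, not a production equalizer design: the function name `correction_gains`, the measured values, and the 6 dB boost cap are all hypothetical. The ideal correction is simply the negated speaker response, clamped so the equalizer never demands more boost than the driver can safely deliver.

```python
def correction_gains(measured_db, max_boost_db=6.0, max_cut_db=12.0):
    """Per-band EQ gains (dB) that complement a measured speaker response.

    measured_db maps band-center frequency (Hz) to the speaker's level in dB
    relative to a reference band. The ideal correction is the negated
    response; boosts are capped (hypothetical limit) so the EQ never asks
    the driver for more than it can safely deliver.
    """
    gains = {}
    for freq_hz, level_db in measured_db.items():
        ideal = -level_db                       # boost dips, cut peaks
        gains[freq_hz] = max(-max_cut_db, min(max_boost_db, ideal))
    return gains


# Hypothetical measurement of a small speaker, relative to its 1 kHz level.
measured = {200: -15.0, 500: -4.0, 1000: 0.0, 3000: 5.0, 8000: -2.0}
gains = correction_gains(measured)
corrected = {f: measured[f] + gains[f] for f in measured}
```

With these numbers every band except 200 Hz ends up flat, while the 200 Hz band stays 9 dB down because the boost cap stops the equalizer short of full correction. That is the low-frequency limitation the text goes on to describe.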
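A rough way to see why bass boost runs into the speaker’s excursion limit: under a simple piston model (an assumption here, valid for a source small compared with the wavelength and well below the driver’s breakup region), radiated sound pressure is proportional to cone acceleration, which scales as f² times excursion. Holding sound pressure constant therefore requires excursion to grow as 1/f², about 12 dB (a factor of four) per octave downward. The function name below is illustrative.

```python
def relative_excursion(freq_hz, ref_hz=1000.0):
    """Cone excursion needed at freq_hz, relative to ref_hz, for equal SPL.

    Simple piston model (assumption): radiated pressure is proportional to
    cone acceleration, i.e. to f**2 * x, so holding SPL constant requires
    excursion x proportional to 1 / f**2.
    """
    return (ref_hz / freq_hz) ** 2


# Under this model, reproducing 100 Hz at the same level as 1 kHz needs
# 100 times the cone travel; each octave down quadruples it.
ratio_100hz = relative_excursion(100.0)
ratio_500hz = relative_excursion(500.0)
```

In a real driver, the response below its resonance and the enclosure change the details, but the steep growth in required excursion is why equalizing the bass flat is not an option for a tiny diaphragm.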
Synthetic (psychoacoustic) bass enhancement
As described above, the tiny speakers in handheld devices cannot deliver low frequencies well, so the bass portion of the program material suffers. There are, however, methods for synthetically introducing elements into the sound that make the bass frequencies seem to be present.
These methods trick the human auditory system. They generate overtones of the bass frequencies that the small speaker cannot reproduce, choosing overtones that lie in the range the speaker can reproduce, and insert them into the audio stream. At least two principles of human hearing make this possible: the “difference tone” and the “missing fundamental” (or “residue pitch”) [see Reference].
Figure 2: Difference tone. Fo is missing but is implied by 2Fo and 3Fo.
With the difference tone, a bass note can be simulated by mixing a tone one octave above the intended note with a tone a musical fifth above that octave tone (see Figure 2). The two tones sit at 2Fo and 3Fo, and their difference is the intended fundamental Fo. For example, to synthesize the C three octaves below middle C, one could mix the C two octaves below middle C with the G just above it.
Difference tones have been used in pipe and electronic organs for many years to avoid the need for very long pipes or very large speaker systems. With the missing fundamental, the series of naturally occurring overtones of an instrumental note can “imply” the fundamental to the ear even when the fundamental itself is absent (see Figure 3).
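The organ example can be checked with a little arithmetic. This sketch assumes equal temperament referenced to A4 = 440 Hz; the helper name `note_hz` is illustrative. It shows that the difference between the G and the C lands close to the intended low C, and exactly on it when the fifth is a pure 3:2 ratio (true 3Fo).

```python
A4_HZ = 440.0  # concert-pitch reference (assumption: equal temperament)


def note_hz(semitones_from_a4):
    """Equal-tempered frequency of the note n semitones from A4."""
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


middle_c = note_hz(-9)              # C4, about 261.6 Hz
target = middle_c / 8.0             # C1, three octaves down, about 32.7 Hz (Fo)
octave_tone = middle_c / 4.0        # C2, one octave above the target (2Fo)
fifth_tone = note_hz(-9 - 24 + 7)   # G2, a fifth above C2, about 98.0 Hz (near 3Fo)

# The perceived difference tone sits at the gap between the two tones.
difference = fifth_tone - octave_tone
just_difference = 1.5 * octave_tone - octave_tone  # with an exact 3:2 fifth
```

With the equal-tempered G the difference comes out about 0.1 Hz below the target, because the tempered fifth is slightly narrower than 3:2.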
Figure 3: Missing fundamental. Fo is missing but is implied by its harmonic signature.
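One way to see why a harmonic series “implies” its fundamental: harmonics of Fo repeat with period 1/Fo even when Fo itself carries no energy, and a periodicity detector such as autocorrelation finds that period. The sketch below is a crude stand-in for the ear, not a model of it; the 200 Hz fundamental, the choice of harmonics 2 through 5, and the function names are illustrative assumptions.

```python
import math


def harmonic_complex(f0, harmonics, fs, n):
    """Sum of equal-amplitude cosines at the given harmonic numbers of f0."""
    return [sum(math.cos(2 * math.pi * h * f0 * i / fs) for h in harmonics)
            for i in range(n)]


def autocorr_pitch(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch as fs / (lag with the largest raw autocorrelation)."""
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    best_lag, best_r = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


fs = 8000
# Harmonics at 400, 600, 800 and 1000 Hz; nothing at 200 Hz itself.
tone = harmonic_complex(200.0, [2, 3, 4, 5], fs, 800)
estimate = autocorr_pitch(tone, fs)
```

The estimate comes out at the absent 200 Hz fundamental, since the waveform repeats every 40 samples (1/200 s) regardless of whether 200 Hz is present in the spectrum.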
Either method can generate tones within a small speaker’s passband that imply or synthesize notes below the range the speaker can actually reproduce, enhancing the apparent low-frequency response. To do this, the low end of the audio spectrum is split off from the main signal path, and nonlinear processing is applied to it to produce overtones as described above.
The resulting synthesized bass is reintroduced into the signal path and fed to the speaker. Drawbacks of this method include unpredictable results and audible artifacts arising from nonlinear processing of complex program material or highly dynamic sources (such as impulsive sounds).
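The split, nonlinear-process, and recombine flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular product: the first-order filters are crude, the 150 Hz crossover and mix level are arbitrary, and full-wave rectification (`abs`) is just one commonly cited choice of nonlinearity. For a single sinusoid it yields only even harmonics (2f, 4f, ...), with no energy at the fundamental itself. All function names are hypothetical.

```python
import cmath
import math


def one_pole_lowpass(x, fs, cutoff_hz):
    """First-order IIR low-pass (crude, but enough for a sketch)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, state = [], 0.0
    for s in x:
        state = (1.0 - a) * s + a * state
        y.append(state)
    return y


def dc_block(x, r=0.995):
    """First-order DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        prev_y = s - prev_x + r * prev_y
        prev_x = s
        y.append(prev_y)
    return y


def enhance_bass(x, fs, crossover_hz=150.0, mix=0.5):
    """Replace bass the speaker cannot play with harmonics it can play."""
    low = one_pole_lowpass(x, fs, crossover_hz)
    high = [s - l for s, l in zip(x, low)]      # main path with the lows removed
    # Full-wave rectification: for a single bass tone, its even harmonics.
    harmonics = dc_block([abs(l) for l in low])
    # Tame the rectifier's upper harmonics so the added content stays bass-like.
    harmonics = one_pole_lowpass(harmonics, fs, 4.0 * crossover_hz)
    return [h + mix * g for h, g in zip(high, harmonics)]


def tone_magnitude(x, fs, freq_hz):
    """Amplitude of one frequency component (single-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / fs)
              for n, s in enumerate(x))
    return 2.0 * abs(acc) / len(x)


fs = 8000
bass = [math.sin(2 * math.pi * 80.0 * n / fs) for n in range(fs)]  # 80 Hz, 1 s
out = enhance_bass(bass, fs)
```

For the 80 Hz test tone, the output carries a new 160 Hz component, above the assumed crossover, while the 80 Hz fundamental itself is reduced. Feeding complex program material through the same rectifier also produces intermodulation products, which is one source of the artifacts mentioned above.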
Compression
The tiny speakers used in today’s portable electronics are limited not only in frequency range but also in absolute loudness. The loudness limitation stems not only from the small size of the vibrating element that couples energy to the air, but also from the maximum excursion the element can tolerate: it can only travel so far before it hits a physical stop or damages its suspension.
One way to increase the average perceived loudness without overextending or damaging the speaker is compression. Compression circuitry continuously monitors the instantaneous level of the audio signal, raising the gain for quieter passages while leaving louder material more or less unchanged. This happens very rapidly, following the loudness envelope of the program material, with a relatively smooth compression characteristic.
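Following the loudness envelope typically relies on a level detector with separate attack and release times. The sketch below is a simple peak-envelope follower, a common stand-in for perceived loudness rather than a true loudness model. The function name and the 1 ms attack and 50 ms release values are illustrative assumptions, not figures from this article.

```python
import math


def envelope_db(x, fs, attack_ms=1.0, release_ms=50.0, floor_db=-120.0):
    """Peak-envelope follower: fast attack, slower release, output in dB.

    Each sample, the envelope moves toward |x| through a one-pole smoother
    whose time constant is attack_ms while the level is rising and
    release_ms while it is falling.
    """
    att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    floor = 10.0 ** (floor_db / 20.0)   # avoids log10(0) during silence
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        out.append(20.0 * math.log10(max(env, floor)))
    return out


# A 0.5-amplitude burst followed by a few samples of silence.
levels = envelope_db([0.5] * 1000 + [0.0] * 8, 8000)
```

The fast attack lets the gain react before a loud onset can overdrive the system, while the slower release keeps the gain from pumping audibly between individual waveform peaks.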
Figure 4: Dynamic range compression.
Figure 4 shows how the level of soft passages is lifted substantially, while the maximum output level is the same for the compressed and uncompressed curves (their intersection point), preventing overdriving of the system. The average apparent loudness of the compressed signal is substantially higher than that of the uncompressed signal.
In this example, the compression ratio above the knee is roughly 2:1 (a 2dB change in the input produces only a 1dB change in the output). Below the knee, the slope is 1:1, which sets a maximum amount of boost; this eases the overall gain requirement on the compression circuitry while still giving softer signals a significant amount of “lift.”
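The static curve described for Figure 4 can be written down directly. In this sketch the −20 dB knee, 2:1 ratio, and 0 dB maximum are assumed values chosen to match the description, and the function name is illustrative. Requiring the curve to meet the uncompressed line at the maximum level fixes the boost below the knee at (max − knee)(1 − 1/ratio), which is 10 dB here.

```python
def compressed_level_db(input_db, knee_db=-20.0, ratio=2.0, max_db=0.0):
    """Static upward-compression curve in the style of Figure 4.

    Below the knee the slope is 1:1 with a fixed boost; above the knee the
    slope is 1/ratio, and the curve meets the uncompressed line at max_db,
    so the peak output level is unchanged.
    """
    boost_db = (max_db - knee_db) * (1.0 - 1.0 / ratio)
    if input_db <= knee_db:
        return input_db + boost_db
    return knee_db + boost_db + (input_db - knee_db) / ratio
```

At any instant, the gain to apply is `compressed_level_db(level) - level`, where `level` is an estimate that follows the signal envelope. With these settings a −40 dB passage is lifted to −30 dB, a 0 dB peak stays at 0 dB, and above the knee a 2 dB input change moves the output by 1 dB.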
Aural enhancement (high harmonic addition)
A few decades ago, recording-studio equipment appeared that could “excite” the perceived sound of music. The goal was to brighten the sound by doing more than simply raising the gain of the upper frequencies (turning up the treble). As noted in the section on synthetic bass enhancement, the human hearing system can find pleasing what might otherwise be considered distortion of the original material, and that trait is put to use here.
In particular, many say that a very gentle addition of even-order harmonics adds “warmth” to the sound of amplified music; vacuum-tube systems are well known for this. In aural enhancement, only the high end of the audio spectrum (for example, 1kHz and above) is split off from the signal path, even-order harmonics are generated from it in a controlled amount, and the result is mixed back into the audio stream in an adjustable amount.
This effect adds “sizzle” or a “crystalline” character to the sound that many find pleasing, depending on the listening material. Also, since the effect is in the mid- to high-frequency area where the ear is more sensitive, the program material also appears to become somewhat louder.
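The enhancement path can be sketched in the same split-and-recombine style. This is a minimal illustration only: squaring is one simple even-order nonlinearity (for a single tone it yields the second harmonic plus DC), the second high-pass removes that DC and any low-frequency difference products, and the names and parameter values (`excite`, the 1 kHz split, `amount`) are hypothetical.

```python
import math


def one_pole_highpass(x, fs, cutoff_hz):
    """First-order high-pass: the input minus a one-pole low-pass of it."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, low = [], 0.0
    for s in x:
        low = (1.0 - a) * s + a * low
        y.append(s - low)
    return y


def excite(x, fs, split_hz=1000.0, drive=1.0, amount=0.1):
    """Add a controlled amount of even-order harmonics of the high band."""
    high = one_pole_highpass(x, fs, split_hz)
    # Squaring: the 2nd harmonic (even order) of each high-band tone, plus DC.
    even = [drive * h * h for h in high]
    # Drop the DC term and any products that fall below the split frequency.
    even = one_pole_highpass(even, fs, split_hz)
    return [s + amount * e for s, e in zip(x, even)]


fs = 16000
x = [math.sin(2 * math.pi * 2000.0 * n / fs) for n in range(fs)]  # 2 kHz, 1 s
y = excite(x, fs, amount=0.5)
```

A 2 kHz input gains a new component at 4 kHz, its second harmonic, while the original signal passes through unchanged; `amount` plays the role of the adjustable mix described above.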
Soft clipping
Many portable audio devices include techniques that keep their amplifiers from being overdriven into saturation (clipping), a condition that can harm the speaker and at the very least produces a crackly, unpleasant sound. Even with such protection in place, however, audio levels that exceed the amplifier’s output range can still occur.
One way to soften the audible consequences of saturation is soft clipping, a technique that senses when the amplifier’s output voltage is approaching its limit (the positive or negative rail) and rounds off the waveform to prevent sharp-edged, hard hits against the rails. This reduces the high-frequency energy that flat-topped or sharply cropped waveforms would otherwise produce, easing the unpleasant crackling and limiting the excess high-frequency energy sent to the speaker.
Figure 5: Soft clipping.
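A soft-clipping transfer curve can be sketched as a per-sample function. In this sketch, samples below a knee pass unchanged, and above it a tanh segment maps the remaining input range smoothly onto the space between the knee and the rail. The 0.7 knee, the normalized rail of 1.0, and the function names are illustrative assumptions; real amplifiers implement the rounding in analog or mixed-signal circuitry.

```python
import math


def soft_clip(x, knee=0.7, rail=1.0):
    """Pass samples unchanged below the knee; round them off above it.

    Above the knee, a tanh segment approaches the rail asymptotically. Both
    the curve and its slope are continuous at the knee, so there is no
    sharp corner to generate extra high-frequency energy.
    """
    mag = abs(x)
    if mag <= knee:
        return x
    headroom = rail - knee
    shaped = knee + headroom * math.tanh((mag - knee) / headroom)
    return math.copysign(shaped, x)


def hard_clip(x, rail=1.0):
    """Conventional clipping: flat tops at the rails, for comparison."""
    return max(-rail, min(rail, x))
```

Driving a sine wave to 1.5 times the rail through `hard_clip` produces flat tops with abrupt corners, while `soft_clip` keeps the peaks rounded. The price is some gentle compression of signal peaks that lie between the knee and the rail.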
Speaker protection
With all the efforts made to maximize the perceived loudness of the sound emanating from portable device speakers, care must be taken to avoid damaging the speaker itself. These little transducers can only take so much. There are two primary areas of concern regarding speaker protection: maximum diaphragm excursion and maximum voice-coil temperature.
Figure 6 shows a cross-sectional view of a typical loudspeaker. One can clearly see physical limitations on the movement of the diaphragm, particularly in the downward direction. No audio signal should be allowed to become strong enough to cause the vibrating elements to come into contact with the fixed chassis assembly, or to cause the suspension materials (surround ring or spider) to become overstressed.
Additionally, the RMS value of the audio signal should not be allowed to become large enough to cause significant heating in the voice coil. Excess heat in the coil can lead to distortion of the circular shape of the coil former, leading to scraping along the edges of the magnet or pole pieces. Also, high temperatures in the coil can cause its electrical insulation to deteriorate, eventually allowing turns of the coil to short together, dropping the coil impedance and overburdening the amplifier. High voice-coil temperature can also lead to heating of the permanent magnet, potentially causing demagnetization.
Techniques used to prevent speaker damage include automatic gain control (AGC) responsive to the input-signal amplitude and/or the power-supply voltage, dynamic range compression (discussed above), hard limiting, soft clipping, and amplifier-output overcurrent sensing. Their drawback is that they are feed-forward methods with no means of sensing the actual cone excursion, voice-coil temperature, or speaker impedance (which changes with voice-coil temperature). More sophisticated protection schemes, such as thermal feedback, will likely become available in the future; for now, standard practice relies on one or more of the mechanisms mentioned above.
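As an example of the feed-forward approach, an AGC can watch a running RMS estimate of the signal, a rough proxy for the power heating the voice coil, and pull the gain down when that estimate exceeds a limit. This is a minimal sketch; `thermal_agc`, the 0.5 RMS limit, and the 200 ms averaging window are hypothetical. As the text notes, it never measures the coil itself. It only assumes that limiting signal RMS keeps the coil cool enough.

```python
import math


def thermal_agc(x, fs, max_rms=0.5, window_ms=200.0):
    """Feed-forward gain control driven by a running RMS estimate.

    A one-pole average of x**2 approximates signal power over roughly
    window_ms; whenever its square root exceeds max_rms, the gain is scaled
    down so the estimated output RMS stays near max_rms.
    """
    alpha = math.exp(-1.0 / (fs * window_ms / 1000.0))
    mean_sq, out = 0.0, []
    for s in x:
        mean_sq = alpha * mean_sq + (1.0 - alpha) * s * s
        rms = math.sqrt(mean_sq)
        gain = 1.0 if rms <= max_rms else max_rms / rms
        out.append(gain * s)
    return out


fs = 8000
quiet = [0.1 * math.sin(2 * math.pi * 100.0 * n / fs) for n in range(fs)]
loud = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(2 * fs)]
limited = thermal_agc(loud, fs)
```

Quiet material passes untouched, while a sustained full-scale tone settles to an output RMS near the limit. Because the averaging window is long, a sudden loud onset passes at full level until the estimate catches up. This is one reason such schemes are usually combined with faster mechanisms such as limiting or soft clipping.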
Reference
"Reproducing Low-Pitched Signals through Small Loudspeakers," Journal of the Audio Engineering Society, vol. 50, no. 3, p. 147. http://www.aes.org/tmpFiles/JAES/20110422/JAES_V50_3_PG147.pdf