Optimized Architecture®

The latest DSP equipment can handle very large signal routing matrices and mixing, plus a myriad of features, functions and control options. Organizing all this into an intuitive interface is a challenge, to say the least. ASPEN processors and the GUI present this complexity in a simple manner that maximizes the DSP power, eliminates the need to compile and download saved files to the hardware, and provides ALL available signal processing on all inputs, crosspoints and outputs at all times. Connections to the hardware operate in real time, so changes and settings take place immediately while the system is running.

Available Signal Processing

Optimized Architecture uses the full power of the available DSP resources to provide features and functions without wasting DSP power on mundane tasks like managing a constructed signal flow that may or may not be optimal.
The architecture and DSP firmware are characterized by the following features:

Rich signal processing tools in all models:

  • Clipping detectors
  • RMS level meters
  • Active channel detector for every mix
  • Input gain stages
  • Noise reduction filter (NRF) on every input
  • Automatic feedback elimination filters (ADFE) with eight notch filters
  • Four stages of fourth-order input tone control filters
  • 0-100 ms input delay
  • Input compressors
  • 48 automatic mixers with Lectrosonics’ patented automatic mixing algorithm that supports five mixing modes independently controllable for every crosspoint
  • Eight stages of fourth-order equalizers on each output
  • 0-250 ms output delay
  • Output compressors
  • Output limiters
  • Output gain stages
  • Output RMS level meters
  • Four signal generators:  white noise, pink noise, adjustable frequency sine wave and a sweep generator with programmable sweeping options

Additional signal processing in the conference models:

  • DTMF signal generator
  • Line echo canceler
  • Acoustic Echo Canceller (US Patent Pending)

Every signal processing block is available without restrictions. All of them on every input and every output channel can be activated simultaneously and set to any value without running out of DSP resources. There is no DSP resource meter (aka “gas gauge”) because it is not necessary.
Every DSP block can be enabled/disabled during normal operation, and every parameter can be adjusted in real time using either the GUI-based ASPEN controller or the command terminal interface via the RS-232, USB and Ethernet ports. Every filter stage can implement any available filter type, and their parameters can be adjusted across the entire frequency range.

All gain stages, i.e. input, output and crosspoint gains are implemented smoothly using crossfading in the logarithmic (dB) domain to prevent abrupt, and thereby audible, level changes.
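
As an illustration of the idea (not the actual firmware), a gain change ramped linearly in the dB domain might look like the following sketch; the ramp length and levels are arbitrary example values:

```python
import numpy as np

def db_crossfade(start_db: float, target_db: float, n_samples: int) -> np.ndarray:
    """Ramp a gain linearly in the dB (logarithmic) domain, then convert
    to linear amplitude for sample-by-sample application."""
    ramp_db = np.linspace(start_db, target_db, n_samples)
    return 10.0 ** (ramp_db / 20.0)

# Fade a crosspoint from -60 dB up to 0 dB over 480 samples (10 ms at 48 kHz)
gains = db_crossfade(-60.0, 0.0, 480)
```

Ramping in the dB domain gives perceptually even steps in loudness, which is why it sounds smoother than a linear-amplitude fade.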

The entire DSP signal chain is implemented in a single audio sample. At the 48 kHz sampling rate, this equates to 20.833 μs. The A-D converter at the input and the D-A converter at the output have much more latency, several audio frames in length.

The order of the signal processing blocks has been carefully determined according to best practices. For instance, the compressor and limiter are located at the end of the signal processing chain, so the signal level is adjusted after all other signal processing has been applied. The pre-determined order of signal processing tasks is not a restriction of flexibility or a weakness. It is, in fact, a strength and a key benefit of the architecture because it guarantees optimal operation.

The units are stackable, which provides a practically unlimited number of input channels and 48 system-wide mixes. The DSP capacity scales proportionally with the number of inputs and outputs with minimal latency. Every additional unit adds only 6 audio samples (125 μs) of delay to the single unit’s 1.33 ms base delay (measured from analog input to analog output). 200 inputs are handled with only 4.33 ms latency (1.33 ms for the master PCB plus 24 additional PCBs at 0.125 ms each).
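
The latency figures above follow from simple arithmetic; this sketch just restates the numbers quoted in the text:

```python
FS = 48_000                            # sampling rate in Hz
SAMPLE_US = 1e6 / FS                   # one sample period: ~20.833 microseconds

BASE_DELAY_MS = 1.33                   # single unit, analog input to analog output
PER_UNIT_MS = 6 * SAMPLE_US / 1000.0   # each added unit: 6 samples = 0.125 ms

def stack_latency_ms(extra_units: int) -> float:
    """Total analog-to-analog latency for a stack with this many
    additional units below the master."""
    return BASE_DELAY_MS + extra_units * PER_UNIT_MS

# 200 inputs: the master board plus 24 additional boards
print(round(stack_latency_ms(24), 2))  # 4.33
```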

Inputs and outputs in separate units are automatically time aligned for up to 100 units in a stack.

The DSP firmware is written in assembly language and optimized for speed and audio performance. It takes full advantage of the SHARC® processors’ special features:

  • SIMD architecture - Single Instruction Multiple Data in which the DSP uses two processing elements executing the same instruction on different data
  • Chained DMA transfers (Direct Memory Access)
  • Delay line DMA
  • Register content switching

Whenever possible, the firmware processes two channels simultaneously to take full advantage of the SIMD architecture.

The entire DSP program and most of the data (except for some large arrays) and signal processing parameters are held in the on-chip memory to maximize speed. Intermediate variables are always held in registers which have extended precision (40-bit floating point resolution). This practice helps to keep the digital rounding errors below the audible level. The 40-bit extended precision uses 32 bits of mantissa which is equivalent to 192 dB dynamic range that is further extended by the exponent.
ASPEN processors also use one of the best digital filter implementation techniques: the resonator based digital filter architecture, with orthogonal state variables, which guarantees minimal sensitivity to coefficient rounding as well as minimal rounding noise [1].
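
The 192 dB figure follows from roughly 6.02 dB of dynamic range per mantissa bit; a quick check:

```python
import math

MANTISSA_BITS = 32   # mantissa width of the 40-bit extended-precision format

# Each bit contributes 20*log10(2), about 6.02 dB, of dynamic range
dynamic_range_db = 20 * math.log10(2 ** MANTISSA_BITS)
print(round(dynamic_range_db, 1))  # 192.7
```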

Analog Devices SHARC® DSPs

The ASPEN family of audio processors was designed using the most powerful SHARC® DSPs available from Analog Devices at the time of product release.


ASPEN Bus Signal Flow

Each processor takes mix bus signals from the unit below it, adds signals from its inputs and passes the updated sub-mix to the next unit above it. These sub-mixes continue to accrue in the Forward Propagation and arrive at the Master unit at the top of the stack. The Master unit adds signals from its own inputs and generates the Final Mixes that are then propagated back to all units below it in the stack. The source for all outputs in all units in the stack are taken from the 48 Final Mixes.
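
The accumulation of sub-mixes up the stack can be sketched as follows. This is an illustrative model only; the function name and matrix shapes are our own, not part of the ASPEN firmware:

```python
import numpy as np

N_MIXES = 48  # system-wide mix buses

def forward_propagate(stack_inputs, crosspoint_gains):
    """Model of the Forward Propagation: starting at the bottom unit,
    each unit adds its own inputs' contributions to the 48 bus mixes
    received from the unit below and passes the sub-mix upward.

    stack_inputs: per-unit arrays of shape (n_inputs, n_samples)
    crosspoint_gains: per-unit gain matrices of shape (N_MIXES, n_inputs)
    """
    n_samples = stack_inputs[0].shape[1]
    bus = np.zeros((N_MIXES, n_samples))   # empty buses below the stack
    for inputs, gains in zip(stack_inputs, crosspoint_gains):
        bus += gains @ inputs              # this unit's contribution
    return bus                             # Final Mixes formed at the master
```

The Final Mixes are then propagated back down so that every unit can source any of the 48 mixes for its physical outputs.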


Every unit in the stack has access to all 48 Final Mixes even if the unit itself has only a few physical outputs. This signal flow structure simplifies the setup of the signal routing and matrix assignments and also allows the use of any physical output on any unit in the stack to deliver any of the final mixes to an external device.

One of the advantages of this signal flow structure is realized when multiple physical outputs all deliver the same final mix signal to different locations. Each output can process the audio differently with unique settings for delay, filters, compressor, level and limiter to suit the needs of each location. For example, a particular final mix can be sent to an auditorium sound system from one output, to a small sound system in a lobby from a second output, and to a media feed outlet for recording from a third output, with each output having its own unique signal processing.

Hardware Architecture

The variety of models in this series are created by combining “building block” circuit board assemblies:

  • 8 input, 12 output mixer board
  • 16 channel input only board
  • 8 channel input only board
  • Conference interface board  (standard and wideband versions)

A single board can be enclosed by itself in a stand-alone 1RU chassis, or combined with another board in a 2RU chassis to create a variety of models. The 2RU models include an LCD with comprehensive access to all system settings and activity.

ASPEN Software



The Feature Bundle


Optimized Architecture provides a full suite of signal processing features and functions in even the smallest ASPEN model, the SPN812. This is a single rack space 8 in/12 out processor that can address the entire ASPEN matrix.


Dual board models use two boards of one type or another to create a variety of models to suit the needs of a particular installation. Each additional board adds its own processing blocks. For example, the SPN1624 uses two 812 boards, so the number of processing blocks is twice the list at right, except for the matrix crosspoints (the matrix supports a maximum of 48 outputs).


Additional inputs are added to the matrix when additional processors are added to the stack with no practical limit.

The SPN812 includes the following:


ASPEN Signal Flow Diagram


Noise Reduction Filter

The Problem

Fighting noise is a very old problem because noise can never be eliminated entirely. All electronic devices, including resistors and other passive components, generate noise. In an audio system these sources can be kept below the audible level, but ambient noise may be unavoidable even when the system is designed according to best practices. In these cases a noise reduction technique can help. Here we discuss two types of noise reduction techniques:

  • Noise reduction filtering (NRF) is a “blind” method. A signal processing method is called “blind” if the statistical properties of the involved signals are known but their actual values are unknown.
  • Noise cancellation (NC) is a reference based method. In this case we have access to the noise source but its actual effect on the signal is unknown.

These two approaches work in different manners and incur significantly different costs. NRF is essentially “free” in that it is an integral part of the signal processing chain on every input channel. NC, on the other hand, is a significantly more complex method that requires additional signal processing resources to implement, therefore, it is much more expensive than NRF. See Table 1 for a more detailed comparison of the two techniques.

Since ASPEN processors apply settings immediately, the amount of NRF applied to any one or more inputs can be adjusted in real time while the system is operating, either by a control device connected via the Ethernet, USB or RS-232 ports, with macros, or directly from the command terminal interface included in the ASPEN software.

Noise Reduction Filtering

The theoretical background of this method is optimal or Wiener filtering, named after Norbert Wiener, who researched this area in the 1940s and published his results in 1949.[1]
The signal model of optimal filtering is shown in figure 1.



The signal of interest (X) is contaminated by an additive noise (V). In order to improve the signal integrity, the observed noisy signal (Y) is passed through a filter. The output of the filter (Z) is an estimate of the unknown signal (X). The estimation error, the difference of X and Z, has the lowest possible power if the frequency response of the filter (H) is given by:

  H(f) = Sx(f) / (Sx(f) + Sv(f))

where Sx(f) and Sv(f) are the power spectral densities of the signal and the noise, respectively.

In ASPEN we use a 30-band 1/3-octave filter bank to implement the noise reduction filter. Figures 2 and 3 on the next page show an example in which the noise (cyan) has equal power in every band, i.e. it is pink noise, and the signal (blue) concentrates its power in the mid-audio range. Figures 2a and 2b show the signal and noise before and after the filtering, respectively. Figures 2 and 3 differ in that in figure 2 the signal is plotted on top of the noise and in figure 3 the noise is on top.

We can clearly see the benefit of the filtering. The attenuation is negligible in those bands where the signal has much more power than the noise, i.e. the signal-to-noise ratio is high; and the attenuation is high in the bands with poor signal-to-noise ratio. The result is an improved overall signal-to-noise ratio at the cost of some linear distortion of the signal.
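
A minimal sketch of this per-band attenuation, assuming signal and noise power estimates per 1/3-octave band are already available (the `floor_db` limit is our own illustrative stand-in for the adjustable noise reduction depth):

```python
import numpy as np

def wiener_band_gains(signal_power, noise_power, floor_db=-36.0):
    """Per-band Wiener gain H = S / (S + V), with the attenuation
    limited to a maximum depth (floor_db, in dB)."""
    s = np.asarray(signal_power, dtype=float)
    v = np.asarray(noise_power, dtype=float)
    h = s / (s + v)
    return np.maximum(h, 10.0 ** (floor_db / 20.0))

# A high-SNR band passes nearly unchanged; a poor-SNR band is
# attenuated down to the permitted floor
gains = wiener_band_gains([100.0, 0.01], [1.0, 1.0])
```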


There are two reasons why the Wiener filter cannot be used in its original form as a noise reduction filter in audio systems:

  • The signal and noise spectra are unknown therefore the equation for the frequency response of the optimal filter cannot be evaluated.
  • We may assume the noise spectrum to be quasi-stationary (changing slowly), but audio signals (such as voice signals) are highly time variant.

To implement a noise reduction filter we made some assumptions and complemented the Wiener filtering algorithm with an adaptation method that automatically separates the signal and noise spectra and continuously changes the filter parameters (see figure 4). Proper operation of the noise reduction filters in ASPEN requires that:

  • The noise be quasi stationary but its spectral distribution can be arbitrary.
  • The audio signal changes its spectral distribution rapidly. Stationary components will be misidentified as noise, hence they will be greatly attenuated.


Main features of the NRF in ASPEN:

  • Every audio channel has an NRF so it scales with the size of the system.
  • The depth of the noise reduction is adjustable in a wide range: 6 dB - 36 dB.
  • 30 frequency bands all have 1/3-octave bandwidth.
  • Zero latency, minimum phase.
  • Optimal (Wiener) filtering algorithm.
  • Fast adaptation.

Follow these simple rules to use the NRF:

  • Enable the NRF only if it is necessary (the noise is distracting).
  • Enable the NRF only for the noisiest microphones.
  • Use the minimum noise reduction depth that effectively attenuates the noise. This may greatly vary depending on the characteristics of the noise and the acoustical properties of the environment as well as on personal preference.

Noise Cancellation

Noise cancelers (NC) use an adaptive filter to reconstruct the noise that affects the original signal. A reference signal is needed, usually from a microphone placed close to the noise source. An adaptive filter is necessary because the relationship between the noise source and the actual noise that contaminates the audio signal is unknown.

Note the difference between NRF and NC. While the NRF is placed directly in the signal path, the NC predicts the contamination and subtracts it from the observed signal. The signal itself is therefore unaltered, resulting in higher sound quality. The computational burden of NC is comparable to an acoustic echo canceler, which usually requires a dedicated DSP for every one or two audio channels.
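
The reference-based approach can be sketched with a classic LMS adaptive canceller. This is a textbook illustration, not the ASPEN implementation; the filter length and step size are arbitrary:

```python
import numpy as np

def lms_noise_cancel(observed, reference, n_taps=8, mu=0.01):
    """LMS adaptive noise canceller: model the unknown path from the
    noise source to the signal, reconstruct the contamination from the
    reference, and subtract it from the observed signal."""
    w = np.zeros(n_taps)                 # adaptive filter weights
    buf = np.zeros(n_taps)               # recent reference samples
    out = np.zeros_like(observed, dtype=float)
    for n, (y, x) in enumerate(zip(observed, reference)):
        buf = np.roll(buf, 1)
        buf[0] = x
        noise_est = w @ buf              # predicted contamination
        e = y - noise_est                # cleaned output sample
        w += 2 * mu * e * buf            # LMS weight update
        out[n] = e
    return out
```

When the reference correlates only with the noise, the error signal converges toward the clean signal of interest.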


Table 1. Comparison of NRF and NC


ASPEN Automatic Mixing

Purpose and Function

An automatic mixer is a hardware/software solution to two fundamental issues that arise when multiple microphones are used in a sound reinforcement system:

  • Acoustic Feedback
  • Intelligibility

Acoustic feedback occurs when the sound from a loudspeaker system re-enters the microphones and is then returned to the loudspeakers. This recirculating loop (oscillation) will produce either sustained “ringing,” or if the gain is high enough, loud howling or squealing when the system goes into runaway feedback.

Intelligibility in a sound system is a measure of how well listeners will understand what is being said. Distortion of the sound, accompanying noise and a mix of several different voices all have a destructive effect on intelligibility.

Automatic mixing attenuates inactive or lesser-used microphone channels to address the issues listed above. There are several different approaches to auto mixing, ranging from simple gating (turning channels on or off) to more sophisticated and natural sounding techniques using continuous gain modulation.

Correctly implemented, an auto mixer will maintain individual channel gains so the final mix of all channels is equal to one microphone at full gain. This design goal and its result are normally expressed as the Number of Open Microphones = 1, or NOM = 1. When this goal is achieved, a sound system will be just as stable against acoustic feedback with many microphones as it is with just one microphone.

The ASPEN auto mixing algorithm operates in the same manner as a human operator mixing a conference manually on a console. Unused and less active mics are turned down and those in use are turned up.

Auto mixing is also very beneficial in teleconferencing even when there is no sound reinforcement system in place. Background noise gathered by inactive microphones and echo that returns from the far end of a conference are both suppressed, which improves the intelligibility of the system significantly.

Gating vs. Adaptive Proportional Gain

Lectrosonics pioneered adaptive proportional gain automatic mixing algorithms with patents issued in the mid-1990s. The proprietary algorithm employed in ASPEN processors[1] is a seamless process that eliminates abrupt switching (gating), controls acoustic feedback and suppresses background noise and comb filtering.

All active input channels are summed, and then the level of each channel is compared to the total sum. A gain value is applied to all channels so that the sum is equal to one channel at full level (NOM=1). Channels are never turned off, but instead, the gain is adjusted continuously to eliminate abrupt level changes that are audible.

The patented algorithm includes a unique adaptive skewing process that applies a subtle priority to the channel that has been the loudest for the longest period of time. The skewing further reduces the gain on inactive and lesser active channels and prevents comb filtering by never allowing two channels to be mixed at the same level.
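
Stripping away the patented skewing and time constants, the core gain-proportional idea can be sketched as:

```python
import numpy as np

def nom1_gains(channel_levels):
    """Gain-proportional auto mixing sketch: each channel's gain is its
    share of the total detected level, so the gains always sum to one
    (NOM = 1). Channels are attenuated, never switched off."""
    levels = np.asarray(channel_levels, dtype=float)
    total = levels.sum()
    if total <= 0.0:
        return np.zeros_like(levels)
    return levels / total

# One active talker and three quiet mics: the talker keeps most of the gain
gains = nom1_gains([1.0, 0.05, 0.05, 0.05])
```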


Auto Mixing at the Matrix Crosspoints

Conventional auto mixing applies gain control or attenuation at the inputs to the mixer. This is useful in a basic installation, but it imposes additional complexity when a microphone signal is to be used for multiple purposes, such as sound reinforcement, recording and teleconferencing at the same time.

ASPEN auto mixing takes place at the matrix crosspoints, which allows a single input signal to exhibit different behavior at different outputs. For example, input channel 4 could be configured for Auto behavior (normal auto mixing) in the mix feeding output 6 for local sound reinforcement, Direct behavior (no attenuation) in the mix feeding output 10 for recording, Override mode as the dominant channel (chairman mic) at another output, and the unique Phantom mode at another output for mix-minus zoning in sound reinforcement. The input is routed to multiple crosspoints, each with a different mixing mode.

Crosspoint Auto Mixing Modes

The auto mixing mode is set in the matrix display in the control panel software. There are five different behaviors available:

  • Auto - normal gain proportional auto mixing
  • Direct - no attenuation
  • Override - dominant in auto mixing activity
  • Background - subordinate in auto mixing activity
  • Phantom - special mode for mix-minus systems

The desired mode is selected from a dialog box that opens with either a right or left click of the mouse.


The Phantom mode is used to combine the auto mixing signal contribution in multiple zones without delivering the actual audio signal into the zone. This is used in mix-minus reinforcement systems to let every microphone in the overall room participate in the auto mixing gain allocation, yet preserve the audio signal routing defined in the mix-minus setup.

Final mixes are defined for each loudspeaker zone in the setup. The microphones within each loudspeaker zone are set to the phantom mode as shown in the diagram.

Auto Mixing in a Teleconference

This auto mixing algorithm, working in conjunction with the AEC in the ASPEN Conference processor, provides impressive echo cancellation.

Auto mixing in the local reinforcement system suppresses echo returned to the far side by lowering the level of inactive and less active microphones. This reduces the echo return path for far side signals delivered by the local loudspeakers.

The gain proportional algorithm is also applied to the near and far side signals in a teleconference. When the near side is louder in the conversation, the gain for the far side signal is reduced. When the far side signal is louder, the near side level is reduced, which gives the AEC the opportunity to converge even further.

The ASPEN AEC is uniquely able to maintain convergence during the auto mixing activity. It will not diverge during double talk, and it will continue to deepen the convergence at every opportunity when the far side signal is louder than the near side signal.

NOTE:  Refer to the white paper entitled Acoustic Echo Cancellation for details on the background and performance of the ASPEN AEC.


Acoustic Echo Cancellation

The Problem

Teleconferencing with a sound reinforcement system poses a difficult problem caused by coupling between loudspeakers and microphones located in the same room. Sound from the far side of the conference is delivered into the room through the local loudspeakers and enters the local microphones along with the local participants’ voices, causing an echo to be heard at the far side.


In addition to mixing with the local talkers’ voices, the sound from the loudspeakers also reflects off surfaces in the room, and these reflections, some delayed by longer path lengths, enter the microphones as well. If this echo-contaminated signal is sent to the far side, the participants there will hear themselves as an echo along with the sound of the near side talkers.

The sound from the loudspeakers is attenuated by the loss due to the distance between them and the microphones, and reflections in the room are absorbed by acoustical treatment in the building materials. This loss in level is called ERL (echo return loss).

Even with best practices in building construction materials and sound system design, a significant amount of far side sound will be picked up by the microphones. Digital processes to further remove far side sound are called ERLE (echo return loss enhancement). The most common of these is a digital process called AEC (acoustic echo cancellation).  AEC is a DSP-based process used to remove as much of the sound from the local loudspeakers as possible from the signal that is sent to the far side. When the AEC processing identifies and removes the echo, it is said to have converged.

The total amount of echo suppression is the sum of ERL and ERLE. For example, if a room has a natural ERL of 15 dB and the ERLE (AEC cancellation) averages 25 dB, the total echo suppression is 40 dB. The amount of echo suppression varies while the system is operating: gain values in the local sound system are changed, and the echo return paths change when different microphones are used or moved. These changes make it more difficult for the AEC to converge and remain converged at a deep enough level to effectively remove audible echo heard at the far side of the conference.

The AEC uses the received far side audio as a reference signal, so that the echo can be identified and removed from the local signal that is to be sent to the far side. The difficulty is that, in the coupling from the loudspeakers to the microphones, the signal is modified by reflections in the room and by non-signal noise, which is depicted as EPF (echo pass filter) in the diagram.

The echo-contaminated far side signal is mixed with the sound from the local talkers and becomes the input to the AEC. The job of the AEC is to construct a digital filter that can be applied to remove the far side signal (echo) before the signal is sent to the far side. This filter is depicted as ERF (echo reconstruction filter) in the diagram.


The “magic” in the process takes place in the Adaptation Processor where an advanced DSP algorithm is continuously monitoring the effectiveness of the ERF and updating it as needed to remove as much of the echo as possible.
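
As a rough sketch of such an adaptation loop (the actual proprietary algorithm is not published), a normalized LMS update of the echo reconstruction filter might look like this, with adaptation gated off when conditions are unfavorable, such as during near side activity:

```python
import numpy as np

def nlms_update(w, ref_buf, mic_sample, mu=0.5, eps=1e-6, adapt=True):
    """One NLMS step for a sketched echo reconstruction filter (ERF):
    predict the echo from the far side reference, subtract it from the
    microphone sample, and update the filter only when adaptation is
    allowed (e.g. the far side signal is dominant)."""
    echo_est = w @ ref_buf               # reconstructed echo
    e = mic_sample - echo_est            # echo-reduced signal to the far side
    if adapt:
        w = w + mu * e * ref_buf / (ref_buf @ ref_buf + eps)
    return w, e
```

The normalization by the reference power keeps the step size stable regardless of the far side signal level, which is one reason NLMS-style updates converge quickly.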


ASPEN conference processors employ a proprietary AEC (US Patent Pending) that is extremely fast converging, will not lose convergence during double-talk (both far and near sides equally active), and will continue to deepen the convergence with every tiny opportunity where the far side audio is dominant over the near side audio. The AEC is so robust, in fact, that it can handle any number of microphone input channels, all mixed with the patented, gain proportional auto mixing algorithm.*

This unique AEC makes an ASPEN system scalable so that any number of inputs can be added without having to purchase additional DSP processing power.

ASPEN AEC Performance

This illustration was created from an actual audio conference recording, with the ERLE convergence depth plotted along with the audio from both sides. The recording is 30 seconds in length, and the illustration includes four different segments that demonstrate the effectiveness of the ASPEN AEC in a real world situation.


[1] In the first segment, the far side signal is dominant and the AEC converges to an ERLE depth of 24 dB within 1.5 seconds. Then it picks up another 2 dB and maintains the convergence depth for another few seconds.

[2] At 10 seconds into the recording, a microphone is moved, which changes the path length between the loudspeaker and microphone. This requires that the AEC re-converge, which it does to a depth of a little over 20 dB, then maintains the convergence as the conversation moves to the near side being dominant.

[3] At just over 13 seconds into the conversation, the activity moves into what is called double talk where both near and far sides are talking at the same time and at similar levels. The AEC maintains the convergence depth during this period.

[4] At about 24 seconds into the recording, there is a brief pause at both sides, followed by the far side again becoming dominant. This allows the AEC to increase the convergence depth with brief peaks in the far side signal. This attribute of the AEC is evident at 26 seconds into the recording when there is a brief peak in the far side audio that coincides with an increase in the convergence depth.