PFB_32, A 32-pnt Biplex Pipelined Polyphase Filter Bank

Parsons, A., Werthimer, D.

Background

This document is intended to be a guide for implementing and customizing an architecture for performing a polyphase filter bank (PFB) in a Xilinx FPGA using a blockset designed under the Matlab/Simulink/System Generator toolset. These designs are the property of the UC Berkeley SETI group, copyright 2004. The latest revision of the files in question is here.

The PFB was invented to account for the fact that the Discrete Fourier Transform (DFT) is an imperfect approximation of a set of band-pass filters. Ideally, each sample out of a DFT would represent the response of a perfectly rectangular band-pass filter, centered on an integral multiple Fn/N (the Nyquist frequency divided by size of the DFT), with a width of exactly Fn/N. In reality, each sample out of a DFT has a sin(x)/x frequency response. The PFB squares off this filter shape by first convolving the input with a sin(x)/x function over a long time window before following up with an DFT. The length of this window determines the steepness of the edges on the filter; typically, it may be 4 to 8 times the length of the DFT.

View a plot illustrating the improved filter shape of a frequency sample using a PFB vs. a DFT

The FIR Filter

In the architecture implemented here, the convolution is performed in an FIR filter. The taps of this filter draw signal samples spaced out in time in increments of the length of the DFT. That is, for a 32 point DFT, each tap of the corresponding FIR filter is seperated by a 32 sample delay. Each tap is fed by a rotating cycle of coefficients. These coefficients are determined by taking a length pi*T (where T is the number of FIR taps) interval of a sin(x)/x function, centered on zero, and divided into T*L (where L is the length of the DFT window) samples. Each tap is fed by one length of L of these samples, arranged so that the first input sample is multiplied by the sin(x)/x sample corresponding to the highest x, and the last input sample is paired with the lowest-x sin(x)/x sample. Filter response can be modified (generally by trading steepness of rolloff for flatness in the passband) by applying a windowing function to the sin(x)/x coefficients. Two commonly used windows are the Hamming window (1/2 + cos(x)/2) and the Parzen window (1 + x for negative x, 1 - x for positive x). Thus for an 8 tap FIR with a Hamming window, the coefficients correspond to the function:

(1/2 + cos(x/4)/2) * sin(x)/x on the interval [-4*pi, 4*pi).
View a Simulink implementation of an 8 Tap FIR filter, sans coefficient generator.

The windowing function and, to a lesser extent, the number of taps determines signal gain through the FIR filter. The gain varies from sample to sample, but repeats over the window length of the DFT. It makes the most sense to talk about the gain of a signal through the entire PFB. Parcival's theorum states that the power into a DFT is equal to the power out of one (barring any additional rescaling added to it), so the gain of a PFB is equal to the FIR gain, summed over the window length of the DFT. Thus, one can calculate the gain of specific filter to noise by summing the squares of all the FIR coefficients and dividing by the length of the DFT window. An 8 tap FIR without an additional windowing function has a gain of -0.11 dB. Applying a Hamming window reduces gain to -0.47 dB, and a Parzen window further reduces gain to -0.93 dB.

FIR Implementation

When considering trade-offs between hardware resources and performance, it can be useful to evaluate the effects of filter taps and finite precision arithmetic on the filter response of a PFB bin. However, some of the following plots can be misleading. Finite precision arithmetic places a noise floor on plots which can easily be misinterpreted as a floor of the filter response. For example, a plot may show that a high frequency signal causes a -40 dB response it a PFB bin. However, if this related to finite precision arethmentic (i.e. if this is caused by rounding in the least significant bit) then this does not mean that a +40 dB signal at that frequency will cause a response of +0 dB in that bin. Rather, the features of the filter shape below the noise floor have been obscured by rounding error. The features are there--the plots just do not illustrate them.

In considering the effects of various parameters on the filter response of a PFB bin, the standard design with be a 32-pnt PFB with 8 FIR taps, 8 bit input samples, 8 bit FIR coefficients, 18 bit samples into an FFT, and 18 bit samples out of the FFT.

View a plot illustating the dependence of filter shape on the # of FIR taps.
View a plot illustating the dependence of filter shape on the bit width of input samples.
View a plot illustating the dependence of filter shape on the bit width of FIR coefficients

As an interesting aside--the means by which an input signal is converted to a fixed point number can have a substantial effect on quiescent power in th 0th (DC) frequency bin. If an input signal is floored (truncated) to 8 bits, as is typically done by ADC's, a sizeable DC offset is introduced to the signal, and all frequencies tend to leak power into that bin. However, if the input signal is rounded to the nearest bit value, the DC bin displays the expected response. The lesson here is that it may be necessary to account for the flooring nature of the ADC by adding a small DC bias (if the ADC has no other DC bias, this will only be 1/2 the value of the least significant bit (LSB)) back into the signal.

View an overlay of all 32 PFB bins with a floored conversion of the input signal to a fixed point number.
View an overlay of all 32 PFB bins with a rounded conversion of the input signal to a fixed point number.

Implementing the FIR portion of the PFB requires the Simulink library file "pfb_library.mdl". The simplest solution is to instantiate a "pfb_fir" block, which implements a variable tap FIR filter with a windowing function of your choice. Bit widths of the input samples, output samples, and coefficients may be set via this block's mask parameters. This block handles the real and imaginary components of 2 independent input signals, and uses one set of coefficients for all FIR filters. P1_in and P2_in are the complex inputs for two independent data streams. The upper half of the bits of these signals is interpreted as the real component of the complex signal, and the lower half is interpreted as the imaginary component. The "Sync_in" input is used to reset the rotation of coefficients into the FIR filters. This signal is a boolean vector warning signal, which is to say, it should be asserted one clock before valid data appears on the data inputs. (P1_in, P2_in). Only one pulse of this signal is necessary to set the rotation of coefficients for the entire operation of the PFB, but since the rotation of coefficients returns to its initial state every N clks (where N is the length of the DFT), additional pulses can occur every N clks without interrupting the operation of the PFB. "Sync_out" is an echo of the vector warning signal which appears one clock before the output samples are valid.

One implementation quirk is that the "pfb_fir" block breaks its link with the "pfb_library" block in order to modify its own contents. It may be necessary to restore that link (which it will promptly break again) in order to propagate the latest library revisions to the model block. Additionally, the block only updates itself when the number of taps is changed (many of its sub-blocks also have this property), so it is sometimes necessary to change the number of taps to some arbitrary different number, and back again, in order to force the block to reimplement itself.

The Radix-2 Biplex Pipelined FFT

The Fast Fourier Transform is an arithmetically optimized algorithm for performing a DFT. It makes use of intermediate computations to ensure that redundant computations are not performed when integrating an input signal against sine waves of various frequencies. However, there are various mathematically equivalent ways of implementing an FFT in hardware. A typical solution, paralleling a Radix-2 butterfly diagram, is to buffer an entire window of input samples and for each butterfly drawn, implement a separate Radix-2 butterfly in hardware (a Radix-2 butterfly, by the way, takes 2 inputs (a, b) and a coefficient (w) and forms the results (a + b * w) and (a - b * w)). This is an ideal solution if all input samples are arriving simultaneously, but is inefficient if they show up one at a time, as they do when digitizing a signal from a radio telescope. The Biplex Pipelined architecture reuses butterflies in hardware and minimizes the buffering of data. This provides a modest increase in hardware efficiency for a single polarization of data, but the true power of this architecture lies in its ability to perform the FFT of a second polarization at no additional cost in hardware.

View a biplex pipelined implementation of an 8 point FFT.

Implementing a biplex pipelined FFT in Simulink requires the Simulink library file "fft_library.mdl". Instantiate an "fft" block, and set its length via the 'Length of FFT' parameter. As described in the FIR Implementation section, this block breaks its own link to the "fft_library" block, and may require a link restoration to propagate the latest revisions to the implemented block. Also, similar the the FIR tap contingency, it is sometimes necessary to change the FFT length to some different value and back again in order to force the block to reimplement itself. The "fft" block has several parameters available to customize its implementation. 'Bitwidth' adjusts the quiescent data width into and out of each FFT stage. The 'Implement Delays In BRAM' option specifies the cut-off on when a sample delay should be implemented in BRAM vs. Distributed RAM. Similarly, the 'Use BRAM for Coeff LUTs' parameter controls the cut-off for when coefficient look-up tables should be implemented in BRAM. 'Shift Every ? Stages' controls how often samples are divided by 2 in order to prevent overflow in finite precision arithmetic as power is accumulated into frequency bins. Typically, gaussian noise will require a shift every second stage (equivalent to rescaling data by root-two every stage), and a pure frequency will double the power in selected bins every stage, requiring a down-shift every stage to prevent overflow. Finally, 'Cap Coefficient LUT Depth' sets a limit on how many coefficients a butterfly is allowed to use. For large FFTs, later stages may require vast numbers of coefficients which do not differ largely from one another, and usually can perform nearly as well with far fewer coefficients.