DSP-FPGA.com


FPGAs rapidly replacing high-performance DSP capability

By Paul Ekas
Altera Corporation

Real-time signal processing and the applications that utilize it are continually changing the electronics landscape. This market-driven phenomenon creates a demand for greater speed, effectiveness and portability, and the consumer wants it now. This dynamic continues to propel market growth and the pace of change is accelerating. For the semiconductor provider, that fast adoption means fast time-to-market along with a quick ramp-to-volume.

Signal processing is the link from the real world to the computing world. As increasingly complex algorithms are implemented using digital signal processing, the performance demands of these algorithms rise exponentially. For cost-sensitive, high-volume applications like cellular phones, set-top boxes, and PC graphics cards, this has driven the development of extremely specialized ASSPs. However, for many other applications, the only options for implementing high-performance digital signal processing have been general-purpose Digital Signal Processors (DSPs) and, more recently, Field Programmable Gate Arrays (FPGAs).

DSPs have typically been used to implement many of these applications. Although DSPs are programmable through software, the DSPs’ hardware architecture is not flexible. Therefore, DSPs are limited by fixed hardware architecture such as bus performance bottlenecks, a fixed number of Multiply Accumulate (MAC) blocks, fixed memory, fixed hardware accelerator blocks and fixed data widths. The DSPs’ fixed hardware architecture is not suitable for many applications that require customized DSP function implementations.

FPGAs provide a reconfigurable solution for implementing traditional DSP applications and offer higher DSP throughput and raw data processing power than DSPs. Since FPGAs are reconfigured in hardware, FPGAs offer complete hardware customization while implementing various DSP applications. Therefore, DSP systems implemented in FPGAs can have customized architecture, customized bus structure, customized memory, customized hardware accelerator blocks and a variable number of MAC blocks.

The functionality of today’s FPGA
FPGAs have included dedicated DSP capabilities since around the turn of the millennium. Since 2000, the DSP performance offered by FPGAs has increased by 16 times to about 500 Giga Multiply-Accumulate operations per second (GMACS). In the same period, digital signal processors have increased in performance from 1.6 GMACS to today’s 8.0 GMACS. Many applications require only minimal DSP performance, equivalent to that provided by a low-cost FPGA (e.g., Altera®’s Cyclone® II devices, Lattice’s ECP2M family or QuickLogic’s Eclipse II family). However, for applications requiring the performance of many digital signal processors, a single high-performance FPGA (Altera’s Stratix® III or Xilinx’s Virtex-5) can replace those processors, significantly reducing power, board space and cost while delivering more than equivalent DSP performance, as illustrated in Figure 1 and calculated in Table 1.
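As a rough check on the scale of that replacement claim, the sketch below divides the FPGA figure by the DSP figure for each year, using the values tabulated later in this article (in millions of MACs per second). The dictionary and function names are illustrative only, and the 2007 DSP value is the one the table labels as projected. The ratio covers raw MAC throughput and ignores I/O, memory and partitioning overheads, so it should be read as an upper bound rather than a sizing rule.

```python
# MMACS values copied from this article's performance table.
TI_DSP_MMACS = {1990: 33, 1995: 160, 2000: 600,
                2002: 2_400, 2005: 8_000, 2007: 17_000}
ALTERA_FPGA_MMACS = {1990: 35, 1995: 436, 2000: 5_016,
                     2002: 47_520, 2005: 255_960, 2007: 565_400}


def fpga_to_dsp_ratio(year):
    """Return how many DSPs' worth of raw MAC throughput one FPGA provides."""
    return ALTERA_FPGA_MMACS[year] / TI_DSP_MMACS[year]


for year in sorted(TI_DSP_MMACS):
    print(f"{year}: {fpga_to_dsp_ratio(year):.1f}x")
```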

Figure 1

Total 16x16 multiply-accumulates per second (millions, MMACS)

| | 1990 | 1995 | 2000 | 2002 | 2005 | 2007 |
|---|---|---|---|---|---|---|
| TI DSP | 33 | 160 | 600 | 2,400 | 8,000 | 17,000 |
| Altera FPGAs | 35 | 436 | 5,016 | 47,520 | 255,960 | 565,400 |

Calculations of MMACs Performance for TI

| TI DSPs | 5x | 5416 | 6203 | 64x | 64x | Projected |
|---|---|---|---|---|---|---|
| Total 16x16 MMACs | 33 | 160 | 600 | 2,400 | 8,000 | 17,000 |
| # of 16x16 MAC units | 1 | 1 | 2 | 4 | 8 | 2 x 8 + ARM |
| Clock Rate (MHz) | 33 | 160 | 300 | 1,000 | 1,000 | 1,000 |

Calculations of MMACs Performance for Altera

| Altera FPGAs | Flex 10K | APEX | APEX II | EP1S80 | EP2S180 | EP3SE110 |
|---|---|---|---|---|---|---|
| Total 18x18 MMACs | 35 | 436 | 5,016 | 47,520 | 255,960 | 565,400 |
| Hard 18x18 MMACs | - | - | - | 26,400 | 172,800 | 492,800 |
| Soft 18x18 MMACs | 35 | 436 | 5,016 | 21,120 | 83,160 | 72,600 |
| Total 18x18 MAC units | 1.056 | 6.6 | 50.16 | 193.6 | 621.6 | 1041.2 |
| Hard 18x18 MAC units | 0 | 0 | 0 | 88 | 384 | 896 |
| Soft 18x18 MAC units (using 250 LEs) | 1.056 | 6.6 | 50.16 | 105.6 | 237.6 | 145.2 |
| 1/3 LEs available used for DSP | 264 | 1,650 | 12,540 | 26,400 | 59,400 | 36,300 |
| Hard 18x18 Speed (MHz) | 0 | 0 | 0 | 300 | 450 | 550 |
| Soft 18x18 Speed (MHz) | 33 | 66 | 100 | 200 | 350 | 500 |
| Available LEs | 800 | 5,000 | 38,000 | 80,000 | 180,000 | 110,000 |

Table 2. Calculations for performance evolution – FPGAs vs. DSPs
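The Altera totals above can be reproduced from the table's own inputs, which makes the method behind the numbers easier to follow. The sketch below is one reading of that method, not a statement of how the author computed it: soft multipliers are built from the share of logic elements (LEs) set aside for DSP at 250 LEs each, hard multipliers are counted directly, and each multiplier contributes one MAC per clock. The 33% share is an assumption inferred from the row values, which match 33% rather than an exact third; all names are illustrative.

```python
# Inputs copied from the Altera table:
# (device, available LEs, soft multiplier speed in MHz,
#  hard 18x18 multipliers, hard multiplier speed in MHz)
DEVICES = [
    ("Flex 10K", 800, 33, 0, 0),
    ("APEX", 5_000, 66, 0, 0),
    ("APEX II", 38_000, 100, 0, 0),
    ("EP1S80", 80_000, 200, 88, 300),
    ("EP2S180", 180_000, 350, 384, 450),
    ("EP3SE110", 110_000, 500, 896, 550),
]

LES_PER_SOFT_MAC = 250  # from the "(using 250 LEs)" row label
DSP_LE_PERCENT = 33     # assumption: the row values match 33%, not exactly 1/3


def total_mmacs(les, soft_mhz, hard_macs, hard_mhz):
    """Return total 18x18 MMACS (soft plus hard), rounded to an integer."""
    dsp_les = les * DSP_LE_PERCENT // 100          # LEs set aside for DSP
    soft_mmacs = dsp_les * soft_mhz / LES_PER_SOFT_MAC
    hard_mmacs = hard_macs * hard_mhz              # one MAC per multiplier per clock
    return round(soft_mmacs + hard_mmacs)


for name, *inputs in DEVICES:
    print(f"{name}: {total_mmacs(*inputs):,} MMACS")
```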

The performance argument
The key markets driving high-performance DSP requirements are wireless infrastructure, video broadcast equipment, medical imaging and military applications. FPGAs are the preferred programmable DSP platform for addressing these requirements.

FPGAs are being implemented in flexible channel element cards, as in third-generation basestation platforms that include an RF card and a channel card to support various standards and, through vertical migration, various channel densities. The basestation can be configured with a minimum of channels or with a large build-out of channels using the same fundamental architecture, changing only the specific FPGA selected.

Third-generation wireless has largely gone wideband, pushing RF components beyond their linear range of operation. Leading-edge algorithms help address the challenges that require significant processing traditionally beyond the capabilities of DSPs. Mainstream wireless infrastructure equipment now relies mainly on FPGAs to provide the RF linearization processing as shown on the RF card in Figure 2. Of the major functionality on the RF card, all the digital processing shown in blue in Figure 2 can be integrated into a single FPGA. There are two main components in RF linearization: Digital Pre-Distortion (DPD) and Crest Factor Reduction (CFR).

Digital pre-distortion addresses the non-linear transfer function inherent in wideband amplifiers by pre-distorting the signal with the inverse of the power amplifier’s transfer function. This requires a feedback path from the amplifier into an adaptive algorithm that dynamically updates the inverse transfer function to compensate for variations in the amplifier equipment and changes that occur across operating temperature.
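The idea of driving the amplifier with an inverse function found through feedback can be shown with a deliberately small sketch. Everything here is an assumption made for illustration: the amplifier is a toy memoryless cubic model, the signal is a single real value, and the inverse is found by iterating on one sample. A production design would more likely adapt polynomial coefficients or look-up tables over many complex samples.

```python
def power_amplifier(u):
    """Toy memoryless amplifier model (hypothetical): gain compresses as |u| grows."""
    return u - 0.2 * u ** 3


def predistort(x, iterations=200, step=1.0):
    """Find a drive level u for which power_amplifier(u) is close to x.

    Each pass observes the amplifier output (the feedback path) and nudges
    u by the remaining error, approximating the inverse transfer function
    at this input. Intended for small inputs (roughly |x| <= 0.8) where
    this toy model is monotonic and the iteration converges.
    """
    u = x
    for _ in range(iterations):
        error = x - power_amplifier(u)
        u += step * error
    return u


wanted = 0.8
print("without pre-distortion:", power_amplifier(wanted))
print("with pre-distortion:   ", power_amplifier(predistort(wanted)))
```

With pre-distortion, the amplifier output lands on the wanted value because the input was pushed harder to offset the compression.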

Crest factor reduction squelches signal peaks to reduce the peak-to-average ratio of a signal going through a power amplifier. This enables a significant cost reduction in the power amplifier portion of a wireless basestation. Most wireless infrastructure manufacturers develop their own algorithms and thus use FPGAs for differentiation.
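A minimal sketch of the effect follows, using hard clipping as a stand-in for the proprietary algorithms mentioned above. Hard clipping is the simplest possible peak limiter and is an assumption here; practical schemes shape or cancel peaks more carefully to limit distortion. The sample values and function names are made up for illustration.

```python
def papr(samples):
    """Peak-to-average power ratio of a real-valued sample sequence."""
    powers = [s * s for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def clip_peaks(samples, limit):
    """Limit every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


signal = [0.1, -0.2, 0.3, 2.0, -0.1, 0.2, -0.3, 0.1]  # one large peak
clipped = clip_peaks(signal, 1.0)

print("PAPR before:", papr(signal))
print("PAPR after: ", papr(clipped))
```

A lower ratio means the amplifier needs less headroom above the average power level, which is where the cost saving comes from.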

In addition to DPD and CFR, the RF card includes digital up conversion and digital down conversion. These functions are relatively simple filters, but require huge DSP bandwidth. It is the requirement to support these functions that has driven FPGAs to include many hundreds of multipliers on a single device. It is the ability of an FPGA to support this huge processing bandwidth while integrating evolving DPD and CFR algorithms that has made FPGAs the de facto solution for RF card digital processing outside of ASICs.
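To see why simple filters translate into large multiplier counts, consider the arithmetic for a single filter: the MAC rate is the number of taps times the sample rate, and the multiplier count is that rate divided by the multiplier clock. The tap count and sample rate below are hypothetical; the 550 MHz clock is the hard multiplier speed listed for the EP3SE110 in this article's table. The sketch assumes one MAC per multiplier per clock and ignores savings from coefficient symmetry or decimation.

```python
def multipliers_needed(taps, sample_rate_hz, multiplier_clock_hz):
    """Return the number of multipliers needed to sustain one FIR filter."""
    macs_per_second = taps * sample_rate_hz
    # Ceiling division on integers: round any partial multiplier up.
    return -(-macs_per_second // multiplier_clock_hz)


TAPS = 128                          # hypothetical filter length
SAMPLE_RATE_HZ = 122_880_000        # hypothetical input sample rate
MULTIPLIER_CLOCK_HZ = 550_000_000   # EP3SE110 hard multiplier speed from the table

count = multipliers_needed(TAPS, SAMPLE_RATE_HZ, MULTIPLIER_CLOCK_HZ)
print(f"{TAPS * SAMPLE_RATE_HZ / 1e6:,.0f} MMACS -> {count} multipliers per filter")
```

Multiplying that per-filter count by the number of carriers, antennas and conversion stages on a card is what pushes the total into the hundreds.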

Another area where FPGAs are the preferred processing platform is WiMAX baseband processing, where the huge computational requirements of orthogonal frequency division multiplexing (OFDM) can only be met with ASICs or FPGAs, according to the Berkeley Design Technology, Inc. (BDTI) analysis report “FPGAs for DSP.” FPGAs are being widely used for WiMAX basestation implementation as an alternative to ASICs. This is driven mainly by the early stage of the WiMAX business, combined with the high mask costs and very high verification costs of complex ASIC development.

Figure 2

FPGAs combine a large number of multipliers, huge on-chip memory bandwidth, massive I/O bandwidth, and the complete architectural flexibility of programmable logic. Together these deliver the same or improved DSP performance at lower power, while reducing system cost and board space.

System designers can create a board with one or several FPGAs that would otherwise require dozens of DSPs and possibly multiple boards. Because FPGAs support vertical migration (the inherent scalability of FPGAs within the same package), a single board and system design can be easily scaled from low-end capabilities to the highest capabilities without requiring multiple board designs. This flexibility is a major advantage, as it reduces product line engineering design and verification costs.

High-performance DSP capability in FPGAs
New levels of DSP functionality have been attained in recently announced FPGAs, with the latest generation of Stratix series devices delivering up to 896 18x18 multipliers along with associated input and output registers and addition and accumulation logic. The implementation of DSP functionality in high-end FPGAs has been optimized in silicon to maximize performance with low silicon area and low power consumption. A silicon DSP block has two physical constraints: the amount of periphery and the amount of area used.

The DSP block periphery routes 144 input wires and 144 output wires as well as control signals. The area of this DSP block enables four 18x18 multipliers that align with the total number of input and output signals. At the silicon level, varying the ratio of periphery to area of a DSP block enables more I/O or more block-level logic. At the system level, filtering and transforming algorithms rely on sum-of-multiplication operations for the majority of processing requirements. Optimizing the core area of a DSP block enables twice the number of sum-of-multiplication operations when required, thereby reducing the periphery I/O requirements relative to the overall computation. By completing more of a DSP algorithm within a DSP block, the overall silicon efficiency increases dramatically.
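The wire counts above follow from simple bit arithmetic, sketched below. Four multipliers with two 18-bit operands each account for the 144 inputs, and four separate 36-bit products account for the 144 outputs. The bit-growth rule used for the summed case (one extra bit per doubling of the number of terms) is standard fixed-point practice rather than something the article states, and the function names are illustrative.

```python
OPERAND_BITS = 18
PRODUCT_BITS = 2 * OPERAND_BITS  # an 18x18 product needs 36 bits


def input_wires(multipliers):
    """Input bits needed to feed two operands to every multiplier."""
    return multipliers * 2 * OPERAND_BITS


def separate_product_wires(multipliers):
    """Output bits needed when every product leaves the block on its own."""
    return multipliers * PRODUCT_BITS


def summed_output_wires(multipliers):
    """Output bits needed when the products are summed inside the block."""
    growth_bits = (multipliers - 1).bit_length()  # ceil(log2(multipliers))
    return PRODUCT_BITS + growth_bits


print("inputs for 4 multipliers:  ", input_wires(4))
print("4 separate products:       ", separate_product_wires(4))
print("sum of 4 products in-block:", summed_output_wires(4))
print("sum of 8 products in-block:", summed_output_wires(8))
```

Summing inside the block shrinks the output from 144 bits to a few dozen, which is why output wiring rather than multiplier area sets the limit when products must leave the block individually.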

A DSP block with eight 18x18 multipliers and associated register, accumulator and rounding/saturation circuitry is shown in Figure 3. The use of the multipliers is limited by the output wires from the DSP block, not the area of the logic; this enables approximately 50% greater silicon efficiency.

Figure 3

The total DSP capability of the block peaks for standard algorithms that use sum-of-multiplication operations, such as Finite Impulse Response (FIR) filters or complex multiplies, and simultaneously reduces overall power and resource consumption by not requiring the use of the programmable logic fabric. The number of 18x18 multipliers that can be used increases dramatically when a sum-of-multiplication operation is included as part of the algorithm.

A major advantage of FPGAs for many system architectures is the availability of package vertical migration, which enables a single board design to support flexible processing performance and cost without respinning the board. System architects use this capability to create products with various price points and performance capabilities without significantly affecting development costs or inventory.
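Both algorithms named above reduce to the same primitive, which the following sketch makes explicit. It is a plain software model with illustrative function names, not a description of how any particular DSP block is wired: a complex multiply is two sums of two products, and one FIR output is a sum of coefficient-sample products.

```python
def sum_of_products(pairs):
    """Multiply each (a, b) pair and add the results: the shared primitive."""
    return sum(a * b for a, b in pairs)


def complex_multiply(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) using four real multiplies."""
    real = sum_of_products([(a_re, b_re), (-a_im, b_im)])
    imag = sum_of_products([(a_re, b_im), (a_im, b_re)])
    return real, imag


def fir_output(coefficients, recent_samples):
    """One FIR output: each coefficient times its matching delayed sample."""
    return sum_of_products(zip(coefficients, recent_samples))


print(complex_multiply(3, 4, 1, 2))            # (3 + 4j) * (1 + 2j)
print(fir_output([1, 2, 3, 4], [1, 1, 1, 1]))  # 4-tap filter, constant input
```

In both cases only the sums need to leave the multiplier group, not the individual products.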

Wireless infrastructure applications are an excellent example of how this flexibility is used. FPGAs used in flexible channel element cards support various standards and, through vertical migration, various channel densities. A basestation can be configured with a minimum of channels or with a large build-out of channels using the same fundamental architecture and only changing the specific FPGA selection. In many developing countries, the focus is on more flexible, upgradeable and service-rich equipment that demands this FPGA flexibility. In these far more price-sensitive regions, the same product can use a structured ASIC for very standardized functionality at reduced cost. A vendor employing this type of solution has a formidable technology advantage for increasing business flexibility without increasing engineering costs.

FPGAs provide much greater I/O bandwidth than competing DSPs. This matters because system processing requirements are driven by the bandwidth needed for data input/output and off-chip data storage.

FPGAs have evolved rapidly in the last few years as system-level development tools have enabled system architects to realize flexible, scalable, maintainable, high-performance signal processing and control architectures. These tools include DSP system modeling tools, system integration tools, control processing IP, automatic C-to-hardware accelerators and DSP-optimized application IP. Using these tools, designers can rapidly build high-performance architectures that are optimized to meet system requirements. Along with vertical migration and structured ASIC support, system architects can build in scalability across product line requirements and implement many permutations of products to satisfy various market requirements, while realizing a substantial productivity gain.

The tools and IP necessary to develop an architecture entirely within an FPGA are available today, but there are other reasons why a standard third-party processor might be required or desired within the system architecture. When used with an FPGA, a third-party processor can significantly increase system performance while reducing system costs, power and board space through an architecture technique called FPGA coprocessing. In FPGA coprocessing, the FPGA offloads processing-intensive algorithms from the third-party processor. Many systems use a control processor, a digital signal processor, and one or more FPGAs (in which the major processing load is executed), where the control processor and DSP are used for legacy software, operating system requirements, or general suitability of processing toward the final application (such as Windows GUI control).

More and more, the core signal processing of high-performance systems has moved to FPGAs. FPGAs deliver the highest performance programmable DSP available in any semiconductor device. No more flexible system architecture solution is available that spans performance, low power, low price, and product breadth and lifecycle requirements.
