C synthesis accelerates the creation of technology-independent DSP hardware

By
Shawn McCloud
Mentor Graphics Corporation

Advanced C/C++ algorithmic synthesis methodologies open a new path to custom, high-performance DSP hardware. Starting with a technology-independent sequential description in pure ANSI C/C++, this development process gives DSP hardware designers a safe and fast path to RTL descriptions. A single ANSI C/C++ source can yield anything from small, compact (serial) designs to fast, high-performance (parallel) designs, so the result can be matched to the exact system requirements (performance and area) and tuned to any ASIC or FPGA technology. In addition, automation of RTL creation yields up to 20x faster design time and up to 60 percent fewer bugs versus hand-coded RTL, providing a safer, faster, and more reliable design flow.

DSPs vs. FPGAs vs. ASICs

Having evolved for decades, the discrete DSP realm now comprises tailored design environments, offering a complete value chain of stable applications and flows. Working with off-the-shelf DSPs is essentially a programming function, relying on the software engineer’s C/C++ or MATLAB coding expertise to achieve the desired functionality. Probably the most important factor that draws the mainstream community to these discrete devices is their ease of use and programmability.

This rosy picture is changing, however. Users have a robust design solution but, unfortunately, the massive increase in processing required for next-generation compute-intensive applications, such as wireless communication and image processing, has created a gap between off-the-shelf DSP performance and market needs. In many cases, discrete DSPs are simply running out of steam to serve the new communications, multimedia, and consumer applications. In recent years, users have increasingly looked toward alternative solutions ranging from ultra-high-performance full-custom ASICs to highly flexible general-purpose CPUs. Somewhere in the middle are FPGAs, providing a cost-effective balance (Figure 1) between programmability and high performance. With their processing flexibility ranging from serial to parallel computing, and now containing highly specialized DSP macros and memories, FPGAs have the potential to become an attractive platform on which to implement DSP algorithms.

Figure 1:

Each platform has certain benefits and limitations. On one extreme, the pure software approach implemented in discrete DSPs is mature, flexible, and relatively easy to use but offers limited instruction-level parallelism. On the other extreme, ASIC implementations offer custom performance and high-volume pricing benefits but traditionally demand a much greater design effort and soaring NRE costs. Demonstrating some of the value from both extremes, FPGA hardware supports reprogrammability and architecture flexibility in terms of spatial and temporal parallelism (via repetition and pipelining) but lacks ease of programming, since design entry is in RTL rather than the DSP programming domain of ANSI C/C++.

The catch-22 situation is that designers want the programming flexibility of the discrete DSPs and the performance flexibility available in FPGAs. How can they combine the best of both worlds? And, more importantly, what are their options if the application calls for the use of an ASIC implementation? Optimal implementation of DSP algorithms, therefore, requires a serious rethinking about how to approach the overall design flow when transforming algorithms into hardware, via either the ASIC or FPGA route. In the end, choosing the path of technology independence could mean the difference between success and failure.

Pure ANSI C/C++ algorithmic synthesis bridges design gap
To create hardware implementations of complex DSP algorithms at the Register Transfer Level (RTL), design teams must iterate through several steps, including micro-architecture definition, handwritten RTL, and area/speed optimization through RTL synthesis. This manual process is slow, and misinterpretation of the design intent in the original specification accounts for up to 60 percent of the bugs found in RTL. In the final result, both the micro-architecture and technology characteristics become hard-coded into the RTL description. This effect severely limits RTL reuse or retargeting in real applications, and leads to overbuilt designs and wasted silicon.

Figure 2:
Figure 3:

New DSP-specific flows enable algorithmic design at a higher level of abstraction than RTL (Figure 2). Although some C synthesis tools have been available for some time, none have delivered the necessary ease of use and quality of results until now. The latest breed of advanced C-based tools, such as Catapult™ C Synthesis from Mentor Graphics, takes industry-standard, pure ANSI C++ as input and automatically produces RTL based on user-provided design goals. This approach closes the conceptual gap between algorithm designers modeling in pure ANSI C or C++ and hardware designers working at the RTL abstraction level (Figure 3). Both sides benefit from:

  • Technology-independent source (a critical differentiator, since it enables designers to choose between ASIC or FPGA implementations)
  • Ability to incrementally explore and optimize implementation architecture (providing a design architecture and RTL tuned to system requirements without the need for hand-coded RTL)
  • Automatic, fast creation of RTL code (providing an accurate flow that produces up to 20X faster design time and up to 60 percent fewer bugs)

More importantly, the ability to select fundamentally superior platform-independent micro-architectural alternatives enables designers to create hardware designs of better quality than traditional RTL methods. Using this methodology, hardware designers can easily perform "what if" tradeoffs evaluating area, latency, throughput, and clock frequency for each micro-architecture, all while leaving the original pure ANSI C/C++ source unchanged.
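As a rough illustration of what such a technology-independent source might look like, the sketch below writes one output sample of a FIR filter as plain sequential C++. The function names, tap count, and data widths are hypothetical choices made for this example, not taken from any tool's documentation. The multiply-accumulate loop is the part a synthesis tool could, in principle, leave rolled (one multiplier, serial) or unroll (several multipliers, parallel) without the source changing. The helper `mac_cycles` is only a back-of-the-envelope "what if" estimate that assumes one multiply-accumulate per multiplier per clock; it is not how any particular tool reports latency.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tap count, chosen only for illustration.
const std::size_t kTaps = 8;

// One output sample of a direct-form FIR filter in plain sequential C++.
// delay_line holds the most recent kTaps input samples, newest first.
int fir_sample(short delay_line[kTaps], const short coeffs[kTaps], short input) {
    // Shift the delay line by one position and insert the new sample.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
        delay_line[i] = delay_line[i - 1];
    }
    delay_line[0] = input;

    // Multiply-accumulate loop. Nothing here says how many multipliers to
    // use; that choice is left to the implementation step.
    int acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += static_cast<int>(delay_line[i]) * coeffs[i];
    }
    return acc;
}

// Assumed cost model (not from the article): with `multipliers` parallel
// multipliers, the MAC loop needs ceil(taps / multipliers) clock cycles.
std::size_t mac_cycles(std::size_t taps, std::size_t multipliers) {
    assert(multipliers > 0);
    return (taps + multipliers - 1) / multipliers;
}
```

Under this simplified model, the same eight-tap source maps to eight cycles with one multiplier or a single cycle with eight, which is the kind of area-versus-throughput tradeoff described above.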

Larger, faster designs are increasingly common in the DSP realm, which implies prolonged simulation and synthesis cycles. It has become imperative to fix as many code errors as possible prior to simulation and synthesis, using the design checking capabilities in interactive HDL visualization tools. Moreover, verification takes significantly longer than design development because of the limited speed of RTL simulators and the time needed to manually create an RTL test bench. Advanced design verification flows, with support for industry-standard simulation tools, are now addressing rapid algorithm validation and verification by mixing the high-speed characteristics of pure ANSI C/C++ with the HDL-like modeling benefits found in SystemC and SystemVerilog.

Bringing DSP up to speed using FPGA or ASIC
Algorithmic synthesis must also take into consideration technology-specific characteristics of RTL synthesis to be fully effective. For example, algorithmic synthesis must be aware of high-performance operations available in some FPGAs, such as dedicated block multipliers, multiply/accumulate macros, pipelined operations, and special memory architectures. For ASICs, algorithmic synthesis must leverage the wide range of operator architectures available in RTL synthesis, ranging from high-performance Booth-encoded parallel multipliers to area-efficient bit-serial multipliers.
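To make the serial end of that operator range concrete, here is a minimal behavioral sketch of shift-and-add multiplication, which handles one multiplier bit per loop iteration. It is a common simplification of how a serial multiplier trades time for area, not a gate-accurate model of any specific bit-serial architecture. The names `SerialMulResult` and `serial_multiply`, the 16-bit operand width, and the assumption of one iteration per clock cycle are all illustrative.

```cpp
#include <cstdint>

// Product plus the number of loop iterations used, as a stand-in for clock
// cycles in this simplified model.
struct SerialMulResult {
    std::uint32_t product;
    unsigned cycles;
};

// Shift-and-add multiplication of two unsigned 16-bit values.
// Each iteration examines one bit of b and conditionally adds a shifted
// copy of a, so a single adder is reused 16 times.
SerialMulResult serial_multiply(std::uint16_t a, std::uint16_t b) {
    std::uint32_t acc = 0;
    std::uint32_t shifted_a = a;
    unsigned cycles = 0;
    for (unsigned bit = 0; bit < 16; ++bit) {
        if ((b >> bit) & 1u) {
            acc += shifted_a;
        }
        shifted_a <<= 1;
        ++cycles;
    }
    SerialMulResult result = {acc, cycles};
    return result;
}
```

A parallel multiplier would produce the same product in far fewer cycles at the cost of more logic. The source-level operation is simply `a * b` in both cases, and the choice of operator architecture belongs to the synthesis step.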

The key is knowledge-based synthesis tailored to the RTL synthesis tool. As such, algorithmic synthesis must be keenly aware of the inherent characteristics of RTL synthesis tools. Tight integration between algorithmic synthesis and RTL synthesis ensures timing closure in the back-end as well as accurate up-front area, performance and power estimates in the front-end.

Challenges and opportunities abound
When all is said and done, there are still limitations and challenges ahead. While FPGA devices are bigger than ever before, they are still constrained by size, and the largest algorithms will not fit onto current FPGAs. FPGA cost and power consumption remain major issues in consumer applications, where DSP has a major impact. Technology-independent solutions such as algorithmic C synthesis provide the inherent flexibility to target critical DSP algorithms across discrete DSP, ASIC, and FPGA implementations, a critical success factor since application segments dictate market cost, performance, and flexibility requirements. Using the innovative, technology-independent solutions now becoming available, the design community can stay ahead of the competitive curve and fully exploit the unprecedented opportunities ahead.

Shawn McCloud received his B.S. degree in electrical and computer engineering from Case Western Reserve University in 1986. From 1986 to 1994, Shawn worked for Motorola as a senior system engineer responsible for RISC and CISC based microprocessor design. Shawn joined Mentor in 1994, where he has held positions in technical and product marketing focused on RTL and high-level synthesis. For the past four years, Shawn has been the product line manager and director for Mentor Graphics’ high-level synthesis technology.

For more information on Mentor Graphics, its products, and services, visit www.mentor.com.

©MMIX DSP-FPGA.com. An OpenSystems Media, LLC publication.
