ReConfigurable Computing (RCC) for Logic Simulation
The enabling technology for high-performance simulation with advanced debugging capabilities
The Design Challenge
The electronics industry is undergoing a series of transformations
that will shape how new products are designed, manufactured and used.
Fueled by advances in semiconductor manufacturing capabilities,
System-On-Chip (SoC) has become a reality, along with ever-increasing
demands for products with high performance, capacity and reliability.
While SoC implementation is extremely appealing, the design process
is exceedingly challenging. Traditional design tools have not kept
pace with designs at this level. In particular, the biggest design
bottleneck is verification, which may consume 50 to 70 percent of
total design time.
Varying Verification Methodologies
The traditional verification process combines various point tools,
each specific to a fixed task; no single tool has yet spanned the
entire verification process by providing a complete solution from
behavioral simulation to hardware/software co-verification. In addition,
most product development teams are forced to split into design and
verification groups, because front-end and back-end verification
tools are conceptually different and mastering all of them requires
extensive learning.
All of this is about to change with the advent of Axis Systems'
proprietary ReConfigurable Computing (RCC) technology applied to
functional verification.
Current Verification Technologies
In a traditional event simulation algorithm, each event is processed
by a simulation kernel running on a central microprocessor. Each output
event must be propagated to all of its fanout connections until the
circuit stabilizes with no further changes. This is an essentially
sequential process, and its performance is limited by the tremendous
number of event evaluations the microprocessor must perform. As a
result, once designs reach the one-million-gate range, functional
verification with event simulation becomes extremely time consuming.
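The sequential nature of event simulation can be made concrete with a
minimal Python sketch. The `Gate` class, the `simulate` function and the
half-adder netlist are hypothetical illustrations, not the kernel of any
real simulator, and each gate is assumed to have a fixed transport delay.
A single loop pops one event at a time and evaluates every gate in the
changed net's fanout, which is the serial work that limits performance
as gate counts grow.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Gate:
    """Hypothetical gate: drives net `out` with func(inputs) after `delay` time units."""
    out: str
    func: Callable[..., int]
    inputs: List[str]
    delay: int = 1


def simulate(gates: List[Gate], values: Dict[str, int],
             stimulus: List[Tuple[int, str, int]]) -> Tuple[Dict[str, int], int]:
    # Build the fanout list: which gates must be re-evaluated when a net changes.
    fanout: Dict[str, List[Gate]] = {}
    for g in gates:
        for net in g.inputs:
            fanout.setdefault(net, []).append(g)

    seq = itertools.count()  # tie-breaker: same-time events keep scheduling order
    queue = [(t, next(seq), net, v) for t, net, v in stimulus]
    heapq.heapify(queue)
    evaluations = 0

    # One loop processes one event at a time: this is the sequential bottleneck.
    # The circuit has stabilized when the queue is empty.
    while queue:
        time, _, net, value = heapq.heappop(queue)
        if values.get(net) == value:
            continue  # no change, so nothing to propagate
        values[net] = value
        for g in fanout.get(net, []):
            evaluations += 1
            new_value = g.func(*(values[i] for i in g.inputs))
            heapq.heappush(queue, (time + g.delay, next(seq), g.out, new_value))
    return values, evaluations


# Usage: a half adder (s = a XOR b, c = a AND b) with both inputs rising at t=0.
half_adder = [Gate("s", lambda a, b: a ^ b, ["a", "b"]),
              Gate("c", lambda a, b: a & b, ["a", "b"])]
final, evals = simulate(half_adder, {"a": 0, "b": 0, "s": 0, "c": 0},
                        [(0, "a", 1), (0, "b", 1)])
```

Even this two-gate example needs four gate evaluations for two input
changes; a million-gate design multiplies that work on a single processor.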
Software simulation offers by far the highest level of flexibility,
but at the cost of simulation performance. Hardware accelerators try
to resolve the serial execution problem with custom ASICs dedicated
to event processing. However, hardware accelerators are designed to
simulate exclusively at the gate level with timing. The design cycle
for hardware accelerator ASICs usually lags behind advances in
microprocessor technology, resulting in a poor speedup factor
compared with software simulation on workstations.
For hardware prototyping, emulation technology provides a way to
plug designs under test directly into hardware systems. Emulation
uses an array of FPGA chips interconnected by crossbar, partial
crossbar or other hardware interconnect switches to duplicate design
behavior. System speed can reach up to 1 MHz. The emulation step
usually occurs at the end of the design cycle and supports only
gate-level designs, with loose integration to RTL and behavioral models.
In addition, debugging a circuit in emulation mode requires logic
analyzers, which are difficult to set up and provide an inefficient
debugging environment.
Both hardware accelerator and emulation technologies lack support
for RTL and behavioral constructs. For RTL designs, both technologies
rely on logic synthesis to translate RTL code into gates. However,
the correlation between RTL and gate level is lost during synthesis,
and the designer is forced to debug at the gate level, which is
extremely ineffective. It is comparable to software developers
writing C programs but having to debug them at the assembly level.
ReConfigurable Computing (RCC), the most advanced of these technologies,
accelerates simulation by orders of magnitude while keeping the
flexibility of software simulators. For each design, a custom RTL
co-processor is constructed from a combination of RCC computing
elements and an RCC microsequencer. The RCC computing elements are
custom single-instruction processors, each dedicated to a single task,
and their types closely follow the constructs of the RTL design language.
Control of the computing elements follows the Single Instruction
Multiple Data (SIMD) parallel processing paradigm, with high-speed
communication to the workstation microprocessor. Thus RCC accelerates
simulation at all language levels without design modifications while
preserving the original simulation debugging environment.
Table A summarizes how these simulation technologies compare.
Table A: Comparison Summary of Different Verification Technologies
| | RCC Simulator | Hardware Emulation | Hardware Accelerator |
|---|---|---|---|
| Target Application | Functional simulation and hardware/software co-verification | In-circuit emulation | Gate-level simulation with timing capability |
| Usage Model | Behavioral/RTL/gate; transparent compile into RCC | Gate-level emulation/prototype; one-to-two-month setup | Gate-level simulation; tedious timing library conversion |
| Speed | 10K-100K cycles/sec | 200K-1000K cycles/sec | 0.5K-1K cycles/sec |
| Underlying Technology | ReConfigurable Computing with tight integration of software and hardware. Accelerates simulation using hundreds of thousands of RCC computing elements. Transparently maps RTL/gate into computing elements | Hardware prototyping, wire for wire and gate for gate. Resolving setup and hold timing issues is very complex | Custom ASIC processor designed for event simulation processing. Number of processors is in the hundreds |
ReConfigurable Computing (RCC) Technology
ReConfigurable computing has been in the research arena for the
last ten years, with varied applications. Early adopters of
reconfigurable technology have been military and US government
operations, for encryption and decryption.
In a traditional microprocessor-based computing model, the user
exploits the microprocessor's static resources to solve a particular
problem. If the instruction set is not ideal for the problem, efficiency
is lost and execution takes longer. In addition, programs written for
microprocessor-based systems usually execute sequentially with minimal
parallelism.
In contrast to a general-purpose microprocessor, RCC configures
the hardware structure to match the algorithm and selects the best
resources for a particular task with maximum parallelism. For example,
if a particular algorithm can take advantage of six arithmetic logic
units whose only instructions are addition and subtraction, RCC
will select the hardware resource structure with those attributes
for maximum efficiency.
RCC comes in two flavors: static and dynamic. Static RCC has
predetermined, fixed resources during execution; resource allocation
is performed at compile time, when the algorithm is analyzed. Dynamic
RCC, on the other hand, handles the case where different parts of an
algorithm require different resources during execution. Depending on
the current point of execution, different resources are swapped in
as needed. For example, if a different arithmetic logic unit is needed
partway through an RCC program to run the algorithm efficiently,
dynamic RCC swaps in that resource at run time, whereas static RCC
will have loaded all of its predetermined resources before execution.
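The difference between static and dynamic RCC can be sketched in Python
under simplifying assumptions. Here a program is modeled as the list of
resources needed at each step (`add_alu`, `mul_alu` and `shift_unit` are
hypothetical names), static RCC loads every required resource before
execution, and dynamic RCC has room for only `slots` resources at a time.
The least-recently-used eviction policy is an assumption chosen for
illustration; the text does not say how resources are selected for
replacement.

```python
from collections import OrderedDict
from typing import List, Set


def static_resources(program: List[str]) -> Set[str]:
    """Static RCC (sketch): analyze the whole program at compile time and
    load every required resource up front; nothing changes at run time."""
    return set(program)


def dynamic_loads(program: List[str], slots: int) -> int:
    """Dynamic RCC (sketch): only `slots` resources fit at once. A missing
    resource is swapped in when a step needs it, evicting the least recently
    used one (assumed policy). Returns the number of resource loads."""
    loaded: "OrderedDict[str, None]" = OrderedDict()
    loads = 0
    for resource in program:
        if resource in loaded:
            loaded.move_to_end(resource)  # mark as recently used
            continue
        if len(loaded) >= slots:
            loaded.popitem(last=False)    # evict least recently used
        loaded[resource] = None
        loads += 1
    return loads


program = ["add_alu", "add_alu", "mul_alu", "add_alu", "shift_unit", "mul_alu"]
up_front = static_resources(program)      # three resources, loaded once
loads = dynamic_loads(program, slots=2)   # two slots, resources swapped as needed
```

In this example static RCC needs room for three resources and loads
them once, while dynamic RCC fits in two slots but performs four loads,
trading device capacity for run-time reconfiguration.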
Applications for RCC technology were not widespread until now, mainly
because of the low capacity of earlier programmable logic devices.
Today, with the introduction of high-capacity programmable logic
devices such as the Altera 10K-250 and new algorithms to map RCC
elements onto multiple programmable logic devices, Axis Systems has
applied this technology to accelerate functional verification by
orders of magnitude while preserving the original debugging environment.
Functional Simulation Using RCC Technology on an RTL Co-Processor
To take full advantage of RCC technology, the best-suited algorithms
are those that can be massively parallelized on a specialized
co-processor. Functional verification naturally falls into this
category, since the evaluation of RTL and gate-level constructs can
be accelerated by a massively parallel RCC co-processor.
Current System-On-Chip (SoC) design methodology describes the system
in a hardware description language (e.g. Verilog or VHDL). Language
constructs fall into three categories: behavioral, RTL and gate.
Behavioral constructs usually describe the system testbench and are
most efficiently simulated on a microprocessor, because they execute
serially and may access network and hard drive resources. RTL and
gate-level constructs describe the design itself and can be compiled
into the RCC architecture.
Behavioral constructs are usually written for sequential execution,
with calls to system resources such as the network or hard disk.
They are extremely difficult to parallelize, and the microprocessor
is the best resource for simulating their diverse instructions. In
contrast, RTL and gate-level constructs are written for parallel
execution. Each RTL or gate-level statement can be mapped onto a
computing element specifically designed to execute it efficiently.
RCC architecture for functional verification achieves its high speed
with a co-processor containing a massively parallel structure of
computing elements configured specifically for each design. A computing
element is a small, compact processor dedicated to performing one
function. For example, Axis Systems has designed a custom computing
element to simulate Verilog RTL "case" and "if" statements.
During execution, the RCC co-processor obtains instructions and data
from the microprocessor and sends the execution command to the SIMD
controller, which sequences the evaluation and communication of all
RCC computing elements. The controller then collects the evaluation
results from all computing elements, packs them into a data stream
and sends the stream back to the microprocessor to continue simulation.
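The execution cycle described above can be sketched in Python. The
`ComputingElement` class, the `if_element` and `case_element`
constructors and the one-bit-per-element packing scheme are hypothetical
illustrations, not Axis Systems' implementation. In hardware all
elements would evaluate concurrently under one command; the loop in
`controller_step` only emulates that, and reading every operand from one
snapshot makes the result independent of evaluation order, as it would
be in lockstep SIMD execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComputingElement:
    """Hypothetical single-function element: reads named inputs, drives one 1-bit output."""
    output: str
    inputs: List[str]
    op: Callable[..., int]


def if_element(out: str, cond: str, a: str, b: str) -> ComputingElement:
    # Models: if (cond) out = a; else out = b;
    return ComputingElement(out, [cond, a, b], lambda c, x, y: x if c else y)


def case_element(out: str, sel: str, table: Dict[int, int]) -> ComputingElement:
    # Models a case statement on `sel`; unlisted selector values default to 0.
    return ComputingElement(out, [sel], lambda s: table.get(s, 0))


def controller_step(elements: List[ComputingElement], state: Dict[str, int]) -> int:
    """One SIMD step: the same 'evaluate' command goes to every element, each
    working on its own operands from one shared snapshot. The 1-bit results
    are packed into a single integer stream (element i -> bit i)."""
    snapshot = dict(state)
    packed = 0
    for i, e in enumerate(elements):
        bit = e.op(*(snapshot[name] for name in e.inputs)) & 1
        packed |= bit << i
    return packed


def unpack(packed: int, elements: List[ComputingElement], state: Dict[str, int]) -> None:
    """Host side: write the returned bits back into the simulation state."""
    for i, e in enumerate(elements):
        state[e.output] = (packed >> i) & 1


elements = [if_element("y", "en", "d0", "d1"),
            case_element("z", "sel", {0: 1, 1: 0, 2: 1, 3: 1})]
state = {"en": 0, "d0": 0, "d1": 1, "sel": 1, "y": 0, "z": 0}
packed = controller_step(elements, state)  # y = d1 = 1 (bit 0), z = 0 (bit 1)
unpack(packed, elements, state)
```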
By mapping the design's RTL constructs onto custom interconnected
computing elements, the RCC hardware is programmed for maximum-performance
execution of each design being verified. Using a proprietary systolic
array interconnection architecture, communication between computing
elements, and between multiple devices, is fast and efficient.
Figure 1 shows an architecture block diagram of an RTL co-processor
using ReConfigurable Computing. The diagram illustrates the co-processor
controller as well as the RCC computing elements.
Figure 1: Architecture of an RTL co-processor using RCC
Axis Systems Xcite Simulator
First commercial RCC application for Electronic Design Automation
Axis Systems has introduced the first commercial functional verification
system using RCC architecture. Designed for compactness and high
performance, the Axis Systems Xcite® engine uses an array of the
highest-density Altera programmable chips. The Xcite RCC engine
connects directly to Sun Microsystems workstations, with high-bandwidth
communication over the PCI bus. Despite its small form factor, Xcite
does not sacrifice design capacity. With current capacity of up to
10 million gates, Xcite offers the best price/performance simulation
advantage.
To accelerate simulation throughout the design process, the Xcite
product family includes a software simulator, an RCC compiler and an
RCC hardware engine. Whether the design is described with behavioral,
RTL or gate-level constructs, the RCC compiler compiles RTL and
gate-level constructs directly into RCC computing elements, which are
loaded onto the RCC hardware engine, while behavioral constructs are
compiled natively into the Xcite software simulator. Thus, designers
can use Xcite products throughout the complete design process, from
architecture design with the Xcite software simulator to hardware/software
co-verification with the Xcite RCC engine.
Figure 2: Xcite RTL Language Compiler Architecture
Unlike traditional RTL support, Xcite compiles RTL designs into
a custom-built RTL co-processor with RCC computing elements, as
illustrated in Figure 2. Because the costly synthesis step is avoided,
RTL debugging is preserved and designers do not have to diagnose
problems at the gate level.
From its inception, Xcite has been designed to maximize debuggability.
With its proprietary technique for swapping simulation state between
the software simulator and RCC, the user can run at full speed on the
RCC engine up to the point of an error, and then swap the RCC simulation
state back into the software simulator. Once the state is in software
simulation, all software controls and internal states are available
for inspection. As a result, Xcite combines the best of both worlds:
RCC hardware acceleration along with software simulation debuggability.
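The run-to-error-then-swap flow can be illustrated with a minimal Python
sketch. `RccEngine` and `SoftwareSimulator` are hypothetical stand-ins
that share a plain dictionary of signal values, and the 8-bit counter is
an assumed design under test; the actual swap mechanism is proprietary
and is not shown here. The sketch only shows the sequence: run fast until
an error condition, read out the full state, load it into the software
simulator, then inspect and single-step.

```python
from typing import Callable, Dict

State = Dict[str, int]


def next_state(s: State) -> State:
    # Hypothetical design under test: an 8-bit counter with a wrap (carry) flag.
    count = (s["count"] + 1) & 0xFF
    return {"count": count, "carry": int(count == 0), "cycle": s["cycle"] + 1}


class RccEngine:
    """Stand-in for the hardware engine: fast, but its state is only visible
    once it is explicitly read out."""

    def __init__(self, state: State):
        self._state = dict(state)

    def run_until(self, error: Callable[[State], bool], max_cycles: int) -> bool:
        for _ in range(max_cycles):
            self._state = next_state(self._state)
            if error(self._state):
                return True  # stop at the point of error
        return False

    def read_state(self) -> State:
        return dict(self._state)


class SoftwareSimulator:
    """Stand-in for the software simulator: slower, but every signal can be
    inspected and the design can be single-stepped."""

    def __init__(self, state: State):
        self.state = dict(state)

    def step(self) -> None:
        self.state = next_state(self.state)

    def inspect(self, name: str) -> int:
        return self.state[name]


engine = RccEngine({"count": 0, "carry": 0, "cycle": 0})
hit = engine.run_until(lambda s: s["carry"] == 1, max_cycles=1000)
sim = SoftwareSimulator(engine.read_state())  # swap the state into software
at_error = sim.inspect("cycle")               # 256: first wrap of the counter
sim.step()                                    # continue one cycle in software
```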
Figure 3: Xcite Instantaneous Swap Between Software Simulator and RCC
The Ultimate Verification System
ReConfigurable Computing is the newest technology to be applied
to design verification. Despite more than ten years of investment in
research and development, its application did not flourish until the
advent of large-capacity programmable logic chips.
The ReConfigurable Computing architecture for functional simulation
is vastly different from traditional verification methods. RCC
follows the Single Instruction Multiple Data (SIMD) parallel processing
paradigm, mapping every design onto its own unique interconnection
of hundreds of thousands of RCC computing elements. Instead of performing
logic synthesis from higher-level constructs into logic gates, the
RCC compiler maps each RTL statement into RCC computing elements,
preserving debuggability and keeping compile times fast. Further, RCC
uses a proprietary systolic interconnect technology that improves the
efficiency of communication between computing elements.
Axis Systems Xcite is the first commercial functional verification
system using RCC technology. With its high-capacity, compact form,
Xcite connects directly to the workstation. Xcite fits directly into
the existing design methodology, combining a software simulator with
the RCC hardware engine. To improve debugging effectiveness, Xcite
offers instantaneous simulation state swap between the software
simulator and the RCC hardware engine. With this capability, it offers
the best of both worlds: software simulation flexibility along with
high-speed RCC simulation.
Make the winning choice with RCC technology.