
EETE OCTOBER 2012

No efficient ASIP design without tools

The key to efficient ASIP design is architectural exploration. Some IP vendors offer configurable ASIP solutions that enable exploration within the boundaries of a parameterized, yet confined architectural template. Architectural specialization is possible by complementing the template with user-defined extension instructions.

Vendors of retargetable ASIP design tools take a fundamentally different approach. Their tools read a formal model of an instruction-set architecture, expressed in a processor description language. The architectural scope of these tools extends beyond parameterized templates, and thus enables true architectural exploration.

Figure 2 pictures IP Designer, the retargetable ASIP design tool from Target Compiler Technologies based on the nML processor description language. From an nML model, the tool automatically builds a complete software development kit (SDK), including an optimizing C compiler, instruction-set simulator, and on-chip debugger. The simulator generates extensive profiling reports about instructions, storages and other hardware resources, which indicate the architectural hotspots to the user. Significant design iterations can be made in nML, in minutes to hours of time. The instantaneous availability of a production-level C compiler for any architecture described in nML is a unique feature of IP Designer. This is accomplished by means of Target's patented graph-based compilation technology. IP Designer furthermore generates a power-optimized RTL implementation of each ASIP, suited for logic synthesis with all standard third-party synthesis tools.

More recently, Target also announced MP Designer, a new tool for parallelization and mapping of sequential C code onto heterogeneous multi-ASIP architectures, and for the generation of communication fabrics between these ASIPs.

Case study: architectural exploration for a high-resolution JPEG encoder

Figure 3 shows the block diagram of a high-resolution JPEG encoder system. The input data rate is 1 (red, green or blue) pixel per cycle. The silicon area budget is 100K gates. Initial profiling of the workload shows that discrete cosine transform (DCT) and variable-length coding (VLC) are time-critical tasks.

Fig. 3: Block diagram of high-resolution JPEG encoder (top), with profile of average workload per macroblock (bottom).

Given these high workloads, and that the type of operations in DCT and VLC are strongly different, a logical choice is to design a different ASIP for each of these tasks. In case of DCT we opted for an 8-way SIMD architecture called JEMA. In case of VLC, we opted for a scalar architecture called JEMB. Interestingly, once these two ASIPs were modelled, we could easily compile all other front-end tasks (from RGB2UYV colour-space conversion up to quantization) on JEMA as well. Similarly, the other back-end tasks (zig-zag reordering and output buffering) could be compiled onto JEMB. Both ASIPs operate in a pipeline: while JEMA processes a new set of pixel blocks, JEMB is encoding the bits for the previous blocks. A FIFO buffer is provided between both ASIPs.

Target's ASIP design tool, IP Designer, enabled the architectural exploration and design of this dual-ASIP architecture, and the mapping of the application code. Target's new parallelization tool, MP Designer is applicable to decide on code partitioning, for optimal load balancing between the ASIPs, and to dimension the FIFO communication structure.

The design of the JEMA SIMD with IP Designer required three main iterations. First, a basic 8-way SIMD was modelled with a vector register-file and a general-purpose vector ALU. The vector ALU only offered limited specialization: next to addition and subtraction instructions on vectors, efficient scaling and quantization instructions were supported. The register structure was further specialized by adding a transposable register file, supporting column-wise writing and row-wise reading of vectors. This transposable register file enabled an efficient implementation of the DCT task, which requires transposition in between the column-DCT and row-DCT steps.

Compilation, simulation and profiling of the application code on the initial JEMA ASIP however showed that the instruction cycle-count was 50% over budget. In a second iteration, the JEMA ASIP was therefore optimized for better performance. IP Designer's profiling reports indicated opportunities for adding instruction-level parallelism. Therefore a second vector ALU was added, operating in parallel with the first one. Also, parallel load-store and ALU instructions were introduced. Profiling reports also hinted at opportunities for arithmetic acceleration.

Fig. 4: Dual-ASIP architecture for high-resolution JPEG encoding: 8-way SIMD ASIP for pixel processing tasks (left), scalar ASIP for variable-length coding task (JEMB - right)

www.electronics-eetimes.com Electronic Engineering Times Europe October 2012
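
The role of the transposable register file in the DCT task can be illustrated with a small software model. The sketch below is not the JEMA implementation or nML code. It is a plain Python illustration, assuming an orthonormal 8x8 DCT-II and hypothetical names (`vector_dct_pass`, `TransposableRegisterFile`, `dct_2d`) that do not appear in the article. It shows why a lane-wise vector pass computes only the column-DCT, and how column-wise writing followed by row-wise reading supplies the transposition needed before the same pass can compute the row-DCT.

```python
import math

N = 8  # JPEG block size; assumed here to match the 8-way SIMD width

# Orthonormal DCT-II coefficients C[k][n] (textbook definition).
C = [[(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
      for n in range(N)] for k in range(N)]


def vector_dct_pass(vectors):
    """Lane-wise 1D DCT over N vectors: out[k] = sum_n C[k][n] * vectors[n].

    Every lane is treated identically, the way a SIMD vector ALU would
    scale and accumulate whole vectors.
    """
    return [[sum(C[k][n] * vectors[n][lane] for n in range(N))
             for lane in range(N)] for k in range(N)]


class TransposableRegisterFile:
    """Hypothetical model: vectors are written as columns, read back as rows."""

    def __init__(self):
        self.cells = [[0.0] * N for _ in range(N)]

    def write_column(self, col, vec):
        for row, value in enumerate(vec):
            self.cells[row][col] = value

    def read_row(self, row):
        return list(self.cells[row])


def transpose_via_register_file(vectors):
    """Write each vector column-wise, then read row-wise: a transposition."""
    trf = TransposableRegisterFile()
    for index, vec in enumerate(vectors):
        trf.write_column(index, vec)
    return [trf.read_row(row) for row in range(N)]


def dct_2d(block):
    """2D DCT of block[i][j] (row i, column j), returned as F[u][v]."""
    column_pass = vector_dct_pass(block)                # indexed [u][j]
    turned = transpose_via_register_file(column_pass)   # indexed [j][u]
    row_pass = vector_dct_pass(turned)                  # indexed [v][u]
    # Final transposition only restores the conventional F[u][v] layout.
    return transpose_via_register_file(row_pass)


if __name__ == "__main__":
    flat = [[1.0] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))  # DC coefficient of a constant block
```

In this model the two vector passes are identical; only the data layout changes between them. The final transposition is a convenience for returning coefficients in the usual orientation, and a real design could equally absorb it into a later stage.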

