
EETE OCTOBER 2012

DESIGN & PRODUCTS: DIGITAL SIGNAL PROCESSING

Architectural exploration in the design of application-specific processors
By Gert Goossens

Today's semiconductor industry is driven by the rapidly growing market of smart consumer devices. These products are feature-rich, multi-sensing, wirelessly connected, always-on, and green. Traditional SoC design approaches are based on the use of one or multiple embedded microprocessor cores (e.g. ARM) complemented with hardwired accelerators. However, to cope with the flexibility requirements of next-generation devices, accelerators will need to become software programmable. This trend calls for an increased use of application-specific processors (ASIPs), and turns SoCs into heterogeneous multicore platforms offering significant amounts of multithreaded parallelism. In such a platform, each core is an ASIP specialized for a set of tasks (see Figure 1). ASIPs bring important benefits in three dimensions, referred to as the "Three P's": Performance, Power and Programmability.

Fig. 1: Application-Specific Processors in Heterogeneous Multicore SoCs.

Performance

ASIPs boost performance by combining multiple forms of parallelism with specialization of the architecture. Instruction-level parallelism is a key requirement in almost every accelerator, to meet the performance requirements. Sometimes this is implemented in the form of very-long instruction word (VLIW) architectures. However, to reduce the program memory cost and the related power consumption, often only those parallel instructions are encoded in the instruction set that are actually required by the targeted applications. Data-level parallelism exploits regularities in the application that require identical instructions to operate on multiple data elements. This is also referred to as single-instruction multiple-data (SIMD) or vector processing. A typical use case is an OFDM modem, in which multiple sub-carriers require identical processing. Other examples include video, imaging or graphics kernels, in which the same instructions are applied on all pixels within a neighbourhood. Task-level parallelism is obtained by allocating multiple cores, each specialized for their tasks at hand.

The above forms of parallelism are combined with a specialization of architectural resources. Exotic hardware operators such as cyclic or bit-reversing adders, error correcting code generators, or even complex patterns of arithmetic operations executing in single or multiple cycles, can be included in the arithmetic and logic units of an ASIP. The register and memory structure, and the related addressing operators, have to be customized to support the data bandwidth required by a parallel architecture.

Power consumption

The architectural tricks for performance optimization discussed above also contribute to reducing power consumption. A key point is that, by virtue of increased parallelism and architectural specialization, the same task can be completed in fewer instruction cycles. In case of process geometries of 65nm and higher, this lower instruction-cycle count offers room for reducing the clock frequency and voltage, and thus the dynamic power used by the ASIP. In sub-65nm technologies, the lower instruction-cycle count allows running the ASIP for short duty cycles only, after which the power can be gated until the next invocation of the task. This reduces leakage power. Power gating to reduce leakage can also be applied at the system level. In a heterogeneous multicore system, each ASIP is dedicated to certain tasks. Depending on the use scenario not all tasks may be active, in which case certain ASIPs can be powered down for a longer time.

Programmability

While high performance and low power are delivered by hardwired accelerators too, the big advantage of an ASIP is that software programmability is offered at the same time. The programmability of an ASIP is only effective within its application domain: don't expect to run e.g. an OFDM channel decoding algorithm efficiently on an ASIP optimized for image encoding. Yet, a well-designed ASIP will provide sufficient programmability to cope with late algorithmic changes and bug fixes, to add new features for product differentiation, to ship first while the standard is evolving, or even to extend products to new markets without requiring a silicon respin.

Fig. 2: IP Designer tool-suite.

Gert Goossens is the CEO of Target Compiler Technologies - www.retarget.com - He can be reached at gert.goossens@retarget.com

Electronic Engineering Times Europe, October 2012, www.electronics-eetimes.com
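
The two power arguments made under "Power consumption" can be put into rough numbers. The sketch below is not from the article: it assumes the usual first-order CMOS switching-power estimate (P = C x Vdd^2 x f), treats leakage as zero while a core is power-gated, and ignores wake-up overhead. The function names, cycle counts, supply voltages, capacitance and leakage figure are all hypothetical values chosen for illustration.

```python
"""Back-of-the-envelope model of the two power arguments (hypothetical numbers)."""


def required_frequency(cycles_per_task: float, tasks_per_second: float) -> float:
    """Minimum clock frequency (Hz) needed to finish every task in time."""
    return cycles_per_task * tasks_per_second


def dynamic_power(c_eff: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = C_eff * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq_hz


def gated_leakage(leak_on: float, cycles_per_task: float,
                  tasks_per_second: float, freq_hz: float) -> float:
    """Average leakage when the core is power-gated between invocations.

    Assumes zero leakage while gated and no wake-up overhead.
    """
    duty_cycle = min(1.0, cycles_per_task * tasks_per_second / freq_hz)
    return leak_on * duty_cycle


# All values below are illustrative assumptions, not measurements.
TASK_RATE = 100_000   # task invocations per second
BASE_CYCLES = 4_000   # cycles per task on a general-purpose core
ASIP_CYCLES = 1_000   # cycles per task on a specialized ASIP
C_EFF = 1e-9          # effective switched capacitance in farads

# 65nm and above: spend the cycle savings on a lower clock and supply voltage.
f_base = required_frequency(BASE_CYCLES, TASK_RATE)
f_asip = required_frequency(ASIP_CYCLES, TASK_RATE)
p_base = dynamic_power(C_EFF, vdd=1.2, freq_hz=f_base)
# 0.9 V is assumed to be a feasible supply at the lower clock rate.
p_asip = dynamic_power(C_EFF, vdd=0.9, freq_hz=f_asip)

# Below 65nm: keep the clock, finish early, gate the power until the next task.
leak_avg = gated_leakage(leak_on=0.040, cycles_per_task=ASIP_CYCLES,
                         tasks_per_second=TASK_RATE, freq_hz=f_base)

if __name__ == "__main__":
    print(f"dynamic power ratio ASIP/baseline: {p_asip / p_base:.3f}")
    print(f"average leakage with gating: {leak_avg * 1e3:.1f} mW")
```

With these assumed figures, a fourfold cut in cycle count lets the clock drop from 400 MHz to 100 MHz, and the combined frequency and voltage reduction brings dynamic power to roughly 14% of the baseline. In the gating scenario the core is powered for only a quarter of the time, so average leakage falls to a quarter of its always-on value. Real designs would need to account for wake-up energy and for the voltage actually achievable at the target frequency.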

