
EETE DEC 2013

Executive interview

Extendible processors go head to head backed by EDA giants

By Nick Flaherty

Cadence Design Systems’ Tensilica and Synopsys ARC are going head to head in the market for embedded processors in system-on-chip designs. These devices can have as many as 30 controller cores outside the main CPU, handling data movement and signal processing with higher clock rates and higher memory bandwidth.

“There’s been a very substantial shift in the market to people that want a lot more programmability in their data operations,” said Chris Rowen, founder of Tensilica and now Fellow at Cadence Design Systems following the acquisition in May 2013. “Down on the factory floor where the real work gets done there’s an increasing shift to a smarter data plane, to more programmable engines that are adapting more often under software control so you can choose your algorithms after the design of the chip. The data rates and energy budgets keep that out of the reach of the execution CPUs.”

“Process technologies are so dense the small premium to make a programmable block is negligible, but it means people want to design it once and tape it out and not have to get back to it if they change the algorithm,” he said. “These kinds of processors come much closer to reconciling the gap.”

The latest Tensilica Xtensa LX5 is the tenth generation of the extendible architecture, but the first new core since the acquisition.

“The acquisition was an important step forward in technology for extensible processors,” said Rowen, “and a big validation of everything that we are working for as it reinforces that this is one of the key technologies.”

The move has gone well, he says. “The whole team came across and Tensilica maintains its identity under the IP group of Cadence run by Martin Lund,” he said.
“We do a major release every 18 to 24 months so the definition of this processor goes back a couple of years and is pushing on data plane processing and efficiency. It brings leadership in the IP space especially in the need for better data plane processors and Tensilica is engaging with customers early in the design cycle and in product definition as a result.”

The LX5 core is configurable over a wide range of pre-verified options including 10 different DSP choices that can also be extended with custom application-specific instructions, execution units, register files, and I/O. The pipeline of the processor is selectable with 5- and 7-stage versions, as well as an extended DSP pipeline up to 11 stages.

A lot of work has been done on the memory structures, says Rowen, with ‘virtually unlimited’ I/O bandwidth through multiple, wide, designer-defined FIFO, GPIO and lookup interfaces, as well as dual load/stores up to 512 bits wide with data cache support and multi-bank RAM support.

“We have DRAM improvements in the data cache performance particularly to reduce the latency for cache misses and improving the cache pre-fetch,” he said. “We have also done some innovation in the banking memory to provide much higher bandwidth. In order to sustain the bandwidth you often need multiple banks, for example 512-bit wide, and you need to have two ports ready for these wide memories and a DMA channel, so you may have three 512-bit operations per cycle.”

“We have also introduced more independent arbitration for banks, coalescing the reads so if it requires locations in the same bank it does a single read and gives the bits back from a single memory width, which makes the effective memory bandwidth higher,” he said.

One new element is a semantic engine.
“The vector processor operates on certain elements of a data word,” said Rowen. “In the past you needed to read the whole word and even if you updated one bit you had to re-write the whole word, so we have added this feature to enable and disable individual bit writes. That’s part of the semantics engine. We have an operation that computes which bits you write and don’t write so you can combine two operations for the same latency and the same power.”

There has also been focus on fitting the processor into the rest of the chip, adding support for ARM’s CoreSight debug interface.

“One of the things we really work on hard was improving the debug and software integration,”

Electronic Engineering Times Europe December 2013 www.electronics-eetimes.com
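
As an illustration of the individual write enables Rowen describes, the idea can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the 32-bit word, the four 8-bit lanes, the threshold comparison and the function names `lane_enable_mask` and `masked_write` are all hypothetical choices made for the example, not details of the LX5 or its instruction set.

```python
# Hypothetical software model of a write with per-bit enables.
# Word width, lane width and function names are assumptions for illustration.
LANE_BITS = 8
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
WORD_MASK = (1 << (LANE_BITS * LANES)) - 1


def lane_enable_mask(word, threshold):
    """Compute which bits may be written: enable every bit of each
    8-bit lane whose current value is greater than `threshold`."""
    mask = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        value = (word >> shift) & LANE_MASK
        if value > threshold:
            mask |= LANE_MASK << shift
    return mask


def masked_write(old_word, new_word, write_enable):
    """Update only the enabled bits; disabled bits keep their old value,
    so the caller never has to read, merge and re-write the whole word."""
    return ((old_word & ~write_enable) | (new_word & write_enable)) & WORD_MASK


if __name__ == "__main__":
    memory_word = 0x11223344
    result = 0xAABBCCDD
    enable = lane_enable_mask(memory_word, 0x30)
    print(hex(masked_write(memory_word, result, enable)))
```

In this model the two low lanes (0x44 and 0x33) exceed the threshold and are overwritten, while the two high lanes are left untouched. In hardware the mask computation and the write would be a single operation, which is the combining of two steps that the quote refers to.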

