
Qualcomm teased its cognitive-capable platform, which will be a part of the new Snapdragon application processor for mobile devices, but said very little about its building blocks. The company explained that the Zeroth platform is capable of “computer vision, on-device deep learning and smart cameras that can recognize scenes, objects, and read text and handwriting.”

Meanwhile, Cognivue (Quebec, Canada) sees the emergence of CNN creating a level playing field for embedded vision SoCs. Cognivue designs its own Image Cognition Processor core, tools and software, used by companies such as Freescale. By leveraging Cognivue’s programmable technology, Freescale provides intelligent imaging and video solutions for automotive vision systems. Tom Wilson, vice president of product management at Cognivue, said, “We are finding our massively parallel image processing architecture and datapath management ideally suited for deep learning.”

In contrast, competing vendors have often hand-designed their embedded vision SoCs to keep pace with the different vision algorithms that have emerged over time, re-optimizing their SoC designs each time. They might find themselves stuck with an old architecture ill-suited to CNN, he explained.

Fig. 3: How deep learning helps a car ‘interpret’ objects on the road. (Source: Nvidia)

Fig. 4: Qualcomm pitches its first cognitive computing platform. (Source: Qualcomm)

Robert Laganière, professor at the School of Electrical Engineering and Computer Science at the University of Ottawa, told EE Times, “Before the emergence of CNN in computer vision, algorithm designers had to make many design decisions” involving a number of layers and steps in vision algorithms. Such decisions include the type of classifier used for object detection and the method used to aggregate features (for example, by using a rigid detector based on histograms). Further decisions include how to deal with deformable parts of an object and whether to use a cascade method (a sequence of small decisions to determine an object) or a support vector machine.

“One small specific design decision you make at each step of the way could have a huge impact in object detection accuracy,” said Laganière. In a deep architecture, however, you can integrate all the steps into one, he explained. “You need to make no decision, because deep learning will make decisions for you.”

In other words, as Bier summed up: “Traditional computer vision took a very procedural approach in detecting objects.” Deep learning is, however, a radical departure, he said, because “you don’t have to tell computers where to look.”

Bier described the process as a two-phase approach. Learning and training are done at dedicated facilities, such as data centers, using supercomputers. The large data sets of that first phase are then translated into “settings” and “coefficients” for embedded systems to use, said Bier.

Computer vision expert Fei-Fei Li discusses how we are teaching computers to understand pictures.

SoCs optimized for neural networks?

No consensus appears to have emerged on the best architecture for CNN in embedded vision SoCs. Cognivue and the University of Ottawa’s Laganière believe that a massively parallel architecture is the way to process a convolutional neural network efficiently. In parallel processing, an image to which certain parameters are applied produces another image, and as another filter is applied to that image, it produces yet another image.
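To make that filter-cascade picture concrete, here is a minimal NumPy sketch of two convolution stages. It is only an illustration under assumed filter sizes and values, not code from Cognivue, Qualcomm or any other vendor: each stage takes one image in and produces another image out, and every intermediate image has to be buffered somewhere, which is why Laganière points to on-chip memory next.

import numpy as np

def conv2d(image, kernel):
    # Naive "valid" 2-D convolution: one image in, a slightly smaller image out.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    # Simple non-linearity applied between convolution stages.
    return np.maximum(x, 0.0)

# Two arbitrary 3x3 filters standing in for learned CNN coefficients.
filter_a = np.array([[-1., -1., -1.], [-1., 8., -1.], [-1., -1., -1.]])
filter_b = np.full((3, 3), 1.0 / 9.0)

frame = np.random.rand(64, 64)                  # stand-in for one camera frame
stage1 = relu(conv2d(frame, filter_a))          # intermediate image that must be stored
stage2 = relu(conv2d(stage1, filter_b))         # the next filter produces yet another image
print(frame.shape, stage1.shape, stage2.shape)  # (64, 64) (62, 62) (60, 60)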
“So you may need more internal local memory to store intermediate results in SoCs,” said Laganière. The bad news is that in a big CNN, you could end up with billions of parameters. “But the good news is that there are tricks that we can use to simplify the process and remove some connections that are not needed,” he explained (a minimal sketch of this kind of pruning appears at the end of this article). The challenge, however, remains in handling the many different nodes in a CNN, since you cannot predetermine which node needs to be connected to which. “That’s why you need a programmable architecture. You can’t hardwire the connections,” said Laganière.

Meanwhile, Bier said that in designing a processor for CNN, “You could use a simple, uniform architecture.” Rather than designing a different SoC architecture or optimizing it every time new algorithms pop up, a CNN processor only needs a “fairly simple algorithm that comes with fewer variables,” he explained. In other words, “One could even argue that you can reduce programmability for a neural network processor” if the right settings and coefficients are known in advance. “But many companies aren’t ready to make that bet yet, because things are still developing,” added Bier. Chip vendors are using everything from CPUs and GPUs to FPGAs and DSPs to enable CNN on vision SoCs. So the debate over CNN architecture has only begun, in Bier’s opinion.

While there is no question that deep learning is altering the future of embedded-vision SoC designs, Bier said that a leading vision chip company like Mobileye has accumulated substantial vision-based automotive safety expertise. “I know many rivals want to eat their lunch, but I think an incumbent like Mobileye still has the first mover advantage.”

Baidu’s Wu, asked about the challenges of deep learning in smartphones and wearable devices, pointed out three. First, “We are still looking for a killer app,” he said. When the industry developed the MP3 player, for example, people knew exactly what it was for, which made it easy to develop the necessary SoC. While on-device deep learning sounds cool, what is its best application? No one knows yet, according to Wu.

Fig. 5: Cognivue’s new Image Cognition Processing technology, called Opus, will leverage the APEX architecture (shown above) and enable parallel processing of sophisticated deep learning (CNN) classifiers. (Source: Cognivue)

Second, “Deep learning needs an ecosystem,” he said. Collaboration among research institutes and companies is critical and “very useful,” he said.

Third, “We want to make smaller devices capable of deep learning,” said Wu. “Making it high performance at lower power will be the key.”

The topic of bringing deep learning to embedded systems is close to Wu’s heart. He will be a keynote speaker at the Embedded Vision Summit on May 12 in Santa Clara, where he’ll speak about “Enabling Ubiquitous Visual Intelligence Through Deep Learning.”
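As a companion to the filter sketch above, here is a minimal illustration of the connection-removal trick Laganière alludes to: magnitude pruning, in which the smallest weights are simply zeroed out. The layer size, keep ratio and method are assumptions made for illustration, not a description of Cognivue’s, Baidu’s or any other company’s actual approach.

import numpy as np

def prune_by_magnitude(weights, keep_ratio=0.1):
    # Keep only the largest-magnitude connections; zero out the rest.
    k = max(1, int(weights.size * keep_ratio))
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# A stand-in fully connected layer: 4096 inputs x 1000 outputs (hypothetical sizes).
dense_weights = np.random.randn(4096, 1000)
sparse_weights, mask = prune_by_magnitude(dense_weights, keep_ratio=0.1)
print("connections kept:", int(mask.sum()), "of", dense_weights.size)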

