Silicon system design with deep learning has the same challenges as other complex system-on-chip-based designs, only more so
Source: EEWeb
By Lauro Rizzatti (Contributed Content) | Tuesday, January 22, 2019
In a recent presentation to a group of semiconductor industry executives, Chris Rowen, CEO and co-founder of BabbleLabs, stated that there are two important ways to look at the implications of deep learning in the semiconductor space. The talk was given during a Mentor (a Siemens Business) summit hosted by its Emulation Division.
Chris began by saying that deep learning is a new computing model — a different way of conceiving how to describe the functionality of a system. In the past, this meant writing logic equations. More often, it was sequential code to express functionality with the expectation that the system would do exactly what was in the code. Fundamentally, deep learning is statistical and not the right way to solve problems when there is a known right way to get to an unambiguous right answer. Deep learning shouldn’t be used to drive a pocket calculator, for example.
For some types of problems, Chris continued, the right answer can be captured as a software description. Other cases will need to have the problem described as data sets. Everything else about a deep neural network ought to be derived automatically. He cited logic synthesis as an analogy: from the input data and its corresponding labels, the tools should work out the neural network structure, the mapping, and the hardware needed, since these are implementation details.
The fundamental description is the data set, and that is a different way of looking at the problem. One of the intellectual challenges for the IT industry is deciding, now that the new methodology is available, how it should be used. It is as fundamental a shift as the emergence of the microprocessor, which arrived before programs existed to implement embedded functionality.
Deep Learning in the Cloud and on the Edge
The second implication is that deep learning is going to use and drive the cloud and the edge differently.
The cloud offers versatility and will become the place to work on the biggest problems and to do most of the training. Inference in the cloud will be across large aggregated data sets where data is flowing from different places in the world.
Conversely, the edge will be used for real-time, low-latency, minimum-power, and minimum-cost processing of a small number of data streams.
Chris moved on to describe neural networks and noted that they are diverse. At one end is the vision space, where new, high-end hardware often needs hundreds of billions to tens of trillions of operations per second to do everything at those high data rates. At the other end are interesting problems, such as a neural network that extracts speech from almost complete noise, which do not need much computing power: huge compute relative to what has been applied to such problems in the past, yet not that demanding relative to microcontrollers and low-end DSPs. In fact, he told his audience to expect to see different ways deep learning gets embedded.
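To see where numbers like these come from, a back-of-envelope calculation multiplies per-frame operations by frame rate and stream count. The figures below are illustrative assumptions, not values from the talk:

```python
# Back-of-envelope estimate of sustained compute for a vision pipeline.
# All figures below are illustrative assumptions, not numbers from the talk.

def sustained_ops_per_second(gops_per_frame: float, fps: float, streams: int) -> float:
    """Total operations per second needed to process `streams` video feeds."""
    return gops_per_frame * 1e9 * fps * streams

# Example: a detection network costing ~100 GOPs per frame,
# running at 30 frames per second on 8 camera streams.
total = sustained_ops_per_second(gops_per_frame=100, fps=30, streams=8)
print(f"{total:.2e} ops/s")  # ~2.4e13, i.e. tens of trillions of operations per second
```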
Certainly, this will challenge the GPU hold on the market, Chris added. Already, half a dozen companies are planning to build bigger chips with even more phenomenal numbers of zeroes in the operations per second. At the other end of the spectrum, applications that run on existing microcontrollers are coming to market.
In between, neural network subsystems appear widely in these omnibus chips, with smartphones being the most notable example. Smartphone processors first implemented a CPU, then added a DSP, then a DSP and a GPU, and now combine CPU, DSP, GPU, and deep neural processors of some kind, all working together in heterogeneous computing.
The heterogeneity at the hardware level is reflected in heterogeneity in system integration. It is no longer just a matter of plugging together different pieces of software, affirmed Chris; non-traditional software pieces have to learn to play nicely with conventional code as well. That integration is going to be a difficult problem for anyone working to deploy systems to the full potential of the underlying technology.
Neural Network Hardware
On the one hand, Chris commented, neural network hardware or deep learning silicon is easy. At its heart is a layer cake of computation, each layer followed by a nonlinear function, and it has nice characteristics. It is data-flow dominated without control logic. The coefficients are read-only and heavily reused. It has tremendous locality of reference. The memory reference pattern is regular, static, and often modest in size. Its programmability allows one piece of hardware to solve different kinds of problems. Further, its development frameworks hide the architectural details.
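A minimal sketch of that layer cake, using toy fully connected layers rather than anything from the talk, shows why the workload is so regular: read-only weights, no data-dependent control flow, and a memory access pattern fixed entirely by the layer shapes.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def run_network(x, weight_stack):
    """Run a toy fully connected 'layer cake': each layer is a matrix
    multiply against read-only weights followed by a nonlinear function.
    There is no data-dependent control flow, and the memory access
    pattern is determined entirely by the layer shapes."""
    for w in weight_stack:          # coefficients are reused for every input
        x = relu(w @ x)             # linear step, then nonlinearity
    return x

# Illustrative shapes only: three layers mapping 256 -> 128 -> 64 -> 10.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((128, 256)),
           rng.standard_normal((64, 128)),
           rng.standard_normal((10, 64))]
print(run_network(rng.standard_normal(256), weights).shape)  # (10,)
```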
On the other hand, deep learning silicon is hard. After the superficially easy parts come complexities in the rich set of layer types, with different convolution sizes, strides, and vector lengths. Oftentimes, designers perform on-the-fly data reorganization as data flows through this layer cake of computation, or take large models and exploit the sparsity of the data. Memory bandwidth varies enormously: the bandwidth used in training, for example, depends on how large the model is and on the size of intermediate results, and either can cause memory bandwidth problems for some networks.
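To see why convolution sizes and strides matter so much, a rough footprint calculation shows how activation traffic and multiply-accumulate counts swing with a single stride change. The shapes and byte width below are illustrative assumptions, not figures from the talk:

```python
# Rough per-layer footprint and work estimate for a 2-D convolution.
# All shapes and the byte width are illustrative assumptions.

def conv_layer_cost(h, w, c_in, c_out, k, stride, bytes_per_value=2):
    out_h, out_w = h // stride, w // stride           # ignoring padding and edge effects
    weight_bytes = k * k * c_in * c_out * bytes_per_value      # read-only coefficients
    activation_bytes = out_h * out_w * c_out * bytes_per_value # intermediate results
    macs = out_h * out_w * c_out * k * k * c_in                # multiply-accumulates
    return weight_bytes, activation_bytes, macs

# A 1080p feature map, 3x3 kernels, 64 -> 64 channels, stride 1 vs. stride 2.
for s in (1, 2):
    w_b, a_b, macs = conv_layer_cost(1080, 1920, 64, 64, 3, s)
    print(f"stride {s}: weights {w_b/1e6:.1f} MB, "
          f"activations {a_b/1e6:.1f} MB, {macs/1e9:.1f} GMACs")
```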
As products come to market, the biggest gap is mapping software. The struggle is taking a high-level description of one of these problems written in TensorFlow, PyTorch, Caffe, or a dozen other frameworks and mapping it efficiently onto the hardware. The tools are still crude, Chris remarked, even for the top few companies striving for the most efficient solutions. New entrants are introducing new architectures and often arrive with more work left to do on their software environments than on their silicon. As a result, silicon availability may be getting ahead of the ability to deploy applications.
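As one concrete illustration of where that mapping begins, and not something described in the talk, a framework-level model is commonly exported to an exchange format such as ONNX before a vendor-specific compiler maps its layers, strides, and tensor shapes onto the accelerator:

```python
# Illustrative only: exporting a small PyTorch model to ONNX, a common
# hand-off point before a vendor compiler maps it onto target hardware.
import torch
import torch.nn as nn

# A toy model; the shapes are arbitrary and chosen only for the example.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 10),
)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)   # example input shape
torch.onnx.export(model, dummy_input, "toy_model.onnx", opset_version=13)
# A hardware vendor's toolchain would then take toy_model.onnx and map its
# layers and tensor shapes onto the accelerator's compute units.
```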
Chris identified deep learning capabilities in Apple, Samsung, and Huawei products, but asked where the applications are to use these capabilities. We are limited by the immaturity of the software tools, Chris concluded as he answered his own question, adding that the silicon probably works, but there is a bottleneck in developing applications for it.
Deep Learning and Emulation
Chris identified any number of challenges, including bigger chips, the move to heterogeneous computing, and hardware/software integration. In each case, chip design verification engineers would benefit from hardware emulation.
Silicon system design with deep learning has the same challenges as other complex system-on-chip-based designs, only more so. Every major challenge of design at the leading edge is magnified by deep learning: huge logical and arithmetic complexity, big memory footprints and bandwidths, integration with real-time I/O, early software bring-up, co-optimization of hardware and software, and certification of robustness in the face of hardware and software faults. Thus, the productivity leverage offered by emulation-based design is particularly valuable in getting to these smart systems faster and with fewer hiccups.
In closing, Chris posed a challenge: to figure out how to get application developers and tool developers to jointly produce solutions that work in both hardware functionality and system functionality. Implementing a hardware emulation strategy should be part of the solution.
Author’s Note: To hear more from Chris Rowen, register for DVCon, which will take place February 25-28 at the DoubleTree Hotel in San Jose, California. Chris will be on a deep learning panel, "Reshaping the Industry or Holding to the Status Quo?," along with spokespersons from Achronix, AMD, Arm, and NVIDIA. The panel will be held Wednesday, February 27, from 1:30 pm until 2:30 pm.