It just makes sense that we will find a lot of applications where we can use the power of AI to improve our processes and build chips faster
By Lauro Rizzatti (Contributed Content) | Thursday, June 13, 2019
Jean-Marie Brunet, senior director of Marketing at Mentor, a Siemens Business, served as moderator for a well-attended and lively DVCon U.S. panel discussion on the hot topics of Artificial Intelligence (AI) and Machine Learning (ML).
The hour-long session featured panelists Raymond Nijssen, vice president and chief technologist at Achronix; Rob Aitken, fellow and director of technology from Arm; Alex Starr, senior fellow at AMD; Ty Garibay, Mythic’s vice president of hardware engineering; and Saad Godil, director of Applied Deep Learning Research at Nvidia.
What follows is Part 1 of a 4-part mini-series based on the panel transcript. It covers Brunet’s first question, about how AI is reshaping the semiconductor industry and specifically chip design verification, along with the panelists’ impressions.
Part Two addresses the design and verification of AI chips, while Parts Three and Four include audience member questions and the panelists’ answers.
Raymond Nijssen: If you look at what changes are occurring, we see a big shift in the FPGA industry. We’re moving from traditional DSP applications, served by DSP processors embedded within FPGAs, toward architectures geared much more toward machine learning applications in terms of data movement and the nature of the computations. The root of these computations is fairly simple. There’s a commonality among all these algorithms: they basically consist of many, many multiplications implemented in various clever ways. Of course, there are many ways you can slice and dice the problem, but at the end of the day, from a conceptual point of view, it’s a fairly simple chip, and then you repeat that many, many, many times.
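Nijssen’s description of the computation is easy to see in code: stripped of data movement and scheduling, the inner loop of most deep learning layers reduces to the same multiply-accumulate, tiled and repeated. Below is a minimal, generic sketch of that kernel, not any particular vendor’s architecture.

```python
import numpy as np

def mac_tile(activations, weights, acc):
    """One tile of the core ML computation: multiply-accumulate.

    activations: (M, K) input tile; weights: (K, N) weight tile;
    acc: (M, N) running accumulator. An accelerator implements this
    loop nest in hardware and replicates it many, many times.
    """
    M, K = activations.shape
    _, N = weights.shape
    for m in range(M):
        for n in range(N):
            for k in range(K):
                # The elementary operation Nijssen refers to, repeated
                # millions of times per inference.
                acc[m, n] += activations[m, k] * weights[k, n]
    return acc

acc = mac_tile(np.random.rand(4, 8), np.random.rand(8, 4), np.zeros((4, 4)))
```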
The complexity is not coming from the diversity or heterogeneity of the chip. These chips tend to be homogeneous, with emphasis on data movement and elementary arithmetic. In that sense, verifying those chips from a traditional design verification point of view is no different from what you know and do already. The performance validation of the system is much more complicated because there’s so much software in these systems.
All the software stacks are fragmented, and different people solve the same problem in different ways, which leads to completely different solutions. You may put a large number of multipliers in a chip to achieve a high TOPS rating, then find that you can’t keep them busy because of a bottleneck somewhere in your memory subsystem or in the way the data is accessed, especially if you have sparse matrices. It is going to be complicated. In the end, AI chip verification is probably more complicated than verifying a processor.
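A back-of-envelope calculation shows why a high TOPS number can be misleading in exactly the way Nijssen describes. All figures below are invented for illustration, not measurements of any real part.

```python
# Roofline-style estimate: peak TOPS is meaningless if the memory
# subsystem cannot feed the multipliers. All numbers are assumptions.
peak_ops_per_s = 100e12  # advertised peak: 100 TOPS (assumed)
bytes_per_op   = 2       # bytes fetched per operation at low data reuse (assumed)
mem_bw_bytes   = 1e12    # 1 TB/s memory bandwidth (assumed)

sustainable_ops = mem_bw_bytes / bytes_per_op        # ops/s memory can feed
utilization = min(1.0, sustainable_ops / peak_ops_per_s)
print(f"Multiplier utilization: {utilization:.1%}")  # 0.5%: the MACs sit idle
```

Sparse matrices make the picture worse still, because the effective bytes moved per useful multiply go up.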
Rob Aitken: I would argue that it is much like verifying a processor, because the machine learning unit itself is doing the processing. It’s not a GPU and it’s not a CPU, but it’s doing lots of multiplications and additions as part of a program. I think the interesting verification problem is exactly as Raymond described: how these processing units interact with each other and what the system is doing.
You can see that when you build an accelerator that does 20-zillion TOPS per watt but can’t figure out how to connect it to anything, because you don’t know how the data gets from system memory into the box, how results are reported back when the calculation finishes, or how the acceleration fits into the overall control flow. There’s an entire exercise in structuring not just the software to handle the problem but the hardware, as Raymond said, so that you can eliminate as many bottlenecks as possible, recognize the bottlenecks that are still there, and design your system so that it performs as fast as possible. A huge amount of work has already been done, but there’s lots more to do.
From the machine learning verification perspective, I think it’s an interesting subset of the EDA problem in general. If you look at EDA, for the last 30 years people have built in huge numbers of heuristics, and those heuristics are actually really good.
Unlike a lot of other problems, if you just take machine learning and apply it to a typical EDA problem, the results don’t necessarily get any better; your solution might actually be worse. If you look back to the 1990s, at a previous wave of AI, there were a whole bunch of papers on machine learning applied to your favorite aspect of verification and other EDA problems. Most of them went nowhere, because the compute power didn’t really exist and because the idea of learning on deep neural networks versus shallower ones hadn’t really taken hold.
What we find now is that, looking at some specific problems, you can find good solutions. As an example, some work done at Arm looked at selecting which test vectors were more or less likely to add value to a verification suite. That problem turned out to be relatively easy to formulate as a machine learning problem, specifically an image-recognition problem. A standard set of neural networks can figure out that this set of vectors looks promising while that set looks less promising, and we were able to get about a 2X improvement in overall verification throughput. There are definitely solutions where ML works on the algorithmic side, in addition to the obvious one where we’re all building AI chips in some form now.
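Aitken doesn’t detail Arm’s setup, but the shape of the idea can be sketched: encode each candidate vector set as a two-dimensional “coverage image” and let a small convolutional network score it. The encoding, network dimensions, and choice of PyTorch below are all illustrative assumptions, not Arm’s actual flow.

```python
import torch
import torch.nn as nn

class VectorSetClassifier(nn.Module):
    """Scores a candidate test-vector set, encoded as a 1x64x64 'coverage
    image', as promising (likely to add coverage) or not. Hypothetical."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 2),         # logits: [unpromising, promising]
        )

    def forward(self, x):
        return self.net(x)

model = VectorSetClassifier()
bitmap = torch.rand(1, 1, 64, 64)               # stand-in for an encoded vector set
p_promising = model(bitmap).softmax(dim=-1)[0, 1].item()
print(f"P(promising) = {p_promising:.2f}")
```

Ranking candidate sets by such a score and simulating the most promising first is the sort of triage that can produce throughput gains like the 2X figure cited above.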
Alex Starr: I would start with a baseline of where we’re at today in terms of the complex systemic designs we develop. Multi-die and multi-socket products are commonplace, certainly for AMD, so we are already facing the scale challenge that you see on the AI side of things. Lots of designs have to deal with that. In some sense, we’ve been addressing this problem in many ways already, both through hybrid engine-based flows and through more abstract modeling in verification and emulation.
When I look at it, as previously mentioned, the compute engines in these designs are fairly simple relative to some of the existing compute engines we have in GPUs and CPUs today. From an IP-level point of view, verification is probably fairly straightforward. How do you cope with scaling for large designs? It’s really a software problem: how is the whole system performing?
All of these designs are going to be measured on how fast they can execute machine learning algorithms. That’s a hardware and software/firmware problem. I think the industry is going to have to get its head around how we optimize for performance, not just in the design itself but across the whole ecosystem. That’s something I am involved with at AMD; it is one of my passions, and we’ve been working for many years to improve that whole systemic ecosystem and get performance out of it. There is a huge performance-optimization challenge that we have to address. On the “building the complex design” side, I’d characterize it for us as mostly business as usual, nothing really AI-specific.
On the “how do I use AI in verification processes” question, I think that is a big area of expansion. Today, we can run large designs on hybrid systems and process tons of data. What do we do with that data? Historically, we’ve been looking at waveforms to get microscopic details, but these are big ecosystems with multiple software stacks running. How do I debug my software? Where do I put my engineers when I want to optimize the system? Getting high visibility into all of that is key, and AI can play a role, specifically in analyzing the data we get out of these systems so we can be more targeted in our approaches.
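Starr doesn’t name a specific technique, but one plausible form of this kind of triage is to reduce each emulation run to a feature vector and cluster the runs, so engineers start with the outliers rather than scrolling through waveforms. The feature names and values below are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

runs = np.array([
    # [stall_pct, avg_queue_depth, cache_miss_pct] per emulation run (made-up)
    [ 2.0, 1.1,  3.0],
    [ 2.2, 1.0,  3.4],
    [41.0, 7.8, 28.0],   # anomalous run: likely where debug effort belongs
    [ 2.1, 1.2,  2.9],
])

features = StandardScaler().fit_transform(runs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)            # the outlier run lands in its own cluster
```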
Ty Garibay: The design team and the verification team working on our chips essentially implement a generative adversarial network made of people. You have the design, you have verification people trying to attack it, and the design evolves over time as the designers fix it, make it better, and eventually create a new chip.
The challenge is that each chip is unique, and there’s no baseline data, or only limited visibility into baseline data, from chip to chip, unless you’re doing derivatives or a next-generation x86 or something like that. In the current environment, we’re building machine learning chips like it’s the Wild West in terms of implementation, where everybody is choosing to do things their own way and claiming to have a special sauce. It’s hard to see where we can capture a lot of information the way Arm did; they were able to capture information from a stream of Arm tests that they use to build an Arm core.
This is similar to a system called Trek from Breker Verification Systems that can generate memory-system-targeted tests. The tool learns in its own way, even though it is a limited learning tool. There are clearly opportunities there as we get deeper into our second-generation product: we learn what we’re looking for, then use the data we’ve gathered and leverage it for more productivity in verification.
Saad Godil: I agree with a lot of what my colleagues have said. I will add one more consideration on building AI chips: the AI field is changing at a rapid pace, and that is going to pose an interesting challenge for verification teams. Today, we must build verification environments that can adapt quickly to changing specs. That’s only going to become more important.
Let me illustrate this with an example. My colleagues have a paper-reading group that meets every week to discuss AI papers. One week, someone said, “Let’s go back and read an old classic paper that holds up well.” The paper they picked was published in 2017, and that’s considered old in this field!
Building a chip takes time, which means you’re going to be living with a moving target. You ought to construct verification environments that allow you to react in a short amount of time; that’s going to give you a competitive advantage in delivering these AI chips.
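Godil doesn’t prescribe a mechanism, but a common way to build in that adaptability is to keep spec-dependent values in one declarative table that all checkers read, so a spec revision touches a single place. The field names and values here are hypothetical.

```python
# Rev-A spec table; a rev-B spec would swap in new values, not new checkers.
SPEC = {
    "acc_width_bits": 32,   # hypothetical accumulator width; might grow in rev B
    "tile_rows": 16,
    "tile_cols": 16,
}

def check_accumulator(value: int, spec: dict = SPEC) -> bool:
    """Checker reads its limits from the spec table instead of hard-coding them."""
    return 0 <= value < (1 << spec["acc_width_bits"])

assert check_accumulator(2**31)       # legal under the 32-bit accumulator
assert not check_accumulator(2**32)   # would overflow it
```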
On the topic of building AI chips in general, the industry has already been designing chips that deal with scaling. From that angle, I don’t think it’s unique. I do think it’s a property of this environment, and I hope we’re going to get more people working in this area. I look forward to some of the cool solutions people will come up with, but I don’t think there is anything here that is restricted only to AI.
The other question is how AI will impact verification, and on that I’m bullish. A lot of people have said that AI is the new electricity, and a lot of industries are going to be impacted by it. I absolutely believe that our design industry will be one of them. Just look at the amount of data we have and the complexity of the work we do. It just makes a lot of sense that we will find a lot of different applications where we can use the power of AI to improve our processes and build chips faster.
In Part 2 of this 4-part mini-series, panelists discuss whether EDA tool providers are ready to deliver what design teams need to verify chips in this domain.