It just makes sense that we will find a lot of applications where we can use the power of AI to improve our processes and build chips faster
By Lauro Rizzatti (Contributed Content) | Thursday, June 20, 2019
Jean-Marie Brunet, senior director of Marketing at Mentor, a Siemens Business, served as moderator for a well-attended and lively DVCon U.S. panel discussion on the hot topics of Artificial Intelligence (AI) and Machine Learning (ML).
The hour-long session featured panelists Raymond Nijssen, vice president and chief technologist at Achronix; Rob Aitken, fellow and director of technology from Arm; Alex Starr, senior fellow at AMD; Ty Garibay, Mythic’s vice president of hardware engineering; and Saad Godil, director of Applied Deep Learning Research at Nvidia.
Based on the panel transcript, Part 1 of this 4-part mini-series recounted Brunet’s first question about how AI is reshaping the semiconductor industry, specifically chip design verification, and the panelists’ answers (see “Experts Weigh in on Artificial Intelligence Reshaping the Semiconductor Industry”).
What follows is Part 2, which addresses the design and verification of AI chips. Parts Three and Four will include audience member questions and the panelists’ answers.
Raymond Nijssen: In my view, we are only just getting started with AI. This is going to be a long road and it’s going to be an exciting time. Whether it arrives now or a year from now, we have to be absolutely ready. The approach could be incremental. There are already great little things that we can automate.
Let’s get to the bottom of what we are talking about here. Traditionally, with machine learning, you need to have a data set, and then you train your machine learning network with that data set.
In the case of verification, you could wonder what that training set looks like. Is it a big data set with errors that may affect a system, so that the network, because it was trained on that data, can answer the question, “Is this good or is this bad?” Take one more step back. If you look at what a machine learning system is, it is a universal curve-fitting function.
If you remember curve fitting from high school or university, it’s the process of constructing a curve, or mathematical function, that best fits a series of data points. Your training data is the set of points that goes into the curve-fitting exercise, and the coefficients that come out of it are your model. The question then is, given a new x value, can you predict what the y value will be? In that analogy, you can then say whether it is good or bad.
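To make the analogy concrete, here is a minimal sketch of curve fitting as “training” and “prediction,” using NumPy’s polyfit. The data points and the choice of a quadratic are purely illustrative, not anything the panelists specified.

```python
import numpy as np

# Toy "training set": x values and the y values observed for them.
# (Made-up numbers, purely to illustrate the curve-fitting analogy.)
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([0.1, 0.9, 4.2, 8.8, 16.1, 24.9])

# "Training" = fitting: the coefficients that come out are the model.
coeffs = np.polyfit(x_train, y_train, deg=2)
model = np.poly1d(coeffs)

# "Prediction" = asking the fitted curve about a new x value.
x_new = 2.5
print(f"predicted y at x={x_new}: {model(x_new):.2f}")
```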
Now, back to design verification. If you want to extend that analogy, you would have to have a big data set with things that were right, and another data set with things that were wrong.
Look at another area of design verification, if you can call it that: wafer inspection in a fab. People are already using machine learning there because there are defects it can recognize automatically with this methodology. Now, if I have a number of Verilog assertions in my design and those assertions are completing, how do I know that what I am looking at when I run a simulation of any kind is right? We would need training sets that could somehow predict what is good and what is not good. In many cases, you could do that by looking at a waveform coming out of the AI design and flagging the unusual: “I have never seen this sequence of events before; as a designer, you need to look at it.” I think a tool with enough training data could get to that point.
Are we ready for that? Yes, I think we are ready for that.
Google’s AlphaGo is another example: there were rules being executed, but the system was not told what the rules were. It learned them just by taking in a lot of data. I think the analogy extends to design verification along those lines.
Ty Garibay: It’s true that all processors run software, and it is perhaps even truer for these AI applications. The chips are just a bunch of multipliers with some amount of programming on top of them. The verification task for these chips must include the firmware and software layers, even more so than for a normal processor or SoC.
We don’t have good EDA environments for integrated hardware-plus-software design verification, because for so long we’ve said firmware is a separate thing and the firmware team will test its own code. On what? We’re building the hardware literally at the same time as we’re designing the firmware. Of course, that drives the specs for our products, which is great, but I haven’t seen a methodology for a tool that verifies integrated hardware/software products. We’ve learned a lot from modems, RF, and DSP, and we’ve carried those lessons forward into our designs, built them into our systems, and tried to leverage them.
Rob Aitken: I think there’s another important point regarding curve fitting. It’s useful to think about it in the verification space, especially because curve fitting as a process works really well if your training set bounds your curve. Interpolation is good, extrapolation is notoriously bad in curve fitting. You have to remember that whatever verification problem you think you’re solving with machine learning, if it involves extrapolation, it’s likely that you will be disappointed.
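A quick sketch of that failure mode, again with NumPy on synthetic data (the sine function and the cubic fit are arbitrary choices for illustration): the fitted curve tracks the data inside the training range but drifts far from reality outside it.

```python
import numpy as np

# Fit a cubic to noisy samples of sin(x) taken only on [0, 2*pi].
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 2.0 * np.pi, 40)
y_train = np.sin(x_train) + rng.normal(0.0, 0.05, x_train.size)
model = np.poly1d(np.polyfit(x_train, y_train, deg=3))

# Interpolation: a point inside the training range stays close to sin(x).
# Extrapolation: a point well outside that range can be wildly wrong.
for x in (np.pi / 3.0, 4.0 * np.pi):
    print(f"x={x:5.2f}  true={np.sin(x):+.3f}  fitted={model(x):+.3f}")
```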
It’s possible to build AI systems that go beyond object detection, but they’re much harder to design than systems that just say, “This is a duck,” or something like that. Building an image recognition system that correctly identifies something it has never seen before is a much more challenging problem.
Alex Starr: I want to double down on a lot of what has been said here. The hardware/software ecosystem that we’ve got to verify these days is not just the design, and we pretty much have terrible tools industry-wide to address that problem. We do have tools that can look into the design and see what the firmware and the software stack were doing at the same time, and that’s great.
But we need a more abstract level of debugging and understanding of all of that, with a global view; we need a systemic workflow, and I haven’t seen any good tools for it recently. I think that’s where we need to move as an industry, for AI designs but also for complex CPUs, GPUs, and so on: incredibly complex systems with tens of processors of different types in a single chip. It’s a complex problem, and we don’t have the tools in the EDA industry to address it today.
Saad Godil: As someone who worked on domain-specific chips for a long time, GPUs in particular, one of the things that I’ve always seen at DVCon is a lot of focus on design reuse and third-party IP plug’n’play, and how to enable that easily.
One observation — I see a lot more people working on custom chips, more domain-specific chips and not just for AI. Usually those are paired with their own domain-specific languages.
I think the verification community is going to have fewer opportunities to rely on common standards and common tools. EDA providers build common solutions for everybody, and that will not be enough here. Instead, I think you’re going to have to invest in it yourself, and you are going to end up with a lot more proprietary designs and standards internally in your company.
You can’t outsource it and rely on someone else’s software. You’re going to have to invest internally and build up the tools that you need, and that gets harder because most of these hardware/software systems will have proprietary software and languages. It is going to be a pretty big investment.
At this point, moderator Brunet stepped in to defend EDA: Consider the evolution in the use of the emulator for hardware and software verification. One problem for AI design is capacity, and emulators are well positioned to handle capacity. As for actual hardware/software verification, we’re making progress, but it is not great yet. What’s different in the AI space is the new frameworks, which are different from the mobile benchmark frameworks: Caffe, TensorFlow, and many others.
From the EDA perspective, we need help from users to tell us what they need to extract from the design. Not all emulators are the same. In our case, our emulator can extract everything, but a designer wants to avoid dumping a massive amount of data, because it quickly becomes too much to analyze.
That’s where we need direction on what metrics are important to extract. We can extract pretty much anything when the design is mapped in the emulator and the software is running, but then it becomes a massive amount of data.
Ty Garibay: We have some common layer that is the neural network input, but after that, it is all unique for each vendor. It’s incumbent on tool vendors to be able to rapidly adapt to each unique implementation. They need to supply the ability to generate scoreboards quickly to track different states within the chip in a visible way and to understand new data types and new types of operations.
We could use help with formal language definitions of our processing elements and languages such that there are architectural descriptions that EDA tools can consume. That’s the kind of infrastructure I think that can make a big difference.
We might all be able to specify something; we just don’t know how to specify it.
Saad Godil: I think that, up until now, the problem was different. From a business-model perspective it made sense: build the tools to a common standard, and multiple customers can amortize the cost.
The problem gets harder when everybody has their own custom proprietary specs. On the other side of the equation, you now have AI available to help you with your problems. AI is good at perception: it can look at a picture and figure out what the different objects are without knowing what those objects mean. It’s also really good at extracting meaning from sentences, even though it doesn’t understand them. It can’t reason well, but it’s good at perception.
Perhaps the answer is that, given the spec of what you are building, you could discover properties of a proprietary design and figure out what would be important and what would not. This is an incredibly hard problem to solve, and not something an off-the-shelf solution could handle, but theoretically it is possible to build neural networks that can do this.
To address the curve-fitting point: that falls into the supervised learning paradigm, which is what has been successful so far in deep learning. There has also been a lot of great work in unsupervised learning, which I think is going to drive the next set of revolutions in AI, and our tools can use both. There’s a lot of data available for processing. The question is what do I save and what do I present to the neural net? We should think about how we can find patterns and try to determine what would be important and what would not.
Rob Aitken: I think there’s something else, though. When you look at the actual implementation of how that processing works, if you just have visibility into some random set of edges and some activation function, and you say, “this data appeared here and it fired,” you don’t really know what that means or why. That is part of the appeal of AI.
That’s why these things work, and why we don’t have to program every aspect of them. It’s also why they’re really hard to debug. At this point, matching output is the closest thing you get to provability: this software version and this hardware version do the same thing, we know they do the same thing, therefore we will assume it is right. But there are any number of inherent risks in that approach.
Raymond Nijssen: I think we’re touching on something important, the distinction between supervised learning and unsupervised learning. When machine learning started out a couple of decades ago, people used domain-specific knowledge: if you want to recognize a cat, you do edge detection, then look for some overlap and two triangles on top. That’s a rule, and you can get pretty far with it. That approach hit a glass ceiling, until people came up with unsupervised learning, where basically there were no rules to drive the system; the system had to work by being fed a lot of training data. The tool derived coefficients such that the cat’s tail was recognized without anyone having to specify what the rules were.
Let’s look at applications where there are rules and you want to specify those to the system. That’s definitely one promising avenue, though it’s not as intelligent as the self-learning systems that don’t depend on rules but on lots of inputs.
At some point, you will also have to make the distinction between what is now artificial intelligence and what is natural intelligence. What is really the difference between those two?
The difference between the two comes down to interpolating versus extrapolating. Jeff Hawkins’s book “On Intelligence” ponders what intelligence really means; basically, it depicts all of us as extrapolation engines.
During childhood, we learn that if event A is followed by event B, and then event C follows, then the next time we see event A followed by event B, we know event C is coming. We are all extrapolation engines, and as long as we keep extrapolating, our jobs are secure. If artificial intelligence somehow affects your job, or you find yourself interpolating all the time, then you should focus on finding ways to change your job so that you are extrapolating.
When you start extrapolating, it’s no longer clear how well it is going to work, because you no longer have data to lean on. You have a bunch of training points here, but the new point may sit way out there on the x axis, and when you extrapolate to it you don’t have the same level of confidence that you had inside the range of the training data.
For design verification, the risk is that the self-learning system may miss something, or that the training data was simply not good enough. That would be far less likely with a rule-driven system, where the rules give you a much higher likelihood of catching deviations, even across an enormous amount of data. These distinctions are important.
In Parts Three and Four of this mini-series, panelists take questions from audience members.