Software is eating the world, hardware is feeding it, and data is driving it. In 2024, software will be eating the hardware, as AI and generative AI demand ever more compute power to execute. GenAI also needs substantial data-communication throughput from the underlying hardware infrastructure. These demands stem from the need to process billions of parameters at every clock cycle.
AI is built on complex algorithms whose execution is governed by data movement rather than data processing, and it is data movement that taxes throughput. Transformers, the latest algorithms implemented in dense models and large language models (LLMs), are neural networks that learn context by tracking relationships in sequential data, such as the words in a sentence.
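To make the "tracking relationships" idea concrete, below is a minimal sketch of scaled dot-product attention, the core operation inside a transformer layer. The array sizes and random inputs are illustrative assumptions, not figures from this article; note how the sequence-length-squared score matrix illustrates why execution is dominated by data movement.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token attends to every other token in the sequence.

    Q, K, V: (seq_len, d_model) arrays of queries, keys, values.
    The (seq_len x seq_len) score matrix is the data-movement hot spot
    as sequences and models grow.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # pairwise token relationships
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # context-weighted mix of values

# Illustrative sizes only: 8 tokens, 64-dimensional embeddings.
rng = np.random.default_rng(0)
seq_len, d_model = 8, 64
Q, K, V = (rng.standard_normal((seq_len, d_model)) for _ in range(3))
context = scaled_dot_product_attention(Q, K, V)
print(context.shape)  # (8, 64)
```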
According to McKinsey, Google search processed 3.3 trillion queries in 2022, roughly 100,000 queries per second, at a cost of ¢0.2 per query, for an annual cost of $6.6 billion covered by Google's advertising revenue. The same report found that ChatGPT-3's cost hovers around ¢3 per query, 15x the search benchmark; at the same 100,000 queries per second, the annual cost would exceed $100 billion. Moving beyond AI to GenAI demands vastly more processing power and considerably higher data-communication throughput from the underlying hardware infrastructure. The latest algorithms, such as GPT-4, strain current processing hardware, and GenAI accelerators are not keeping up. No hardware on the market can run the full GPT-4 model.
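A quick back-of-the-envelope check of the figures cited above; the per-query costs come from the McKinsey report, while the arithmetic itself is a simple sketch:

```python
# Back-of-the-envelope check of the query-cost figures cited above.
SECONDS_PER_YEAR = 365 * 24 * 3600

queries_per_year = 3.3e12                        # Google search, 2022
queries_per_sec = queries_per_year / SECONDS_PER_YEAR
print(f"{queries_per_sec:,.0f} queries/sec")     # ~105,000, i.e. ~100,000

search_cost = queries_per_year * 0.002           # ¢0.2 per query
print(f"${search_cost / 1e9:.1f}B per year")     # $6.6B

llm_cost = queries_per_year * 0.03               # ~¢3 per query, 15x the benchmark
print(f"${llm_cost / 1e9:.0f}B per year")        # ~$99B, roughly the $100B cited
```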
To date, LLM development has focused on creating smaller, more specialized LLMs that run on existing hardware, a mere diversion from solving the problem.
Semiconductor-industry innovations in computing methods are needed in 2024: an improved hardware infrastructure built on next-generation processing architectures that can accommodate a broad range of algorithms and future LLM enhancements. These architectures must deliver multiple petaflops of performance at greater than 50% efficiency, cut latency to less than two seconds per query, reduce energy consumption, and shrink cost to ¢0.2 per query. They should adopt the latest manufacturing advancements, including the most advanced process technology nodes and multi-chip stacking.