The landscape of AI technology is shifting from model training to deployment. Startups competing in this sector face critical challenges, particularly around inference workloads. Unlike training, inference can run on a wide variety of hardware, creating opportunities for specialized chip manufacturers.
Nvidia recently exemplified this trend with its $20 billion acquisition of Groq, leveraging the startup's SRAM-heavy architecture to enhance performance. SRAM offers very high bandwidth but far lower capacity than DRAM, which limited how well Groq's chips could scale on their own; Nvidia addressed this by shifting compute-intensive tasks to its GPUs while reserving Groq's chips for bandwidth-sensitive operations. This strategy reflects a broader industry movement, with companies such as AWS and Intel also developing hybrid solutions that combine different hardware for better overall performance.
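The hybrid-dispatch idea described above can be sketched in a few lines. The sketch below is purely illustrative, not Nvidia's actual scheduler: it routes an operation based on its arithmetic intensity (FLOPs per byte of memory traffic), with a made-up threshold. Compute-bound work goes to the GPU; bandwidth-bound work goes to the SRAM-heavy accelerator.

```python
# Hypothetical sketch of hybrid dispatch: the backend names, the
# threshold value, and the routing rule are all illustrative
# assumptions, not a description of any vendor's real scheduler.

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def choose_backend(flops: float, bytes_moved: float,
                   threshold: float = 10.0) -> str:
    """Route compute-bound ops to a GPU and bandwidth-bound ops to
    an SRAM-heavy accelerator."""
    if arithmetic_intensity(flops, bytes_moved) >= threshold:
        return "gpu"
    return "sram_accelerator"

# A large matmul is compute-bound; streaming KV-cache reads during
# decoding are bandwidth-bound.
print(choose_backend(flops=1e12, bytes_moved=1e9))  # → gpu
print(choose_backend(flops=1e9, bytes_moved=1e9))   # → sram_accelerator
```

The threshold in practice would come from roofline-style profiling of the specific chips involved; the point is only that the split is decided per operation, not per model.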
In a notable advancement, Lumai has introduced an optical inference accelerator that uses light to perform matrix computations, significantly reducing power consumption compared with traditional digital systems. Lumai projects that its upcoming Iris Tetra systems will deliver one exaOPS of AI performance within a 10 kW power envelope by 2029, using a hybrid electro-optical design to optimize inference processing.
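The stated target implies a specific energy efficiency, which a quick back-of-envelope calculation makes concrete (the figures come straight from the claim above; the calculation itself is ours, not Lumai's):

```python
# Back-of-envelope check of the stated target: 1 exaOPS in 10 kW.
exa_ops = 1e18    # operations per second (1 exaOPS)
power_w = 10_000  # 10 kW expressed in watts

efficiency = exa_ops / power_w  # operations per joule (OPS/W)
print(f"{efficiency:.0e} OPS/W")  # → 1e+14 OPS/W, i.e. 100 TOPS/W
```

That works out to 100 tera-operations per second per watt, which is the figure to compare against the efficiency of contemporary digital accelerators when judging the claim.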