The current resurgence of artificial intelligence is driven by advances in deep learning. Deep-learning systems now exceed human capability in speech recognition, object classification, and playing games such as Go. This progress has been enabled by powerful, efficient computing hardware: the underlying algorithms date to the 1980s, but only in the last decade, when powerful GPUs became available to train networks, has the technology become practical.
Advances in deep learning are now gated by hardware performance. With the end of Moore's Law and Dennard scaling, continued performance scaling will come primarily from specialization. Graphics processing units (GPUs) are an ideal platform on which to build domain-specific accelerators: they provide efficient, high-performance communication and memory subsystems, which every domain needs. Specialization is provided by dedicated cores, such as tensor cores and ray-tracing cores, that accelerate specific applications. High-performance, low-overhead networks provide the scale-up needed to train very large models.