Cerebras slays GPUs and breaks the record for largest AI model trained on a single machine

Cerebras, the company behind the world’s largest accelerator chip, the CS-2 Wafer Scale Engine, has announced a major achievement: training the world’s largest natural language processing (NLP) AI model on a single machine. While that in and of itself could mean many things (it wouldn’t be much of a record if the previous largest model had been trained on a smartwatch, for example), the model Cerebras trained soared to a staggering, and unprecedented, 20 billion parameters, all without having to scale the workload across multiple accelerators. That’s enough to fit the internet’s newest sensation, OpenAI’s 12-billion-parameter text-to-image generator, DALL-E.

The most important part of Cerebras’ achievement is the reduction in infrastructure and software-complexity requirements. Granted, a single CS-2 is like a supercomputer all on its own. The Wafer Scale Engine-2, which, as the name suggests, is etched onto a single wafer, an area normally large enough for hundreds of mainstream chips, features 2.6 trillion 7nm transistors, 850,000 cores, and 40 GB of on-chip memory. The whole package consumes about 15 kW.

Cerebras Wafer Scale Engine

Cerebras’ Wafer Scale Engine-2 in all its wafer-sized glory. (Image credit: Cerebras)

Keeping up to 20 billion parameters of an NLP model on a single chip significantly reduces the overhead of training across thousands of GPUs (and their associated hardware and scaling requirements), while eliminating the technical difficulty of partitioning models across them. Cerebras says this is “one of the most painful aspects of NLP workloads,” one that “sometimes takes months to complete.”
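As a rough back-of-the-envelope check (an illustrative calculation, not one published by Cerebras), 20 billion parameters stored as 16-bit floats occupy about 40 GB, which happens to match the WSE-2’s 40 GB of on-chip memory, while GPT-3’s 175 billion parameters would need far more than any single conventional accelerator holds:

```python
# Illustrative estimate only: memory needed just to store model
# weights, ignoring optimizer state, activations, and gradients.
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Storage for model weights in gigabytes.

    bytes_per_param=2 assumes FP16/BF16 weights; use 4 for FP32.
    """
    return num_params * bytes_per_param / 1e9

# 20 billion FP16 parameters -> 40 GB, on par with the WSE-2's
# 40 GB of on-chip memory.
print(weight_memory_gb(20_000_000_000))   # 40.0

# GPT-3's 175 billion parameters -> roughly 350 GB in FP16,
# which is why such models are sharded across many GPUs.
print(weight_memory_gb(175_000_000_000))  # 350.0
```

This only counts the weights themselves; real training runs also store gradients and optimizer state, so the true memory pressure is higher still.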

It’s a bespoke problem, unique not just to each neural network being trained but also to the specifications of each GPU and the network that ties them all together, elements that must be worked out in advance before the first training step ever begins. And it can’t be ported across systems.

Cerebras CS-2

Cerebras’ CS-2 is a standalone computing giant that includes not only the Wafer Scale Engine-2, but all of its associated power, memory, and storage subsystems. (Image credit: Cerebras)

The raw numbers might make Cerebras’ achievement seem underwhelming: OpenAI’s GPT-3, an NLP model that can write whole articles that sometimes fool human readers, features a staggering 175 billion parameters. DeepMind’s Gopher, launched late last year, raises that number to 280 billion. Google Brain has even announced training a trillion-plus-parameter model, the Switch Transformer.
