Efficiency of language models (in bananas)

by Don Mahurin, 2024-07-28

To see how efficient language model processing is, let’s compare it with the human brain. To make the comparison relatable, let’s measure energy in bananas, at 100 Calories each.

Bananas used per day:

Human brain               3
AMD Radeon RX 6600       32
Nvidia RTX 3090          72
Nvidia DGX 8xH100      1156

That's a lot of bananas.
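To make the banana unit concrete, here is a minimal sketch of the conversion between bananas per day and average power draw. It assumes a banana is exactly 100 Calories (kilocalories), that 1 kcal is 4184 J, and that each device draws constant power for a full 24 hours. The function names are illustrative, and the wattages it prints are only what the table's figures imply, not measured values.

```python
# Energy in one banana, per the article: 100 Calories (kilocalories).
KCAL_PER_BANANA = 100
JOULES_PER_KCAL = 4184  # assumption: thermochemical calorie
JOULES_PER_BANANA = KCAL_PER_BANANA * JOULES_PER_KCAL
SECONDS_PER_DAY = 24 * 60 * 60


def bananas_per_day(watts: float) -> float:
    """Bananas' worth of energy used in a day at a constant power draw."""
    return watts * SECONDS_PER_DAY / JOULES_PER_BANANA


def implied_watts(bananas: float) -> float:
    """Average power draw implied by a given number of bananas per day."""
    return bananas * JOULES_PER_BANANA / SECONDS_PER_DAY


# Bananas-per-day figures copied from the table above.
table = {
    "Human brain": 3,
    "AMD Radeon RX 6600": 32,
    "Nvidia RTX 3090": 72,
    "Nvidia DGX 8xH100": 1156,
}

if __name__ == "__main__":
    for name, bananas in table.items():
        print(f"{name:20s} {bananas:5d} bananas/day  ~{implied_watts(bananas):7.1f} W")
```

Read this way, 3 bananas a day works out to roughly 14.5 W of continuous power. Going the other direction, a hypothetical device drawing a constant 350 W would use about 72 bananas a day.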

Though this may not be an apples-to-apples comparison, there is potentially still a large efficiency gap between this hardware and the brain, and an even larger one if we consider learning and training.

The gap will likely continue to narrow, bringing many changes in applications, devices, and models.

Why LLMs must get more efficient

Efficiency will guide AI development, especially as we leave the server room.

Short term advances (software)

Even in the near term, language model processing can become significantly more efficient using existing technology.

Consider running a model similar to Llama 3.1 405B in 80 GB or 40 GB of memory, or less.
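To get a feel for what those memory targets demand, here is a rough back-of-the-envelope sketch. It counts model weights only (no KV cache, activations, or runtime overhead), assumes 405 billion parameters, and treats a GB as 10^9 bytes. The function names are illustrative.

```python
PARAMS = 405e9      # parameter count of a Llama 3.1 405B-class model
BYTES_PER_GB = 1e9  # assumption: decimal gigabytes


def weight_memory_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Memory needed to hold the weights alone at a given precision."""
    return params * bits_per_weight / 8 / BYTES_PER_GB


def max_bits_per_weight(budget_gb: float, params: float = PARAMS) -> float:
    """Largest average bits per weight that fits in a memory budget."""
    return budget_gb * BYTES_PER_GB * 8 / params


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        print(f"{bits:2d} bits/weight -> {weight_memory_gb(bits):6.1f} GB")
    for budget in (80, 40):
        print(f"{budget} GB budget -> {max_bits_per_weight(budget):.2f} bits/weight")
```

Under these assumptions, 16-bit weights alone take about 810 GB, so fitting in 80 GB means averaging about 1.58 bits per weight, and fitting in 40 GB means about 0.79.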

Conclusions

There are real use cases for embedded and personal language models, and these will drive more efficient hardware and software. There is plenty of room for efficiency improvements, especially compared with the human brain. Even in the near term, significant improvements can be made with existing technology. Such advances can help AI be more useful at a personal and human level.