黑洞资源笔记
11:41 · Aug 9, 2023 · Wed
Using a KV (Key-Value) cache to speed up inference in GPT and other Transformer models
Speeding up the GPT - KV cache
The common optimization trick for speeding up transformer inference is KV caching [1][2]. This technique is so prominent that the huggingface library has the use_cache flag enabled by default [6]. A few days ago, I read an awesome blog post, GPT in 60 Lines of NumPy.…
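The idea behind KV caching can be sketched in a few lines of NumPy: during autoregressive decoding, the keys and values of earlier tokens never change, so each step only needs to project the newest token and append its K/V rows to a cache instead of recomputing them all. This is a minimal single-head sketch, not the post's actual code; the names `generate_step` and `attention` are illustrative.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector q (d,)
    # against all cached keys K (t, d) and values V (t, d).
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())   # softmax over past positions
    w /= w.sum()
    return w @ V

def generate_step(x_t, Wq, Wk, Wv, cache):
    # Project ONLY the newest token; K/V of earlier tokens come from the cache.
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)
    cache["V"].append(x_t @ Wv)
    return attention(q, np.stack(cache["K"]), np.stack(cache["V"]))

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

xs = rng.standard_normal((4, d))        # a 4-token "generation" trace
cache = {"K": [], "V": []}
outs = [generate_step(x, Wq, Wk, Wv, cache) for x in xs]

# Uncached reference for the last step: recompute K and V for every token.
ref = attention(xs[-1] @ Wq, xs @ Wk, xs @ Wv)
```

With the cache, step t costs one projection of the new token plus attention over t cached rows, instead of re-projecting all t tokens every step; the output is numerically identical to the uncached computation (`outs[-1]` matches `ref`).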