黑洞资源笔记
11:41 · Aug 9, 2023 · Wed
Using a KV (Key-Value) cache to speed up inference in GPT and other Transformer models
Speeding up the GPT - KV cache
The common optimization trick for speeding up transformer inference is KV caching [1][2]. This technique is so prominent that the huggingface library has the use_cache flag enabled by default [6]. A few days ago, I read an awesome blog post, GPT in 60 Lines of NumPy.…
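The idea behind KV caching can be sketched in a few lines of NumPy: during autoregressive decoding, the keys and values of earlier tokens never change, so each step only needs to project the newest token and append its K/V rows to a cache instead of recomputing them all. This is a minimal single-head sketch, not the post's actual code; the names `generate_step` and `attention` are illustrative.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector q (d,)
    # against all cached keys K (t, d) and values V (t, d).
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())   # softmax over past positions
    w /= w.sum()
    return w @ V

def generate_step(x_t, Wq, Wk, Wv, cache):
    # Project ONLY the newest token; K/V of earlier tokens come from the cache.
    q = x_t @ Wq
    cache["K"].append(x_t @ Wk)
    cache["V"].append(x_t @ Wv)
    return attention(q, np.stack(cache["K"]), np.stack(cache["V"]))

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

xs = rng.standard_normal((4, d))        # a 4-token "generation" trace
cache = {"K": [], "V": []}
outs = [generate_step(x, Wq, Wk, Wv, cache) for x in xs]

# Uncached reference for the last step: recompute K and V for every token.
ref = attention(xs[-1] @ Wq, xs @ Wk, xs @ Wv)
```

With the cache, step t costs one projection of the new token plus attention over t cached rows, instead of re-projecting all t tokens every step; the output is numerically identical to the uncached computation (`outs[-1]` matches `ref`).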