GateGPT: 56k tokens per second Transformer (KV cache) on FPGA at 80 MHz
According to a post by researcher @fguzmanai, GateGPT is an FPGA-based transformer inference system that generates 56,000 tokens per second at an 80 MHz clock. By handling key-value (KV) cache operations in specialized hardware, the work points to an alternative path to GPU-scale language model inference, potentially at lower power consumption.
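To put the two reported figures in proportion, the short Python sketch below divides the clock rate by the throughput to get the average number of clock cycles available per token. Only the 80 MHz and 56,000 tokens per second values come from the post; the function name `cycles_per_token` and the variable names are illustrative, and the result is an average that says nothing about how GateGPT pipelines or parallelizes its work internally.

```python
def cycles_per_token(clock_hz: float, tokens_per_second: float) -> float:
    """Average number of clock cycles that elapse per generated token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return clock_hz / tokens_per_second


CLOCK_HZ = 80e6             # 80 MHz clock, as reported in the post
TOKENS_PER_SECOND = 56_000  # throughput, as reported in the post

budget = cycles_per_token(CLOCK_HZ, TOKENS_PER_SECOND)
avg_time_us = 1e6 / TOKENS_PER_SECOND  # average time per token in microseconds

print(f"{budget:.0f} cycles per token, {avg_time_us:.1f} microseconds per token")
# prints "1429 cycles per token, 17.9 microseconds per token"
```

At roughly 1,429 cycles per token on average, the reported throughput implies a design that does a large amount of work in parallel each cycle, which is consistent with the post's emphasis on specialized hardware.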
Source: https://twitter.com/fguzmanai/status/2065832668172845209