FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges for autoregressive token-by-token generation. To mitigate the computation overhead incurred during generation, several early-exit and layer-dropping strategies have been …
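To make the titular idea concrete, below is a minimal sketch of adaptive feed-forward skipping in a single decoder layer. It is an illustration under stated assumptions, not the paper's exact algorithm: the class name `AdaptiveFFNSkipLayer`, the threshold `tau`, and the skip criterion (cosine similarity between the hidden state before and after the attention sub-block, taken as a proxy for saturation) are all hypothetical choices made here for exposition. Note that only the FFN sub-block is skipped; attention always runs, so the KV cache is never invalidated.

```python
# Illustrative sketch (assumptions: layer structure, skip criterion, tau).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveFFNSkipLayer(nn.Module):
    """Pre-norm decoder layer that adaptively skips its FFN sub-block."""

    def __init__(self, d_model: int, n_heads: int, d_ffn: int, tau: float = 0.99):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn), nn.GELU(), nn.Linear(d_ffn, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.tau = tau  # similarity threshold above which the FFN is skipped

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention sub-block always executes, so KV-cache state stays intact.
        residual = x
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = residual + attn_out

        # Hypothetical saturation proxy: if attention barely changed the
        # hidden state (high cosine similarity), assume the FFN's marginal
        # contribution is small and skip its two large matmuls.
        sim = F.cosine_similarity(residual.flatten(1), x.flatten(1), dim=-1).mean()
        if sim > self.tau:
            return x  # FFN skipped for this decoding step

        return x + self.ffn(self.norm2(x))


# Usage example: one decoding step for a batch of 2 sequences of 8 tokens.
layer = AdaptiveFFNSkipLayer(d_model=64, n_heads=4, d_ffn=256)
out = layer(torch.randn(2, 8, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```

The design point the sketch highlights is that skipping the FFN, rather than exiting the whole layer, avoids the missing-KV-cache problem that early-exit strategies must work around: the attention path (and hence the cache) is computed for every layer regardless of the skip decision.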