TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their reliance on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel …