New Preprint: Born a Transformer - Always a Transformer? πŸ€–

Our paper investigates how architectural limitations of Transformers manifest after pretraining. Read it here.

New Preprint: Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B

We mechanistically explore the information flow in in-context learning (ICL) tasks in the Gemma model family. Read more here.

Paper Accepted at the LoResLM Workshop at COLING 2025 πŸ‡°πŸ‡ΏπŸ‡ΊπŸ‡ΏπŸ‡°πŸ‡¬πŸ‡ΉπŸ‡²

Our paper surveys the current state-of-the-art technologies for Turkic Central Asian languages. Read more here.

Paper Accepted at NeurIPS 2024 πŸŽ‰

We study the expressivity of SSMs in comparison to Transformers and RNNs. Read the full paper here.