New Preprint: Born a Transformer - Always a Transformer?
Our paper investigates how architectural limitations of Transformers manifest after pretraining. Read it here.
We mechanistically explore the information flow in in-context learning (ICL) tasks in the Gemma model family. Read more here.
Our paper surveys the current state-of-the-art technologies for Turkic Central Asian languages. Read more here.
We study the expressivity of state-space models (SSMs) in comparison to Transformers and RNNs. Read the full paper here.