The Transformer architecture, introduced by Vaswani et al. in 2017, serves as the backbone of contemporary language models. Over the years, numerous modifications to this architecture have been ...
Large Language Models (LLMs) have become indispensable tools for diverse natural language processing (NLP) tasks. Traditional LLMs operate at the token level, generating output one word or subword at ...
In a new paper Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2, a Google DeepMind research team introduces Gemma Scope, a comprehensive suite of JumpReLU SAEs.
An NVIDIA research team proposes Hymba, a family of small language models that blend transformer attention with state space models, which outperforms the Llama-3.2-3B model with a 1.32% higher average ...
Language models (LMs) based on transformers have become the gold standard in natural language processing, thanks to their exceptional performance, parallel processing capabilities, and ability to ...
An NVIDIA research team proposes Hymba, a family of small language models that blend transformer attention with state space models, which outperforms the Llama-3.2-3B model with a 1.32% higher average ...
In a new paper Time-Reversal Provides Unsupervised Feedback to LLMs, a research team from Google DeepMind and Indian Institute of Science proposes Time Reversed Language Models (TRLMs), a framework ...
Recent advancements in large language models (LLMs) have primarily focused on enhancing their capacity to predict text in a forward, time-linear manner. However, emerging research suggests that ...
In a new paper Navigation World Models, a research team from Meta, New York University and Berkeley AI Research proposes a Navigation World Model (NWM), a controllable video generation model that ...