Architectures
49 videos • by Tunadorable
1. Hierarchical GPT Decoders for Token to Concept Prediction
2. Making AI More Dynamic and Adaptable (Paper Breakdown)
3. Re-visiting AlphaFold - The AI Breakthrough in Biology (Paper Breakdown)
4. Looking back at Mixture of Experts in Machine Learning (Paper Breakdown)
5. China's New Meta-Transformer Architecture for Multimodal Learning (Paper Breakdown)
6. Introducing RWKV - An RNN with the Advantages of a Transformer (Paper Breakdown)
7. Modular Learning - BETTER than BACKPROP??? (Paper Breakdown)
8. Injecting Transformer Models with Steroids (Paper Breakdown)
9. HypersphereLayer instead of LayerNorm
10. Teaching Language Models to Think Before They Speak (paper breakdown)
11. Traveling Brainwaves Improve Neural Networks (Paper Breakdown)
12. Google's Gemini is Surprisingly LAME
13. Amateur Building Novel Edit to Transformer Architecture
14. Transformer Next-Concept Prediction: Better Than Next-Token Prediction???
15. Next-Concept Prediction In-depth Math Walkthrough
16. Accelerating LLM Inference: Medusa's Uglier Sisters (WITH CODE)
17. Embedding Vectors Inside Embedding Vectors (WITH CODE)
18. Emergent vs Imposed Hierarchy in Nested GPT Embedding Vectors (WITH CODE)
19. GPTs inside GPTs like Russian Nesting Dolls (WITH CODE!)
20. Interview w/ AI Researcher at Meta - Transformers are Multi-State RNNs
21. Interview w/ Harvard Scientist - Traveling Waves in Recurrent Neural Nets
22. Let's build Google's Gemma: from scratch, in code, spelled out
23. Are 1-Bit Weights The Future of Matrix Multiplication?!?!!?
24. FractalFormer: A WIP Transformer Architecture Inspired By Fractals
25. Let's Code Elon's Grok Model in Pytorch Step-by-Step, From Scratch, Spelled Out
26. LLMs Can Now Teach Themselves to Think Before Speaking
27. Synergizing Multiple Expert LLMs via Expert Token Routing
28. Embarrassingly Parallel Training of MoE LLMs
29. Teaching Old LLMs New Tricks (Tokens)
30. Hierarchical Concept Decoders - A Failed Attempt At Improving GPTs
31. LASER: Improving LLMs with Layer-Selective Rank Reduction
32. Diffusion Models can Compose Images and Sounds on a Single Canvas
33. Evolutionary Optimization of Model Merging Recipes
34. Sigma-GPTs: A New Approach to Autoregressive Models
35. SpaceByte: Deleting Tokenization from Large Language Modeling
36. Multi-Head Mixture-of-Experts
37. MoE-Level Performance Without The Added Computation
38. Exponentially Faster Language Modeling
39. MoE LLMs with Dense Training for Better Performance
40. Better & Faster Large Language Models via Multi-token Prediction
41. Shorter Sequence Lengths Using Matryoshka Models
42. Omni-modal Pretraining at Scale
43. Mixture of Sparse Attention for Automatic LLM Compression
44. What happens when you take MoE scaling laws seriously?
45. The END of RAG? Episodic memory for infinite context length
46. MaskMoE: Forcing rare tokens to only use one expert
47. If early layers don't need tons of experts, can we save compute?
48. Do we really need to use every single transformer layer?
49. Models inside models inside models