Architectures
49 videos • by Tunadorable
1. Hierarchical GPT Decoders for Token to Concept Prediction
2. Making AI More Dynamic and Adaptable (Paper Breakdown)
3. Re-visiting AlphaFold - The AI Breakthrough in Biology (Paper Breakdown)
4. Looking back at Mixture of Experts in Machine Learning (Paper Breakdown)
5. China's New Meta-Transformer Architecture for Multimodal Learning (Paper Breakdown)
6. Introducing RWKV - An RNN with the Advantages of a Transformer (Paper Breakdown)
7. Modular Learning - BETTER than BACKPROP??? (Paper Breakdown)
8. Injecting Transformer Models with Steroids (Paper Breakdown)
9. HypersphereLayer instead of LayerNorm
10. Teaching Language Models to Think Before They Speak (paper breakdown)
11. Traveling Brainwaves Improve Neural Networks (Paper Breakdown)
12. Google's Gemini is Surprisingly LAME
13. Amateur Building Novel Edit to Transformer Architecture
14. Transformer Next-Concept Prediction: Better Than Next-Token Prediction???
15. Next-Concept Prediction In-depth Math Walkthrough
16. Accelerating LLM Inference: Medusa's Uglier Sisters (WITH CODE)
17. Embedding Vectors Inside Embedding Vectors (WITH CODE)
18. Emergent vs Imposed Hierarchy in Nested GPT Embedding Vectors (WITH CODE)
19. GPTs inside GPTs like Russian Nesting Dolls (WITH CODE!)
20. Interview w/ AI Researcher at Meta - Transformers are Multi-State RNNs
21. Interview w/ Harvard Scientist - Traveling Waves in Recurrent Neural Nets
22. Let's build Google's Gemma: from scratch, in code, spelled out
23. Are 1-Bit Weights The Future of Matrix Multiplication?!?!!?
24. FractalFormer: A WIP Transformer Architecture Inspired By Fractals
25. Let's Code Elon's Grok Model in Pytorch Step-by-Step, From Scratch, Spelled Out
26. LLMs Can Now Teach Themselves to Think Before Speaking
27. Synergizing Multiple Expert LLMs via Expert Token Routing
28. Embarrassingly Parallel Training of MoE LLMs
29. Teaching Old LLMs New Tricks (Tokens)
30. Hierarchical Concept Decoders - A Failed Attempt At Improving GPTs
31. LASER: Improving LLMs with Layer-Selective Rank Reduction
32. Diffusion Models can Compose Images and Sounds on a Single Canvas
33. Evolutionary Optimization of Model Merging Recipes
34. Sigma-GPTs: A New Approach to Autoregressive Models
35. SpaceByte: Deleting Tokenization from Large Language Modeling
36. Multi-Head Mixture-of-Experts
37. MoE-Level Performance Without The Added Computation
38. Exponentially Faster Language Modeling
39. MoE LLMs with Dense Training for Better Performance
40. Better & Faster Large Language Models via Multi-token Prediction
41. Shorter Sequence Lengths Using Matryoshka Models
42. Omni-modal Pretraining at Scale
43. Mixture of Sparse Attention for Automatic LLM Compression
44. What happens when you take MoE scaling laws seriously?
45. The END of RAG? Episodic memory for infinite context length
46. MaskMoE: Forcing rare tokens to only use one expert
47. If early layers don't need tons of experts, can we save compute?
48. Do we really need to use every single transformer layer?
49. Models inside models inside models