Interpretability
40 videos • 365 views • by Tunadorable
MechInterp, Theoretic Interpretability, and Functional Interpretability
1
Turns out the AIs actually UNDERSTAND the world (Paper Breakdown)
Tunadorable
2
Do Transformers Travel Through an Embedding Vector Landscape? (Paper Breakdown)
Tunadorable
3
The AI Hydra Effect - Is DeepMind WRONG??? (Paper Breakdown)
Tunadorable
4
Understanding Deep Learning (Still) Requires Rethinking Generalization (Paper Breakdown)
Tunadorable
5
How to Win the AI Lottery (Paper Breakdown)
Tunadorable
6
Explaining Emergent Abilities in AI (Paper Breakdown)
Tunadorable
7
Neural Networks Meet Holographic Brain Theory - FUTURE of AI??? (Paper Breakdown)
Tunadorable
8
AI Activation Functions Magnify Infinite-Dimensional Space (Paper Breakdown)
Tunadorable
9
Local minima aren't a problem in ML training (Paper Breakdown)
Tunadorable
10
Inceptionism - How AIs Dream and Compute (Paper Breakdown)
Tunadorable
11
Critical Learning Periods Emerge Even in AI (Paper Breakdown)
Tunadorable
12
What is meant by "Explainable Model or XAI" (Paper Breakdown)
Tunadorable
13
Does the AI Loss landscape Have Fractal Crevices?
Tunadorable
14
Turns Out Language Models Represent Space and Time (paper breakdown)
Tunadorable
15
Discovering Moral Dimensions in ChatGPT (paper breakdown)
Tunadorable
16
Loss Landscapes DO Exhibit Fractal Dynamics (paper breakdown)
Tunadorable
17
Explaining Grokking Through Circuit Efficiency (Paper Breakdown)
Tunadorable
18
Understanding LLMs Through The Problem They're Trained To Solve (paper breakdown)
Tunadorable
19
Computational Consciousness vs the Feeling of Being Conscious
Tunadorable
20
Structured World Representations in Maze-Solving Transformers
Tunadorable
21
They Found Universal Neurons in GPT-2?!?!?!!?!
Tunadorable
22
Interview w/ AI Researcher at Meta - Transformers are Multi-State RNNs
Tunadorable
23
Uncovering Hidden Geometry in Transformers
Tunadorable
24
Can (Looped) Transformers Learn Iterative Algorithms?
Tunadorable
25
Hallucination in Language Models is NEVER Going Away
Tunadorable
26
Deep Networks Always Grok and Here is Why
Tunadorable
27
Peeking into Residual States
Tunadorable
28
Mamba has HUNDREDS of Implicit Attention Heads?!?!
Tunadorable
29
Generalization Benefits of Late Learning Rate Decay
Tunadorable
30
Is Cosine Similarity Really About Similarity?
Tunadorable
31
Anisotropy Is Inherent to Self-Attention in Transformers
Tunadorable
32
What does AI have to do with Plato's Allegory of the Cave?
Tunadorable
33
LASER: Improving LLMs with Layer-Selective Rank Reduction
Tunadorable
34
Retrieval Heads Mechanistically Explain Long-Context Factuality
Tunadorable
35
Anthropic's New Mech-Interp Paper, A Deep Dive
Tunadorable
36
Transformers Represent Belief State Geometry in their Residual Stream
Tunadorable
37
Information over-squashing in language tasks
Tunadorable
38
Underlying Mechanisms Behind Learning Rate Warmup's Success
Tunadorable
39
parallel processes in multi-hop LLM reasoning
Tunadorable
40
What would it mean for an AI to "understand"?
Tunadorable