Tunadorable @UCeQhm8DwHBg_YEYY0KGM1GQ@youtube.com

10K subscribers - no pronouns :c

Oi! I am become change, confuser of subscribers. πŸ‡΅πŸ‡ΈπŸ‡¨πŸ‡©


24:55  auto-regressive decoders "think ahead" with embedding diffusion
35:07  Bulk skimming AI paper abstracts - Sept 13, 2024
27:33  Autoregressive decoding of sentence vectors as opposed to tokens
20:13  Does AI have any chance predicting chaotic systems?
32:02  Why is AI so bad at multiplication?
17:49  Hate video editing? Check out my automatic video-editing suite
50:11  Skimming through hella new AI papers - Sept 6, 2024
26:40  Models inside models inside models
21:43  Do we really need to use every single transformer layer?
18:00  If early layers don't need tons of experts, can we save compute?
22:03  purposely "pre-caching" features or inadvertently leaving "breadcrumbs" for future timesteps?
43:42  Hella New AI Papers - Sept 1, 2024
08:49  Some training tokens are more valuable than others
14:53  The hackiest way of making AI models self-aware
26:07  A casual intro to recommendation models
09:11  A new way to compare high dimensional vectors
21:48  Making some embedding vectors functions of each other
47:41  Hella New AI Papers - Aug 24, 2024
10:52  Messing with tokenization of the prompt leads to superior reasoning
19:53  MaskMoE: Forcing rare tokens to only use one expert
10:38  What's the difference between 'Inside' OOD and 'Outside' OOD?
18:10  The END of RAG? Episodic memory for infinite context length
01:00:58  Hella Brand New AI Paper Abstracts - Aug 18, 2024
31:57  Are our perceptual systems structured to view the world truthfully?
39:28  Which is in charge, consciousness or the brain?
16:09  Can concatenated small networks compete with large ones?
15:30  What happens when you take MoE scaling laws seriously?
17:37  Trade-off between world modeling (predicting) vs agent modeling (acting)
53:56  Hella New AI Papers - Aug 9, 2024
01:00:28  What would it mean for an AI to "understand"?
23:00  Goldfish Loss for Mitigating Memorization in LLMs
28:56  Can LLMs Learn by Teaching Other LLMs?
25:45  Multi-Head Mixture-of-Experts
35:53  Bulk Skimming Hella New AI Paper Abstracts - Aug 2, 2024
30:40  Mixture of Sparse Attention for Automatic LLM Compression
15:12  Effect of Warm Restarts on Stochastic Gradient Descent
27:38  Exponentially Faster Language Modeling
19:09  Evolutionary Optimization of Model Merging Recipes
29:46  MoE-Level Performance Without The Added Computation
19:11  Hidden Pitfalls of Cosine Similarity Loss
35:48  Open-Endedness is Essential for Artificial Superhuman Intelligence
23:47  Sigma-GPTs: A New Approach to Autoregressive Models
16:25  Information over-squashing in language tasks
12:59  parallel processes in multi-hop LLM reasoning
10:40  The Structured Task Hypothesis
16:42  SpaceByte: Deleting Tokenization from Large Language Modeling
18:12  Cultural Accumulation in Reinforcement Learning
21:12  Transformers Represent Belief State Geometry in their Residual Stream
01:00:02  Brand New AI Papers This Week - July 12, 2024
10:26  MoE LLMs with Dense Training for Better Performance
13:45  Better & Faster Large Language Models via Multi-token Prediction
31:45  Underlying Mechanisms Behind Learning Rate Warmup's Success
19:53  LASER: Improving LLMs with Layer-Selective Rank Reduction
01:20:37  Hella Brand New AI Papers - July 5, 2024
40:57  The Illusion of State in State-Space Models (like Mamba)
30:02  Exploring Learning Dynamics in Concept Space
23:17  An Exactly Solvable Model for Emergence and Scaling Laws
05:58  Diffusion Models can Compose Images and Sounds on a Single Canvas
01:17:04  Hella New AI Papers This Week - June 29, 2024
01:31:27  A conversation with my audience 2024-06-28