Tunadorable @UCeQhm8DwHBg_YEYY0KGM1GQ@youtube.com

10K subscribers - no pronouns :c

Oi! I am become change, confuser of subscribers. πŸ‡΅πŸ‡ΈπŸ‡¨πŸ‡©


24:55  auto-regressive decoders "think ahead" with embedding diffusion
35:07  Bulk skimming AI paper abstracts - Sept 13, 2024
27:33  Autoregressive decoding of sentence vectors as opposed to tokens
20:13  Does AI have any chance predicting chaotic systems?
32:02  Why is AI so bad at multiplication?
17:49  Hate video editing? Check out my automatic video-editing suite
50:11  Skimming through hella new AI papers - Sept 6, 2024
26:40  Models inside models inside models
21:43  Do we really need to use every single transformer layer?
18:00  If early layers don't need tons of experts, can we save compute?
22:03  purposely "pre-caching" features or inadvertently leaving "breadcrumbs" for future timesteps?
43:42  Hella New AI Papers - Sept 1, 2024
08:49  Some training tokens are more valuable than others
14:53  The hackiest way of making AI models self-aware
26:07  A casual intro to recommendation models
09:11  A new way to compare high dimensional vectors
21:48  Making some embedding vectors functions of each other
47:41  Hella New AI Papers - Aug 24, 2024
10:52  Messing with tokenization of the prompt leads to superior reasoning
19:53  MaskMoE: Forcing rare tokens to only use one expert
10:38  What's the difference between 'Inside' OOD and 'Outside' OOD?
18:10  The END of RAG? Episodic memory for infinite context length
01:00:58  Hella Brand New AI Paper Abstracts - Aug 18, 2024
31:57  Are our perceptual systems structured to view the world truthfully?
39:28  Which is in charge, consciousness or the brain?
16:09  Can concatenated small networks compete with large ones?
15:30  What happens when you take MoE scaling laws seriously?
17:37  Trade-off between world modeling (predicting) vs agent modeling (acting)
53:56  Hella New AI Papers - Aug 9, 2024
01:00:28  What would it mean for an AI to "understand"?
23:00  Goldfish Loss for Mitigating Memorization in LLMs
28:56  Can LLMs Learn by Teaching Other LLMs?
25:45  Multi-Head Mixture-of-Experts
35:53  Bulk Skimming Hella New AI Paper Abstracts - Aug 2, 2024
30:40  Mixture of Sparse Attention for Automatic LLM Compression
15:12  Effect of Warm Restarts on Stochastic Gradient Descent
27:38  Exponentially Faster Language Modeling
19:09  Evolutionary Optimization of Model Merging Recipes
29:46  MoE-Level Performance Without The Added Computation
19:11  Hidden Pitfalls of Cosine Similarity Loss
35:48  Open-Endedness is Essential for Artificial Superhuman Intelligence
23:47  Sigma-GPTs: A New Approach to Autoregressive Models
16:25  Information over-squashing in language tasks
12:59  parallel processes in multi-hop LLM reasoning
10:40  The Structured Task Hypothesis
16:42  SpaceByte: Deleting Tokenization from Large Language Modeling
18:12  Cultural Accumulation in Reinforcement Learning
21:12  Transformers Represent Belief State Geometry in their Residual Stream
01:00:02  Brand New AI Papers This Week - July 12, 2024
10:26  MoE LLMs with Dense Training for Better Performance
13:45  Better & Faster Large Language Models via Multi-token Prediction
31:45  Underlying Mechanisms Behind Learning Rate Warmup's Success
19:53  LASER: Improving LLMs with Layer-Selective Rank Reduction
01:20:37  Hella Brand New AI Papers - July 5, 2024
40:57  The Illusion of State in State-Space Models (like Mamba)
30:02  Exploring Learning Dynamics in Concept Space
23:17  An Exactly Solvable Model for Emergence and Scaling Laws
05:58  Diffusion Models can Compose Images and Sounds on a Single Canvas
01:17:04  Hella New AI Papers This Week - June 29, 2024
01:31:27  A conversation with my audience 2024-06-28