
Gabriel Mongaras @UCYUq87t77YNTG5m256fOXeQ@youtube.com

8.5K subscribers - no pronouns :c

Just some guy exploring and making videos about curre


01:13:10
Deterministic Image Editing with DDPM Inversion, DDIM Inversion, Null Inversion and Prompt-to-Prompt
42:25
Attending to Topological Spaces: The Cellular Transformer
35:52
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
52:39
WARP: On the Benefits of Weight Averaged Rewarded Policies
28:52
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
01:14:43
Mamba 2 - Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
38:55
CoPE - Contextual Position Encoding: Learning to Count What's Important
45:48
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
43:26
xLSTM: Extended Long Short-Term Memory
37:09
KAN: Kolmogorov-Arnold Networks
30:07
LADD: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
37:00
Visual AutoRegressive Modeling: Scalable Image Generation via Next-Scale Prediction
32:49
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
40:14
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
04:54
Q* AGI Achieved (Apr Fools)
01:02:30
Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
37:08
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
46:25
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits and BitNet
31:15
DoRA: Weight-Decomposed Low-Rank Adaptation
01:02:38
OpenAI Sora and DiTs: Scalable Diffusion Models with Transformers
33:55
A Decoder-only Foundation Model For Time-series Forecasting
37:30
Lumiere: A Space-Time Diffusion Model for Video Generation
28:56
Exphormer: Sparse Transformers for Graphs
25:56
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
40:23
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
29:38
Cached Transformers: Improving Transformers with Differentiable Memory Cache
39:02
Translatotron 3: Speech to Speech Translation with Monolingual Data
44:02
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
47:32
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
28:39
Adversarial Diffusion Distillation
40:51
Unsupervised Discovery of Semantic Latent Directions in Diffusion Models
18:45
DALL-E 3 - Improving Image Generation with Better Captions
38:18
LRM: Large Reconstruction Model for Single Image to 3D
30:46
CodeFusion: A Pre-trained Diffusion Model for Code Generation
22:14
Matryoshka Diffusion Models Explained
36:04
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
57:43
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
33:27
StreamingLLM - Efficient Streaming Language Models with Attention Sinks Explained
28:51
FreeU: Free Lunch in Diffusion U-Net Explained
26:26
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Explained
50:20
Llama/WizardLM Finetuning with Huggingface on RunPod
50:14
2x Faster Language Model Pre-training via Masked Structural Growth
53:53
Bayesian Flow Networks (BFN) Explained
33:54
WizardLM: Empowering Large Language Models to Follow Complex Instructions Explained
43:59
From Sparse to Soft Mixtures of Experts Explained
42:16
BK-SDM: Architecturally Compressed Stable Diffusion for Efficient T2I Generation Explained
36:25
Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model Explained
31:51
Universal and Transferable Adversarial Attacks on Aligned Language Models Explained
45:45
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Explained
47:16
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations Explained
35:57
ReLoRA: Stack More Layers Differently: High-Rank Training Through Low-Rank Updates Explained
43:49
MiniLLM: Knowledge Distillation of Large Language Models
01:09:57
RetNet: A Successor to Transformer for Large Language Models Explained
54:21
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Explained
39:17
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for LLMs Explained
01:00:14
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Explained
37:21
LongNet: Scaling Transformers to 1,000,000,000 Tokens Explained
29:17
Extending Context Window of Large Language Models via Positional Interpolation Explained
39:52
RoFormer: Enhanced Transformer with Rotary Position Embedding Explained
37:47
RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation Explained