AI Papers Decoded Podcast @UCgHPldShnqB-L7q17rwqN0A@youtube.com

12 subscribers

Join our hosts as they break down the most fascinating AI re


12:26
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
18:58
Can Knowledge Editing Really Correct Hallucinations?
19:02
LOGO -- Long cOntext aliGnment via efficient preference Optimization
12:34
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
07:34
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
13:33
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
14:20
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
15:43
FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
12:40
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
31:38
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
09:27
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
18:08
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
08:29
MobA: A Two-Level Agent System for Efficient Mobile Task Automation
10:03
Movie Gen: A Cast of Media Foundation Models
13:17
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
12:44
The Curse of Multi-Modalities: Evaluating Hallucinations of LLM across Language, Visual, and Audio
11:02
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
11:03
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of LMM Through Coding Tasks
08:42
What Matters in Transformers? Not All Attention is Needed
13:40
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
10:46
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
07:48
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
11:06
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
12:41
From Generalist to Specialist: Adapting Vision LM via Task-Specific Visual Instruction Tuning
07:48
Meissonic: Revitalizing Masked Generative Transformers for Efficient HR Text-to-Image Synthesis
12:27
Baichuan-Omni Technical Report
07:29
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents (Agents Part 1)
09:04
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
09:09
Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines