AI Papers Decoded Podcast

AI Papers Decoded Podcast @UCgHPldShnqB-L7q17rwqN0A@youtube.com

12 subscribers

Join our hosts as they break down the most fascinating AI re


Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Can Knowledge Editing Really Correct Hallucinations?

LOGO -- Long cOntext aliGnment via efficient preference Optimization

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Movie Gen: A Cast of Media Foundation Models

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

The Curse of Multi-Modalities: Evaluating Hallucinations of LLM across Language, Visual, and Audio

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of LMM Through Coding Tasks

What Matters in Transformers? Not All Attention is Needed

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

From Generalist to Specialist: Adapting Vision LM via Task-Specific Visual Instruction Tuning

Meissonic: Revitalizing Masked Generative Transformers for Efficient HR Text-to-Image Synthesis

Baichuan-Omni Technical Report

WALL-E: World Alignment by Rule Learning Improves World Model-Based LLM Agents (Agents Part 1)

SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines