
Arxiv Papers @UCCkJI-ZJ0i3hUhare_oqZFQ@youtube.com

7.6K subscribers - no pronouns :c

Running out of time to catch up with new arXiv papers? We ta…


08:06
[QA] A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
09:01
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
08:58
[QA] Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
16:23
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
16:33
Making Text Embedders Few-Shot Learners
08:02
[QA] Making Text Embedders Few-Shot Learners
06:53
[QA] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
09:06
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
08:36
[QA] Infer Human's Intentions Before Following Natural Language Instructions
28:04
Infer Human's Intentions Before Following Natural Language Instructions
07:18
[QA] MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
15:25
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
08:11
[QA] Counterfactual Token Generation in Large Language Models
15:05
Counterfactual Token Generation in Large Language Models
08:04
[QA] Characterizing stable regions in the residual stream of LLMs
05:34
Characterizing stable regions in the residual stream of LLMs
07:48
[QA] Watch Your Steps: Observable and Modular Chains of Thought
30:06
Watch Your Steps: Observable and Modular Chains of Thought
07:58
[QA] Seeing Faces in Things: A Model and Dataset for Pareidolia
11:05
Seeing Faces in Things: A Model and Dataset for Pareidolia
29:22
Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
08:38
[QA] Rule Extrapolation in Language Models: A Study of Compositional Generalization on OOD Prompts
07:59
[QA] Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
11:54
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
08:00
[QA] LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
14:13
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models
07:53
[QA] Embedding Geometries of Contrastive Language-Image Pre-Training
15:42
Embedding Geometries of Contrastive Language-Image Pre-Training
15:25
Kolmogorov–Arnold Transformer
06:56
[QA] Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
08:32
[QA] Kolmogorov–Arnold Transformer
12:02
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
07:20
[QA] Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm
12:41
Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm
08:07
[QA] Is Tokenization Needed for Masked Particle Modelling?
21:01
Is Tokenization Needed for Masked Particle Modelling?
07:01
[QA] Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
12:57
Finetuning Language Models to Emit Linguistic Expressions of Uncertainty
26:52
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
07:38
[QA] To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
08:31
[QA] On the limits of agency in agent-based models
20:16
On the limits of agency in agent-based models
07:28
[QA] Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
15:49
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
07:51
[QA] Finetuning CLIP to Reason about Pairwise Differences
17:08
Finetuning CLIP to Reason about Pairwise Differences
09:21
[QA] Think Twice Before You Act: Improving Inverse Problem Solving With MCMC
11:27
Think Twice Before You Act: Improving Inverse Problem Solving With MCMC
07:41
[QA] Explaining Datasets in Words: Statistical Models with Natural Language Parameters
19:01
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
07:54
[QA] LLMs Will Always Hallucinate, and We Need to Live With This
43:42
LLMs Will Always Hallucinate, and We Need to Live With This
07:12
[QA] PingPong: A Benchmark for Role-Playing LLMs with User Emulation and Multi-Model Evaluation
07:31
PingPong: A Benchmark for Role-Playing LLMs with User Emulation and Multi-Model Evaluation
08:06
[QA] LLaMA-Omni: Seamless Speech Interaction with Large Language Models
21:40
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
08:21
[QA] WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale
17:12
WINDOWS AGENT ARENA: Evaluating Multi-Modal OS Agents at Scale
08:15
[QA] What Makes a Maze Look Like a Maze?
21:33
What Makes a Maze Look Like a Maze?