📄 Notable* Recent AI/ML arXiv Papers

📄 MoRight: Motion Control Done Right
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.07348v1
👥 Authors: Shaowei Liu, Xuanchi Ren, Tianchang Shen, Huan Ling, Saurabh Gupta (possible past University of California, Berkeley affiliation), Shenlong Wang (possible past University of Toronto affiliation), Sanja Fidler (possible past University of Toronto affiliation), Jun Gao (possible past NVIDIA (United States) affiliation)
Abstract

Generating motion-controlled videos--where user-specified actions drive physically plausible scene dynamics under freely chosen viewpoints--demands two capabilities: (1) disentangled motion control, allowing users to separately control the object motion and adjust camera viewpoint; and (2) motion causality, ensuring that user-driven actions trigger coherent reactions from other objects rather than merely displacing pixels. Existing methods fall short on both fronts: they entangle camera and obje...

📄 FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06916v1
👥 Authors: Yitong Li, Junsong Chen, Shuchen Xue, Pengcuo Zeren, Siyuan Fu, Dinghao Yang, Yangyang Tang, Junjie Bai, Ping Luo (possible past Shanghai Artificial Intelligence Laboratory affiliation), Song Han (possible past Stanford University affiliation), Enze Xie
Abstract

Reinforcement-Learning-based post-training has recently emerged as a promising paradigm for aligning text-to-image diffusion models with human preferences. In recent studies, increasing the rollout group size yields pronounced performance improvements, indicating substantial room for further alignment gains. However, scaling rollouts on large-scale foundational diffusion models (e.g., FLUX.1-12B) imposes a heavy computational burden. To alleviate this bottleneck, we explore the integration of FP...

📄 On the Step Length Confounding in LLM Reasoning Data Selection
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06834v1
👥 Authors: Bing Wang, Rui Miao, Chen Shen (possible past Tencent (China) affiliation), Shaotian Yan, Kaiyuan Liu, Ximing Li, Xiaosong Yuan, Sinan Fan, Jun Zhang (possible past Tencent (China) affiliation), Jieping Ye
Abstract

Large reasoning models have recently demonstrated strong performance on complex tasks that require long chain-of-thought reasoning, through supervised fine-tuning on large-scale and high-quality datasets. To construct such datasets, existing pipelines generate long reasoning data from more capable Large Language Models (LLMs) and apply manually heuristic or naturalness-based selection methods to filter high-quality samples. Despite the proven effectiveness of naturalness-based data selection, wh...

📄 OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06814v1
👥 Authors: Dihong Jiang, Ruoqi Cao, Zhiyuan Dang, Li Huang, Qingsong Zhang, Zhiyu Wang, Shihao Piao, Shenggao Zhu, Jianlong Chang, Zhouchen Lin (possible past Peking University affiliation), Qi Tian (possible past Huawei Technologies (China) affiliation)
Abstract

While traditional tree-based ensemble methods have long dominated tabular tasks, deep neural networks and emerging foundation models have challenged this primacy, yet no consensus exists on a universally superior paradigm. Existing benchmarks typically contain fewer than 100 datasets, raising concerns about evaluation sufficiency and potential selection biases. To address these limitations, we introduce OmniTabBench, the largest tabular benchmark to date, comprising 3030 datasets spanning divers...

📄 AI-Driven Research for Databases
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06566v1
👥 Authors: Audrey Cheng, Harald Ng, Aaron Kabcenell, Peter Bailis (possible past University of California, Berkeley affiliation), Matei Zaharia (possible past University of California, Berkeley affiliation), Lin Ma (possible past Tencent (China) affiliation), Xiao Shi, Ion Stoica (possible past University of California, Berkeley affiliation)
Abstract

As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep pace. To address this gap, a new class of techniques, termed AI-Driven Research for Systems (ADRS), uses large language models to automate solution discovery. This approach shifts optimization from manual system design to automated code generation. The key obstacle, however, in applying ADRS is the evaluation pip...

📄 Neural Computers
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.06425v1
👥 Authors: Mingchen Zhuge, Changsheng Zhao, Haozhe Liu, Zijian Zhou, Shuming Liu, Wenyi Wang, Ernie Chang, Gael Le Lan, Junjie Fei, Wenxuan Zhang, Yasheng Sun, Zhipeng Cai, Zechun Liu, Yunyang Xiong, Yining Yang, Yuandong Tian (possible past OpenAI (United States) affiliation), Yangyang Shi, Vikas Chandra (possible past Meta (United States) affiliation), Jürgen Schmidhuber
Abstract

We propose a new frontier: Neural Computers (NCs) -- an emerging machine form that unifies computation, memory, and I/O in a learned runtime state. Unlike conventional computers, which execute explicit programs, agents, which act over external execution environments, and world models, which learn environment dynamics, NCs aim to make the model itself the running computer. Our long-term goal is the Completely Neural Computer (CNC): the mature, general-purpose realization of this emerging machine ...

📄 DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.06161v1
👥 Authors: Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li (possible past Google (United States) affiliation), Wenping Wang, Paul Debevec (possible past Google (United States) affiliation)
Abstract

Most digital videos are stored in 8-bit low dynamic range (LDR) formats, where much of the original high dynamic range (HDR) scene radiance is lost due to saturation and quantization. This loss of highlight and shadow detail precludes mapping accurate luminance to HDR displays and limits meaningful re-exposure in post-production workflows. Although techniques have been proposed to convert LDR images to HDR through dynamic range expansion, they struggle to restore realistic detail in the over- an...

📄 Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.06132v1
👥 Authors: Bowen Ye, Rang Li, Qibin Yang, Yuanxin Liu, Linli Yao, Hanglong Lv, Zhihui Xie, Chenxin An, Lei Li (possible past Carnegie Mellon University affiliation), Lingpeng Kong (possible past Google (United States) affiliation), Qi Liu (possible past Tencent (China) affiliation), Zhifang Sui (possible past Peking University affiliation), Tong Yang (possible past Peking University affiliation)
Abstract

Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-world software environments. However, existing agent benchmarks suffer from three critical limitations: (1) trajectory-opaque grading that checks only final outputs, (2) underspecified safety and robustness evaluation, and (3) narrow modality coverage and interaction paradigms. We introduce Claw-Eval, an end-to-end evaluation suite addressing all three gaps. It comprises 300 human-verified...

📄 Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.05955v1
👥 Authors: Kai Yu (possible past Baidu (China) affiliation), Zhenhao Zhou, Junhao Zeng, Ying Wang (possible past Tsinghua University affiliation), Xueying Du, Zhiqiang Yuan, Junwei Liu, Ziyu Zhou, Yujia Wang, Chong Wang (possible past Google (United States) affiliation), Xin Peng
Abstract

Repository-level issue resolution benchmarks have become a standard testbed for evaluating LLM-based agents, yet success is still predominantly measured by test pass rates. In practice, however, acceptable patches must also comply with project-specific design constraints, such as architectural conventions, error-handling policies, and maintainability requirements, which are rarely encoded in tests and are often documented only implicitly in code review discussions. This paper introduces \textit{...

📄 HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.05887v1
👥 Authors: Bowen Zeng, Feiyang Ren, Jun Zhang (possible past Tencent (China) affiliation), Xiaoling Gu, Ke Chen (possible past Tencent (China) affiliation), Lidan Shou, Huan Li
Abstract

Multimodal Large Language Models (MLLMs) have advanced unified reasoning over text, images, and videos, but their inference is hindered by the rapid growth of key-value (KV) caches. Each visual input expands into thousands of tokens, causing caches to scale linearly with context length and remain resident in GPU memory throughout decoding, which leads to prohibitive memory overhead and latency even on high-end GPUs. A common solution is to compress caches under a fixed allocated budget at differ...

📄 The Illusion of Stochasticity in LLMs
🗓️ Published: 4/8/2026
🔗 http://arxiv.org/abs/2604.06543v1
👥 Authors: Xiangming Gu, Soham De (possible past DeepMind (United Kingdom) affiliation), Michalis Titsias, Larisa Markeeva (possible past Google (United States) affiliation), Petar Veličković (possible past University of Cambridge affiliation), Razvan Pascanu (possible past Google (United States) affiliation)
Abstract

In this work, we demonstrate that reliable stochastic sampling is a fundamental yet unfulfilled requirement for Large Language Models (LLMs) operating as agents. Agentic systems are frequently required to sample from distributions, often inferred from observed data, a process which needs to be emulated by the LLM. This leads to a distinct failure point: while standard RL agents rely on external sampling mechanisms, LLMs fail to map their internal probability estimates to their stochastic outputs...

📄 HaloProbe: Bayesian Detection and Mitigation of Object Hallucinations in Vision-Language Models
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.06165v1
👥 Authors: Reihaneh Zohrabi, Hosein Hasani, Akshita Gupta, Mahdieh Soleymani Baghshah, Anna Rohrbach (possible past University of California, Berkeley affiliation), Marcus Rohrbach (possible past University of California, Berkeley affiliation)
Abstract

Large vision-language models can produce object hallucinations in image descriptions, highlighting the need for effective detection and mitigation strategies. Prior work commonly relies on the model's attention weights on visual tokens as a detection signal. We reveal that coarse-grained attention-based analysis is unreliable due to hidden confounders, specifically token position and object repetition in a description. This leads to Simpson's paradox: the attention trends reverse or disappear wh...

📄 A deep learning framework for jointly solving transient Fokker-Planck equations with arbitrary parameters and initial distributions
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.06001v1
👥 Authors: Xiaolong Wang (possible past Carnegie Mellon University affiliation), Jing Feng, Qi Liu (possible past Tencent (China) affiliation), Chengli Tan, Yuanyuan Liu, Yong Xu (possible past Tencent (China) affiliation)
Abstract

Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal i...

📄 QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
🗓️ Published: 4/7/2026
🔗 http://arxiv.org/abs/2604.05963v1
👥 Authors: Changxin Ke, Rui Zhang, Jiaming Guo, Yuanbo Wen, Li Ding, Shuo Wang (possible past NVIDIA (United States) affiliation), Xuyuan Zhu, Xiong Peng, Di Huang (possible past Google (United States) affiliation), Zidong Du, Xing Hu (possible past Baidu (China) affiliation), Qi Guo, Yunji Chen
Abstract

Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only buggy parts. Building on this insight, we propose PRepair, a framework that mitigates over-editing and improves repair accuracy. PRepair has two components: Self-Breaking, which gene...

*Notable papers are those with at least two authors from a "big" AI/ML lab.