📄 Notable* Recent AI/ML arXiv Papers


📄 MIMIC: A Generative Multimodal Foundation Model for Biomolecules
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24506v1
👥 Authors: Siavash Golkar, Jake Kovalic, Irina Espejo Morales, Samuel Sledzieski, Minhuan Li, Ksenia Sokolova, Geraud Krawezik, Alberto Bietti, Claudia Skok Gibbs, Roman Klypa, Shengwei Xiong, Francois Lanusse, Liam Parker, Kyunghyun Cho (possible past Meta (United States) affiliation), Miles Cranmer, Tom Hehir, Michael Mccabe, Lucas Meyer, Rudy Morel, Payel Mukhopadhyay, Mariel Pettee, Helen Qu, Jeff Shen, David Fouhey, Hadi Sotoudeh, Vikram Mulligan, Pilar Cossio, Sonya M. Hanson, Alisha N. Jones, Olga G. Troyanskaya, Shirley Ho (possible past Carnegie Mellon University affiliation)
Abstract

Biological function emerges from coupled constraints across sequence, structure, regulation, evolution, and cellular context, yet most foundation models in biology are trained within one modality or for a fixed forward task. We present MIMIC, a generative multimodal foundation model trained on our newly curated and aligned dataset, LORE, linking nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual modalities within partially observed biomolecular states. MIMIC use...

📄 Kwai Summary Attention Technical Report
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24432v1
👥 Authors: Chenglong Chu, Guorui Zhou, Guowang Zhang, Han Li, Hao Peng (possible past Tsinghua University affiliation), Hongtao Cheng, Jian Liang, Jiangxia Cao, Kun Gai, Lingzhi Zhou, Lu Ren, Qi Zhang (possible past Tencent (China) affiliation), Ruiming Tang (possible past Huawei Technologies (China) affiliation), Ruitao Wang, Xinchen Luo, Yi Su, Zhiyuan Liang, Ziqi Wang, Boyang Ding, Chengru Song, Dunju Zang, Hui Wang, Jiao Ou, Jiaxin Deng, Jijun Shi, Jinghao Zhang, Junmin Chen, Lejian Ren, Minxuan Lv, Qianqian Wang (possible past Google (United States) affiliation), Qigen Hu, Shiyao Wang, Siyang Mao, Tao Wang (possible past Stanford University affiliation), Xingmei Wang, Zhixin Ling, Ziming Li, Zixing Zhang
Abstract

Long-context ability has become one of the most important iteration directions for next-generation Large Language Models, particularly in semantic understanding/reasoning, agentic coding intelligence, and recommendation systems. However, standard softmax attention exhibits quadratic time complexity with respect to sequence length. As the sequence length increases, this incurs substantial overhead in long-context settings, causing the training and inference costs of extremely long sequences to deteri...
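The quadratic cost the abstract refers to comes from the n×n score matrix that softmax attention materializes. A minimal single-head NumPy sketch (no masking, no batching; not Kwai's proposed method, just the standard baseline it improves on):

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention for one head.

    The (n, n) score matrix below is the source of the quadratic
    time and memory cost in sequence length n.
    """
    d = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)               # shape (n, n): O(n^2)
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the exponential
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # each output row is a convex
                                                  # combination of rows of V
```

Doubling the sequence length quadruples the size of `scores`, which is why long-context methods replace or sparsify this matrix.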

📄 Scaling Properties of Continuous Diffusion Spoken Language Models
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24416v1
👥 Authors: Jason Ramapuram, Eeshan Gunesh Dhekane, Amitis Shidani, Dan Busbridge, Bogdan Mazoure, Zijin Gu, Russ Webb, Tatiana Likhomanenko (possible past Meta (United States) affiliation), Navdeep Jaitly (possible past University Of Toronto affiliation)
Abstract

Speech-only spoken language models (SLMs) lag behind text and text-speech models in performance, with recent discrete autoregressive (AR) SLMs indicating significant computational and data demands to match text models. Since discretizing continuous speech for AR modeling creates bottlenecks, we explore whether continuous diffusion (CD) SLMs are more viable. To quantify SLMs' linguistic quality, we introduce the phoneme Jensen-Shannon divergence (pJSD) metric. Our analysis reveals CD SLMs, mirroring AR b...
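The excerpt does not define pJSD's exact construction (e.g., which phoneme distributions are compared), but the underlying Jensen-Shannon divergence is standard. A minimal sketch, assuming the inputs are phoneme frequency vectors over the same phoneme inventory:

```python
import numpy as np

def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Returns 0 for identical distributions and 1 for distributions with
    disjoint support. Assumes p and q are normalized and aligned.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # 0 * log(0/x) contributes nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike KL divergence, JSD is symmetric and bounded, which makes it convenient for comparing model-generated versus reference phoneme statistics.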

📄 SeaEvo: Advancing Algorithm Discovery with Strategy Space Evolution
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24372v1
👥 Authors: Sichun Luo, Yi Huang, Haochen Luo, Fengyuan Liu, Guanzhi Deng, Lei Li (possible past Carnegie Mellon University affiliation), Qinghua Yao, Zefa Hu, Junlan Feng, Qi Liu (possible past Tencent (China) affiliation)
Abstract

LLM-guided evolutionary search has emerged as a promising paradigm for automated algorithm discovery, yet most systems track search progress primarily through executable programs and scalar fitness. Even when natural-language reflection is used, it is often applied locally in mutation prompts or stored without an explicit population-level organization of strategic directions. As a result, evolutionary search can struggle to distinguish syntactically different implementations of the same idea, prese...

📄 RAS: a Reliability Oriented Metric for Automatic Speech Recognition
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24278v1
👥 Authors: Wenbin Huang, Yuhang Qiu, Bohan Li (possible past Google (United States) affiliation), Yiwei Guo, Jing Peng, Hankun Wang, Xie Chen, Kai Yu (possible past Baidu (China) affiliation)
Abstract

Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS,...
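RAS itself is not defined in this excerpt. As an illustration only, here is a toy token-level abstention scheme showing the generic quantities such reliability-oriented metrics typically trade off: coverage (how much is transcribed) versus precision on what is emitted. The `ABSTAIN` marker, `tau` threshold, and both function names are hypothetical, not the paper's:

```python
ABSTAIN = "<abstain>"  # hypothetical marker for withheld segments

def abstain_decode(tokens, confidences, tau=0.5):
    """Replace tokens whose confidence falls below tau with an abstention marker."""
    return [t if c >= tau else ABSTAIN for t, c in zip(tokens, confidences)]

def coverage_and_precision(hypothesis, reference):
    """Coverage: fraction of positions emitted; precision: accuracy on emitted ones."""
    emitted = [(h, r) for h, r in zip(hypothesis, reference) if h != ABSTAIN]
    coverage = len(emitted) / len(hypothesis) if hypothesis else 0.0
    precision = (sum(h == r for h, r in emitted) / len(emitted)) if emitted else 1.0
    return coverage, precision
```

Raising `tau` lowers coverage but should raise precision; a reliability metric must penalize both over-confident errors and excessive abstention.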

📄 Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24198v1
👥 Authors: Zhisong Qiu, Shuofei Qiao, Kewei Xu, Yuqi Zhu, Lun Du, Ningyu Zhang (possible past Tencent (China) affiliation), Huajun Chen (possible past Alibaba Group (China) affiliation)
Abstract

Process Reward Models (PRMs) have achieved remarkable success in augmenting the reasoning capabilities of Large Language Models (LLMs) within static domains such as mathematics. However, their potential in dynamic data analysis tasks remains underexplored. In this work, we first present an empirical study revealing that general-domain PRMs struggle to supervise data analysis agents. Specifically, they fail to detect silent errors, logical flaws that yield incorrect results without triggering inte...

📄 MultiDx: A Multi-Source Knowledge Integration Framework towards Diagnostic Reasoning
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24186v1
👥 Authors: Yimin Deng, Zhenxi Lin, Yejing Wang, Guoshuai Zhao, Pengyue Jia, Zichuan Fu, Derong Xu, Yefeng Zheng (possible past Tencent (China) affiliation), Xiangyu Zhao, Li Zhu, Xian Wu (possible past Tencent (China) affiliation), Xueming Qian
Abstract

Diagnostic prediction and clinical reasoning are critical tasks in healthcare applications. While Large Language Models (LLMs) have shown strong capabilities in commonsense reasoning, they still struggle with diagnostic reasoning due to limited domain knowledge. Existing approaches often rely on internal model knowledge or static knowledge bases, resulting in knowledge insufficiency and limited adaptability, which hinder their capacity to perform diagnostic reasoning. Moreover, these methods foc...

📄 AdapTime: Enabling Adaptive Temporal Reasoning in Large Language Models
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24175v1
👥 Authors: Yimin Deng, Yejing Wang, Zhenxi Lin, Zichuan Fu, Guoshuai Zhao, Derong Xu, Yefeng Zheng (possible past Tencent (China) affiliation), Xiangyu Zhao, Xian Wu (possible past Tencent (China) affiliation), Li Zhu, Xueming Qian
Abstract

Large language models have demonstrated strong reasoning capabilities in general knowledge question answering. However, their ability to handle temporal information remains limited. To address this limitation, existing approaches often involve external tools or manual verification and are tailored to specific scenarios, leading to poor generalizability. Moreover, these methods apply a fixed pipeline to all questions, overlooking the fact that different types of temporal questions require distinc...

📄 MUSIC: Learning Muscle-Driven Dexterous Hand Control
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23886v1
👥 Authors: Pei Xu, Yufei Ye (possible past Carnegie Mellon University affiliation), Shuchun Sun, Yu Ding, Elizabeth Schumann, C. Karen Liu (possible past Stanford University affiliation)
Abstract

We present a data-driven approach for physics-based, muscle-driven dexterous control that enables musculoskeletal hands to perform precise piano playing for novel pieces of music outside the reference dataset. Our approach combines high-frequency muscle-level control with low-frequency latent-space coordination in a hierarchical architecture. At the low level, general single-hand policies are trained via reinforcement learning to generate dynamic muscle-tendon activations while tracking trajecto...

📄 From Noisy Historical Maps to Time-Series Oil Palm Mapping Without Annotation in Malaysia and Indonesia (2020-2024)
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23776v1
👥 Authors: Nuttaset Kuapanich, Juepeng Zheng (possible past Tsinghua University affiliation), Bohan Shi, Jiaying Liu (possible past Peking University affiliation), Jiayin Jiang, Jiatao Huang, Shenghan Tan, Qingmei Li, Haohuan Fu (possible past Tsinghua University affiliation)
Abstract

Accurate monitoring of oil palm plantations is critical for balancing economic development with environmental conservation in Southeast Asia. However, existing plantation maps often suffer from low spatial resolution and a lack of recent temporal coverage, impeding effective surveillance of rapid land-use changes. In this study, we propose a deep learning framework to generate 10-meter resolution oil palm plantation maps for Indonesia and Malaysia from 2020 to 2024, utilizing Sentinel-2 imagery ...

📄 Agri-CPJ: A Training-Free Explainable Framework for Agricultural Pest Diagnosis Using Caption-Prompt-Judge and LLM-as-a-Judge
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23701v1
👥 Authors: Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation), Qi Zhang (possible past Tencent (China) affiliation), Mingkun Xu, Mu You, Henghua Shen, Zhongzhi He, Keyan Jin, Derek F. Wong (possible past Tencent (China) affiliation), Tao Fang
Abstract

Crop disease diagnosis from field photographs faces two recurring problems: models that score well on benchmarks frequently hallucinate species names, and when predictions are correct, the reasoning behind them is typically inaccessible to the practitioner. This paper describes Agri-CPJ (Caption-Prompt-Judge), a training-free few-shot framework in which a large vision-language model first generates a structured morphological caption, iteratively refined through multi-dimensional quality gating, ...

📄 Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23623v1
👥 Authors: Zichuan Fu, Xian Wu (possible past Tencent (China) affiliation), Guojing Li, Yejing Wang, Yijun Chen, Zihao Zhao (possible past Tsinghua University affiliation), Yixuan Luo, Hanyu Yan, Yefeng Zheng (possible past Tencent (China) affiliation), Xiangyu Zhao
Abstract

Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final answers. While such approaches improve answer quality and interpretability, they incur substantial computational overhead due to the prolonged generation sequences. In this paper, we propose Tandem, a novel collaborative framework that synergizes large and small language models (LLMs and SLMs) to achie...

📄 PhysCodeBench: Benchmarking Physics-Aware Symbolic Simulation of 3D Scenes via Self-Corrective Multi-Agent Refinement
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23580v1
👥 Authors: Tianyidan Xie, Peiyu Wang, Yuyi Qian, Yuxuan Wang (possible past Google (United States) affiliation), Rui Ma, Ying Tai (possible past Tencent (China) affiliation), Song Wu, Qian Wang, Lanjun Wang, Zili Yi
Abstract

Physics-aware symbolic simulation of 3D scenes is critical for robotics, embodied AI, and scientific computing, requiring models to understand natural language descriptions of physical phenomena and translate them into executable simulation environments. While large language models (LLMs) excel at general code generation, they struggle with the semantic gap between physical descriptions and simulation implementation. We introduce PhysCodeBench, the first comprehensive benchmark for evaluating ph...

📄 DLM: Unified Decision Language Models for Offline Multi-Agent Sequential Decision Making
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23557v1
👥 Authors: Zhuohui Zhang, Bin Cheng (possible past Tencent (China) affiliation), Bin He (possible past Baidu (China) affiliation)
Abstract

Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-a...

📄 The Last Human-Written Paper: Agent-Native Research Artifacts
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24658v1
👥 Authors: Jiachen Liu (possible past Baidu (China) affiliation), Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang (possible past University Of Cambridge affiliation), Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Yuchen You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland (possible past Massachusetts Institute Of Technology affiliation), Ang Chen, Mosharaf Chowdhury (possible past University Of California, Berkeley affiliation), Zechen Zhang
Abstract

Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compression imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details ...

📄 Fed-DLoRA: Efficient Wireless Federated Learning with Dynamic Low-Rank Adaptation
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24103v1
👥 Authors: Huaicheng Li (possible past Carnegie Mellon University affiliation), Junhui Zhao, Haoyu Quan, Xiaoming Wang (possible past Google (United States) affiliation)
Abstract

Federated learning (FL) offers a promising distributed learning paradigm for internet of vehicles (IoV) applications. However, it faces challenges from communication overhead and dynamic environments. Model compression techniques reduce computing and communication burden yet create trade-offs between compression ratios and vehicle participation strategies. In this paper, we propose a lightweight FL algorithm named federated learning with dynamic low-rank adaptation (Fed-DLoRA), which is combined...

📄 Stabilizing Efficient Reasoning with Step-Level Advantage Selection
🗓️ Published: 4/27/2026
🔗 http://arxiv.org/abs/2604.24003v1
👥 Authors: Han Wang (possible past Peking University affiliation), Xiaodong Yu, Jialian Wu, Jiang Liu, Ximeng Sun, Mohit Bansal, Zicheng Liu (possible past Microsoft (United States) affiliation)
Abstract

Large language models (LLMs) achieve strong reasoning performance by allocating substantial computation at inference time, often generating long and verbose reasoning traces. While recent work on efficient reasoning reduces this overhead through length-based rewards or pruning, many approaches are post-trained under a much shorter context window than base-model training, a factor whose effect has not been systematically isolated. We first show that short-context post-training alone, using standa...

📄 A General Representation-Based Approach to Multi-Source Domain Adaptation
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23790v1
👥 Authors: Ignavier Ng, Yan Li (possible past Tencent (China) affiliation), Zijian Li, Yujia Zheng, Guangyi Chen, Kun Zhang (possible past Google (United States) affiliation)
Abstract

A central problem in unsupervised domain adaptation is determining what to transfer from labeled source domains to an unlabeled target domain. To handle high-dimensional observations (e.g., images), a line of approaches use deep learning to learn latent representations of the observations, which facilitate knowledge transfer in the latent space. However, existing approaches often rely on restrictive assumptions to establish identifiability of the joint distribution in the target domain, such as ...

📄 Agentic Fusion of Large Atomic and Language Models to Accelerate Materials Discovery
🗓️ Published: 4/26/2026
🔗 http://arxiv.org/abs/2604.23758v1
👥 Authors: Mingze Li, Yu Rong (possible past Tencent (China) affiliation), Songyou Li, Lihong Wang, Jiacheng Cen, Liming Wu, Anyi Li, Zongzhao Li, Qiuliang Liu, Rui Jiao, Tian Bian (possible past Tsinghua University affiliation), Pengju Wang, Hao Sun, Jianfeng Zhang, Ji-Rong Wen, Deli Zhao, Shifeng Jin, Tingyang Xu (possible past Tencent (China) affiliation), Wenbing Huang (possible past Tsinghua University affiliation)
Abstract

The discovery of novel materials is critical for global energy and quantum technology transitions. While deep learning has fundamentally reshaped this landscape, existing predictive or generative models typically operate in isolation, lacking the autonomous orchestration required to execute the full discovery process. Here we present ElementsClaw, an agentic framework for materials discovery that synergizes Large Atomic Models (LAMs) with Large Language Models (LLMs). In response to varied human...

📄 V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
🗓️ Published: 4/25/2026
🔗 http://arxiv.org/abs/2604.23380v1
👥 Authors: Bingda Tang, Yuhui Zhang, Xiaohan Wang (possible past Baidu (China) affiliation), Jiayuan Mao (possible past Tsinghua University affiliation), Ludwig Schmidt (possible past University Of Washington affiliation), Serena Yeung-Levy
Abstract

Aligning denoising generative models with human preferences or verifiable rewards remains a key challenge. While policy-gradient online reinforcement learning (RL) offers a principled post-training framework, its direct application is hindered by the intractable likelihoods of these models. Prior work therefore either optimizes an induced Markov decision process (MDP) over sampling trajectories, which is stable but inefficient, or uses likelihood surrogates based on the diffusion evidence lower ...

*Notable papers are those with at least two authors from a "big" AI/ML lab.