📄 Notable* Recent AI/ML arXiv Papers

📄 SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.04012v1
👥 Authors: Joseph Breda, Fadi Yousif, Beszel Hawkins, Marinela Cotoi, Miao Liu, Ray Luo, Po-Hsuan Cameron Chen (possible past Google (United States) affiliation), Mike Schaekermann (possible past Google (United States) affiliation), Samuel Schmidgall, Xin Liu, Girish Narayanswamy, Samuel Solomon, Maxwell A. Xu, Xiaoran Fan, Longfei Shangguan, Anran Wang, Bhavna Daryani, Buddy Herkenham, Cara Tan, Mark Malhotra, Shwetak Patel, John B. Hernandez, Quang Duong, Yun Liu (possible past Google (United States) affiliation), Zach Wasson, Dimitrios Antos, Bob Lou, Matthew Thompson, Jonathan Richina, Anupam Pathak, Nichole Young-Lin, Jake Sunshine, Daniel Mcduff (possible past Google (United States) affiliation)
Abstract

Language models excel at diagnostic assessments on curated medical case studies and vignettes, performing on par with, or better than, clinical professionals. However, existing studies focus on complex scenarios with rich context, making it difficult to draw conclusions about how these systems perform for patients reporting symptoms in everyday life. We deployed SymptomAI, a set of conversational AI agents for end-to-end patient interviewing and differential diagnosis (DDx), via the Fitbit app i...

📄 A Benchmark for Interactive World Models with a Unified Action Generation Framework
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03941v1
👥 Authors: Jianjie Fang, Yingshan Lei, Qin Wan, Ziyou Wang, Yuchao Huang, Yongyan Xu, Baining Zhao, Weichen Zhang, Chen Gao, Xinlei Chen (possible past Tsinghua University affiliation), Yong Li (possible past Tsinghua University affiliation)
Abstract

Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception an...

📄 RoboAlign-R1: Distilled Multimodal Reward Alignment for Robot Video World Models
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03821v1
👥 Authors: Hao Wu (possible past Tencent (China) affiliation), Yuqi Li, Yuan Gao (possible past Tencent (China) affiliation), Fan Xu, Fan Zhang, Kun Wang, Penghao Zhao, Qiufeng Wang, Yizhou Zhao, Weiyan Wang, Yingli Tian, Xian Wu (possible past Tencent (China) affiliation), Xiaomeng Huang
Abstract

Existing robot video world models are typically trained with low-level objectives such as reconstruction and perceptual similarity, which are poorly aligned with the capabilities that matter most for robot decision making, including instruction following, manipulation success, and physical plausibility. They also suffer from error accumulation in long-horizon autoregressive prediction. We present RoboAlign-R1, a framework that combines reward-aligned post-training with stabilized long-horizon in...

📄 Agentic-imodels: Evolving agentic interpretability tools via autoresearch
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03808v1
👥 Authors: Chandan Singh, Yan Shuo Tan, Weijia Xu, Zelalem Gero, Weiwei Yang, Michel Galley (possible past Microsoft (United States) affiliation), Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

Agentic data science (ADS) systems are rapidly improving their capability to autonomously analyze, fit, and interpret data, potentially moving towards a future where agents conduct the vast majority of data-science work. However, current ADS systems use statistical tools designed to be interpretable by humans, rather than interpretable by agents. To address this, we introduce Agentic-imodels, an agentic autoresearch loop that evolves data-science tools designed to be interpretable by agents. Spe...

📄 ProgramBench: Can Language Models Rebuild Programs From Scratch?
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03546v1
👥 Authors: John Yang, Kilian Lieret, Jeffrey Ma, Parth Thakkar, Dmitrii Pedchenko, Sten Sootla, Emily Mcmilin, Pengcheng Yin (possible past Google (United States) affiliation), Rui Hou, Gabriel Synnaeve (possible past Meta (United States) affiliation), Diyi Yang (possible past Stanford University affiliation), Ofir Press
Abstract

Turning ideas into full software projects from scratch has become a popular use case for language models. Agents are being deployed to seed, maintain, and grow codebases over extended periods with minimal human oversight. Such settings require models to make high-level software architecture decisions. However, existing benchmarks measure focused, limited tasks such as fixing a single bug or developing a single, specified feature. We therefore introduce ProgramBench to measure the ability of soft...

📄 Local Truncation Error-Guided Neural ODEs for Large Scale Traffic Forecasting
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03386v1
👥 Authors: Xiao Zhang (possible past Tsinghua University affiliation), Yafei Li, Ruixiang Wang, Wei Wei (possible past Google (United States) affiliation), Shuo He, Mingliang Xu
Abstract

Spatiotemporal forecasting in physical systems, such as large-scale traffic networks, requires modeling a dual dynamic: continuous macroscopic rhythms and discrete, unpredictable microscopic shocks. While Neural Ordinary Differential Equations (ODEs) excel at capturing smooth evolution, their inherent Lipschitz continuity constraints inevitably cause severe over-smoothing when confronting abrupt anomalies. Recent physics-informed methods attempt to bypass this by penalizing numerical integration...
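
The abstract is cut off before it says how local truncation error (LTE) guides the model. What follows is therefore only a generic numerical-analysis sketch, not the paper's method. A standard way to estimate LTE is to take the same step with two solvers of different order and compare the results. Here Euler (first order) and Heun (second order) are assumed. The dynamics `smooth` and `shocky` are hypothetical stand-ins for a calm traffic rhythm and an abrupt shock.

```python
import math
from typing import Callable

Dynamics = Callable[[float, float], float]


def euler_heun_step(f: Dynamics, t: float, y: float, h: float) -> tuple[float, float]:
    """Advance one step; return (Heun estimate, estimated LTE of the Euler step).

    Heun is second order and Euler first order, so their difference
    approximates the local truncation error of the Euler step.
    """
    k1 = f(t, y)
    y_euler = y + h * k1
    k2 = f(t + h, y_euler)
    y_heun = y + 0.5 * h * (k1 + k2)
    return y_heun, abs(y_heun - y_euler)


def smooth(t: float, y: float) -> float:
    """Hypothetical calm dynamics: exponential decay."""
    return -y


def shocky(t: float, y: float) -> float:
    """Hypothetical dynamics with a sharp pulse ("shock") centred at t = 1."""
    return -y + 50.0 * math.exp(-((t - 1.0) ** 2) / 0.001)


def max_lte(f: Dynamics, y0: float = 1.0, h: float = 0.05, steps: int = 40) -> float:
    """Integrate from t = 0 and report the largest per-step LTE estimate."""
    t, y, worst = 0.0, y0, 0.0
    for _ in range(steps):
        y, lte = euler_heun_step(f, t, y, h)
        worst = max(worst, lte)
        t += h
    return worst


print(f"smooth: {max_lte(smooth):.2e}")
print(f"shocky: {max_lte(shocky):.2e}")
```

The estimate stays small under smooth dynamics and spikes at the transient. That is the kind of signal an error-guided model could use to localise shocks that a Lipschitz-smooth ODE would otherwise smear out.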

📄 GeoDecider: A Coarse-to-Fine Agentic Workflow for Explainable Lithology Classification
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03383v1
👥 Authors: Jiahao Wang, Mingyue Cheng, Yitong Zhou, Qingyang Mao, Xiaoyu Tao, Qi Liu (possible past Tencent (China) affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Lithology classification aims to infer subsurface rock types from well-logging signals, supporting downstream applications like reservoir characterization. Despite substantial progress, most existing methods still treat lithology classification as a single-pass classification task. In contrast, practical experts incorporate geological principles, external knowledge, and tool-use capabilities to perform accurate classification. In this work, we propose GeoDecider, a coarse-to-fine agentic workflo...

📄 RAG over Thinking Traces Can Improve Reasoning Tasks
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03344v1
👥 Authors: Negar Arabzadeh, Wenjie Ma, Sewon Min (possible past University of Washington affiliation), Matei Zaharia (possible past University of California, Berkeley affiliation)
Abstract

Retrieval-augmented generation (RAG) has proven effective for knowledge-intensive tasks, but is widely believed to offer limited benefit for reasoning-intensive problems such as math and code generation. We challenge this assumption by showing that the limitation lies not in RAG itself, but in the choice of corpus. Instead of retrieving documents, we propose retrieving thinking traces, i.e., intermediate thinking trajectories generated during problem solving attempts. We show that thinking trace...
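
The corpus swap can be made concrete with a minimal sketch. It assumes a toy in-memory corpus of past thinking traces and a bag-of-words cosine retriever. The paper's actual retriever, corpus construction and prompt format are not given in the excerpt. `TRACES`, `retrieve` and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical corpus: each entry is an intermediate reasoning trace recorded
# while solving an earlier problem, not a reference document.
TRACES = [
    "To sum an arithmetic series, pair the first and last terms: n*(a1+an)/2.",
    "For modular exponentiation, square repeatedly and reduce mod m at each step.",
    "To reverse a linked list, keep prev, curr, next pointers and relink in one pass.",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k traces most similar to the query."""
    q = bow(query)
    return sorted(TRACES, key=lambda t: cosine(q, bow(t)), reverse=True)[:k]


def build_prompt(query: str, k: int = 1) -> str:
    """Prepend retrieved traces to the new problem."""
    context = "\n".join(f"[trace] {t}" for t in retrieve(query, k))
    return f"{context}\n[problem] {query}"


print(build_prompt("Compute the sum of the arithmetic series 3 + 7 + ... + 99"))
```

Replacing `TRACES` with a document corpus would leave the code unchanged. Per the abstract, the retrieval machinery stays the same and only what gets retrieved differs.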

📄 From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.03205v1
👥 Authors: Aritra Roy, Kevin Shen, Andrew Macbride, Awwal Oladipupo, Mudassra Taskeen, Wojtek Treyde, Ruaa A. E. A. Abakar, Ahmad D. Abbas, Elsayed Abdelfatah, Abbas A. Abdullahi, Seham S. Abyah, Chahd Rahyl Adjmi, Fariha Agbere, Savyasanchi Aggarwal, Muhammad Ahmed, Tasnim Ahmed, Motasem Ajlouni, Mattias Akke, Hussein Aladwan, Anwaar S. Alazani, Zahra A. Alharbi, Wajd A. Aljulyhi, Mohammed A. Alkubaish, Fatima A. Almahri, Sayed A. Almohri, David Obeh Alobo, Mohammed Alouni, Azizah S. Alqahtani, Omar Alsaigh, Husain Althagafi, Md. Aqib Aman, Lena Ara, Arifin, Ignacio Arretche, Abdulaziz Ashy, Syeda A. Asim, Amro Aswad, Adeel Atta, Sören Auer, Abdullah Al Azmi, Toheeb Balogun, Suvo Banik, Viktoriia Baibakova, Shakira A. Baksh, Neus G. Bastús, Christina J. Bayard, Adib Bazgir, Louis Beal, Lejla Biberić, Wahid Billah, Ankita Biswas, Joshua Bocarsly, Montassar T. Bouzidi, Esma B. Boydas, Youssef Briki, Cailin Buchanan, Mauricio Cafiero, Damien Caliste, Yi Cao, Rafael E. Castañeda, Sruthy K. Chandy, Benjamin Charmes, Shayantan Chaudhuri, Yiming Chen, Alexander Chen (possible past Google (United States) affiliation), Jieneng Chen, Min-Hsueh Chiu, Defne Circi, Cinthya H. Contreras, Yoann Cure, Nathan Daelman, Roshini Dantuluri, Thomas Davy, William Dawson, Leonid Didukh, Rui Ding, Aminu R. Doguwa, Claudia Draxl, Sathya Edamadaka, Oulaya Elargab, Christina Ertural, Matthew L. Evans, Edvin Fako, Hossam Farag, Nur A. Fathurrahman, Merve Fedai, Rodrigo P. Ferreira, Giuseppe Fisicaro, Thomas Frank, Sasi K. Gaddipati, Abhijeet Gangan, Jennifer Garland, James Garrick, Luigi Genovese, Maryam Ghadrdran, Sandip Giri, Maxime Goulet, Jeremy Goumaz, Sara U. Gracia, Jacob Graham, Gabriel Graves, Kevin P. Greenman, Tim Greitemeier, Cameron Gruich, Sophie Gu, Salomé Guilbert, Hans Gundlach, Muriel F. Gusta, Mourad El Haddaoui, Alexander J. Haibel, Anubhab Haldar, Vehaan Handa, Hassan Harb, Nathan D. Harms, Abdullah Al Hasan, Abir Hassan, Qiyao He, Andrés Henao-Aristizábal, Bram Hoex, Sungil Hong, Alexander J. Horvath, Md. Shaib Hossain, Yanqi Huang, Yuqing Huang, Kostiantyn Hubaiev, Donald Intal, Katherine Inzani, Kevin Ishimwe, Tugba Isik, Gopal R. Iyer, Katharina Jager, Jan Janssen, Hyewon Jeong, Michael Jirasek, Tyler R. Josephson, Nisarg Joshi, Yassir Ben Kacem, Remya A. M. Kalapurakal, Rakesh R. Kamath, Sugan Kanagasenthinathan, Dohun Kang, Jason Kantorow, Kübra Kaygisiz, Murat Keceli, Farhana Keya, Muhammad U. Khan, Sartaaj Takrim Khan, Hyungjun Kim, Alexander Kister, Sascha Klawohn, Collin Kovacs, Pranav Krishnan, Maurycy Kryzanowski, Ritesh Kumar, Suman Kumari, Gourav Kumbhojkar, Ryo Kuroki, Shashank Kushwaha, Magdalena Lederbauer, Jaejun Lee, Seunghan Lee, Jeonghwan Lee, Bingcan Li, Calvin Li, Zhanzhao Li, Shi Li, Shicheng Li, Chengyan Liu, Hao Liu (possible past Tencent (China) affiliation), Tung Yan Liu, Yutong Liu, Lucia Vina-Lopez, Chayaphol Lortaraparsert, Andre K. Y. Low, Saffron Luxford, Carlos Madariaga, Rishikesh Magar, Piyush R. Maharana, Rahul Mallela, Shoaib Mahmud, Natesan Mani, Umair Mansoor, Omar B. Mansour, Cassandra Masschelein, Kinga O. Mastej, Ankit Mathanker, Jeffrey Meng, Omran Mezghani, Yidong Ming, Rishav Mitra, Michail Mitsakis, Matthew Miyagishima, Ravikumar Mohan, Naveen R. Mohanraj, Trupti Mohanty, Bernadette Mohr, Francisco A. Molina-Bakhos, Jeremy Monat, Seyed Mohamad Moosavi, Shayan Mousavi, Arman Moussavi, Rubel Mozumber, Muhammad J. Mufti, Diyana Muhammed, Ram Munde, Mrigi Munjal, José A. Márquez, Shankha Nag, Giacomo Nagaro, Juno Nam, Jose M. Napoles-Duarte, Ry Nduma, Xuan-Vu Nguyen, Ebrahim Norouzi, Oluwatosin Ohiro, Ryotaro Okabe, Viejay Ordillo, Shuichiro Ozawa, Sebastian Pagel, Daniel Palmer, Angela Pan, Akash Pandey, Vivek Pandit, Prakul Pandit, Chiku Parida, Jaehee Park, Hyunsoo Park, Hemangi Patel, Shakul Pathak, Taradutt Pattnaik, Elena Patyukova, Noah Paulson, Deepak S. Pendyala, Erick S. Pepek, Martin H. Petersen, Thang D. Pham, Aniket Phutane, Sabila K. Pinky, Étienne Polack, Alison Polasik, Maria Politi, Tim Pongratz, Akhila Ponugoti, Fabio Priante, Thomas Michael Pruyn, Sai S. Puppala, Mohammad A. Qazi, Heike Quosdorf, Gollam Rabby, Mohammad J. Raei, Md. Habibur Rahman, A. B. M. Ashikur Rahman, Subhashree Rajasekaran, Tawfiqur Rakib, Hemanth N. Ramesh, Vrushali Ranadive, Karnamohit Ranka, Bojana Rankovic, Adwaith Ravichandran, Ilija Rašović, Sergei Rigin, Tatem Rios, Varun Rishi, Victor Naden Robinson, Lucas S. Rodrigues, Oswaldo Rodriguez, Mahule Roy, Diptendu Roy, Subhas Roy, Arokia Anto Royan M, Joseph F. Rudzinski, Muhammad Sabih, Subramanyam Sahoo, Srusti Bheem Sain, Thahira Saliya, Vignesh Sampath, Jesus Diaz Sanchez, Arthur S. S. Santos, Muliady Satria, Hasan M. Sayeed, Jörg Schaarschmidt, Philippe Schwaller, Nofit Segal, Abhishec Senthilvel, Sherjeel Shabih, Devanshu Shah, Faezeh Shahmoradi, Samiha Sharlin, Killian Sheriff, Qiuyu Shi, Abubakar D. Shuaibu, Ayesha Siddiqua, M. A. Shadab Siddiqui, Darian Smalley, Benjamin Smith, Taylor D. Sparks, Daniel T. Speckhard, Elena Stojanovska, Akshay Subramanian, Jiwon Sun, Yunkai Sun, Abdul W. Syed, Souvik Ta, Izumi Takahara, Kelly Tallau, Guannan Tang, Ans B. Tariq, Sui X. Tay, Nurlybek Temirbay, Surya P. Tiwari, Febin Tom, Tajah Trapier, Kasidet J. Trerayapiwat, Samanvya Tripathi, Hawra H. Tuhaifa, Mustafa Unal, Mohammad Uzair, Vallabh Vasudevan, Estefania Vazquez, Victor Venturi, Rahul Verma, Ashwini Verma, Alvaro Vazquez-Mayagoitia, Nicholas Wagner, Araki Wakiuchi, Hao Wan, Liaoyaqi Wang, Wolfgang Wenzel, Alexander Wieczorek, Sze H. Wong, Yue Wu, Tong Xie, Andrew Yi, Ziqi Yin, Jodie A. Yuwono, Nahed A. Zaid, Mohd Zaki, Shehtab Zaman, Maimuna U. Zarewa, Mahtab Zehtab, Baosen Zhang, Wenyu Zhang (possible past Tencent (China) affiliation), Melody Zhang, Yangfan Zhang, Yuwen Zhang, Runze Zhang (possible past Tencent (China) affiliation), Zongmin Zhang, Huanhuan Zhao, Yuanlong Bill Zheng, Ramzi Zidani, Xue Zong, Ian Foster, Ben Blaiszik
Abstract

Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific ...

📄 OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02714v1
👥 Authors: Tienyu Chang, Zhen Chen, Renjie Liang, Jinyu Ding, Jie Xu, Sunu Mathew, Amir Reza Hajrasouliha, Andrew J. Saykin, Ruogu Fang, Yu Huang (possible past Tencent (China) affiliation), Jiang Bian (possible past Baidu (China) affiliation), Qingyu Chen
Abstract

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance with clinical practice where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings...

📄 SAIL: Structure-Aware Interpretable Learning for Anatomy-Aligned Post-hoc Explanations in OCT
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02707v1
👥 Authors: Tienyu Chang, Tianhao Li, Ruogu Fang, Jiang Bian (possible past Baidu (China) affiliation), Yu Huang (possible past Tencent (China) affiliation)
Abstract

Optical coherence tomography (OCT), a commonly used retinal imaging modality, plays a central role in retinal disease diagnosis by providing high-resolution visualization of retinal layers. While deep learning (DL) has achieved expert-level accuracy in OCT-based retinal disease detection, its "black box" nature poses challenges for clinical adoption, where explainability is essential for clinical trust and regulatory approval. Existing post-hoc explainable AI (XAI) methods often struggle to deli...

📄 AcademiClaw: When Students Set Challenges for AI Agents
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02661v1
👥 Authors: Junjie Yu, Pengrui Lu, Weiye Si, Hongliang Lu, Jiabao Wu, Kaiwen Tao, Kun Wang, Lingyu Yang, Qiran Zhang, Xiuting Guo, Xuanyu Wang, Yang Wang (possible past Baidu (China) affiliation), Yanjie Wang, Yi Yang (possible past Baidu (China) affiliation), Zijian Hu, Ziyi Yang (possible past Tencent (China) affiliation), Zonghan Zhou, Binghao Qiang, Borui Zhang, Chenning Li, Enchang Zhang, Feifan Chen, Feng Jian, Fengyin Sun, Hao Qiu, Hao Zheng, Haoran Zhu, Hongyu Liu, Jianbin Deng, Jiaxin Song, Jiaying Chi, Jiayou Shi, Jie Fang, Jinghui Zhong, Jingyu Zhou, Jinze Li, Junfeng Yi, Junyan Yu, Junzhi Xue, Ni Song, Pengyi Chen, Qi Chen (possible past Baidu (China) affiliation), Quansheng Li, Rui Tao, Shenghai Gong, Shenhang Lu, Tianqi Shen, Tianxiang Zhu, Tiehan Kang, Tingyu Li, Wendi Wu, Xiao Shen, Xiao Zhou, Xiaotao Zhang, Xinrong Li, Xuankun Yang, Xun Zhang, Yan Li (possible past Tencent (China) affiliation), Ye Lu, Yi Wang, Yibo Zhou, Yichi Zhang, Yihao Sun, Yijun Huang, Yixin Zhu, Yixuan Wu, Yuchen Sun, Yue Wu, Yuheng Sun, Yukun Li (possible past Baidu (China) affiliation), Yutian Tu, Yuxuan Qin, Yuzhuo Wu, Zeyu Li (possible past Peking University affiliation), Zhengyu Lou, Zhenning Ran, Zizhu He, Pengfei Liu
Abstract

Benchmarks within the OpenClaw ecosystem have thus far evaluated exclusively assistant-level tasks, leaving the academic-level capabilities of OpenClaw largely unexamined. We introduce AcademiClaw, a bilingual benchmark of 80 complex, long-horizon tasks sourced directly from university students' real academic workflows -- homework, research projects, competitions, and personal projects -- that they found current AI agents unable to solve effectively. Curated from 230 student-submitted candidates...

📄 Uni-OPD: Unifying On-Policy Distillation with a Dual-Perspective Recipe
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03677v1
👥 Authors: Wenjin Hou, Shangpin Peng, Weinong Wang, Zheng Ruan, Yue Zhang, Zhenglin Zhou, Mingqi Gao, Yifei Chen (possible past Baidu (China) affiliation), Kaiqi Wang, Hongming Yang, Chengquan Zhang (possible past Baidu (China) affiliation), Zhuotao Tian, Han Hu, Yi Yang (possible past Baidu (China) affiliation), Fei Wu (possible past Google (United States) affiliation), Hehe Fan
Abstract

On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain poorly understood. In this work, we identify two fundamental bottlenecks that limit effective OPD: insufficient exploration of informative states and unreliable teacher supervision for student rollouts. Building on this i...
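
As a rough illustration, the sketch below assumes one common formulation of on-policy distillation, as in generalized knowledge distillation. The student samples its own rollout, and at every state it visits the teacher supplies a per-token target, scored here with reverse KL. A hypothetical bigram table stands in for each model, and only the loss value is computed, with no gradients. This is not Uni-OPD's dual-perspective recipe.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats, with p = student and q = teacher (mode-seeking)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def opd_loss(student: list[list[float]], teacher: list[list[float]],
             length: int, rng: random.Random) -> float:
    """Average per-token reverse KL along one student-sampled rollout.

    student[s] and teacher[s] hold next-token logits given previous token s
    (hypothetical bigram 'models' standing in for LLMs).
    """
    state, total = 0, 0.0
    for _ in range(length):
        p_s, p_t = softmax(student[state]), softmax(teacher[state])
        # The teacher grades the state the student reached (on-policy).
        total += reverse_kl(p_s, p_t)
        # The student, not the teacher, chooses the next state.
        state = rng.choices(range(len(p_s)), weights=p_s)[0]
    return total / length


teacher = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
student = [[0.0, 0.0, 0.0] for _ in range(3)]  # uniform, untrained student
print(opd_loss(student, teacher, length=20, rng=random.Random(0)))
print(opd_loss(teacher, teacher, length=20, rng=random.Random(0)))  # 0.0
```

The loss is positive while the student's visited-state distributions differ from the teacher's, and it drops to zero once they match.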

📄 Learning Discriminative Signed Distance Functions from Multi-scale Level-of-detail Features for 3D Anomaly Detection
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03437v1
👥 Authors: Haibo Xiao, Hanzhe Liang, Jie Zhou (possible past Tsinghua University affiliation), Jinbao Wang, Can Gao (possible past Baidu (China) affiliation)
Abstract

Detecting anomalies from 3D point clouds has received increasing attention in the field of computer vision, with some group-based or point-based methods achieving impressive results in recent years. However, learning accurate point-wise representations for 3D anomaly detection faces great challenges due to the large scale and sparsity of point clouds. In this study, a surface-based method is proposed for 3D anomaly detection, which learns a discriminative signed distance function using multi-sca...
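
The scoring idea behind SDF-based anomaly detection can be sketched under one assumption: the surface of the normal part is already known. A point on that surface has signed distance 0, so |SDF| can serve as a per-point anomaly score. The paper learns its SDF from multi-scale level-of-detail features. Here an analytic unit sphere stands in for the learned function, and the 0.1 threshold is an arbitrary illustrative value.

```python
import math

Point = tuple[float, float, float]


def sphere_sdf(p: Point, radius: float = 1.0) -> float:
    """Signed distance to a sphere at the origin: < 0 inside, 0 on the surface, > 0 outside."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def anomaly_scores(points: list[Point], sdf=sphere_sdf) -> list[float]:
    """Per-point score |SDF(p)|: points lying on the normal surface score about 0."""
    return [abs(sdf(p)) for p in points]


cloud = [
    (1.0, 0.0, 0.0),   # on the surface
    (0.0, 0.0, -1.0),  # on the surface
    (0.0, 1.3, 0.0),   # bump, 0.3 outside
    (0.6, 0.0, 0.0),   # dent, 0.4 inside
]
scores = anomaly_scores(cloud)
flags = [s > 0.1 for s in scores]  # hypothetical threshold
print(flags)  # [False, False, True, True]
```

The sign is discarded for scoring, but it still tells bumps (outside) from dents (inside) if a downstream step needs that.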

📄 Partial Effective Information Decomposition for Synergistic Causality
🗓️ Published: 5/5/2026
🔗 http://arxiv.org/abs/2605.03267v1
👥 Authors: Mingzhe Yang, Shuo Wang (possible past Nvidia (United States) affiliation), Jiang Zhang (possible past Google (United States) affiliation)
Abstract

Causality is a central topic in scientific inquiry, yet for complex systems, the identification and analysis of synergistic causation remain a challenging and fundamental problem. In the context of causal relations among multivariate variables, a decomposition framework grounded in interventionist causation is still lacking. To address this gap, this paper proposes Partial Effective Information Decomposition (PEID), a framework that decomposes the influence of multiple source variables on a targ...
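
The name suggests that PEID builds on effective information (EI), the interventionist measure from the causal-emergence literature. EI is the mutual information between a cause and its effect when the cause is set by a uniform (maximum-entropy) intervention. The sketch below computes EI from a transition probability matrix and applies it to XOR, the textbook case of synergy. It does not implement the PEID decomposition itself. Scoring a single source by randomising the other one uniformly is an assumption made for illustration.

```python
import math
from itertools import product
from operator import xor


def effective_information(tpm: list[list[float]]) -> float:
    """EI in bits for a transition probability matrix tpm[cause][effect].

    With the cause intervened uniformly, EI equals the mean KL divergence
    of each row from the average row (the resulting effect distribution).
    """
    n = len(tpm)
    avg = [sum(row[j] for row in tpm) / n for j in range(len(tpm[0]))]
    return sum(
        sum(p * math.log2(p / q) for p, q in zip(row, avg) if p > 0) for row in tpm
    ) / n


# Joint cause (X1, X2) -> Y = X1 XOR X2; rows are (x1, x2), columns are y.
joint = [[1.0 if xor(a, b) == y else 0.0 for y in (0, 1)]
         for a, b in product((0, 1), repeat=2)]

# Single cause X1 -> Y, with X2 randomised uniformly (assumption).
single = [[sum(0.5 for b in (0, 1) if xor(a, b) == y) for y in (0, 1)]
          for a in (0, 1)]

print(effective_information(joint))   # 1.0
print(effective_information(single))  # 0.0
```

Neither source alone carries any EI about Y, yet together they determine it completely. That 1 bit is purely synergistic, which is the kind of contribution a decomposition like PEID aims to isolate.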

📄 OGPO: Sample Efficient Full-Finetuning of Generative Control Policies
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.03065v1
👥 Authors: Sarvesh Patil, Mitsuhiko Nakamoto, Manan Agarwal, Shashwat Saxena, Jesse Zhang, Giri Anantharaman, Cleah Winston, Chaoyi Pan, Douglas Chen, Nai-Chieh Huang, Zeynep Temel, Oliver Kroemer, Sergey Levine (possible past University of Washington affiliation), Abhishek Gupta (possible past University of California, Berkeley affiliation), Hongkai Da, Paarth Shah, Max Simchowitz
Abstract

Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves ...
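
The excerpt says OGPO uses a modified PPO objective with critics as the terminal reward, but it does not give the modification. For reference, here is the standard clipped PPO surrogate, fed with advantages derived from hypothetical critic values. Subtracting the batch-mean critic value as a baseline is an assumption. The sketch also ignores how OGPO propagates gradients through the full generative process of a diffusion or flow policy.

```python
import math


def ppo_clip_objective(logp_new: list[float], logp_old: list[float],
                       advantages: list[float], eps: float = 0.2) -> float:
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = pi_new / pi_old.

    The objective is maximised; the min keeps the update pessimistic once the
    probability ratio leaves [1 - eps, 1 + eps].
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Hypothetical critic values Q(s, a) for four sampled actions, turned into
# advantages with a batch-mean baseline (an assumption, not OGPO's recipe).
q_values = [1.0, 0.5, -0.2, 0.7]
baseline = sum(q_values) / len(q_values)
advantages = [q - baseline for q in q_values]

logp_old = [-1.0, -1.2, -0.8, -1.5]
logp_new = [-0.7, -1.2, -1.0, -1.4]  # policy after a hypothetical update
print(ppo_clip_objective(logp_new, logp_old, advantages))
```

When the advantage is positive, the clip caps the gain at ratio 1 + eps. When it is negative, the unclipped term still applies the full penalty, so the surrogate never rewards moving too far from the old policy.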

📄 Multi-fidelity surrogates for mechanics of composites: from co-kriging to multi-fidelity neural networks
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02871v1
👥 Authors: Haizhou Wen, Elham Kiyani, Gang Li (possible past Tsinghua University affiliation), Srikanth Pilla, George Em Karniadakis, Zhen Li (possible past Google (United States) affiliation)
Abstract

Composite materials exhibit strongly hierarchical and anisotropic properties governed by coupled mechanisms spanning constituents, plies, laminates, structures, and manufacturing history. This intrinsic complexity makes predictive modeling of composites expensive, because repeated experiments and high-fidelity simulations are needed to cover large design spaces of material, structure, and manufacturing. Multi-fidelity surrogate modeling addresses this challenge by combining abundant, less expens...
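
A classic starting point on the co-kriging side is the autoregressive scheme of Kennedy and O'Hagan. It models the high-fidelity response as a scaled low-fidelity response plus a discrepancy, f_H(x) ≈ ρ f_L(x) + δ(x). The minimal sketch below assumes hypothetical closed-form `f_low` and `f_high` functions. To stay dependency-free, it replaces the Gaussian-process discrepancy of real co-kriging with a linear δ(x) = a + b·x fitted by least squares. It illustrates the structure rather than any model from the paper.

```python
import math


def f_low(x: float) -> float:
    """Hypothetical cheap low-fidelity model, affordable to evaluate anywhere."""
    return math.sin(x)


def f_high(x: float) -> float:
    """Hypothetical expensive high-fidelity model, sampled only a few times."""
    return 2.0 * math.sin(x) + 0.3 * x - 0.5


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_multifidelity(xs_hf: list[float]):
    """Fit f_H(x) ~ rho * f_L(x) + a + b * x to a handful of high-fidelity samples."""
    feats = [[f_low(x), 1.0, x] for x in xs_hf]
    ys = [f_high(x) for x in xs_hf]
    # Normal equations (F^T F) theta = F^T y for theta = (rho, a, b).
    ftf = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    fty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(3)]
    rho, a, b = solve(ftf, fty)

    def surrogate(x: float) -> float:
        return rho * f_low(x) + a + b * x

    return surrogate, (rho, a, b)


surrogate, (rho, a, b) = fit_multifidelity([0.0, 1.0, 2.0, 3.0])
print(f"rho = {rho:.3f}, delta(x) = {a:.3f} + {b:.3f} x")
print(surrogate(1.5), f_high(1.5))
```

Four expensive evaluations suffice here only because the assumed discrepancy really is linear. In real co-kriging, the Gaussian-process discrepancy is what handles the general case and also supplies uncertainty estimates.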

📄 VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02834v2
👥 Authors: Tanush Yadav, Mohammadreza Salehi, Jae Sung Park (possible past University of California, Berkeley affiliation), Vivek Ramanujan, Hannaneh Hajishirzi (possible past University of Washington affiliation), Yejin Choi (possible past Allen Institute for Artificial Intelligence affiliation), Ali Farhadi (possible past University of Washington affiliation), Rohun Tripathi, Ranjay Krishna (possible past University of Washington affiliation)
Abstract

Videos are unique in their ability to capture actions which transcend multiple frames. Accordingly, for many years action recognition was the quintessential task for video understanding. Unfortunately, due to a lack of sufficiently diverse and challenging data, modern vision-language models (VLMs) are no longer evaluated on their action recognition capabilities. To revitalize action recognition in the era of VLMs, we advocate for a returned focus on domain-specific actions. To this end, we intro...

📄 Visual Latents Know More Than They Say: Unsilencing Latent Reasoning in MLLMs
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02735v1
👥 Authors: Xin Zhang (possible past Google (United States) affiliation), Qiqi Tao, Jiawei Du, Moyun Liu, Joey Tianyi Zhou (possible past Tencent (China) affiliation)
Abstract

Continuous latent-space reasoning offers a compact alternative to textual chain-of-thought for multimodal models, enabling high-dimensional visual evidence to be integrated without explicit reasoning tokens. However, we identify a previously overlooked optimization pathology in existing latent visual reasoning methods: although visual latents become semantically enriched during training, their contribution to final answer prediction is systematically suppressed. Within the shared parameter space...

📄 CARD: Coarse-to-fine Autoregressive Modeling with Radix-based Decomposition for Transferable Free Energy Estimation
🗓️ Published: 5/4/2026
🔗 http://arxiv.org/abs/2605.02657v1
👥 Authors: Ziyang Yu, Yi He, Wenbing Huang (possible past Tsinghua University affiliation), Wen Yan, Yang Liu (possible past Tsinghua University affiliation)
Abstract

Estimating free energy differences quantifies thermodynamic preferences in molecular interactions, which is central to chemistry and drug discovery. Despite fruitful progress, existing methods still face key limitations: classical computational approaches remain prohibitively expensive due to their reliance on extensive molecular dynamics simulations, while deep learning-based methods are constrained by either less-expressive generative models or input dimensions tied to a specific system, resul...
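
The excerpt stops before describing CARD's radix-based decomposition. The following is therefore only a generic illustration of the idea the title points to, and that reading is itself an assumption. A continuous quantity is quantised and written as base-B digits, most significant first, so an autoregressive model can predict a coarse digit and then refine it. The range, base and depth are hypothetical choices.

```python
def to_digits(value: float, lo: float, hi: float, base: int = 4, depth: int = 3) -> list[int]:
    """Quantise value in [lo, hi] into base**depth bins and return the bin index
    as `depth` base-`base` digits, most significant (coarsest) first."""
    n_bins = base ** depth
    idx = max(0, min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1))
    digits = []
    for _ in range(depth):
        digits.append(idx % base)
        idx //= base
    return digits[::-1]


def interval_after(prefix: list[int], lo: float, hi: float, base: int = 4) -> tuple[float, float]:
    """Sub-interval of [lo, hi] still consistent with a digit prefix;
    each extra digit shrinks the interval by a factor of `base`."""
    for d in prefix:
        width = (hi - lo) / base
        lo, hi = lo + d * width, lo + (d + 1) * width
    return lo, hi


digits = to_digits(0.7, 0.0, 1.0)
print(digits)  # [2, 3, 0]
for k in range(len(digits) + 1):
    print(k, interval_after(digits[:k], 0.0, 1.0))
```

Each predicted digit narrows the admissible interval by a factor of `base`, which is what makes the digit sequence coarse-to-fine.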

*Notable papers are those with at least two authors from a "big" AI/ML lab.