📄 Notable* Recent AI/ML arXiv Papers

📄 FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20506v1
👥 Authors: Jinghong Lan, Wei Cheng, Yunuo Chen, Ziqi Ye, Peng Xing, Yixiao Fang, Rui Wang (possible past Tencent (China) affiliation), Yufeng Yang, Xuanyang Zhang, Xianfang Zeng, Difan Zou, Gang Yu (possible past Tencent (China) affiliation), Chi Zhang (possible past Peking University affiliation)
Abstract

Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference. Despite recent progress, this setting remains challenging because models must balance content fidelity, style alignment, and instruction following while avoiding semantic leakage from the style reference. A key bottleneck is the lack of large-scale triplet data with clean content-style separation and broad long-tail ...

📄 SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20244v1
👥 Authors: Bo Yin, Xiaobin Hu (possible past Tencent (China) affiliation), Chengming Xu, Ruolin Shen, Mo Yang, Jiangning Zhang (possible past Tencent (China) affiliation), Peng-Tao Jiang, Cheng Tan, Shuicheng Yan (possible past National University of Singapore affiliation)
Abstract

Vision-language models (VLMs) often underperform on evidence-intensive tasks because decisive visual evidence is small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions can improve grounding without retraining, but they are largely open-loop and lack a mechanism to verify whether highlighted evidence is actually used. We study answer-span prediction entropy as a model-internal feedback si...

📄 ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20235v1
👥 Authors: Tingyue Pan, Mingyue Cheng, Daoyu Wang, Yitong Zhou, Jie Ouyang, Qi Liu (possible past Tencent (China) affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Academic paper search is a core step in scientific research, and LLM-based search agents are emerging as a promising paradigm for iterative, intent-driven literature exploration. However, existing benchmarks are insufficient for systematically evaluating agentic academic search under realistic open literature environments. We propose ScholarQuest, a large-scale, taxonomy-guided benchmark for agentic academic paper search. ScholarQuest is constructed from over 1,000 computer science topics and fo...

📄 From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20152v1
👥 Authors: Jiaxu Zuo, Mu You, Kaixin Lan, Tao Fang, Yujia Huo, Henghua Shen, Lidia S. Chao (possible past Tencent (China) affiliation), Derek F. Wong (possible past Tencent (China) affiliation)
Abstract

Recent advances in Large Language Models (LLMs) have substantially transformed Automated Essay Scoring (AES), yet the internal mechanisms underlying LLM-based scoring remain poorly understood. In this work, we systematically analyze the hidden representations of eight LLMs across two English essay datasets (ASAP++, CSEE) and one Portuguese dataset (ENEM). Using linear probing, cross-prompt generalization, dimensionality reduction, and neuron-level analyses, we find consistent evidence that essay...

📄 Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20135v1
👥 Authors: Jianing Guo, Fangzheng Chen, Zihao Mao, Wong Lik Hang Kenny, Zhenhong Wu, Yu Li (possible past Tencent (China) affiliation), Yishuai Cai, Yuanpei Chen, Yikun Ban, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Qi Dou, Yaodong Yang, Xianglong Liu, Huijie Zhao, Simin Li
Abstract

Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (F...

📄 ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.19980v1
👥 Authors: Wenli Xiao, Jia Xie, Tonghe Zhang, Haotian Lin, Letian "Max" Fu, Haoru Xue, Jalen Lu, Yi Yang (possible past Baidu (China) affiliation), Cunxi Dai, Zi Wang, Jimmy Wu, Guanzhi Wang (possible past Stanford University affiliation), S. Shankar Sastry, Ken Goldberg (possible past University of California, Berkeley affiliation), Linxi "Jim" Fan, Yuke Zhu (possible past Stanford University affiliation), Guanya Shi
Abstract

Achieving dexterous robotic manipulation in the real world relies heavily on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction to automate robotics research is a repeatable feedback loop for real-world policy improvement: reset the s...

📄 ROSE: Benchmarking the Perception-to-Action Gap in Multimodal Models
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.19965v1
👥 Authors: Yihao Wang, Zijian He (possible past Meta (United States) affiliation), Jie Ren (possible past Google (United States) affiliation), Keze Wang
Abstract

Multimodal large language models (MLLMs) are increasingly expected to act on visual information, yet the same scene may require different actions under different task contexts. How reliably can a model turn the same visual evidence into the action required by the current context? To answer this question, we introduce ROSE (Reference-conditioned Oddity and Symbolic Execution), a controlled benchmark that holds the visual scene fixed while varying regio...

📄 Data Standards for Humanoid Robotics: The Missing Infrastructure for Physical AI
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.19769v1
👥 Authors: Shaoshan Liu, Xiugong Qin, Xuan Wu, Xuan Xia, Ning Ding (possible past Tsinghua University affiliation), Jialu Liu (possible past Google (United States) affiliation), Jie Tang (possible past Tsinghua University affiliation)
Abstract

The scalability of humanoid robots will depend not only on models and hardware, but also on whether physical experience can accumulate across robots, tasks, organizations, and time. Drawing on the authors' work in developing ISO/WD 26264-1, Humanoid robot datasets -- Part 1: General requirements, within ISO/TC 299/WG 16, this article argues that data standards are becoming foundational infrastructure for Physical AI. We develop three insights. First, humanoid robot data is embodied interaction d...

📄 Denoising Implicit Feedback for Cold-start Recommendation
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19658v1
👥 Authors: Gaode Chen, Shicheng Wang, Shikun Li, Rui Huang (possible past Google (United States) affiliation), Xinghua Zhang, Yunze Luo, Shipeng Li, Shiming Ge, Ruina Sun, Yinjie Jiang, Jun Zhang (possible past Tencent (China) affiliation)
Abstract

Implicit feedback is widely used in recommender systems due to its accessibility and generality, yet it usually presents noisy samples (e.g., clickbait, position bias). Meanwhile, recommenders inevitably face the item cold-start problem due to the continuous influx of new items. We identify that cold items are more prone to noisy samples due to the aforementioned factors, and researchers often overlook the significance of denoising implicit feedback for cold items. Previous denoising studies usu...

📄 Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19635v1
👥 Authors: Xilun Chen, Shao-Chuan Wang, Baykal Cakici, Lukasz Heldt (possible past Google (United States) affiliation), Lichan Hong (possible past Google (United States) affiliation), Raghu Keshavan, Aniruddh Nath (possible past Google (United States) affiliation), Li Wei (possible past Google (United States) affiliation), Xinyang Xi
Abstract

Large Recommendation Models (LRMs) have demonstrated promising capabilities in industry-scale recommendation tasks. However, holistically integrating traditional signals into these transformer-based architectures effectively and efficiently remains a major challenge. Conventional approaches that "textualize" these signals directly or create discrete item representations often lead to excessively long prompts, substantial memory footprints, and high computational overhead. To overcome these limit...

📄 IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19595v1
👥 Authors: Ahmad Salimi, Wentao Ma, Yuzhi Tang, Dongming Shen, Mu Li (possible past Carnegie Mellon University affiliation), Alex Smola (possible past Google (United States) affiliation)
Abstract

Voice agents deployed in structured workflows (customer service, healthcare scheduling, account management) must handle frequent user interruptions while maintaining progress through multi-step procedures. Existing benchmarks for speech-capable models focus on the timing of interruptions: barge-in detection, endpointing, and turn-taking dynamics. They leave unmeasured what happens after the interruption: does the agent resume the workflow at the correct step? Does it address the user's interject...

📄 Review of Machine Learning Models for Solar Energetic Particle Prediction
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19539v1
👥 Authors: Spiridon Kasapis, Pouya Hosseinzadeh, Kathryn Whitman, Ricky Egeland, Manolis Georgoulis, Angelos Vourlidas, Athanasios Papaioannou, Eleni Lavasa, Anastasios Anastasiadis, Giorgos Giannopoulos, Andres Munoz-Jaramillo, Bala Poduval, Irina N. Kitiashvili, Alexander G. Kosovichev, Viacheslav Sadykov, Soukaina Filali Boubrahimi, Tate T. Hutchins, Hameedullah A. Farooki, Manuel E. Cuesta, Leng Y. Khoo, Sungmin Pak, Robert Czarnota, Jamie S. Rankin, Jamey Szalay, Mitchell M. Shen, Georgios Livadiotis, Zigong Xu, David J. McComas, Nikolaos Sarlis, Dionissios Hristopulos, Arik Posner, Alec J. Engell, Mohammed Abubakr Ali, Ali G. A. Abdelkawy, Abdelrazek M. K. Shaltout, M. M. Beheary, Christina O. Lee, Sigiava Aminalragia-Giamini, Constantinos Papadimitriou, Ingmar Sandberg, Savvas Raptis, Shah Muhammad Hamdi, Monica Laurenza, Mirko Stumpo, Sumanth A. Rotti, India Jackson, Aatiya Ali, Atilim Gunes Baydin, Nathan Schwadron, Subhamoy Chatterjee, Maher A. Dayeh, Gelu M. Nita, Patrick M. O'Keefe, Chun Jie Chong, Paul Kosovich, Russell D. Marroquin, Berkay Aydin, Petrus C. Martens, Lulu Zhao, Yang Chen (possible past Tencent (China) affiliation), Yian Yu, Monica G. Bobra, Ward Manchester, Tamas Gombosi, Ming Zhang (possible past Peking University affiliation), Jesse Torres, Philip K. Chan, Mohamed Nedal, Kamen Kozarev, Peijin Zhang, Kimberly Moreland, Hazel M. Bain, Samuel Hart, Michael J. Starkey, Alan G. Ling, Simone Benella
Abstract

Solar energetic particle (SEP) events have attracted increasing attention due to their significant radiation hazards for aviation, spacecraft electronics, and human missions beyond Earth's magnetosphere. From a scientific perspective, SEP events are intriguing because they arise from a set of physical processes extending from the solar surface and corona through the heliosphere, offering insight into particle acceleration and transport mechanisms that are widely applicable across astrophysics. T...

📄 PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19534v1
👥 Authors: Yueyi Sun, Yuhao Wang, Jason Li (possible past Nvidia (United States) affiliation), Ye Tian, Tao Zhang (possible past Nvidia (United States) affiliation), Jacky Mai, Yihan Wang, Haochen Wang, Jinbin Bai, Ling Yang, Yunhai Tong
Abstract

Multimodal large language models (MLLMs) have achieved remarkable progress in visual understanding tasks. However, most existing MLLMs rely on autoregressive generation, which limits their efficiency for perception tasks that require captioning multiple regions. In this work, we propose PerceptionDLM, a multimodal diffusion language model optimized for efficient parallel region perception. Built upon PerceptionDLM-Base, a strong foundational baseline that achieves state-of-the-art performance am...

📄 Can In-Context Learning Support Intrinsic Curiosity?
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19476v1
👥 Authors: Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas (possible past Google (United States) affiliation), João Sacramento (possible past ETH Zurich affiliation), Rif A. Saurous (possible past Google (United States) affiliation), Guillaume Lajoie
Abstract

Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewa...

📄 Scaling Generative Foundation Models for Chest Radiography with Rectified Flow Transformers
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19460v1
👥 Authors: Fabio De Sousa Ribeiro, Emma A. M. Stanley, Charles Jones, Tian Xia (possible past Baidu (China) affiliation), Dominic C. Marshall, Laurent Renard Triché, Christopher V. Cosgriff, Panagiotis Dimitrakopoulos, Sotirios A. Tsaftaris (possible past University of Edinburgh affiliation), Ben Glocker
Abstract

We introduce the first generative foundation model for chest radiograph synthesis trained from scratch at the billion-parameter scale. Existing radiographic AI models often suffer from poor generalisation across patient subpopulations, institutions, and acquisition settings, resulting in limited real-world clinical utility. Controlled, high-fidelity synthesis of chest radiographs is a promising path toward diversifying clinical datasets and evaluating the robustness of diagnostic models. Therefo...

📄 Playful Agentic Robot Learning
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19419v1
👥 Authors: Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg (possible past University of California, Berkeley affiliation), Yutong Bai, David M. Chan, Ion Stoica (possible past University of California, Berkeley affiliation), Angjoo Kanazawa (possible past University of California, Berkeley affiliation), Jiahui Lei, Haiwen Feng, Trevor Darrell (possible past University of California, Berkeley affiliation)
Abstract

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, R...

📄 Self-Adaptive Scale Handling for Forecasting Time Series with Scale Heterogeneity
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20010v1
👥 Authors: Xu Zhang (possible past Tencent (China) affiliation), Zhengang Huang, Yunzhi Wu, Xun Lu, Erpeng Qi, Yunkai Chen, Zhongya Xue, Peng Wang (possible past Peking University affiliation), Wei Wang (possible past University of Oxford affiliation)
Abstract

Current time series forecasting (TSF) research predominantly focuses on scale-homogeneous data, where different time series share similar numerical magnitude ranges. However, in real-world industrial scenarios such as financial product sales, different time series often differ by orders of magnitude (scale heterogeneity). Since these series share similar temporal patterns, joint modeling is desirable for better data utilization, yet existing scaling methods either compress low-scale signals (glo...

📄 VIMPO: Value-Implicit Policy Optimization for LLMs
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20008v1
👥 Authors: Zhewei Kang, Aosong Feng, Sergey Levine (possible past University of Washington affiliation), Dawn Song (possible past University of California, Berkeley affiliation), Xuandong Zhao
Abstract

Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO avoid training a critic, but typically assign a trajectory-level advantage to every token. Actor-critic methods provide denser learning signals, but require a learned value function with its own training instability. We introduce VIMPO, a critic-free...

📄 DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19656v1
👥 Authors: Calvin Luo, Chen Sun (possible past Google (United States) affiliation), Shuran Song (possible past Google (United States) affiliation)
Abstract

A natural recipe for intelligent robotic decision-making is initializing from pretrained generative control policies, which have summarized offline experience, and adapting them to self-collected online experience. We present DF-ExpEnse, an exploration technique that improves the quality of online experience collection, thus increasing finetuning sample-efficiency. DF-ExpEnse leverages the multimodal modeling capabilities of the generative control policy to create an expressive and tractably eva...

📄 MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19624v1
👥 Authors: Hongxuan Liu, Roman Bushuiev, Ivy Lightheart, Mrunali Manjrekar, Anton Bushuiev, Magdalena Lederbauer, Filip Jozefov, Yinkai Wang, Soha Hassoun (possible past University of Washington affiliation), Josef Sivic, James Taylor, Runzhong Wang, David Healey, Tomáš Pluskal, Connor W. Coley (possible past Massachusetts Institute of Technology affiliation)
Abstract

Reliable benchmarking is critical for developing machine learning models for tandem mass spectrometry (MS/MS) based molecule discovery. Subtle issues in experimental design and model evaluation procedures can degrade the trustworthiness of such benchmarks and lead to erroneous conclusions. We conduct a thorough review of model evaluation issues in the recent MS/MS machine learning literature, using the standard MassSpecGym benchmark suite as a case study to illustrate the impact of these issues....

📄 Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19549v1
👥 Authors: Lin Tang, Wei Zhang (possible past Tsinghua University affiliation), Jing Li (possible past Tencent (China) affiliation), Hongyu Chen, Ming Zhao (possible past Tencent (China) affiliation), Yuxuan Wang (possible past Google (United States) affiliation)
Abstract

Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utilit...

📄 3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning
🗓️ Published: 6/17/2026
🔗 http://arxiv.org/abs/2606.19451v1
👥 Authors: Ellina Zhang, Madhaven Iyengar, Amir Zadeh, Chuan Li, Deepak Pathak (possible past University of California, Berkeley affiliation), David Held (possible past University of California, Berkeley affiliation), Tal Daniel
Abstract

We introduce 3D-DLP, a self-supervised object-centric representation learning model that decomposes scene-level RGB-D or voxel observations into a set of 3D latent particles. Building on the Deep Latent Particles (DLP) framework, each particle encodes disentangled attributes, including 3D keypoint position, bounding box dimensions, and appearance features, and represents a distinct entity in the scene. The model learns interpretable per-particle segmentation maps through an end-to-end self-super...

*Notable papers are those with at least two authors from a "big" AI/ML lab.