📄 Notable* Recent AI/ML arXiv Papers

📄 FAST: A Framework for Aligned Sampling and Training in Parallel Reinforcement Learning for Autonomous Driving
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21587v1
👥 Authors: Bonan Wang, Letian Tao, Bin Shuai, Jiaxin Gao, Wenxin Zhao, Wei Xiong, Kehua Sheng, Bo Zhang (possible past Tencent (China) affiliation), Yang Guan, Shengbo Eben Li (possible past Tsinghua University affiliation)
Abstract

Deep reinforcement learning is pivotal for closed-loop autonomous driving yet remains constrained by severe bottlenecks in sampling efficiency. Standard parallel sampling mitigates this but suffers from the straggler effect, where the premature termination of a single environment necessitates a synchronized batch re-initialization, leading to suboptimal sample utilization and prohibitive re-initialization latency. To address this, we propose FAST, a synchronous parallel framework tailored for cl...

📄 Decoupling the Declarative from the Procedural in Vision-Language-Action Models
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21496v1
👥 Authors: Nikolaos Tsagkas, Andreas Sochopoulos, Chris Xiaoxuan Lu (possible past University Of Oxford affiliation), Oisin Mac Aodha (possible past University Of Edinburgh affiliation), Alexandros Kouris
Abstract

Deploying generalist robotic agents in the real world requires transferable skills. Specifically, a policy trained to clone a behavior from object-specific demonstrations must generalize beyond that object, otherwise data collection requirements become intractable. Recently, fine-tuning of pre-trained billion-parameter Vision-Language Models (VLMs), initially on large-scale robot datasets and then on fewer scenario-specific demonstrations, has emerged as the predominant paradigm for designing Vi...

📄 Warning labels shift perceptions of sycophantic AI, but not its influence
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21317v1
👥 Authors: Lujain Ibrahim, Myra Cheng (possible past DeepMind (United Kingdom) affiliation), Cinoo Lee, Pranav Khadpe, Desmond Ong, Dan Jurafsky (possible past Stanford University affiliation), Diyi Yang (possible past Stanford University affiliation)
Abstract

Recent work has raised concerns about the influence of sycophantic AI on user judgment and relationships. One proposed mitigation, which has received regulatory attention, is to warn users about potentially harmful AI behaviors such as sycophancy. In a preregistered experiment in which participants (N = 2,610) discussed real interpersonal conflicts with an AI system, we test whether warning labels mitigate sycophancy's influence. We find that a basic AI disclosure ("This chatbot is AI") has no...

📄 ARCO: Adaptive Rubric with Co-Evolution for Multi-Step LLM-Based Agents
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21262v1
👥 Authors: Zihang Tian, Jingsen Zhang, Rui Li (possible past Google (United States) affiliation), Xiaohe Bo, Yuanzi Li, Xu Chen (possible past Tencent (China) affiliation)
Abstract

Reinforcement learning for multi-step LLM agents often relies on scalar rewards that indicate success but cannot explain why a trajectory is good or bad. Rubric-based rewards improve interpretability through natural-language criteria, but existing methods score at the trajectory level and freeze the scorer behind a closed-source judge, leaving step-level credit assignment unresolved and the judge itself static. We propose ARCO (Adaptive Rubric CO-evolution), a rubric framework in which a same-sc...

📄 AdaMem: Learning What to Remember for Personalized Long-Horizon LLM Agents
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21144v1
👥 Authors: Xingyu Chen (possible past Tencent (China) affiliation), Rui Wang (possible past Tencent (China) affiliation), Zhaopeng Tu (possible past Tencent (China) affiliation), Liefeng Bo
Abstract

Long-term memory systems for Large Language Model (LLM) agents typically try to remember everything, extracting memories uniformly to retain as many facts as possible. In production, however, inference cost and finite context budgets make this untenable: beyond consolidating raw dialogue into memory, an agent must exert write control, efficiently keeping only the information each user actually cares about. Otherwise, long-horizon personalized interactions suffer memory bloat...

📄 MammoExpert: Benchmarking Chain-of-Thought Reasoning in Mammography Diagnosis
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.21119v1
👥 Authors: Di Dai, Bo Liu (possible past Meta (United States) affiliation), Youcheng Li, Haojun Yu, Zhouhang Bian, Quanlin Wu, Dong Wang (possible past Tsinghua University affiliation), Sichen Meng, Hongye Xuan, Zijie Lan, Shenda Hong (possible past Peking University affiliation), Liwei Wang (possible past Tencent (China) affiliation)
Abstract

Mammography is an essential tool for breast cancer detection, with millions of examinations conducted annually. However, publicly available high-quality mammography datasets for AI development remain limited in both scale and annotation richness, particularly regarding pathological subtype coverage and structured diagnostic reasoning annotations. In this paper, we present MammoExpert, the first mammography dataset with Chain-of-Thought reasoning annotations across three diagnostic phases: (i) pr...

📄 BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery
🗓️ Published: 6/19/2026
🔗 http://arxiv.org/abs/2606.20997v1
👥 Authors: Jieyi Wang, Bingxuan Li, Nanyi Jiang, Desong Meng, Zirui Fan, Yuxin Guo, Jiayu Liu, Kunlun Zhu, Eddie Yang, Xiusi Chen (possible past Peking University affiliation), Pan Lu (possible past Baidu (China) affiliation), Bingxin Zhao
Abstract

Biomedical researchers increasingly use AI-generated analyses and reports to interpret protein-level signals, but static outputs are often insufficient for research decision-making, where users need to inspect evidence, assess uncertainty, compare mechanisms, and refine hypotheses. We present BioInsight, a multi-agent system that moves from static biomedical report generation to interactive, evidence-centered interface generation. Given a disease name, a protein association t...

📄 Vesta: A Generalist Embodied Reasoning Model
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20905v1
👥 Authors: Johan Bjorck, Zhiqi Li, Yunze Man, Jing Wang (possible past Google (United States) affiliation), An-Chieh Cheng, Sifei Liu (possible past Nvidia (United States) affiliation), Shihao Wang, Zhiding Yu (possible past Nvidia (United States) affiliation), Abhishek Badki, Stan Birchfield (possible past Nvidia (United States) affiliation), Valts Blukis, Yevgen Chebotar (possible past Google (United States) affiliation), Siyi Chen, Sicong Leng, Yu-Cheng Chou, Tianli Ding, Boyi Li, Zhengyi Luo, Hang Su (possible past Tsinghua University affiliation), Jonathan Tremblay, Tingwu Wang, Bowen Wen, Jimmy Wu, Xianghui Xie, Hanrong Ye, Hongxu Yin, K. R. Zentner, Liangyan Gui, Yu-Xiong Wang (possible past Carnegie Mellon University affiliation), Yuke Zhu (possible past Stanford University affiliation), Linxi "Jim" Fan, Jan Kautz (possible past Nvidia (United States) affiliation)
Abstract

Robots operating in open-world environments must seamlessly integrate localization, spatial reasoning, navigation, and long-horizon planning. While specialist models excel at individual tasks, deploying a multi-model stack is computationally expensive and prone to cascading errors. We present Vesta, a unified embodied generalist that consolidates these capabilities into a single foundation model. Our approach combines a diverse and massive curated corpus designed to induce spatial grounding and ...

📄 Fara-1.5: Scalable Learning Environments for Computer Use Agents
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20785v1
👥 Authors: Ahmed Awadallah, Sahil Gupta, Yash Lara, Yadong Lu, Hussein Mozannar, Akshay Nambi, Zach Nussbaum, Yash Pandya, Aravind Rajeswaran (possible past University Of Washington affiliation), Corby Rosset, Alexey Taymanov, Luiz Do Valle, Vibhav Vineet (possible past University Of Oxford affiliation), Spencer Whitehead, Andrew Zhao
Abstract

Collecting computer use data from human demonstrations is expensive and slow, motivating the need for scalable generation strategies. This requires two key ingredients: environments in which agents can act and verifiers that can judge whether their demonstrations succeeded. We introduce FaraGen1.5, a scalable data pipeline for computer use agents composed of three modular components: environments, solvers, and verifiers. FaraGen1.5 uses both live websites and synthetic environments that faithful...

📄 FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20506v1
👥 Authors: Jinghong Lan, Wei Cheng, Yunuo Chen, Ziqi Ye, Peng Xing, Yixiao Fang, Rui Wang (possible past Tencent (China) affiliation), Yufeng Yang, Xuanyang Zhang, Xianfang Zeng, Difan Zou, Gang Yu (possible past Tencent (China) affiliation), Chi Zhang (possible past Peking University affiliation)
Abstract

Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference. Despite recent progress, this setting remains challenging because models must balance content fidelity, style alignment, and instruction following while avoiding semantic leakage from the style reference. A key bottleneck is the lack of large-scale triplet data with clean content-style separation and broad long-tail ...

📄 SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20244v2
👥 Authors: Bo Yin, Xiaobin Hu (possible past Tencent (China) affiliation), Chengming Xu, Ruolin Shen, Mo Yang, Jiangning Zhang (possible past Tencent (China) affiliation), Peng-Tao Jiang, Cheng Tan, Shuicheng Yan (possible past National University Of Singapore affiliation)
Abstract

Vision-language models (VLMs) often underperform on evidence-intensive tasks because decisive visual evidence is small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions can improve grounding without retraining, but they are largely open-loop and lack a mechanism to verify whether highlighted evidence is actually used. We study answer-span prediction entropy as a model-internal feedback si...

📄 ScholarQuest: A Taxonomy-Guided Benchmark for Agentic Academic Paper Search in Open Literature Environments
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20235v1
👥 Authors: Tingyue Pan, Mingyue Cheng, Daoyu Wang, Yitong Zhou, Jie Ouyang, Qi Liu (possible past Tencent (China) affiliation), Enhong Chen (possible past Baidu (China) affiliation)
Abstract

Academic paper search is a core step in scientific research, and LLM-based search agents are emerging as a promising paradigm for iterative, intent-driven literature exploration. However, existing benchmarks are insufficient for systematically evaluating agentic academic search under realistic open literature environments. We propose ScholarQuest, a large-scale, taxonomy-guided benchmark for agentic academic paper search. ScholarQuest is constructed from over 1,000 computer science topics and fo...

📄 From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20152v1
👥 Authors: Jiaxu Zuo, Mu You, Kaixin Lan, Tao Fang, Yujia Huo, Henghua Shen, Lidia S. Chao (possible past Tencent (China) affiliation), Derek F. Wong (possible past Tencent (China) affiliation)
Abstract

Recent advances in Large Language Models (LLMs) have substantially transformed Automated Essay Scoring (AES), yet the internal mechanisms underlying LLM-based scoring remain poorly understood. In this work, we systematically analyze the hidden representations of eight LLMs across two English essay datasets (ASAP++, CSEE) and one Portuguese dataset (ENEM). Using linear probing, cross-prompt generalization, dimensionality reduction, and neuron-level analyses, we find consistent evidence that essay...

📄 Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20135v1
👥 Authors: Jianing Guo, Fangzheng Chen, Zihao Mao, Wong Lik Hang Kenny, Zhenhong Wu, Yu Li (possible past Tencent (China) affiliation), Yishuai Cai, Yuanpei Chen, Yikun Ban, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Qi Dou, Yaodong Yang, Xianglong Liu, Huijie Zhao, Simin Li
Abstract

Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (F...

📄 Go-with-the-Track: Video Compositing and Motion Control with Point Tracking
🗓️ Published: 6/18/2026
🔗 http://arxiv.org/abs/2606.20891v1
👥 Authors: Koichi Namekata, Yash Kant, Zhizheng Liu, Ryan D Burgert, Yuancheng Xu, Kuan Heng Lin, Emmett Steven, Julien Philip, Li Ma, Andrea Vedaldi (possible past University Of Oxford affiliation), Paul Debevec (possible past Google (United States) affiliation), Ning Yu
Abstract

Filmmaking demands precise motion control and reference image compositing -- capabilities that existing methods treat separately. Point-track-conditioned image-to-video models restrict content insertion to the first frame, while reference-to-video models lack fine-grained spatial-temporal control over how reference content integrates across frames. We present Go-with-the-Track, which unifies both capabilities by jointly conditioning on multiple reference images and reference-anchored point-tra...

*Notable papers are those with at least two authors from a "big" AI/ML lab.