📄 Notable* Recent AI/ML arXiv Papers

📄 EvoPolicyGym: Evaluating Autonomous Policy Evolution in Interactive Environments
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.02440v1
👥 Authors: Zhilin Wang, Han Song, Runzhe Zhan, Jusen Du, Jiacheng Chen, Tianle Li, Qingyu Yin, Yulun Wu, Zhennan Shen, Tong Zhu (possible past Nvidia (United States) affiliation), Yanshu Li, Guanjie Chen, Derek F. Wong (possible past Tencent (China) affiliation), Yafu Li, Yu Cheng (possible past National University of Singapore affiliation), Yang Yang (possible past Tencent (China) affiliation)
Abstract

Autonomous agents are increasingly expected to improve executable policies through feedback, yet existing evaluations often collapse this process into a final score or confound it with open-ended software-engineering progress. We introduce Autonomous Policy Evolution, a controlled evaluation setting in which a harness-model agent repeatedly edits an executable policy system under a fixed interaction budget. We instantiate this setting in EvoPolicyGym, a benchmark built from compact interactive R...

📄 Text-Driven 3D Indoor Scene Synthesis in Non-Manhattan Environments
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.02407v1
👥 Authors: Xianhui Meng, Zirui Song, Yuchen Zhang (possible past University of California, Berkeley affiliation), Li Zhang (possible past University of Oxford affiliation), Yongxuan Lv, Xiuying Chen, Kun Wang, Yan Luo, Kai Chen (possible past Shanghai Jiao Tong University affiliation), Hangjun Ye, Long Chen (possible past Tencent (China) affiliation), Jun Liu (possible past Tencent (China) affiliation), Xiaoshuai Hao
Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in 3D indoor synthesis for Manhattan environments. However, existing methods often fail to capture plausible object layout patterns in non-Manhattan settings, primarily because they struggle to model non-orthogonal spatial relationships, leading to high geometric violations and low physical fidelity. To address this challenge, we propose SPG-Layout, a novel text-driven framework designed to generate physically plausible indoo...

📄 Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.02166v1
👥 Authors: Di Wu, Huan Liu (possible past Tsinghua University affiliation), Zhixiang Chi, Yuanhao Yu, Konstantinos N. Plataniotis, Yang Wang (possible past Baidu (China) affiliation)
Abstract

The rapid advancements in using neural networks as implicit data representations have attracted significant interest in developing machine learning methods that analyze and process the weight spaces of other neural networks. However, efficiently handling these high-dimensional weight spaces remains challenging. Existing methods often overlook the sequential nature of layer-by-layer processing in neural network inference. In this work, we propose a novel approach using dynamic graphs to represent ...

📄 Mirror Illusion Art
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.02015v1
👥 Authors: Xiaopei Zhu, Zeyuan Li, Jun Zhu (possible past Tsinghua University affiliation), Xiaolin Hu (possible past Tsinghua University affiliation)
Abstract

Mirror Illusion Art is a novel reflection-conditioned 3D illusion where one object yields two target appearances (front and mirror). The task is formulated as inverse design from two target 2D images (front and mirror) to a printable 3D object with geometry and texture. Prior topology-driven and shadow-based approaches demand substantial manual effort, optimize shape only, and often yield non-smooth or incomplete geometry. To address these challenges, we propose AutoMIA, an automated Mirror Illu...

📄 Multimodal Knowledge Edit-Scoped Generalization for Online Recursive MLLM Editing
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01978v1
👥 Authors: Siyuan Li (possible past Tencent (China) affiliation), Youyuan Zhang, Ruitong Liu, Junxi Wang, Jing Li (possible past Tencent (China) affiliation)
Abstract

Online multimodal knowledge editing requires injecting a continual stream of visual-textual corrections into multimodal large language models (MLLMs) with bounded overhead and minimal disruption to unrelated behaviors. Existing editors mainly emphasize edit reliability and long-horizon stability, but rarely control the semantic boundary of each edit. Our pilot analyses of post-edit behaviors and internal neuronal activities reveal a scope gap behind reliable edits: instance-level success neither...

📄 PhysMani: Physics-principled 3D World Model for Dynamic Object Manipulation
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01938v1
👥 Authors: Peng Yun, Shouwang Huang, Hao Li (possible past Tsinghua University affiliation), Jinxi Li, Jianan Wang (possible past DeepMind (United Kingdom) affiliation), Bo Yang (possible past Tencent (China) affiliation)
Abstract

Manipulating fast and dynamically moving targets in unstructured 3D environments remains challenging for embodied AI. Existing visual-language-action models and world models struggle with accurate 3D geometry and physically meaningful forecasting. We propose PhysMani, a framework that couples a physics-principled 3D Gaussian world model with a future-aware action policy model. The world model learns a divergence-free Gaussian velocity field via online optimization for fast and physically grounde...

📄 SAB-LVLM: Significance-Aware Binarization for Large Vision-Language Models
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01876v1
👥 Authors: Qi Lyu, Jiahua Dong, Baichen Liu, Xudong Wang, Mingfei Han, Yulun Zhang, Fahad Shahbaz Khan (possible past Inception Institute of Artificial Intelligence affiliation), Salman Khan (possible past Inception Institute of Artificial Intelligence affiliation), Lianqing Liu, Zhi Han
Abstract

Large Vision-Language Models (LVLMs) have achieved remarkable progress in multimodal understanding, yet their enormous parameter scale and cross-modal computation incur substantial memory and latency overhead, severely limiting real-world deployment on resource-constrained devices. Binarization offers an attractive solution by drastically reducing storage and computational costs. However, existing binarization methods neglect the varying importance of weights across different layers and modaliti...
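
For readers unfamiliar with the term, the storage saving that binarization offers can be illustrated with a minimal sketch of plain sign-based 1-bit weight binarization with a single scale factor. This is a generic baseline and not the SAB-LVLM method, whose details are cut off in the abstract above; the function names `binarize` and `dequantize` are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w).

    Assumption: one scale factor per weight group, set to the mean
    absolute value, which minimizes squared error for fixed signs.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]  # 1 bit per weight
    return alpha, signs


def dequantize(alpha, signs):
    """Rebuild the approximate full-precision weights."""
    return [alpha * s for s in signs]


alpha, signs = binarize([0.5, -1.5, 1.0, -1.0])
approx = dequantize(alpha, signs)
```

Each weight is stored as one sign bit plus a shared scale, so weights of very different magnitude inside a group collapse to the same value.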

📄 Mixture-of-Parallelisms: Towards Memory-Efficient Training Stack for Mixture-of-Experts Models
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01844v1
👥 Authors: Xuan-Phi Nguyen, Shrey Pandit, Yiran Zhao, Semih Yavuz (possible past Google (United States) affiliation), Silvio Savarese (possible past Stanford University affiliation), Shafiq Joty
Abstract

This paper showcases a memory-efficient training stack for Mixture-of-Experts (MoE) models. It is a training paradigm that combines and specializes various existing and novel parallelism techniques at different layers and stages of the Mixture-of-Experts (MoE) model training pipeline. It leverages these techniques to achieve maximal efficiency given the physical constraints of CPU, CPU memory, GPU HBM memory, and the CPU-GPU, GPU-GPU, and node-node communication bandwidth of the GPU cluster. It ...

📄 Separating Expert Retention from Autonomous Source Inference in Raw-ECG-Replay-Free Continual ECG Deployment
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01674v1
👥 Authors: Yufan Lu, Xinhui Liu, Chenyang Xu, Yuxi Zhou, Hao Wang (possible past Tsinghua University affiliation), Shenda Hong (possible past Peking University affiliation)
Abstract

In multi-source ECG deployment, models may need to incorporate new data sources when earlier raw ECGs cannot be retained or replayed. Freezing a pretrained backbone and assigning each source an isolated classifier prevents parameter interference, but deployment still requires selecting an expert when source metadata are unavailable. We study this distinction through \ours{}, an incremental expert bank built on frozen 1024-dimensional ECGFounder features. Each arriving domain adds a balanced-soft...

📄 Procedural Memory Distillation: Online Reflection for Self-Improving Language Models
🗓️ Published: 7/1/2026
🔗 http://arxiv.org/abs/2607.01480v1
👥 Authors: Ye Liu, Srijan Bansal, Bo Pang, Yang Li (possible past Google (United States) affiliation), Zeyu Leo Liu, Yifei Ming, Zixuan Ke, Shafiq Joty, Semih Yavuz (possible past Google (United States) affiliation)
Abstract

Reinforcement learning with verifiable rewards (RLVR), along with recent self-distillation variants such as SDPO, evaluates each rollout against a verifier and updates the policy from that episode-level signal. However, the richer procedural information in the rollout is rarely retained or reused. Across episodes and epochs, the model repeatedly encounters related problems under a changing policy, producing cross-episode signals that episode-local updates cannot capture: which strategies consiste...

📄 Auto-FL-Research: Agentic Search for Federated Learning Algorithms
🗓️ Published: 7/1/2026
🔗 http://arxiv.org/abs/2607.01366v1
👥 Authors: Holger R. Roth (possible past Nvidia (United States) affiliation), Ziyue Xu (possible past Nvidia (United States) affiliation), Chester Chen, Daguang Xu (possible past Nvidia (United States) affiliation), Peter Cnudde, Andrew Feng (possible past Nvidia (United States) affiliation)
Abstract

Federated learning (FL) research often depends on many small but consequential algorithmic choices: optimizer variants, server aggregation rules, local training schedules, normalization, regularization, and model architecture. These choices are expensive to explore manually and difficult to compare fairly when candidate changes can also alter the FL training or evaluation path. In this work, we present Auto-FL-Research (AFR), a constrained coding-agent workflow for FL algorithmic recipe search. ...

📄 EHHN: An Event-driven Heterogeneous Hypergraph Network for Object-Centric Next Activity Prediction
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01785v1
👥 Authors: Jiaxing Wang, Kaitao Chen, Zhubin Han, Chenyu Hou, Bin Cao (possible past Microsoft (United States) affiliation), Jing Fan, Ji Zhang (possible past Nvidia (United States) affiliation)
Abstract

Next activity prediction helps service-oriented processes anticipate upcoming steps before delays, exceptions, or service-level risks occur. Most existing methods assume classical single-case event logs, whereas real service processes often involve events shared by multiple typed business objects. Object-centric event logs (OCELs) capture such interactions, but current predictors remain limited. Flattening-based approaches lose cross-object context, and native OCEL graph-based approaches encode ...

📄 Denser $\neq$ Better: Limits of On-Policy Self-Distillation for Continual Post-Training
🗓️ Published: 7/2/2026
🔗 http://arxiv.org/abs/2607.01763v1
👥 Authors: Meng Wang (possible past Google (United States) affiliation), Haohan Zhao, Wenzhuo Liu, Lu Yang, Geng Liu, Haiyang Guo, Guo-Sen Xie (possible past Inception Institute of Artificial Intelligence affiliation), Gaofeng Meng, Hongbin Liu, Fei Zhu
Abstract

Continual post-training enables foundation models to acquire new knowledge while preserving existing capabilities. Recent work suggests that on-policy learning can mitigate forgetting, with on-policy self-distillation emerging as a particularly attractive approach. In this work, we revisit this optimistic view through self-distillation policy optimization (SDPO). Our experiments show that SDPO can accelerate in-domain specialization when teacher signals are stable and well aligned, but it strugg...

📄 Black-Box Inference of LLM Architectural Properties with Restrictive API Access
🗓️ Published: 7/1/2026
🔗 http://arxiv.org/abs/2607.01313v1
👥 Authors: Christopher Ellis, Shreyas Chaudhari, Mei-Yu Wang, Leighton Barnes, Giulia Fanti (possible past University of California, Berkeley affiliation), José M. F. Moura (possible past Carnegie Mellon University affiliation)
Abstract

In practice, most commercial LLM providers do not publicly release details of underlying LLM architectures. However, prior work has shown that given limited API access to an LLM (namely, top-$k$ logits and/or a logit bias function), one can recover certain architectural details of an LLM, such as the hidden dimension of the feed-forward network. Perhaps in response to these results, most commercial LLM providers have restricted their APIs to expose only the single logit for each decoded token, a...

📄 Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation
🗓️ Published: 7/1/2026
🔗 http://arxiv.org/abs/2607.01208v1
👥 Authors: Shayan Talaei, Abhinav Chinta, Devvrit Khatri, Amin Karbasi, Azalia Mirhoseini (possible past Google (United States) affiliation), Amin Saberi (possible past Stanford University affiliation)
Abstract

Language models deployed in high-stakes roles can potentially favor certain entities, brands, or viewpoints, steering user decisions at scale. Such preferential biases can be introduced by any actor in the model's supply chain and are most dangerous when the model reveals its preference only on the relevant topic while behaving identically to its unmodified base on all other inputs. Recent work has shown that these biases can transfer through context distillation on semantically unrelated data, ...

📄 QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
🗓️ Published: 7/1/2026
🔗 http://arxiv.org/abs/2607.01179v1
👥 Authors: Michael Y. Li, Anthony Zhan, Kanishk Gandhi, Noah D. Goodman (possible past Stanford University affiliation), Emily B. Fox (possible past Apple (United States) affiliation)
Abstract

Scaling inference compute, by generating many parallel attempts per problem, is a costly but reliable lever for improving language model capabilities. By default these attempts are generated independently, wasting inference compute on redundant solutions. This waste seems unavoidable. After all, independence is what makes parallel sampling trivial to scale. However, this tradeoff is not fundamental: there is a rich design space of samplers that generate correlated but exact samples entirely in p...
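
The idea of "correlated but exact" samples can be illustrated with a standard quasi-Monte Carlo construction, a randomly shifted lattice pushed through an inverse CDF. This is a generic textbook sketch over a small categorical distribution and not the QuasiMoTTo sampler, which the truncated abstract does not describe; the function names and the example probabilities are assumptions made for illustration.

```python
import random


def inverse_cdf(u, probs):
    """Map a uniform value u in [0, 1) to a category index."""
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def independent_draws(probs, n, rng):
    """Baseline: n independent samples, which may repeat heavily."""
    return [inverse_cdf(rng.random(), probs) for _ in range(n)]


def shifted_lattice_draws(probs, n, rng):
    """n samples from one shared random shift of an evenly spaced grid.

    Each point (shift + i/n) mod 1 is uniform on [0, 1) on its own, so
    every sample follows probs exactly, but the points are spread out
    jointly, so the samples are negatively correlated.
    """
    shift = rng.random()
    return [inverse_cdf((shift + i / n) % 1.0, probs) for i in range(n)]


probs = [0.5, 0.3, 0.2]  # hypothetical distribution over three answers
draws = shifted_lattice_draws(probs, 10, random.Random(0))
```

With ten draws, the lattice version covers the three categories in proportion to their probabilities, whereas independent draws can over-sample one category by chance.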

*Notable papers are those with at least two authors from a "big" AI/ML lab.