📄 Notable* Recent AI/ML arXiv Papers

📄 VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15186v1
👥 Authors: Kaixin Zhu, Yiwen Tang, Yifan Yang (possible past Tencent (China) affiliation), Renrui Zhang, Bohan Zeng, Ziyu Guo, Ruichuan An, Zhou Liu, Qizhi Chen, Delin Qu, Jaehong Yoon, Wentao Zhang (possible past Mila - Quebec Artificial Intelligence Institute affiliation)
Abstract

High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward architectures, enabling the generation of complex environments in a single forward pass. However, despite their strong performance in static scene perception, these models remain limited in responding to dynamic human instructions, which restricts their use in interactive applications. Existing editing methods typically rely on a 2D-lifting strategy, where individual views are edited independently and th...

📄 MeMo: Memory as a Model
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15156v1
👥 Authors: Ryan Wei Heng Quek, Sanghyuk Lee, Alfred Wei Lun Leong, Arun Verma, Alok Prakash, Nancy F. Chen, Bryan Kian Hsiang Low, Daniela Rus (possible past Massachusetts Institute Of Technology affiliation), Armando Solar-Lezama (possible past Massachusetts Institute Of Technology affiliation)
Abstract

Large language models (LLMs) achieve strong performance across a wide range of tasks, but remain frozen after pretraining until subsequent updates. Many real-world applications require timely, domain-specific information, motivating the need for efficient mechanisms to incorporate new knowledge. In this paper, we introduce MeMo (Memory as a Model), a modular framework that encodes new knowledge into a dedicated memory model while keeping the LLM parameters unchanged. Compared to existing methods...

📄 Pelican-Unified 1.0: A Unified Embodied Intelligence Model for Understanding, Reasoning, Imagination and Action
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15153v1
👥 Authors: Yi Zhang (possible past Google (United States) affiliation), Yinda Chen, Che Liu, Zeyuan Ding, Jin Xu (possible past Tencent (China) affiliation), Shilong Zou, Junwei Liao, Jiayu Hu, Xiancong Ren, Xiaopeng Zhang, Yechi Liu, Haoyuan Shi, Zecong Tang, Haosong Sun, Renwen Cui, Kuishu Wu, Wenhai Liu, Yang Xu, Yingji Zhang, Yidong Wang, Senkang Hu, Jinpeng Lu, Nga Teng Chan, Yechen Wu, Yong Dai, Jian Tang, Xiaozhu Ju
Abstract

We present Pelican-Unified 1.0, the first embodied foundation model trained according to the principle of unification. Pelican-Unified 1.0 uses a single VLM as a unified understanding module, mapping scenes, instructions, visual contexts, and action histories into a shared semantic space. The same VLM also serves as a unified reasoning module, autoregressively producing task-, action-, and future-oriented chains of thought in a single forward pass and projecting the final hidden state into a den...

📄 EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15042v1
👥 Authors: Wuyang Li, Yang Gao (possible past Tencent (China) affiliation), Mariam Hassan, Lan Feng, Wentao Pan (possible past Tsinghua University affiliation), Po-Chien Luan, Alexandre Alahi
Abstract

We propose EverAnimate, an efficient post-training method for long-horizon animated video generation that preserves visual quality and character identity. Long-form animation remains challenging because highly dynamic human motion must be synthesized against relatively static environments, making chunk-based generation prone to accumulated drift: (i) low-level quality drift, such as progressive degradation of static backgrounds, and (ii) high-level semantic drift, such as inconsistent character ...

📄 Orchard: An Open-Source Agentic Modeling Framework
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15040v1
👥 Authors: Baolin Peng, Wenlin Yao, Qianhui Wu, Hao Cheng (possible past Tencent (China) affiliation), Xiao Yu, Rui Yang, Tao Ge, Alessandro Sordoni, Xingdi Yuan, Yelong Shen (possible past Tencent (China) affiliation), Pengcheng He, Tong Zhang (possible past Tencent (China) affiliation), Zhou Yu, Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-sour...

📄 MHSA: A Lightweight Framework for Mitigating Hallucinations via Steered Attention in LVLMs
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14966v1
👥 Authors: Wei Ding, Yilin Li, Yudong Zhang, Ruobing Xie (possible past Tencent (China) affiliation), Xingwu Sun (possible past Baidu (China) affiliation), Jiansheng Chen, Yu Wang (possible past Tsinghua University affiliation)
Abstract

Large vision-language models (LVLMs) have achieved remarkable performance across diverse multimodal tasks, yet they continue to suffer from hallucinations, generating content that is inconsistent with the visual input. Prior work DHCP (Detecting Hallucinations by Cross-modal Attention Pattern) has explored hallucination detection from the perspective of cross-modal attention, but does not address hallucination mitigation. In this paper, we propose MHSA (Mitigating Hallucinations via Steered Atte...

📄 Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14892v1
👥 Authors: Shihao Qi, Jie Ma (possible past University Of Oxford affiliation), Rui Xing, Wei Guo, Xiao Huang, Zhitao Gao, Jianhao Deng, Jun Liu (possible past Tencent (China) affiliation), Lingling Zhang (possible past Google (United States) affiliation), Bifan Wei, Boqian Yang, Pinghui Wang, Jianwen Sun, Jing Tao, Yaqiang Wu, Hui Liu, Yu Yao, Tongliang Liu
Abstract

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structu...

📄 Towards In-Depth Root Cause Localization for Microservices with Multi-Agent Recursion-of-Thought
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14866v1
👥 Authors: Lingzhe Zhang, Tong Jia, Kangjin Wang, Chiming Duan, Minghua He, Rongqian Wang, Xi Peng, Meiling Wang (possible past Baidu (China) affiliation), Gong Zhang, Renhai Chen, Ying Li (possible past Meta (United States) affiliation)
Abstract

As modern microservice systems grow increasingly complex due to dynamic interactions and evolving runtime environments, they experience failures with rising frequency. Ensuring system reliability therefore critically depends on accurate root cause localization (RCL). While numerous traditional machine learning and deep learning approaches have been explored for this task, they often suffer from limited interpretability and poor transferability across deployments. More recently, large language mo...

📄 A Deterministic Agentic Workflow for HS Tariff Classification: Multi-Dimensional Rule Reasoning with Interpretable Decisions
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14857v1
👥 Authors: Yu Zhang (possible past Google (United States) affiliation), Dongjiang Zhuang, Qu Zhou, Zheng Huang, Junhe Wu, Jing Cao, Kai Chen (possible past Shanghai Jiao Tong University affiliation)
Abstract

Harmonized System (HS) tariff classification is a high-stakes, expert-level task in which a free-form product description must be mapped to a specific six- or eight-digit code under the General Interpretive Rules (GIR), section notes, chapter notes, and Explanatory Notes. The difficulty lies not in knowledge volume but in *multi-dimensional rule reasoning*: a correct classification must satisfy competing priority rules along several axes simultaneously, including material, form, function, essent...
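
The target codes themselves are hierarchical. Under the HS nomenclature, the first two digits identify the chapter, the first four the heading, and the first six the subheading; digits beyond six are national extensions. The sketch below only parses that structure (the name split_hs_code is hypothetical and is not part of the paper's workflow); it performs no classification and no rule reasoning.

```python
def split_hs_code(code: str) -> dict:
    """Split an HS code into its hierarchical levels.

    Assumes the usual layout: 2-digit chapter, 4-digit heading,
    6-digit subheading, plus optional national digits beyond six.
    Only 6- and 8-digit codes are accepted, matching the abstract.
    """
    digits = code.replace(".", "").replace(" ", "")
    if not digits.isdigit() or len(digits) not in (6, 8):
        raise ValueError(f"expected a 6- or 8-digit code, got {code!r}")
    return {
        "chapter": digits[:2],
        "heading": digits[:4],
        "subheading": digits[:6],
        "national": digits[6:] or None,  # None for a plain 6-digit code
    }


print(split_hs_code("8471.30"))
print(split_hs_code("84713000"))
```

A classifier that reasons level by level (chapter, then heading, then subheading) could use such a split to check that each decision is consistent with the one above it.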

📄 Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14747v1
👥 Authors: Weimin Xiong, Shuhao Gu, Bowen Ye, Zihao Yue, Lei Li (possible past Carnegie Mellon University affiliation), Feifan Song, Sujian Li (possible past Peking University affiliation), Hao Tian (possible past Baidu (China) affiliation)
Abstract

Recent advances in multimodal large language models have driven growing interest in graphical user interface (GUI) agents, yet their generalization remains constrained by the scarcity of large-scale training data spanning diverse real-world applications. Existing datasets rely heavily on costly manual annotations and are typically confined to narrow domains. To address this challenge, we propose Video2GUI, a fully automated framework that extracts grounded GUI interaction trajectories directly f...

📄 π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14678v1
👥 Authors: Haoran Zhang, Luxin Xu, Zhilin Wang, Runquan Gui, Shunkai Zhang, Haodi Lei, Zihao He, Bingsu He, Chicheng Qin, Tong Zhu (possible past Nvidia (United States) affiliation), Xiaoye Qu, Yang Yang (possible past Tencent (China) affiliation), Yu Cheng (possible past National University Of Singapore affiliation), Yafu Li
Abstract

The rise of personal assistant agents, e.g., OpenClaw, highlights the growing potential of large language models to support users across everyday life and work. A core challenge in these settings is proactive assistance, since users often begin with underspecified requests and leave important needs, constraints, or preferences unstated. However, existing benchmarks rarely evaluate whether agents can identify and act on such hidden intents before they are explicitly stated, especially in sustaine...

📄 Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14558v1
👥 Authors: Langzhou He, Junyou Zhu, Yue Zhou, Zhengyao Gu, Junhua Liu, Wei-Chieh Huang, Henry Peng Zou, David Wipf, Philip S. Yu (possible past Tsinghua University affiliation), Qitian Wu (possible past Shanghai Jiao Tong University affiliation)
Abstract

Agentic reinforcement learning trains large language models using multi-turn trajectories that interleave long reasoning traces with short environment-facing actions. Common policy-gradient methods, such as PPO and GRPO, treat each token in a trajectory equally, leading to uniform credit assignment. In this paper, we critically demonstrate that such uniform credit assignment largely misallocates token-level training signals. From an energy-based modeling perspective, we show that token-level tra...
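
To make "uniform credit assignment" concrete, here is a minimal sketch of how a GRPO-style objective turns one outcome reward per trajectory into per-token advantages: rewards are normalized within a group of samples, and the resulting scalar is copied onto every token of that trajectory, whether it belongs to a long reasoning span or a short action. Details such as population vs. sample standard deviation and the eps value vary between implementations; this is an illustration, not the paper's energy-based reweighting.

```python
import numpy as np


def grpo_token_advantages(rewards, lengths, eps=1e-6):
    """Group-relative advantages, broadcast uniformly over each trajectory's tokens.

    rewards: one scalar outcome reward per sampled trajectory in the group.
    lengths: number of tokens in each trajectory.
    """
    r = np.asarray(rewards, dtype=float)
    adv = (r - r.mean()) / (r.std() + eps)  # normalize within the group
    # Uniform credit assignment: every token of trajectory i gets adv[i].
    return [np.full(n, a) for a, n in zip(adv, lengths)]


per_token = grpo_token_advantages([1.0, 0.0, 0.0, 1.0], [5, 3, 4, 2])
for adv in per_token:
    print(adv.round(3))
```

Any token-level reweighting scheme would replace the np.full broadcast with per-token weights.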

📄 HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14513v1
👥 Authors: Xuzhe Zheng, Yuexiao Ma, Jing Xu (possible past Meta (United States) affiliation), Xiawu Zheng, Rongrong Ji (possible past Tencent (China) affiliation), Fei Chao
Abstract

Diffusion-based video generation has advanced substantially in visual fidelity and temporal coherence, but practical deployment remains limited by the quadratic complexity of full attention. Training-free sparse attention is attractive because it accelerates pretrained models without retraining, yet existing online top-p sparse attention still spends non-negligible cost on mask prediction and applies shared thresholds despite strong head-level heterogeneity. We show that these two overlooked f...
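
For readers unfamiliar with top-p sparse attention, the selection rule can be sketched as follows: for each query, keep the smallest set of keys whose softmax probability mass reaches p. The sketch computes full scores first, so it shows the rule rather than any speedup (avoiding exactly that cost is what mask prediction is for), and the per-head thresholds in the usage lines are hypothetical values, not HASTE's.

```python
import numpy as np


def top_p_attention_mask(scores, p):
    """Boolean mask keeping, per query row, the fewest keys whose
    softmax mass reaches p.

    scores: (num_queries, num_keys) attention logits for one head.
    """
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    order = np.argsort(-probs, axis=-1)  # keys sorted by descending probability
    sorted_probs = np.take_along_axis(probs, order, axis=-1)
    cum = np.cumsum(sorted_probs, axis=-1)
    # Keep a key if the mass accumulated before it is still below p.
    keep_sorted = (cum - sorted_probs) < p
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)  # undo the sort
    return mask


rng = np.random.default_rng(0)
heads = rng.normal(size=(2, 4, 16))  # 2 heads, 4 queries, 16 keys
head_p = [0.9, 0.5]                  # hypothetical per-head thresholds
masks = [top_p_attention_mask(h, p) for h, p in zip(heads, head_p)]
print([float(m.mean()) for m in masks])  # fraction of keys kept per head
```

Passing a different p per head is the simplest way to express head-level heterogeneity; a single shared p treats every head the same.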

📄 Asymmetric Generative Recommendation via Multi-Expert Projection and Multi-Faceted Hierarchical Quantization
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14512v1
👥 Authors: Bin Huang, Xin Wang (possible past University Of Edinburgh affiliation), Junwei Pan, Yongqi Zhou, Yifeng Zhou, Zhixiang Feng, Shudong Huang, Haijie Gu, Wenwu Zhu (possible past Tsinghua University affiliation)
Abstract

Generative Recommendation (GenRec) models reformulate recommendation as a sequence generation task, representing items as discrete Semantic IDs used symmetrically as both inputs and prediction targets. We identify a critical dual-stage information bottleneck in this design: (1) the Input Bottleneck, where lossy quantization degrades fine-grained semantics, while popularity bias skews the learned representations toward frequent items, and (2) the Output Bottleneck, where imprecise discrete target...
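
Semantic IDs of this kind are commonly produced by hierarchical (residual) quantization: an item embedding is snapped to the nearest code at level 1, the leftover residual is quantized at level 2, and so on, yielding a tuple of code indices. The sketch below uses untrained random codebooks purely for illustration and is the generic residual scheme, not the paper's multi-expert projection or multi-faceted quantization; the nonzero reconstruction error it prints is the "lossy quantization" the abstract refers to.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Map a vector to a tuple of code indices via residual quantization.

    codebooks: list of (codebook_size, dim) arrays, one per level.
    Returns the Semantic-ID tuple and the reconstructed vector.
    """
    residual = np.asarray(x, dtype=float).copy()
    ids, recon = [], np.zeros_like(residual)
    for cb in codebooks:
        dists = np.sum((cb - residual) ** 2, axis=1)  # squared L2 to each code
        k = int(np.argmin(dists))
        ids.append(k)
        recon += cb[k]
        residual -= cb[k]  # the next level quantizes what is left over
    return tuple(ids), recon


rng = np.random.default_rng(0)
dim, levels, size = 8, 3, 16
# Untrained codebooks with shrinking scale, one per level (illustrative only).
codebooks = [rng.normal(scale=1.0 / (l + 1), size=(size, dim)) for l in range(levels)]
item_embedding = rng.normal(size=dim)
sid, recon = residual_quantize(item_embedding, codebooks)
print(sid, np.linalg.norm(item_embedding - recon))
```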

📄 Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14487v1
👥 Authors: Jiahao Tian, Yiwei Wang (possible past Google (United States) affiliation), Gang Yu (possible past Tencent (China) affiliation), Chi Zhang (possible past Peking University affiliation)
Abstract

Autoregressive video diffusion models support real-time synthesis but suffer from error accumulation and context loss over long horizons. We discover that attention heads in AR video diffusion transformers serve functionally distinct roles as local heads for detail refinement, anchor heads for structural stabilization, and memory heads for long-range context aggregation, yet existing methods treat them uniformly, leading to suboptimal KV cache allocation. We propose Head Forcing, a training-free...

📄 A plug-and-play generative framework for multi-satellite precipitation estimation
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14426v1
👥 Authors: Yunfan Yang, Haofei Sun, Xiuyu Sun, Wei Han (possible past Google (United States) affiliation), Xiaoze Xu, Xingtao Song, Jun Li, Zhiqiu Gao, Wei Huang (possible past Google (United States) affiliation), Hao Li (possible past Tsinghua University affiliation)
Abstract

Reliable precipitation monitoring is essential for disaster risk reduction, water resources management, and agricultural decision-making. Multi-source satellite observations, particularly the combination of geostationary infrared and passive microwave measurements, have become a primary means of precipitation detection. Traditional multi-source satellite precipitation estimation methods remain computationally inefficient, and many deep learning methods lack the flexibility to incorporate new sen...

📄 Nexus: An Agentic Framework for Time Series Forecasting
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14389v1
👥 Authors: Sarkar Snigdha Sarathi Das, Palash Goyal, Mihir Parmar, Nanyun Peng, Vishy Tirumalashetty, Chun-Liang Li, Rui Zhang, Jinsung Yoon (possible past Google (United States) affiliation), Tomas Pfister (possible past University Of Oxford affiliation)
Abstract

Time series forecasting is not just numerical extrapolation, but often requires reasoning with unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting based on numerical patterns, they remain unaware of real-world textual signals. Conversely, while LLMs are emerging as zero-shot forecasters, their performance remains uneven across domains and in contextual grounding. To bridge this gap, we introduce Nexus, a multi-agent forec...

📄 Herculean: An Agentic Benchmark for Financial Intelligence
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14355v1
👥 Authors: Xueqing Peng, Zhuohan Xie, Yupeng Cao, Haohang Li, Lingfei Qian, Yan Wang (possible past Tencent (China) affiliation), Vincent Jim Zhang, Huan He, Xuguang Ai, Linhai Ma, Ruoyu Xiang, Yueru He, Yi Han, Shuyao Wang, Yuqing Guo, Mingyang Jiang, Yilun Zhao, Youzhong Dong, Xiaoyu Wang, Yankai Chen, Ye Yuan (possible past Carnegie Mellon University affiliation), Qiyuan Zhang, Fuyuan Lyu, Haolun Wu, Yonghan Yang, Zichen Zhao, Yuyang Dai, Fan Zhang, Rania Elbadry, Ayesha Gull, Muhammad Usman Safder, Nuo Chen, Fengbin Zhu, Tianshi Cai, Zimu Wang, Polydoros Giannouris, Yuechen Jiang, Zhiwei Liu, Mohsinul Kabir, Yuyan Wang, Yixiang Zheng, Yangyang Yu, Weijin Liu, Wenbo Cao, Anke Xu, Peng Lu, Jerry Huang, Fengran Mo, Mingquan Lin, Prayag Tiwari, Yijia Zhao, Victor Gutierrez Basulto, Xiao-Yang Liu, Kaleb E Smith (possible past Nvidia (United States) affiliation), Jiahuan Pei, Arman Cohan, Jimin Huang, Yuehua Tang, Alejandro Lopez-Lira, Xi Chen (possible past University Of California, Berkeley affiliation), Xue Liu, Junichi Tsujii, Jian-Yun Nie, Sophia Ananiadou
Abstract

As AI agents improve, the central question is no longer whether they can solve isolated well-defined financial tasks, but whether they can reliably carry out financial professional work. Existing financial benchmarks offer only a partial view of this ability, as they primarily evaluate static competencies such as question answering, retrieval, summarization, and classification. We introduce Herculean, the first skilled benchmark for agentic financial intelligence spanning four representative wor...

📄 Learning from Language Feedback via Variational Policy Distillation
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.15113v1
👥 Authors: Yang Li (possible past Google (United States) affiliation), Erik Nijkamp, Semih Yavuz (possible past Google (United States) affiliation), Shafiq Rayhan Joty
Abstract

Reinforcement learning from verifiable rewards (RLVR) suffers from sparse outcome signals, creating severe exploration bottlenecks on complex reasoning tasks. Recent on-policy self-distillation methods attempt to address this by utilizing language feedback to generate dense, token-level supervision. However, these approaches rely on a fixed, passive teacher to interpret the feedback. As the student policy improves, the teacher's zero-shot assessment capabilities plateau, ultimately halting furth...

📄 Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14938v1
👥 Authors: Yuehao Liu, Shanyan Guan, Weijia Zhang, Xuanming Shang, Yanhao Ge, Wei Li (possible past Peking University affiliation), Chao Ma (possible past Shanghai Jiao Tong University affiliation)
Abstract

Continual learning in multimodal large language models (MLLMs) aims to sequentially acquire knowledge while mitigating catastrophic forgetting, yet existing methods face inherent limitations: architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks; rehearsal-based methods rely on storing historical data, raising privacy and storage concerns; and conventional regularization-based strategies alone are insufficient to fully prevent parameter in...

📄 Selective Safety Steering via Value-Filtered Decoding
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14746v1
👥 Authors: Bat-Sheva Einbinder, Hen Davidov, Yee Whye Teh (possible past University Of Oxford affiliation), Yarin Gal, Yaniv Romano (possible past Technion – Israel Institute Of Technology affiliation)
Abstract

While large language models (LLMs) are trained to align with human values, their generations may still violate safety constraints. A growing line of work addresses this problem by modifying the model's sampling policy at decoding time using a safety reward. However, existing decoding-time steering methods often intervene unnecessarily, modifying generations that would have been safe under the base model. Such unnecessary interventions are undesirable, as they can distort key properties of the ba...
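
A common form of decoding-time steering reweights the base next-token distribution by exp(beta * reward), which is the closed-form optimum of maximizing expected reward under a KL penalty to the base policy. The sketch below applies that generic tilt to a toy three-token distribution in which every candidate is assumed safe (the reward values and beta are hypothetical); the output still differs from the base distribution, which illustrates the kind of unnecessary intervention the abstract describes. It does not implement the paper's value filtering.

```python
import numpy as np


def tilt_distribution(base_probs, safety_reward, beta):
    """Reweight a base next-token distribution by exp(beta * reward).

    This is the closed-form solution of maximizing expected reward
    under a KL penalty to the base policy with strength 1 / beta.
    """
    logits = np.log(base_probs) + beta * np.asarray(safety_reward, dtype=float)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()


base = np.array([0.6, 0.3, 0.1])
reward = np.array([0.8, 0.9, 0.95])  # hypothetical: all three candidates deemed safe
steered = tilt_distribution(base, reward, beta=5.0)
print(steered.round(3))
```

When all candidates receive the same reward the tilt leaves the base distribution unchanged, so any shift comes from differences in reward among tokens that may all be acceptable.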

📄 SeesawNet: Towards Non-stationary Time Series Forecasting with Balanced Modeling of Common and Specific Dependencies
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14551v1
👥 Authors: Hao Li (possible past Tsinghua University affiliation), Lu Zhang (possible past Tencent (China) affiliation), Liu Chong, Yankai Chen, Pengyang Wang, Yingjie Zhou
Abstract

Instance normalization (IN) is widely used in non-stationary multivariate time series forecasting to reduce distribution shifts and highlight common patterns across samples. However, IN can over-smooth instance-specific structural information that is essential for modeling temporal and cross-channel heterogeneity. While prior methods further suppress distribution discrepancies or attempt to recover temporal specific dependencies, they often ignore a central tension: how to adaptively model commo...
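
The sketch below shows plain instance normalization as it is typically used in forecasting pipelines (RevIN-style): each series is standardized by its own mean and standard deviation over time, and forecasts are mapped back with the saved statistics. The usage lines build two series that differ only in level and scale and show that they become indistinguishable after normalization, which is one way to see the "over-smoothing" of instance-specific information the abstract describes. This is the baseline technique, not SeesawNet itself.

```python
import numpy as np


def instance_normalize(x, eps=1e-5):
    """Normalize each series by its own statistics.

    x: (batch, time, channels). Mean and std are taken over the time
    axis, separately for every sample and channel.
    """
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True) + eps
    return (x - mean) / std, (mean, std)


def instance_denormalize(y, stats):
    """Restore the original scale of a (forecast) tensor using saved statistics."""
    mean, std = stats
    return y * std + mean


t = np.linspace(0, 2 * np.pi, 48)
base = np.sin(t)
x = np.stack([base, 10 * base + 100])[:, :, None]  # shape (2, 48, 1)
x_norm, stats = instance_normalize(x)
# Level and scale differences between the two series are removed.
print(np.abs(x_norm[0] - x_norm[1]).max())
```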

📄 Test-Time Learning with an Evolving Library
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14477v1
👥 Authors: Weijia Xu, Alessandro Sordoni (possible past Microsoft (United States) affiliation), Chandan Singh, Zelalem Gero, Michel Galley (possible past Microsoft (United States) affiliation), Xingdi Yuan, Jianfeng Gao (possible past Microsoft (United States) affiliation)
Abstract

We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting ...

📄 LiSA: Lifelong Safety Adaptation via Conservative Policy Induction
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14454v1
👥 Authors: Minbeom Kim, Lesly Miculicich, Bhavana Dalvi Mishra (possible past Carnegie Mellon University affiliation), Mihir Parmar, Phillip Wallis, Bharath Chandrasekhar, Kyomin Jung, Tomas Pfister (possible past University Of Oxford affiliation), Long T. Le
Abstract

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expect...

📄 NodeSynth: Socially Aligned Synthetic Data for AI Evaluation
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14381v1
👥 Authors: Qazi Mamunur Rashid, Xuan Yang (possible past Stanford University affiliation), Zhengzhe Yang, Yanzhou Pan, Erin Van Liemt, Darlene Neal, Kshitij Pancholi, Jamila Smith-Loud (possible past Google (United States) affiliation)
Abstract

Recent advancements in generative AI facilitate large-scale synthetic data generation for model evaluation. However, without targeted approaches, these datasets often lack the sociotechnical nuance required for sensitive domains. We introduce NodeSynth, an evidence-grounded methodology that generates socially relevant synthetic queries by leveraging a fine-tuned taxonomy generator (TaG) anchored in real-world evidence. Evaluated against four mainstream LLMs (e.g., Claude 4.5 Haiku), NodeSynth el...

📄 EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14249v1
👥 Authors: Zhiye Song, Kyungmi Lee, Eun Kyung Lee, Xin Zhang (possible past Google (United States) affiliation), Tamar Eilam, Anantha P. Chandrakasan (possible past Massachusetts Institute Of Technology affiliation)
Abstract

We present EnergyLens, an end-to-end framework for energy-aware large language model (LLM) inference optimization. As LLMs scale, predicting and reducing their energy footprint has become critical for sustainability and datacenter operations, yet existing approaches either require production-level code and expensive profiling or fail to accurately capture multi-GPU energy behavior. As a result, practitioners lack tools for deciding which optimizations to prioritize and for selecting among existi...

📄 Artificial Intelligence-Assistant Cardiotocography: Unified Model for Signal Reconstruction, Fetal Heart Rate Analysis, and Variability Assessment
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14242v1
👥 Authors: Xiaohua Wang, Kai Yu (possible past Baidu (China) affiliation), Xuxiao Liang, Liang Wang (possible past Tencent (China) affiliation), Chao Han
Abstract

The monitoring of fetal heart rate (FHR) and the assessment of its variability are crucial for preventing fetal compromise and adverse outcomes. However, traditional methods encounter limitations arising from equipment performance, data transmission, and subjective assessments by doctors. We have developed a tailored AI-based FHrCTG model specifically for FHR monitoring, which effectively mitigates noise interference and precisely reconstructs signals. Our model was pre-trained on a massive data...

📄 PreFT: Prefill-only finetuning for efficient inference
🗓️ Published: 5/14/2026
🔗 http://arxiv.org/abs/2605.14217v1
👥 Authors: Andrew Lanpouthakoun, Aryaman Arora, Zhengxuan Wu (possible past Stanford University affiliation), Dhruv Pai, Ben Keigwin, Dan Jurafsky (possible past Stanford University affiliation), Christopher Potts (possible past Tencent (China) affiliation)
Abstract

Large language models can now be personalised efficiently at scale using parameter-efficient finetuning methods (PEFTs), but serving user-specific PEFTs harms throughput, even with specialised kernels and memory management techniques. This is because, theoretically and empirically, a mismatch exists between prefill (processing a large number of tokens at once) and decode (generating a single token autoregressively): the latter has far lower throughput when serving multiple adapters. Rather than ...

*Notable papers are those with at least two authors from a "big" AI/ML lab.