📄 Notable* Recent AI/ML arXiv Papers

📄 Decision-Level Ordinal Modeling for Multimodal Essay Scoring with Large Language Models
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14891v1
👥 Authors: Han Zhang (possible past Tsinghua University affiliation), Jiamin Su, Li Liu (possible past National University Of Defense Technology affiliation)
Abstract

Automated essay scoring (AES) predicts multiple rubric-defined trait scores for each essay, where each trait follows an ordered discrete rating scale. Most LLM-based AES methods cast scoring as autoregressive token generation and obtain the final score via decoding and parsing, making the decision implicit. This formulation is particularly sensitive in multimodal AES, where the usefulness of visual inputs varies across essays and traits. To address these limitations, we propose Decision-Level Or...
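
The abstract above is truncated, but its core idea, making the scoring decision explicit over an ordered scale rather than leaving it to token decoding, can be illustrated with a minimal sketch (not the paper's actual method): decide the score from the full distribution over ordered levels, e.g. by rounding the expectation, instead of taking the argmax token.

```python
import numpy as np

def ordinal_decision(level_probs):
    """Decision-level scoring over an ordered rating scale.

    level_probs: probabilities over ordered score levels 0..K-1
    (e.g. a softmax over an LLM's score tokens). Rounding the
    expected level respects the ordinal structure, unlike argmax.
    """
    levels = np.arange(len(level_probs))
    return int(round(float(levels @ level_probs)))

# A multimodal distribution where argmax and the ordinal decision differ:
probs = np.array([0.0, 0.3, 0.1, 0.3, 0.3])
print(int(np.argmax(probs)))    # argmax picks level 1
print(ordinal_decision(probs))  # expectation 2.6 rounds to level 3
```

The point of the sketch is that an explicit decision rule can use the whole distribution over ordered levels, which token decoding discards.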

📄 Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14756v1
👥 Authors: Wei Shao (possible past Stanford University affiliation), Lemao Liu (possible past Tencent (China) affiliation), Yinqiao Li, Guoping Huang, Shuming Shi (possible past Tencent (China) affiliation), Linqi Song
Abstract

Current online translation services require sending user text to cloud servers, posing a risk of privacy leakage when the text contains sensitive information. This risk hinders the application of online translation services in privacy-sensitive scenarios. One way to mitigate this risk for online translation services is introducing privacy protection mechanisms targeting the inference stage of translation models. However, compared to subfields of NLP like text classification and summarization, th...
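
The truncated abstract frames privacy protection at the inference stage of translation models. As a point of reference (not the paper's proposal), one simple client-side baseline is to mask sensitive spans before sending text to the cloud and restore them afterwards. The regex and placeholder format below are illustrative, and real placeholders must be chosen so the translation model passes them through unchanged.

```python
import re

# Illustrative pattern for one kind of sensitive span (SSN-like IDs).
SENSITIVE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask(text):
    """Replace sensitive spans with placeholders before cloud translation."""
    spans = {}
    def repl(match):
        key = f"<ENT{len(spans)}>"
        spans[key] = match.group(0)
        return key
    return SENSITIVE.sub(repl, text), spans

def unmask(translated, spans):
    """Restore the original spans in the translated text."""
    for key, value in spans.items():
        translated = translated.replace(key, value)
    return translated
```

This keeps the sensitive strings on the client; the server only ever sees placeholders.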

📄 MVHOI: Bridge Multi-view Condition to Complex Human-Object Interaction Video Reenactment via 3D Foundation Model
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14686v1
👥 Authors: Jinguang Tong, Jinbo Wu, Kaisiyuan Wang, Zhelun Shen, Xuan Huang, Mochu Xiang, Xuesong Li, Yingying Li, Haocheng Feng (possible past Baidu (China) affiliation), Chen Zhao (possible past Stanford University affiliation), Hang Zhou (possible past Baidu (China) affiliation), Wei He (possible past Baidu (China) affiliation), Chuong Nguyen, Jingdong Wang (possible past Baidu (China) affiliation), Hongdong Li
Abstract

Human-Object Interaction (HOI) video reenactment with realistic motion remains a frontier in expressive digital human creation. Existing approaches primarily handle simple image-plane motion (e.g., in-plane translations), struggling with complex non-planar manipulations like out-of-plane reorientation. In this paper, we propose MVHOI, a two-stage HOI video reenactment framework that bridges multi-view reference conditions and video foundation models via a 3D Foundation Model (3DFM). The 3DFM fir...

📄 CangjieBench: Benchmarking LLMs on a Low-Resource General-Purpose Programming Language
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14501v1
👥 Authors: Junhang Cheng, Fang Liu (possible past Massachusetts Institute Of Technology affiliation), Jia Li (possible past Google (United States) affiliation), Chengru Wu, Nanxiang Jiang, Li Zhang (possible past University Of Oxford affiliation)
Abstract

Large Language Models excel in high-resource programming languages but struggle with low-resource ones. Existing research related to low-resource programming languages primarily focuses on Domain-Specific Languages (DSLs), leaving general-purpose languages that suffer from data scarcity underexplored. To address this gap, we introduce CangjieBench, a contamination-free benchmark for Cangjie, a representative low-resource general-purpose language. The benchmark comprises 248 high-quality samples ...

📄 MBD: A Model-Based Debiasing Framework Across User, Content, and Model Dimensions
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14422v1
👥 Authors: Yuantong Li, Lei Yuan, Zhihao Zheng, Weimiao Wu, Songbin Liu, Jeong Min Lee (possible past Meta (United States) affiliation), Ali Selman Aydin, Shaofeng Deng, Junbo Chen, Xinyi Zhang, Hongjing Xia, Sam Fieldman, Matthew Kosko, Wei Fu, Du Zhang, Peiyu Yang, Albert Jin Chung, Xianlei Qiu, Miao Yu, Zhongwei Teng, Hao Chen, Sunny Baek, Hui Tang (possible past Tencent (China) affiliation), Yang Lv, Renze Wang, Qifan Wang (possible past Google (United States) affiliation), Zhan Li, Tiantian Xu, Peng Wu, Ji Liu (possible past Tencent (China) affiliation)
Abstract

Modern recommendation systems rank candidates by aggregating multiple behavioral signals through a value model. However, many commonly used signals are inherently affected by heterogeneous biases. For example, watch time naturally favors long-form content, loop rate favors short-form content, and comment probability favors videos over images. Such biases introduce two critical issues: (1) value model scores may be systematically misaligned with users' relative preferences - for instance, a see...
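
The truncated abstract identifies signal-level bias (watch time favoring long-form content, and so on). One generic debiasing baseline, not necessarily what MBD does, is to rank-normalize each behavioral signal within a content-type bucket so that scores become comparable across formats:

```python
from bisect import bisect_left
from collections import defaultdict

def rank_normalize(items):
    """items: list of (content_bucket, raw_signal).

    Returns each signal's percentile rank within its bucket, so a
    top-of-bucket short video scores as high as a top-of-bucket
    long video, removing the raw-scale advantage of long-form.
    """
    by_bucket = defaultdict(list)
    for bucket, value in items:
        by_bucket[bucket].append(value)
    for values in by_bucket.values():
        values.sort()
    ranks = []
    for bucket, value in items:
        values = by_bucket[bucket]
        ranks.append(bisect_left(values, value) / max(len(values) - 1, 1))
    return ranks
```

On raw watch time a 600 s long-form view dominates a 30 s short-form loop; after within-bucket normalization both sit at the top of their respective buckets.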

📄 AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14342v1
👥 Authors: Jiarui Zhang, Junqi Hu, Zurong Mai, Yuhang Chen, Shuohong Lou, Henglian Huang, Lingyuan Zhao, Jianxi Huang, Yutong Lu, Haohuan Fu (possible past Tsinghua University affiliation), Juepeng Zheng (possible past Tsinghua University affiliation)
Abstract

Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground-level close-ups to top-down UAV and satellite imagery. Existing Multi-modal Large Language Models (MLLMs) suffer from a significant "terrestrial-centric" bias, causing scale confusion and logic drift during complex agricultural planning. To address this, we introduce AgroOmni (288K), the first large-scale multi-view training corpus designed to capture diverse spatial topologies and scales...

📄 Bringing Model Editing to Generative Recommendation in Cold-Start Scenarios
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14259v1
👥 Authors: Chenglei Shen, Teng Shi, Weijie Yu, Xiao Zhang (possible past Tsinghua University affiliation), Jun Xu (possible past Google (United States) affiliation)
Abstract

Generative recommendation (GR) has shown strong potential for sequential recommendation in an end-to-end generation paradigm. However, existing GR models suffer from severe cold-start collapse: their recommendation accuracy on cold-start items can drop to near zero. Current solutions typically rely on retraining with cold-start interactions, which is hindered by sparse feedback, high computational cost, and delayed updates, limiting practical utility in rapidly evolving recommendation catalogs. ...

📄 QiMeng-CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
🗓️ Published: 3/15/2026
🔗 http://arxiv.org/abs/2603.14239v1
👥 Authors: Yutong Wu, Chenrui Cao, Pengwei Jin, Di Huang (possible past Google (United States) affiliation), Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu (possible past Baidu (China) affiliation)
Abstract

SystemVerilog Assertions (SVAs) are crucial for hardware verification. Recent studies leverage general-purpose LLMs to translate natural language properties to SVAs (NL2SVA), but they perform poorly due to limited data. We propose a data synthesis framework to tackle two challenges: the scarcity of high-quality real-world SVA corpora and the lack of reliable methods to determine NL-SVA semantic equivalence. For the former, large-scale open-source RTLs are used to guide LLMs to generate real-worl...

📄 Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
🗓️ Published: 3/14/2026
🔗 http://arxiv.org/abs/2603.13985v1
👥 Authors: Haitao Jiang, Wenbo Zhang, Jiarui Yao, Hengrui Cai, Sheng Wang (possible past Tencent (China) affiliation), Rui Song (possible past Peking University affiliation)
Abstract

Pre-trained Large Language Models (LLMs) exhibit broad capabilities, yet for specific tasks or domains, attaining higher accuracy and more reliable reasoning generally depends on post-training through Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). Although often treated as distinct methodologies, recent theoretical and empirical developments demonstrate that SFT and RL are closely connected. This study presents a comprehensive and unified perspective on LLM post-training w...
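
One standard way to see the connection the abstract alludes to (stated here generically, not as this paper's result): the SFT gradient is a policy-gradient estimator with demonstrations in place of on-policy samples and a constant reward.

```latex
% SFT: maximize log-likelihood of demonstrations (x, y) ~ D
\nabla_\theta \mathcal{L}_{\mathrm{SFT}}
  = \mathbb{E}_{(x,y)\sim D}\!\left[\nabla_\theta \log \pi_\theta(y \mid x)\right]

% RL (REINFORCE): weight the same score function by a reward
\nabla_\theta \mathcal{J}_{\mathrm{RL}}
  = \mathbb{E}_{x\sim D,\; y\sim \pi_\theta}\!\left[R(x,y)\,\nabla_\theta \log \pi_\theta(y \mid x)\right]
```

With R(x, y) ≡ 1 and y drawn from the demonstration set rather than the current policy, the two gradients coincide; many hybrid post-training methods interpolate between these two sampling and weighting choices.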

📄 vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
🗓️ Published: 3/14/2026
🔗 http://arxiv.org/abs/2603.13966v1
👥 Authors: Suhwan Choi, Yunsung Lee, Yubeen Park, Chris Dongjoo Kim, Ranjay Krishna (possible past University Of Washington affiliation), Dieter Fox (possible past University Of Washington affiliation), Youngjae Yu
Abstract

Vision-Language-Action (VLA) models are typically evaluated using per-benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla-eval, an open-source evaluation harness that decouples model inference from benchmark execution through a WebSocket/msgpack protocol with Docker-based environment isolation. Models integrate once by implementing a single predict() method; benchmarks integrate once via...
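
The decoupling the abstract describes can be sketched minimally. The class and function names below are hypothetical, and JSON over raw bytes stands in for the actual WebSocket and msgpack transport; the point is that a model integrates once via a single predict() method, while the harness only ever speaks the wire protocol.

```python
import json

class ModelAdapter:
    """Integration point: a model implements predict() exactly once."""
    def predict(self, observation):
        raise NotImplementedError

class DummyPolicy(ModelAdapter):
    def predict(self, observation):
        # A trivial policy: a fixed 7-DoF action for any observation.
        return {"action": [0.0] * 7}

def serve_one_request(adapter, raw_request):
    """Harness side: decode a request, run the model, encode a reply.

    JSON framing is used here as a stand-in for msgpack.
    """
    observation = json.loads(raw_request.decode())
    return json.dumps(adapter.predict(observation)).encode()
```

Because benchmarks only see serialized requests and replies, model and benchmark dependencies never share an environment, which is what the Docker isolation buys.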

📄 ArrayTac: A tactile display for simultaneous rendering of shape, stiffness and friction
🗓️ Published: 3/14/2026
🔗 http://arxiv.org/abs/2603.13829v1
👥 Authors: Tianhai Liang, Shiyi Guo, Baiye Cheng, Zhengrong Xue, Han Zhang (possible past Tsinghua University affiliation), Huazhe Xu (possible past University Of California, Berkeley affiliation)
Abstract

Human-computer interaction in the visual and auditory domains has achieved considerable maturity, yet machine-to-human tactile feedback remains underdeveloped. Existing tactile displays struggle to simultaneously render multiple tactile dimensions, such as shape, stiffness, and friction, which limits the realism of haptic simulation. Here, we present ArrayTac, a piezoelectric-driven tactile display capable of simultaneously rendering shape, stiffness, and friction to reproduce realistic haptic s...

📄 AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison
🗓️ Published: 3/14/2026
🔗 http://arxiv.org/abs/2603.13779v1
👥 Authors: Xi Jiang, Yue Guo, Jian Li (possible past Tencent (China) affiliation), Yong Liu, Bin-Bin Gao, Hanqiu Deng, Jun Liu (possible past Tencent (China) affiliation), Heng Zhao, Chengjie Wang (possible past Tencent (China) affiliation), Feng Zheng
Abstract

Multimodal Large Language Models (MLLMs) have achieved impressive success in natural visual understanding, yet they consistently underperform in industrial anomaly detection (IAD). This is because MLLMs trained mostly on general web data differ significantly from industrial images. Moreover, they encode each image independently and can only compare images in the language space, making them insensitive to subtle visual differences that are key to IAD. To tackle these issues, we present AD-Copilot...

📄 Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
🗓️ Published: 3/16/2026
🔗 http://arxiv.org/abs/2603.14806v1
👥 Authors: Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen (possible past University Of California, Berkeley affiliation), Tianjing Zhang, Philipp Junk, Michelle Dimon (possible past Google (United States) affiliation), Paweł Gniewek, Fabian Ortega, Mckinley Polen, Ivan Grubisic, Ali Bashir (possible past Google (United States) affiliation), Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang (possible past Tencent (China) affiliation), Jian Peng, Anthony Costa (possible past Nvidia (United States) affiliation), Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli
Abstract

Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requirements of models like AlphaFold 3, imposing a practical ceiling of a few thousand residues that can be processed on a single GPU. Here we present NVIDIA BioNeMo Fold-CP, a context parallelism framework that overcomes this barrier by distributing the inference and training pipelines of co-folding m...
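
The truncated abstract describes distributing co-folding inference across GPUs. As a generic illustration of context parallelism (not Fold-CP's actual scheme), attention over a long residue sequence can be computed shard-by-shard over the queries, so each "device" materializes only its slice of the attention matrix; the concatenated outputs equal the unsharded result exactly.

```python
import numpy as np

def sharded_attention(q, k, v, num_shards):
    """Context-parallel attention sketch.

    Queries are split into contiguous shards; each shard attends to
    the full key/value set, so peak memory per shard scales with
    (seq_len / num_shards) * seq_len instead of seq_len**2.
    """
    outputs = []
    for q_shard in np.array_split(q, num_shards):
        scores = q_shard @ k.T / np.sqrt(k.shape[1])
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)
        outputs.append(weights @ v)
    return np.concatenate(outputs)
```

A production system would also shard keys/values and exchange them between devices (e.g. ring-style), but the query split alone already shows where the memory ceiling moves.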

📄 Fronto-parietal and fronto-temporal EEG coherence as predictive neuromarkers of transcutaneous auricular vagus nerve stimulation response in treatment-resistant schizophrenia: A machine learning study
🗓️ Published: 3/14/2026
🔗 http://arxiv.org/abs/2603.13850v1
👥 Authors: Yapeng Cui, Ruoxi Yun, Shumin Zhang, Yi Gong, Zhiqin Li, Ying Chen (possible past Baidu (China) affiliation), Mingbing Su, Dongniya Wu, Jingxia Wu, Qian Wang, Jianan Wang (possible past Deepmind (United Kingdom) affiliation), Qianqian Tian, Yangyang Yuan, Shuhao Mei, Lei Wu, Xinghua Li, Bingkui Zhang, Taipin Guo, Jinbo Sun
Abstract

Response variability limits the clinical utility of transcutaneous auricular vagus nerve stimulation (taVNS) for negative symptoms in treatment-resistant schizophrenia (TRS). This study aimed to develop an electroencephalography (EEG)-based machine learning (ML) model to predict individual response and explore associated neurophysiological mechanisms. We used ML to develop and validate predictive models based on pre-treatment EEG data features (power, coherence, and dynamic functional connectivi...
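
The feature the title highlights, inter-channel EEG coherence, is conventionally computed with a Welch-averaged estimator. The numpy-only sketch below is generic and does not reflect the study's actual preprocessing, montage, or parameters.

```python
import numpy as np

def coherence(x, y, fs, nperseg=256):
    """Welch-averaged magnitude-squared coherence between two channels.

    Returns (freqs, coh) with coh(f) = |Pxy|^2 / (Pxx * Pyy) in [0, 1];
    values near 1 indicate a stable phase/amplitude relationship at f.
    """
    step = nperseg // 2  # 50% segment overlap
    window = np.hanning(nperseg)
    pxx = pyy = pxy = 0.0
    for start in range(0, len(x) - nperseg + 1, step):
        X = np.fft.rfft(window * x[start:start + nperseg])
        Y = np.fft.rfft(window * y[start:start + nperseg])
        pxx = pxx + (X * X.conj()).real
        pyy = pyy + (Y * Y.conj()).real
        pxy = pxy + X * Y.conj()
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(pxy) ** 2 / (pxx * pyy)
    return freqs, coh
```

Per-band averages of such coherence values (e.g. over fronto-parietal channel pairs) are the kind of scalar features a classifier can then be trained on.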

*Notable papers are those with at least two authors from a "big" AI/ML lab.