Accepted Main Conference Papers

  • Towards Automated Error Discovery: A Study in Conversational AI
    Dominic Petrak, Thy Thy Tran, Iryna Gurevych
  • Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs
    Mohsinul Kabir, Ajwad Abrar, Sophia Ananiadou
  • Biased Tales: Cultural and Topic Bias in Generating Children’s Stories
    Donya Rooein, Vilém Zouhar, Debora Nozza, Dirk Hovy
  • Large Language Models as Realistic Microservice Trace Generators
    Donghyun Kim, Sriram Ravula, Taemin Ha, Alex Dimakis, Daehyeok Kim, Aditya Akella
  • JUDGEBERT: Assessing Legal Meaning Preservation Between Sentences
    David Beauchemin, Michelle Albert-Rochette, Richard Khoury, Pierre-Luc Déziel
  • QFrCoLA: a Quebec-French Corpus of Linguistic Acceptability Judgments
    David Beauchemin, Richard Khoury
  • Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?
    Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea
  • A Systematic Analysis of Base Model Choice for Reward Modeling
    Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara
  • Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance
    Branislav Pecher, Ivan Srba, Maria Bielikova
  • Is the Top Still Spinning? Evaluating Subjectivity in Narrative Understanding
    Melanie Subbiah, Akankshya Mishra, Grace Kim, Liyan Tang, Greg Durrett, Kathleen McKeown
  • MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
    Jakub Macina, Nico Daheim, Ido Hakimi, Manu Kapur, Iryna Gurevych, Mrinmaya Sachan
  • Preemptive Detection and Correction of Misaligned Actions in LLM Agents
    Haishuo Fang, Xiaodan Zhu, Iryna Gurevych
  • Fingerprinting LLMs through Survey Item Factor Correlation: A Case Study on Humor Style Questionnaire
    Simon Münker
  • Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval
    Tianlu Zheng, Yifan Zhang, Xiang An, Ziyong Feng, Kaicheng Yang, Qichuan Ding
  • From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
    David Dinucu-Jianu, Jakub Macina, Nico Daheim, Ido Hakimi, Iryna Gurevych, Mrinmaya Sachan
  • CompKBQA: Component-wise Task Decomposition for Knowledge Base Question Answering
    Yuhang Tian, Dandan Song, Zhijing Wu, Pan Yang, Changzhi Zhou, Jun Yang, Hao Wang, Huipeng Ma, Chenhao Li, Luan Zhang
  • Permutative Preference Alignment from Listwise Ranking of Human Judgments
    Yang Zhao, Yixin Wang, Mingzhang Yin
  • ToneCraft: Cantonese Lyrics Generation with Harmony of Tones and Pitches
    Junyu Cheng, Chang Pan, Shuangyin Li
  • SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
    Zechen Li, Shohreh Deldari, Linyao Chen, Hao Xue, Flora D. Salim
  • MixLoRA-DSI: Dynamically Expandable Mixture-of-LoRA Experts for Rehearsal-Free Generative Retrieval over Dynamic Corpora
    Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do
  • ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos
    Patrick Giedemann, Pius von Däniken, Jan Milan Deriu, Alvaro Rodrigo, Anselmo Peñas, Mark Cieliebak
  • DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
    Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu
  • Mixture of Length and Pruning Experts for Knowledge Graphs Reasoning
    Enjun Du, Siyi Liu, Yongqi Zhang
  • MPRF: Interpretable Stance Detection through Multi-Path Reasoning Framework
    ZhaoDan Zhang, Jin Zhang, Hui Xu, Jiafeng Guo, Xueqi Cheng
  • Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels
    Junjie Ye, Yuming Yang, Yang Nan, Shuo Li, Qi Zhang, Tao Gui, Xuanjing Huang, Peng Wang, Zhongchao Shi, Jianping Fan
  • JI²S: Joint Influence-Aware Instruction Data Selection for Efficient Fine-Tuning
    Jingyu Wei, Bo Liu, Tianjiao Wan, Baoyun Peng, Xingkong Ma, Mengmeng Guo
  • SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
    Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui
  • Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors
    Xiangchen Wang, Jinrui Zhang, Teng Wang, Haigang Zhang, Feng Zheng
  • RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals
    Xuanliang Zhang, Dingzirui Wang, Keyan Xu, Qingfu Zhu, Wanxiang Che
  • T-MAD: Target-driven Multimodal Alignment for Stance Detection
    ZhaoDan Zhang, Jin Zhang, Xueqi Cheng, Hui Xu
  • Emotion Transfer with Enhanced Prototype for Unseen Emotion Recognition in Conversation
    Kun Peng, Cong Cao, Hao Peng, Guanlin Wu, Zhifeng Hao, Lei Jiang, Yanbing Liu, Philip S. Yu
  • PBI-Attack: Prior-Guided Bimodal Interactive Black-Box Jailbreak Attack for Toxicity Maximization
    Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Ranjie Duan, Xiaoshuang Jia, Shaowei Yuan, Simeng Qin, Zhiqiang wang, Xiaojun Jia
  • Training a Utility-based Retriever Through Shared Context Attribution for Retrieval-Augmented Language Models
    Yilong Xu, Jinhua Gao, Xiaoming Yu, Yuanhai Xue, Baolong Bi, Huawei Shen, Xueqi Cheng
  • SportReason: Evaluating Retrieval-Augmented Reasoning across Tables and Text for Sports Question Answering
    Kaiyue Feng, Siyue Zhang, Bingsen Chen, Yilun Zhao, Chen Zhao
  • MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness
    Junsheng Huang, Zhitao He, Yuchen Huang, Sandeep Polisetty, Qingyun Wang, Yi R. Fung
  • CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
    Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, Yulan He
  • PAFT: Prompt-Agnostic Fine-Tuning
    Chenxing Wei, Yao Shu, Mingwen Ou, Ying Tiffany He, Fei Yu
  • Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning
    Deng Linger, Linghao Zhu, Yuliang Liu, Yu Wang, Qunyi Xie, Jingjing Wu, Gang Zhang, Yingying Zhu, Xiang Bai
  • TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
    Yanshu Li, Jianjiang Yang, Tian Yun, Pinyuan Feng, Jinfa Huang, Ruixiang Tang
  • Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey
    Tianxin Xie, Yan Rong, Pengfei ZHANG, Wenwu Wang, Li Liu
  • Automating Steering for Safe Multimodal Large Language Models
    Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
  • EMNLP: Educator-role Moral and Normative Large Language Models Profiling
    Yilin Jiang, Mingzi Zhang, Sheng Jin, Zengyi Yu, Xiangjie Kong, Binghao Tu
  • TracSum: A New Benchmark for Aspect-Based Summarization with Sentence-Level Traceability in Medical Domain
    Bohao Chu, Meijie Li, Sameh Frihat, Chengyu Gu, Georg Lodde, Elisabeth Livingstone, Norbert Fuhr
  • Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning
    Wenbin Hu, Haoran Li, Huihao JING, Qi Hu, Ziqian Zeng, Sirui Han, Xu Heli, Tianshu Chu, Peizhao Hu, Yangqiu Song
  • Towards General-Domain Word Sense Disambiguation: Distilling Large Language Model into Compact Disambiguator
    Liqiang Ming, Sheng-hua Zhong, Yuncong Li
  • SLoW: Select Low-frequency Words! Automatic Dictionary Selection for Translation on Large Language Models
    Hongyuan Lu, Zixuan Li, Zefan Zhang, Wai Lam
  • Parallel Continuous Chain-of-Thought with Jacobi Iteration
    Haoyi Wu, Zhihao Teng, Kewei Tu
  • EQA-RM: A Generative Embodied Reward Model with Test-time Scaling
    Yuhang Chen, Zhen Tan, Tianlong Chen
  • Refusal-Aware Red Teaming: Exposing Inconsistency in Safety Evaluations
    Yongkang Chen, Xiaohu Du, Xiaotian Zou, Chongyang Zhao, Huan Deng, Hu LI, Xiaohui Kuang
  • OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
    Zekun Xi, Wenbiao Yin, Jizhan Fang, Jialong Wu, Runnan Fang, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
  • LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL
    Yihan Wang, Peiyu Liu, Xin Yang
  • On Relation-Specific Neurons in Large Language Models
    Yihong Liu, Runsheng Chen, Lea Hirlimann, Ahmad Dawar Hakimi, Mingyang Wang, Amir Hossein Kargaran, Sascha Rothe, François Yvon, Hinrich Schuetze
  • IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
    Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji
  • ProtoVQA: An Adaptable Prototypical Framework for Explainable Fine-Grained Visual Question Answering
    Xingjian Diao, Weiyi Wu, Keyi Kong, Peijun Qing, Xinwen Xu, Ming Cheng, Soroush Vosoughi, Jiang Gui
  • SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
    Yuanyang Yin, Yaqi Zhao, Yajie Zhang, Yuanxing Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Wentao Zhang, Feng Zhao
  • Molecular String Representation Preferences in Pretrained LLMs: A Comparative Study in Zero- & Few-Shot Molecular Property Prediction
    George Arthur Baker, Mario Sanz-Guerrero, Katharina von der Wense
  • Weight-Aware Activation Sparsity with Constrained Bayesian Optimization Scheduling for Large Language Models
    Ming Wang, Miao Zhang, Xuebo Liu, Liqiang Nie
  • DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
    Ziming You, Yumiao Zhang, Dexuan Xu, Yiwei Lou, Yandong Yan, Wei Wang, Huamin Zhang, Yu Huang
  • VC4VG: Optimizing Video Captions for Text-to-Video Generation
    Yang Du, Zhuoran Lin, Kaiqiang Song, Biao Wang, Zhicheng Zheng, Tiezheng Ge, Bo Zheng, Qin Jin
  • LaMP-QA: A Benchmark for Personalized Long-form Question Answering
    Alireza Salemi, Hamed Zamani
  • The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations
    Yubo Zhu, Dongrui Liu, Zecheng Lin, Wei Tong, Sheng Zhong, Jing Shao
  • MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol
    Huihao JING, Haoran Li, Wenbin Hu, Qi Hu, Xu Heli, Tianshu Chu, Peizhao Hu, Yangqiu Song
  • SAKI-RAG: Mitigating Context Fragmentation in Long-Document RAG via Sentence-level Attention Knowledge Integration
    Wenyu Tao, Xiaofen Xing, Zeliang Li, Xiangmin Xu
  • Skeletons Matter: Dynamic Data Augmentation for Text-to-Query
    Yuchen Ji, Bo Xu, Jie Shi, Jiaqing Liang, Deqing Yang, Yu Mao, Hai Chen, Yanghua Xiao
  • CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching
    Cheng Shen, Yew-Soon Ong, Joey Tianyi Zhou
  • MovieCORE: COgnitive REasoning in Movies
    Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Ying Cheng, Hung-Ting Su, Shang-Hong Lai, Winston H. Hsu
  • Think Wider, Detect Sharper: Reinforced Reference Coverage for Document-Level Self-Contradiction Detection
    Yuhao Chen, Yuanjie Lyu, Shuochen Liu, Chao Zhang, Junhui Lv, Tong Xu
  • DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models’ Understanding on Indian Culture
    Arijit Maji, Raghvendra Kumar, Akash Ghosh, Anushka, Nemil Shah, Abhilekh Borah, Vanshika Shah, Nishant Mishra, Sriparna Saha
  • Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation
    Haijian Ma, Daizong Liu, Xiaowen Cai, Yulai Xie, Pan Zhou
  • Personality Matters: User Traits Predict LLM Preferences in Multi-Turn Collaborative Tasks
    Sarfaroz Yunusov, Kaige Chen, Kazi Nishat Anwar, Ali Emami
  • VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
    Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen
  • Thinking Out Loud: Do Reasoning Models Know When They’re Right?
    Qingcheng Zeng, Weihao Xuan, Leyang Cui, Rob Voigt
  • Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models
    Weihao Xuan, Qingcheng Zeng, Heli Qi, Junjue Wang, Naoto Yokoya
  • Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
    Mengqi Liao, Xiangyu Xi, Chen Ruinian, Jia Leng, Yangen Hu, Ke Zeng, Shuai Liu, Huaiyu Wan
  • LLM Bias Detection and Mitigation through the Lens of Desired Distributions
    Ingroj Shrestha, Padmini Srinivasan
  • MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering
    Teng LIN
  • POSITION BIAS MITIGATES POSITION BIAS: Mitigate Position Bias Through Inter-Position Knowledge Distillation
    Yifei Wang, Feng Xiong, Yong Wang, Linjing Li, Xiangxiang Chu, Daniel Dajun Zeng
  • MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation
    Weihao Xuan, Rui Yang, Heli Qi, Qingcheng Zeng, Yunze Xiao, Aosong Feng, Dairui Liu, Yun Xing, Junjue Wang, Fan Gao, Jinghui Lu, Yuang Jiang, Huitao Li, Xin Li, Kunyu Yu, Ruihai Dong, Shangding Gu, Yuekang Li, Xiaofei Xie, Felix Juefei-Xu, Foutse Khomh, Osamu Yoshie, Qingyu Chen, Douglas Teodoro, Nan Liu, Randy Goebel, Lei Ma, Edison Marrese-Taylor, Shijian Lu, Yusuke Iwasawa, Yutaka Matsuo, Irene Li
  • NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging
    Weiming Zhang, Qingyao Li, Xinyi Dai, Jizheng Chen, Kounianhua Du, Weinan Zhang, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu
  • Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD
    Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
  • POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
    Yuan Liu, Zhongyin Zhao, Le Tian, Haicheng Wang, Xubing Ye, Yangxiu You, Zilin Yu, Chuhan Wu, Zhou Xiao, Yang Yu, Jie Zhou
  • Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition
    Xuemei Tang, Xufeng Duan, Zhenguang Cai
  • CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
    Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
  • From Schema to State: Zero-Shot Scheme-Only Dialogue State Tracking via Diverse Synthetic Dialogue and Step-by-Step Distillation
    Huan Xu, Zequn Li, Wen Tang, Jian Jun Zhang
  • Beyond the Surface: Measuring Self-Preference in LLM Judgments
    Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin
  • Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
    Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu
  • Utility-Focused LLM Annotation for Retrieval and Retrieval-Augmented Generation
    Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng
  • CiteBART: Learning to Generate Citations for Local Citation Recommendation
    Ege Yiğit Çelik, Selma Tekir
  • Culture Cartography: Mapping the Landscape of Cultural Knowledge
    Caleb Ziems, William Barr Held, Jane Yu, Amir Goldberg, David Grusky, Diyi Yang
  • Interpretability Analysis of Arithmetic In-Context Learning in Large Language Models
    Gregory Polyakov, Christian Hepting, Carsten Eickhoff, Seyed Ali Bahrainian
  • SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence
    Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp
  • We Politely Insist: Your LLM Must Learn the Persian Art of Taarof
    Nikta Gohari Sadr, Sahar Heidariasl, Karine Megerdoomian, Laleh Seyyed-Kalantari, Ali Emami
  • Unstructured Evidence Attribution for Long Context Query Focused Summarization
    Dustin Wright, Zain Muhammad Mujahid, Lu Wang, Isabelle Augenstein, David Jurgens
  • RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language
    Subrata Biswas, Mohammad Nur Hossain Khan, Bashima Islam
  • Cache-of-Thought: Master-Apprentice Framework for Cost-Effective Vision Language Model Reasoning
    Mingyuan Wu, Jize Jiang, Haozhen Zheng, Meitang Li, Zhaoheng Li, Beitong Tian, Bo Chen, Yongjoo Park, Minjia Zhang, ChengXiang Zhai, Klara Nahrstedt
  • Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
    Xuyang Liu, Yiyu Wang, Junpeng Ma, Linfeng Zhang
  • Router-Tuning: A Simple and Effective Approach for Dynamic Depth
    Shwai He, Tao Ge, Guoheng Sun, Bowei Tian, Xiaoyang Wang, Dong Yu
  • Foot-In-The-Door: A Multi-turn Jailbreak for LLMs
    Zixuan Weng, Xiaolong Jin, Jinyuan Jia, Xiangyu Zhang
  • TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games
    Yuan Yuan, Muyu He, Muhammad Adil Shahid, Ziyang Li, Jiani Huang, Li Zhang
  • Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling
    Minghui Li, Hao Zhang, Yechao Zhang, Wei Wan, Shengshan Hu, pei Xiaobing, Jing Wang
  • Direct Judgement Preference Optimization
    PeiFeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty
  • WebInject: Prompt Injection Attack to Web Agents
    Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong
  • F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations
    Tian Lan, Jiang Li, Yemin Wang, Xu Liu, Xiangdong Su, Guanglai Gao
  • Value Profiles for Encoding Human Variation
    Taylor Sorensen, Pushkar Mishra, Roma Patel, Michael Henry Tessler, Michiel A. Bakker, Georgina Evans, Iason Gabriel, Noah Goodman, Verena Rieser
  • Language Models as Causal Effect Generators
    Lucius E.J. Bynum, Kyunghyun Cho
  • Constructions are Revealed in Word Distributions
    Joshua Rozner, Leonie Weissweiler, Kyle Mahowald, Cory Shain
  • CodeMixBench: Evaluating Code-Mixing Capabilities of LLMs Across 18 Languages
    Yilun Yang, Yekun Chai
  • RBPtool: A Deep Language Model Framework for Multi-Resolution RBP-RNA Binding Prediction and RNA Molecule Design
    Jiyue Jiang, Yitao Xu, Zikang Wang, Yihan Ye, Yanruisheng Shao, Yuheng Shan, Jiuming Wang, Xiaodan Fan, Jiao Yuan, Yu Li
  • Unveiling Internal Reasoning Modes in LLMs: A Deep Dive into Latent Reasoning vs. Factual Shortcuts with Attribute Rate Ratio
    Yiran Yang, Haifeng Sun, Jingyu Wang, Qi Qi, Zirui Zhuang, Huazheng Wang, Pengfei Ren, Jing Wang, Jianxin Liao
  • SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
    Zirui He, Mingyu Jin, Bo Shen, Ali Payani, Yongfeng Zhang, Mengnan Du
  • BabyLM’s First Constructions: Causal interventions provide a signal of learning
    Joshua Rozner, Leonie Weissweiler, Cory Shain
  • Effective Red-Teaming of Policy-Adherent Agents
    Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby Tavor
  • CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
    Zongxi Li, Yang Li, Haoran Xie, S. Joe Qin
  • SafeScientist: Enhancing AI Scientist Safety for Risk-Aware Scientific Discovery
    Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You
  • Improving Informally Romanized Language Identification
    Adrian Benton, Alexander Gutkin, Christo Kirov, Brian Roark
  • Integral Transformer: Denoising Attention, Not Too Much Not Too Little
    Ivan Kobyzev, Abbas Ghaddar, Dingtao Hu, Boxing Chen
  • CHENGYU-BENCH: Benchmarking Large Language Models for Chinese Idiom Understanding and Use
    Yicheng Fu, Zhemin Huang, Liuxin Yang, Yumeng Lu, Zhongdongming Dai
  • Improving Cross Lingual Transfer by Pretraining with Active Forgetting
    Divyanshu Aggarwal, Ashutosh Sathe, Sunayana Sitaram
  • Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization
    Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu
  • To Mask or to Mirror: Human-AI Alignment in Collective Reasoning
    Crystal Qian, Aaron T Parisi, Clémentine Bouleau, Vivian Tsai, Maël Lebreton, Lucas Dixon
  • SWAN: An Efficient and Scalable Approach for Long-Context Language Modeling
    Krishna C Puvvada, Faisal Ladhak, Santiago Akle Serano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg
  • LLMs Behind the Scenes: Enabling Narrative Scene Illustration
    Melissa Roemmele, John Joon Young Chung, Taewook Kim, Yuqian Sun, Alex Calderwood, Max Kreminski
  • REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
    Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal
  • Large Language Models Do Multi-Label Classification Differently
    Marcus Ma, Georgios Chochlakis, Niyantha Maruthu Pandiyan, Jesse Thomason, Shrikanth Narayanan
  • FilBench: Can LLMs Understand and Generate Filipino?
    Lester James Validad Miranda, Elyanah Aco, Conner G. Manuel, Jan Christian Blaise Cruz, Joseph Marvin Imperial
  • M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
    ChengYan Wu, Bolei Ma, Yihong Liu, Zheyu Zhang, Ningyuan Deng, Yanshu Li, Baolan Chen, Yi Zhang, Yun Xue, Barbara Plank
  • RuCCoD: Towards Automated ICD Coding in Russian
    Alexandr Nesterov, Andrey Sakhovskiy, Ivan Sviridov, Airat Valiev, Vladimir Makharev, Petr Anokhin, Galina Zubkova, Elena Tutubalina
  • Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
    Dayu Yang, Tianyang Liu, Daoan Zhang, Antoine Simoulin, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Xin Qian, Grey Yang, Jiebo Luo, Julian McAuley
  • Efficient Model Development through Fine-tuning Transfer
    Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
  • Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes
    Mingyang Wang, Lukas Lange, Heike Adel, Yunpu Ma, Jannik Strötgen, Hinrich Schuetze
  • User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
    Yuhan Liu, Michael JQ Zhang, Eunsol Choi
  • Read to Hear: A Zero-Shot Pronunciation Assessment Using Textual Descriptions and LLMs
    Yu-Wen Chen, Melody Ma, Julia Hirschberg
  • COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision-Language Models
    Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
  • SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models
    Tong Bao, Mir Tafseer Nayeem, Davood Rafiei, Chengzhi Zhang
  • VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing
    Zhisheng Zheng, Puyuan Peng, Anuj Diwan, Cong Phuoc Huynh, Xiaohang Sun, Zhu Liu, Vimal Bhat, David Harwath
  • From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
    Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, Zhen Tan, Amrita Bhattacharjee, Yuxuan Jiang, Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, huan liu
  • MultiMatch: Multihead Consistency Regularization Matching for Semi-Supervised Text Classification
    Iustin Sirbu, Robert-Adrian Popovici, Cornelia Caragea, Stefan Trausan-Matu, Traian Rebedea
  • TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games
    Prakamya Mishra, Jiang Liu, Jialian Wu, Xiaodong Yu, Zicheng Liu, Emad Barsoum
  • Learning from Diverse Reasoning Paths with Routing and Collaboration
    Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, Jundong Li
  • Ask Patients with Patience: Enabling LLMs for Human-Centric Medical Dialogue with Grounded Reasoning
    Jiayuan Zhu, Jiazhen Pan, Yuyuan Liu, Fenglin Liu, Junde Wu
  • MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models
    Shrey Pandit, Jiawei Xu, Junyuan Hong, Zhangyang Wang, Tianlong Chen, Kaidi Xu, Ying Ding
  • NUTMEG: Separating Signal From Noise in Annotator Disagreement
    Jonathan Ivey, Susan Gauch, David Jurgens
  • Alignment Quality Index (AQI): Beyond Refusals: AQI as an Intrinsic Alignment Diagnostic via Latent Geometry, Cluster Divergence, and Layer wise Pooled Representations
    Abhilekh Borah, Chhavi Sharma, Danush Khanna, Utkarsh Bhatt, Gurpreet Singh, Hasnat Md Abdullah, Raghav Kaushik Ravi, Vinija Jain, Jyoti Patel, Shubham Singh, Vasu Sharma, Arpita Vats, Rahul Raja, Aman Chadha, Amitava Das
  • MythTriage: Scalable Detection of Opioid Use Disorder Myths on a Video-Sharing Platform
    Hayoung Jung, Shravika Mittal, Ananya Aatreya, Navreet Kaur, Munmun De Choudhury, Tanu Mitra
  • Demystifying optimized prompts in language models
    Rimon Melamed, Lucas Hurley McCabe, H Howie Huang
  • Whisper-UT: A Unified Translation Framework for Speech and Text
    Cihan Xiao, Matthew Wiesner, Debashish Chakraborty, Reno Kriz, Keith Cunningham, Kenton Murray, Kevin Duh, Luis Tavarez-Arce, Paul McNamee, Sanjeev Khudanpur
  • Unleashing the Reasoning Potential of LLMs by Critique Fine-Tuning on One Problem
    Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen
  • Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
    Hongxiang Zhang, Hao Chen, Muhao Chen, Tianyi Zhang
  • BBScoreV2: Learning Time-Evolution and Latent Alignment from Stochastic Representation
    Tianhao Zhang, Zhecheng Sheng, Zhexiao Lin, Chen Jiang, Dongyeop Kang
  • SAND: Boosting LLM Agents with Self-Taught Action Deliberation
    Yu Xia, Yiran Jenny Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian McAuley
  • LLMs as World Models: Data-Driven and Human-Centered Pre-Event Simulation for Disaster Impact Assessment
    Lingyao Li, Dawei Li, Zhenhui Ou, Xiaoran Xu, Jingxiao Liu, Zihui Ma, Runlong Yu, Min Deng
  • Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
    Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He
  • Image Embedding Sampling Method for Diverse Captioning
    Sania Waheed, Na Min An
  • Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time
    Huihan Li, You Chen, Siyuan Wang, Yixin He, Ninareh Mehrabi, Rahul Gupta, Xiang Ren
  • FANS: Formal Answer Selection for LLM Natural Language Math Reasoning Using Lean4
    Jiarui Yao, Ruida WANG, Tong Zhang
  • Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
    Gagan Bhatia, Maxime Peyrard, Wei Zhao
  • Measuring Risk of Bias in Biomedical Reports: The RoBBR Benchmark
    Jianyou Wang, Weili Cao, Longtian Bao, Youze Zheng, Gil Pasternak, Kaicheng Wang, Xiaoyue Wang, Ramamohan Paturi, Leon Bergen
  • SHIFT: Selected Helpful Informative Frame for Video-guided Machine Translation
    Boyu Guan, Chuang Han, Yining Zhang, Yupu Liang, Zhiyang Zhang, Yang Zhao, Chengqing Zong
  • Surge: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
    Bohan Lyu, Siqiao Huang, Zichen Liang, Qian Sun, Jiaming Zhang
  • Few-Shot Learning Translation from New Languages
    Carlos Mullov, Alexander Waibel
  • Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
    Yunze Xiao, Lynnette Hui Xian Ng, Jiarui Liu, Mona T. Diab
  • TokenSkip: Controllable Chain-of-Thought Compression in LLMs
    Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, Wenjie Li
  • Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability
    Tu Anh Dinh, Jan Niehues
  • reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
    Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad
  • Why Do Some Inputs Break Low-Bit LLM Quantization?
    Ting-Yun Chang, Muru Zhang, Jesse Thomason, Robin Jia
  • LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
    Keisuke Kamahori, Jungo Kasai, Noriyuki Kojima, Baris Kasikci
  • AROMA: Autonomous Rank-one Matrix Adaptation
    Hao Nan SHENG, Zhi-Yong Wang, Hing Cheung So, Mingrui Yang
  • Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
    Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou
  • Anchoring-Guidance Fine-Tuning (AnGFT): Elevating Professional Response Quality in Role-Playing Conversational Agents
    Qibin Li, Zhen Xu, Shengyuan Bai, Nianmin Yao, Kaili Sun, Ying Li, Baoxun Wang, Bowen Wu
  • RiTTA: Modeling Event Relations in Text-to-Audio Generation
    Yuhang He, Yash Jain, Xubo Liu, Andrew Markham, Vibhav Vineet
  • Shallow Focus, Deep Fixes: Enhancing Shallow Layers Vision Attention Sinks to Alleviate Hallucination in LVLMs
    Xiaofeng Zhang, Yihao Quan, Chen Shen, Chaochen Gu, Xiaosong Yuan, Shaotian Yan, Jiawei Cao, Hao Cheng, Kaijie Wu, Jieping Ye
  • WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai
    Peerat Limkonchotiwat, Pume Tuchinda, Lalita Lowphansirikul, Surapon Nonesung, Panuthep Tasawong, Alham Fikri Aji, Can Udomcharoenchaikit, Sarana Nutanong
  • MemeReaCon: Probing Contextual Meme Understanding in Large Vision-Language Models
    Zhengyi Zhao, Shubo Zhang, Yuxi Zhang, Yanxi Zhao, Yifan Zhang, Zezhong WANG, Huimin WANG, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu
  • A Comprehensive Literary Chinese Reading Comprehension Dataset with an Evidence Curation Based Solution
    Dongning Rao, Rongchu Zhou, Peng Chen, Zhihua Jiang
  • Dialect-SQL: An Adaptive Framework for Bridging the Dialect Gap in Text-to-SQL
    Jie Shi, Xi Cao, Bo Xu, Jiaqing Liang, Yanghua Xiao, Jia Chen, Peng Wang, Wei Wang
  • FinMTEB: Finance Massive Text Embedding Benchmark
    Yixuan Tang, Yi Yang
  • Scaling Rich Style-Prompted Text-to-Speech Datasets
    Anuj Diwan, Zhisheng Zheng, David Harwath, Eunsol Choi
  • Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs
    Mahammed Kamruzzaman, Gene Louis Kim
  • Eliciting Implicit Acoustic Styles from Open-domain Instructions to Facilitate Fine-grained Controllable Generation of Speech
    Jianxing Yu, Gou Zihao, Chen Li, Zhisheng Wang, Peiji Yang, Wenqing Chen, Jian Yin
  • OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
    Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu
  • AdaptThink: Reasoning Models Can Learn When to Think
    Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li
  • T²: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
    Zhengyi Zhao, Shubo Zhang, Zezhong WANG, Huimin WANG, Yutian Zhao, Bin Liang, Yefeng Zheng, Binyang Li, Kam-Fai Wong, Xian Wu
  • Non-Existent Relationship: Fact-Aware Multi-Level Machine-Generated Text Detection
    Yang Wu, Ruijia Wang, Jie Wu
  • Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
    Ziwei Ji, Lei Yu, Yeskendir Koishekenov, Yejin Bang, Anthony Hartshorn, Alan Schelten, Cheng Zhang, Pascale Fung, Nicola Cancedda
  • JUREX-4E: Juridical Expert-Annotated Four-Element Knowledge Base for Legal Reasoning
    Huanghai Liu, Quzhe Huang, Qingjing Chen, Yiran HU, Jiayu Ma, Yun Liu, Weixing Shen, Yansong Feng
  • CIE: Controlling Language Model Text Generations Using Continuous Signals
    Vinay Samuel, Harshita Diddee, Yiming Zhang, Daphne Ippolito
  • Stand on The Shoulders of Giants: Building JailExpert from Previous Attack Experience
    Xi Wang, Songlei Jian, Shasha Li, Xiaopeng Li, Bin Ji, Ma Jun, Xiaodong Liu, Jing Wang, Jianfeng Zhang, Jie Yu, Feilong Bao, Wangbaosheng
  • Language-to-Space Programming for Training-Free 3D Visual Grounding
    Boyu Mi, Hanqing Wang, Tai Wang, Yilun Chen, Jiangmiao Pang
  • RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
    Wanlong Liu, Junying Chen, Ke Ji, Li Zhou, Wenyu Chen, Benyou Wang
  • AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time Adaptation
    Yilong Lai, Jialong Wu, Zhenglin Wang, Deyu Zhou
  • SmartBench: Is Your LLM Truly a Good Chinese Smartphone Assistant?
    Xudong Lu, Haohao Gao, Renshou Wu, Shuai Ren, Xiaoxin Chen, Hongsheng Li, Fangyuan Li
  • F2TEval: Human-Aligned Multi-Dimensional Evaluation for Figure-to-Text Task
    Tan Yue, Rui Mao, Zilong Song, Zonghai Hu, Dongyan Zhao
  • Icon²: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation
    Qiyuan Chen, Hongsen Huang, Qian Shao, Jiahe Chen, Jintai Chen, Hongxia Xu, Renjie Hua, Ren Chuan, Jian Wu
  • DSCD: Large Language Model Detoxification with Self-Constrained Decoding
    Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, Tingting He
  • From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models
    Jue Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
  • Quantifying Language Disparities in Multilingual Large Language Models
    Songbo Hu, Ivan Vulić, Anna Korhonen
  • KoBLEX: Open Legal Question Answering with Multi-hop Reasoning
    Jihyung Lee, DaeHee Kim, Seonjeong Hwang, Hyounghun Kim, Gary Lee
  • End-to-End Learnable Psychiatric Scale Guided Risky Post Screening for Depression Detection on Social Media
    Bichen Wang, Yuzhe Zi, Yixin Sun, Hao Yang, Yanyan Zhao, Bing Qin
  • ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA
    Zhao Xinjie, Fan Gao, Xingyu Song, Yingjian Chen, Rui Yang, Yanran Fu, Yuyang Wang, Yusuke Iwasawa, Yutaka Matsuo, Irene Li
  • Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
    Peter Jansen, Samiah Hassan, Ruoyao Wang
  • ModRWKV: Transformer Multimodality in Linear Time
    Jiale Kang, Ziyin Yue, Qingyu Yin, Rui Jiang, Weile Li, Zening Lu, Zhouran Ji
  • Multimedia Event Extraction with LLM Knowledge Editing
    Jiaao Yu, Yijing Lin, Zhipeng Gao, Xuesong Qiu, Lanlan Rui
  • Exploring the Impact of Personality Traits on LLM Toxicity and Bias
    Shuo Wang, Renhao Li, Xi Chen, Yulin Yuan, Min Yang, Derek F. Wong
  • Task-aware Contrastive Mixture of Experts for Quadruple Extraction in Conversations with Code-like Replies and Non-opinion Detection
    Chenyuan He, Yuxiang Jia, Fei Gao, Senbin Zhu, Hongde Liu, Hongying Zan, Min Peng
  • Mitigating Biases in Language Models via Bias Unlearning
    Dianqing Liu, Yi Liu, Guoqing Jin, Zhendong Mao
  • UNComp: Can Matrix Entropy Uncover Sparsity? — A Compressor Design from an Uncertainty-Aware Perspective
    Jing Xiong, Jianghan Shen, Fanghua Ye, Chaofan Tao, Zhongwei Wan, Jianqiao Lu, Xun Wu, Chuanyang Zheng, Zhijiang Guo, Min Yang, Lingpeng Kong, Ngai Wong
  • Superpose Task-specific Features for Model Merging
    Haiquan Qiu, You Wu, Dong Li, Jianmin Guo, Quanming Yao
  • FinRAGBench-V: A Benchmark for Multimodal RAG with Visual Citation in the Financial Domain
    Zhao Suifeng, Zhuoran Jin, Sujian Li, Jun Gao
  • BacktrackAgent: Enhancing GUI Agent with Error Detection and Backtracking Mechanism
    Qinzhuo Wu, Pengzhi Gao, Wei Liu, Jian Luan
  • Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
    Siyue Zhang, Yilun Zhao, Liyuan Geng, Arman Cohan, Anh Tuan Luu, Chen Zhao
  • BannerAgency: Advertising Banner Design with Multimodal LLM Agents
    Heng Wang, Yotaro Shimose, Shingo Takamatsu
  • DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
    Weijie Shi, Jipeng Zhang, Yaguang Wu, Jingzhi Fang, Shibo Zhang, Yao Zhao, Hao Chen, Ruiyuan Zhang, Yue Cui, Jia Zhu, Sirui Han, Jiajie Xu, Xiaofang Zhou
  • Training LLMs to be Better Text Embedders through Bidirectional Reconstruction
    Chang Su, Dengliang Shi, Siyuan Huang, Jintao Du, Changhua Meng, Yu Cheng, Weiqiang Wang, Zhouhan Lin
  • ReMedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
    Shaomu Tan, Christof Monz
  • SolEval: Benchmarking Large Language Models for Repository-level Solidity Smart Contract Generation
    Zhiyuan Peng, Xin Yin, Rui Qian, Peiqin Lin, YongKang Liu, Hao Zhang, Chenhao Ying, Yuan Luo
  • In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
    Nathan Roll, Calbert Graham, Yuka Tatsumi, Kim Tien Nguyen, Meghan Sumner, Dan Jurafsky
  • Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
    Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu
  • Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions
    Yijun Shen, Delong Chen, Fan Liu, Xingyu Wang, Chuanyi Zhang, Liang Yao, Yuhui Zheng
  • DecoupleSearch: Decouple Planning and Search via Hierarchical Reward Modeling
    Hao Sun, Zile Qiao, Bo Wang, Guoxin Chen, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang
  • RewardDS: Privacy-Preserving Fine-Tuning for Large Language Models via Reward Driven Data Synthesis
    Jianwei Wang, Chengming Shi, Junyao Yang, Haoran Li, Qianli Ma, Huiping Zhuang, Cen Chen, Ziqian Zeng
  • Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition
    Haorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, Bin Wu
  • LegalSearchLM: Rethinking Legal Case Retrieval as Legal Elements Generation
    Chaeeun Kim, Jinu Lee, Wonseok Hwang
  • ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering
    Jingxuan Wei, Nan Xu, Junnan Zhu, haoyanni, Gaowei Wu, Qi Chen, Bihui Yu, Lei Wang
  • COLA: Collaborative Multi-Agent Framework with Dynamic Task Scheduling for GUI Automation
    Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Zhao Lv
  • DASA-Trans-STM: Adaptive Efficient Transformer for Short Text Matching using Data Augmentation and Semantic Awareness
    Jiguo Liu, Chao Liu, Meimei Li, Nan Li, Shihao Gao, Dali Zhu
  • Pruning the Paradox: How CLIP’s Most Informative Heads Enhance Performance While Amplifying Bias
    Avinash Madasu, Vasudev Lal, Phillip Howard
  • CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation
    Ziyue Liu, Ruijie ZHANG, Zhengyang Wang, Zi Yang, Paul D. Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang
  • TS-CLIP: Time Series Understanding by CLIP
    Ziwen Chen, Xiaoyuan Zhang, Ming Zhu
  • MultiAgentESC: A LLM-based Multi-Agent Collaboration Framework for Emotional Support Conversation
    Yangyang Xu, Jinpeng Hu, Zhuoer Zhao, Zhangling Duan, Xiao Sun, Xun Yang
  • Continuously Steering LLMs Sensitivity to Contextual Knowledge with Proxy Models
    Yilin Wang, Heng Wang, Yuyang Bai, Minnan Luo
  • Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding
    Yun-Shiuan Chuang, Sameer Narendran, Nikunj Harlalka, Alexander Cheung, Sizhe Gao, Siddharth Suresh, Junjie Hu, Timothy T. Rogers
  • Recall with Reasoning: Chain-of-Thought Distillation for Mamba’s Long-Context Memory and Extrapolation
    Jun-Yu Ma, Tianqing Fang, Zhisong Zhang, Hongming Zhang, Haitao Mi, Dong Yu
  • Scalable Data Synthesis through Human-like Cognitive Imitation and Data Recombination
    Zhongyi Ye, Weitai Zhang, Xinyuan Zhou, Yongxin Zhu, Ninghui Rao, Enhong Chen
  • BeSimulator: A Large Language Model Powered Text-based Behavior Simulator
    Jianan Wang, Bin Li, Jingtao Qi, xueying wang, Fu Li, Lihanxun
  • Too Consistent to Detect: A Study of Self-Consistent Errors in LLMs
    Hexiang Tan, Fei Sun, Sha Liu, Du Su, Qi Cao, Xin Chen, Jingang Wang, Xunliang Cai, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng
  • pFedGPT: Hierarchically Optimizing LoRA Aggregation Weights for Personalized Federated GPT Models
    Zhanming Shen, Tianqi Xu, Hao Wang, Jian Li, Miao Pan
  • QSpec: Speculative Decoding with Complementary Quantization Schemes
    Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu
  • Co-Evolving LLMs and Embedding Models via Density-Guided Preference Optimization for Text Clustering
    Zetong Li, Qinliang Su, Minhua Huang, Yin Yang
  • P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs
    Yidan Zhang, Yu Wan, Boyi Deng, Baosong Yang, Hao-Ran Wei, Fei Huang, Bowen Yu, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou
  • Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization
    Yutao Zhu, Jiajie Jin, Hongjin Qian, Zheng Liu, Zhicheng Dou, Ji-Rong Wen
  • TrInk: Ink Generation with Transformer Network
    Zezhong Jin, Shubhang Desai, Xu Chen, Biyi Fang, Zhuoyi Huang, Zhe LI, Chong-Xin Gan, Xiao Tu, Man-Wai Mak, Yan Lu, Shujie LIU
  • CalligraphicOCR for Chinese Calligraphy Recognition
    Xiaoyi Bao, Zhongqing Wang, Jinghang Gu, Chu-Ren Huang
  • When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models
    Cheng Wang, Gelei Deng, XIANGLIN YANG, Han Qiu, Tianwei Zhang
  • RESF: Regularized-Entropy-Sensitive Fingerprinting for Black-Box Tamper Detection of Large Language Models
    Pingyi Hu, Xiaofan Bai, Xiaojing Ma, Chaoxiang He, Dongmei Zhang, Bin Benjamin Zhu
  • Model-based Large Language Model Customization as Service
    Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang
  • Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents
    Haochen Sun, Shuwen Zhang, Lujie Niu, Lei Ren, Hao Xu, Hao Fu, Fangkun Zhao, Caixia Yuan, Xiaojie Wang
  • Improving Reasoning Capabilities in Small Models through Mixture-of-layers Distillation with Stepwise Attention on Key Information
    Yao Chen, Jiawei Sheng, Wenyuan Zhang, Tingwen Liu
  • Through the Valley: Path to Effective Long CoT Training for Small Language Models
    Renjie Luo, Jiaxi Li, Chen Huang, Wei Lu
  • RED: Unleashing Token-Level Rewards from Holistic Feedback via Reward Redistribution
    Jiahui Li, Lin Li, Tai-Wei Chang, Kun Kuang, Long Chen, JUN ZHOU, Cheng Yang
  • SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
    Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming wang, Jiajun Chen, Shujian Huang
  • InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles
    Zizhen Li, Chuanhao Li, Yibin Wang, Qi Chen, Diping Song, Yukang Feng, Jianwen Sun, Jiaxin Ai, Fanrui Zhang, Mingzhu Sun, Kaipeng Zhang
  • MIO: A Foundation Model on Multimodal Tokens
    Zekun Moore Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo WANG, Ning Shi, Siyu Li, Yizhi LI, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang
  • DART: Distilling Autoregressive Reasoning to Silent Thought
    Nan Jiang, Ziming Wu, De-Chuan Zhan, Fuming Lai, Shaobing Lian
  • LeTS: Learning to Think-and-Search via Process-and-Outcome Reward Hybridization
    Qi Zhang, Shouqing Yang, Lirong Gao, Hao Chen, Xiaomeng Hu, Jinglei Chen, Jiexiang Wang, Sheng Guo, Bo Zheng, Haobo Wang, Junbo Zhao
  • CYCLE-INSTRUCT: Fully Seed-Free Instruction Tuning via Dual Self-Training and Cycle Consistency
    Zhanming Shen, Hao Chen, Yulei Tang, shaolin Zhu, Wentao Ye, Xiaomeng Hu, Haobo Wang, Gang Chen, Junbo Zhao
  • Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?
    Grace LeFevre, Qingcheng Zeng, Adam Leif, Jason Jewell, Denis Peskoff, Rob Voigt
  • From General Reward to Targeted Reward: Improving Open-ended Long-context Generation Models
    Zhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang, Irwin King
  • Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
    Xinyue Lou, You Li, Jinan Xu, Xiangyu Shi, Chi Chen, Kaiyu Huang
  • Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models
    Bajian Xiang, Shuaijiang Zhao, Tingwei Guo, Wei zou
  • AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing Ambiguity
    Yifan Liu, Wenkuan Zhao, Shanshan Zhong, Jinghui Qin, Mingfu Liang, Zhongzhan Huang, Wushao Wen
  • M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models
    Zexuan Li, Hongliang Dai, Piji Li
  • R-TOFU: Unlearning in Large Reasoning Models
    Sangyeon Yoon, Wonje Jeung, Albert No
  • Chat-Driven Text Generation and Interaction for Person Retrieval
    Zequn Xie, Chuxin Wang, Yeqiang Wang, Sihang Cai, Shulei Wang, Tao Jin
  • Spontaneous Giving and Calculated Greed in Language Models
    Yuxuan Li, Hirokazu Shirado
  • SenDetEX: Sentence-Level AI-Generated Text Detection for Human-AI Hybrid Content via Style and Context Fusion
    Lei Jiang, Desheng Wu, Xiaolong Zheng
  • Judge and Improve: Towards a Better Reasoning of Knowledge Graphs with Large Language Models
    Mo Zhiqiang, yanghua, Jiahui Li, Yuan Liu, Shawn Wong, Jianmin Huang
  • Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm
    Zhuo Li, Yuhao Du, Xiaoqi Jiao, Steven Y. Guo, yuege feng, Xiang Wan, Anningzhe Gao, Jinpeng Hu
  • QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models
    Jiajun Zhou, Yifan Yang, Kai Zhen, Ziyue Liu, Yequan Zhao, Ershad Banijamali, Athanasios Mouchtaris, Ngai Wong, Zheng Zhang
  • Cost-Optimal Grouped-Query Attention for Long-Context Modeling
    Yingfa Chen, Yutong Wu, Chenyang Song, Zhen Leng Thai, Xingyu Shen, Xu Han, Zhiyuan Liu, Maosong Sun
  • ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model
    Zhongyi Zhou, Yichen Zhu, Minjie Zhu, Junjie Wen, Ning Liu, Zhiyuan Xu, Weibin Meng, Ran Cheng, Yaxin Peng, Chaomin Shen, Feifei Feng
  • KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation
    Ziyi Guan, Jason Chun Lok Li, Zhijian Hou, Pingping Zhang, Donglai Xu, Yuzhi Zhao, Mengyang Wu, Jinpeng Chen, Thanh-Toan Nguyen, Pengfei Xian, Wenao MA, Shengchao Qin, Graziano Chesi, Ngai Wong
  • CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling
    Jihai Zhang, Xiaoye Qu, Tong Zhu, Yu Cheng
  • Search-o1: Agentic Search-Enhanced Large Reasoning Models
    Xiaoxi Li, Guanting Dong, Jiajie Jin, Yuyao Zhang, Yujia Zhou, Yutao Zhu, Zhicheng Dou
  • From Personas to Talks: Revisiting the Impact of Personas on LLM-Synthesized Emotional Support Conversations
    Shenghan Wu, Yimo Zhu, Wynne Hsu, Mong-Li Lee, Yang Deng
  • Select-Then-Decompose: From Empirical Analysis to Adaptive Selection Strategy for Task Decomposition in Large Language Models
    Shuodi Liu, Yingzhuo Liu, Zi Wang, yusheng wang, Huijia Wu, Liuyu Xiang, Zhaofeng He
  • TombRaider: Entering the Vault of History to Jailbreak Large Language Models
    Junchen Ding, Jiahao Zhang, Yi Liu, Ziqi Ding, Gelei Deng, Yuekang Li
  • Text Meets Topology: Rethinking Out-of-distribution Detection in Text-Rich Networks
    Danny Wang, Ruihong Qiu, Guangdong Bai, Zi Huang
  • APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
    Zhuo Li, yuege feng, Dandan Guo, Jinpeng Hu, Anningzhe Gao, Xiang Wan
  • HS-STaR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation
    Feng Xiong, Hongling Xu, Yifei Wang, Runxi Cheng, Yong Wang, Xiangxiang Chu
  • SEPS: A Separability Measure for Robust Unlearning in LLMs
    Wonje Jeung, Sangyeon Yoon, Albert No
  • TRUST-VL: An Explainable News Assistant for General Multimodal Misinformation Detection
    Zehong Yan, Peng Qi, Wynne Hsu, Mong-Li Lee
  • Tree-of-Quote Prompting Improves Factuality and Attribution in Multi-Hop and Medical Reasoning
    Justin Xu, Yiming Li, Zizheng Zhang, Augustine Yui Hei Luk, Mayank Jobanputra, Samarth Oza, David W Eyre
  • UnitCoder: Scalable Code Synthesis from Pre-training Corpora
    Yichuan Ma, Yunfan Shao, Peiji Li, Demin Song, Qipeng Guo, Linyang Li, Xipeng Qiu, Kai Chen
  • GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models
    Jixiao Zhang, Chunsheng Zuo
  • Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations
    Peichao Lai, Jiaxin Gan, Feiyang Ye, Wentao Zhang, Fangcheng Fu, Yilei Wang, Bin CUI
  • Rethinking Cross-Subject Data Splitting for Brain-to-Text Decoding
    Congchi Yin, Qian Yu, Zhiwei Fang, Changping Peng, Piji Li
  • RCScore: Quantifying Response Consistency in Large Language Models
    Dongjun Jang, Youngchae Ahn, Hyopil Shin
  • A Multi-Agent Framework with Automated Decision Rule Optimization for Cross-Domain Misinformation Detection
    Hui Li, Ante Wang, Kunquan Li, Zhihao Wang, Liang Zhang, Delai Qiu, Qingsong Liu, Jinsong Su
  • OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
    Shuting Wang, Jiejun Tan, Zhicheng Dou, Ji-Rong Wen
  • AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
    Xiaopeng Ke, Hexuan Deng, Xuebo Liu, Jun Rao, Zhenxi Song, Jun Yu, Min Zhang
  • MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
    Junxi Wu, Jinpeng Wang, Zheng Liu, Bin Chen, Dongjian Hu, Hao Wu, Shu-Tao Xia
  • Merger-as-a-Stealer: Stealing Targeted PII from Aligned LLMs with Model Merging
    Lin Lu, Zhigang Zuo, Ziji Sheng, Pan Zhou
  • Pragmatic Inference Chain (PIC) Improving LLMs’ Reasoning of Authentic Implicit Toxic Language
    Xi Chen, Shuo Wang
  • Beyond Demonstrations: Dynamic Vector Construction from Latent Representations
    Wang Cai, Hsiu-Yuan Huang, Zhixiang Wang, Yunfang Wu
  • Detoxifying Large Language Models via the Diversity of Toxic Samples
    Ying Zhao, Yuanzhao Guo, Xuemeng Weng, Yuan Tian, Wei Wang, Yi Chang
  • LLM-Driven Implicit Target Augmentation and Fine-Grained Contextual Modeling for Zero-Shot and Few-Shot Stance Detection
    Yanxu Ji, Jinzhong Ning, Yijia Zhang, Zhi Liu, Hongfei Lin
  • Dial-In LLM: Human-Aligned LLM-in-the-loop Intent Clustering for Customer Service Dialogues
    Mengze Hong, Wailing Ng, Chen Jason Zhang, Yuanfeng SONG, Di Jiang
  • Superficial Self-Improved Reasoners Benefit from Model Merging
    Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Leyan Pan, Soroush Vosoughi, Wenke Lee
  • CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
    Wenqiao Zhu, Ji Liu, Rongjunchen Zhang, Haipang WU, Yulun Zhang
  • QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
    Mengze Hong, Wailing Ng, Chen Jason Zhang, Di Jiang
  • VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
    Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji
  • Diagram-Driven Course Questions Generation
    Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Basura Fernando, Jun Liu
  • ECC: An Emotion-Cause Conversation Dataset for Empathy Response
    Yuanyuan He, Yongsen Pan, Wei Li, Jiali You, Jiawen Deng, Fuji Ren
  • ThoughtProbe: Classifier-Guided LLM Thought Space Exploration via Probing Representations
    Zijian Wang, Chang Xu
  • JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling
    Jinwang Song, Hongying Zan, Kunli Zhang, Lingling Mu, Yingjie Han, Haobo Hua, Min Peng
  • DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
    Zhibo Man, Yuanmeng Chen, Yujie Zhang, Jinan Xu
  • SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature
    David Wadden, Kejian Shi, Jacob Morrison, Alan Li, Aakanksha Naik, Shruti Singh, Nitzan Barzilay, Kyle Lo, Tom Hope, Luca Soldaini, Shannon Zejiang Shen, Doug Downey, Hannaneh Hajishirzi, Arman Cohan
  • MAKAR: a Multi-Agent framework based Knowledge-Augmented Reasoning for Grounded Multimodal Named Entity Recognition
    Xinkui Lin, yuhui zhang, Yongxiu Xu, Kun Huang, Hongzhang Mu, Yubin Wang, Gaopeng Gou, Li Qian, Li Peng, Wei Liu, Hongbo Xu
  • VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
    Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He
  • Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
    Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura
  • LMR-BENCH: Evaluating LLM Agent’s Ability on Reproducing Language Modeling Research
    Shuo Yan, Ziming Luo, Zimu Wang, Ruochen Li, Daoyang Li, Liqiang Jing, Kaiyu He, Peilin Wu, Juntong Ni, George Michalopoulos, Yue Zhang, Ziyang Zhang, Mian Zhang, Zhiyu Chen, Xinya Du
  • RAV: Retrieval-Augmented Voting for Tactile Descriptions Without Training
    Jinlin Wang, Yulong Ji, Hongyu Yang
  • Static Word Embeddings for Sentence Semantic Representation
    Takashi Wada, Yuki hirakawa, Ryotaro Shimizu, Takahiro Kawashima, Yuki Saito
  • PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
    William Wang, Jiawei Han
  • Rethinking Backdoor Detection Evaluation for Language Models
    Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia
  • Glider: Global and Local Instruction-Driven Expert Router
    Pingzhi Li, Prateek Yadav, Jaehong Yoon, Jie Peng, Yi-Lin Sung, Mohit Bansal, Tianlong Chen
  • CoVoGER: A Multilingual Multitask Benchmark for Speech-to-text Generative Error Correction with Large Language Models
    Zhengdong Yang, Zhen Wan, Sheng Li, Chao-Han Huck Yang, Chenhui Chu
  • Tiny Budgets, Big Gains: Parameter Placement Strategy in Parameter Super-Efficient Fine-Tuning
    Jinman Zhao, Xueyan Zhang, Jiaru Li, Jingcheng Niu, Yulan Hu, Erxue Min, Gerald Penn
  • Legal Fact Prediction: The Missing Piece in Legal Judgment Prediction
    Junkai Liu, Yujie Tong, Hui Huang, Bowen Zheng, Yiran HU, Peicheng Wu, Chuan Xiao, Makoto Onizuka, Muyun Yang, Shuyuan Zheng
  • DAMON: A Dialogue-Aware MCTS Framework for Jailbreaking Large Language Models
    Xu Zhang, Xunjian Yin, Dinghao Jing, Huixuan Zhang, Xinyu Hu, Xiaojun Wan
  • Multilingual Prompting for Improving LLM Generation Diversity
    Qihan Wang, Shidong Pan, Tal Linzen, Emily Black
  • MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
    Genglin Liu, Vivian T. Le, Salman Rahman, Elisa Kreiss, Marzyeh Ghassemi, Saadia Gabriel
  • Identification of Multiple Logical Interpretations in Counter-Arguments
    Wenzhi Wang, Paul Reisert, Shoichi Naito, Naoya Inoue, Machi Shimmei, Surawat Pothong, Jungmin Choi, Kentaro Inui
  • LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing
    Peng Wang, biyu zhou, Xuehai Tang, Jizhong Han, Songlin Hu
  • AlignX: Advancing Multilingual Large Language Models with Multilingual Representation Alignment
    Mengyu Bu, Shaolei Zhang, Zhongjun He, Hua Wu, Yang Feng
  • What Makes a Good Reasoning Chain? Uncovering Structural Patterns in Long Chain-of-Thought Reasoning
    Gangwei Jiang, Yahui Liu, Zhaoyi Li, V. W., Fuzheng Zhang, Linqi Song, Ying Wei, Defu Lian
  • HD-PiSSA: High-Rank Distributed Orthogonal Adaptation
    Yiding Wang, Fanxu Meng, Xuefeng Zhang, Fan Jiang, Pingzhi Tang, Muhan Zhang
  • Firewall Routing: Blocking Leads to Better Hybrid Inference for LLMs
    Runyu Peng, Yunhua Zhou, Kai Lv, Yang Gao, Qipeng Guo, Xipeng Qiu
  • SPE Attention: Making Attention Equivariant to Semantic-Preserving Permutation for Code Processing
    Chengyu Jiao, Shuhao Chen, Yu Zhang
  • Audio-centric Video Understanding Benchmark without Text Shortcut
    Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun MA, Chao Zhang
  • TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text
    Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang
  • ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
    Haozhan Shen, Kangjia Zhao, Tiancheng Zhao, Ruochen Xu, Zilun Zhang, Mingwei Zhu, Jianwei Yin
  • Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
    Enci Zhang, Xingang Yan, Wei Lin, Tianxiang Zhang, LU Qianchun
  • VersaTune: An Efficient Data Composition Framework for Training Multi-Capability LLMs
    Keer Lu, Keshi Zhao, Zhuoran Zhang, Zheng Liang, Bin CUI, Tengjiao Wang, Wentao Zhang
  • FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
    Hengxing Cai, Jinhan Dong, Jingjun Tan, Jingcheng Deng, Sihang Li, Zhifeng Gao, Haidong Wang, Zicheng Su, Agachai Sumalee, Renxin ZHONG
  • Multimodal Language Models See Better When They Look Shallower
    Haoran Chen, Junyan Lin, Xinghao Chen, Yue Fan, Jianfeng Dong, Xin Jin, Hui Su, Jinlan Fu, Xiaoyu Shen
  • LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
    Xujia Wang, Yunjia Qi, Bin Xu
  • Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
    Tianle Gu, Zongqi Wang, Kexin Huang, Yuanqi Yao, Xiangliang Zhang, Yujiu Yang, Xiuying Chen
  • Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases
    Bufan Gao, Elisa Kreiss
  • Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification
    Jikai Wang, Zhenxu Tian, Juntao Li, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai, Min Zhang
  • ViLBench: A Suite for Vision-Language Process Reward Modeling
    Haoqin Tu, Weitao Feng, Hardy Chen, Hui Liu, Xianfeng Tang, Cihang Xie
  • Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering
    Hwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee
  • Route Sparse Autoencoder to Interpret Large Language Models
    Wei Shi, Sihang Li, Tao Liang, Mingyang Wan, Guojun Ma, Xiang Wang, Xiangnan He
  • BTS: Harmonizing Specialized Experts into a Generalist LLM
    Qizhen Zhang, Prajjwal Bhargava, Chloe Bi, Chris X. Cai, Jakob Nicolaus Foerster, Jeremy Fu, Punit Singh Koura, Ruan Silva, Sheng Shen, Emily Dinan, Suchin Gururangan, Mike Lewis
  • CoCoA: Confidence- and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models
    Anant Khandelwal, Manish Gupta, Puneet Agrawal
  • R-Bind: Unified Enhancement of Attribute and Relation Binding in Text-to-Image Diffusion Models
    Huixuan Zhang, Xiaojun Wan
  • Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
    Zinan Tang, Xin Gao, Zhuoshi Pan, Qizhi Pei, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu
  • Information Integration in Large Language Models is Gated by Linguistic Structural Markers
    Wei Liu, Nai Ding
  • Why and How LLMs Benefit from Knowledge Introspection in Commonsense Reasoning
    Chengfeng Zhao, Shizhu He, Shanshan Jiang, Bin Dong, Jun Zhao, Kang Liu
  • GraDaSE: Graph-Based Dataset Search with Examples
    Jing He, Mingyang Lv, Qing Shi, Gong Cheng
  • Confidence-guided Refinement Reasoning for Zero-shot Question Answering
    Youwon Jang, Woo Suk Choi, Minjoon Jung, Minsu Lee, Byoung-Tak Zhang
  • DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
    Yiqi Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang
  • CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor
    Zhenhua Xu, Xixiang Zhao, Xubin Yue, shengwei tian, Changting Lin, Meng Han
  • Realistic Training Data Generation and Rule Enhanced Decoding in LLM for NameGuess
    Yikuan Xia, Jiazun Chen, Sujian Li, Jun Gao
  • EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint
    Zhenhua Xu, Meng Han, Wenpeng Xing
  • Selective Preference Optimization via Token-Level Reward Function Estimation
    Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou
  • Arena-lite: Efficient and Reliable Large Language Model Evaluation via Tournament-Based Direct Comparisons
    Seonil Son, Ju-Min Oh, Heegon Jin, Cheolhun Jang, JEONGBEOM JEONG, KunTae Kim
  • Addressing Tokenization Inconsistency in Steganography and Watermarking Based on Large Language Models
    Ruiyi Yan, Yugo Murawaki
  • ExeCoder: Empowering Large Language Models with Executability Representation for Code Translation
    Minghua He, Yue Chen, Fangkai Yang, Pu Zhao, Wenjie Yin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
  • TableEval: A Real-World Benchmark for Complex, Multilingual, and Multi-Structured Table Question Answering
    Junnan Zhu, Jingyi Wang, Bohan Yu, Xiaoyu Wu, Junbo Li, Lei Wang, Nan Xu
  • NOVA-63: Native Omni-lingual Versatile Assessments of 63 Disciplines
    Jinyang Zhang, Kexin Yang, Yu Wan, Muyang Ye, Baosong Yang, Fei Huang, Junyang Lin, Dayiheng Liu
  • InfoGain-RAG: Boosting Retrieval-Augmented Generation through Document Information Gain-based Reranking and Filtering
    Zihan Wang, Zihan Liang, Zhou Shao, Yufei Ma, Huangyu Dai, Ben Chen, MaoLingtao, Chenyi Lei, Yuqing DING, Han Li
  • SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
    Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li
  • What Do Indonesians Really Need from Language Technology? A Nationwide Survey
    Muhammad Dehan Al Kautsar, Lucky Susanto, Derry Tanti Wijaya, Fajri Koto
  • LEO-MINI: An Efficient Multimodal Large Language Model using Conditional Token Reduction and Mixture of Multi-Modal Experts
    Yimu Wang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki
  • Confounding Factors in Relating Model Performance to Morphology
    Wessel Poelman, Thomas Bauwens, Miryam de Lhoneux
  • Context-Aware Membership Inference Attacks against Pre-trained Large Language Models
    Hongyan Chang, Ali Shahin Shamsabadi, Kleomenis Katevas, Hamed Haddadi, Reza Shokri
  • Formalizing Style in Personal Narratives
    Gustave Cortal, Alain Finkel
  • TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
    Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi
  • PSET: a Phonetics-Semantics Evaluation Testbed
    Gianluca Sperduti, Dong Nguyen
  • From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
    Yingli Shen, Wen Lai, Shuo Wang, Kangyang Luo, Alexander Fraser, Maosong Sun
  • GATEAU: Selecting Influential Samples for Long Context Alignment
    Shuzheng Si, Haozhe Zhao, Gang Chen, Yunshui Li, Kangyang Luo, Chuancheng Lv, Kaikai An, Fanchao Qi, Baobao Chang, Maosong Sun
  • Teach Small Models to Reason by Curriculum Distillation
    Wangyi Jiang, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
  • Enhancing Reasoning Abilities of Small LLMs with Cognitive Alignment
    Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang
  • NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
    Wei Liu, Siya Qi, Xinyu Wang, Chen Qian, Yali Du, Yulan He
  • Genre Matters: How Text Types Interact with Decoding Strategies and Lexical Predictors in Shaping Reading Behavior
    Lena Sophia Bolliger, Lena Ann Jäger
  • RTE-GMoE: A Model-agnostic Approach for Relation Triplet Extraction via Graph-based Mixture-of-Expert Mutual Learning
    Aziguli Wulamu, Kaiyuan Gong, Lyu Zhengyu, Yu Han, Zhihong Zhu, Bowen Xing
  • Avoidance Decoding for Diverse Multi-Branch Story Generation
    Kyeongman Park, Nakyeong Yang, Kyomin Jung
  • Probabilistic Soundness Guarantees in LLM Reasoning Chains
    Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong
  • SQLWOZ: A Realistic Task-Oriented Dialogue Dataset with SQL-Based Dialogue State Representation for Complex User Requirements
    Heng-Da Xu, Xian-Ling Mao, Fanshu Sun, Tian-Yi Che, Cheng-Xin Xin, Heyan Huang
  • SURE: Safety Understanding and Reasoning Enhancement for Multimodal Large Language Models
    Yuxin Gou, Xiaoning Dong, Qin Li, Shishen Gu, Richang Hong, Wenbo Hu
  • EMO: Embedding Model Distillation via Intra-Model Relation and Optimal Transport Alignments
    Minh Phuc Truong, Hai An Vu, Tu Vu, Nguyen Thi Ngoc Diep, Linh Ngo Van, Thien Huu Nguyen, Trung Le
  • AesBiasBench: Evaluating Bias and Alignment in Multimodal Language Models for Personalized Image Aesthetic Assessment
    Kun Li, Lai Man Po, Hongzheng Yang, Xuyuan Xu, Kangcheng Liu, Yuzhi Zhao
  • DA-Pred: Performance Prediction for Text Summarization under Domain-Shift and Instruct-Tuning
    Anum Afzal, Florian Matthes, Alexander Fabbri
  • UnCo: Uncertainty-Driven Collaborative Framework of Large and Small Models for Grounded Multimodal NER
    Jielong Tang, Yang Yang, Jianxing Yu, Zhen-Xing Wang, Haoyuan Liang, Liang Yao, Jian Yin
  • An Empirical Study of LLM Reasoning Ability Under Strict Output Length Constraint
    Yi Sun, Han Wang, Jiaqiang Li, Jiacheng Liu, Xiangyu Li, Hao Wen, Huiwen Zheng, Yan Liang, Yuanchun Li, Yunxin Liu
  • Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching
    Songze Li, Zhiqiang Liu, Zhengke Gui, Huajun Chen, Wen Zhang
  • Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making
    Yuanjun Feng, Vivek Choudhary, Yash Raj Shrestha
  • Structuring Radiology Reports: Challenging LLMs with Lightweight Models
    Johannes Moll, Louisa Fay, Asfandyar Azhar, Sophie Ostmeier, Sergios Gatidis, Tim C. Lueth, Curtis Langlotz, Jean-Benoit Delbrouck
  • PricingLogic: Evaluating LLMs Reasoning on Complex Tourism Pricing Tasks
    Yunuo Liu, Dawei Zhu, Zena Al Khalili, Dai Cheng, Yanjun Chen, Dietrich Klakow, Wei Zhang, Xiaoyu Shen
  • EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference
    Yuebin Xu, Zeyi Wen
  • Investigating Value-Reasoning Reliability in Small Large Language Models
    杜霞, Shuhan Sun, Pengyuan Liu, Dong Yu
  • Can LLMs Explain Themselves Counterfactually?
    Zahra Dehghanighobadi, Asja Fischer, Muhammad Bilal Zafar
  • Self-Adjust Softmax
    Chuanyang Zheng, Yihang Gao, Guoxuan Chen, Han Shi, Jing Xiong, Xiaozhe Ren, Chao Huang, Zhenguo Li, Yu Li
  • XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML
    Ernesto Luis Estevanell Valladares, Suilan Estevez-Velarde, Yoan Gutierrez, Andrés Montoyo, Ruslan Mitkov
  • UNCERTAINTY-LINE: Length-Invariant Estimation of Uncertainty for Large Language Models
    Roman Vashurin, Maiya Goloburda, Preslav Nakov, Maxim Panov
  • WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
    Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li
  • Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
    Tobias Domhan, Dawei Zhu
  • PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements
    Raptopoulos Petros, Giorgos Filandrianos, Maria Lymperaiou, Giorgos Stamou
  • PoSum-Bench: Benchmarking Position Bias in LLM-based Conversational Summarization
    Xu Sun, Lionel Delphin-Poulat, Christèle Tarnec, Anastasia Shimorina
  • ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning
    Ziqing Qiao, Yongheng Deng, Jiali Zeng, Dong Wang, Lai Wei, Guanbo Wang, Fandong Meng, Jie Zhou, Ju Ren, Yaoxue Zhang
  • Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
    Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha
  • Cross-domain Rumor Detection via Test-Time Adaptation and Large Language Models
    Yuxia Gong, Shuguo Hu, Huaiwen Zhang
  • MLWQ: Efficient Small Language Model Deployment via Multi-Level Weight Quantization
    Chun Hu, Junhui He, Shangyu Wu, Yuxin He, Chun Jason Xue, Qingan Li
  • ToDi: Token-wise Distillation via Fine-Grained Divergence Control
    Seongryong Jung, Suwan Yoon, DongGeon Kim, Hwanhee Lee
  • RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation
    Qingyao Li, Wei Xia, Xinyi Dai, Kounianhua Du, Weiwen Liu, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang
  • Probing for Arithmetic Errors in Language Models
    Yucheng Sun, Alessandro Stolfo, Mrinmaya Sachan
  • NILE: Internal Consistency Alignment in Large Language Models
    Minda Hu, Qiyuan Zhang, Yufei Wang, Bowei He, Hongru Wang, Jingyan Zhou, Liangyou Li, Yasheng Wang, Chen Ma, Irwin King
  • Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event Forecasting
    Rong Ma, Lei Wang, Yating Yang, Bo Ma, Rui Dong, Fengyi Yang, Ahtamjan Ahmat, Kaiwen Lu, Xinyue Wang
  • RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation
    Andrei Catalin Coman, Ionut Teodor Sorodoc, Leonardo F. R. Ribeiro, Bill Byrne, James Henderson, Adrià de Gispert
  • Large Language Models Discriminate Against Speakers of German Dialects
    Minh Duc Bui, Carolin Holtermann, Valentin Hofmann, Anne Lauscher, Katharina von der Wense
  • Uncovering Argumentative Flow: A Question-Focus Discourse Structuring Framework
    Yini Wang, Xian Zhou, Shengan Zheng, Linpeng Huang, Zhunchen Luo, Wei Luo, Xiaoying Bai
  • AbsVis – Benchmarking How Humans and Vision-Language Models “See” Abstract Concepts in Images
    Tarun Tater, Diego Frassinelli, Sabine Schulte im Walde
  • A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages
    Tatiana Anikina, Jan Cegin, Jakub Simko, Simon Ostermann
  • Alignment with Fill-In-the-Middle for Enhancing Code Generation
    Houxing Ren, Zimu Lu, Weikang Shi, Haotian Hou, Yunqiao Yang, Ke Wang, Aojun Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li
  • A Middle Path for On-Premises LLM Deployment: Preserving Privacy Without Sacrificing Model Confidentiality
    Hanbo Huang, Yihan Li, Bowen Jiang, Bo Jiang, Lin Liu, Zhuotao Liu, Ruoyu Sun, Shiyu Liang
  • Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers
    Jonghyun Hong, Sungyoon Lee
  • X-FLoRA: Cross-modal Federated Learning with Modality-expert LoRA for Medical VQA
    Min Hyuk Kim, Changheon Kim, Seok Bong Yoo
  • Robust Native Language Identification through Agentic Decomposition
    Ahmet Yavuz Uluslu, Tannon Kew, Tilia Ellendorff, Gerold Schneider, Rico Sennrich
  • ConsistentChat: Building Skeleton-Guided Consistent Multi-Turn Dialogues for Large Language Models from Scratch
    Jiawei Chen, Xinyan Guan, Qianhao Yuan, Mo Guozhao, Weixiang Zhou, Yaojie Lu, Hongyu Lin, Ben He, Le Sun, Xianpei Han
  • Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study
    Yizheng Sun, Hao Li, Chang Xu, Hongpeng Zhou, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun
  • When Annotators Disagree, Topology Explains: Mapper, a Topological Tool for Exploring Text Embedding Geometry and Ambiguity
    Nisrine Rair, Alban Goupil, Valeriu Vrabie, Emmanuel Chochoy
  • Self-Critique and Refinement for Faithful Natural Language Explanations
    Yingming Wang, Pepa Atanasova
  • The Psychology of Falsehood: A Human-Centric Survey of Misinformation Detection
    Arghodeep Nandi, Megha Sundriyal, Euna Mehnaz Khan, Jikai Sun, Emily K. Vraga, Jaideep Srivastava, Tanmoy Chakraborty
  • SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval
    Xinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen
  • AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity
    Yu Zhang, Dong Guo, Fang Wu, Dian Ding, Yiming Zhang
  • Attacks by Content: Automated Fact-checking is an AI Security Issue
    Michael Sejr Schlichtkrull
  • MUZO: Leveraging Multiple Queries and Momentum for Zeroth-Order Fine-Tuning of Large Language Models
    Yuezhang Peng, Yuxin Liu, Fei Wen, Xie Chen
  • Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors
    Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang
  • Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
    Sergey Pletenev, Maria Marina, Nikolay Ivanov, Daria Galimzianova, Nikita Krayko, Mikhail Salnikov, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii
  • Steering Language Models in Multi-Token Generation: A Case Study on Tense and Aspect
    Alina Klerings, Jannik Brinkmann, Daniel Ruffinelli, Simone Paolo Ponzetto
  • DocReRank: Single-Page Hard Negative Query Generation for Training Multi-Modal RAG Rerankers
    Navve Wasserman, Oliver Heinimann, Yuval Golbari, Tal Zimbalist, Eli Schwartz, Michal Irani
  • Reason to Rote: Rethinking Memorization in Reasoning
    Yupei Du, Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko, Barbara Plank
  • VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
    Kazuki Matsuda, Yuiga Wada, Shinnosuke Hirano, Seitaro Otsuki, Komei Sugiura
  • LLM-Independent Adaptive RAG: Let the Question Speak for Itself
    Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii
  • TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route
    Hongyi Luo, Qing Cheng, Daniel Matos, Hari Krishna Gadi, Yanfeng Zhang, Lu Liu, Yongliang Wang, Niclas Zeller, Daniel Cremers, Liqiu Meng
  • Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees
    Yuqicheng Zhu, Jingcheng Wu, Yizhen Wang, Hongkuan Zhou, Jiaoyan Chen, Evgeny Kharlamov, Steffen Staab
  • Beyond Seen Data: Improving KBQA Generalization Through Schema-Guided Logical Form Generation
    Shengxiang Gao, Jey Han Lau, Jianzhong Qi
  • A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation
    Yan Li, Tianyi Zhang, Zechuan Li, Caren Han
  • Taming Text-to-Image Synthesis for Novices: User-centric Prompt Generation via Multi-turn Guidance
    Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Justin Li, Jian Gao, Zhang Li, Hao Yang, Boxing Chen, Osamu Yoshie
  • We Need to Measure Data Diversity in NLP — Better and Broader
    Dong Nguyen, Esther Ploeger
  • Sheaf Discovery with Joint Computation Graph Pruning and Flexible Granularity
    Jingcheng Niu, Lei Yu, Zining Zhu, Xi Chen, Gerald Penn
  • Hierarchical Bracketing Encodings Work for Dependency Graphs
    Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares
  • Multimodal Fine-grained Context Interaction Graph Modeling for Conversational Speech Synthesis
    Zhenqi Jia, Rui Liu, Berrak Sisman, Haizhou Li
  • Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
    Mehdi Ali, Manuel Brack, Max Lübbering, Elias Wendt, Abbas Goher Khan, Richard Rutmann, Alex Jude, Maurice Kraus, Alexander Arno Weber, Felix Stollenwerk, David Kaczér, Florian Mai, Lucie Flek, Rafet Sifa, Nicolas Flores-Herr, Joachim Koehler, Patrick Schramowski, Michael Fromm, Kristian Kersting
  • Conditional [MASK] Discrete Diffusion Language Model
    Hyukhun Koh, Minha Jhang, Dohyung Kim, Sangmook Lee, Kyomin Jung
  • Language-Guided Temporal Token Pruning for Efficient VideoLLM Processing
    Yogesh Kumar
  • A Fully Probabilistic Perspective on Large Language Model Unlearning: Evaluation and Optimization
    Anda Cheng, Wei Huang, Yinggui Wang
  • IIET: Efficient Numerical Transformer via Implicit Iterative Euler Method
    Xinyu Liu, Bei Li, Jiahao Liu, Junhao Ruan, Kechen Jiao, Hongyin Tang, Jingang Wang, Tong Xiao, Jingbo Zhu
  • WebEvolver: Enhancing Web Agent Self-Improvement with Co-evolving World Model
    Tianqing Fang, Hongming Zhang, Zhisong Zhang, Kaixin Ma, Wenhao Yu, Haitao Mi, Dong Yu
  • Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees
    Stephen Meisenbacher, Maulik Chevli, Florian Matthes
  • HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
    Yiheng Jing, Mingming Zhang, Yong Zhuang, Jiacheng Guo, Juan Wang, Xiaoyang Xu, Wenzhe Yi, Keyan Guo, Hongxin Hu
  • Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
    Yijiong Yu, Ji Pei, Wei Wang, Ran Chen
  • SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design
    Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu
  • LLM-OREF: An Open Relation Extraction Framework Based on Large Language Models
    Hongyao Tu, Liang Zhang, Yujie Lin, Xin Lin, Haibo Zhang, Long Zhang, Jinsong Su
  • Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
    Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu
  • Improving Multilingual Retrieval-Augmented Language Models through Dialectic Reasoning Argumentations
    Leonardo Ranaldi, Federico Ranaldi, Fabio Massimo Zanzotto, Barry Haddow, Alexandra Birch
  • Predicate-Guided Generation for Mathematical Reasoning
    Jiajun Chen, Yik-Cheung Tam
  • ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering
    Raphael Gruber, Abdelrahman Abdallah, Michael Färber, Adam Jatowt
  • ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
    Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao
  • IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
    Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto
  • Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments
    Harsh Vishwakarma, Ankush Agarwal, Ojas Patil, Chaitanya Devaguptapu, Mahesh Chandran
  • Steering LLM Reasoning Through Bias-Only Adaptation
    Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov
  • VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making
    Zuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan, Bin Liu
  • M-LongDoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
    Yew Ken Chia, Liying Cheng, Hou Pong Chan, Maojia Song, Chaoqun Liu, Mahani Aljunied, Soujanya Poria, Lidong Bing
  • Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models
    Pu Jian, Junhong Wu, Wei Sun, Chen Wang, Shuo Ren, Jiajun Zhang
  • FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs’ Responsiveness to Human Feedback
    Youquan Li, Miao Zheng, Fan Yang, Guosheng Dong, Bin Cui, Weipeng Chen, Zenan Zhou, Wentao Zhang
  • HYDRA: A Multi-Head Encoder-only Architecture for Hierarchical Text Classification
    Fabian Karl, Ansgar Scherp
  • CARD: Cross-modal Agent Framework for Generative and Editable Residential Design
    Pengyu Zeng, Jun Yin, Miao Zhang, Yuqin Dai, Jizhizi Li, Shuai Lu
  • DrDiff: Dynamic Routing Diffusion with Hierarchical Attention for Breaking the Efficiency-Quality Trade-off
    Jusheng Zhang, Yijia Fan, Kaitong Cai, Zimeng Huang, Xiaofei Sun, Jian Wang, Chengpei Tang, Keze Wang
  • FaST: Feature-aware Sampling and Tuning for Personalized Preference Alignment with Limited Data
    Thibaut Thonet, Germán Kruszewski, Jos Rozen, Pierre Erbacher, Marc Dymetman
  • On LLM-Based Scientific Inductive Reasoning Beyond Equations
    Brian S. Lin, Jiaxin Yuan, Zihan Zhou, Shouli Wang, Shuo Wang, Cunliang Kong, Qi Shi, Yuxuan Li, Liner Yang, Zhiyuan Liu, Maosong Sun
  • SPECS: Specificity-Enhanced CLIP-Score for Long Image Caption Evaluation
    Xiaofu Chen, Israfel Salazar, Yova Kementchedjhieva
  • LM-Searcher: Cross-domain Neural Architecture Search with LLMs via Unified Numerical Encoding
    Yuxuan Hu, Jihao Liu, Ke Wang, Jinliang Zheng, Weikang Shi, Manyuan Zhang, Qi Dou, Rui Liu, Aojun Zhou, Hongsheng Li
  • Does quantization affect models’ performance on long-context tasks?
    Anmol Mekala, Anirudh Atmakuru, Yixiao Song, Marzena Karpinska, Mohit Iyyer
  • Token-Aware Editing of Internal Activations for Large Language Model Alignment
    Tianbo Wang, Kewei Liao, Yuqing Ma, Chengzhao Yang, Zhange Zhang, Jiakai Wang, Xianglong Liu
  • Bitune: Leveraging Bidirectional Attention to Improve Decoder-Only LLMs
    Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M Asano
  • Disambiguation in Conversational Question Answering in the Era of LLMs and Agents: A Survey
    Mehrab Tanjim, Yeonjun In, Xiang Chen, Victor Bursztyn, Ryan A. Rossi, Sungchul Kim, Guang-Jie Ren, Vaishnavi Muppala, Shun Jiang, Yongsung Kim, Chanyoung Park
  • Plan Dynamically, Express Rhetorically: A Debate-Driven Rhetorical Framework for Argumentative Writing
    Xueguan Zhao, Wenpeng Lu, Chaoqun Zheng, Weiyu Zhang, Jiasheng Si, Deyu Zhou
  • TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
    Kechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li
  • Reimagining Safety Alignment with An Image
    Yifan Xia, Guorui Chen, Wenqian Yu, Zhijiang Li, Philip Torr, Jindong Gu
  • Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
    Miao Ziqi, Yi Ding, Lijun Li, Jing Shao
  • Can Large Language Models Win the International Mathematical Games?
    Alessio Cocchieri, Luca Ragazzi, Giuseppe Tagliavini, Lorenzo Tordi, Antonella Carbonaro, Gianluca Moro
  • CodeArena: Evaluating and Aligning CodeLLMs on Human Preference
    Jian Yang, Jiaxi Yang, Wei Zhang, Jin Ke, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Zhoujun Li, Binyuan Hui, Junyang Lin
  • Language models can learn implicit multi-hop reasoning, but only if they have lots of training data
    Yuekun Yao, Yupei Du, Dawei Zhu, Michael Hahn, Alexander Koller
  • UniversalCEFR: Enabling Open Multilingual Research on Language Proficiency Assessment
    Joseph Marvin Imperial, Abdullah Barayan, Regina Stodden, Rodrigo Wilkens, Ricardo Muñoz Sánchez, GAO Lingyun, Melissa Torgbi, Dawn Knight, Gail Forey, Reka R. Jablonkai, Ekaterina Kochmar, Robert Joshua Reynolds, Eugénio Ribeiro, Horacio Saggion, Elena Volodina, Sowmya Vajjala, Thomas François, Fernando Alva-Manchego, Harish Tayyar Madabushi
  • CROP: Contextual Region-Oriented Visual Token Pruning
    Jiawei Guo, Feifei Zhai, Pu Jian, Qianrun Wei, Yu Zhou
  • CR4-NarrEmote: An Open Vocabulary Dataset of Narrative Emotions Derived Using Citizen Science
    Andrew Piper, Robert Budac
  • XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression
    Haoqi Yang, Yao Yao, Zuchao Li, Baoyuan Qi, Liu Guoming, Hai Zhao
  • DINT Transformer
    Yueyang Cang, Yuhang Liu, Xiaoteng Zhang, Erlu Zhao, Li Shi
  • ICR: Iterative Clarification and Rewriting for Conversational Search
    Zhiyu Cao, Peifeng Li, Qiaoming Zhu
  • Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment
    Tong Zhang, Kuofeng Gao, Jiawang Bai, Leo Yu Zhang, Xin Yin, Zonghui Wang, Shouling Ji, Wenzhi Chen
  • Similarity = Value? Consultation Value-Assessment and Alignment for Personalized Search
    Weicong Qin, Yi Xu, Weijie Yu, Teng Shi, Chenglei Shen, Ming He, Jianping Fan, Xiao Zhang, Jun Xu
  • RTQA: Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models
    Zhaoyan Gong, Juan Li, Zhiqiang Liu, Lei Liang, Huajun Chen, Wen Zhang
  • Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
    Yao Wang, Di Liang, Minlong Peng
  • AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEO
    Xiaonan Wang, Bo Shao, Hansaem Kim
  • CAT: Causal Attention Tuning For Injecting Fine-grained Causal Knowledge into Large Language Models
    Kairong Han, Wenshuo Zhao, Ziyu Zhao, Ye Jun Jian, Lujia Pan, Kun Kuang
  • Enhancing LLM Text Detection with Retrieved Contexts and Logits Distribution Consistency
    Zhaoheng Huang, Yutao Zhu, Ji-Rong Wen, Zhicheng Dou
  • Stop Looking for “Important Tokens” in Multimodal Language Models: Duplication Matters More
    Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang
  • AgentPro: Enhancing LLM Agents with Automated Process Supervision
    Yuchen Deng, Shichen Fan, Naibo Wang, Xinkui Zhao, See-Kiong Ng
  • PORTS: Preference-Optimized Retrievers for Tool Selection with Large Language Models
    Lorenzo Molfetta, Giacomo Frisoni, Nicolò Monaldini, Gianluca Moro
  • MusKGC: A Flexible Multi-source Knowledge Enhancement Framework for Open-World Knowledge Graph Completion
    Xin Song, Liu Haiyan, Haiyang Wang, Ye Wang, Kai Chen, Bin Zhou
  • Towards Transferable Personality Representation Learning based on Triplet Comparisons and Its Applications
    Kai Tang, Rui Wang, Renyu Zhu, Minmin Lin, Xiao Ding, Tangjie Lv, Changjie Fan, Runze Wu, Haobo Wang
  • Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models
    Hao Yang, Lizhen Qu, Ehsan Shareghi, Gholamreza Haffari
  • Benchmarking Large Language Models Under Data Contamination: A Survey from Static to Dynamic Evaluation
    Simin Chen, Yiming Chen, Zexin Li, Yifan Jiang, Zhongwei Wan, Yixin He, Dezhi Ran, Tianle Gu, Haizhou Li, Tao Xie, Baishakhi Ray
  • FinTrust: A Comprehensive Benchmark of Trustworthiness Evaluation in Finance Domain
    Tiansheng Hu, Tongyan Hu, Liuyang Bai, Yilun Zhao, Arman Cohan, Chen Zhao
  • RecGPT: A Foundation Model for Sequential Recommendation
    Yangqin Jiang, Xubin Ren, Lianghao Xia, Da Luo, Kangyi Lin, Chao Huang
  • Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
    Chih-Kai Yang, Neo S. Ho, Hung-yi Lee
  • Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy
    Nikita Balagansky, Yaroslav Aksenov, Daniil Laptev, Vadim Kurochkin, Gleb Gerasimov, Nikita Koriagin, Daniil Gavrilov
  • Learn and Unlearn: Addressing Misinformation in Multilingual LLMs
    TaiMing Lu, Philipp Koehn
  • PRISM: Efficient Long-Range Reasoning With Short-Context LLMs
    Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel
  • Augmenting Multi-Agent Communication with State Delta Trajectory
    Yichen Tang, Weihang Su, Yujia Zhou, Yiqun Liu, Min Zhang, Shaoping Ma, Qingyao Ai
  • SAEs Are Good for Steering – If You Select the Right Features
    Dana Arad, Aaron Mueller, Yonatan Belinkov
  • CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples
    Kyohoon Jin, Juhwan Choi, JungMin Yun, Junho Lee, Soojin Jang, YoungBin Kim
  • Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers
    Milad Alshomary, Nikhil Reddy Varimalla, Vishal Anand, Smaranda Muresan, Kathleen McKeown
  • When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
    Yingming Zheng, Hanqi Li, Lu Chen, Kai Yu
  • A Case Against Implicit Standards: Homophone Normalization in Machine Translation for Languages that use the Ge’ez Script
    Hellina Hailu Nigatu, Atnafu Lambebo Tonja, Henok Biadglign Ademtew, Hizkiel Mitiku Alemayehu, Negasi Haile Abadi, Tadesse Destaw Belay, Seid Muhie Yimam
  • Evaluating Language Translation Models by Playing Telephone
    Syeda Jannatus Saba, Steven Skiena
  • Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
    Shuo Yang, Zheyu Zhang, Bardh Prenkaj, Gjergji Kasneci
  • SPaRC: A Spatial Pathfinding Reasoning Challenge
    Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp
  • Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
    Yao-Ching Yu, Tsun-Han Chiang, Cheng-Wei Tsai, Chien-Ming Huang, Wen-Kwang Tsao
  • Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework
    Yuhang Chen, Zhen Tan, Ajay Kumar Jaiswal, Huaizhi Qu, Xinyu Zhao, Qi Lin, Yu Cheng, Andrew Kwong, Zhichao Cao, Tianlong Chen
  • Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models
    Yeo Wei Jie, Ranjan Satapathy, Erik Cambria
  • Calibrating LLM Confidence by Probing Perturbed Representation Stability
    Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese Smiley, Kundan S Thind, Mohammad M. Ghassemi
  • SATER: A Self-Aware and Token-Efficient Approach to Routing and Cascading
    Yuanzhe Shen, Yide Liu, Zisu Huang, Ruicheng Yin, Xiaoqing Zheng, Xuanjing Huang
  • DSG-MCTS: A Dynamic Strategy-Guided Monte Carlo Tree Search for Diversified Reasoning in Large Language Models
    Rui Ha, Chaozhuo Li, Rui Pu, Litian Zhang, Xi Zhang, Sen Su
  • CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM
    Juntae Lee, Jihwan Bang, Seunghan Yang, Simyung Chang
  • On the Role of Model Prior in Real-World Inductive Reasoning
    Zhuo Liu, Ding Yu, Hangfeng He
  • Viability of Machine Translation for Healthcare in Low-Resourced Languages
    Hellina Hailu Nigatu, Nikita Mehandru, Negasi Haile Abadi, Blen Gebremeskel, Ahmed Alaa, Monojit Choudhury
  • Latent Inter-User Difference Modeling for LLM Personalization
    Yilun Qiu, Tianhao Shi, Xiaoyan Zhao, Fengbin Zhu, Yang Zhang, Fuli Feng
  • IG-Pruning: Input-Guided Block Pruning for Large Language Models
    Kangyu Qiao, Shaolei Zhang, Yang Feng
  • Are Checklists Really Useful for Automatic Evaluation of Generative Tasks?
    Momoka Furuhashi, Kouta Nakayama, Takashi Kodama, Saku Sugawara
  • Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks
    Kirill Semenov, Rico Sennrich
  • Knowledge Editing through Chain-of-Thought
    Changyue Wang, Weihang Su, Qingyao Ai, Yichen Tang, Yiqun Liu
  • SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation
    Qian Dong, Jia Chen, Qingyao Ai, Hongning Wang, Haitao Li, YIWU, Yao Hu, Yiqun Liu, Shaoping Ma
  • Probing Logical Reasoning of MLLMs in Scientific Diagrams
    Yufei Wang, Adriana Kovashka
  • AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
    Huishuai Zhang, Bohan Wang, Luoxin Chen
  • Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls
    Feiyang Kang, Newsha Ardalani, Michael Kuchnik, Youssef Emad, Mostafa Elhoushi, Shubhabrata Sengupta, Shang-Wen Li, Ramya Raghavendra, Ruoxi Jia, Carole-Jean Wu
  • Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering
    Yumeng Shi, Quanyu Long, Wenya Wang
  • DischargeSim: A Simulation Benchmark for Educational Doctor–Patient Communication at Discharge
    Zonghai Yao, Michael Sun, Won Seok Jang, Sunjae Kwon, Soie Kwon, Hong Yu
  • Can Vision-Language Models Solve Visual Math Equations?
    Monjoy Narayan Choudhury, Junling Wang, Yifan Hou, Mrinmaya Sachan
  • From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations
    Benlu Wang, Iris Xia, Yifan Zhang, Junda Wang, Feiyun Ouyang, Shuo Han, Hong Yu, Zonghai Yao
  • Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
    Yi Sui, Chaozhuo Li, Chen Zhang, Dawei Song, Qiuchi Li
  • Deep Associations, High Creativity: A Simple yet Effective Metric for Evaluating Large Language Models
    Ziliang Qiu, Renfen Hu
  • Identifying Unlearned Data in LLMs via Membership Inference Attacks
    Advit Deepak, Megan Mou, Jing Huang, Diyi Yang
  • Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models
    Zihao Li, Xu Wang, Yuzhe Yang, Ziyu Yao, Haoyi Xiong, Mengnan Du
  • LLMs cannot spot math errors, even when allowed to peek into the solution
    KV Aditya Srivatsa, Kaushal Kumar Maurya, Ekaterina Kochmar
  • Can LLMs be Good Graph Judge for Knowledge Graph Construction?
    Haoyu Huang, Chong Chen, Zeang Sheng, Yang Li, Wentao Zhang
  • NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning
    Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova
  • NileChat: Towards Linguistically Diverse and Culturally Aware LLMs for Local Communities
    Abdellah El Mekki, Houdaifa Atou, Omer Nacar, Shady Shehata, Muhammad Abdul-Mageed
  • A Computational Simulation of Language Production in First Language Acquisition
    Yuan Gao
  • Long-Form Information Alignment Evaluation Beyond Atomic Facts
    Danna Zheng, Mirella Lapata, Jeff Z. Pan
  • Voice of a Continent: Mapping Africa’s Speech Technology Frontier
    AbdelRahim A. Elmadany, Sang Yun Kwon, Hawau Olamide Toyin, Alcides Alcoba Inciarte, Hanan Aldarmaki, Muhammad Abdul-Mageed
  • Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
    Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
  • Circuit Complexity Bounds for RoPE-based Transformer Architecture
    Bo Chen, Xiaoyu Li, Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Jiahao Zhang
  • Efficient Unstructured Pruning of Mamba State-Space Models for Resource-Constrained Environments
    Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
  • Towards Infinite-Long Prefix in Transformer
    Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang
  • LATTE: Learning to Think with Vision Specialists
    Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese
  • SUA: Stealthy Multimodal Large Language Model Unlearning Attack
    Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang
  • ResFormer: All-Time Reservoir Memory for Long Sequence Classification
    Hongbo Liu, Jia Xu
  • Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models
    Zeping Yu, Yonatan Belinkov, Sophia Ananiadou
  • Interdisciplinary Research in Conversation: A Case Study in Computational Morphology for Language Documentation
    Enora Rice, Katharina von der Wense, Alexis Palmer
  • Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction
    Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, Jian Kang
  • AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
    Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang
  • Dual-Path Dynamic Fusion with Learnable Query for Multimodal Sentiment Analysis
    Miao Zhou, Lina Yang, Thomas Wu, Dongnan Yang, Xinru Zhang
  • CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
    Yunzhi Yao, Jizhan Fang, Jia-Chen Gu, Ningyu Zhang, Shumin Deng, Huajun Chen, Nanyun Peng
  • DEL-ToM: Inference-Time Scaling for Theory-of-Mind Reasoning via Dynamic Epistemic Logic
    Yuheng Wu, Jianwen Xie, Denghui Zhang, Zhaozhuo Xu
  • Collaborative Beam Search: Enhancing LLM Reasoning via Collective Consensus
    Yangyifan Xu, Shuo Ren, Jiajun Zhang
  • Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation
    Keane Ong, Rui Mao, Deeksha Varshney, Paul Pu Liang, Erik Cambria, Gianmarco Mengaldo
  • Towards Statistical Factuality Guarantee for Large Vision-Language Models
    Zhuohang Li, Chao Yan, Nicholas J Jackson, Wendi Cui, Bo Li, Jiaxin Zhang, Bradley A. Malin
  • Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?
    Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark Gales
  • Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner
    Bolian Li, Yanran Wu, Xinyu Luo, Ruqi Zhang
  • Stimulate the Critical Thinking of LLMs via Debiasing Discussion
    Ruiyu Xiao, Lei Wu, Yuanxing Liu, Weinan Zhang, Ting Liu
  • Toward Multi-Session Personalized Conversation: A Large-Scale Dataset and Hierarchical Tree Framework for Implicit Reasoning
    Xintong Li, Jalend Bantupalli, Ria Dharmani, Yuwei Zhang, Jingbo Shang
  • Improving Instruct Models for Free: A Study on Partial Adaptation
    Ozan Irsoy, Pengxiang Cheng, Jennifer L Chen, Daniel Preotiuc-Pietro, Shiyue Zhang, Duccio Pappadopulo
  • CoMMIT: Coordinated Multimodal Instruction Tuning
    Xintong Li, Junda Wu, Tong Yu, Rui Wang, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Julian McAuley, Jingbo Shang
  • Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
    Tianhao Wu, Weizhe Yuan, Olga Golovneva, Jing Xu, Yuandong Tian, Jiantao Jiao, Jason E Weston, Sainbayar Sukhbaatar
  • AnyMAC: Cascading Flexible Multi-Agent Collaboration via Next-Agent Prediction
    Song Wang, Zhen Tan, Zihan Chen, Shuang Zhou, Tianlong Chen, Jundong Li
  • A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
    Nishant Balepur, Matthew Shu, Yoo Yeon Sung, Seraphina Goldfarb-Tarrant, Shi Feng, Fumeng Yang, Rachel Rudinger, Jordan Lee Boyd-Graber
  • Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication
    Jocelyn J Shen, Akhila Yerukola, Xuhui Zhou, Cynthia Breazeal, Maarten Sap, Hae Won Park
  • Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation
    Song Wang, Zihan Chen, Peng Wang, Zhepei Wei, Zhen Tan, Yu Meng, Cong Shen, Jundong Li
  • Cognitive Linguistic Identity Fusion Score (CLIFS): A Scalable Cognition‑Informed Approach to Quantifying Identity Fusion from Text
    Devin R. Wright, Jisun An, Yong-Yeol Ahn
  • SilVar: Speech-Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
    Tan-Hanh Pham, Le Hoang Nam, Phu-Vinh Nguyen, Chris Ngo, Truong-Son Hy
  • CEMTM: Contextual Embedding-based Multimodal Topic Modeling
    Amirhossein Abaskohi, Raymond Li, Chuyuan Li, Shafiq Joty, Giuseppe Carenini
  • RedHerring Attack: Testing the Reliability of Attack Detection
    Jonathan Rusert
  • Modeling Bottom-up Information Quality during Language Processing
    Cui Ding, Yanning Yin, Lena Ann Jäger, Ethan Wilcox
  • Data Drives Unstable Hierarchical Generalization in LMs
    Tian Qin, Naomi Saphra, David Alvarez-Melis
  • EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
    Jiahao Qiu, Yinghui He, Xinzhe Juan, Yimin Wang, Yuhan Liu, Zixin Yao, Yue Wu, xun jiang, Ling Yang, Mengdi Wang
  • Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs
    Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, Susmit Jha
  • Facilitating Cognitive Accessibility with LLMs: A Multi-Task Approach to Easy-to-Read Text Generation
    François Ledoyen, Gaël Dias, Jeremie Pantin, Fabrice Maurel, Alexis Lechervy, Youssef Chahir
  • D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
    Yiyang Huang, Yizhou Wang, Yun Fu
  • ReEvalMed: Rethinking Medical Report Evaluation by Aligning Metrics with Real-World Clinical Judgment
    Ruochen Li, Jun Li, Bailiang Jian, Kun yuan, Youxiang Zhu
  • MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
    Khai Le-Duc, Tuyen Tran, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang Anh, Hung-Phong Tran, Thanh Thuy Nguyen, Ly Nguyen, Tuan Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Khanh Xuan Nguyen, Thanh Nguyen-Tang
  • Beyond Checkmate: Exploring the Creative Choke Points for AI Generated Texts
    Nafis Irtiza Tripto, Saranya Venkatraman, Mahjabin Nahar, Dongwon Lee
  • MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
    Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu
  • Learning Contextual Retrieval for Robust Conversational Search
    Seunghan Yang, Juntae Lee, Jihwan Bang, Kyuhong Shim, Minsoo Kim, Simyung Chang
  • LIDDIA: Language-based Intelligent Drug Discovery Agent
    Reza Averly, Frazier N. Baker, Xia Ning
  • Agentic-R1: Distilled Dual-Strategy Reasoning
    Weihua Du, Pranjal Aggarwal, Sean Welleck, Yiming Yang
  • Proactive Assistant Dialogue Generation from Streaming Egocentric Videos
    Yichi Zhang, Xin Luna Dong, Zhaojiang Lin, Andrea Madotto, Anuj Kumar, Babak Damavandi, Joyce Chai, Seungwhan Moon
  • Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation
    Dayeon Ki, Kevin Duh, Marine Carpuat
  • ChartGaze: Enhancing Chart Understanding in LVLMs with Eye-Tracking Guided Attention Refinement
    Ali Salamatian, Amirhossein Abaskohi, Wan-Cyuan Fan, Mir Rayat Imtiaz Hossain, Leonid Sigal, Giuseppe Carenini
  • LogiCoL: Logically-Informed Contrastive Learning for Set-based Dense Retrieval
    Yanzhen Shen, Sihao Chen, Xueqiang Xu, Yunyi Zhang, Chaitanya Malaviya, Dan Roth
  • ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt
    Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
  • Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
    Xiaoshu Chen, Sihang Zhou, KE LIANG, Xiaoyu Sun, Xinwang Liu
  • Can an Individual Manipulate the Collective Decisions of Multi-Agents?
    Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu
  • Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore’s Low-Resource Languages
    Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee
  • Improving Clustering with Positive Pairs Generated from LLM-Driven Labels
    Xiaotong Zhang, Ying Li
  • Gamma-Guard: Lightweight Residual Adapters for Robust Guardrails in Large Language Models
    Lijia Lv, Yuanshu Zhao, Guan Wang, Xuehai Tang, Wen Jie, Jizhong Han, Songlin Hu
  • Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning
    Jingyang Lin, Andy Wong, Tian Xia, Shenghua He, Hui Wei, Mei Han, Jiebo Luo
  • Dynamic Energy-Based Contrastive Learning with Multi-Stage Knowledge Verification for Event Causality Identification
    Ya Su, Hu zhang, Yue Fan, Guangjun Zhang, YuJie Wang, Ru Li, Hongye Tan
  • ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment
    Zhipeng Bian, Jieming Zhu, Qijiong Liu, Wang Lin, Guohao Cai, Zhaocheng Du, Jiacheng Sun, Zhou Zhao, Zhenhua Dong
  • From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement
    JianZhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, Buzhou Tang
  • A Symbolic Adversarial Learning Framework for Evolving Fake News Generation and Detection
    Chong Tian, Qirong Ho, Xiuying Chen
  • RareSyn: Health Record Synthesis for Rare Disease Diagnosis
    Huimin WANG, Yutian Zhao, Yefeng Zheng, Xian Wu
  • Sticker-TTS: Learn to Utilize Historical Experience with a Sticker-driven Test-Time Scaling Framework
    Jie Chen, Jinhao Jiang, Yingqian Min, Zican Dong, Shijie Wang, Xin Zhao, Ji-Rong Wen
  • CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
    Guixian Xu, Zeli Su, Ziyin Zhang, Jianing Liu, Xu Han, Ting Zhang, Yushuang Dong
  • Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems
    Xu Shen, Yixin Liu, Yiwei Dai, Yili Wang, Rui Miao, Yue Tan, Shirui Pan, Xin Wang
  • Boosting Data Utilization for Multilingual Dense Retrieval
    Chao Huang, Fengran Mo, Yufeng Chen, Changhao Guan, Zhenrui Yue, Xinyu Wang, Jinan Xu, Kaiyu Huang
  • Self-Augmented Preference Alignment for Sycophancy Reduction in LLMs
    Chien Hung Chen, Hen-Hsen Huang, Hsin-Hsi Chen
  • TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
    Hang Ni, Fan Liu, Xinyu Ma, Lixin Su, Shuaiqiang Wang, Dawei Yin, Hui Xiong, Hao Liu
  • Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language
    Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang, Yaning Jia, William Dinauer, Soroush Vosoughi
  • Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving
    Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong LIU, Sirui Han, Yike Guo
  • From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
    Tianduo Wang, Lu Xu, Wei Lu, Shanbo Cheng
  • CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space
    Yong Zhao, Kai Xu, Zhengqiu Zhu, Yue Hu, Zhiheng Zheng, Yingfeng Chen, Yatai Ji, Chen Gao, Yong Li, Jincai Huang
  • Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression
    Sreetama Sarkar, Yue Che, Alex Gavin, Peter Anthony Beerel, Souvik Kundu
  • Examining False Positives under Inference Scaling for Mathematical Reasoning
    Yu Wang, Nan Yang, Liang Wang, Furu Wei, Fuli Feng
  • Translationese-index: Using Likelihood Ratios for Graded and Generalizable Measurement of Translationese
    Yikang Liu, Wanyang Zhang, Yiming Wang, Jialong Tang, Pei Zhang, Baosong Yang, Fei Huang, Rui Wang, Hai Hu
  • Exploring the Limitations of Mamba in COPY and CoT Reasoning
    Ruifeng Ren, Zhicong Li, Yong Liu
  • ProcWorld: Benchmarking Large Model Planning in Reachability-Constrained Environments
    Dong Wang, Xinghang Li, Zhengshen Zhang, Jirong Liu, Xiao Ma, Hanbo Zhang, Tao Kong, Huaping Liu
  • R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
    Kaijie Chen, Zihao Lin, Zhiyang Xu, Ying Shen, Yuguang Yao, Joy Rimchala, Jiaxin Zhang, Lifu Huang
  • Can GRPO Boost Complex Multimodal Table Understanding?
    Xiaoqiang Kang, Shengen Wu, Zimu Wang, Yilin Liu, Xiaobo Jin, Kaizhu Huang, Wei Wang, Yutao Yue, Xiaowei Huang, Qiufeng Wang
  • MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance
    Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan
  • Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
    Jingcheng Deng, Zhongtao Jiang, Liang Pang, Zihao Wei, Liwei Chen, Kun Xu, Yang Song, Huawei Shen, Xueqi Cheng
  • Evaluating LLM-Generated Diagrams as Graphs
    Chumeng Liang, Jiaxuan You
  • Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders
    Agam Goyal, Vedant Rathi, William Yeh, Yian Wang, Yuen Chen, Hari Sundaram
  • VCSearch: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning
    Shi-Yu Tian, Zhi Zhou, Kun-Yang Yu, Ming Yang, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li
  • How do autoregressive transformers solve full addition?
    WANG PEIXU, Chen Yu, Yu Ming, Cheng Xiang
  • MAIN: Mutual Alignment Is Necessary for instruction tuning
    Fanyi Yang, Jianfeng Liu, Xin Zhang, Haoyu Liu, Xixin Cao, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Qi Zhang
  • Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation
    Dingwei Chen, Ziqiang Liu, Feiteng Fang, Chak Tou Leong, Shiwen Ni, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang, Chengming Li
  • DeepWell-Adol: A Scalable Expert-Based Dialogue Corpus for Adolescent Positive Mental Health and Wellbeing Promotion
    Wenyu Qiu, Yuxiong Wang, Jiajun Tan, Hanchao Hou, Qinda Liu, WEI YAO, Shiguang NI
  • Data to Defense: The Role of Curation in Aligning Large Language Models Against Safety Compromise
    Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Muchao Ye, Weicheng Ma, Zhaohan Xi
  • Speculative Safety-Aware Decoding
    Xuekang Wang, Shengyu Zhu, Xueqi Cheng
  • PanicToCalm: A Proactive Counseling Agent for Panic Attacks
    Jihyun Lee, Yejin Min, San Kim, Yejin Jeon, Sung Jun Yang, Hyounghun Kim, Gary Lee
  • CoPL: Collaborative Preference Learning for Personalizing LLMs
    Youngbin Choi, Seunghyuk Cho, Minjong Lee, MoonJeong Park, Yesong Ko, Jungseul Ok, Dongwoo Kim
  • Dynamic Collaboration of Multi-Language Models based on Minimal Complete Semantic Units
    Chao Hao, Zezheng Wang, Yanhua Huang, Ruiwen Xu, Wenzhe Niu, Xin Liu, Zitong YU
  • AI Chatbots as Professional Service Agents: Developing a Professional Identity
    Wenwen Li, Kangwei Shi, Yidong Chai
  • DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
    Zhuoyuan Mao, Mengjie Zhao, Qiyu Wu, Hiromi Wakaki, Yuki Mitsufuji
  • Advancing Oversight Reasoning across Languages for Audit Sycophantic Behaviour via X-Agent
    Leonardo Ranaldi, Giulia Pucci
  • CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability
    Han Peng, Jinhao Jiang, Zican Dong, Xin Zhao, LEI FANG
  • SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?
    Senyu Li, Jiayi Wang, Felermino D. M. A. Ali, Colin Cherry, Daniel Deutsch, Eleftheria Briakou, Rui Sousa-Silva, Henrique Lopes Cardoso, Pontus Stenetorp, David Ifeoluwa Adelani
  • FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
    Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung
  • Calibrating Pseudo-Labeling with Class Distribution for Semi-supervised Text Classification
    Weiyi Yang, Richong Zhang, Junfan Chen, Jiawei Sheng
  • Coarse-to-Fine Grounded Memory for LLM Agent Planning
    Wei Yang, Jinwei Xiao, Hongming Zhang, Qingyang Zhang, Yanna Wang, bo xu
  • From A and B to A+B: Can Large Language Models Solve Compositional Math Problems?
    Xisheng Xiao, Hanlin Zhao
  • Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories
    Mohammad Beigi, Ying Shen, Parshin Shojaee, Qifan Wang, Zichao Wang, Chandan K. Reddy, Ming Jin, Lifu Huang
  • SimVBG: Simulating Individual Values by Backstory Generation
    Bangde Du, Ziyi Ye, Zhijing Wu, Monika A. Jankowska, Shuqi Zhu, Qingyao Ai, Yujia Zhou, Yiqun LIU
  • EvolveSearch: An Iterative Self-Evolving Search Agent
    Ding-Chu Zhang, Yida Zhao, Jialong Wu, Liwen Zhang, Baixuan Li, Wenbiao Yin, Yong Jiang, Yu-Feng Li, Kewei Tu, Pengjun Xie, Fei Huang
  • Syntax-Aware Retrieval Augmentation for Neural Symbolic Regression
    Canmiao Zhou, Han Huang
  • Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
    Dingkun Zhang, Shuhan Qi, Xinyu Xiao, Kehai Chen, Xuan Wang
  • Graceful Forgetting in Generative Language Models
    Chunyang Jiang, Chi-Min Chan, Yiyang Cai, Yulong Liu, Wei Xue, Yike Guo
  • Answering Narrative-Driven Recommendation Queries via a Retrieve–Rank Paradigm and the OCG-Agent
    Yunxiao Shi, Haoning Shang, Xing Zi, Wujiang Xu, Yue Feng, Min Xu
  • Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values
    Hongbo Zhang, Han Cui, Guangsheng Bao, Linyi Yang, Jun Wang, Yue Zhang
  • Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
    Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Tom Tseng, Julius Broomfield, Adam Gleave, Kellin Pelrine
  • Neural Topic Modeling via Contextual and Graph Information Fusion
    Jiyuan Liu, Jiaxing Yan, Chunjiang Zhu, Xingyu Liu, Li Qing, Yanghui Rao
  • CARE: A Disagreement Detection Framework with Concept Alignment and Reasoning Enhancement
    Jiyuan Liu, Jielin Song, Yunhe Pang, Zhiyu Shen, Yanghui Rao
  • Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents
    Yejin Yoon, Yuri Son, Namyeong So, Minseo Kim, Minsoo Cho, Chanhee Park, Seungshin Lee, Taeuk Kim
  • LightThinker: Thinking Step-by-Step Compression
    Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
  • How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark
    Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Yang Wang, Liangming Pan
  • Investigating Pedagogical Teacher and Student LLM Agents: Genetic Adaptation Meets Retrieval-Augmented Generation Across Learning Styles
    Debdeep Sanyal, Agniva Maiti, Umakanta Maharana, Dhruv Kumar, Ankur Mali, C. Lee Giles, Murari Mandal
  • GeoEdit: Geometric Knowledge Editing for Large Language Models
    Yujie Feng, Li-Ming Zhan, ZEXIN LU, Yongxin Xu, Xu Chu, Yasha Wang, Jiannong Cao, Philip S. Yu, Xiao-Ming Wu
  • A Generative Pre-Trained Language Model for Channel Prediction in Wireless Communications Systems
    Bo Lin, Huanming Zhang, Yuhua Jiang, Yucong Wang, Tengyu Zhang, Shaoqiang Yan, Hongyao Li, Yihong Liu, Feifei Gao
  • AIMMerging: Adaptive Iterative Model Merging Using Training Trajectories for Language Model Continual Learning
    Yujie Feng, Jian Li, Xiaoyu DONG, Pengfei Xu, Xiaohui Zhou, Yujia Zhang, ZEXIN LU, Yasha Wang, Alan Zhao, Xu Chu, Xiao-Ming Wu
  • R-PRM: Reasoning-Driven Process Reward Modeling
    Shuaijie She, Junxiao Liu, Yifeng Liu, Jiajun Chen, Xin Huang, Shujian Huang
  • RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
    Yuqian Fu, Yuanheng Zhu, Jiajun Chai, Guojun Yin, Wei Lin, Qichao Zhang, Dongbin Zhao
  • Do Large Language Models Truly Grasp Addition? A Rule-Focused Diagnostic Using Two-Integer Arithmetic
    Yang Yan, Yu Lu, Renjun Xu, Zhenzhong Lan
  • AskToAct: Enhancing LLMs Tool Use via Self-Correcting Clarification
    Xuan Zhang, Yongliang Shen, Zhe Zheng, Linjuan Wu, Wenqi Zhang, Yuchen Yan, Qiuying Peng, Jun Wang, Weiming Lu
  • START: Self-taught Reasoner with Tools
    Chengpeng Li, Mingfeng Xue, Zhenru Zhang, Jiaxi Yang, Beichen Zhang, Bowen Yu, Binyuan Hui, Junyang Lin, Xiang Wang, Dayiheng Liu
  • The Impact of Negated Text on Hallucination with Large Language Models
    Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
  • A Probabilistic Inference Scaling Theory for LLM Self-Correction
    Zhe Yang, Yichang Zhang, Yudong Wang, Ziyao Xu, Junyang Lin, Zhifang Sui
  • MentalGLM Series: Explainable Large Language Models for Mental Health Analysis on Chinese Social Media
    Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui FU
  • Knowledge-Aware Co-Reasoning for Multidisciplinary Collaboration
    xurui li, wanghaijiao, Kaisong Song, Rui Zhu, Haixu Tang
  • Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following
    Yueen Ma, DaFeng Chi, Shiguang Wu, Yuecheng Liu, Yuzheng Zhuang, Irwin King
  • MAVL: A Multilingual Audio-Video Lyrics Dataset for Animated Song Translation
    Woohyun Cho, Youngmin Kim, Sunghyun Lee, Youngjae Yu
  • MuTIS: Enhancing Reasoning Efficiency through Multi Turn Intervention Sampling in Reinforcement Learning
    Wenshuo Zhao, Haoxing Zhai, Xinyu Qiu, Zhenting Qi, Shuhe Li, Linchao Zhu
  • PRIM: Towards Practical In-Image Multilingual Machine Translation
    Yanzhi Tian, Zeming Liu, Zhengyang Liu, Chong Feng, Xin Li, Heyan Huang, Yuhang Guo
  • Mind the Inclusivity Gap: Multilingual Gender-Neutral Translation Evaluation with mGeNTE
    beatrice savoldi, Giuseppe Attanasio, Eleonora Cupin, Eleni Gkovedarou, Janiça Hackenbuchner, Anne Lauscher, Matteo Negri, Andrea Piergentili, Manjinder Thind, Luisa Bentivogli
  • DiplomacyAgent: Do LLMs Balance Interests and Ethical Principles in International Events?
    Jianxiang Peng, Ling Shi, Xinwei Wu, Hanwen Zhang, Fujiang Liu, Haocheng Lyu, Deyi Xiong
  • DisLoRA: Task-specific Low-Rank Adaptation via Orthogonal Basis from Singular Value Decomposition
    She Yifei, Xinhao Wei, Yulong Wang
  • Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering
    Zixin CHEN, Sicheng Song, KaShun SHUM, Yanna Lin, Rui SHENG, Weiqi Wang, Huamin Qu
  • Textual Aesthetics in Large Language Models
    Lingjie Jiang, Shaohan Huang, Xun Wu, Furu Wei
  • Section-Level Simplification of Biomedical Abstracts
    Jan Bakker, Jaap Kamps
  • PoseStitch-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation
    Abhinav Joshi, Vaibhav Sharma, Sanjeet Singh, Ashutosh Modi
  • Few-Shot Open-Set Classification via Reasoning-Aware Decomposition
    Avyav Kumar Singh, Helen Yannakoudakis
  • Translation in the Hands of Many: Centering Lay Users in Machine Translation Interactions
    beatrice savoldi, Alan Ramponi, Matteo Negri, Luisa Bentivogli
  • iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use
    Yirong Zeng, Xiao Ding, Yuxian Wang, Weiwen Liu, Yutai Hou, Wu Ning, Xu Huang, Duyu Tang, Dandan Tu, Bing Qin, Ting Liu
  • Transplant Then Regenerate: A New Paradigm for Text Data Augmentation
    Guangzhan Wang, Hongyu Zhang, Beijun Shen, Xiaodong Gu
  • Compositional Generalisation for Explainable Hate Speech Detection
    Agostina Calabrese, Tom Sherborne, Björn Ross, Mirella Lapata
  • CCQA: Generating Question from Solution Can Improve Inference-Time Reasoning in SLMs
    Jinyoung Kim, Ji Won Yoon
  • TVQACML: Benchmarking Text-Centric Visual Question Answering in Multilingual Chinese Minority Languages
    shajiu, Mengxiao Zhu, Chong Feng, LAMA jIe
  • Transparent and Coherent Procedural Mistake Detection
    Shane Storks, Itamar Bar-Yossef, Yayuan Li, Zheyuan Zhang, Jason J Corso, Joyce Chai
  • Teaching Your Models to Understand Code via Focal Preference Alignment
    Jie Wu, Haoling Li, Xin Zhang, Xiao Liu, Yangyu Huang, Jianwen Luo, Yizhen Zhang, Zuchao Li, Ruihang Chu, Yujiu Yang, Scarlett Li
  • MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval
    Xixi Wu, Yanchao Tan, Nan Hou, Ruiyang Zhang, Hong Cheng
  • Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
    Ioanna Ntinou, ALEXANDROS XENOS, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos
  • TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning
    Xiaohan Yu, Pu Jian, Chong Chen
  • Retrieval Enhanced Feedback via In-context Neural Error-book
    Jongyeop Hyun, Bumsoo Kim
  • Improve LLM-as-a-Judge Ability as a General Ability
    Jiachen Yu, Shaoning Sun, Xiaohui Hu, Jiaxu Yan, Kaidong Yu, Xuelong Li
  • G2: Guided Generation for Enhanced Output Diversity in LLMs
    Zhiwen Ruan, Yixia Li, Yefeng Liu, Yun Chen, Weihua Luo, Peng Li, Yang Liu, Guanhua Chen
  • ToolSafety: A Comprehensive Dataset for Enhancing Safety in LLM-Based Agent Tool Invocations
    Yuejin Xie, Youliang Yuan, Wenxuan Wang, Fan Mo, Jianmin Guo, Pinjia He
  • Learning to See through Sound: From VggCaps to Multi2Cap for Richer Automated Audio Captioning
    Sangyeon Cho, Mingi Kim, Jinkwon Hwang, Jaehoon Go, Minuk Ma, Sunjae Yoon, Junyeong Kim
  • Towards Optimal Evaluation Efficiency for Large Language Models
    Guohong Li, Deyi Xiong
  • MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs
    Yiheng Hu, Xiaoyang Wang, Qing Liu, Xiwei Xu, Qian Fu, Wenjie Zhang, Liming Zhu
  • Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
    Sugyeong Eo, Jung Jun Lee, Chanjun Park, Heuiseok Lim
  • Process-Supervised Reinforcement Learning for Code Generation
    Yufan Ye, Ting Zhang, Wenbin Jiang, Hua Huang
  • MuCAL: Contrastive Alignment for Preference-Driven KG-to-Text Generation
    Yifei Song, Claire Gardent
  • Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
    Wei Wang, Zhaowei Li, Qi Xu, Linfeng Li, YiQing Cai, Botian Jiang, Hang Song, Xingcan Hu, Pengyu Wang, Li Xiao
  • Thought calibration: Efficient and confident test-time scaling
    Menghua Wu, Cai Zhou, Stephen Bates, Tommi Jaakkola
  • Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation
    Ziling Cheng, Meng Cao, Leila Pishdad, Yanshuai Cao, Jackie CK Cheung
  • QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models
    Wei Wang, Zhaowei Li, Qi Xu, YiQing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao
  • SHARP: Steering Hallucination in LVLMs via Representation Engineering
    Junfei Wu, Yue Ding, Guofan Liu, Tianze Xia, Ziyue Huang, Dianbo Sui, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan
  • Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech
    Sang Hoon Woo, Sehun Lee, Kang-wook Kim, Gunhee Kim
  • Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
    Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith W. Ross
  • PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
    Hao Zheng, Xinyan Guan, Hao Kong, Wenkai Zhang, Jia Zheng, Weixiang Zhou, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun
  • SWAM: Adaptive Sliding Window and Memory-Augmented Attention Model for Rumor Detection
    Mei Guo, Chen Chen, Chunyan Hou, Yike Wu, Xiaojie Yuan
  • HydraRAG: Structured Cross-Source Enhanced Large Language Model Reasoning
    Xingyu Tan, Xiaoyang Wang, Qing Liu, Xiwei Xu, Xin Yuan, Liming Zhu, Wenjie Zhang
  • VRoPE: Rotary Position Embedding for Video Large Language Models
    Zikang Liu, Longteng Guo, Yepeng Tang, Tongtian Yue, Junxian Cai, Kai Ma, Qingbin Liu, Xi Chen, Jing Liu
  • SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLP
    Decheng Duan, Jitong Peng, Yingyi Zhang, Chengzhi Zhang
  • Think and Recall: Layer-Level Prompting for Lifelong Model Editing
    Jinke Wang, Zenan Ying, Qi Liu, Wei Chen, Tong Xu, huijun hou, Zhi Zheng
  • SPIRIT: Patching Speech Language Models against Jailbreak Attacks
    Amirbek Djanibekov, Nurdaulet Mukhituly, Kentaro Inui, Hanan Aldarmaki, Nils Lukas
  • FIRE: Flexible Integration of Data Quality Ratings for Effective Pretraining
    Xu Liangyu, Xuemiao Zhang, Feiyu Duan, Sirui Wang, Rongxiang Weng, Jingang Wang, Xunliang Cai
  • Multi-Domain Explainability of Preferences
    Nitay Calderon, Liat Ein-Dor, Roi Reichart
  • Tuning Less, Prompting More: In-Context Preference Learning Pipeline for Natural Language Transformation
    Shuyun Yang, Yan Zhang, Zhengmao Ye, Lei Duan, Mingjie Tang
  • IL-PCSR: Legal Corpus for Prior Case and Statute Retrieval
    Shounak Paul, Dhananjay Ghumare, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi
  • ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
    Chaoyue He, Xin Zhou, Yi Wu, Xinjia Yu, yan zhang, Lei Zhang, Di Wang, Shengfei Lyu, Hong Xu, Wang Xiaoqiao, Wei Liu, Chunyan Miao
  • How Sememic Components Can Benefit Link Prediction for Lexico-Semantic Knowledge Graphs?
    Hansi Wang, Yue Wang, Qiliang Liang, Yang Liu
  • WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification
    Yiwen Jiang, Deval Mehta, Siyuan Yan, Yaling Shen, Zimu Wang, Zongyuan Ge
  • Calibration Across Layers: Understanding Calibration Evolution in LLMs
    Abhinav Joshi, Areeb Ahmad, Ashutosh Modi
  • The discordance between embedded ethics and cultural inference in large language models
    Aida Ramezani, Yang Xu
  • SSA: Semantic Contamination of LLM-Driven Fake News Detection
    Cheng Xu, Nan Yan, Shuhao Guan, Yuke Mei, Tahar Kechadi
  • Logits-Based Finetuning
    Jingyao Li, Senqiao Yang, Sitong Wu, Han Shi, Chuanyang Zheng, Hong Xu, Jiaya Jia
  • STARE at the Structure: Steering ICL Exemplar Selection with Structural Alignment
    Jiaqian Li, Qisheng Hu, Jing Li, Wenya Wang
  • PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation
    Tao Fan, Guoqiang Ma, Yuanfeng SONG, Lixin Fan, Qiang Yang
  • Efficient Beam Search for Large Language Models Using Trie-Based Decoding
    Brian J Chan, Mao-xun Huang, Jui-Hung Cheng, Chao-Ting Chen, Hen-Hsen Huang
  • Power doesn’t reside in size: A Low Parameter Hybrid Language Model (HLM) for Sentiment Analysis in Code-mixed data
    Pavan Sai Balaga, Nagasamudram Karthik, Challa Vishwanath, Raksha Sharma, Rudra Murthy, Ashish Mittal
  • Evaluating Taxonomy Free Character Role Labeling (TF-CRL) in News Stories using Large Language Models
    David G Hobson, Derek Ruths, Andrew Piper
  • MIRROR: Multimodal Cognitive Reframing Therapy for Rolling with Resistance
    Subin Kim, Hoonrae Kim, Jihyun Lee, Yejin Jeon, Gary Lee
  • RETAIL: Towards Real-world Travel Planning for Large Language Models
    Bin Deng, Yizhe Feng, Zeming Liu, Qing Wei, Xiangrong Zhu, Shuai Chen, Yuanfang Guo, Yunhong Wang
  • Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification
    Tuc Nguyen, Yifan Hu, Thai Le
  • Reward Model Perspectives: Whose Opinions Do Reward Models Reward?
    Elle
  • FLRC: Fine-grained Low-Rank Compressor for Efficient LLM Inference
    Yu-Chen Lu, Chong-Yan Chen, Chi-Chih Chang, Yu-Fang Hu, Kai-Chiang Wu
  • Do You Know About My Nation? Investigating Multilingual Language Models’ Cultural Literacy Through Factual Knowledge
    Eshaan Tanwar, Anwoy Chatterjee, Michael Saxon, Alon Albalak, William Yang Wang, Tanmoy Chakraborty
  • CoEvo: Coevolution of LLM and Retrieval Model for Domain-Specific Information Retrieval
    Ang Li, Yiquan Wu, Yinghao Hu, Lizhi Qing, Shihang Wang, Chengyuan Liu, Tao Wu, Adam Jatowt, Ming Cai, Fei Wu, Kun Kuang
  • Conan-Embedding-v2: Training an LLM from Scratch for Text Embeddings
    Shiyu Li, Yang Tang, Ruijie Liu, Shi-Zhe Chen, Xi Chen
  • Vision-and-Language Navigation with Analogical Textual Descriptions in LLMs
    Yue Zhang, Tianyi Ma, Zun Wang, Yanyuan Qiao, Parisa Kordjamshidi
  • MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
    Xiaolong Wang, Zhaolu Kang, Wangyuxuan Zhai, Xinyue Lou, Yunghwei Lai, Ziyue Wang, Yawen Wang, Kaiyu Huang, Yile Wang, Peng Li, Yang Liu
  • Mind the Gap: How BabyLMs Learn Filler-Gap Dependencies
    Chi-Yun Chang, Xueyang Huang, Humaira Nasir, Shane Storks, Olawale Akingbade, Huteng Dai
  • Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline
    Meng Lu, Ruochen Zhang, Carsten Eickhoff, Ellie Pavlick
  • BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models
    Zsolt T. Kardkovács, LYNDA DJENNANE, Anna Field, Boualem Benatallah, Yacine GACI, Fabio Casati, Walid Gaaloul
  • Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models
    Chen Han, Wenzhen Zheng, Xijin Tang
  • Controllable Memorization in LLMs via Weight Pruning
    Chenjie Ni, Zhepeng Wang, Runxue Bao, Shangqian Gao, Yanfu Zhang
  • Tracing L1 Interference in English Learner Writing: A Longitudinal Corpus with Error Annotations
    Poorvi Acharya, J. Elizabeth Liebl, Dhiman Goswami, Kai North, Marcos Zampieri, Antonios Anastasopoulos
  • DCIS: Efficient Length Extrapolation of LLMs via Divide-and-Conquer Scaling Factor Search
    Lei Yang, Shaoyang Xu, Jianxiang Peng, shaolin Zhu, Deyi Xiong
  • Who is in the Spotlight: The Hidden Bias Undermining Multimodal Retrieval-Augmented Generation
    Jiayu Yao, Shenghua Liu, Yiwei Wang, Lingrui Mei, Baolong Bi, Yuyao Ge, Zhecheng Li, Xueqi Cheng
  • Let’s Play Across Cultures: A Large Multilingual, Multicultural Benchmark for Assessing Language Models’ Understanding of Sports
    Punit kumar singh, Nishant Kumar, Akash Ghosh, Kunal Pasad, Khushi Soni, Manisha Jaishwal, Sriparna Saha, Syukron Abu Ishaq Alfarozi, Asres Temam Abagissa, Kitsuchart Pasupa, Jose G Moreno, Haiqin Yang
  • Multilingual Federated Low-Rank Adaptation for Collaborative Content Anomaly Detection across Multilingual Social Media Participants
    Jiaxin Li, Geng Zhao
  • M3Retrieve: Benchmarking Multimodal Retrieval for Medicine
    Arkadeep Acharya, Akash Ghosh, Pradeepika Verma, Kitsuchart Pasupa, Sriparna Saha, Dr Priti Singh
  • The Hidden Strength of Disagreement: Unraveling the Consensus-Diversity Tradeoff in Adaptive Multi-Agent Systems
    Zengqing Wu, Takayuki Ito
  • Friend or Foe? A Computational Investigation of Semantic False Friends across Romance Languages
    Ana Sabina Uban, Liviu P Dinu, Ioan-Bogdan Iordache, Simona Georgescu, Claudia Vlad
  • KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models
    Seorin Kim, Dongyoung Lee, Jaejin Lee
  • SeMob: Semantic Synthesis for Dynamic Urban Mobility Prediction
    Runfei Chen, Shuyang Jiang, Wei Huang
  • DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors
    Yize Cheng, Wenxiao Wang, Mazda Moayeri, Soheil Feizi
  • Minimal, Local, and Robust: Embedding-Only Edits for Implicit Bias in T2I Models
    Feng He, Chao Zhang, Zhixue Zhao
  • Journalism-Guided Agentic In-context Learning for News Stance Detection
    Dahyun Lee, Jonghyeon Choi, Jiyoung Han, Kunwoo Park
  • Less Is MuRE: Revisiting Shallow Knowledge Graph Embeddings
    Victor Charpenay, Steven Schockaert
  • Jailbreak LLMs through Internal Stance Manipulation
    Shuangjie Fu, Du Su, Beining Huang, Fei Sun, Jingang Wang, Wei Chen, Huawei Shen, Xueqi Cheng
  • Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis
    Haoming Huang, Yibo Yan, Jiahao Huo, Xin Zou, Xinfeng Li, Kun Wang, Xuming Hu
  • Complex Numerical Reasoning with Numerical Semantic Pre-training Framework
    Jun Zhang, Haihong E, Tianyi Hu, Yifan Zhu, Meina Song, Haoran Luo
  • Automated Knowledge Graph Construction using Large Language Models and Sentence Complexity Modelling
    Sydney Anuyah, Mehedi Mahmud Kaushik, Sri Rama Krishna Reddy Dwarampudi, Rakesh Shiradkar, Arjan Durresi, Sunandan Chakraborty
  • OntologyRAG-Q: Resource Development and Benchmarking for Retrieval-Augmented Question Answering in Qur’anic Tafsir
    Sadam Al-Azani, Maad Alowaifeer, Alhanoof Alhunief, Ahmed Abdelali
  • The Practical Impacts of Theoretical Constructs on Empathy Modeling
    Allison Lahnala, Charles Welch, David Jurgens, Lucie Flek
  • RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
    Sashuai Zhou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang, Yan Xia, Ruiming Tang, Zhenhua Dong, Zhou Zhao
  • Grouping Entities with Shared Properties using Multi-Facet Prompting and Property Embeddings
    Amit Gajbhiye, Thomas Bailleux, Zied Bouraoui, Luis Espinosa-Anke, Steven Schockaert
  • Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering
    kun Zhu, Lizi Liao, Yuxuan Gu, Lei Huang, Xiaocheng Feng, Bing Qin
  • Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
    Dongjun Kim, Gyuho Shim, Yongchan Chun, Minhyuk Kim, Chanjun Park, Heuiseok Lim
  • TreeReview: A Dynamic Tree of Questions Framework for Deep and Efficient LLM-based Scientific Peer Review
    Yuan Chang, Ziyue Li, Hengyuan Zhang, Yuanbo Kong, Yanru Wu, Zhijiang Guo, Ngai Wong
  • Improving Chemical Understanding of LLMs via SMILES Parsing
    Yunhui Jang, Jaehyung Kim, Sungsoo Ahn
  • Can Large Language Models Tackle Graph Partitioning?
    Yiheng Wu, Ningchao Ge, Yanmin Li, Liwei Qian, Mengna Zhu, Haoyu Yang, Haiwen Chen, Jibing Wu
  • To See a World in a Spark of Neuron: Disentangling Multi-Task Interference for Training-Free Model Merging
    Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Yiyao Cao, Jing Li, Ho-Kin Tang, Sim Kuan Goh
  • What You Read Isn’t What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
    Binh Nguyen, Shuju Shi, Ryan Ofman, Thai Le
  • Task-Aware Resolution Optimization for Visual Large Language Models
    Weiqing Luo, Zhen Tan, Yifan Li, Xinyu Zhao, Kwonjoon Lee, Behzad Dariush, Tianlong Chen
  • CheckEval: A reliable LLM-as-a-Judge framework for evaluating text generation using checklists
    Yukyung Lee, JoongHoon Kim, Jaehee Kim, Hyowon Cho, Jaewook Kang, Pilsung Kang, Najoung Kim
  • A Necessary Step toward Faithfulness: Measuring and Improving Consistency in Free-Text Explanations
    Lingjun Zhao, Hal Daumé III
  • Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
    Qihang Ma, Shengyu Li, Jie Tang, Dingkang Yang, Chenshaodong, Yingyi Zhang, ChaoFeng, Ran Jiao
  • Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation
    Tianhao Niu, Yiming Cui, Baoxin Wang, Xiao Xu, Xin Yao, Qingfu Zhu, Dayong Wu, Shijin Wang, Wanxiang Che
  • The State of Multilingual LLM Safety Research: From Measuring The Language Gap To Mitigating It
    Zheng Xin Yong, Beyza Ermis, Marzieh Fadaee, Stephen Bach, Julia Kreutzer
  • AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
    Saket Sanjeev Chaturvedi, Gaurav Bagwe, Lan Emily Zhang, Xiaoyong Yuan
  • From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
    Lanxiao Huang, Daksh Dave, Tyler Cody, Peter A. Beling, Ming Jin
  • Editing Across Languages: A Survey of Multilingual Knowledge Editing
    Nadir Durrani, Basel Mousi, Fahim Dalvi
  • Your RAG is Unfair: Exposing Fairness Vulnerabilities in Retrieval-Augmented Generation via Backdoor Attacks
    Gaurav Bagwe, Saket Sanjeev Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Emily Zhang
  • Drift-Adapter: A Practical Approach to Near Zero-Downtime Embedding Model Upgrades in Vector Databases
    Harshil Vejendla
  • The Staircase of Ethics: Probing LLM Value Priorities through Multi-Step Induction to Complex Moral Dilemmas
    Ya Wu, Qiang Sheng, Danding Wang, Guang Yang, Yifan Sun, Zhengjia Wang, Yuyan Bu, Juan Cao
  • SliceMoE: Routing Embedding Slices Instead of Tokens for Fine-Grained and Balanced Transformer Scaling
    Harshil Vejendla
  • ReSo: A Reward-driven Self-organizing LLM-based Multi-Agent System for Reasoning Tasks
    Heng Zhou, Hejia Geng, Xiangyuan Xue, Li Kang, Yiran Qin, Zhiyong Wang, Zhenfei Yin, LEI BAI
  • ConstraintLLM: A Neuro-Symbolic Framework for Industrial-Level Constraint Programming
    Weichun Shi, Minghao Liu, Wanting Zhang, Langchen Shi, Fuqi Jia, Feifei Ma, Jian Zhang
  • VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms
    Seungwon Lim, Sungwoong Kim, Jihwan Yu, Sungjae Lee, Jiwan Chung, Youngjae Yu
  • ESC-Judge: A Framework for Comparing Emotional Support Conversational Agents
    Navid Madani
  • Neuron-Level Differentiation of Memorization and Generalization in Large Language Models
    Ko-Wei Huang, Yi-Fu Fu, Ching-Yu Tsai, Yu-Chieh Tu, TZU-LING CHENG, Cheng-Yu Lin, Yi-Ting Yang, Heng-Yi Liu, Keng-Te Liao, Da-Cheng Juan, Shou-De Lin
  • Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs
    Zhuoxuan Zhang, Jinhao Duan, Edward Kim, Kaidi Xu
  • Do Slides Help? Multi-modal Context for Automatic Transcription of Conference Talks
    Supriti Sinhamahapatra, Jan Niehues
  • Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
    Tianyi Lorena Yan, Robin Jia
  • Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames
    Sahithya Ravi, Gabriel Herbert Sarch, Vibhav Vineet, Andrew D Wilson, Balasaravanan Thoravi Kumaravel
  • Enhancing Chain-of-Thought Reasoning via Neuron Activation Differential Analysis
    Yiru Tang, Kun Zhou, Yingqian Min, Jing Sha, Zhichao Sheng, Shijin Wang, Xin Zhao
  • PakBBQ: A Culturally Adapted Bias Benchmark for QA
    Abdullah Hashmat, Muhammad Arham Mirza, Agha Ali Raza
  • MULTIGUARD: An Efficient Approach for AI Safety Moderation Across Languages and Modalities
    Sahil Verma, Keegan Hines, Jeff Bilmes, Charlotte Siska, Luke Zettlemoyer, Hila Gonen, Chandan Singh
  • Comparing human and LLM politeness strategies in free production
    Haoran Zhao, Robert D. Hawkins
  • ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning via Tool-integrated Action for Dynamic Offer Optimization
    Deuksin Kwon, Jiwon Hae, Emma Clift, Daniel Shamsoddini, Jonathan Gratch, Gale Lucas
  • CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment
    Nura Aljaafari, Danilo Carvalho, Andre Freitas
  • MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
    Runjia Zeng, Guangyan Sun, Qifan Wang, Tong Geng, Sohail Dianat, Xiaotian Han, Raghuveer Rao, XUELING ZHANG, Cheng Han, Lifu Huang, Dongfang Liu
  • KG-CQR: Leveraging Structured Relation Representations in Knowledge Graphs for Contextual Query Retrieval
    Chi Minh Bui, Ngoc Mai Thieu, Vinh Van Nguyen, Jason J. Jung, Khac-Hoai Nam Bui
  • SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
    Maithili Joshi, Palash Nandi, Tanmoy Chakraborty
  • When Truthful Representations Flip Under Deceptive Instructions?
    Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li
  • Can LLMs simulate the same correct solutions to free-response math problems as real students?
    Yuya Asano, Diane Litman, Erin Walker
  • Evaluating Behavioral Alignment in Conflict Dialogue: A Multi-Dimensional Comparison of LLM Agents and Humans
    Deuksin Kwon, Kaleen Shrestha, Bin Han, Gale Lucas
  • RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
    Bowen Wang, Haiyuan Wan, 石力文, Chen Yang, Peng He, Yue MA, Haochen Han, Wenhao Li, Tiao Tan, Yongjian Li, Fangming Liu, Gong Yifan, Sheng Zhang
  • Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
    Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, Lara Marinov, Michael Chen, Shreya Singhal, Carolin Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig
  • Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
    Jiarui Liu, Yueqi Song, Yunze Xiao, Mingqian Zheng, Lindia Tjuatja, Jana Schaich Borg, Mona T. Diab, Maarten Sap
  • Linear-Time Demonstration Selection for In-Context Learning via Gradient Estimation
    Ziniu Zhang, Zhenshuo Zhang, Dongyue Li, Lu Wang, Jennifer Dy, Hongyang R. Zhang
  • Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
    Chutong Meng, Philipp Koehn
  • TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs
    Ezgi Başar, Francesca Padovani, Jaap Jumelet, Arianna Bisazza
  • DynamicNER: A Dynamic, Multilingual, and Fine-Grained Dataset for LLM-based Named Entity Recognition
    Hanjun Luo, Yingbin Jin, Yiran Wang, Xinfeng Li, Tong Shang, Xuecheng Liu, Ruizhe Chen, Kun Wang, Hanan Salam, Qingsong Wen, Zuozhu Liu
  • Reliable and Cost-Effective Exploratory Data Analysis via Graph-Guided RAG
    Mossad Helali, Yutai Luo, Tae Jun Ham, Jim Plotts, Ashwin Chaugule, Jichuan Chang, Parthasarathy Ranganathan, Essam Mansour
  • Med-PRM: Medical Reasoning Models with Stepwise, Guideline-verified Process Rewards
    Jaehoon Yun, Jiwoong Sohn, Jungwoo Park, Hyunjae Kim, Xiangru Tang, Daniel Shao, Yong Hoe Koo, Ko Minhyeok, Qingyu Chen, Mark Gerstein, Michael Moor, Jaewoo Kang
  • Graders Should Cheat: Privileged Information Enables Expert-Level Automated Evaluations
    Jin Peng Zhou, Séb Arnold, Nan Ding, Kilian Q Weinberger, Nan Hua, Fei Sha
  • SAMULE: Self-Learning Agents Enhanced by Multi-level Reflection
    Yubin Ge, Salvatore Romeo, Jason Cai, MONICA SUNKARA, Yi Zhang
  • Database-Augmented Query Representation for Information Retrieval
    Soyeong Jeong, Jinheon Baek, Sukmin Cho, Sung Ju Hwang, Jong C. Park
  • The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech
    Naama Rivlin-Angert, Guy Mor-Lan
  • Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
    Pedram Zaree, Md Abdullah Al Mamun, Quazi Mishkatul Alam, Yue Dong, Ihsen Alouani, Nael Abu-Ghazaleh
  • Representation Potentials of Foundation Models for Multimodal Alignment: A Survey
    Jianglin Lu, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, Yun Fu
  • Draft Model Knows When to Stop: Self-Verification Speculative Decoding for Long-Form Generation
    Ziyin Zhang, Jiahao Xu, Tian Liang, Xingyu Chen, Zhiwei He, Rui Wang, Zhaopeng Tu
  • Visual-Aware Speech Recognition for Noisy Scenarios
    Balaji Darur, Karan Singla
  • Advancing Arabic Diacritization: Improved Datasets, Benchmarking, and State-of-the-Art Models
    Abubakr Mohamed, Hamdy Mubarak
  • Implicit Values Embedded in How Humans and LLMs Complete Subjective Everyday Tasks
    Arjun Arunasalam, Madison Pickering, Z. Berkay Celik, Blase Ur
  • Dynamic Retriever for In-Context Knowledge Editing via Policy Optimization
    Mahmud Wasif Nafee, Maiqi JIANG, Haipeng Chen, Yanfu Zhang
  • LVLMs are Bad at Overhearing Human Referential Communication
    Zhengxiang Wang, Weiling Li, Panagiotis Kaliosis, Susan Brennan, Owen Rambow
  • Let’s Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM’s Math Capability
    Ruida WANG, Yuxin Li, Yi R. Fung, Tong Zhang
  • TORSO: Template-Oriented Reasoning Towards General Tasks
    Minhyuk Kim, Seungyoon Lee, Heuiseok Lim
  • Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild
    Sheshera Mysore, Debarati Das, Hancheng Cao, Bahareh Sarrafzadeh
  • WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
    Gagan Mundada, Yash Vishe, Amit Namburi, Xin Xu, Zachary Novack, Julian McAuley, Junda Wu
  • TRIAL: Token Relations and Importance Aware Late-interaction for Accurate Text Retrieval
    Hyukkyu Kang, Injung Kim, Wook-Shin Han
  • Do Large Language Models excel in Complex Logical Reasoning with Formal Language?
    Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Liangcai Gao
  • Fair or Framed? Political Bias in News Articles Generated by LLMs
    Junho Yoo
  • ReviewRL: Towards Automated Scientific Review with RL
    Sihang Zeng, Kai Tian, Kaiyan Zhang, Yuru wang, Junqi Gao, Runze Liu, Sa Yang, Jingxuan Li, Xinwei Long, Jiaheng Ma, Biqing Qi, Bowen Zhou
  • Grammar Pruning: Enabling Low-Latency Zero-Shot Task-Oriented Language Models for Edge AI
    Octavian Alexandru Trifan, Jason Lee Weber, Marc Titus Trifan, Alexandru Nicolau, Alexander Veidenbaum
  • Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
    Terrance Liu, Shuyi Wang, Daniel Preotiuc-Pietro, Yash Chandarana, Chirag Gupta
  • REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing
    Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan
  • ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
    Chung-En Sun, Ge Yan, Tsui-Wei Weng
  • Incorporating Diverse Perspectives in Cultural Alignment: Survey of Evaluation Benchmarks Through A Three-Dimensional Framework
    Meng-Chen Wu, Si-Chi Chin, Tess Wood, Ayush Goyal, Narayanan Sadagopan
  • Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation
    Yubo Xie, Chenkai Wang, Zongyang Ma, Fahui Miao
  • RoDEval: A Robust Word Sense Disambiguation Evaluation Framework for Large Language Models
    Luyang Zhang, Shuaimin Li, Yishuo Li, Kunpeng Kang, Kaiyuan Zhang, Cong Wang, Wenpeng Lu
  • PychoAgent: Psychology-driven LLM Agents for Explainable Panic Prediction on Social Media during Sudden Disaster Events
    Mengzhu Liu, Zhengqiu Zhu, Chuan Ai, Chen Gao, Xinghong Li, Lingnan He, Kaisheng Lai, Yingfeng Chen, Xin Lu, Yong Li, Quanjun Yin
  • Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs’ Reasoning
    Zezhong WANG, Xingshan Zeng, Weiwen Liu, Yufei Wang, Liangyou Li, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
  • Inter-sentence Context Modeling and Structure-aware Representation Enhancement for Conversational Sentiment Quadruple Extraction
    Yu Zhang, Zhaoman Zhong, Huihui LV
  • Igniting Creative Writing in Small Language Models: LLM-as-a-Judge versus Multi-Agent Refined Rewards
    Xiaolong Wei, Bo Lu, Xingyu Zhang, Zhejun Zhao, Dongdong Shen, Long Xia, Dawei Yin
  • Governance in Motion: Co-evolution of Constitutions and AI models for Scalable Safety
    Chenhao Huang, Ziyu Shen, Yicong Ren, Huiyuan Zheng, Jiazheng Zhang, Mingxu Chai, Ming Zhang, Shihan Dou, Fan Mo, Jie Shi, Tao Gui, Qi Zhang, Xuanjing Huang
  • Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models
    Yisheng Zhong, Yizhu Wen, Junfeng Guo, Mehran Kafai, Heng Huang, Hanqing Guo, Zhuangdi Zhu
  • SciEvent: Benchmarking Multi-domain Scientific Event Extraction
    Bofu Dong, Pritesh Shah, Sumedh Sonawane, Tiyasha Banerjee, Erin Brady, Xinya Du, Ming Jiang
  • Media Source Matters More Than Content: Unveiling Political Bias in LLM-Generated Citations
    Sunhao Dai, Zhanshuo Cao, Wenjie Wang, Liang Pang, Jun Xu, See-Kiong Ng, Tat-Seng Chua
  • RJE: A Retrieval-Judgment-Exploration Framework for Efficient Knowledge Graph Question Answering with LLMs
    Can Lin, Zhengwang Jiang, Ling Zheng, Qi Zhao, Yuhang Zhang, Qi Song, Wangqiu Zhou
  • Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset
    Taisei Yamamoto, Ryoma Kumon, Danushka Bollegala, Hitomi Yanaka
  • Chameleon LLMs: User Personas Influence Chatbot Personality Shifts
    Jane Xing, Tianyi Niu, Shashank Srivastava
  • GuessingGame: Measuring the Informativeness of Open-Ended Questions in Large Language Models
    Dylan Hutson, Daniel Vennemeyer, Aneesh Deshmukh, Justin Zhan, Tianyu Jiang
  • SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models
    Shang Liu, Yao Lu, Wenji Fang, Jing Wang, Zhiyao Xie
  • Why Stop at One Error? Benchmarking LLMs as Data Science Code Debuggers for Multi-Hop and Multi-Bug Errors
    Zhiyu Yang, Shuo Wang, Yukun Yan, Yang Deng
  • Dovetail: A CPU/GPU Heterogeneous Speculative Decoding for LLM inference
    Libo Zhang, Zhaoning Zhang, xubaizhou, Rui Li, Zhiliang Tian, Songzhu Mei, Dongsheng Li
  • V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
    Qidong Wang, Junjie Hu, Ming Jiang
  • LORAXBENCH: A Multitask, Multilingual Benchmark Suite for 20 Indonesian Languages
    Alham Fikri Aji, Trevor Cohn
  • SAFE: Schema-Driven Approximate Distance Join for Efficient Knowledge Graph Querying
    Sangoh Lee, Sungho Park, Wook-Shin Han
  • Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
    Xiwen Liang, Min Lin, Weiqi Ruan, Rongtao Xu, Yuecheng Liu, Jiaqi Chen, Bingqian Lin, Yuzheng Zhuang, Xiaodan Liang
  • Position: LLMs Can be Good Tutors in English Education
    Jingheng Ye, Shen Wang, Deqing Zou, Yibo Yan, Kun Wang, Hai-Tao Zheng, Ruitong Liu, Zenglin Xu, Irwin King, Philip S. Yu, Qingsong Wen
  • CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting
    Haobo Li, Zhaowei Wang, Jiachen Wang, Yueya WANG, Alexis Kai Hon Lau, Huamin Qu
  • Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models
    Zhipeng Chen, Kun Zhou, Liang Song, Xin Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen
  • Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
    Pranjal A Chitale, Bishal Santra, Yashoteja Prabhu, Amit Sharma
  • Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?
    Ashutosh Bajpai, Tanmoy Chakraborty
  • MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
    Zixin Chen, Hongzhan Lin, Kaixin Li, Ziyang Luo, Yayue Deng, Jing Ma
  • Multi-perspective Analysis of Large Language Model Domain Specialization: An Experiment in Accounting Audit Procedures Generation
    Yusuke Noro
  • Generator-Assistant Stepwise Rollback Framework for Large Language Model Agent
    Xingzuo Li, Kehai Chen, Yunfei Long, Xuefeng Bai, Yong Xu, Min Zhang
  • DocAgent: An Agentic Framework for Multi-Modal Long-Context Document Understanding
    Li Sun, Liu He, Shuyue Jia, Yangfan He, Chenyu You
  • EasyRec: Simple yet Effective Language Models for Recommendation
    Xubin Ren, Chao Huang
  • From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery
    Tianshi Zheng, Zheye Deng, Hong Ting Tsang, Weiqi Wang, Jiaxin Bai, Zihao Wang, Yangqiu Song
  • Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLMs
    Zhen Xiong, Yujun Cai, Zhecheng Li, Yiwei Wang
  • ViPE: Visual Perception in Parameter Space for Efficient Video-Language Understanding
    Shichen Lu, Tongtian Yue, Longteng Guo, Handong Li, Xingjian He, Si Liu, Jing Liu
  • Alignment for Efficient Tool Calling of Large Language Models
    Hongshen Xu, Zihan Wang, Zichen Zhu, Lei Pan, Xingyu Chen, Shuai Fan, Lu Chen, Kai Yu
  • ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models
    Jiani Guo, Zuchao Li, Jie Wu, Qianren Wang, Yun Li, Lefei Zhang, hai zhao, Yujiu Yang
  • BANMIME: Misogyny Detection with Metaphor Explanation on Bangla Memes
    Md Ayon Mia, Akm Moshiur Rahman Mazumder, Khadiza Sultana Sayma, Md Fahim, Md Tahmid Hasan Fuad, MUHAMMAD IBRAHIM KHAN, AKMMAHBUBUR RAHMAN
  • Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time
    Yifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin, Jinghui Chen
  • Retrieval-augmented GUI Agents with Generative Guidelines
    Ran Xu, Kaixin Ma, Wenhao Yu, Hongming Zhang, Joyce C. Ho, Carl Yang, Dong Yu
  • COAS2W: A Chinese Older-Adults Spoken-to-Written Transformation Corpus with Context Awareness
    Chun Kang, Zhigu Qian, Zhen Fu, Jiaojiao Fu, Yangfan Zhou
  • Answer Convergence as a Signal for Early Stopping in Reasoning
    Xin Liu, Lu Wang
  • VeriFact: Enhancing Long-Form Factuality Evaluation with Refined Fact Extraction and Reference Facts
    Xin Liu, Lechen Zhang, Sheza Munir, Yiyang Gu, Lu Wang
  • SQUAB: Evaluating LLM robustness to Ambiguous and Unanswerable Questions in Semantic Parsing
    Simone Papicchio, Luca Cagliero, Paolo Papotti
  • Reliable Evaluation and Benchmarks for Statement Autoformalization
    Auguste Poiroux, Gail Weiss, Viktor Kunčak, Antoine Bosselut
  • VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models
    Jen-tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao
  • Less Is More? Examining Fairness in Pruned Large Language Models for Summarising Opinions
    Nannan Huang, Haytham M. Fayek, Xiuzhen Zhang
  • AI Sees Your Location—But With A Bias Toward The Wealthy World
    Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao
  • Faster In-Context Learning for LLMs via N-Gram Trie Speculative Decoding
    Jinglin Chen, Qiwei Li, Zuchao Li, Baoyuan Qi, Liu Guoming, Haojun Ai, hai zhao, Ping Wang
  • From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs
    Muhammad Farid Adilazuarda, Chen Cecilia Liu, Iryna Gurevych, Alham Fikri Aji
  • Iterative Prompt Refinement for Safer Text-to-Image Generation
    Jinwoo Jeon, JunHyeok Oh, Hayeong Lee, Byung-Jun Lee
  • Language Models as Continuous Self-Evolving Data Engineers
    Peidong Wang, Ming Wang, Zhiming Ma, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song
  • Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
    Hua Cai, Shuang Zhao, liang zhang, Xuli Shen, Qing Xu, Weilin Shen, ZihaoWen, Tianke Ban
  • Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
    Yunkai Dang, Mengxi Gao, Yibo Yan, Xin Zou, Yanggan Gu, Jungang Li, Jingyu Wang, Peijie Jiang, Aiwei Liu, Jia Liu, Xuming Hu
  • Evaluating and Aligning Human Economic Risk Preferences in LLMs
    Jiaxin Liu, Yixuan Tang, Yi Yang, KAR YAN TAM
  • Ensembling Prompting Strategies for Zero-Shot Hierarchical Text Classification with Large Language Models
    Mingxuan Xia, Zhijie Jiang, Haobo Wang, Junbo Zhao, Tianlei Hu, Gang Chen
  • Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers
    Eugene Jang, Kimin Lee, Jin-Woo Chung, Keuntae Park, Seungwon Shin
  • UI-Hawk: Unleashing the Screen Stream Understanding for Mobile GUI Agents
    Jiwen Zhang, Ya-Qi Yu, Minghui Liao, WenTao Li, Jihao Wu, zhongyu wei
  • UniDebugger: Hierarchical Multi-Agent Framework for Unified Software Debugging
    Cheryl Lee, Chunqiu Steven Xia, Longji Yang, Jen-tse Huang, Zhouruixing Zhu, LINGMING ZHANG, Michael R. Lyu
  • Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld’s Episode Theory
    Nan Zhang, Ming Li, Chenrui Fan, Hong Jiao, Yanbin Fu, Sydney Peters, Qingshu Xu, Robert Lissitz, Tianyi Zhou
  • Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation
    Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Shuzheng Si, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Baobao Chang
  • Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
    Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza
  • STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models
    Kai Chen, Zihao He, Taiwei Shi, Kristina Lerman
  • Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction
    Marija Sakota, Robert West
  • MultiLogicNMR(er): A Benchmark and Neural-Symbolic Framework for Non-monotonic Reasoning with Multiple Extensions
    Yeliang Xiu, Yongmei Liu
  • Beyond Demographics: Enhancing Cultural Value Survey Simulation with Multi-Stage Personality-Driven Cognitive Reasoning
    Haijiang Liu, Qiyuan Li, Chao Gao, Yong Cao, Xiangyu Xu, XUN WU, Daniel Hershcovich, Jinguang Gu
  • CrystalICL: Enabling In-Context Learning for Crystal Generation
    Ruobing Wang, Qiaoyu Tan, Yili Wang, Ying Wang, Xin Wang
  • Towards a Unified Paradigm of Concept Editing in Large Language Models
    Zhuowen Han, Xinwei Wu, Dan Shi, Renren Jin, Deyi Xiong
  • Step-level Verifier-guided Hybrid Test-Time Scaling for Large Language Models
    Kaiyan Chang, Yonghao Shi, Chenglong Wang, Hang Zhou, Chi Hu, Xiaoqian Liu, yingfeng luo, Yuan Ge, Tong Xiao, JingBo Zhu
  • Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation
    Junzhuo Li, Bo Wang, Xiuze Zhou, Xuming Hu
  • RRInf: Efficient Influence Function Estimation via Ridge Regression for Large Language Models and Text-to-Image Diffusion Models
    Zhuozhuo Tu, Cheng Chen, Yuxuan Du
  • Evaluating Spatiotemporal Consistency in Automatically Generated Sewing Instructions
    Luisa Geiger, Mareike Hartmann, Michael Sullivan, Alexander Koller
  • MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
    Zhen Zhang, Yifan Yang, Kai Zhen, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang
  • Procedural Environment Generation for Tool-Use Agents
    Michael Sullivan, Mareike Hartmann, Alexander Koller
  • FacLens: Transferable Probe for Foreseeing Non-Factuality in Fact-Seeking Question Answering of Large Language Models
    Yanling Wang, Haoyang Li, Hao Zou, Jing Zhang, Xinlei He, Qi Li, Ke Xu
  • OMS: On-the-fly, Multi-Objective, Self-Reflective Ad Keyword Generation via LLM Agent
    Bowen Chen, Zhao Wang, Shingo Takamatsu
  • Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents
    guangfu guo, Xiaoqian Lu, Yue Feng
  • TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models
    Asif Hanif, Maha Tufail Agro, Fahad Shamshad, Karthik Nandakumar
  • Can LLMs be Literary Companions?: Analysing LLMs on Bengali Figures of Speech Identification
    Sourav Das, Kripabandhu Ghosh
  • Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
    Davide Ghilardi, Federico Belotti, Marco Molinari, Tao Ma, Matteo Palmonari
  • Retrieval over Classification: Integrating Relation Semantics for Multimodal Relation Extraction
    Lei Hei, Tingjing Liao, peiyingxin, Yiyang Qi, Jiaqi Wang, Ruiting Li, Feiliang Ren
  • PunMemeCN: A Benchmark to Explore Vision-Language Models’ Understanding of Chinese Pun Memes
    Zhijun Xu, Siyu Yuan, Yiqiao Zhang, Jingyu Sun, Tong Zheng, Deqing Yang
  • UltraIF: Advancing Instruction Following from the Wild
    Kaikai An, Li Sheng, Ganqu Cui, Shuzheng Si, Ning Ding, Yu Cheng, Baobao Chang
  • Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
    HONGYI TANG, Zhihao Zhu, Yi Yang
  • TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering
    Boyi Zhang, Zhuo Liu, Hangfeng He
  • Mapping Toxic Comments Across Demographics: A Dataset from German Public Broadcasting
    Jan Fillies, Michael Peter Hoffmann, Rebecca Reichel, Roman Salzwedel, Sven Bodemer, Adrian Paschke
  • Small Models, Big Results: Achieving Superior Intent Extraction through Decomposition
    Danielle Cohen, Yoni Halpern, Anatoly Efros, Noam Kahlon, Joel Oren, Omri Berkovitch, Sapir Caduri, Ido Dagan
  • On Pruning State-Space LLMs
    Tamer Ghattas, Michael Hassid, Roy Schwartz
  • An Orthogonal High-Rank Adaptation for Large Language Models
    Xin Zhang, Guang-Ze Chen, Shuzhen Li, zhulin liu, C.L.Philip Chen, Tong Zhang
  • BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training
    WenJie Zhou, Bohan Wang, Wei Chen, Xueqi Cheng
  • Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation
    Noy Sternlicht, Ariel Gera, Roy Bar-Haim, Tom Hope, Noam Slonim
  • METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
    Mengyue Wang, Shuo Chen, Kristian Kersting, Volker Tresp, Yunpu Ma
  • VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
    Yingqi Fan, Anhao Zhao, Jinlan Fu, Junlong Tong, Hui Su, Yijie Pan, Wei Zhang, Xiaoyu Shen
  • Beyond Static Testbeds: An Interaction-Centric Agent Simulation Platform for Dynamic Recommender Systems
    Song Jin, Juntian Zhang, Yuhan Liu, Xun Zhang, Yufei zhang, Guojun Yin, Fei Jiang, Wei Lin, Rui Yan
  • SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection
    Qin Chen, Yuanyi Ren, Xiaojun Ma, Mugeng Liu, Shi Han, Dongmei Zhang
  • CAIR: Counterfactual-based Agent Influence Ranker for Agentic AI Workflows
    Amit Giloni, Chiara Picardi, Roy Betser, Shamik Bose, Aishvariya Priya Rathina Sabapathy, Roman Vainshtein
  • ReSURE: Regularizing Supervision Unreliability for Multi-turn Dialogue Fine-tuning
    Yiming Du, Yifan Xiang, Bin Liang, Dahua Lin, Kam-Fai Wong, Fei Tan
  • Precise In-Parameter Concept Erasure in Large Language Models
    Yoav Gur-Arieh, Clara Haya Suslik, Yihuai Hong, Fazl Barez, Mor Geva
  • PhonoThink: Improving Large Language Models’ Reasoning on Chinese Phonological Ambiguities
    Jianfei Ma, Zhaoxin Feng, Emmanuele Chersoni, Huacheng Song, Ziqi Zhang
  • SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
    Jimin Lee, Ingeol Baek, Byeongjeong Kim, Hyunkyung Bae, Hwanhee Lee
  • ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance
    Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu
  • Anecdoctoring: Automated Red-Teaming Across Language and Place
    Alejandro Cuevas, Saloni Dash, Dan Vann, Madeleine I. G. Daepp
  • ACING: Actor-Critic for Instruction Learning in Black-Box LLMs
    Salma Kharrat, Fares Fourati, Marco Canini
  • Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi
    Sourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, Monojit Choudhury
  • Process-Supervised Reward Models for Verifying Clinical Note Generation: A Scalable Approach Guided by Domain Expertise
    Hanyin Wang, Chufan Gao, Qiping Xu, Bolun Liu, Guleid Hussein, Hariprasad Reddy Korsapati, Mohamad El Labban, Kingsley Iheasirim, Mohamed Hassan, Gokhan Anil, Brian Bartlett, Jimeng Sun
  • GCML: Gradient Coherence Guided Meta-Learning for Cross-Domain Emerging Topic Rumor Detection
    Zejiang He, jingyuan huang, Menglong Lu, Zhen Huang, Shanshan Liu, Zhiliang Tian, Dongsheng Li
  • Can LLMs Generate and Solve Linguistic Olympiad Puzzles?
    Neh Majmudar, Elena Filatova
  • E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
    Zihan Liao, Jun Wang, Hang Yu, Lingxiao Wei, Jianguo Li, Jun Wang, Wei Zhang
  • DivScore: Zero-Shot Detection of LLM-Generated Text in Specialized Domains
    Zhihui Chen, Kai He, Yucheng Huang, Yunxiao Zhu, Mengling Feng
  • Multi-Document Event Extraction Using Large and Small Language Models
    Qingkai Min, Zitian Qu, Qipeng Guo, Xiangkun Hu, Zheng Zhang, Yue Zhang
  • MA-GTS: A Multi-Agent Framework for Solving Complex Graph Problems in Real-World Applications
    Zike Yuan, Ming Liu, Hui Wang, Bing Qin
  • Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders
    Weiqiao Shan, Yuang Li, Yuhao Zhang, yingfeng luo, Chen Xu, Xiaofeng Zhao, Long Meng, Yunfei Lu, Min Zhang, Hao Yang, Tong Xiao, JingBo Zhu
  • CIKT: A Collaborative and Iterative Knowledge Tracing Framework with Large Language Models
    Runze Li, siyu wu, Jun Wang, Wei Zhang
  • Mitigating Hallucinations in LM-Based TTS Models via Distribution Alignment Using GFlowNets
    Chenlin Liu, Jiqing Han, Minghui Fang, Wei Zhou, Jie Gao
  • MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Correction
    Yuyang Wu, Jinhui Ye, Shuhao Zhang, Lu Dai, Yonatan Bisk, Olexandr Isayev
  • Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities
    Xiaoyu Luo, Yiyi Chen, Johannes Bjerva, Qiongxiu Li
  • Embedding Domain Knowledge for Large Language Models via Reinforcement Learning from Augmented Generation
    Chaojun Nie, Jun Zhou, Guanxiang Wang, Shisong Wu, Zichen Wang
  • LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
    Jian Zhang, Junyi Guo, Junyi Yuan, Huanda Lu, Yanlin Zhou, Fangyu Wu, Qiufeng Wang, Dongming Lu
  • Artificial Impressions: Evaluating Large Language Model Behavior Through the Lens of Trait Impressions
    Nicholas Deas, Kathleen McKeown
  • Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization
    Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang
  • 3DS: Medical Domain Adaptation of LLMs via Decomposed Difficulty-based Data Selection
    Hongxin Ding, Yue Fang, Runchuan Zhu, Xinke Jiang, Jinyang Zhang, Yongxin Xu, Weibin Liao, Xu Chu, Junfeng Zhao, Yasha Wang
  • InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
    Kirolos Ataallah, Eslam Mohamed BAKR, Mahmoud Ahmed, Chenhui Gou, Khushbu Pahwa, Jian Ding, Mohamed Elhoseiny
  • Intrinsic Test of Unlearning Using Parametric Knowledge Traces
    Yihuai Hong, Lei Yu, Haiqin Yang, Shauli Ravfogel, Mor Geva
  • Speculative Streaming: Efficient and Scalable Speculative Decoding with Multi-Stream Attention
    Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Antonie Lin, Mohammad Rastegari, Mahyar Najibi
  • Evaluating Cognitive-Behavioral Fixation via Multimodal User Viewing Patterns on Social Media
    Yujie Wang, Yunwei Zhao, Jing Yang, Han han, Shiguang Shan, Jie Zhang
  • Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs
    Mario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense
  • VocalNet: Speech LLMs with Multi-Token Prediction for Faster and High-Quality Generation
    Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang
  • Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety
    Yuyi Huang
  • CBP-Tuning: Efficient Local Customization for Black-box Large Language Models
    Jiaxuan Zhao, Naibin Gu, Yuchen Feng, Xiyu Liu, Peng Fu, Zheng Lin, Weiping Wang
  • Beyond the Score: Uncertainty-Calibrated LLMs for Automated Essay Assessment
    Ahmed Karim, Zheng Yuan, Qiao Wang
  • Humans Hallucinate Too: Language Models Identify and Correct Subjective Annotation Errors With Label-in-a-Haystack Prompts
    Georgios Chochlakis, Peter Wu, Tikka Arjun Singh Bedi, Marcus Ma, Kristina Lerman, Shrikanth Narayanan
  • Do It Yourself (DIY): Modifying Images for Poems in a Zero-Shot Setting Using Weighted Prompt Manipulation
    Sofia Jamil, Kotla Sai Charan, Sriparna Saha, Koustava Goswami, Joseph K J
  • Looking Beyond Text: Reducing Language Bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance
    Haozhe Zhao, Shuzheng Si, Liang Chen, Yichi Zhang, Maosong Sun, Baobao Chang, Minjia Zhang
  • Who Holds the Pen? Caricature and Perspective in LLM Retellings of History
    Lubna Zahan Lamia, Mabsur Fatin Bin Hossain, Md Mosaddek Khan
  • DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs
    Minxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, Hui Chen, Wei Zhou, Jungong Han, Guiguang Ding, Wenwu Ou, Di ZHANG, Kun Gai, Songlin Hu
  • Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
    Peilin Wu, Mian Zhang, Xinlu Zhang, Xinya Du, Zhiyu Chen
  • Child-Directed Language Does Not Consistently Boost Syntax Learning in Language Models
    Francesca Padovani, Jaap Jumelet, Yevgen Matusevych, Arianna Bisazza
  • Benchmarking Debiasing Methods for LLM-based Parameter Estimates
    Nicolas Audinet de Pieuchon, Adel Daoud, Connor Thomas Jerzak, Moa Johansson, Richard Johansson
  • (Almost) Free Modality Stitching of Foundation Models
    Jaisidh Singh, Diganta Misra, Boris Knyazev, Antonio Orvieto
  • VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
    Tingqiao Xu, Ziru Zeng, Jiayu Chen
  • Rescorla-Wagner Steering of LLMs for Undesired Behaviors over Disproportionate Inappropriate Context
    Rushi Wang, Jiateng Liu, Cheng Qian, Yifan Shen, Yanzhou Pan, Zhaozhuo Xu, Ahmed Abbasi, Heng Ji, Denghui Zhang
  • Exploring Artificial Image Generation for Stance Detection
    Zhengkang Zhang, Zhongqing Wang, Guodong Zhou
  • Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech
    Jonathan Pofcher, Christopher M Homan, Randall Sell, Ashiqur R. KhudaBukhsh
  • Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs
    Andong Hua, Kenan Tang, Chenhe Gu, Jindong Gu, Eric Wong, Yao Qin
  • Topic Coverage-based Demonstration Retrieval for In-Context Learning
    Wonbin Kweon, SeongKu Kang, Runchu Tian, Pengcheng Jiang, Jiawei Han, Hwanjo Yu
  • On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts
    Linlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy
  • MuseScorer: Idea Originality Scoring At Scale
    Ali Sarosh Bangash, Krish Veera, Ishfat Abrar Islam, Raiyan Abdul Baten
  • SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
    Joao Fonseca, Andrew Bell, Julia Stoyanovich
  • RaDeR: Reasoning-aware Dense Retrieval Models
    DEBRUP DAS, Sam O’Nuallain, Razieh Rahimi
  • A Culturally-diverse Multilingual Multimodal Video Benchmark & Model
    Bhuiyan Sanjid Shafique, Ashmal Vayani, Muhammad Maaz, Hanoona Abdul Rasheed, Dinura Dissanayake, Mohammed Irfan Kurpath, Yahya Hmaiti, Go Inoue, Jean Lahoud, Md. Safirur Rashid, Shadid Intisar Quasem, Maheen Fatima, Franco Vidal, Mykola Maslych, Ketan Pravin More, Sanoojan Baliah, Hasindri Watawana, Yuhao Li, Fabian Farestam, Leon Schaller, Roman Tymtsiv, Simon Weber, Hisham Cholakkal, Ivan Laptev, Shin’ichi Satoh, Michael Felsberg, Mubarak Shah, Salman Khan, Fahad Shahbaz Khan
  • DRES: Fake news detection by dynamic representation and ensemble selection
    Faramarz Farhangian, Leandro Augusto Ensina, George D C Cavalcanti, Rafael M. O. Cruz
  • A Graph-Theoretical Framework for Analyzing the Behavior of Causal Language Models
    Rashin Rahnamoun, Mehrnoush Shamsfard
  • Membership and Memorization in LLM Knowledge Distillation
    Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi
  • Balanced Multi-Factor In-Context Learning for Multilingual Large Language Models
    Masahiro Kaneko, Alham Fikri Aji, Timothy Baldwin
  • Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-k
    Chihiro Taguchi, Seiji Maekawa, Nikita Bhutani
  • Languages Still Left Behind: Toward a Better Multilingual Machine Translation Benchmark
    Chihiro Taguchi, Seng Mai, Keita Kurabe, Yusuke Sakai, Georgina Agyei, Soudabeh Eslami, David Chiang
  • Think Globally, Group Locally: Evaluating LLMs Using Multi-Lingual Word Grouping Games
    César Guerra-Solano, Zhuochun Li, Xiang Lorraine Li
  • Pointing to a Llama and Call it a Camel: On the Sycophancy of Multimodal Large Language Models
    Renjie Pi, Kehao Miao, LI PEIHANG, Runtao Liu, Jiahui Gao, Jipeng Zhang, Xiaofang Zhou
  • MR. Judge: Multimodal Reasoner as a Judge
    Renjie Pi, Haoping Bai, Qibin Chen, Xiaoming Simon Wang, Jiulong Shan, Xiaojiang Liu, Meng Cao
  • MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines
    Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram
  • Fann or Flop: A Multigenre, Multiera Benchmark for Arabic Poetry Understanding in LLMs
    Wafa Al Ghallabi, Ritesh Thawkar, Sara Ghaboura, Ketan Pravin More, Omkar Thawakar, Hisham Cholakkal, Salman Khan, Rao Muhammad Anwer
  • CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
    Joshua Ong Jun Leang, Aryo Pradipta Gema, Shay B Cohen
  • s1: Simple test-time scaling
    Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel Candes, Tatsunori Hashimoto
  • Learning Subjective Label Distributions via Sociocultural Descriptors
    MOHAMMED FAYIZ PARAPPAN, Ricardo Henao
  • COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier
    Gaoxiang Luo, Aryan Deshwal
  • ML-Promise: A Multilingual Dataset for Corporate Promise Verification
    Yohei Seki, Hakusen Shu, Anaïs Lhuissier, Hanwool Lee, Juyeon Kang, Min-Yuh Day, Chung-Chi Chen
  • Reading Between the Prompts: How Stereotypes Shape LLM’s Implicit Personalization
    Vera Neplenbroek, Arianna Bisazza, Raquel Fernández
  • Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation
    Yen-Ju Lu, Thomas Thebaud, Laureano Moro-Velazquez, Najim Dehak, Jesus Villalba
  • Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation
    Di Wu, Seth Aycock, Christof Monz
  • How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
    Ingeol Baek, Hwan Chang, Sunghyun Ryu, Hwanhee Lee
  • Explainability and Interpretability of Multilingual Large Language Models: A Survey
    Lucas Resck, Isabelle Augenstein, Anna Korhonen
  • Decoding the Rule Book: Extracting Hidden Moderation Criteria from Reddit Communities
    Youngwoo Kim, Himanshu Beniwal, Steven L. Johnson, Thomas Hartvigsen
  • AcT2I: Evaluating and Improving Action Depiction in Text-to-Image Models
    Vatsal Malaviya, Agneet Chatterjee, Maitreya Patel, Yezhou Yang
  • Assessing French Readability for Adults with Low Literacy: A Global and Local Perspective
    Wafa Aissa, Thibault Bañeras-Roux, Elodie Vanzeveren, GAO Lingyun, Rodrigo Wilkens, Thomas François
  • LILaC: Late Interacting in Layered Component Graph for Open-domain Multimodal Multihop Retrieval
    Joohyung Yun, Doyup Lee, Wook-Shin Han
  • DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning
    Tanmay Parekh, Kartik Mehta, Ninareh Mehrabi, Kai-Wei Chang, Nanyun Peng
  • SNaRe: Domain-aware Data Generation for Low-Resource Event Detection
    Tanmay Parekh, Yuxuan Dong, Lucas Bandarkar, Artin Kim, I-Hung Hsu, Kai-Wei Chang, Nanyun Peng
  • Table-R1: Inference-Time Scaling for Table Reasoning Tasks
    Zheyuan Yang, Lyuhao Chen, Arman Cohan, Yilun Zhao
  • LimRank: Less is More for Reasoning-Intensive Information Reranking
    Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan
  • PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
    Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi
  • An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation
    Shubham Gandhi, Atharva Naik, Yiqing Xie, Carolyn Rose
  • What are Foundation Models Cooking in the Post-Soviet World?
    Anton Lavrouk, Tarek Naous, Alan Ritter, Wei Xu
  • LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning
    Tianshi Zheng, Cheng Jiayang, Chunyang Li, Haochen Shi, Zihao Wang, Jiaxin Bai, Yangqiu Song, Ginny Wong, Simon See
  • EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models
    Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang
  • Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
    Boxiang Ma, Ru Li, Wang Yuanlong, Hongye Tan, Xiaoli Li
  • Priority on High-Quality: Selecting Instruction Data via Consistency Verification of Noise Injection
    Hong Zhang, Feng Zhao, Ruilin Zhao, Cheng Yan, Kangzheng Liu
  • Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs
    Xin Gao, Ruiyi Zhang, Sai Ashish Somayajula, Daniel Du, Saurabh Mahindre, Pengtao Xie
  • DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models
    YiQiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang
  • Metric Calculating Benchmark: Code-Verifiable Complicate Instruction Following Benchmark for Large Language Models
    Hyeonseok Moon, Seongtae Hong, Jaehyung Seo, Heuiseok Lim
  • Generative Annotation for ASR Named Entity Correction
    Yuanchang Luo, Daimeng Wei, Shaojun Li, Hengchao Shang, Jiaxin GUO, Zongyao Li, Zhanglin Wu, Xiaoyu Chen, Zhiqiang Rao, Jinlong Yang, Hao Yang
  • SOLAR: Towards Characterizing Subjectivity of Individuals through Modeling Value Conflicts and Trade-offs
    Younghun Lee, Dan Goldwasser
  • LogicTree: Structured Proof Exploration for Coherent and Rigorous Logical Reasoning with Large Language Models
    Kang He, Kaushik Roy
  • Unmasking Fake Careers: Detecting Machine-Generated Career Trajectories via Multi-layer Heterogeneous Graphs
    Michiharu Yamashita, Thanh Tran, Delvin Ce Zhang, Dongwon Lee
  • GAP: a Global Adaptive Pruning Method for Large Language Models
    Zhihua Ban, Haotian Ma, Siheng Zhang, Shengyu Liu, Xichen Chen, Ming Yang
  • Distribution Prompting: Understanding the Expressivity of Language Models Through the Next-Token Distributions They Can Produce
    Haojin Wang, Zining Zhu, Freda Shi
  • LGA: LLM-GNN Aggregation for Temporal Evolution Attribute Graph Prediction
    Feng Zhao, Ruoyu Chai, Kangzheng Liu, Xianggan Liu
  • EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models
    Tao Zou, Xinghua Zhang, Haiyang Yu, Minzheng Wang, Fei Huang, Yongbin Li
  • Tool Preferences in Agentic LLMs are Unreliable
    Kazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, Soheil Feizi
  • Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning
    Yu Liu, Yanan Cao, Xixun Lin, Yanmin Shang, Shi Wang, Shirui Pan
  • MultiDocFusion : Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents
    Joong Min Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, Heuiseok Lim
  • Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
    Qiang Liu, Xinlong Chen, Yue Ding, Bowen Song, Weiqiang Wang, Shu Wu, Liang Wang
  • ‘Rich Dad, Poor Lad’: How do Large Language Models Contextualize Socioeconomic Factors in College Admission ?
    Huy Nghiem, Phuong-Anh Nguyen-Le, John Prindle, Rachel Rudinger, Hal Daumé III
  • Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary
    Licheng Pan, Yongqi Tong, Xin Zhang, Xiaolu Zhang, JUN ZHOU, Zhixuan Chu
  • MMAG: Multimodal Learning for Mucus Anomaly Grading in Nasal Endoscopy via Semantic Attribute Prompting
    Xinpan Yuan, Mingzhu Huang, Liujie Hua, JianuoJu, XuZhang
  • The Emperor’s New Reasoning: Format Imitation Overshadows Genuine Mathematical Understanding in SFT
    Linyao Yang, Jian-Tao Huang, Yafei Lu, Zhenhui Jessie Li, Guirong Xue
  • Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning
    Lang Cao, Yingtian Zou, Chao Peng, Renhong Chen, Wu Ning, Yitong Li
  • Flexibly Utilize Memory for Long-Term Conversation via a Fragment-then-Compose Framework
    Cai Ke, Yiming Du, Bin Liang, Yifan Xiang, Lin Gui, Zhongyang Li, Baojun Wang, Yue Yu, Hui Wang, Kam-Fai Wong, Ruifeng Xu
  • STRICT: Stress-Test of Rendering Image Containing Text
    Tianyu Zhang, Xinyu Wang, Zhenghan Tai, Lu Li, Jijun Chi, Jingrui Tian, Hailin He, Suyuchen Wang
  • A Sequential Multi-Stage Approach for Code Vulnerability Detection via Confidence- and Collaboration-based Decision Making
    Chung-Nan Tsai, Xin Wang, Cheng-Hsiung Lee, Ching-Sheng Lin
  • Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
    Zhaoyi Joey Hou, Adriana Kovashka, Xiang Lorraine Li
  • BIRD: Bronze Inscription Restoration and Dating
    Wenjie Hua, Hoang H Nguyen, Gangyan Ge
  • DCP: Dual-Cue Pruning for Efficient Large Vision-Language Models
    Lei Jiang, Zixun Zhang, Yuting Zeng, Chunzhao Xie, Tongxuan Liu, Zhen Li, Lechao Cheng, Xiaohua Xu
  • Improving Context Fidelity via Native Retrieval-Augmented Reasoning
    Suyuchen Wang, Jinlin Wang, Xinyu Wang, Shiqi Li, Xiangru Tang, Sirui Hong, Xiao-Wen Chang, Chenglin Wu, Bang Liu
  • Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
    Shehzeen Samarah Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Roy Fejgin, Mikyas T. Desta, Rafael Valle, Jason Li
  • Mixing Inference-time Experts for Enhancing LLM Reasoning
    Soumya Sanyal, Tianyi Xiao, Xiang Ren
  • Reinforced Query Reasoners for Reasoning-intensive Retrieval Tasks
    Xubo Qin, Jun Bai, Jiaqi Li, Zixia Jia, Zilong Zheng
  • TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection
    Wei Wu, Zhuoshi Pan, Kun Fu, Chao Wang, Liyi Chen, Yunchu Bai, Tianfu Wang, Zheng Wang, Hui Xiong
  • MUSE: MCTS-Driven Red Teaming Framework for Enhanced Multi-Turn Dialogue Safety in Large Language Models
    Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo
  • EnAnchored-X2X: English-Anchored Optimization for Many-to-Many Translation
    Sen Yang, Yu Bao, Yu Lu, Jiajun Chen, Shujian Huang, Shanbo Cheng
  • “I’ve Decided to Leak”: Probing Internals Behind Prompt Leakage Intents
    Jianshuo Dong, Yutong Zhang, Liu Yan, Zhenyu Zhong, Tao Wei, Tianwei Zhang, Ke Xu, Minlie Huang, Chao Zhang, Han Qiu
  • Nullspace Disentanglement for Red Teaming Language Models
    Yi Han, Yuanxing Liu, Weinan Zhang, Ting Liu
  • Supervised Attention Mechanism for Low-quality Multimodal Data
    Sijie Mai, Shiqin Han, Haifeng Hu
  • Reinforcement Learning for Large Language Models via Group Preference Reward Shaping
    Huaisheng Zhu, Siyuan Xu, Hangfan Zhang, Teng Xiao, Zhimeng Guo, Shijie Zhou, Shuyue Hu, Vasant G Honavar
  • zFLoRA: Zero-Latency Fused Low-Rank Adapters
    Dhananjaya Gowda, Seoha Song, Harshith Goka, Junhyun Lee
  • PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving
    Mihir Parmar, Palash Goyal, Xin Liu, Yiwen Song, Mingyang Ling, Chitta Baral, Hamid Palangi, Tomas Pfister
  • Semantic Inversion, Identical Replies: Revisiting Negation Blindness in Large Language Models
    Jinsung Kim, Seonmin Koo, Heuiseok Lim
  • AMACE: Automatic Multi-Agent Chart Evolution for Iteratively Tailored Chart Generation
    Hyuk Namgoong, Jeesu Jung, Hyeonseok Kang, Sangkeun Jung
  • ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
    Jianguo Zhang, Thai Quoc Hoang, Ming Zhu, Zuxin Liu, Shiyu Wang, Tulika Manoj Awalgaonkar, Akshara Prabhakar, Haolin Chen, Weiran Yao, Zhiwei Liu, Juntao Tan, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
  • Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety
    Seongmin Lee, Aeree Cho, Grace C. Kim, ShengYun Peng, Mansi Phute, Duen Horng Chau
  • Unveiling the Response of Large Vision-Language Models to Visually Absent Tokens
    Sohee Kim, Soohyun Ryu, Joonhyung Park, Eunho Yang
  • Improving Task Diversity in Label Efficient Supervised Finetuning of LLMs
    Abhinav Arabelly, Jagrut Nemade, Robert D Nowak, Jifan Zhang
  • Look Beyond Feeling: Unveiling Latent Needs from Implicit Expressions for Proactive Emotional Support
    Xing Fu, Haozhen Li, Bichen Wang, Hao Yang, Yanyan Zhao, Bing Qin
  • s3: You Don’t Need That Much Data to Train a Search Agent via RL
    Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han
  • FuseChat: Knowledge Fusion of Chat Models
    Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan
  • Continuous-Time Attention: PDE-Guided Mechanisms for Long-Sequence Transformers
    YUKUN ZHANG, Xueqing Zhou
  • Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
    Nurit Cohen Inger, Yehonatan Elisha, Bracha Shapira, Lior Rokach, Seffi Cohen
  • Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
    Jisu Kim, Youngwoo Shin, Uiji Hwang, Jihun Choi, richeng xuan, Taeuk Kim
  • RD-MCSA: A Multi-Class Sentiment Analysis Approach Integrating In-Context Classification Rationales and Demonstrations
    Haihua Xie, Yinzhu Cheng, Yaqing Wang, Miao He, Mingming Sun
  • Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint
    Heekyung Lee, Jiaxin Ge, Tsung-Han Wu, Minwoo Kang, Trevor Darrell, David M. Chan
  • CREPE: Rapid Chest X-ray Report Evaluation by Predicting Multi-category Error Counts
    Gihun Cho, Seunghyun Jang, Hanbin Ko, Inhyeok Baek, Chang Min Park
  • TIDES: Technical Information Discovery and Extraction System
    Jihee Kim, Subeen Park, Hakyung Lee, YongTaek Lim, Hyo-won Suh, Kyungwoo Song
  • Learning to Ask: When LLM Agents Meet Unclear Instruction
    Wenxuan Wang, SHI Juluan, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu
  • RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
    Yuchi Wang, Yishuo Cai, Shuhuai Ren, Sihan Yang, Linli Yao, Yuanxin Liu, Yuanxing Zhang, Pengfei Wan, Xu Sun
  • StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
    Xuhui Zheng, Kang An, Ziliang Wang, Yuhang Wang, Yichao Wu
  • Dynamic Model-Bank Test-Time Adaptation for Automatic Speech Recognition
    Yanshuo Wang, Yanghao Zhou, Yukang Lin, Haoxing Chen, Jin Zhang, Wentao Zhu, Jie Hong, Xuesong Li
  • Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning
    Wei Huang, Anda Cheng, Yinggui Wang
  • Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models
    Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim
  • ArgCMV: An Argument Summarization Benchmark for the LLM-era
    Omkar Gurjar, Agam Goyal, Eshwar Chandrasekharan
  • VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft
    Honghao Fu, Junlong Ren, Qi Chai, Deheng Ye, Yujun Cai, Hao Wang
  • GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction
    Xuelin Li, Xiangqi Jin, Linfeng Zhang
  • Joint Modeling of Entities and Discourse Relations for Coherence Assessment
    Wei Liu, Michael Strube
  • Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
    Jun Bai, Minghao Tong, Yang Liu, Zixia Jia, Zilong Zheng
  • HMoE: Heterogeneous Mixture of Experts for Language Modeling
    An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, Weidong Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu
  • The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking
    Yaoyao Qian, Yifan Zeng, Yuchao Jiang, Chelsi Jain, Huazheng Wang
  • Uniform Information Density and Syntactic Reduction: Revisiting that-Mentioning in English Complement Clauses
    Hailin Hao, Elsi Kaiser
  • GRIT: Guided Relational Integration for Efficient Multi-Table Understanding
    Yujin Kang, Park Seong Woo, Yoon-Sik Cho
  • RPDR: A Round-trip Prediction-Based Data Augmentation Framework for Long-Tail Question Answering
    Yiming Zhang, Siyue Zhang, Junbo Zhao, Chen Zhao
  • Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering
    Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan Lee Boyd-Graber
  • Data-Efficient Selection via Grammatical Complexity in Continual Pre-training of Domain-Specific LLMs
    Yizhou Ying, Geng Zhang, Cui Danxin, Chengyu Du, Guanglei Yue, Sihang Jiang, Jiaqing Liang, Yifei Fu, Hailin Hu, Yanghua Xiao
  • Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models
    Guangyu Xie, Yice Zhang, Jianzhu Bao, Qianlong Wang, Yang Sun, Bingbing Wang, Ruifeng Xu
  • One Planner To Guide Them All ! Learning Adaptive Conversational Planners for Goal-oriented Dialogues
    Huy Quang Dao, Lizi Liao
  • Unsupervised Hallucination Detection by Inspecting Reasoning Processes
    Ponhvoan Srey, Xiaobao Wu, Anh Tuan Luu
  • Multimodal Neural Machine Translation: A Survey of the State of the Art
    Yi Feng, Chuanyi Li, Jiatong He, Zhenyu Hou, Vincent Ng
  • Lemmatization of Polish Multi-word Expressions
    Magdalena Król, Aleksander Smywiński-Pohl, Zbigniew Kaleta, Paweł Lewkowicz
  • Targeted Distillation for Sentiment Analysis
    Yice Zhang, Guangyu Xie, Jingjie Lin, Jianzhu Bao, Qianlong Wang, Xi Zeng, Ruifeng Xu
  • DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
    Hao Wang, Hao Li, Junda Zhu, Xinyuan Wang, Chengwei Pan, Minlie Huang, Lei Sha
  • Rank-Awareness and Angular Constraints: A New Perspective on Learning Sentence Embeddings from NLI Data
    Zicheng Zhou, Min Huang, Qinghai Miao
  • LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition
    Qianrui Zhou, Hua Xu, Yifan Wang, Xinzhi Dong, Hanlei Zhang
  • Seeing Culture: A Benchmark for Visual Reasoning and Grounding
    Burak Satar, Zhixin Ma, Patrick Amadeus Irawan, Wilfried Ariel Mulyawan, Jing Jiang, Ee-Peng Lim, Chong-Wah Ngo
  • GRADA: Graph-based Reranking against Adversarial Documents Attack
    Jingjie Zheng, Aryo Pradipta Gema, Giwon Hong, Xuanli He, Pasquale Minervini, Youcheng Sun, Qiongkai Xu
  • Orchestrating Audio: Multi-Agent Framework for Long-Video Audio Synthesis
    Yehang Zhang, Xinli Xu, Xiaojie Xu, Doudou ZHANG, Li Liu, Ying-Cong Chen
  • MADAWSD: Multi-Agent Debate Framework for Adversarial Word Sense Disambiguation
    Kaiyuan Zhang, Qian Liu, Luyang Zhang, Chaoqun Zheng, Shuaimin Li, Bing Xu, Muyun Yang, Xinxiao Qiao, Wenpeng Lu
  • Interpretable Text Embeddings and Text Similarity Explanation: A Survey
    Juri Opitz, Lucas Moeller, Andrianos Michail, Sebastian Padó, Simon Clematide
  • Dyve: Thinking Fast and Slow for Dynamic Process Verification
    Jianyuan Zhong, Zeju Li, Zhijian Xu, Xiangyu Wen, Qiang Xu
  • PERSEVAL: A Framework for Perspectivist Classification Evaluation
    Soda Marem Lo, Silvia Casola, Erhan Sezerer, Valerio Basile, Franco Sansonetti, Antonio Uva, Davide Bernardi
  • Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
    Yuto Harada, Yusuke Yamauchi, Yusuke Oda, Yohei Oseki, Yusuke Miyao, Yu Takagi
  • IndiGEC: Multilingual Grammar Error Correction for Low-Resource Indian Languages
    Ujjwal Sharma, Pushpak Bhattacharyya
  • Bias Beware: The Impact of Cognitive Biases on LLM-Driven Product Recommendations
    Giorgos Filandrianos, Angeliki Dimitriou, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou
  • T2R-BENCH: A Benchmark for Real World Table-to-Report Task
    JieZhangChinaTele, Changzai Pan, Sishi Xiong, Kaiwen Wei, Yu Zhao, xiangyu Li, Jiaxin Peng, Xiaoyan Gu, Jian Yang, Wenhan Chang, Zhenhe Wu, Jiang Zhong, Shuangyong Song, Xuelong Li
  • TCP: a Benchmark for Temporal Constraint-Based Planning
    Zifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, Andreas Vlachos
  • The Role of Outgoing Connection Heterogeneity in Feedforward Layers of Large Language Models
    Felix Stahlberg, Shankar Kumar
  • Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
    Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesh Manocha
  • Collaborative Rational Speech Act: Pragmatic Reasoning for Multi-Turn Dialog
    Lautaro Estienne, Gabriel Ben Zenou, Nona Naderi, Jackie CK Cheung, Pablo Piantanida
  • Understanding Subword Compositionality of Large Language Models
    Qiwei Peng, Yekun Chai, Anders Søgaard
  • Internal Chain-of-Thought: Empirical Evidence for Layer‑wise Subtask Scheduling in LLMs
    Zhipeng Yang, Junzhuo Li, Siyu Xia, Xuming Hu
  • From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
    Viktor Hangya, Fabian Küch, Darina Gold
  • Debiasing Multilingual LLMs in Cross-lingual Latent Space
    Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard
  • Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings
    Max Conti, Manuel Faysse, Gautier Viaud, Antoine Bosselut, CELINE HUDELOT, Pierre Colombo
  • MS-RAG: Simple and Effective Multi-Semantic Retrieval-Augmented Generation
    Xiaozhou You, Yahui Luo, Lihong Gu
  • Transitive self-consistency evaluation of NLI models without gold labels
    Wei Wu, Mark Last
  • MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries
    Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee
  • Enhancing Chinese Offensive Language Detection with Homophonic Perturbation
    Junqi Wu, Jishujie, Kang Zhong, Huiling Peng, Zhendongxiao, Xiongding Liu, Wu Wei
  • Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
    Kimberly Truong, Riccardo Fogliato, Hoda Heidari, Steven Wu
  • Computational Analysis of Character Development in Holocaust Testimonies
    Esther Shizgal, Eitan Wagner, Renana Keydar, Omri Abend
  • TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation
    Daiye Miao, Yufang Liu, Jie Wang, Changzhi Sun, Yunke Zhang, Demei Yan, Shaokang Dong, Qi Zhang, Yuanbin Wu
  • Dual-Path Counterfactual Integration for Multimodal Aspect-Based Sentiment Classification
    Rui Liu, Jiahao Cao, Jiaqian Ren, Xu Bai, Yanan Cao
  • Job Unfair: An Investigation of Gender and Occupational Bias in Free-Form Text Completions by LLMs
    Camilla Casula, Sebastiano Vecellio Salto, Elisa Leonardelli, Sara Tonelli
  • C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
    Chengqian Ma, Wei Tao, Steven Y. Guo
  • Understanding LLMs’ Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From
    Changjiang Gao, Hankun Lin, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Jiajun Chen, Shujian Huang
  • Blind Men and the Elephant: Diverse Perspectives on Gender Stereotypes in Benchmark Datasets
    Mahdi Zakizadeh, Mohammad Taher Pilehvar
  • Linguistic and Embedding-Based Profiling of Texts Generated by Humans and Large Language Models
    Sergio E. Zanotto, Segun Aroyehun
  • An Interdisciplinary Approach to Human-Centered Machine Translation
    Marine Carpuat, Omri Asscher, Kalika Bali, Luisa Bentivogli, Fred Blain, Lynne Bowker, Monojit Choudhury, Hal Daumé III, Kevin Duh, Ge Gao, Alvin C Grissom II, Marzena Karpinska, Elaine C Khoong, William D. Lewis, Andre Martins, Mary Nurminen, Douglas W. Oard, Maja Popovic, Michel Simard, François Yvon
  • Exploring the Hidden Capacity of LLMs for One-Step Text Generation
    Gleb Mezentsev, Ivan Oseledets
  • Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
    Guanghui Song, Dongping Liao, Yiren Zhao, Kejiang Ye, Cheng-zhong Xu, Xitong Gao
  • PathwiseRAG: Multi-Dimensional Exploration and Integration Framework
    Hengrui Zhang, Pin-Siang Huang, Zhen Zhang, Peican Lin, Yao-Ching Yu, Bo Hu, Yulu Du
  • “Mm, Wat?” Detecting Other-initiated Repair Requests in Dialogue
    Anh Ha Ngo, Nicolas Rollet, Catherine Pelachaud, Chloé Clavel
  • R-BPE: Improving BPE-Tokenizers with Token Reuse
    Nancy Hamdan, Osama Rakan Al Mraikhat, Fadi zaraket
  • Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations
    Diogo Tavares, David Semedo, Joao Magalhaes, Alexander Rudnicky
  • Adversarial Attacks Against Automated Fact-Checking: A Survey
    Fanzhen Liu, Sharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Jia Wu, Jian Yang, Quan Z. Sheng
  • WildDoc: How Far Are We from Achieving Comprehensive and Robust Document Understanding in the Wild?
    An-Lan Wang, Jingqun Tang, Lei Liao, Hao Feng, Qi Liu, Xiang Fei, Jinghui Lu, Han Wang, Hao Liu, Yuliang Liu, Xiang Bai, Can Huang
  • DCR: Quantifying Data Contamination in LLMs Evaluation
    Cheng Xu, Nan Yan, Shuhao Guan, Changhong Jin, Yuke Mei, Yibing Guo, Tahar Kechadi
  • Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency
    Svetlana Maslenkova, Clement Christophe, Marco AF Pimentel, Tathagata Raha, Muhammad Umar Salman, Ahmed Al Mahrooqi, Avani Gupta, Shadab Khan, Ronnie Rajan, Praveenkumar Kanithi
  • Surprise Calibration for Better In-Context Learning
    Zhihang Tan, Jingrui Hou, Ping Wang, Qibiao Hu, Peng Zhu
  • SPARK: Simulating the Co-evolution of Stance and Topic Dynamics in Online Discourse with LLM-based Agents
    Bowen Zhang, Yi Yang, Fuqiang Niu, Xianghua Fu, Genan Dai, Hu Huang
  • Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
    Yang Wang, Chenghao Xiao, Chia-Yi Hsiao, Zi Yan Chang, Chi-Li Chen, Tyler Loakman, Chenghua Lin
  • Can Large Language Models be Effective Online Opinion Miners?
    Ryang Heo, Yongsik Seo, JunseongLee, Dongha Lee
  • Can Large Language Models Translate Unseen Languages in Underrepresented Scripts?
    Dianqing Lin, Aruukhan, Hongxu Hou, shuo sun, Wei Chen, Yichen Yang, Guo dong Shi
  • KCS: Diversify Multi-hop Question Generation with Knowledge Composition Sampling
    Yangfan Wang, Jie Liu, Chen Tang, Lian Yan, Jingchi Jiang
  • Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
    Yerin Hwang, Dongryeol Lee, Kyungmin Min, taegwan kang, Yongil Kim, Kyomin Jung
  • Disentangled Information Bottleneck for Adversarial Text Defense
    Yidan Xu, Xinghao Yang, Wei Liu, Bao-di Liu, Weifeng Liu
  • How do Language Models Reshape Entity Alignment? A Survey of LM-Driven EA Methods: Advances, Benchmarks, and Future
    Zerui Chen, huiming fan, Qianyu Wang, Tao He, Ming Liu, Heng Chang, Weijiang Yu, Ze Li, Bing Qin
  • Enhancing LLM-Based Social Bot via an Adversarial Learning Framework
    Fanqi Kong, Xiaoyuan Zhang, Xinyu Chen, Yaodong Yang, Song-Chun Zhu, Xue Feng
  • GER-LLM: Efficient and Effective Geospatial Entity Resolution with Large Language Model
    Haojia Zhu, Zhicheng Li, Jiahui Jin
  • CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion
    Sheng Zhang, Yifan Ding, Shuquan Lian, Shun Song, Hui Li
  • Searching for the Most Human-like Emergent Language
    Brendon Boldt, David R. Mortensen
  • Does Context Matter? A Prosodic Comparison of English and Spanish in Monolingual and Multilingual Discourse Settings
    Debasmita Bhattacharya, David Sasu, Michela Marchini, Natalie Schluter, Julia Hirschberg
  • ZERA: Zero-init Instruction Evolving Refinement Agent – From Zero Instructions to Structured Prompts via Principle-based Optimization
    Seungyoun Yi, Minsoo Khang, Sungrae Park
  • Toward Machine Interpreting: Lessons from Human Interpreting Studies
    Matthias Sperber, Maureen de Seyssel, Jiajun Bao, Matthias Paulik
  • FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
    Jaewoo Ahn, Junseo Kim, Heeseung Yun, Jaehyeon Son, Dongmin Park, Jaewoong Cho, Gunhee Kim
  • FLARE: Faithful Logic-Aided Reasoning and Exploration
    Erik Arakelyan, Pasquale Minervini, Patrick Lewis, Pat Verga, Isabelle Augenstein
  • Discourse-Driven Code-Switching: Analyzing the Role of Content and Communicative Function in Spanish-English Bilingual Speech
    Debasmita Bhattacharya, Juan Junco, Divya Tadimeti, Julia Hirschberg
  • Can Large Language Models Translate Spoken-Only Languages through International Phonetic Transcription?
    Jiale Chen, Xuelian Dong, Qihao Yang, Wenxiu Xie, Tianyong Hao
  • ClimateViz: A Benchmark for Statistical Reasoning and Fact Verification on Scientific Charts
    Ruiran Su, Jiasheng Si, Zhijiang Guo, Janet B. Pierrehumbert
  • Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment
    Hyuntae Park, Yeachan Kim, SangKeun Lee
  • SLlama: Parameter-Efficient Language Model Architecture for Enhanced Linguistic Competence Under Strict Data Constraints
    Victor Adelakun Omolaoye, Babajide Alamu Owoyele, Gerard de Melo
  • What You See is What You Ask: Evaluating Audio Descriptions
    Divy Kala, Eshika Khandelwal, Makarand Tapaswi
  • TAPS: Tool-Augmented Personalisation via Structured Tagging
    Ekaterina Taktasheva, Jeff Dalton
  • Investigating How Pre-training Data Leakage Affects Models’ Reproduction and Detection Capabilities
    Masahiro Kaneko, Timothy Baldwin
  • Walk and Read Less: Improving the Efficiency of Vision-and-Language Navigation via Tuning-Free Multimodal Token Pruning
    Wenda Qin, Andrea Burns, Bryan A. Plummer, Margrit Betke
  • Connecting the Knowledge Dots: Retrieval-augmented Knowledge Connection for Commonsense Reasoning
    Junho Kim, Soyeon Bak, Mingyu Lee, Minju Hong, Songha Kim, Tae-Eui Kam, SangKeun Lee
  • Agent-as-Judge for Factual Summarization of Long Narratives
    Yeonseok Jeong, Minsoo Kim, seung-won hwang, Byung-Hak Kim
  • DnDScore: Decontextualization and Decomposition for Factuality Verification in Long-Form Text Generation
    Miriam Wanner, Benjamin Van Durme, Mark Dredze
  • RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
    Alberto Testoni, Barbara Plank, Raquel Fernández
  • Resource-Rational Noisy-Channel Language Processing: Testing the Effect of Algorithmic Constraints on Inferences
    Thomas Hikaru Clark, Jacob Hoover Vigly, Edward Gibson, Roger P. Levy
  • In Benchmarks We Trust … Or Not?
    Ine Gevers, Victor De Marez, Jens Van Nooten, Jens Lemmens, Andriy Kosar, Ehsan Lotfi, Nikolay Banar, Pieter Fivez, Luna De Bruyne, Walter Daelemans
  • Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents
    Xueqiao Zhang, Chao Zhang, Jingtao Xu, Yifan Zhu, Xin Shi, Yi Yang, Yawei Luo
  • Discriminating Form and Meaning in Multilingual Models with Minimal-Pair ABX Tasks
    Maureen de Seyssel, Jie Chi, Skyler Seto, Maartje Ter Hoeve, Masha Fedzechkina, Natalie Schluter
  • Rethinking Text-based Protein Understanding: Retrieval or LLM?
    Juntong Wu, Zijing Liu, He CAO, Li Hao, Bin Feng, Zishan Shu, Ke Yu, Li Yuan, Yu Li
  • Grounded Semantic Role Labelling from Synthetic Multimodal Data for Situated Robot Commands
    Claudiu Daniel Hromei, Antonio Scaiella, Danilo Croce, Roberto Basili
  • Easy as PIE? Identifying Multi-Word Expressions with LLMs
    Kai Golan Hashiloni, Ofri Hefetz, Kfir Bar
  • Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
    Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye
  • Robust Adaptation of Large Multimodal Models for Retrieval Augmented Hateful Meme Detection
    Jingbiao Mei, Jinghong Chen, Guangyu Yang, Weizhe Lin, Bill Byrne
  • Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
    Xie Zhifei, Mingbao Lin, Zihang Liu, Pengcheng Wu, Shuicheng YAN, Chunyan Miao
  • From perception to production: how acoustic invariance facilitates articulatory learning in a self-supervised vocal imitation model
    Marvin Lavechin, Thomas Hueber
  • REALM: Recursive Relevance Modeling for LLM-based Document Re-Ranking
    Pinhuan Wang, Zhiqiu Xia, Chunhua Liao, Feiyi Wang, Hang Liu
  • PLLuM-Align: Polish Preference Dataset for Large Language Model Alignment
    Karolina Seweryn, Anna Kołos, Agnieszka Karlińska, Katarzyna Lorenc, Katarzyna Dziewulska, Maciej Chrabaszcz, Aleksandra Krasnodębska, Paula Betscher, Zofia Cieślińska, Katarzyna Kowol, Julia Moska, Dawid Motyka, Paweł Walkowiak, Bartosz Żuk, Arkadiusz Janz
  • Graph-R1: Incentivizing the Zero-Shot Graph Learning Capability in LLMs via Explicit Reasoning
    Yicong Wu, Guangyue Lu, Yuan Zuo, Huarong Zhang, Junjie Wu
  • Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration
    Weicheng Ma, John J. Guerrerio, Soroush Vosoughi
  • Can Large Language Models Be Good Language Teachers?
    LiQing Xu, Qiwei Li, Tianshuo Peng, Zuchao Li, hai zhao, Ping Wang
  • Empowering Math Problem Generation and Reasoning for Large Language Model via Synthetic Data based Continual Learning Framework
    Qian Wan, Wangzi Shi, Jintian Feng, Shengyingjie Liu, Luona Wei, Zhicheng Dai, Jianwen Sun
  • Tokenization and Representation Biases in Multilingual Models on Dialectal NLP Tasks
    Vani Kanjirangat, Tanja Samardzic, Ljiljana Dolamic, Fabio Rinaldi
  • Evaluating the Evaluators: Are readability metrics good measures of readability?
    Isabel Cachola, Daniel Khashabi, Mark Dredze
  • Text Takes Over: A Study of Modality Bias in Multimodal Intent Detection
    Ankan Mullick, Saransh Sharma, Abhik Jana, Pawan Goyal
  • What’s in a prompt? Language models encode literary style in prompt embeddings
    Raphaël Sarfati, Haley Moller, Toni J.B. Liu, Nicolas Boulle, Christopher Earls
  • Identifying and Answering Questions with False Assumptions: An Interpretable Approach
    Zijie Wang, Eduardo Blanco
  • VisFinEval: A Scenario-Driven Chinese Multimodal Benchmark for Holistic Financial Understanding
    Zhaowei Liu, Xin Guo, Haotian Xia, Lingfeng Zeng, Fangqi Lou, Jinyi Niu, Mengping Li, Qi Qi, Jiahuan Li, Wei Zhang, Yinglong Wang, Weige Cai, Weining Shen, Liwen Zhang
  • Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions
    David Acuna, Ximing Lu, Jaehun Jung, Hyunwoo Kim, Amlan Kar, Sanja Fidler, Yejin Choi
  • LLMs Don’t Know Their Own Decision Boundaries: The Unreliability of Self-Generated Counterfactual Explanations
    Harry Mayne, Ryan Othniel Kearns, Yushi Yang, Andrew M. Bean, Eoin D. Delaney, Chris Russell, Adam Mahdi
  • Grounding Multilingual Multimodal LLMs With Cultural Knowledge
    Jean de Dieu Nyandwi, Yueqi Song, Simran Khanuja, Graham Neubig
  • Following Length Constraints in Instructions
    Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason E Weston, Jing Xu
  • Memory-QA: Answering Recall Questions Based on Multimodal Memories
    Hongda Jiang, Xinyuan Zhang, Siddhant Garg, Rishab Arora, Shiun-Zu Kuo, Jiayang Xu, AARON COLAK, Xin Luna Dong
  • NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
    Javad Rafiei Asl, Sidhant Narula, Mohammad Ghasemigol, Eduardo Blanco, Daniel Takabi
  • Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
    Simon A. Aytes, Jinheon Baek, Sung Ju Hwang
  • From Language to Cognition: How LLMs Outgrow the Human Language Network
    Badr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Osama A Binhuraib, Antoine Bosselut, Martin Schrimpf
  • Logos as a Well-Tempered Pre-train for Sign Language Recognition
    Ilya Ovodov, Petr Surovtsev, Karina Kvanchiani, Alexander Kapitanov, Alexander Nagaev
  • Hallucination Detection in LLMs Using Spectral Features of Attention Maps
    Jakub Binkowski, Denis Janiak, Albert Sawczyn, Bogdan Gabrys, Tomasz Jan Kajdanowicz
  • Composable Cross-prompt Essay Scoring by Merging Models
    Sanwoo Lee, Kun Liang, Yunfang Wu
  • Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts
    Yuho Lee, Jiaqi Deng, Nicole Hee-Yeon Kim, Hyangsuk Min, Taewon Yun, Minjeong Ban, Kim Yul, Hwanjun Song
  • Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates
    Hy Dang, Tianyi Liu, Zhuofeng Wu, Jingfeng Yang, Haoming Jiang, Tao Yang, Pei Chen, Zhengyang Wang, Helen Wang, Huasheng Li, Bing Yin, Meng Jiang
  • Evaluation and Facilitation of Online Discussions in the LLM Era: A Survey
    Katerina Korre, Dimitris Tsirmpas, Nikos Gkoumas, Emma Cabalé, Danai Myrtzani, Theodoros Evgeniou, Ion Androutsopoulos, John Pavlopoulos
  • Temporal Scaling Law for Large Language Models
    Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Wei Huang, Jianwei Niu, Jungong Han, Guiguang Ding
  • Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models
    Yi Feng, Jiaqi Wang, Wenxuan Zhang, Zhuang Chen, Shen Yutong, Xiyao Xiao, Minlie Huang, Liping Jing, Jian Yu
  • From Word to World: Evaluate and Mitigate Culture Bias in LLMs via Word Association Test
    Xunlian Dai, Li Zhou, Benyou Wang, Haizhou Li
  • Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data
    Shenglai Zeng, Jiankun Zhang, Pengfei He, Jie Ren, Tianqi Zheng, Hanqing Lu, Han Xu, Hui Liu, Yue Xing, Jiliang Tang
  • AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
    Weixiang Zhao, Jiahe Guo, Yulin Hu, Yang Deng, An Zhang, Xingyu Sui, Xinyang Han, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
  • Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities
    Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, Haofen Wang
  • TFDP: Token-Efficient Disparity Audits for Autoregressive LLMs via Single-Token Masked Evaluation
    Inderjeet Singh, Ramya Srinivasan, Roman Vainshtein, Hisashi Kojima
  • Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
    Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li
  • MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition
    Zhongyu Yang, Junhao Song, Siyang Song, Wei Pang, Yingfang Yuan
  • Personality Vector: Modulating Personality of Large Language Models by Model Merging
    Seungjong Sun, Seo Yeon Baek, Jang Hyun Kim
  • Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models
    Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Mingchen Zhuge, Jürgen Schmidhuber
  • Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
    Qianqi Yan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang
  • PrimeX: A Dataset of Worldview, Opinion, and Explanation
    Rik Koncel-Kedziorski, Brihi Joshi, Tim Paek
  • LASER: An LLM-based ASR Scoring and Evaluation Rubric
    Amruta Parulekar, Preethi Jyothi
  • Improving Zero-shot Sentence Decontextualisation with Content Selection and Planning
    Zhenyun Deng, Yulong Chen, Andreas Vlachos
  • Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation
    Jiankun Zhang, Shenglai Zeng, Jie Ren, Tianqi Zheng, Hui Liu, Xianfeng Tang, Hui Liu, Yi Chang
  • Code Execution as Grounded Supervision for LLM Reasoning
    Dongwon Jung, Wenxuan Zhou, Muhao Chen
  • Subjective Behaviors and Preferences in LLM: Language of Browsing
    Sai Sundaresan, Harshita Chopra, Atanu R. Sinha, Koustava Goswami, Nagasai Saketh Naidu, Raghav Karan, N Anushka
  • Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts
    Michal Golovanevsky, William Rudman, Michael A. Lepori, Amir Bar, Ritambhara Singh, Carsten Eickhoff
  • Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
    Benyamin Jamialahmadi, Parsa Kavehzadeh, Mehdi Rezagholizadeh, Parsa Farinneya, Hossein Rajabzadeh, Aref Jafari, Boxing Chen, Marzieh S. Tahaei
  • Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
    Leena Mathur, Marian Qian, Paul Pu Liang, Louis-Philippe Morency
  • Profiler: Black-box AI-generated Text Origin Detection via Context-aware Inference Pattern Analysis
    Hanxi Guo, Siyuan Cheng, Xiaolong Jin, ZHUO ZHANG, Guangyu Shen, Kaiyuan Zhang, Shengwei An, Guanhong Tao, Xiangyu Zhang
  • Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
    Dingdong WANG, Junan Li, Mingyu Cui, Dongchao Yang, Xueyuan Chen, Helen M. Meng
  • RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning
    Kun LI, Yunxiang Li, Tianhua Zhang, Hongyin Luo, Xixin Wu, James R. Glass, Helen M. Meng
  • Mahānāma: A Unique Testbed for Literary Entity Discovery and Linking
    Sujoy Sarkar, Gourav Sarkar, Manoj Balaji Jagadeeshan, Jivnesh Sandhan, Amrith Krishna, Pawan Goyal
  • Adaptively profiling models with task elicitation
    Davis Brown, Prithvi Balehannina, Helen Jin, Shreya Havaldar, Hamed Hassani, Eric Wong
  • TactfulToM: Do LLMs have the Theory of Mind ability to understand White Lies?
    Yiwei Liu, Emma Jane Pretty, Jiahao Huang, Saku Sugawara
  • Don’t Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation
    Colten DiIanni, Daniel Deutsch
  • SMART: Simulated Students Aligned with Item Response Theory for Question Difficulty Prediction
    Alexander Scarlatos, Nigel Fernandez, Christopher Ormerod, Susan Lottridge, Andrew Lan
  • HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America
    Guido Ivetta, Marcos J Gomez, Sofía Martinelli, Pietro Palombini, M Emilia Echeveste, Nair Carolina Mazzeo, Beatriz Busaniche, Luciana Benotti
  • WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
    Rabiul Awal, Mahsa Massoud, Aarash Feizi, Zichao Li, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vazquez, Siva Reddy, Juan A. Rodriguez, Perouz Taslakian, Spandana Gella, Sai Rajeswar
  • Analyzing values about gendered language reform in LLMs’ revisions
    Jules Watson, Xi Wang, Raymond Liu, Suzanne Stevenson, Barend Beekhuizen
  • ALLabel: Three-stage Active Learning for LLM-based Entity Recognition using Demonstration Retrieval
    Zihan Chen, Lei Shi, Weize Wu, Qiji Zhou, Yue Zhang
  • HyperKGR: Knowledge Graph Reasoning in Hyperbolic Space with Graph Neural Network Encoding Symbolic Path
    Lihui Liu
  • LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval
    Yuan Chiang, Elvis Hsieh, Chia-Hong Chou, Janosh Riebesell
  • ReSeeding Latent States for Sequential Language Understanding
    Stéphane Aroca-Ouellette, Katharina von der Wense, Alessandro Roncone
  • DPED: Multi-Layer Noise Distillation for Privacy-Preserving Text Embeddings
    Shuya Feng, Yuan Hong
  • Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
    Mert Inan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani
  • Morpheme Induction for Emergent Language
    Brendon Boldt, David R. Mortensen
  • Stepwise Informativeness Search for Improving LLM Reasoning
    Siyuan Wang, Enda Zhao, Xiang Ren
  • Social Good or Scientific Curiosity? Uncovering the Research Framing Behind NLP Artefacts
    Eric Chamoun, Nedjma Ousidhoum, Michael Sejr Schlichtkrull, Andreas Vlachos
  • FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance
    Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Murali Narayanaswamy, Rashmi Gangadharaiah
  • Contra4: Evaluating Contrastive Cross-Modal Reasoning in Audio, Video, Image, and 3D
    Artemis Panagopoulou, Le Xue, Honglu Zhou, silvio savarese, Ran Xu, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles
  • Proactive Hearing Assistants that Isolate Egocentric Conversations
    Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota
  • fLSA: Learning Semantic Structures in Document Collections Using Foundation Models
    Weijia Xu, Nebojsa Jojic, Nicolas Le Roux
  • SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
    Kaiwen Zhou, Xuandong Zhao, Jayanth Srinivasa, Gaowen Liu, Aosong Feng, Dawn Song, Xin Eric Wang
  • HypER: Literature-grounded Hypothesis Generation and Distillation with Provenance
    Rosni Vasu, Chandrayee Basu, Bhavana Dalvi Mishra, Cristina Sarasua, Peter Clark, Abraham Bernstein
  • Empowering GraphRAG with Knowledge Filtering and Integration
    Kai Guo, Harry Shomer, Shenglai Zeng, Haoyu Han, Yu Wang, Jiliang Tang
  • Interpretable Mnemonic Generation for Kanji Learning via Expectation-Maximization
    Jaewook Lee, Alexander Scarlatos, Andrew Lan
  • Refining Attention for Explainable and Noise-Robust Fact-Checking with Transformers
    Jean-Flavien Bussotti, Paolo Papotti
  • Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
    Seongho Joo, Hyukhun Koh, Kyomin Jung
  • Pathway to Relevance: How Cross-Encoders Implement a Semantic Variant of BM25
    Meng Lu, Catherine Chen, Carsten Eickhoff
  • Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
    Andre Wang He, Daniel Fried, Sean Welleck
  • PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs
    Sana Kang, Myeongseok Gwon, Su Young Kwon, Jaewook Lee, Andrew Lan, Bhiksha Raj, Rita Singh
  • Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
    Sahana Ramnath, ANURAG MUDGIL, Brihi Joshi, Skyler Hallinan, Xiang Ren
  • Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment
    Yunfan Zhang, Kathleen McKeown, Smaranda Muresan
  • CMedCalc-Bench: A Fine-Grained Benchmark for Chinese Medical Calculations in LLM
    Yunyan Zhang, Zhihong Zhu, Xian Wu
  • Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
    Guanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang
  • How Far Can LLMs Improve from Experience? Measuring Test-Time Learning Ability in LLMs with Human Comparison
    Jiayin Wang, Zhiqiang Guo, Weizhi Ma, Min Zhang
  • Subtle Risks, Critical Failures: A Framework for Diagnosing Physical Safety of LLMs for Embodied Decision Making
    Yejin Son, Minseo Kim, Sungwoong Kim, Seungju Han, Jian Kim, Dongju Jang, Youngjae Yu, Chan Young Park
  • SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation
    Aurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He
  • Co-Eval: Augmenting LLM-based Evaluation with Machine Metrics
    Ling-I Wu, Weijie Wu, Minyu Chen, Jianxin Xue, Guoqiang Li
  • Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning
    MinJu Jeon, Si-Woo Kim, Ye-Chan Kim, HyunGee Kim, Dong-Jin Kim
  • Semantic Networks Extracted from Students’ Think-Aloud Data are Correlated with Students’ Learning Performance
    Pingjing Yang, Sullam Jeoung, Jennifer Cromley, Jana Diesner
  • Less is More: The Effectiveness of Compact Typological Language Representations
    York Hay Ng, Phuong Hanh Hoang, En-Shiun Annie Lee
  • Sparse Activation Editing for Reliable Instruction Following in Narratives
    Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Ly, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He
  • Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages
    Asif Shahriar, Rifat Shahriyar, M Saifur Rahman
  • Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension
    Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai
  • OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature
    Alisha Srivastava, Emir Kaan Korukluoglu, Minh Nhat Le, Duyen Tran, Chau Minh Pham, Marzena Karpinska, Mohit Iyyer
  • Enhanced Noun-Noun Compound Interpretation through Textual Enrichment
    Bingyang Ye, Jingxuan Tu, James Pustejovsky
  • ICL CIPHERS: Quantifying “Learning” in In-Context Learning via Substitution Ciphers
    Zhouxiang Fang, Aayush Mishra, Muhan Gao, Anqi Liu, Daniel Khashabi
  • Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
    Yunhao Gou, Hansi Yang, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing HONG, Zhenguo Li, Qun Liu, Bo Han, James Kwok, Yu Zhang
  • Memory OS of AI Agent
    Jiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai
  • Rule Discovery for Natural Language Inference Data Generation Using Out-of-Distribution Detection
    Juyoung Han, Hyunsun Hwang, Changki Lee
  • Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models
    Zesen Lyu, Dandan Zhang, Wei Ye, Fangdi Li, Zhihang Jiang, Yao Yang
  • Definition Generation for Word Meaning Modeling: Monolingual, Multilingual, and Cross-Lingual Perspectives
    Francesco Periti, Roksana Goworek, Haim Dubossarsky, Nina Tahmasebi
  • Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
    Juncheng Wang, Chao Xu, Cheng Yu, Zhe Hu, Haoyu Xie, Guoqi Yu, Lei Shang, Shujun Wang
  • HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization
    Huaqin Zhao, Jiaxi Li, Yi Pan, Shizhe Liang, Xiaofeng Yang, Fei Dou, Tianming Liu, Jin Lu
  • Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation
    Yejin Choi, Jaewoo Park, Janghan Yoon, Saejin Kim, Jaehyun Jeon, Youngjae Yu
  • From Parameters to Performance: A Data-Driven Study on LLM Structure and Development
    Suqing Wang, Zuchao Li, Shi Luohe, Bo Du, hai zhao, Yun Li, Qianren Wang
  • Logical Reasoning with Outcome Reward Models for Test-Time Scaling
    Ramya Keerthy Thatikonda, Wray Buntine, Ehsan Shareghi
  • Speculating LLMs’ Chinese Training Data Pollution from Their Tokens
    Qingjie Zhang, Di Wang, Haoting Qian, Liu Yan, Tianwei Zhang, Ke Xu, Qi Li, Minlie Huang, Hewu Li, Han Qiu
  • NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts
    Abhay Gupta, Kevin Zhu, Vasu Sharma, Sean O’Brien, Michael Lu
  • Weights-Rotated Preference Optimization for Large Language Models
    Chenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin, Siyuan Chen, Weichong Yin, Hua Wu, Weiping Wang
  • The Stepwise Deception: Simulating the Evolution from True News to Fake News with LLM Agents
    Yuhan Liu, Zirui Song, Juntian Zhang, Xiaoqing Zhang, Xiuying Chen, Rui Yan
  • How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
    Kangtao Lv, Haibin Chen, Yujin Yuan, Langming Liu, Shilei Liu, Yongwei Wang, Wenbo Su, Bo Zheng
  • SMEC: Rethinking Matryoshka Representation Learning for Retrieval Embedding Compression
    Biao Zhang, Lixin Chen, Tong Liu, Bo Zheng
  • Reverse Prompt Engineering: A Zero-Shot, Genetic Algorithm Approach to Language Model Inversion
    Hanqing Li, Diego Klabjan
  • DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
    Hang Wu, Hongkai Chen, Yujun Cai, Chang Liu, Qingwen Ye, Ming-Hsuan Yang, Yiwei Wang
  • SocioBench: Modeling Human Behavior in Sociological Surveys with Large Language Models
    jia WANG, Ziyu Zhao, Tingjuntao Ni, zhongyu wei
  • Financial Risk Relation Identification through Dual-view Adaptation
    Wei-Ning Chiu, Yu-Hsiang Wang, Andy Hsiao, Yu-Shiang Huang, Chuan-Ju Wang
  • CopySpec: Accelerating LLMs with Speculative Copy-and-Paste
    Razvan-Gabriel Dumitru, Minglai Yang, Vikas Yadav, Mihai Surdeanu
  • GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression
    Kainan Liu, Yong Zhang, Ning Cheng, Zhitao Li, Shaojun Wang, Jing Xiao
  • GraphAgent: Agentic Graph Language Assistant
    Yuhao Yang, Jiabin Tang, Lianghao Xia, Xingchen Zou, Yuxuan Liang, Chao Huang
  • DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent Collaboration
    Zhihao Jia, Mingyi Jia, Junwen Duan, Jianxin Wang
  • FedMABench: Benchmarking Mobile GUI Agents on Decentralized Heterogeneous User Data
    WenHao Wang, Zijie Yu, Rui Ye, Jianqing Zhang, Guangyi Liu, Liang Liu, Siheng Chen, Yanfeng Wang
  • VLA-Mark: A cross modal watermark for large vision-language alignment models
    Shuliang Liu, Zheng Qi, Jesse Jiaxi Xu, Yibo Yan, He GENG, Junyan Zhang, Aiwei Liu, Peijie Jiang, Jia Liu, Yik-Cheung Tam, Xuming Hu
  • Sentence Smith: Controllable Edits for Evaluating Text Embeddings
    Hongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, Juri Opitz
  • ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
    Yu Sun, Xingyu Qian, Weiwen Xu, Hao Zhang, Chenghao Xiao, Long Li, Yu Rong, Wenbing Huang, Qifeng Bai, Tingyang Xu
  • Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval
    Seongwan park, Taeklim Kim, Youngjoong Ko
  • UICOMPASS: UI Map Guided Mobile Task Automation via Adaptive Action Generation
    Yuanzhang Lin, Zhe Zhang, He Rui, Qingao Dong, Mingyi Zhou, Jing Zhang, Xiang Gao, Hailong Sun
  • Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
    Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh
  • Model Unlearning via Sparse Autoencoder Subspace Guided Projections
    Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou
  • ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning
    Changtai Zhu, Siyin Wang, Ruijun Feng, Kai Song, Xipeng Qiu
  • How to Make Large Language Models Generate 100% Valid Molecules?
    Wen Tao, Jing Tang, Alvin Chan, Bryan Hooi, Baolong Bi, Nanyun Peng, Yuansheng Liu, Yiwei Wang
  • Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining
    Jianzhu Bao, Yuqi Huang, Yang Sun, Wenya Wang, Yice Zhang, Bojun Jin, Ruifeng Xu
  • Dynamic Jointly Batch Selection for Data Efficient Machine Translation Fine-Tuning
    Mohammad Amin Ghanizadeh, Mohammad Javad Dousti
  • 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark
    Ivan Sviridov, Amina Miftahova, Tereshchenko Artemiy Vladimirovich, Galina Zubkova, Pavel Blinov, Andrey Savchenko
  • OpenTuringBench: An Open-Model-based Benchmark and Framework for Machine-Generated Text Detection and Attribution
    Lucio La Cava, Andrea Tagarelli
  • CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios
    Shiting Huang, Zhen Fang, Zehui Chen, Siyu Yuan, Junjie Ye, Yu Zeng, Lin Chen, Qi Mao, Feng Zhao
  • Pre-trained Language Models Learn Remarkably Accurate Representations of Numbers
    Marek Kadlčík, Michal Štefánik, Timothee Mickus, Josef Kuchař, Michal Spiegel
  • Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
    Yu Zeng, Yukun Qi, Yiming Zhao, Xikun Bao, Lin Chen, Zehui Chen, Shiting Huang, Jie Zhao, Feng Zhao
  • Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
    António Farinhas, Nuno M Guerreiro, Sweta Agrawal, Ricardo Rei, Andre Martins
  • iVISPAR — An Interactive Visual-Spatial Reasoning Benchmark for VLMs
    Julius Mayer, Mohamad Ballout, Serwan Jassim, Farbod Nosrat Nezami, Elia Bruni
  • Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
    Omer Nahum, Nitay Calderon, Orgad Keller, Idan Szpektor, Roi Reichart
  • Detecting Legal Citations in United Kingdom Court Judgments
    Holli Sargeant, Andreas Östling, Måns Magnusson
  • Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements
    Guangxiang Zhao, Saier Hu, Xiaoqi Jian, Wu Jinzhu, Yuhan Wu, Lin Sun, Xiangzheng Zhang
  • Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency
    Ehsan Doostmohammadi, Marco Kuhlmann
  • Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
    Pedro Henrique Luz de Araujo, Paul Röttger, Dirk Hovy, Benjamin Roth
  • HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging
    Taha Ceritli, Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyenghun Lee, Hyeonmok Ko, Umberto Michieli
  • Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
    Senjie Jin, Lu Chen, Zhiheng Xi, Yuhui Wang, Sirui Song, Yuhao Zhou, Xinbo Zhang, peng sun, Hong Lu, Tao Gui, Qi Zhang, Xuanjing Huang
  • Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
    Songsheng Wang, Rucheng Yu, Zhihang Yuan, Chao Yu, Feng Gao, Yu Wang, Derek F. Wong
  • Leveraging Text-to-Text Transformers as Classifier Chain for Few-Shot Multi-Label Classification
    Quang Anh Nguyen, Nadi Tomeh, Mustapha Lebbah, Thierry Charnois, Hanane AZZAG
  • M-Wanda: Improving One-Shot Pruning for Multilingual LLMs
    Rochelle Choenni, Ivan Titov
  • Beyond Hate Speech: NLP’s Challenges and Opportunities in Uncovering Dehumanizing Language
    Hamidreza Saffari, Mohammadamin Shafiei, Hezhao Zhang, Lasana T. Harris, Nafise Sadat Moosavi
  • Conflict-Aware Soft Prompting for Retrieval-Augmented Generation
    Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee
  • R-CHAR: A Metacognition-Driven Framework for Role-Playing in Large Language Models
    Haiming Qin, Jiwei Zhang, Wei Zhang, KeZhong Lu, Mingyang Zhou, Hao Liao, Rui Mao
  • Annotating Training Data for Conditional Semantic Textual Similarity Measurement using Large Language Models
    Gaifan Zhang, Yi Zhou, Danushka Bollegala
  • When Words Smile: Generating Diverse Emotional Facial Expressions from Text
    Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Erik Cambria, Min Zhang, Hao Fei
  • Improving Online Job Advertisement Analysis via Compositional Entity Extraction
    Kai Krüger, Stefan Winnige, Alan Akbik, Johanna Binnewitt, Kathrin Ehmann
  • Correlation-Aware Example Selection for In-Context Learning with Nonsymmetric Determinantal Point Processes
    Qiunan Du, Zhiliang Tian, Zhen Huang, Kailun Bian, Tianlun Liu, Zhaoning Zhang, Xinwang Liu, Feng Liu, Dongsheng Li
  • Leveraging Cognitive Complexity of Texts for Contextualization in Dense Retrieval
    Effrosyni Sokli, Georgios Peikos, Pranav Kasela, Gabriella Pasi
  • Beyond Online Sampling: Bridging Offline-to-Online Alignment via Dynamic Data Transformation for LLMs
    Zhang Zhang, Guhao Feng, Jian Guan, Di He, Wei Wu
  • CAVE: Detecting and Explaining Commonsense Anomalies in Visual Environments
    Rishika Bhagwatkar, Syrielle Montariol, Angelika Romanou, Beatriz Borges, Irina Rish, Antoine Bosselut
  • Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training
    Linjuan Wu, Hao-Ran Wei, Huan Lin, Tianhao Li, Baosong Yang, Fei Huang, Weiming Lu
  • SemVink: Advancing VLMs’ Semantic Understanding of Optical Illusions via Visual Global Thinking
    Sifan Li, Yujun Cai, Yiwei Wang
  • Order Doesn’t Matter, But Reasoning Does: Training LLMs with Order-Centric Augmentation
    Qianxi He, Qianyu He, Jiaqing Liang, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao
  • Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models
    Alessandro De Bellis, Salvatore Bufi, Giovanni Servedio, Vito Walter Anelli, Tommaso Di Noia, Eugenio Di Sciascio
  • Extracting Linguistic Information from Large Language Models: Syntactic Relations and Derivational Knowledge
    Tsedeniya Kinfe Temesgen, Marion Di Marco, Alexander Fraser
  • Beyond Correctness: Confidence-Aware Reward Modeling for Enhancing Large Language Model Reasoning
    Qianxi He, Qingyu Ren, Shanzhe Lei, Xuhong Wang, Yingchun Wang
  • TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent
    Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp
  • Frequency & Compositionality in Emergent Communication
    Jean-Baptiste Sevestre, Emmanuel Dupoux
  • Summarizing Speech: A Comprehensive Survey
    Fabian Retkowski
  • CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards
    Cheng Liu, Yifei Lu, Fanghua Ye, Jian Li, Xingyu Chen, Feiliang Ren, Zhaopeng Tu, Xiaolong Li
  • Assay2Mol: Large Language Model-based Drug Design Using BioAssay Context
    Yifan Deng, Spencer S Ericksen, Anthony Gitter
  • Frame First, Then Extract: A Frame-Semantic Reasoning Pipeline for Zero-Shot Relation Triplet Extraction
    Zehan Li, Fu Zhang, Wenqing Zhang, Jiawei Li, Zhou Li, Jingwei Cheng, Tianyue Peng
  • MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
    Yahan Yang, Soham Dan, Shuo Li, Dan Roth, Insup Lee
  • TALON: A Multi-Agent Framework for Long-Table Exploration and Question Answering
    Ruochun Jin, Xiyue Wang, Dong Wang, Haoqi Zheng, Yunpeng Qi, Silin Yang, Meng Zhang
  • You Are What You Train: Effects of Data Composition on Training Context-aware Machine Translation Models
    Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis
  • Improving Neutral Point-of-View Generation with Data- and Parameter-Efficient RL
    Jessica Hoffmann, Christiane Ahlheim, Zac Yu, Aria Walfrand, Jarvis Jin, Marie Tano, Ahmad Beirami, Erin MacMurray van Liemt, Nithum Thain, Hakim Sidahmed, Lucas Dixon
  • Randomized Smoothing Meets Vision-Language Models
    Emmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng
  • PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
    Matthew Zent, Digory Smith, Simon Woodhead
  • Trustworthy Medical Question Answering: An Evaluation-Centric Survey
    Yinuo Wang, Baiyang Wang, Robert E. Mercer, Frank Rudzicz, Sudipta Singha Roy, Pengjie Ren, Zhumin Chen, Xindi Wang
  • Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning
    Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider
  • BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
    Pierre Andrews, Mikel Artetxe, Mariano Coria Meglioli, Marta R. Costa-jussà, Joe Chuang, David Dale, Mark Duppenthaler, Nathanial Paul Ekberg, Cynthia Gao, Daniel Edward Licht, Jean Maillard, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Eduardo Sánchez, Ioannis Tsiamas, Arina Turkatenko, Albert Ventayol-Boada, Shireen Yates
  • HealthCards: Exploring Text-to-Image Generation as Visual Aids for Healthcare Knowledge Democratizing and Education
    Qian Wu, Zheyao Gao, Longfei Gou, Yifan Hou, Qi Dou
  • When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs
    Ammar Khairi, Daniel D’souza, Ye Shen, Julia Kreutzer, Sara Hooker
  • Creativity in LLM-based Multi-Agent Systems: A Survey
    Yi-Cheng Lin, Kang-Chieh Chen, Zhe-Yan Li, Tzu-Heng Wu, Tzu-Hsuan Wu, Kuan-Yu Chen, Hung-yi Lee, Yun-Nung Chen
  • Context and POS in Action: A Comparative Study of Chinese Homonym Disambiguation in Human and Language Models
    XIE Chenwei, Matthew King-Hang Ma, Wenbo Wang, William Shiyuan Wang
  • Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models
    Piotr Przybyła, Euan McGill, Horacio Saggion
  • Leveraging Loanword Constraints for Improving Machine Translation in a Low-Resource Multilingual Context
    Felermino D. M. A. Ali, Henrique Lopes Cardoso, Rui Sousa-Silva
  • Linguistic Neuron Overlap Patterns to Facilitate Cross-lingual Transfer on Low-resource Languages
    Yuemei Xu, Kexin Xu, Jian Zhou, Ling Hu, Lin Gui
  • Scaling Low-Resource MT via Synthetic Data Generation with LLMs
    Ona De Gibert Bonet, Joseph Attieh, Teemu Vahtola, Mikko Aulamo, Zihao Li, Raúl Vázquez, Tiancheng Hu, Jörg Tiedemann
  • Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective
    Da Li, Keping Bi, Jiafeng Guo, Xueqi Cheng
  • Randomly Removing 50% of Dimensions in Text Embeddings has Minimal Impact on Retrieval and Classification Tasks
    Sotaro Takeshita, Yurina Takeshita, Daniel Ruffinelli, Simone Paolo Ponzetto
  • Morables: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
    Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, Jose Camacho-Collados, Mohammad Taher Pilehvar
  • MessIRve: A Large-Scale Spanish Information Retrieval Dataset
    Francisco Valentini, Viviana Cotik, Damián Furman, Ivan Bercovich, Edgar Altszyler, Juan Manuel Pérez
  • AFRIDOC-MT: Document-level MT Corpus for African Languages
    Jesujoba Oluwadara Alabi, Israel Abebe Azime, Miaoran Zhang, Cristina España-Bonet, Rachel Bawden, Dawei Zhu, David Ifeoluwa Adelani, Clement Oyeleke Odoje, Idris Akinade, Iffat Maab, Davis David, Shamsuddeen Hassan Muhammad, Neo Putini, David O. Ademuyiwa, Andrew Caines, Dietrich Klakow
  • Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead
    Jesujoba Oluwadara Alabi, Michael A. Hedderich, David Ifeoluwa Adelani, Dietrich Klakow
  • GLIMPSE: Do Large Vision-Language Models Truly Think With Videos or Just Glimpse at Them?
    Yiyang Zhou, Linjie Li, Shi Qiu, Zhengyuan Yang, Yuyang Zhao, Siwei Han, Yangfan He, Kangqi Li, Haonian Ji, Zihao Zhao, Haibo Tong, Lijuan Wang, Huaxiu Yao
  • Social Bias in Multilingual Language Models: A Survey
    Lance Calvin Lim Gamboa, Yue Feng, Mark G. Lee
  • BYOKG-RAG: Multi-Strategy Graph Retrieval for Knowledge Graph Question Answering
    Costas Mavromatis, Soji Adeshina, Vassilis N. Ioannidis, Zhen Han, Qi Zhu, Ian Robinson, Bryan Thompson, Huzefa Rangwala, George Karypis
  • Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
    Avijit Mitra, Zhichao Yang, Emily Druhl, Raelene Goodwin, hong yu
  • Pun Unintended: LLMs and the Illusion of Humor Understanding
    Alessandro Zangari, Matteo Marcuzzo, Andrea Albarelli, Mohammad Taher Pilehvar, Jose Camacho-Collados
  • RACCooN: Versatile Instructional Video Editing with Auto-Generated Narratives
    Jaehong Yoon, Shoubin Yu, Mohit Bansal
  • Pre-trained Models Perform the Best When Token Distributions Follow Zipf’s Law
    Yanjin He, Qingkai Zeng, Meng Jiang
  • Do RAG Systems Really Suffer From Positional Bias?
    Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri
  • Aspect-Oriented Summarization for Psychiatric Short-Term Readmission Prediction
    WonJin Yoon, Boyu Ren, Spencer Thomas, Chanhwi Kim, Guergana K Savova, Mei-Hua Hall, Timothy A. Miller
  • Adapting Bias Evaluation to Domain Contexts using Generative Models
    Tamara Quiroga, Felipe Bravo-Marquez, Valentin Barriere
  • Emergent morpho-phonological representations in self-supervised speech models
    Jon Gauthier, Canaan Breiss, Matthew K Leonard, Edward F. Chang
  • Multilingual Language Model Pretraining using Machine-translated Data
    Jiayi Wang, Yao Lu, Maurice Weber, Max Ryabinin, David Ifeoluwa Adelani, Yihong Chen, Raphael Tang, Pontus Stenetorp
  • IntentionFrame: A Semi-Structured, Multi-Aspect Framework for Fine-Grained Conversational Intention Understanding
    Jinggui Liang, Dung Vo, Lizi Liao
  • Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning
    Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal
  • Efficient Compositional Multi-tasking for On-device Large Language Models
    Ondrej Bohdal, Mete Ozay, Jijoong Moon, Kyenghun Lee, Hyeonmok Ko, Umberto Michieli
  • Improving Large Language Model Safety with Contrastive Representation Learning
    Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
  • Leveraging What’s Overfixed: Post-Correction via LLM Grammatical Error Overcorrection
    Taehee Park, Heejin Do, Gary Lee
  • Scaling Up Temporal Domain Generalization via Temporal Experts Averaging
    Aoming Liu, Kevin Miller, Venkatesh Saligrama, Kate Saenko, Boqing Gong, Ser-Nam Lim, Bryan A. Plummer
  • LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder
    Yi Jing, Zijun Yao, Hongzhu Guo, Lingxu Ran, Xiaozhi Wang, Lei Hou, Juanzi Li
  • The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models
    Adrian Cosma, Stefan Ruseti, Emilian Radoi, Mihai Dascalu
  • Improving the Quality of Web-mined Parallel Corpora of Low-Resource Languages using Debiasing Heuristics
    Surangika Ranathunga, Aloka Fernando, Menan Velayuthan, Charitha Rathnayaka, Nisansa de Silva
  • Weaver: Interweaving SQL and LLM for Table Reasoning
    Rohit Khoja, Devanshu Gupta, Yanjie Fu, Dan Roth, Vivek Gupta
  • ECO Decoding: Entropy-Based Control for Controllability and Fluency in Controllable Dialogue Generation
    Seungmin Shin, Dooyoung Kim, Youngjoong Ko
  • Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
    Antara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis
  • Unsupervised Concept Vector Extraction for Bias Control in LLMs
    Hannah Cyberey, Yangfeng Ji, David Evans
  • Seeing the Same Story Differently: Framing-Divergent Event Coreference for Computational Framing Analysis
    Jin Zhao, Xinrui Hu, Nianwen Xue
  • LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition
    Fan Bai, Hamid Hassanzadeh, Ardavan Saeedi, Mark Dredze
  • COUNTDOWN: Contextually Sparse Activation Filtering Out Unnecessary Weights in Down Projection
    Jaewon Cheon, Pilsung Kang
  • SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
    Chelsi Jain, Yiran Wu, Yifan Zeng, Jiale Liu, Shengyu Dai, Zhenwen Shao, Qingyun Wu, Huazheng Wang
  • VLP: Vision-Language Preference Learning for Embodied Manipulation
    Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li
  • QG-CoC: Question-Guided Chain-of-Captions for Large Multimodal Models
    Kuei-Chun Kao, Hsu Tzu-Yin, Yunqi Hong, Ruochen Wang, Cho-Jui Hsieh
  • EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
    Ashish Seth, Utkarsh Tyagi, Ramaneswaran Selvakumar, Nishit Anand, Sonal Kumar, Sreyan Ghosh, Ramani Duraiswami, Chirag Agarwal, Dinesh Manocha
  • MULTIVOX: A Benchmark for Evaluating Voice Assistants for Multimodal Interactions
    Ramaneswaran Selvakumar, Ashish Seth, Nishit Anand, Utkarsh Tyagi, Sonal Kumar, Sreyan Ghosh, Dinesh Manocha
  • Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
    Minyeong Choe, Haehyun Cho, Changho Seo, Hyunil Kim
  • Probing Narrative Morals: A New Character-Focused MFT Framework for Use with Large Language Models
    Luca Mitran, Sophie Wu, Andrew Piper
  • Probing and Boosting Large Language Models Capabilities via Attention Heads
    Dezhi Zhao, Xiaocheng Feng, Xin Liu, Hui Wang, Bing Qin
  • A Survey of Link Prediction in N-ary Knowledge Graphs
    Jiyao Wei, Saiping Guan, Da Li, Zhongni Hou, Miao Su, Yucan Guo, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng
  • Multi-Frequency Contrastive Decoding: Alleviating Hallucinations for Large Vision-Language Models
    Bingqian Liu, Fu Zhang, Guoqing Chen, Jingwei Cheng
  • ORPP: Self-Optimizing Role-playing Prompts to Enhance Language Model Capabilities
    Yifan Duan, Yihong Tang, Kehai Chen, Liqiang Nie, Min Zhang
  • BrailleLLM: Braille Instruction Tuning with Large Language Models for Braille Domain Tasks
    Tianyuan Huang, Zepeng Zhu, Hangdi Xing, Zirui Shao, Zhi Yu, Chaoxiong Yang, Jiaxian He, Xiaozhong Liu, Jiajun Bu
  • MAviS: A Multimodal Conversational Assistant For Avian Species
    Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal
  • Refining Text Generation for Realistic Conversational Recommendation via Direct Preference Optimization
    Manato Tajiri, Michimasa Inaba
  • Large Language Models Threaten Language’s Epistemic and Communicative Foundations
    Shashank Srivastava
  • Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
    Zhuo Chen, Xinyu Wang, Yong Jiang, Zhen Zhang, Xinyu Geng, Pengjun Xie, Fei Huang, Kewei Tu
  • Multi-view-guided Passage Reranking with Large Language Models
    Jeongwoo Na, Jun Kwon, Eunseong Choi, Jongwuk Lee
  • Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze
    Özge Alacam, Sanne Hoeken, Andreas Säuberli, Hannes Gröner, Diego Frassinelli, Sina Zarrieß, Barbara Plank
  • VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
    Junhyuk Choi, Ro-hoon Oh, Jihwan Seol, Bugeun Kim
  • Explaining Differences Between Model Pairs in Natural Language through Sample Learning
    Advaith Malladi, Rakesh R Menon, Yuvraj Jain, Shashank Srivastava
  • Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions
    Yu-Ang Lee, Guan-Ting Yi, Mei-Yi Liu, Jui-Chao Lu, Guan-Bo Yang, Yun-Nung Chen
  • A Multi-Level Benchmark for Causal Language Understanding in Social Media Discourse
    Xiaohan Ding, Kaike Ping, Buse Çarık, Eugenia Rho
  • Causal Representation Learning from Multimodal Clinical Records under Non-Random Modality Missingness
    Zihan Liang, Ziwen Pan, Ruoxuan Xiong
  • XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering
    Keonwoo Roh, Yeong-Joon Ju, Seong-Whan Lee
  • Transformer-Based Temporal Information Extraction and Application: A Review
    Xin Su, Phillip Howard, Steven Bethard
  • How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
    Ruohao Guo, Wei Xu, Alan Ritter
  • AmpleHate: Amplifying the Attention for Versatile Implicit Hate Detection
    Yejin Lee, Joonghyuk Hahn, Hyeseon Ahn, Yo-Sub Han
  • Can Large Language Models Act as Ensembler for Multi-GNNs?
    Hanqi Duan, Yao Cheng, Jianxiang Yu, Yao Liu, Xiang Li
  • Agent-to-Agent Theory of Mind: Testing Interlocutor Awareness among Large Language Models
    Younwoo Choi, Changling Li, Yongjin Yang, Zhijing Jin
  • From Charts to Fair Narratives: Uncovering and Mitigating Geo-Economic Biases in Chart-to-Text
    Ridwan Mahbub, Mohammed Saidul Islam, Mir Tafseer Nayeem, Md Tahmid Rahman Laskar, Mizanur Rahman, Shafiq Joty, Enamul Hoque
  • Real-time Ad Retrieval via LLM-generative Commercial Intention for Sponsored Search Advertising
    Tongtong Liu, Zhaohui Wang, Meiyue Qin, Zenghui Lu, Xudong Chen, Yuekui Yang, Peng Shu
  • Toward Efficient Sparse Autoencoder-Guided Steering for Improved In-Context Learning in Large Language Models
    Ikhyun Cho, Julia Hockenmaier
  • CLMTracing: Black-box User-level Watermarking for Code Language Model Tracing
    Boyu Zhang, Ping He, Tianyu Du, Xuhong Zhang, LEI YUN, Kingsum Chow, Jianwei Yin
  • The Good, the Bad and the Constructive: Automatically Measuring Peer Review’s Utility for Authors
    Abdelrahman Sadallah, Tim Baumgärtner, Iryna Gurevych, Ted Briscoe
  • Evolving Chinese Spelling Correction with Corrector-Verifier Collaboration
    Linfeng Liu, Hongqiu Wu, hai zhao
  • M2Edit: Locate and Edit Multi-Granularity Knowledge in Multimodal Large Language Model
    Yang Zhou, Pengfei Cao, Yubo Chen, Qingbin Liu, Dianbo Sui, Xi Chen, Kang Liu, Jun Zhao
  • Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions
    Haochen Shi, Shaobo Li, Guoqing Chao, Xiaoliang Shi, Wentao Chen, Zhenzhou Ji
  • Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches
    Alan Ramponi, Marco Rovera, Robert Moro, Sara Tonelli
  • How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination
    Saad Obaid Ul Islam, Anne Lauscher, Goran Glavaš
  • LiTransProQA: An LLM-based Literary Translation Evaluation Metric with Professional Question Answering
    Ran Zhang, Wei Zhao, Lieve Macken, Steffen Eger
  • Improving Handshape Representations for Sign Language Processing: A Graph Neural Network Approach
    Alessa Carbo, Eric Nalisnick
  • Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque
    Oscar Sainz, Naiara Perez, Julen Etxaniz, Joseba Fernandez de Landa, Itziar Aldabe, Iker García-Ferrero, Aimar Zabala, Ekhi Azurmendi, German Rigau, Eneko Agirre, Mikel Artetxe, Aitor Soroa
  • SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks
    Ritam Dutt, Carolyn Rose, Maarten Sap
  • Beyond A Single AI Cluster: A Survey of Decentralized LLM Training
    Haotian Dong, Jingyan Jiang, Rongwei Lu, Jiajun Luo, Jiajun Song, Bowen Li, Ying Shen, Zhi Wang
  • Can LLM Agents Maintain a Persona in Discourse?
    Pranav Bhandari, Nicolas Fay, Michael J Wise, Amitava Datta, Stephanie Meek, Usman Naseem, Mehwish Nasim
  • Iterative Multilingual Spectral Attribute Erasure
    Shun Shao, Yftah Ziser, Zheng Zhao, Yifu QIU, Shay B Cohen, Anna Korhonen
  • TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
    Abir HARRASSE, Philip Quirke, Clement Neo, Dhruv Nathawani, Luke Marks, Amir Abdullah
  • SCRIBE: Structured Chain Reasoning for Interactive Behaviour Explanations using Tool Calling
    Fares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser
  • Logit Space Constrained Fine-Tuning for Mitigating Hallucinations in LLM-Based Recommender Systems
    Jianfeng Deng, Qingfeng Chen, Debo Cheng, Jiuyong Li, Lin Liu
  • PACHAT: Persona-Aware Speech Assistant for Multi-party Dialogue
    Dongjie Fu, Xize Cheng, Linjun Li, Xiaoda Yang, Lujia Yang, Tao Jin
  • Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
    Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha
  • Graph-Guided Textual Explanation Generation Framework
    Shuzhou Yuan, Jingyi Sun, Ran Zhang, Michael Färber, Steffen Eger, Pepa Atanasova, Isabelle Augenstein
  • The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
    Leonardo Bertolazzi, Philipp Mondorf, Barbara Plank, Raffaella Bernardi
  • A Causal Lens for Evaluating Faithfulness Metrics
    Kerem Zaman, Shashank Srivastava
  • Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts
    Yifei Yu, Qian-Wen Zhang, Lingfeng Qiao, di yin, Fang Li, Jie Wang, ChenZengXi, Suncong Zheng, Xiaolong Liang, Xing Sun
  • FISTAPruner: Layer-wise Post-training Pruning for Large Language Models
    Pengxiang Zhao, Hanyu Hu, Ping Li, Yi ZHENG, Zhefeng Wang, Xiaoming Yuan
  • Do LLMs Encode Frame Semantics? Evidence from Frame Identification
    Jayanth Krishna Chundru, Rudrashis Poddar, Jie Cao, Tianyu Jiang
  • StepER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models
    Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu
  • How Does DPO Reduce Toxicity? A Mechanistic Neuron-Level Analysis
    Yushi Yang, Filip Sondej, Harry Mayne, Andrew Lee, Adam Mahdi
  • It’s All About In-Context Learning! Teaching Extremely Low-Resource Languages to LLMs
    Yue Li, Zhixue Zhao, Carolina Scarton
  • Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
    Kwesi Adu Cobbina, Tianyi Zhou
  • Multilingual Pretraining for Pixel Language Models
    Ilker Kesen, Jonas F. Lotz, Ingo Ziegler, Phillip Rust, Desmond Elliott
  • MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
    Gabrielle Kaili-May Liu, Gal Yona, Avi Caciularu, Idan Szpektor, Tim G. J. Rudner, Arman Cohan
  • Machine-generated text detection prevents language model collapse
    George Drayson, Emine Yilmaz, Vasileios Lampos
  • Data-Efficient Hate Speech Detection via Cross-Lingual Nearest Neighbor Retrieval with Limited Labeled Data
    Faeze Ghorbanpour, Daryna Dementieva, Alexander Fraser
  • V-VAE: A Variational Auto Encoding Framework Towards Fine-Grained Control over Human-Like Chat
    Qi Lin, Weikai Xu, Lisi Chen, Bin Dai
  • Mixture of Languages: Improved Multilingual Encoders Through Language Grouping
    João Maria Janeiro, Belen Alastruey, Francisco Massa, Maha Elbayad, Benjamin Piwowarski, Patrick Gallinari, Loic Barrault
  • Too Helpful, Too Harmless, Too Honest or Just Right?
    Gautam Siddharth Kashyap, Mark Dras, Usman Naseem
  • Cardiverse: Harnessing LLMs for Novel Card Game Prototyping
    Danrui Li, Sen Zhang, Samuel S. Sohn, Kaidong Hu, Muhammad Usman, Mubbasir Kapadia
  • Assessing effective de-escalation of crisis conversations using transformer-based models and trend statistics
    Ignacio J. Tripodi, Greg Buda, Margaret Meagher, Elizabeth A. Olson
  • Measuring and Mitigating Media Outlet Name Bias in Large Language Models
    Seong-Jin Park, Kang-Min Kim
  • The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context Learning
    Stephanie Schoch, Yangfeng Ji
  • Where Confabulation Lives: Latent Feature Discovery in LLMs
    Thibaud Ardoin, Yi Cai, Gerhard Wunder
  • Analysing Chain of Thought Dynamics: Active Guidance or Unfaithful Post-hoc Rationalisation?
    Samuel Lewis-Lim, Xingwei Tan, Zhixue Zhao, Nikolaos Aletras
  • Playpen: An Environment for Exploring Learning From Dialogue Game Feedback
    Nicola Horst, Davide Mazzaccara, Antonia Schmidt, Michael Sullivan, Filippo Momentè, Luca Franceschetti, Philipp Sadler, Sherzod Hakimov, Alberto Testoni, Raffaella Bernardi, Raquel Fernández, Alexander Koller, Oliver Lemon, David Schlangen, Mario Giulianelli, Alessandro Suglia
  • GenLink: Generation-Driven Schema-Linking via Multi-Model Learning for Text-to-SQL
    Zhifeng Hao, Junqi Huang, Shaobin Shi, Ruichu Cai, Boyan Xu
  • TSVer: A Benchmark for Fact Verification Against Time-Series Evidence
    Marek Strong, Andreas Vlachos
  • Cross-MoE: An Efficient Temporal Prediction Framework Integrating Textual Modality
    Ruizheng Huang, Zhicheng Zhang, Yong Wang
  • Sparse Autoencoder Features for Classifications and Transferability
    Jack Gallifant, Shan Chen, Kuleen Sasse, Hugo Aerts, Thomas Hartvigsen, Danielle Bitterman
  • KGE Calibrator: An Efficient Probability Calibration Method of Knowledge Graph Embedding Models for Trustworthy Link Prediction
    Yang Yang, Mohan Timilsina, Edward Curry
  • LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models
    Takumi Shibata, Yuichi Miyamura
  • The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness
    Sanad Sha’ban, Nizar Habash
  • Lemmatization as a Classification Task: Results from Arabic across Multiple Genres
    Mostafa Saeed, Nizar Habash
  • A Comprehensive Framework to Operationalize Social Stereotypes for Responsible AI Evaluations
    Aida Mostafazadeh Davani, Sunipa Dev, Héctor Pérez-Urbina, Vinodkumar Prabhakaran
  • Correct-Detect: Balancing Performance and Ambiguity Through the Lens of Coreference Resolution in LLMs
    Amber Shore, Russell Scheinberg, Ameeta Agrawal, So Young Lee
  • GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
    Melissa Kazemi Rad, Alberto Purpura, Himanshu Kumar, Emily Chen, Mohammad Shahed Sorower
  • LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents
    Taro Yano
  • Finetuning LLMs for Human Behavior Prediction in Social Science Experiments
    Akaash Kolluri, Shengguang Wu, Joon Sung Park, Michael S. Bernstein
  • How Private are Language Models in Abstractive Summarization?
    Anthony Hughes, Ning Ma, Nikolaos Aletras
  • Expectation Preference Optimization: Reliable Preference Estimation for Improving the Reasoning Capability of Large Language Models
    Zelin Li, Dawei Song
  • Split-Merge: Scalable and Memory-Efficient Merging of Expert LLMs
    Sruthi Gorantla, Aditya Rawal, Devamanyu Hazarika, Kaixiang Lin, Mingyi Hong, Mahdi Namazifar
  • Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
    Ashwin Ramaswamy, Nestor Demeure, Ermal Rrapaj
  • Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
    Xueqing Peng, Triantafillos Papadopoulos, Efstathia Soufleri, Polydoros Giannouris, Ruoyu Xiang, Yan Wang, Lingfei Qian, Jimin Huang, Qianqian Xie, Sophia Ananiadou
  • TaxoAlign: Scholarly Taxonomy Generation Using Language Models
    Avishek Lahiri, Yufang Hou, Debarshi Kumar Sanyal
  • DiNaM: Disinformation Narrative Mining with Large Language Models
    Witold Sosnowski, Arkadiusz Modzelewski, Kinga Skorupska, Adam Wierzbicki
  • VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM
    Lesheng Jin, Zhenyuan Ruan, Haohui Mai, Jingbo Shang
  • MemeIntel: Explainable Detection of Propagandistic and Hateful Memes
    Mohamed Bayan Kmainasi, Abul Hasnat, Md Arid Hasan, Ali Ezzat Shahroor, Firoj Alam
  • FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean
    Seoyoon Park, Hyeji Choi, Minseon Kim, Subin An, Xiaonan Wang, Gyuri Choi, Hansaem Kim
  • Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
    Mohna Chakraborty, Lu Wang, David Jurgens
  • VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
    Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li
  • UNCLE: Benchmarking Uncertainty Expressions in Long-Form Generation
    Ruihan Yang, Caiqi Zhang, Zhisong Zhang, Xinting Huang, Dong Yu, Nigel Collier, Deqing Yang
  • Enhancing Study-Level Inference from Clinical Trial Papers via Reinforcement Learning-Based Numeric Reasoning
    Massimiliano Pronesti, Michela Lorandi, Paul Flanagan, Oisín Redmond, Anya Belz, Yufang Hou
  • Context-aware Biases for Length Extrapolation
    Ali veisi, Hamidreza Amirzadeh, Amir M. Mansourian
  • AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists
    Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun
  • Finding your MUSE: Mining Unexpected Solutions Engine
    Nir Sweed, Hanit Hakim, Ben Wolfson, Hila Lifshitz, Dafna Shahaf
  • Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs
    Yao Fu, Xianxuan Long, Runchao Li, Haotian Yu, Mu Sheng, Xiaotian Han, Yu Yin, Pan Li
  • Leveraging Knowledge Graph-Enhanced LLMs for Context-Aware Medical Consultation
    Su-Hyeong Park, Ho-Beom Kim, Seong-Jin Park, Dinara Aliyeva, Kang-Min Kim
  • Reflective Agreement: Combining Self-Mixture of Agents with a Sequence Tagger for Robust Event Extraction
    Fatemeh Haji, Mazal Bethany, Cho-Yu Jason Chiang, Anthony Rios, Peyman Najafirad
  • Simple Yet Effective: An Information-Theoretic Approach to Multi-LLM Uncertainty Quantification
    Maya Kruse, Majid Afshar, Saksham Khatwani, Anoop Mayampurath, Guanhua Chen, Yanjun Gao
  • Exploring morphology-aware tokenization: A case study on Spanish language modeling
    Alba Táboas García, Piotr Przybyła, Leo Wanner
  • Studying Rhetorically Ambiguous Questions
    Oghenevovwe Ikumariegbe, Eduardo Blanco, Ellen Riloff
  • Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
    Xiaoyuan Wu, Weiran Lin, Omer Akgul, Lujo Bauer
  • Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
    DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu
  • Improving Rule-based Reasoning in LLMs using Neurosymbolic Representations
    Varun Dhanraj, Chris Eliasmith
  • Can LLMs Extract Frame-Semantic Arguments?
    Jacob Devasier, Rishabh Mediratta, Chengkai Li
  • Accelerated Test-Time Scaling with Model-Free Speculative Sampling
    Woomin Song, Saket Dingliwal, Sai Muralidhar Jayanthi, Bhavana Ganesh, Jinwoo Shin, Aram Galstyan, Sravan Babu Bodapati
  • Enhancing RLHF with Human Gaze Modeling
    Karim Galliamov, Ivan Titov, Ilya Pershin
  • Mapping semantic networks to Dutch word embeddings as a diagnostic tool for cognitive decline
    Maithe van Noort, Michal Korenar, Jelke Bloem
  • CausalVLBench: Benchmarking Visual Causal Reasoning in Large Vision-Language Models
    Aneesh Komanduri, Karuna Bhaila, Xintao Wu
  • Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd Simulations
    Yunzhe Wang, Gale Lucas, Burcin Becerik-Gerber, Volkan Ustun
  • Are Language Models Consequentialist or Deontological Moral Reasoners?
    Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita, Rada Mihalcea, Bernhard Schölkopf, Zhijing Jin
  • PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
    Yongmin Yoo, Qiongkai Xu, Longbing Cao
  • All for One: LLMs Solve Mental Math at the Last Token With Information Transferred From Other Tokens
    Siddarth Mamidanna, Daking Rai, Ziyu Yao, Yilun Zhou
  • A Position Paper on the Automatic Generation of Machine Learning Leaderboards
    Roelien C. Timmer, Yufang Hou, Stephen Wan
  • SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models
    Amirhossein Dabiriaghdam, Lele Wang
  • SERVAL: Surprisingly Effective Zero-Shot Visual Document Retrieval Powered by Large Vision and Language Models
    Thong Nguyen, Yibin Lei, Jia-Huei Ju, Andrew Yates
  • Meta-Semantics Augmented Few-Shot Relational Learning
    Han Wu, Jie Yin
  • ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
    Rui Wang, Bohao Li, Xiyang Dai, Jianwei Yang, Yi-Ling Chen, Zhen Xing, Yifan Yang, Dongdong Chen, Xipeng Qiu, Zuxuan Wu, Yu-Gang Jiang
  • ModelCitizens: Representing Community Voices in Online Safety
    Ashima Suvarna, Christina A Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel
  • UnifiedVisual: A Framework for Constructing Unified Vision-Language Datasets
    Pengyu Wang, Shaojun Zhou, Chenkun Tan, Xinghao Wang, Wei Huang, Zhen Ye, Zhaowei Li, Botian Jiang, Dong Zhang, Xipeng Qiu
  • The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
    Suhas BN, Yash Mahajan, Dominik O. Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Christopher Wiese, Saeed Abdullah
  • Is Cognition Consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
    Zirui Shao, Feiyu Gao, Zhaoqing Zhu, Chuwei Luo, Hangdi Xing, Zhi Yu, Qi Zheng, Ming Yan, Jiajun Bu
  • AutoCT: Automating Interpretable Clinical Trial Prediction with LLM Agents
    Fengze Liu, Haoyu Wang, Joonhyuk Cho, Dan Roth, Andrew Lo
  • MMDocIR: Benchmarking Multimodal Retrieval for Long Documents
    Kuicai Dong, Yujing Chang, Derrick Goh Xin Deik, Dexun Li, Ruiming Tang, Yong Liu
  • Program of Thoughts for Financial Reasoning: Leveraging Dynamic In-Context Examples and Generative Retrieval
    Subhendu Khatuya, Shashwat Naidu, Pawan Goyal, Niloy Ganguly
  • Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments
    Muhammad Ali, Salman Khan
  • Demystifying Domain-adaptive Post-training for Financial LLMs
    Zixuan Ke, Yifei Ming, Xuan-Phi Nguyen, Caiming Xiong, Shafiq Joty
  • HICode: Hierarchical Inductive Coding with LLMs
    Mian Zhong, Pristina Wang, Anjalie Field
  • Cacheback: Speculative Decoding With Nothing But Cache
    Zhiyao Ma, In Gim, Lin Zhong
  • MA-DPR: Manifold-aware Distance Metrics for Dense Passage Retrieval
    Yifan Liu, Qianfeng Wen, Mark Zhao, Jiazhou Liang, Scott Sanner
  • LLM-Guided Co-Training for Text Classification
    Md Mezbaur Rahman, Cornelia Caragea
  • LeanK: Learnable K Cache Channel Pruning for Efficient Decoding
    Yike Zhang, Zhiyuan He, Huiqiang Jiang, Chengruidong Zhang, Yuqing Yang, Jianyong Wang, Lili Qiu
  • DELOC: Document Element Localizer
    Hammad Ayyubi, Puneet Mathur, Mehrab Tanjim, Vlad I Morariu
  • NL2Lean: Translating Natural Language into Lean 4 through Multi-Aspect Reinforcement Learning
    Yue Fang, Shaohan Huang, Xin Yu, Haizhen Huang, Zihan Zhang, Weiwei Deng, Furu Wei, Feng Sun, Qi Zhang, Zhi Jin
  • A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
    Sunayana Sitaram, Adrian de Wynter, Isobel McCrum, Qilong Gu, Si-Qing Chen
  • X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning
    Prasanna Reddy Pulakurthi, Jiamian Wang, MAJID RABBANI, Sohail Dianat, Raghuveer Rao, Zhiqiang Tao
  • Token-level Proximal Policy Optimization for Query Generation
    Yichen Ouyang, Lu Wang, Fangkai Yang, Pu Zhao, Chenghua Huang, Jianfeng Liu, Bochen Pang, Yaming Yang, Yuefeng Zhan, Hao Sun, Qingwei Lin, Saravan Rajmohan, Weiwei Deng, Dongmei Zhang, Feng Sun
  • Prior Prompt Engineering for Reinforcement Fine-Tuning
    Pittawat Taveekitworachai, Potsawee Manakul, Sarana Nutanong, Kunat Pipatanakul
  • Beyond WER: Probing Whisper’s Sub‑token Decoder Across Diverse Language Resource Levels
    Siyu Liang, Nicolas Ballier, Gina-Anne Levow, Richard Wright
  • ThinkTuning: Instilling Cognitive Reflections without Distillation
    Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou
  • Droid: A Resource Suite for AI-Generated Code Detection
    Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov
  • LoRACoE: Improving Large Language Model via Composition-based LoRA Expert
    Guanyu Li, Zhiheng Xi, Zhihao Zhang, Boyang Hong, Tao Gui, Qi Zhang, Xuanjing Huang
  • Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
    Tingchen Fu, Fazl Barez
  • Pluralistic Alignment for Healthcare: A Role-Driven Framework
    Jiayou Zhong, Anudeex Shetty, Chao Jia, Xuanrui Lin, Usman Naseem
  • Flexible-length Text Infilling for Discrete Diffusion Models
    Andrew Zhang, Anushka Sivakumar, Chia-Wei Tang, Chris Thomas
  • Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing
    Sabri Boughorbel, Fahim Dalvi, Nadir Durrani, Majd Hawasly
  • Explicit Learning and the LLM in Machine Translation
    Malik Marmonier, Rachel Bawden, Benoît Sagot
  • Towards Language-Agnostic STIPA: Universal Phonetic Transcription to Support Language Documentation at Scale
    Jacob Lee Suchardt, Hana El-Shazli, Pierluigi Cassotti
  • Beyond Pairwise: Global Zero-shot Temporal Graph Generation
    Alon Eirew, Kfir Bar, Ido Dagan
  • “Feels Feminine to Me”: Understanding Perceived Gendered Style through Human Annotations
    Hongyu Chen, Neele Falk, Michael Roth, Agnieszka Falenska
  • RALS: Resources and Baselines for Romanian Automatic Lexical Simplification
    Fabian Anghel, Cristea Petru-Theodor, Claudiu Creanga, Sergiu Nisioi
  • How Do Social Bots Participate in Misinformation Spread? A Comprehensive Dataset and Analysis
    Herun Wan, Minnan Luo, Zihan Ma, Guang Dai, Xiang Zhao
  • Are Stereotypes Leading LLMs’ Zero-Shot Stance Detection?
    Anthony Dubreuil, Antoine Gourru, Christine Largeron, Amine Trabelsi
  • Multi-Modal Framing Analysis of News
    Arnav Arora, Srishti Yadav, Maria Antoniak, Serge Belongie, Isabelle Augenstein
  • TempParaphraser: “Heating Up” Text to Evade AI-Text Detection through Paraphrasing
    Junjie Huang, Ruiquan Zhang, Jinsong Su, Yidong Chen
  • ComicScene154: A Scene Dataset for Comic Analysis
    Sandro Paval, Pascal Meißner, Ivan P. Yamshchikov
  • MedLinkDE – MedDRA Entity Linking for German with Guided Chain of Thought Reasoning
    Roman Christof, Farnaz Zeidi, Manuela Messelhäußer, Dirk Mentzer, Renate Koenig, Liam Childs, Alexander Mehler
  • HookMoE: A learnable performance compensation strategy of Mixture-of-Experts for LLM inference acceleration
    Cheng Longkai, Along He, Mulin Li, Xie xueshuo, Tao Li
  • Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction
    Mengying Yuan, WenHao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji
  • 3R: Enhancing Sentence Representation Learning via Redundant Representation Reduction
    Longxuan Ma, Xiao Wu, Yuxin Huang, Shengxiang Gao, Zhengtao Yu
  • When Big Models Train Small Ones: Label-Free Model Parity Alignment for Efficient Visual Question Answering using Small VLMs
    Abhirama Subramanyam Penamakuri, Navlika Singh, Piyush Arora, Anand Mishra
  • ProReason: Multi-Modal Proactive Reasoning with Decoupled Eyesight and Wisdom
    Jingqi Zhou, Sheng Wang, Jingwei Dong, Kai Liu, Lei Li, Jiahui Gao, Jiyue Jiang, Lingpeng Kong, Chuan Wu
  • Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass
    Nicholas Popovic, Michael Färber
  • Structure-Conditional Minimum Bayes Risk Decoding
    Bryan Eikema, Anna Rutkiewicz, Mario Giulianelli
  • Label Set Optimization via Activation Distribution Kurtosis for Zero-Shot Classification with Generative Models
    Yue Li, Zhixue Zhao, Carolina Scarton
  • The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs
    Hinata Tezuka, Naoya Inoue
  • VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
    Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko, Taehwan Kim
  • All Roads Lead to Rome: Graph-Based Confidence Estimation for Large Language Model Reasoning
    Caiqi Zhang, Chang Shu, Ehsan Shareghi, Nigel Collier
  • SEMMA: A Semantic Aware Knowledge Graph Foundation Model
    Arvindh Arun, Sumit Kumar, Mojtaba Nayyeri, Bo Xiong, Ponnurangam Kumaraguru, Antonio Vergari, Steffen Staab
  • Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text
    Mizanur Rahman, Md Tahmid Rahman Laskar, Shafiq Joty, Enamul Hoque
  • Predicting Prosodic Boundaries for Children’s Texts
    Mansi Dhamne, Sneha Raman, Preeti Rao
  • Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision
    Xingwei Tan, Marco Valentino, Mahmud Elahi Akhter, Maria Liakata, Nikolaos Aletras
  • Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique
    Piotr Sawicki, Marek Grzes, Dan Brown, Fabricio Goes
  • Beyond Human Labels: A Multi-Linguistic Auto-Generated Benchmark for Evaluating Large Language Models on Resume Parsing
    Zijian Ling, Han Zhang, Jiahao Cui, Zhequn Wu, Xu Sun, Guohao Li, Xiangjian He
  • Orthogonal Finetuning Made Scalable
    Zeju Qiu, Weiyang Liu, Adrian Weller, Bernhard Schölkopf
  • AIR: Complex Instruction Generation via Automatic Iterative Refinement
    Wei Liu, Yancheng He, Yu Li, Hui Huang, Chengwei Hu, Jiaheng Liu, Shilong Li, Wenbo Su, Bo Zheng
  • SQUiD: Synthesizing Relational Databases from Unstructured Text
    Mushtari Sadia, Zhenning Yang, Yunming Xiao, Ang Chen, Amrita Roy Chowdhury
  • RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
    Yu Wang, Shiwan Zhao, Zhihu Wang, Ming FAN, Yubo Zhang, Xicheng Zhang, Zhengfan Wang, Heyuan Huang, Ting Liu
  • Rapid Word Learning Through Meta In-Context Learning
    Wentao Wang, Guangyuan Jiang, Tal Linzen, Brenden Lake
  • EuroGEST: Investigating gender stereotypes in multilingual language models
    Jacqueline Rowe, Mateusz Klimaszewski, Liane Guillou, Shannon Vallor, Alexandra Birch
  • How Persuasive Is Your Context?
    Tu Nguyen, Kevin Du, Alexander Miserlis Hoyle, Ryan Cotterell
  • The Medium Is Not the Message: Deconfounding Document Embeddings via Linear Concept Erasure
    Yu Fan, Yang Tian, Shauli Ravfogel, Mrinmaya Sachan, Elliott Ash, Alexander Miserlis Hoyle
  • Measuring scalar constructs in social science with LLMs
    Hauke Licht, Rupak Sarkar, Patrick Y. Wu, Pranav Goel, Niklas Stoehr, Elliott Ash, Alexander Miserlis Hoyle
  • Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization
    Jing Yu, Yibo Zhao, Jiapeng Zhu, Wenming Shao, Bo Pang, Zhao Zhang, Xiang Li
  • Not What the Doctor Ordered: Surveying LLM-based De-identification and Quantifying Clinical Information Loss
    Kiana Aghakasiri, Noopur Zambare, JoAnn Thai, Carrie Ye, Mayur Mehta, J Ross Mitchell, Mohamed Abdalla
  • Reasoning under Uncertainty: Efficient LLM Inference via Unsupervised Confidence Dilution and Convergent Adaptive Sampling
    Zhenning Shi, Yijia Zhu, Yi Xie, Junhan Shi, Guorui Xie, Haotian Zhang, Yong Jiang, Congcong Miao, Qing Li
  • Africa Health Check: Probing Cultural Bias in Medical LLMs
    Charles Nimo, Shuheng Liu, Irfan Essa, Michael L. Best
  • Assumed Identities: Quantifying Gender Bias in Machine Translation of Gender-Ambiguous Occupational Terms
    Orfeas Menis Mastromichalakis, Giorgos Filandrianos, Maria Symeonaki, Giorgos Stamou
  • REVIVING YOUR MNEME: Predicting The Side Effects of LLM Unlearning and Fine-Tuning via Sparse Model Diffing
    Aly M. Kassem, Golnoosh Farnadi, Negar Rostamzadeh, Zhuan Shi
  • ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions
    Matteo Bortoletto, Constantin Ruhdorfer, Andreas Bulling
  • Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
    Grgur Kovač, Jérémy Perez, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer
  • Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions
    Hazel Kim, Tom A. Lamb, Adel Bibi, Philip Torr, Yarin Gal
  • Extending Automatic Machine Translation Evaluation to Book-Length Documents
    Kuang-Da Wang, Shuoyang Ding, Chao-Han Huck Yang, Ping-Chun Hsieh, Wen-Chih Peng, Vitaly Lavrukhin, Boris Ginsburg
  • MedFact: A Large-scale Chinese Dataset for Evidence-based Medical Fact-checking of LLM Responses
    Tong Chen, Zimu Wang, Yiyi Miao, Haoran Luo, Sun Yuanfei, Wei Wang, Zhengyong Jiang, Procheta Sen, Jionglong Su
  • VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
    Yogesh Kulkarni, Pooyan Fazli
  • Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions
    Seyedali Mohammadi, Bhaskara Hanuma Vedula, Hemank Lamba, Edward Raff, Ponnurangam Kumaraguru, Francis Ferraro, Manas Gaur
  • Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
    Oron Anschel, Alon Shoshan, Adam Botach, Shunit Haviv Hakimi, Asaf Gendler, Emanuel Ben Baruch, Nadav Bhonker, Igor Kviatkovsky, Manoj Aggarwal, Gerard Medioni
  • Model-Based Ranking of Source Languages for Zero-Shot Cross-Lingual Transfer
    Abteen Ebrahimi, Adam Wiemerslage, Katharina von der Wense
  • PruneCD: Contrasting Pruned Self Model to Improve Decoding Factuality
    Byeongho Yu, Changhun Lee, Jun-gyu Jin, Eunhyeok Park
  • Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
    Jinfeng Zhou, Yuxuan Chen, Jianing Yin, Yongkang Huang, Yihan Shi, Xikun Zhang, Libiao Peng, Rongsheng Zhang, Tangjie Lv, Zhipeng Hu, Hongning Wang, Minlie Huang
  • The Impact of Language Mixing on Bilingual LLM Reasoning
    Yihao Li, Jiayi Xin, Miranda Muqing Miao, Qi Long, Lyle Ungar
  • VISaGE: Understanding Visual Generics and Exceptions
    Stella Frank, Emily Allaway
  • Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models
    Alex Laitenberger, Christopher D Manning, Nelson F. Liu
  • Discursive Circuits: How Do Language Models Understand Discourse Relations?
    Yisong Miao, Min-Yen Kan
  • Making VLMs More Robot-Friendly: Self-Critical Distillation of Low-Level Procedural Reasoning
    Chan Young Park, Jillian Fisher, Marius Memmel, Dipika Khullar, Seoho Yun, Abhishek Gupta, Yejin Choi
  • ThinkSLM: Towards Reasoning in Small Language Models
    Gaurav Srivastava, Shuxiang Cao, Xuan Wang
  • MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
    Justin Chen, Archiki Prasad, Swarnadeep Saha, Elias Stengel-Eskin, Mohit Bansal
  • Batched Self-Consistency Improves LLM Relevance Assessment and Ranking
    Anton Korikov, Pan Du, Scott Sanner, Navid Rekabsaz
  • SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts
    Marc Felix Brinner, Sina Zarrieß
  • Controlled Generation for Private Synthetic Text
    Zihao Zhao, Anjalie Field
  • Towards AI-Assisted Psychotherapy: Emotion-Guided Generative Interventions
    Kilichbek Haydarov, Youssef Mohamed, Emilio Goldenhersch, Paul OCallaghan, Li-jia Li, Mohamed Elhoseiny
  • From Shortcuts to Balance: Attribution Analysis of Speech-Text Feature Utilization in Distinguishing Original from Machine-Translated Texts
    YONGJIAN CHEN, Antonio Toral
  • DEBATE, TRAIN, EVOLVE: Self‑Evolution of Language Model Reasoning
    Gaurav Srivastava, Zhenyu Bi, Meng Lu, Xuan Wang
  • From Chat Logs to Collective Insights: Aggregative Question Answering
    Wentao Zhang, Woojeong Kim, Yuntian Deng
  • A Text-Based Recommender System that Leverages Explicit Affective State Preferences
    Tonmoy Hasan, Razvan Bunescu
  • CARE: Multilingual Human Preference Learning for Cultural Awareness
    Geyang Guo, Tarek Naous, Hiromi Wakaki, Yukiko Nishimura, Yuki Mitsufuji, Alan Ritter, Wei Xu
  • Multilingual Dialogue Generation and Localization with Dialogue Act Scripting
    Justin Vasselli, Eunike Andriani Kardinata, Yusuke Sakai, Taro Watanabe
  • SUE: Sparsity-based Uncertainty Estimation via Sparse Dictionary Learning
    Tamás Ficsor, Gábor Berend
  • Planning-Aware Code Infilling via Horizon-Length Prediction
    Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang
  • SinhalaMMLU: A Comprehensive Benchmark for Evaluating Multitask Language Understanding in Sinhala
    Ashmari Pramodya, Nirasha Nelki, Heshan Shalinda, Chamila Liyanage, Yusuke Sakai, Randil Pushpananda, Ruvan Weerasinghe, Hidetaka Kamigaito, Taro Watanabe
  • OG-RAG: Ontology-grounded retrieval-augmented generation for large language models
    Kartik Sharma, Peeyush Kumar, Yunqing Li
  • Convergence and Divergence of Language Models under Different Random Seeds
    Finlay Fehlauer, Kyle Mahowald, Tiago Pimentel
  • Analyzing and Modeling LLM Response Lengths with Extreme Value Theory: Anchoring Effects and Hybrid Distributions
    Liuxuan Jiao, Chen Gao, Yiqian Yang, Chenliang Zhou, YiXian Huang, Yong Li, Xinlei Chen
  • Language Models Identify Ambiguities and Exploit Loopholes
    Jio Choi, Mohit Bansal, Elias Stengel-Eskin
  • Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance
    Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang
  • AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models
    Alhanoof Althnian, Norah A. Alzahrani, Shaykhah Z. Alsubaie, Eman Albilali, Ahmed Abdelali, Nouf M. Alotaibi, M Saiful Bari, Yazeed Alnumay, Abdulhamed Alothaimen, Maryam Saif, Shahad D. Alzaidi, Faisal Abdulrahman Mirza, Yousef Almushayqih, Mohammed Al Saleem, Ghadah Alabduljabbar, Abdulmohsen Al-Thubaity, Areeb Alowisheq, Nora Al-Twairesh
  • QUIDS: Query Intent Description for Exploratory Search via Dual Space Modeling
    Yumeng Wang, Xiuying Chen, Suzan Verberne
  • A Systematic Survey of Automatic Prompt Optimization Techniques
    Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang, Haozhu Wang, Han Ding, Yuzhe Lu, Zhichao Xu, Yun Zhou, Balasubramaniam Srinivasan, Qiaojing Yan, Yueyan Chen, Haibo Ding, Panpan Xu, Lin Lee Cheong
  • Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
    Beiduo Chen, Yang Janet Liu, Anna Korhonen, Barbara Plank
  • MemInsight: Autonomous Memory Augmentation for LLM Agents
    Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, MONICA SUNKARA, Yi Zhang, Yassine Benajiba
  • Breaking the Noise Barrier: LLM-Guided Semantic Filtering and Enhancement for Multi-Modal Entity Alignment
    Chenglong Lu, Chenxiao Li, Jingwei Cheng, Yongquan Ji, Guoqing Chen, Fu Zhang
  • ImpliRet: Benchmarking the Implicit Fact Retrieval Challenge
    Zeinab Sadat Taghavi, Ali Modarressi, Yunpu Ma, Hinrich Schuetze
  • No Need for Explanations: LLMs can implicitly learn from mistakes in-context
    Lisa Alazraki, Maximilian Mozes, Jon Ander Campos, Tan Yi-Chern, Marek Rei, Max Bartolo
  • MoVa: Towards Generalizable Classification of Human Morals and Values
    Ziyu Chen, Junfei Sun, Chenxi Li, Tuan Dung Nguyen, Jing Yao, Xiaoyuan Yi, Xing Xie, Chenhao Tan, Lexing Xie
  • GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
    Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu
  • Revealing and Mitigating the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing
    Wenyuan Zhang, Shuaiyi Nie, Jiawei Sheng, Zefeng Zhang, Xinghua Zhang, Yongquan He, Tingwen Liu
  • Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning
    jiazheng liu, Sipeng Zheng, Börje F. Karlsson, Zongqing Lu
  • Graph-Based Multi-Trait Essay Scoring
    Shengjie Li, Vincent Ng
  • Benchmarking LLMs on Semantic Overlap Summarization
    John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, Santu Karmaker
  • N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations
    Siddhant Bikram Shah, Kristina T. Johnson
  • Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction
    Jinwook Park, Kangil Kim
  • Spatial Layouts in News Homepages Capture Human Preferences
    Alexander Spangher, Michael Vu, Arda Kaz, Naitian Zhou, Ben Welsh
  • KRETA: A Benchmark for Korean Reading and Reasoning in Text-Rich VQA Attuned to Diverse Visual Contexts
    Taebaek Hwang, Minseo Kim, Gisang Lee, Seonuk Kim, Hyunjun Eun
  • ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
    Jeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung
  • CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
    Shudong Liu, Hongwei Liu, Junnan Liu, Linchen Xiao, Songyang Gao, Chengqi Lyu, Yuzhe Gu, Wenwei Zhang, Derek F. Wong, Songyang Zhang, Kai Chen
  • A Knowledge-driven Adaptive Collaboration of LLMs for Enhancing Medical Decision-making
    Xiao Wu, Ting-Zhu Huang, Liang-Jian Deng, Yanyuan Qiao, Imran Razzak, Yutong Xie
  • Castle: Causal Cascade Updates in Relational Databases with Large Language Models
    Yongye Su, Yucheng Zhang, Zeru Shi, Bruno Ribeiro, Elisa Bertino
  • Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
    Vishnu Raja, Adithya V Ganesan, Anand Syamkumar, Ritwik Banerjee, H. Schwartz
  • NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
    Kinjal Basu, Ibrahim Abdelaziz, Kiran Kate, Mayank Agarwal, Maxwell Crouse, Yara Rizk, Kelsey Bradford, Asim Munawar, Sadhana Kumaravel, Saurabh Goyal, Xin Wang, Luis A. Lastras, Pavan Kapanipathi
  • Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models
    Md. Atabuzzaman, Ali Asgarov, Chris Thomas
  • Can Large Language Models Unlock Novel Scientific Research Ideas?
    Sandeep Kumar, Tirthankar Ghosal, Vinayak Goyal, Asif Ekbal
  • Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly
    Wenya Xie, Shaochen Zhong, Hoang Anh Duy Le, Zhaozhuo Xu, Jianwen Xie, Zirui Liu
  • DIWALI - Diversity and Inclusivity aWare cuLture specific Items for India: Dataset and Assessment of LLMs for Cultural Text Adaptation in Indian Context
    Maharaj Brahma, Pramit Sahoo, Maunendra Sankar Desarkar
  • SYNC: A Synthetic Long-Context Understanding Benchmark for Controlled Comparisons of Model Capabilities
    Shuyang Cao, Kaijian Zou, Lu Wang
  • OpenNER 1.0: Standardized Open-Access Named Entity Recognition Datasets in 50+ Languages
    Chester Palen-Michel, Maxwell Pickering, Maya Kruse, Jonne Sälevä, Constantine Lignos
  • Mondrian: A Framework for Logical Abstract (Re)Structuring
    Elizabeth Grace Orwig, Shinwoo Park, Hyundong Jin, Yo-Sub Han
  • Case-Based Decision-Theoretic Decoding with Quality Memories
    Hiroyuki Deguchi, Masaaki Nagata
  • PRIME: Large Language Model Personalization with Cognitive Dual-Memory and Personalized Thought Process
    Xinliang Frederick Zhang, Nicholas Beauchamp, Lu Wang
  • Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations
    Ananth Agarwal, Jasper Jian, Christopher D Manning, Shikhar Murty
  • Image Difference Captioning via Adversarial Preference Optimization
    Zihan Huang, Junda Wu, Rohan Surana, Tong Yu, David Arbour, Ritwik Sinha, Julian McAuley
  • seqBench: A Tunable Benchmark to Quantify Sequential Reasoning Limits of LLMs
    Mohammad Ramezanali, Mo Vazifeh, Paolo Santi
  • NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation Recovery
    Minki Hong, Jangho Choi, Jihie Kim
  • SATBench: Benchmarking LLMs’ Logical Reasoning via Automated Puzzle Generation from SAT Formulas
    Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken
  • Data Descriptions from Large Language Models with Influence Estimation
    Chaeri Kim, Jaeyeon Bae, Taehwan Kim
  • EquiBench: Benchmarking Large Language Models’ Reasoning about Program Semantics via Equivalence Checking
    Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken
  • MicroEdit: Neuron-level Knowledge Disentanglement and Localization in Lifelong Model Editing
    Shiqi Wang, Qi Wang, Runliang Niu, He Kong, Yi Chang
  • Do Large Language Models Understand Word Senses?
    Domenico Meconi, Simone Stirpe, Federico Martelli, Leonardo Lavalle, Roberto Navigli
  • Diverse, not Short: A Length-Controlled Data Selection Strategy for Improving Response Diversity of Language Models
    Vijeta Deshpande, Debasmita Ghose, John D Patterson, Roger E. Beaty, Anna Rumshisky
  • Uncovering the Bigger Picture: Comprehensive Event Understanding Via Diverse News Retrieval
    Yixuan Tang, Yuanyuan Shi, Yiqun Sun, Anthony Kum Hoe Tung
  • Personalized LLM Decoding via Contrasting Personal Preference
    Hyungjune Bu, ChanJoo Jung, Minjae Kang, Jaehyung Kim
  • The Missing Parts: Augmenting Fact Verification with Half Truth Detection
    Yixuan Tang, Jincheng Wang, Anthony Kum Hoe Tung
  • Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations
    Yimin Xiao, Yongle Zhang, Dayeon Ki, Calvin Bao, Marianna J. Martindale, Charlotte Vaughn, Ge Gao, Marine Carpuat
  • Personalization up to a Point: Why Personalized Content Moderation Needs Boundaries, and How We Can Enforce Them
    Emanuele Moscato, Tiancheng Hu, Matthias Orlikowski, Paul Röttger, Debora Nozza
  • MPCG: Multi-Round Persona-Conditioned Generation for Modeling the Evolution of Misinformation with LLMs
    Chong Jun Rong Brian, Yixuan Tang, Anthony Kum Hoe Tung
  • LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
    Pingjun Hong, Beiduo Chen, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank
  • LiteraryQA: Towards Effective Evaluation of Long-document Narrative QA
    Tommaso Bonomo, Luca Gioffré, Roberto Navigli
  • FillerSpeech: Towards Human-Like Text-to-Speech Synthesis with Filler Insertion and Filler Style Control
    Seung-Bin Kim, Jun-Hyeok Cha, Hyung-Seok Oh, Heejin Choi, Seong-Whan Lee
  • Multi-LMentry: Can Multilingual LLMs Solve Elementary Tasks Across Languages?
    Luca Moroni, Javier Aula-Blasco, Simone Conia, Irene Baucells, Naiara Perez, Silvia Paniagua Suárez, Anna Sallés, Malte Ostendorff, Júlia Falcão, Guijin Son, Aitor Gonzalez-Agirre, Roberto Navigli, Marta Villegas
  • Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query
    Yixuan Wang, Shiyu Ji, Yijun Liu, Yuzhuang Xu, Yang Xu, Qingfu Zhu, Wanxiang Che
  • PerspectiveMod: A Perspectivist Resource for Deliberative Moderation
    Eva Maria Vecchi, Neele Falk, Carlotta Quensel, Iman Jundi, Gabriella Lapesa
  • LoCt-Instruct: An Automatic Pipeline for Constructing Datasets of Logical Continuous Instructions
    Hongyu Sun, Yusuke Sakai, Haruki Sakajo, Shintaro Ozaki, Kazuki Hayashi, Hidetaka Kamigaito, Taro Watanabe
  • CodeSSM: Towards State Space Models for Code Understanding
    Shweta Verma, Abhinav Anand, Mira Mezini
  • EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
    Numaan Naeem, Abdellah EL MEKKI, Muhammad Abdul-Mageed
  • xCoRe: Cross-context Coreference Resolution
    Giuliano Martinelli, Bruno Gatti, Roberto Navigli
  • Retrieval-Augmented Generation with Estimation of Source Reliability
    Jeongyeon Hwang, Junyoung Park, Hyejin Park, Dongwoo Kim, Sangdon Park, Jungseul Ok
  • NitiBench: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities
    Pawitsapak Akarajaradwong, Pirat Pothavorn, Chompakorn Chaksangchaichot, Panuthep Tasawong, Thitiwat Nopparatbundit, Keerakiat Pratai, Sarana Nutanong
  • From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors
    Maggie Mi, Aline Villavicencio, Nafise Sadat Moosavi
  • Wojood^Relations: Arabic Relation Extraction Corpus and Modeling
    Alaa Aljabari, Mohammed Khalilia, Mustafa Jarrar
  • Conflicting Needles in a Haystack: How LLMs behave when faced with contradictory information
    Murathan Kurfali
  • Towards Event Extraction with Massive Types: LLM-based Collaborative Annotation and Partitioning Extraction
    Wenxuan Liu, Zixuan Li, Long Bai, Yuxin Zuo, Daozhu Xu, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng
  • Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
    Sherrie Shen, Weixuan Wang, Alexandra Birch
  • Concept-pedia: a Wide-coverage Semantically-annotated Multimodal Dataset
    Karim Ghonim, Andrei Stefan Bejgu, Alberte Fernández-Castro, Roberto Navigli
  • RAED: Retrieval-Augmented Entity Description Generation for Emerging Entity Linking and Disambiguation
    Karim Ghonim, Pere-Lluís Huguet Cabot, Riccardo Orlando, Roberto Navigli
  • Personalized Language Models via Privacy-Preserving Evolutionary Model Merging
    Kyuyoung Kim, Jinwoo Shin, Jaehyung Kim
  • Aligning Text/Speech Representations from Multimodal Models with MEG Brain Activity During Listening
    Padakanti Srijith, Khushbu Pahwa, Radhika Mamidi, Bapi Raju Surampudi, Manish Gupta, SUBBA REDDY OOTA
  • STARQA: A Question Answering Dataset for Complex Analytical Reasoning over Structured Databases
    Mounica Maddela, Lingjue Xie, Daniel Preotiuc-Pietro, Mausam
  • Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency
    Colin Hong Fung Heng, Xu Guo, Anand Chaanan Singh, Esha Choukse, Dmitrii Ustiugov
  • Long Chain-of-Thought Fine-tuning via Understanding-to-Reasoning Transition
    Chenxin An, Zhihui Xie, Xiaonan Li, Ming Zhong, Shansan Gong, Lei Li, Jun Zhang, Jingjing Xu, Lingpeng Kong
  • Exploring Large Language Models for Detecting Mental Disorders
    Gleb Kuzmin, Petr Strepetov, Maksim Stankevich, Natalia Chudova, Artem Shelmanov, Ivan Smirnov
  • Efficient Real-time Refinement of Language Model Text Generation
    Joonho Ko, Jinheon Baek, Sung Ju Hwang
  • Reward-Weighted Sampling: Enhancing Non-Autoregressive Characteristics in Masked Diffusion LLMs
    Daehoon Gwak, Minseo Jung, Junwoo Park, Minho Park, ChaeHun Park, Junha Hyung, Jaegul Choo
  • AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
    Esra Dönmez, Maximilian Maurer, Gabriella Lapesa, Agnieszka Falenska
  • TounsiBench: Benchmarking Large Language Models for Tunisian Arabic
    Souha Ben Hassine, Asma Arrak, Marouene Addhoum, Steven R Wilson
  • Moral Framing in Politics (MFiP): A new resource and models for moral framing
    Ines Rehbein, Ines Reinig, Simone Paolo Ponzetto
  • ReDepress: A Cognitive Framework for Detecting Depression Relapse from Social Media
    Aakash Kumar Agarwal, Saprativa Bhattacharjee, Mauli Rastogi, Jemima S. Jacob, Biplab Banerjee, Rashmi Gupta, Pushpak Bhattacharyya
  • iKnow-audio: Integrating Knowledge Graphs with Audio-Language Models
    Michel Olvera, Changhong Wang, Paraskevas Stamatiadis, Gaël Richard, Slim Essid
  • EduVidQA: Generating and Evaluating Long-form Answers to Student Questions based on Lecture Videos
    Sourjyadip Ray, Shubham Sharma, Somak Aditya, Pawan Goyal
  • The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
    Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Shwartz-Ziv, Tomasz Jan Kajdanowicz
  • Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
    Rachneet Singh Sachdeva, Rima Hazra, Iryna Gurevych
  • CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition
    Sina Semnani, Han Zhang, Xinyan He, Merve Tekgurler, Monica Lam
  • Towards Author-informed NLP: Mind the Social Bias
    Inbar Cohen, Einat Minkov
  • Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
    Sina Semnani, Jirayu Burapacheep, Arpandeep Khatua, Thanawan Atchariyachanvanit, Zheng Wang, Monica Lam
  • Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains
    Junghwan Kim, Haotian Zhang, David Jurgens
  • DrFrattn: Directly Learn Adaptive Policy from Attention for Simultaneous Machine Translation
    Libo Zhao, Jing Li, Ziqian Zeng
  • The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology
    Fagun Patel, Duc Quang Nguyen, Sang T. Truong, Jody Vaynshtok, Sanmi Koyejo, Nick Haber
  • NormXLogit: The Head-on-Top Never Lies
    Sina Abbasi, Mohammad Reza Modarres, Mohammad Taher Pilehvar
  • Doc2Chart: Intent-Driven Zero-Shot Chart Generation from Documents
    Akriti Jain, Pritika Ramu, Aparna Garimella, Apoorv Saxena
  • Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification
    Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang
  • FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks
    Tanawan Premsri, Parisa Kordjamshidi
  • Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Cross-Lingual Transfer in Sense-Aware Tasks
    Roksana Goworek, Haim Dubossarsky
  • Translating Domain-Specific Terminology in Typologically-Diverse Languages: A Study in Tax and Financial Education
    Arturo Oncevay, Elena Kochkina, Keshav Ramani, Toyin Aguda, Simerjot Kaur, Charese Smiley
  • Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
    Tomohiro Sawada, Kartik Goyal
  • Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?
    Nandan Kumar Jha, Brandon Reagen
  • TLUE: A Tibetan Language Understanding Evaluation Benchmark
    Fan Gao, Cheng Huang, Yutong Liu, Nyima Tashi, Xiangxiang Wang, Thupten Tsering, BAN Ma-bao, RENZENG Duojie, Gadeng Luosang, Rinchen Dongrub, Dorje Tashi, XiaoFengCD, Yongbin Yu, Hao Wang
  • Retrieving Support to Rank Answers in Open-Domain Question Answering
    Zeyu Zhang, Alessandro Moschitti, Thuy Vu
  • Trojsten Benchmark: Evaluating LLM Problem-Solving in Slovak STEM Competition Problems
    Adam Zahradník, Marek Suppa
  • BRSpeech-DF: A Deep Fake Synthetic Speech Dataset for Portuguese Zero-Shot TTS
    Alexandre Costa Ferro Filho, Rafaello Virgilli, Lucas Alcantara Souza, F S de Oliveira, Marcelo Henrique Lopes Ferreira, Daniel Tunnermann, Gustavo dos Reis Oliveira, Anderson Da Silva Soares, Arlindo Rodrigues Galvão Filho
  • A Simple Yet Effective Method for Non-Refusing Context Relevant Fine-grained Safety Steering in LLMs
    Shaona Ghosh, Amrita Bhattacharjee, Yftah Ziser, Christopher Parisien
  • Statistical and Neural Methods for Hawaiian Orthography Modernization
    Jaden Kapali, Keaton Williamson, Winston Wu
  • so much depends / upon / a whitespace: Why Whitespace Matters for Poets and LLMs
    Sriharsh Bhyravajjula, Melanie Walsh, Anna Preus, Maria Antoniak
  • Certified Mitigation of Worst-Case LLM Copyright Infringement
    Jingyu Zhang, Jiacan Yu, Marc Marone, Benjamin Van Durme, Daniel Khashabi
  • Quantifying Logical Consistency in Transformers via Query-Key Alignment
    Eduard Tulchinskii, Laida Kushnareva, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov
  • SimulatorArena: Are User Simulators Reliable Proxies for Multi-Turn Evaluation of AI Assistants?
    Yao Dou, Michel Galley, Baolin Peng, Chris Kedzie, Weixin Cai, Alan Ritter, Chris Quirk, Wei Xu, Jianfeng Gao
  • CourtReasoner: Can LLM Agents Reason Like Judges?
    Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J Shapiro, Arman Cohan
  • Not Your Typical Government Tipline: LLM-Assisted Routing of Environmental Protection Agency Citizen Tips
    Sharanya Majumder, Zehua Li, Derek Ouyang, Kit T Rodolfa, Elena Eneva, Julian Nyarko, Daniel E. Ho
  • Retracing the Past: LLMs Emit Training Data When They Get Lost
    Myeongseob Ko, Nikhil Reddy Billa, Adam Nguyen, Charles Fleming, Ming Jin, Ruoxi Jia
  • Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations
    Linyang He, Qiaolin Wang, Xilin Jiang, Nima Mesgarani
  • Current Semantic-change Quantification Methods Struggle with Semantic Change Discovery in the Wild
    Khonzoda Umarova, Lillian Lee, Laerdon Kim
  • Evaluating Large Language Models for Detecting Antisemitism
    Jay Patel, Hrudayangam Mehta, Jeremy Blackburn
  • D-RAG: Differentiable Retrieval-Augmented Generation for Knowledge Graph Question Answering
    Guangze Gao, Zixuan Li, Chunfeng Yuan, Jiawei Li, Wu Jianzhuo, Yuehao Zhang, Xiaolong Jin, Bing Li, Weiming Hu
  • Towards Robust Mathematical Reasoning
    Thang Luong, Hoang H Nguyen, Dawsen Hwang, Golnaz Ghiasi, Yuri Chervonyi, Insuk Seo, Garrett Bingham, Jonathan Lee, Swaroop Mishra, Alex Zhai, Huiyi Hu, Henryk Michalewski, Jimin Kim, Jeonghyun Ahn, Junhwi Bae, Quoc V Le, Junehyuk Jung
  • Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning
    Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang, Surajit Chaudhuri
  • Introducing Spotlight: A Novel Approach for Generating Captivating Key Information from Documents
    Ankan Mullick, Sombit Bose, Rounak Saha, Ayan Kumar Bhowmick, Aditya Vempaty, Prasenjit Dey, Ravi Kokku, Pawan Goyal, Niloy Ganguly
  • Argument Summarization and its Evaluation in the Era of Large Language Models
    Moritz Altemeyer, Steffen Eger, Johannes Daxenberger, Yanran Chen, Tim Altendorf, Philipp Cimiano, Benjamin Schiller
  • Computational Analysis of Conversation Dynamics through Participant Responsivity
    Margaret Hughes, Brandon Roy, Elinor Poole-Dayan, Deb Roy, Jad Kabbara
  • AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
    Sangjun Lee, Seung-taek Woo, Jun-gyu Jin, Changhun Lee, Eunhyeok Park
  • Beyond Averages: Learning with Annotator Disagreement in STS
    Alejandro Benito-Santos, Adrian Ghajari
  • Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning Tasks
    Gregory Kang Ruey Lau, Wenyang Hu, Liu Diwen, Chen Jizhuo, See-Kiong Ng, Bryan Kian Hsiang Low
  • Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics
    Seyedeh Fatemeh Ebrahimi, Jaakko Peltonen
  • Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages
    Nadine El-Naggar, Tatsuki Kuribayashi, Ted Briscoe
  • Training compute-optimal transformer encoder models
    Megi Dervishi, Alexandre Allauzen, Gabriel Synnaeve, Yann LeCun
  • Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews
    Hyungyu Shin, Jingyu Tang, Yoonjoo Lee, Nayoung Kim, Hyunseung Lim, Ji Yong Cho, Hwajung Hong, Moontae Lee, Juho Kim
  • Seeing Through Words, Speaking Through Pixels: Deep Representational Alignment Between Vision and Language Models
    Zoe Wanying He, Sean Trott, Meenakshi Khosla
  • Unconditional Truthfulness: Learning Unconditional Uncertainty of Large Language Models
    Artem Vazhentsev, Ekaterina Fadeeva, Rui Xing, Gleb Kuzmin, Ivan Lazichny, Alexander Panchenko, Preslav Nakov, Timothy Baldwin, Maxim Panov, Artem Shelmanov
  • Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites
    Xintong Wang, Yixiao Liu, Jingheng Pan, Liang Ding, Longyue Wang, Chris Biemann
  • A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs
    Artem Shelmanov, Ekaterina Fadeeva, Akim Tsvigun, Ivan Tsvigun, Zhuohan Xie, Igor Kiselev, Nico Daheim, Caiqi Zhang, Artem Vazhentsev, Mrinmaya Sachan, Preslav Nakov, Timothy Baldwin
  • Generative or Discriminative? Revisiting Text Classification in the Era of Transformers
    Siva Rajesh Kasa, Karan Gupta, Sumegh Roychowdhury, Ashutosh Kumar, Yaswanth Biruduraju, SANTHOSH KUMAR KASA, Pattisapu Nikhil Priyatam, Arindam Bhattacharya, Shailendra Agarwal, Vijay huddar
  • Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
    Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov
  • LingGym: How Far Are LLMs from Thinking Like Field Linguists?
    Changbing Yang, Franklin Ma, Freda Shi, Jian Zhu
  • MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
    Jingyan Shen, Jiarui Yao, Rui Yang, Yifan Sun, Feng Luo, Rui Pan, Tong Zhang, Han Zhao
  • Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions
    Lan Zhang, Marco Valentino, Andre Freitas
  • InterIDEAS: Philosophical Intertextuality via LLMs
    Yue Yang, Yinzhi Xu, Chenghao Huang, JohnMichael Jurgensen, Han Hu, Hao Wang
  • Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index
    Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
  • Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions
    Sasha Boguraev, Christopher Potts, Kyle Mahowald
  • Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?
    Hua Shen, Nicholas Clark, Tanu Mitra
  • AccessEval: Benchmarking Disability Bias in Large Language Models
    Srikant Panda, Amit Agarwal, Hitesh Laxmichand Patel
  • DiscoSG: Towards Discourse-Level Text Scene Graph Parsing through Iterative Graph Refinement
    Shaoqing Lin, Chong Teng, Fei Li, Donghong Ji, Lizhen Qu, Zhuang Li