Accepted Industry Track Papers
- RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning
Deyi Ji, Yuekui Yang, Haiyang Wu, Shaogang Tang, Peng Shu, Xudong Chen, Shaoping Ma, Tianrun Chen, Lanyun Zhu
- SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
- CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine
Hanmeng Zhong, Linqing Chen, Weilei Wang, Wentao Wu
- VENUS: A VLLM-driven Video Content Moderation System for Real Application Scenarios
Minyi Zhao, Yi Liu, Jianfeng Wen, Boshen Zhang, Hailang Chang, Zhiheng Ouyang, Jie Wang, Wensong He, Shuigeng Zhou
- Text2MDT: Extracting Decision Trees from Medical Texts Using Large Language Models
Yuheng Li, Wei Zhu, Jiechao Gao
- PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
Michel Wong, Ali Alshehri, Sophia Kao, Haotian He
- Audio Query Handling System with Integrated Expert Models and Contextual Understanding
Naveen Vakada, Yinyi Guo, Erik Visser, Arvind Krishna Sridhar
- Generative Reviewer Agents: Scalable Simulacra of Peer Review
Nicolas Bougie, Narimawa Watanabe
- Aligning LLMs for Multilingual Consistency in Enterprise Applications
Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Tao Sheng, Sujith Ravi, Dan Roth
- RCI: A Score for Evaluating Global and Local Reasoning in Multimodal Benchmarks
Amit Agarwal, Hitesh Laxmichand Patel, Srikant Panda, Hansa Meghwani, Jyotika Singh, Karan Dua, Paul Li, Tao Sheng, Sujith Ravi, Dan Roth
- LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
Yungi Kim, Hyunsoo Ha, Seonghoon Yang, Sukyung Lee, Jihoo Kim, Chanjun Park
- Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation
Moy Yuan, Han-Chin Shing, Mitch Strong, Chaitanya Shivade
- Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions
Jihang Li, Bing Xu, Minping Chen, Zulong Chen, Chuanfei Xu, Suyu Liu, Zeyi Wen
- PCRI: Measuring Context Robustness in Multimodal Models for Enterprise Applications
Hitesh Laxmichand Patel, Srikant Panda, Amit Agarwal, Hansa Meghwani, Karan Dua, Paul Li, Tao Sheng, Sujith Ravi, Dan Roth
- CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation
Nicolas Bougie, Narimawa Watanabe
- Evaluating Conversational Agents with Persona-driven User Simulations based on Large Language Models: A Sales Bot Case Study
Justyna Gromada, Alicja Kasicka, Ewa Komkowska, Lukasz Krajewski, Natalia Krawczyk, Morgan Veyret, Bartosz Przybył, Lina M. Rojas-Barahona, Michał K. Szczerbak
- Mirror in the Model: Ad Banner Image Generation via Reflective Multi-LLM and Multi-modal Agents
Zhao Wang, Bowen Chen, Yotaro Shimose, Sota Moriyama, Heng Wang, Shingo Takamatsu
- PatternRAG: Leveraging Product Catalog Patterns for Multilingual E-commerce Product Attribute Prediction
Bryan Zhang, Suleiman A. Khan, Stephan Walter
- ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Haoxin Wang, Xianhan Peng, Xucheng Huang, Yizhe Huang, Ming Gong, Chenghan Yang, Yang Liu, Ling Jiang
- ProCut: LLM Prompt Compression via Attribution Estimation
Zhentao Xu, Fengyi Li, Albert C. Chen, Xiaofeng Wang
- A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
Xiaoyun Zhang, Jingqing Ruan, Xing Ma, Yawen Zhu, Jiansong Chen, Ke Zeng, Xunliang Cai
- Detecting Omissions in LLM-Generated Medical Summaries
Achir Oukelmoun, Nasredine Semmar, Gaël de Chalendar, Clément Cormi, Mariame Oukelmoun, Eric Vibert, Marc-Antoine Allard
- LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models
Hieu Tran, Junda Wang, Yujan Ting, Hong Yu, Weijing Huang, Terrence Chen
- ReAct Meets Industrial IoT: Language Agents for Data Access
James T Rayfield, Shuxin Lin, Nianjun Zhou, Dhaval C Patel
- ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions
Jingheng Ye, Yong Jiang, Xiaobin Wang, Yinghui Li, Yangning Li, Pengjun Xie, Fei Huang
- MADS: Multi-Agent Dialogue Simulation for Diverse and Persuasive Training Data Generation
Mingjin Li, Yu Liu, Huayi Liu, Hongguang Zhang, Xiang Ye
- On-device System of Compositional Multi-tasking in Large Language Models
Ondrej Bohdal, Konstantinos Theodosiadis, Asterios Mpatziakas, Dimitrios Filippidis, Iro Spyrou, Christos Zonios, Anastasios Drosou, Dimosthenis Ioannidis, Kyenghun Lee, Jijoong Moon, Hyeonmok Ko, Mete Ozay, Umberto Michieli
- Select-then-Route: Taxonomy guided Routing for LLMs
Soham Shah, Kumar Shridhar
- FABRIC: Fully-Automated Broad Intent Categorization in E-commerce
Anna Tigunova, Philipp Schmidt, Damla Ezgi Akcora
- MKT: A Multi-Stage Knowledge Transfer Framework to Mitigate Catastrophic Forgetting in Multi-Domain Chinese Spelling Correction
Peng Xing, Yinghui Li, Shirong Ma, Xinnian Liang, Haojing Huang, Yangning Li, Shu-Yu Guo, Hai-Tao Zheng, Wenhao Jiang, Ying Shen
- End-to-End Aspect-Guided Review Summarization at Scale
Ilya Boytsov, Vinny DeGenova, Mikhail Balyasin, Joseph Walt, Caitlin Eusden, Marie-Claire Rochat, Margaret Pierson
- SLOT: Structuring the Output of Large Language Models
Zhengyuan Shen, Darren Yow-Bang Wang, Soumya Smruti Mishra, Zhichao Xu, Yifei Teng, Haibo Ding
- QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems
Yijun Ge, Zijian Chen, Jimmy Lin
- Benchmarking Deep Search over Heterogeneous Enterprise Data
Prafulla Kumar Choubey, Xiangyu Peng, Shilpa Bhagavath, Kung-Hsiang Huang, Caiming Xiong, Chien-Sheng Wu
- RLHF Algorithms Ranked: An Extensive Evaluation Across Diverse Tasks, Rewards, and Hyperparameters
Lucas Spangher, Rama Kumar Pasumarthi, Nick Masiewicki, William F. Arnold, Aditi Kaushal, Dale Johnson, Peter Grabowski, Eugene Ie
- Predicting Cross-lingual Trends in Microblogs
Satoshi Akasaki
- Generating Fine Details of Entity Interactions
Xinyi Gu, Jiayuan Mao
- AutoCVSS: Assessing the Performance of LLMs for Automated Software Vulnerability Scoring
Davide Sanvito, Giovanni Arriciati, Giuseppe Siracusano, Roberto Bifulco, Michele Carminati
- SFAL: Semantic-Functional Alignment Scores for Distributional Evaluation of Auto-Interpretability in Sparse Autoencoders
Fabio Mercorio, Filippo Pallucchini, Daniele Potertì, Antonio Serino, Andrea Seveso
- Just One is Enough: An Existence-based Alignment Check for Robust Japanese Pronunciation Estimation
Hayate Nakano, Nobuhiro Kaji
- Towards Enforcing Company Policy Adherence in Agentic Workflows
Naama Zwerdling, David Boaz, Ella Rabinovich, Guy Uziel, David Amid, Ateret Anaby Tavor
- Learning to Translate Ambiguous Terminology by Preference Optimization on Post-Edits
Nathaniel Berger, Johannes Eschbach-Dymanus, Miriam Exel, Matthias Huck, Stefan Riezler
- More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning
Yike Zhao, Simin Guo, Ziqing Yang, Shifan Han, Dahua Lin, Fei Tan
- SRS-Stories: Vocabulary-constrained multilingual story generation for language learning
Wiktor Kamzela, Mateusz Lango, Ondrej Dusek
- Banking Done Right: Redefining Retail Banking with Language-Centric AI
Xin Jie Chua, Jeraelyn Ming Li Tan, Jia Xuan Tan, Soon Chang Poh, Yi Xian Goh, Debbie Hui Tian Choong, Foong Chee Mun, Sze Jue Yang, Chee Seng Chan
- Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
Daniel Schwartz, Dmitriy Bespalov, Zhe Wang, Ninad Kulkarni, Yanjun Qi
- Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
Xiaoyu Liu, Di Liang, Hongyu Shan, Peiyang Liu, Yonghao Liu, Muling Wu, Yuntao Li, Xianjie Wu, Li Miao, Jiangrong Shen, Minlong Peng
- Controllable Clustering with LLM-driven Embeddings
Kerria Pang-Naylor, Shivani Manivasagan, Aitong Zhong, Mehak Garg, Nicholas Mondello, Blake Buckner, Jonathan Chang, Khyati Mahajan, Masoud Hashemi, Fabio Casati
- SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling
Kadri Hacioglu, Manjunath K E, Andreas Stolcke
- NurseLLM: The First Specialized Language Model for Nursing
Md Tawkat Islam Khondaker, Julia Harrington, Shady Shehata
- Augmenting Compliance Guaranteed Conversational AI: Context-Aware Knowledge Base Expansion with LLMs and Combinatorial Optimization
Mengze Hong, Chen Jason Zhang, Di Jiang, Yuanqin He
- Memory-Efficient Backpropagation for Fine-Tuning LLMs on Resource-Constrained Mobile Devices
Congzheng Song, Xinyu Tang
- PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning
Mohammad Kachuee, Teja Gollapudi, Minseok Kim, Yin Huang, Kai Sun, Xiao Yang, Jiaqi Wang, Nirav Shah, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
- Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards
Manveer Singh Tamber, Forrest Sheng Bao, Chenyu Xu, Ge Luo, Suleman Kazi, Minseok Bae, Miaoran Li, Ofer Mendelevitch, Renyi Qu, Jimmy Lin
- A Multi-Agent Framework for Quantitative Finance: An Application to Portfolio Management Analytics
Sayani Kundu, Dushyant Sahoo, Victor Li, Jennifer Rabowsky, Amit Varshney
- Group Preference Alignment: Customizing LLM Responses from In-Situ Conversations Only When Needed
Ishani Mondal, Jack W Stokes, Sujay Kumar Jauhar, Longqi Yang, Mengting Wan, Xiaofeng Xu, Xia Song, Jordan Lee Boyd-Graber, Jennifer Neville
- DASR: Distributed Adaptive Scene Recognition - A Multi-Agent Cloud-Edge Framework for Language-Guided Scene Detection
Can Cui, Yongkang Liu, Seyhan Ucar, Juntong Peng, Ahmadreza Moradipari, Maryam Khabazi, Ziran Wang
- Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications
Jean-Philippe Corbeil, Asma Ben Abacha, George Michalopoulos, Phillip Swazinna, Miguel Del-Agua, Jérôme Tremblay, Akila Jeeson Daniel, Cari Bader, Kevin Cho, Pooja Krishnan, Nathan Bodenstab, Thomas Lin, Wenxuan Teng, Francois Beaulieu, Paul Vozila
- Leveraging the Power of Large Language Models in Entity Linking via Adaptive Routing and Targeted Reasoning
Yajie Li, Albert Galimov, Mitra Datta Ganapaneni, Pujitha Thejaswi, De Meng, Priyanshu Kumar, Saloni Potdar
- Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs
Jyotika Singh, Weiyi Sun, Amit Agarwal
- Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings
Xiran Fan, Zhimeng Jiang, Chin-Chia Michael Yeh, Yuzhong Chen, Yingtong Dou, Menghai Pan, Yan Zheng
- Agent vs. Agent: Automated Data Generation and Red-Teaming for Custom Agentic Workflows
Ninad Kulkarni, Xian Wu, Siddharth Varia, Dmitriy Bespalov
- Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs
Soham Satyadharma, Fatemeh Sheikholeslami, Swati Kaul, Aziz Umit Batur, Suleiman A. Khan
- Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation
Xujun Peng, Anoop Kumar, Jingyu Wu, Parker Glenn, Daben Liu
- Transparent Reference-free Automated Evaluation of Open-Ended User Survey Responses
Subin An, Yugyeong Ji, Junyoung Kim, Heejin Kook, Yang Lu, Josh Seltzer
- Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
Ruizhe Chen, Tianze Luo, Zhiting Fan, Heqing Zou, Zhaopeng Feng, Guiyang Xie, Hansheng Zhang, Wang Zhuochen, Zuozhu Liu, Zhang Huaijian
- SEARA: An Automated Approach for Obtaining Optimal Retrievers
Zou Yuheng
- UniEDU: Toward Unified and Efficient Large Multimodal Models for Educational Tasks
Zhendong Chu, Jian Xie, Shen Wang, Zichao Wang, Qingsong Wen
- Truth, Trust, and Trouble: Medical AI on the Edge
Mohammad Anas Azeez, Rafiq Ali, Ebad Shabbir, Zohaib Hasan Siddiqui, Gautam Siddharth Kashyap, Jiechao Gao, Usman Naseem
- An Address Intelligence Framework for E-commerce Deliveries
Gokul Swamy, Aman Gulati, Srinivas Virinchi, Anoop Saladi
- LLMs on a Budget? Say HOLA
Zohaib Hasan Siddiqui, Jiechao Gao, Ebad Shabbir, Mohammad Anas Azeez, Rafiq Ali, Gautam Siddharth Kashyap, Usman Naseem
- LLM-Based Dialogue Labeling for Multiturn Adaptive RAG
Zhiyu Chen, Biancen Xie, Sidarth Srinivasan, Manikandarajan Ramanathan, Rajashekar Maragoud, Qun Liu
- RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
Ian Poey, Jiajun Liu, Qishuai Zhong
- REIC: RAG-Enhanced Intent Classification at Scale
Ziji Zhang, Michael Yang, Zhiyu Chen, Yingying Zhuang, Shu-Ting Pi, Qun Liu, Rajashekar Maragoud, Vy Nguyen, Anurag Beniwal
- Mapping Smarter, Not Harder: A Test-Time Reinforcement Learning Agent That Improves Without Labels or Model Updates
Wen-Kwang Tsao, Yao-Ching Yu, Chien-Ming Huang
- On Assigning Product and Software Codes to Customer Service Requests with Large Language Models
Sujatha Das Gollapalli, Mouad Hakam, Mingzhe Du, See-Kiong Ng, Mohammed Hamzeh
- Reasoning-Enhanced Domain-Adaptive Pretraining of Multimodal Large Language Models for Short Video Content Moderation
Zixuan Wang, Yu Sun, Hongwei Wang, Baoyu Jing, Xiang Shen, Xin Dong, Zhuolin Hao, Hongyu Xiong, Yang Song
- GSID: Generative Semantic Indexing for E-Commerce Product Understanding
Haiyang Yang, Qinye Xie, Qingheng Zhang, Chen Li Yu, Huike Zou, Chengbao Lian, Shuguang Han, Fei Huang, Jufeng Chen, Bo Zheng
- Learning from LLM Agents: In-Context Generative Models for Text Casing in E-Commerce Ads
Yingxue Zhou, Tan Zhu, Tao Zeng, Wei Shen
- Auto-Weighted Group Relative Preference Optimization for Multi-Objective Text Generation Tasks
Yuki Ichihara, Yuu Jinnai
- Cost-Effective E-Commerce Catalog Translation at Scale Ensuring Named Entity Protection
Asier Gutiérrez-Fandiño, Jorge Yero Salazar, Clement Ruin, Alejandro Quintero-Roba, Shang Ravichandran, Jesus Perez-Martin, Pankaj Adsul, Suruchi Garg, Leonardo Lezcano
- InstaJudge: Aligning Judgment Bias of LLM-as-Judge with Humans in Industry Applications
Myeongjun Erik Jang, Fran Silavong
- TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Sunwoo Lee, Dhammiko Arya, Seung-Mo Cho, Gyoung-eun Han, Seokyoung Hong, Daseong Jang, Wonbeom Jang, Sangjin Kim, SaeRom Kim, Seojin Lee, Sohee Park, Sereimony Sek, Injee Song, Sungbin Yoon, Eric Davis
- Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
Islam Nassar, Yang Lin, Yuan Jin, Rongxin Zhu, Chang Wei Tan, Zenan Zhai, Thanh Tien Vu, Xu Zhong, Long Duong, Yuan-Fang Li
- Classifier-Augmented Generation for Structured Workflow Prediction
Thomas Gschwind, Shramona Chakraborty, Nitin Gupta, Sameep Mehta
- Efficient and Versatile Model for Multilingual Information Retrieval of Islamic Text: Development and Deployment in Real-World Scenarios
Vera Pavlova, Mohammed Makhlouf
- AutoQual: An LLM Agent for Automated Discovery of Interpretable Features for Review Quality Assessment
Xiaochong Lan, Jie Feng, Yinxing Liu, Xinlei Shi, Yong Li
- JSON Whisperer: Efficient JSON Editing with LLMs
Sarel Duanis, Asnat Greenstein-Messica, Eliya Habba
- L4: Mutual Learning Helps Lifelong Language Learning
Jiyong Li, Dilshod Azizov, Shangsong Liang
- TTD-SQL: Tree-Guided Token Decoding for Efficient and Schema-Aware SQL Generation
Chetan Sharma, Ramasuri Narayanam, Soumyabrata Pal, Kalidas Yeturu, Shiv Kumar Saini, Koyel Mukherjee
- Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
Kawin Mayilvaghanan, Siddhant Gupta, Ayush Kumar
- HierDiffuse: Progressive Diffusion for Robust Interest Fusion in CTR Prediction
Ziheng Ni, Congcong Liu, Yuying Chen, Zhiwei Fang, Changping Peng, Zhangang Lin, Ching Law, Jingping Shao
- TOBUGraph: Knowledge Graph-Based Retrieval for Enhanced LLM Performance Beyond RAG
Savini Kashmira, Jayanaka L. Dantanarayana, Joshua Brodsky, Ashish Mahendra, Yiping Kang, Krisztian Flautner, Lingjia Tang, Jason Mars
- Thinking with DistilQwen: A Tale of Four Distilled Reasoning and Reward Model Series
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang
- Crossing Domains without Labels: Distant Supervision for Term Extraction
Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank
- I-SEE: An Instruction-tuned, SOP-Enhanced Quality Evaluator for Product Content
Aniket Joshi, Cyrus Andre DSouza, Sejal Jain, Jitenkumar Babubhai Rana, Promod Yenigalla
- Computational Blueprints: Generating Isomorphic Math Problems with Large Language Models
Jeong-hoon Kim, Jinwoo Nam, Geunsik Jo
- Fin-ExBERT: User Intent based Text Extraction in Financial Context using Graph-Augmented BERT and trainable Plugin
Soumick Sarker, Abhijit Kumar Rai
- DecEx-RAG: Boosting Agentic Retrieval-Augmented Generation with Decision and Execution Optimization via Process Supervision
Yongqi Leng, Yikun Lei, Xikai Liu, Meizhi Zhong, Bojian Xiong, Yurong Zhang, Yan Gao, Yi Wu, Yao Hu, Deyi Xiong
- FLOW-BENCH: Towards Conversational Generation of Enterprise Workflows
Evelyn Duesterwald, Siyu Huo, Vatche Isahagian, Ritesh Kumar, Vinod Muthusamy, Punleuk Oum, K. R. Jayaram, Debashish Saha, Gegi Thomas, Praveen Venkateswaran
- Format Inertia: A Failure Mechanism of LLMs in Medical Pre-Consultation
Seungseop Lim, Gibaeg Kim, Wooseok Han, Jean Seo, Hyunkyung Lee, Jaehyo Yoo, Eunho Yang
- Extraction of Information Provision Activity Requirements from EU-Acquis
Jakub Piskorski, Dominik Skotarczak
- Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry
Anastasia Zhukova, Jonas Lührs, Christian E. Matt, Bela Gipp
- From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Karen Zhou, John Michael Giorgi, Pranav Mani, Peng Xu, Davis Liang, Chenhao Tan
- FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Karan Dua, Hitesh Laxmichand Patel, Puneet Mittal, Ranjeet Gupta, Amit Agarwal, Praneet Pabolu, Srikant Panda, Hansa Meghwani
- GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems
Jisoo Lee, Raeyoung Chang, Dongwook Kwon, Harmanpreet Singh, Nikhil Verma
- Knowledge-Augmented Question Error Correction for Chinese Question Answer System with QuestionRAG
Longpeng Qiu, Ting Li, Shuai Mao, Nan Yang, Xiaohui Yan
- SMART: Scalable Multilingual Approach for a Robust TOD System
Karan Malhotra, Arihant Jain, Purav Aggarwal, Anoop Saladi
- Think-Search-Patch: A Retrieval-Augmented Reasoning Framework for Repository-Level Code Repair
Bojian Xiong, Yikun Lei, Xikai Liu, Shaowei Zhang, Pengyun Zhu, Yan Liu, Yongqi Leng, Ling Shi, Meizhi Zhong, Yurong Zhang, Yan Gao, Yi Wu, Yao Hu, Deyi Xiong
- ASR-EC Benchmark: Evaluating Large Language Models on Chinese ASR Error Correction
Victor Junqiu Wei, Weicheng Wang, Di Jiang, Yuanfeng Song, Lu Wang
- Bidirectional Reasoning Supervision for Multilingual Financial Decision Making
Muhammad Rafsan Kabir, Jawad Ibn Ahad, Robin Krambroeckers, Silvia Ahmed, M M Lutfe Elahi, Nabeel Mohammed, Shafin Rahman
- Automotive Document Labeling Using Large Language Models
Dang Van Thin, Cuong Xuan Chu, Christian Graf, Tobias Kaminski, Trung-Kien Tran
- Building Data-Driven Occupation Taxonomies: A Bottom-Up Multi-Stage Approach via Semantic Clustering and Multi-Agent Collaboration
Nan Li, Bo Kang, Tijl De Bie
- AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents
Luca Gioacchini, Alexander Delsanto, Idilio Drago, Marco Mellia, Giuseppe Siracusano, Roberto Bifulco
- Enabling Self-Improving Agents to Learn at Test Time With Human-In-The-Loop Guidance
Yufei He, Ruoyu Li, Alex Chen, Yue Liu, Yulin Chen, Yuan Sui, Cheng Chen, Yi Zhu, Luca Luo, Frank Yang, Bryan Hooi
- Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning
Zhiwei Li, Yong Hu, Wenqing Wang
- Experience report: Implementing Machine Translation in a Regulated Industry
Marco Zocca, Per Fallgren, David Buffoni
- Multi-Task Pre-Finetuning of Lightweight Transformer Encoders for Text Classification and NER
Junyi Zhu, Savas Ozkan, Andrea Maracani, Sinan Mutlu, Cho Jung Min, Mete Ozay
- Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
Zhipeng Wang, Kayhan Behdin, Qingquan Song, Yun Dai, Ata Fatahibaarzi, Aman Gupta, Hejian Sang, Shao Tang, Gregory Dexter, Sirou Zhu, Siyu Zhu, Tejas Dharamsi, Vignesh Kothapalli, Zhoutong Fu, Yihan Cao, Pin-Lun Hsu, Fedor Borisyuk, Rahul Mazumder, Natesh S. Pillai, Luke Simon
- Group, Embed and Reason: A Hybrid LLM and Embedding Framework for Semantic Attribute Alignment
Shramona Chakraborty, Shashank Mujumdar, Nitin Gupta, Sameep Mehta, Ronen Kat, Itay Etelis, Mohamed Mahameed, Itai Guez, Rachel Brill
- STREAQ: Selective Tiered Routing for Effective and Affordable Contact Center Quality Assurance
Prajwal Sood, Rajdeep Agrawal, Mayank Sati, Digvijay Anil Ingle, Cijo George
- Divide, Link, and Conquer: Recall-oriented Schema Linking for NL-to-SQL via Question Decomposition
Kiran Pradeep, Kirushikesh DB, Nishtha Madaan, Sameep Mehta, Pushpak Bhattacharyya
- Declarative Techniques for NL Queries over Heterogeneous Data
Elham Khabiri, Jeffrey O. Kephart, Fenno F. Heath III, Srideepika Jayaraman, Yingjie Li, Fateh A. Tipu, Dhruv Shah, Achille Fokoue, Anu Bhamidipaty
- Taxonomy of Comprehensive Safety for Clinical Agents
Jean Seo, Hyunkyung Lee, Gibaeg Kim, Wooseok Han, Jaehyo Yoo, Seungseop Lim, Kihun Shin, Eunho Yang
- Dr. Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian
Andrei Niculae, Adrian Cosma, Cosmin Dumitrache, Emilian Radoi
- Data-Efficient Active Prompt Optimization for Memory-Enhanced Conversational Agents
Ervine Zheng, Yikuan Li, Geoffrey Jay Tso, Jilong Kuang
- CLARITY: Clinical Assistant for Routing, Inference, and Triage
Vladimir Shaposhnikov, Alexandr Nesterov, Ilia Kopanichuk, Ivan Bakulin, Zhelvakov Egor, Ruslan Abramov, Tsapieva Ekaterina Olegovna, Iaroslav Radionovich Bespalov, Dmitry V. Dylov, Ivan Oseledets
- HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems
Spandan Anaokar, Shrey Ganatra, Swapnil Bhattacharyya, Harshvivek Kashid, Shruthi N Nair, Reshma Sekhar, Siddharth Manohar, Rahul Hemrajani, Pushpak Bhattacharyya
- How Accurate Are LLMs at Multi-Question Answering on Conversational Transcripts?
Xiliang Zhu, Shi Zong, David Rossouw
- AI Knowledge Assist: An Automated Approach for the Creation of Knowledge Bases for Conversational AI Agents
Md Tahmid Rahman Laskar, Julien Bouvier Tremblay, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN
- DACIP-RC: Domain Adaptive Continual Instruction Pre-Training via Reading Comprehension on Business Conversations
Elena Khasanova, Harsh Saini, Md Tahmid Rahman Laskar, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN
- Analysis of Automated Document Relevance Annotation for Information Retrieval in Oil and Gas Industry
João Vitor Mariano Correia, Murilo Missano Bell, João Vitor Robiatti Amorim, Jonas Queiroz, Daniel Pedronette, Ivan Rizzo Guilherme, Felipe Lima de Oliveira
- Mind the Query: A Benchmark Dataset towards Text2Cypher Task
Vashu Chauhan, Shobhit Raj, Shashank Mujumdar, Avirup Saha, Anannay Jain
- Deploying Tiny LVLM Judges for Real-World Evaluation of Chart Models: Lessons Learned and Best Practices
Md Tahmid Rahman Laskar, Mohammed Saidul Islam, Ridwan Mahbub, Mizanur Rahman, Amran Bhuiyan, Israt Jahan, Mir Tafseer Nayeem, Shafiq Joty, Enamul Hoque, Jimmy Huang
- Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
Cen Zhao, Tiantian Zhang, Hanchen Su, Yufeng Zhang, Shaowei Su, Mingzhi Xu, Yu Liu, Wei Han, Jeremy Werner, Claire Na Cheng, Yashar Mehdad
- Beyond Pointwise Scores: Decomposed Criteria-Based Evaluation of LLM Responses
Fangyi Yu, Nabeel Seedat, Drahomira Herrmannova, Frank Schilder, Jonathan Richard Schwarz
- Scalable and Cost Effective High-Cardinality Classification with LLMs via Multi-View Label Representations and Retrieval Augmentation
Anup Pattnaik, Cijo George, Sasanka Vutla, Hamvir Dev, Jeevesh Nandan
- How to Fine-Tune Safely on a Budget: Model Adaptation Using Minimal Resources
Anh C. Pham, Mihir Thalanki, Michael Sun, Aditya Chaloo, Ankita Gupta, Tian Xia, Aditya Mate, Ehi Nosakhare, Soundararajan Srinivasan
- Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency
Aman Goel, Daniel Schwartz, Yanjun Qi
- Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
Yisha Wu, Cen Zhao, Yuanpei Cao, Xiaoqing Xu, Yashar Mehdad, Mindy Ji, Claire Na Cheng
- R3D - Reasoning for Search Relevance using Reinforcement Learning and Distillation
Sourab Mangrulkar, Ankith M S, Vijay Huddar, Atul Saroop, Sumit Negi, Rahul Bhagat
- LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation
Weizhi Zhang, Liangwei Yang, Wooseong Yang, Henry Peng Zou, Yuqing Liu, Ke Xu, Sourav Medya, Philip S. Yu
- LLM Agents Implement an NLG System from Scratch: Building Interpretable Rule-Based RDF-to-Text Generators
Mateusz Lango, Ondrej Dusek
- Leveraging LLMs to Streamline the Review of Public Funding Applications
João DS Marques, Andre Vicente Duarte, André Mendes Marques de Carvalho, Gil Rocha, Bruno Martins, Arlindo L. Oliveira
- AdaSwarm: Adaptive Graph Structure Selection for LLM-based Multi-agent System
Hui Yi Leong, Yuheng Li, Yuqing Wu, Wei Zhu, Jiechao Gao
- ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Ahmed Masry, Megh Thakkar, Patrice Bechard, Sathwik Tejaswi Madhusudhan, Rabiul Awal, Shambhavi Mishra, Akshay Kalkunte Suresh, Srivatsava Daruru, Enamul Hoque, Spandana Gella, Torsten Scholak, Sai Rajeswar
- Confidence-Aware Reasoning: Optimizing Self-Guided Thinking Trajectories in Large Reasoning Models
Jiaxin Zhang
- Multi-Value-Product Retrieval-Augmented Generation for Industrial Product Attribute Value Identification
Huike Zou, Haiyang Yang, Yindu Su, Chen Li Yu, Qinye Xie, Chengbao Lian, Qingheng Zhang, Shuguang Han, Fei Huang, Jufeng Chen
- AttributeForge: An Agentic LLM Framework for Automated Product Schema Modeling
Yunhan Huang, Klevis Ramo, Andrea Iovine, Melvin Monteiro, Sedat Gokalp, Arjun Bakshi, Hasan Turalic, Arsh Kumar, Jona Neumeier, Rejaul Monir, Simon Hartmann, Mohamed Yakout
- VestaBench: An Embodied Benchmark for Safe Long-Horizon Planning Under Multi-Constraint and Adversarial Settings
Tanmana Sadhu, Yanan Chen, Ali Pesaranghader
- Advancing E-commerce Merchants Telemarketing with Synthetic Data-Driven LLMs
Qi Gou, Zehua Xia, Li Juan, Qingyang Zhao, Wenjing Yang
- Medical Knowledge-Guided Depression Detection on Social Media with Large Language Models
Xiaochong Lan, Zhiguang Han, Yiming Cheng, Li Sheng, Jie Feng, Chen Gao, Yong Li
- BullyBench: Youth & Experts-in-the-loop Framework for Intrinsic and Extrinsic Cyberbullying NLP Benchmarking
Kanishk Verma, Sri Balaaji Natarajan Kalaivendan, Joachim Wagner, Arefeh Kazemi, Sinan Asci, Sayani Basak, Isobel Walsh, Darragh McCashin, Alexandros Poulis, Yelena Cherkasova, James O’Higgins Norman, Rebecca Umbach, Tijana Milosevic, Brian Davis
- Tagging-Augmented Generation: Assisting Language Models in Finding Intricate Knowledge In Long Contexts
Anwesan Pal, Karen Hovsepian, Tinghao Guo, Mengnan Zhao, Somendra Tripathi, George Mihaila, Nikos Kanakaris, Sumit Nigam
- DispatchQA: A Benchmark for Small Function Calling Language Models in E-Commerce Applications
Joachim Daiber, Victor Maricato, Ayan Sinha, Andrew Rabinovich
- Generalized Embedding Models for Industry 4.0 Applications
Christodoulos Constantinides, Shuxin Lin, Dhaval C Patel
- ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
Hossein Rajabzadeh, Maryam Dialameh, Rezaul Karim, Omar Mohamed Awad, Hyock Ju Kwon, Boxing Chen, Walid Ahmed, Yang Liu
- Generating Spatial Knowledge Graphs from Automotive Diagrams for Question Answering
Steve Bakos, Chen Xing, Heidar Davoudi, Aijun An, Ron DiCarlantonio
- Enhancing Persuasive Dialogue Agents by Synthesizing Cross-Disciplinary Communication Strategies
Shinnosuke Nozue, Yuto Nakano, Yotaro Watanabe, Meguru Takasaki, Shoji Moriya, Reina Akama, Jun Suzuki
- BIOPSY - Biomarkers In Oncology: Pipeline for Structured Yielding
Sanya A. Chetwani, Jaseem Mahmmdla
- DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen, Jian Zhang, Jieyun Huang, Shuming Shi, Wenjing Zhang, Jiangze Yan, Ning Wang, Kai Wang, Zhaoxiang Liu, Shiguo Lian
- pEBR: A Probabilistic Approach to Embedding Based Retrieval
Han Zhang, Yunjiang Jiang, Mingming Li, Haowei Yuan, Yiming Qiu, Wen-Yun Yang
- Finding Diamonds in Conversation Haystacks: A Benchmark for Conversational Data Retrieval
Yohan Lee, Yongwoo Song, Sangyeop Kim
- : Structuring Your Natural Language SOPs into Tailored Ambiguity-Resolved Code Templates
Sachin Kumar Giroh, Pushpendu Ghosh, Aryan Jain, Harshal Giridhari Paunikar, Aditi Rastogi, Promod Yenigalla
- Recover-LoRA: Data-Free Accuracy Recovery of Degraded Language Models via Low-Rank Adaptation
Devleena Das, Rajeev Patwari, Ashish Sirasao
- Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining: Method, Evaluation and Applications
Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation
Himanshu Dutta, Sunny Manchanda, Prakhar Bapat, Meva Ram Gurjar, Pushpak Bhattacharyya
- Recon, Answer, Verify: Agents in Search of Truth
Satyam Shukla, Himanshu Dutta, Pushpak Bhattacharyya
- T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning
Vignesh Ethiraj, Ashwath D, Sidhanth Menon, Divya Vijay, Vidhyakshaya Kannan
- PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models
He Zhu, Junyou Su, Minxin Chen, Wen Wang, Yijie Deng, Guanhua Chen, Wenjia Zhang
- IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs
Aosong Feng, Zhichao Xu, Xian Wu, Kang Zhou, Sheng Guan, Yueyan Chen, Ninad Kulkarni, Yun Zhou, Balasubramaniam Srinivasan, Haibo Ding, Lin Lee Cheong
- Semantic Agreement Enables Efficient Open-Ended LLM Cascades
Duncan Soiffer, Steven Kolawole, Virginia Smith
- Lost in Pronunciation: Detecting Chinese Offensive Language Disguised by Phonetic Cloaking Replacement
Haotan Guo, Jianfei He, Jiayuan Ma, Hongbin Na, Zimu Wang, Haiyang Zhang, Qi Chen, Wei Wang, Zijing Shi, Tao Shen, Ling Chen
- Distilling Cross-Modal Knowledge into Domain-Specific Retrievers for Enhanced Industrial Document Understanding
Jinhyeong Lim, Jeongwan Shin, Seeun Lee, Seongdeok Kim, Joungsu Choi, Jongbae Kim, Chun Hwan Jung, Youjin Kang
- Don’t Forget the Base Retriever! A Low-Resource Graph-based Retriever for Multi-hop Question Answering
Andre Melo, Enting Chen, Pavlos Vougiouklis, Chenxin Diao, Shriram Piramanayagam, Ruofei Lai, Jeff Z. Pan
- Beyond Dynamic Quantization: An Efficient Static Hierarchical Mix-precision Framework for Near-Lossless LLM Compression
Yi Zhang, Kai Zhang, Zheyang Li, Wenming Tan, Ye Ren, Jilin Hu
- STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback
Shashank Kirtania, Naman Gupta, Priyanshu Gupta, Sumit Gulwani, Arun Iyer, Suresh Parthasarathy Iyengar, Arjun Radhakrishna, Sriram K. Rajamani, Gustavo Soares
- JaCorpTrack: Corporate History Event Extraction for Tracking Organizational Changes
Yuya Sawada, Hiroki Ouchi, Yuichiro Yasui, Hiroki Teranishi, Yuji Matsumoto, Taro Watanabe, Masayuki Ishii
- CTR-Guided Generative Query Suggestion in Conversational Search
Erxue Min, Hsiu-Yuan Huang, Xihong Yang, Min Yang, Xin Jia, Yunfang Wu, Hengyi Cai, Junfeng Wang, Shuaiqiang Wang, Dawei Yin
- LATTE: Learning Aligned Transactions and Textual Embeddings for Bank Clients
Egor Fadeev, Dzhambulat Mollaev, Aleksei Shestov, Dima Korolev, Omar Zoloev, Ivan A Kireev, Andrey Savchenko, Maksim Makarenko
- RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao, Chonggang Lu, wangyue, Zheyong Xie, Ziyan Liu, Haofu Qian, Jianzhao Huang, Fangcheng Shi, Zijie Meng, Hongcheng Guo, Mingqian He, Xinze Lyu, Zheyu Ye, Weiting Liu, Boyang Wang, Shaosheng Cao
- High-Quality Medical Dialogue Synthesis for Improving EMR Generation
Chengze Ge, Yu Xu, Qi Shao
- Z1: Efficient Test-time Scaling with Code
Zhaojian Yu, Yinghao Wu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang
- Quality Assessment of Tabular Data using Large Language Models and Code Generation
Ashlesha Akella, Akshar Kaul, Krishnasuri Narayanam, Sameep Mehta
- PARSE: Parameter Automated Refinement and Schema Extraction
Anubhav Shrimal, Aryan Jain, Soumyajit Chowdhury, Promod Yenigalla
- From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Xiangfeng Wang, Xiao Li, Yadong Wei, songxueyu, Yang Song, xiaxiaoqiang, Fangrui Zeng, Zaiyi Chen, liuliu, Gu Xu, Tong Xu
- Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Zhiyuan Peng, Ting-Ruen Wei, Tingyu Song, Yilun Zhao, Yi Fang
- GEAR: A Scalable and Interpretable Evaluation Framework for RAG-Based Car Assistant Systems
Niloufar Beyranvand, Hamidreza Dastmalchi, Aijun An, Heidar Davoudi, Winston Chan, Ron DiCarlantonio
- FQ-Eval: Building Evaluation Dataset for User-centered Follow-up Question Generation
Sanghyun Seo, Bumsoo Kang, Dahm Lee, Jaeheon Kim, Joongbo Shin, Euisoon Kim, Kijeong Jeon
- Evaluating AI for Finance: Is AI Credible at Assessing Investment Risk Appetite?
Divij Chawla, Ashita Bhutada, Duc Anh Do, Abhinav Raghunathan, Vinod SP, Cathy Guo, Dar Win Liew, Prannaya Gupta, Rishabh Bhardwaj, Rajat Bhardwaj, Soujanya Poria
- A Proactive Reliability Metric for Detecting Failures in Language Model Training
Maryam Fatima
- CAPSTONE: Composable Attribute-Prompted Scene Translation for Zero-Shot Vision–Language Reasoning
Md. Ismail Hossain, Shahriyar Zaman Ridoy, Moshiur Farazi, Nabeel Mohammed, Shafin Rahman
- Building Resource-Constrained Language Agents: A Korean Case Study on Chemical Toxicity Information
Hojun Cho, Donghu Kim, Soyoung Yang, Chan Lee, Hunjoo Lee, Jaegul Choo
- AutoDSPy: Automating Modular Prompt Design with Reinforcement Learning for Small and Large Language Models
Nafew Azim, Abrar Ur Alam, Hasan Bin Omar, Abdullah Mohammad Muntasir Adnan Jami, Jawad Ibn Ahad, Muhammad Rafsan Kabir, Md. Ismail Hossain, Fuad Rahman, Mohammad Ruhul Amin, Shafin Rahman, Nabeel Mohammed