Carnegie Mellon University at EMNLP 2025 – Machine Learning Blog | ML@CMU
CMU researchers are presenting 50 papers at the Thirtieth Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), held from November 4–9 in Suzhou, China. This includes 27 papers in the main conference, 19 papers in the Findings track, 2 system demonstration papers, and 2 industry track papers. This blog post provides aggregated information about EMNLP 2025 papers published by CMU researchers.
Key areas addressed are visualized below (representing 30 of the 50 total papers), illustrating the breadth of NLP and machine learning research being conducted at CMU:

Note: All information in this post has been obtained through the ACL Anthology API and the EMNLP 2025 Presentation Data spreadsheet. Please contact the CMU ML Blog editors if you would like any information added or modified.
Table of Contents
Main Conference Papers
Special Theme: Interdisciplinary Recontextualization of NLP
Spontaneous Giving and Calculated Greed in Language Models
Synthetic Socratic Debates: Examining Persona Effects on Moral Decision and Persuasion Dynamics
Multimodality and Language Grounding to Vision, Robotics and Beyond
Social Genome: Grounded Social Reasoning Abilities of Multimodal Models
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
Resources and Evaluation
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
Human-AI Interaction/Cooperation
Estimating LLM Consistency: A User Baseline vs Surrogate Metrics
Humanizing Machines: Rethinking LLM Anthropomorphism Through a Multi-Level Framework of Design
Interpretability, Model Editing, Transparency, and Explainability
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies
Mathematical, Symbolic, and Logical Reasoning in NLP
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening
Agentic-R1: Distilled Dual-Strategy Reasoning
Generalizability and Transfer
SOCIAL SCAFFOLDS: A Generalization Framework for Social Understanding Tasks
Searching for the Most Human-like Emergent Language
NLP Applications
PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs
Safety and Alignment in LLMs
Anecdoctoring: Automated Red-Teaming Across Language and Place
Natural Language Generation
CIE: Controlling Language Model Text Generations Using Continuous Signals
Question Answering
Table-R1: Inference-Time Scaling for Table Reasoning Tasks
Multilinguality and Language Diversity
Grounding Multilingual Multimodal LLMs With Cultural Knowledge
Computational Social Science, Cultural Analytics, and NLP for Social Good
Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication
AI/LLM Agents
On the Fine-Grained Planning Abilities of VLM Web Agents
Code Models
An Empirical Study on Strong-Weak Model Collaboration for Repo-level Code Generation
Summarization
Summarizing Speech: A Comprehensive Survey
Retrieval-Augmented Language Models
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
Phonology, Morphology and Word Segmentation
Morpheme Induction for Emergent Language
Low-resource Methods for NLP
Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations
Findings Papers
Special Theme: Interdisciplinary Recontextualization of NLP
FicSim: A Dataset for Multi-Faceted Semantic Similarity in Long-Form Fiction
Resources and Evaluation
SimBA: Simplifying Benchmark Analysis Using Performance Matrices Alone
mrCAD: Multimodal Communication to Refine Computer-aided Designs
Human-AI Interaction/Cooperation
Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Interpretability, Model Editing, Transparency, and Explainability
Linear Steerability in Language Models: When It Emerges and How It Evolves
Predicting Language Models’ Success at Zero-Shot Probabilistic Prediction
Multilinguality and Language Diversity
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models
AI/LLM Agents
FLAIRR-TS – Forecasting LLM-Agents with Iterative Refinement and Retrieval for Time Series
Code Models
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
Retrieval-Augmented Language Models
GAMIC: Graph-Aligned Molecular In-context Learning for Molecule Analysis via LLMs
Speech Processing and Spoken Language Understanding
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Semantics: Lexical, Sentence-Level Semantics, Textual Inference, and Other Areas
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications
Ethics, Bias, and Fairness
Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models
Dialogue and Interactive Systems
LLM Efficiency
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
System Demonstrations
AgentDiagnose: An Open Toolkit for Diagnosing LLM Agent Trajectories
BioGraphia: A LLM-Assisted Biological Pathway Graph Annotation Platform
Industry Track Papers
Leveraging LLMs to Streamline the Review of Public Funding Applications
Semantic Agreement Enables Efficient Open-Ended LLM Cascades