Conference Proceedings
Proceedings title
AAAI Special Track (AI Alignment)
Conference name
39th AAAI Conference on Artificial Intelligence (AAAI-25), 37th Conference on Innovative Applications of Artificial Intelligence (IAAI-25), 15th Symposium on Educational Advances in Artificial Intelligence (EAAI-25)
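Chinese title (translated)
39th AAAI Conference on Artificial Intelligence, 37th Conference on Innovative Applications of Artificial Intelligence, 15th Symposium on Educational Advances in Artificial Intelligence, Volume 26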
Organization
Association for the Advancement of Artificial Intelligence (AAAI)
Conference dates
25 February - 4 March 2025
Conference location
Philadelphia, Pennsylvania, USA
Publication year
2025
Holding number
358223
Title
Authors
Publication year
SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
Somnath Banerjee; Sayan Layek; Soham Tripathy; Shanu Kumar; Animesh Mukherjee; Rima Hazra
2025
Bridging the Knowledge Gap: Understanding User Expectations for Trustworthy LLM Standards
Michaela Benk; Leane Wettstein; Nadine Schlicker; Florian von Wangenheim; Nicolas Scharowski
2025
Scaling Trends for Data Poisoning in LLMs
Dillon Bowen; Brendan Murphy; Will Cai; David Khachaturov; Adam Gleave; Kellin Pelrine
2025
Verification of Neural Networks Against Convolutional Perturbations via Parameterised Kernels
Benedikt Bruckner; Alessio Lomuscio
2025
Risk Controlled Image Retrieval
Kaiwen Cai; Chris Xiaoxuan Lu; Xingyu Zhao; Wei Huang; Xiaowei Huang
2025
Political Bias Prediction Models Focus on Source Cues, Not Semantics
Selin Chun; Daejin Choi; Taekyoung Kwon
2025
Searching for Unfairness in Algorithms' Outputs: Novel Tests and Insights
Ian Davidson; S. S. Ravi
2025
In Search of Trees: Decision-Tree Policy Synthesis for Black-Box Systems via Search
Emir Demirovic; Christian Schilling; Anna Lukina
2025
Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution
Carlos Eiras-Franco; Anna Hedstrom; Marina M.-C. Hohne
2025
Retrieving Versus Understanding Extractive Evidence in Few-Shot Learning
Karl Elbakian; Samuel Carton
2025
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
Duanyu Feng; Bowen Qin; Chen Huang; Youcheng Huang; Zheng Zhang; Wenqiang Lei
2025
SMLE: Safe Machine Learning via Embedded Overapproximation
Matteo Francobaldi; Michele Lombardi
2025
MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector
Wenjie Fu; Huandong Wang; Chen Gao; Guanghua Liu; Yong Li; Tao Jiang
2025
The Partially Observable Off-Switch Game
Andrew Garber; Rohan Subramani; Linus Luu; Mark Bedaywi; Stuart Russell; Scott Emmons
2025
UFID: A Unified Framework for Black-box Input-level Backdoor Detection on Diffusion Models
Zihan Guan; Mengxuan Hu; Sheng Li; Anil Kumar Vullikanti
2025
Robust Multi-Objective Preference Alignment with Online DPO
Raghav Gupta; Ryan Sullivan; Yunxuan Li; Samrat Phatale; Abhinav Rastogi
2025
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
Xiaomeng Hu; Pin-Yu Chen; Tsung-Yi Ho
2025
Joint Scoring Rules: Competition Between Agents Avoids Performative Prediction
Rubi Hudson
2025
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang; Zhangchen Xu; Luyao Niu; Bill Yuchen Lin; Radha Poovendran
2025
Dynamic Algorithm Termination for Branch-and-Bound-based Neural Network Verification
Konstantin Kaulen; Matthias Konig; Holger H. Hoos
2025