The workshop on Actionable Interpretability @ ICML 2025 aims to foster discussions on leveraging interpretability insights to drive tangible advancements in AI across diverse domains. We welcome contributions that move beyond theoretical analysis, demonstrating concrete improvements in model alignment, robustness, and real-world applications. Additionally, we seek to explore the challenges inherent in translating interpretability research into actionable impact.
Outstanding Papers
We are happy to announce that the Outstanding Paper Awards go to:
- Detecting High-Stakes Interactions with Activation Probes by Alex McKenzie, Phil Blandfort, Urja Pawar, William Bankes, David Krueger, Ekdeep Singh Lubana, and Dmitrii Krasheninnikov
- Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations by Pedro Lobato Ferreira, Wilker Aziz, and Ivan Titov
Congratulations to the authors for their exceptional work!
News
- July 03 2025: Schedule for the workshop published!
- July 03 2025: Information for authors added.
- June 19 2025: Acceptance notifications have been sent!
- May 12 2025: The workshop is scheduled for July 19!
- May 12 2025: Clarification: Submissions to the conference track may include the camera-ready version of the accepted paper (up to 9 pages; no anonymization required).
- May 3 2025: The submission deadline has been extended to May 19.
- April 15 2025: Our submissions page on OpenReview is open!
- March 31 2025: Call for papers published!
- March 19 2025: Our Workshop was accepted to ICML!
Information for Authors
Since the AIW workshop is non-archival, there is no need to submit a camera-ready version. For the same reason, papers and reviews on OpenReview will not be made public. We will list all the accepted papers’ titles & authors (alongside your poster PDFs and optional video recordings) on our website, but we will not link the PDFs. If you would like your paper to be public, we recommend hosting it on your personal website or on arXiv.
Important Dates
May 19 - Submissions due (extended)
June 19 - Acceptance notification
July 19 - Workshop day
(All dates are Anywhere On Earth.)
Schedule
From | Until | Event |
---|---|---|
08:00 | 09:00 | Poster Setup 1 |
09:00 | 09:10 | Opening Remarks |
09:10 | 09:40 | Keynote - Been Kim - Agentic Interpretability and Neologism: what LLMs can offer us |
09:40 | 10:10 | Keynote - Sarah Schwettmann - AI Investigators for Understanding AI Systems |
10:10 | 10:25 | Talk - Detecting High-Stakes Interactions with Activation Probes |
10:25 | 10:40 | Talk - Actionable Interpretability with NDIF and NNsight |
10:40 | 11:40 | Poster Session 1 |
11:40 | 13:00 | Lunch + Poster Setup 2 |
13:00 | 14:00 | Poster Session 2 |
14:00 | 14:30 | Keynote - Byron Wallace - What (if anything) can interpretability do for healthcare? |
14:30 | 14:45 | Talk - Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations |
14:45 | 15:00 | Talk - Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors |
15:00 | 15:30 | Coffee Break |
15:30 | 16:00 | Keynote - Eric Wong - Explanations for Experts via Guarantees and Domain Knowledge: From Attributions to Reasoning |
16:00 | 16:45 | Panel Discussion |
16:45 | 17:00 | Closing Remarks |