Aligned Probing: Relating Toxic Behavior and Model Internals
Questions? Discussion? Reach out to us.
@dippedrusk.com @a-lauscher.bsky.social Dietrich Klakow @igurevych.bsky.social
Full paper & code: alignedprobing.github.io
(7/7)
27.01.2026 13:02
In a new briefing by the @sciencemediacenter.de, Prof. Dr. @igurevych.bsky.social (@tuda.bsky.social) notes that the study's methodology is well aligned with its claims: It extends earlier work by the same lab showing that fine-tuning can lead to broader misalignment.
(2/🧵)
23.01.2026 08:59
Promotional graphic for a keynote on "Roadmap to Internet Security." The background shows a glowing digital lock and circuit-board imagery. Large text announces the talk title "How to Make AI-Native Internet Content Secure? Coping with Synthetic and Misleading Data." On the right is a circular portrait of a woman with shoulder-length curly hair and glasses (Iryna Gurevych), looking toward the camera.
Keynote by Iryna Gurevych at Roadmap to Internet Security 2025
We are pleased to share that Prof. @igurevych.bsky.social, Distinguished Professor at the @athenecenter.bsky.social and Director of the UKP Lab at @tuda.bsky.social, will deliver a keynote at the ATHENE event ...
04.12.2025 13:31
Do you like AI for Cybersecurity, Agentic AI research, or both? This job might be for you!
27.11.2025 20:11
Congratulations on this fantastic award from us to Iryna Gurevych, who is not only the LOEWE top professor, but also involved in other LOEWE projects such as LOEWE @emergencity.de and @loewe-dynamic.bsky.social! @igurevych.bsky.social @tuda.bsky.social @unimarburg.bsky.social @royalsociety.org
20.11.2025 13:02
As part of the ceremony at the Royal Society in London, @igurevych.bsky.social delivered the Milner Prize Lecture titled "How to spot and debunk misleading content", addressing current challenges in detecting and countering misleading information.
18.11.2025 14:11
The @royalsociety.org Milner Award Lecture 2025: Prof. @igurevych.bsky.social on how AI fact checkers can be misled (similar to how humans are, in many ways) by implicit source citation fallacies, image feature training bias, and misleading chart design, and what we can do about it.
17.11.2025 23:29
Happens tomorrow, online attendance possible!
16.11.2025 08:02
Language models: How far artificial intelligence has really come
ChatGPT triggered the biggest AI boom to date. But how far have language models come by now? And what comes next? An overview.
It touches on fast-moving model releases, the growing relevance of AI agents and tool-use, the shift towards reasoning-oriented training, and the limits of what benchmark results really tell us.
Read the article (F+ / paywalled):
www.faz.net/aktuell/wirt...
(2/2)
07.11.2025 09:05
#INLG2025 main conference starts with not one, not two, but three keynotes! We'll hear from Minlie Huang, Iryna Gurevych (@igurevych.bsky.social), and Verena Rieser (@verenarieser.bsky.social), as well as oral and poster sessions for accepted papers.
31.10.2025 03:18
AI bound by confidentiality: Researchers at TU Darmstadt & IIT Delhi show how AI can help in psychodiagnostics without putting sensitive data at risk @igurevych.bsky.social
tu-darmstadt.de/universitaet/aktuelles_meldungen/einzelansicht_527552.de.jsp
13.10.2025 11:44
Logos of the UKP Lab (Ubiquitous Knowledge Processing) and Hugging Face side by side with a plus sign between them. Below, the text reads: "Sentence Transformers is joining Hugging Face."
Sentence Transformers joins Hugging Face
Originally developed at the UKP Lab at @tuda.bsky.social, Sentence Transformers has become one of the world's most widely used open-source libraries for semantic embeddings in natural language processing.
(1/🧵)
22.10.2025 14:07
Join Professor Iryna Gurevych for the Milner Prize Lecture on 17 November. She will demonstrate how machine learning and artificial intelligence can be applied to protect people and machines from misinformation: royalsociety.org/science-even... @igurevych.bsky.social @tuda.bsky.social
08.10.2025 09:02
Huge thanks to our collaborators!
This release wouldn't be possible without the contributions of: Sheng Lu, Nils Dycke, @atnafu.bsky.social, Thamar Solorio, Xiaodan Zhu, Koen Dercksen, Lizhen Qu, Margot Mieskes, @dirkhovy.bsky.social and @igurevych.bsky.social.
(4/🧵)
08.10.2025 06:57
Quantum AI and NLP Conference 2025
Quantum AI and NLP Conference 2025 Website. The conference will be held from the 6th to the 8th of August 2025 in Bloomington, Indiana at Indiana University.
This work was made possible through a great collaboration with
👩‍🔬 Anna Schroeder & Mariami Gachechiladze (Quantum Computing Group, @tuda.bsky.social)
👨‍🔬 Yue Zhang (Westlake University)
👩‍💻 @igurevych.bsky.social (@ukplab.bsky.social, @tuda.bsky.social)
(2/🧵)
02.10.2025 07:31
What could a pan-European PhD program focusing on #MachineLearning & AI be like?
Check out @wafaamohammed.bsky.social's reason for joining the #ELLISPhD Program.
You can apply via our central recruiting portal starting on Oct 1st.
Get all the details now: bit.ly/45DSe75
25.09.2025 09:18
Call for Posters: ELLIS UnConference 2025
Have a paper published in 2025? Present it in Copenhagen on Dec 2!
🔹 Showcase your research
🔹 Get feedback & connect with peers
🔹 Grow your network
Submit here: https://forms.gle/vUUdM9mah34swX7U7
Deadline: Oct 23, 17:00 CEST
#EurIPS
24.09.2025 13:00
Promotional graphic for the Discovery Science 2025 International Conference keynote. The text reads: "Please meet AI, our dear new colleague. In other words: can scientists and machines truly cooperate?" Keynote by Prof. Iryna Gurevych. On the right is a photo of Prof. Iryna Gurevych standing in front of bookshelves. The logos of UKP Lab and Discovery Science are included at the bottom.
Keynote by Prof. Iryna Gurevych in Ljubljana
Prof. @igurevych.bsky.social, Head of the UKP Lab, is giving a keynote today at the 28th International Conference on Discovery Science 2025 in Ljubljana.
25.09.2025 07:36
#Dagstuhl Seminar "Open Scholarly Information Systems:
Iryna Gurevych @igurevych.bsky.social introduces us to AI, our dear new colleague: Are we becoming obsolete?
17.09.2025 07:53
We're excited to share that @igurevych.bsky.social, Director of the UKP Lab at @tuda.bsky.social, will join the expert panel to discuss the opportunities and challenges of Generative AI from a research perspective.
(2/🧵)
25.06.2025 11:18
Panel with Peter Buxmann (@tuda.bsky.social), Holger Hanselka (@fraunhofer.bsky.social), Florian Rentsch (Verband der Sparda-Banken e.V.), Sara Jourdan (@tuda.bsky.social), Carsten Knop (@faznet.bsky.social), and @igurevych.bsky.social (@ukplab.bsky.social / @tuda.bsky.social)
(3/🧵)
25.06.2025 11:18
Iryna Gurevych elected to the Academia Europaea
06.06.2025 ATHENE Principal Investigator Prof. Iryna Gurevych has been elected to the Academia Europaea. This recognizes her long-standing academic excellence and international visibility in the ...
ATHENE Principal Investigator Prof. Iryna Gurevych @igurevych.bsky.social has been elected to the Academia Europaea @acad-euro.bsky.social for her long-standing academic excellence in the fields of artificial intelligence #KI & Natural Language Processing.
www.athene-center.de/aktuelles/ne...
06.06.2025 19:26
This is a unique opportunity to work with @igurevych.bsky.social and the team on the intersection of AI Safety, Natural Language Processing and Machine Learning.
(2/🧵)
12.06.2025 14:18
Super important initiative, thanks for doing this great work!
17.06.2025 14:41
Portrait of Iryna Gurevych.
© Markus Scholz
Iryna Gurevych elected as member of the Academia Europaea
@tuda.bsky.social computer scientist Prof. Dr. Iryna Gurevych (@igurevych.bsky.social) has been elected as a member of the Academia Europaea (@acad-euro.bsky.social), the pan-European academy of sciences, humanities and letters.
(1/🧵)
05.06.2025 13:13
DeepSeek models put to the test
In a guest article, Professor Iryna Gurevych and Irina Bigoulaeva of the Ubiquitous Knowledge Processing (UKP) Lab at the Department of Computer Science at TU Darmstadt offer insight into their research on ...
DeepSeek: In a guest article, Prof. Iryna Gurevych and Irina Bigoulaeva of the Ubiquitous Knowledge Processing Lab at the Department of Computer Science offer insight into their research on the capabilities of generative artificial intelligence @cs-tudarmstadt.bsky.social @ukplab.bsky.social
27.05.2025 13:36