Submission deadline extension: until March 19.
Final Call for Papers: PrivateNLP workshop co-located with ACL 2026
See sites.google.com/view/private... for OpenReview submission link and details
📅 Deadlines (AoE):
Regular submissions: March 5
Fast-track: March 24
Non-archival: April 7
For questions/queries please contact: privatenlp26-orga[at]lists.ruhr-uni-bochum.de
🔐 Announcing the call for papers for the 7th Workshop on Privacy-Preserving Natural Language Processing at ACL 2026 in San Diego!
If your research lies at the intersection of privacy and NLP, consider submitting to our workshop!
Website: sites.google.com/view/private...
First call for papers - Seventh Workshop on Privacy in Natural Language Processing, co-located with ACL 2026, San Diego (CA), USA (and on Zoom)
sites.google.com/view/private...
Frustrated with how most of the world’s low-resource languages have NO evaluation resources?
📢 Check out ChiKhaPo, a massively multilingual lexical comprehension and generation benchmark covering 2700+ languages.
www.arxiv.org/abs/2510.16928
Led by @stolenpyjak.bsky.social, we built a user-friendly Python package for generating and evaluating privacy-preserving synthetic data! See details in our EMNLP Demo paper:
Catch @zihaozhao.bsky.social at today’s poster session (10:30–12) where he'll be presenting SynthTextEval! Stop by if you're interested in synthetic text for high-stakes domains. Zihao also has another EMNLP paper on private text generation, for people interested in this space!
@jhuclsp.bsky.social
SynthTextEval was developed in close collaboration with
Daniel Smolyak, @zihaozhao.bsky.social, Nupoor Gandhi, Ritu Agarwal, Margrét Bjarnadóttir, @anjalief.bsky.social
@jhuclsp.bsky.social @jhucompsci.bsky.social
Stop by to see our work at EMNLP tomorrow, which Zihao will be presenting!
SynthTextEval is a comprehensive toolkit for evaluating synthetic text data with a wide range of metrics, enabling standardized, comparable assessments of generation approaches and building greater confidence in the quality of synthetic data, especially for high-stakes domains
Synthetic data shouldn’t be a black box - we make it easier to examine and identify issues in synthetic data outputs with
- Interactive text exploration & review with our GUI tool
- Exploring text diversity, structure and themes with our visual and descriptive text analysis tools
SynthTextEval also supports fine-tuning models for controllable text generation across diverse domains, which allows users to
- Produce text tailored to user-defined styles, content types, or domain labels
- Generate synthetic data with differentially private guarantees
🔧Utility: Downstream task-based evaluations (classification, coreference resolution)
📊Fairness: distributional balance & representational biases
🔐Privacy: Leakage, memorization, and re-identification risk
📜Quality: Distributional differences between synthetic and real text
Conventional metrics like BLEU, ROUGE, or perplexity only scratch the surface of synthetic text quality!
Our framework introduces a multi-dimensional evaluation suite that covers aspects such as utility, privacy, fairness and distributional similarity to the real data.
🚀 SynthTextEval, our open-source toolkit for generating and evaluating synthetic text data for high-stakes domains, will be featured at EMNLP 2025 as a system demonstration!
GitHub: github.com/kr-ramesh/sy...
Paper 📝: aclanthology.org/2025.emnlp-d...
#EMNLP2025 #EMNLP #SyntheticData
Thank you to @anjalief.bsky.social for advising. Hands-on with DP-SGD? Start with our other paper and open-source package
(arxiv.org/abs/2507.07229
github.com/kr-ramesh/sy...)
🔗 Paper & code
The paper has been accepted to EMNLP 2025 Main
arXiv: arxiv.org/abs/2509.25729
Code: github.com/zzhao71/Cont...
#SyntheticData #Privacy #NLP #LLM #Deidentification #HealthcareAI
Take a look at this EMNLP 2025 paper by @zihaozhao.bsky.social, which proposes novel methods for generating high utility, privacy-preserving synthetic text!
‼️‼️
This hypothesis says that 1) Multilingual generation uses a model-internal task-solving→translation cascade. 2) Failure of the translation stage *despite task-solving success* is a large part of the problem. That is, the model often solves the task but fails to articulate the answer.
⁉️
We know that speech LID systems flunk on accented speech. But why? And what can we do about it? 🤔
Our work arxiv.org/abs/2506.00628 (Interspeech '25) finds that *accent-language confusion* is an important culprit, ties it to the length of feature that the model relies on, and proposes a fix.
Go find new linguistic changes, compare corpora and invent
huggingface.co/Hplm
arxiv.org/abs/2504.05523
Historical analysis is a good example, as historical periods can get lost in blended information from different eras. Finetuning large models isn't enough: they “leak” future/modern concepts, making historical analysis impossible. Did you know cars existed in the 1800s? 🤦
arxiv.org/abs/2504.05523
Typical Large Language Models (LLMs) are trained on massive, mixed datasets, so the model's behaviour can't be linked to a specific subset of the pretraining data. Or in our case, to time eras.
How should the humanities leverage LLMs?
▶️Domain-specific pretraining!
Pretraining models can be a research tool: it's cheaper than LoRA and allows studying
💠grammatical change
💠emergent word senses
💠who knows what more…
Train on your data with our pipeline or use ours!
#AI #LLM 🤖📈
Dialects lie on continua of (structured) linguistic variation, right? And we can’t collect data for every point on the continuum...🤔
📢 Check out DialUp, a technique to make your MT model robust to the dialect continua of its training languages, including unseen dialects.
arxiv.org/abs/2501.16581
📢 Want to host MASC 2025?
The 12th Mid-Atlantic Student Colloquium is a one-day event bringing together students, faculty and researchers from universities and industry in the Mid-Atlantic.
Please submit this very short form if you are interested in hosting! Deadline January 6th. #MASC2025
📢 It's PhD admissions season! 🎓
The PhD admissions process is stressful! 😅
Want a behind-the-scenes look at the process? 👀✨ You have questions, we have answers. 📝🤝
Watch my Admissions AMA for @jhuclsp.
https://youtu.be/YlwpIPFNXjo?si=O7n5QwGT5sQdpg7u
I'm super excited about this program and happy to connect if you're interested in working with me through it!