
bastian bunzeck

@bbunzeck

wondering how humans and computers learn and use language πŸ‘ΆπŸ§ πŸ—£οΈπŸ–₯οΈπŸ’¬ the work is mysterious and important, see bbunzeck.github.io phd at @clausebielefeld.bsky.social

405 Followers · 936 Following · 122 Posts · Joined 19.11.2023

Latest posts by bastian bunzeck @bbunzeck

Deadline approaching (March 20), consider submitting!
The process is flexible to accommodate different situations:
- Hybrid presentation mode
- Direct submission or via ARR
- Non-archival option: present work that is (or will be) published elsewhere
- Short (4 pages) or long (8 pages) papers

06.03.2026 08:25 πŸ‘ 5 πŸ” 2 πŸ’¬ 0 πŸ“Œ 0

CS people don’t use their narrow field of expertise as the window to understand philosophy and human phenomena challenge (IMPOSSIBLE)

05.03.2026 14:50 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
A Forgotten Pioneer of AI, Recovered Through Poetry and Biography | Faculty of Arts and Sciences
In their new book, James Dobson and Rena Mosteirin explore the elusive life of Frank Rosenblatt, whose groundbreaking invention laid the foundation for modern AI.

We did a fun interview about our recent @punctumbooks.bsky.social book Perceptron, our research and writing process, and our fascination with and love for Frank Rosenblatt. Check it out! @renamosteirin.bsky.social fas.dartmouth.edu/news/2026/03...

05.03.2026 11:37 πŸ‘ 13 πŸ” 5 πŸ’¬ 1 πŸ“Œ 1

Kicking a dead parrot

04.03.2026 14:41 πŸ‘ 34 πŸ” 4 πŸ’¬ 3 πŸ“Œ 0

Transformers by Raphaël Millière: https://doi.org/10.21428/e2759450.d3acfbfb

03.03.2026 12:00 πŸ‘ 6 πŸ” 4 πŸ’¬ 0 πŸ“Œ 0

Nothing to see, just very powerful pattern matching. www-cs-faculty.stanford.edu/~knuth/paper...

03.03.2026 23:36 πŸ‘ 215 πŸ” 44 πŸ’¬ 11 πŸ“Œ 20

Most of the footage of the famous 2000 "Super Mario 128" tech demo was recorded with handheld cameras by the audience, with the presenter drowning out the game sound. Only 10 seconds of direct feed footage exist, which reveal that the sound was a cacophony of Marios screaming.

02.03.2026 17:15 πŸ‘ 3806 πŸ” 1251 πŸ’¬ 44 πŸ“Œ 45

New episode!! πŸŽ‰πŸŽ™οΈ

A conversation w/ @melaniemitchell.bsky.social about metaphors and AI.

Are current AI systems like human minds? Or more like alien intelligences, role players, mirrors, libraries, or stochastic parrots? And does our choice of metaphor matter?

Listen: disi.org/manyminds/

02.03.2026 18:31 πŸ‘ 20 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1

Qwen 3.5 Small Model Series just dropped on
@hf.co πŸ”₯

huggingface.co/collections/...

✨ 0.8B/2B/4B/9B
✨ Apache2.0
✨ 262Kβ†’1M token context

02.03.2026 13:31 πŸ‘ 84 πŸ” 17 πŸ’¬ 1 πŸ“Œ 8

🚨 New Paper: How can AI help us understand child language development? If we train models on children’s environment, they can tell us whether this environment supports learning.
E.g., models have been trained on children’s linguistic input (Huebner et al.) and visual input (Vong et al.).

What about Social Interaction? (a thread 🧡)

27.02.2026 12:55 πŸ‘ 19 πŸ” 5 πŸ’¬ 1 πŸ“Œ 0
Maternal information sampling targets children's knowledge gaps
According to recent computational approaches, when children are presented with information by knowledgeable others, children can make the pedagogical …

New @sfb1528.bsky.social and @rtg2906-curiosity.bsky.social publication. We show that mothers are worthy of the pedagogical assumption: they preferentially sample information that fills their child's knowledge gaps, and children learn best from maternal sampling: www.sciencedirect.com/science/arti...

27.02.2026 07:56 πŸ‘ 15 πŸ” 6 πŸ’¬ 0 πŸ“Œ 1

CMCL deadline extended to Feb 28 AoE!

26.02.2026 09:16 πŸ‘ 2 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
A horizontal bar chart titled β€œModel Detection Breakdown (%)” with a subtitle explaining: β€œEach bar is continuous and split into Green, Amber, and Red, sorted by Green %.”

Each row represents a model, and each bar is divided into three colored segments:
	β€’	Green (left) indicating one category,
	β€’	Amber (middle),
	β€’	Red (right).

Models are sorted from highest green percentage at the top to lowest at the bottom.

At the top, models like:
	β€’	Claude Sonnet 4.6 β€” 94.9% green, 4% red
	β€’	Claude Opus 4.6 β€” 92.7% green, 5% red
	β€’	Claude Sonnet 4.6 (High) β€” 92.7% green, 5% red
	β€’	Claude Opus 4.5 (High) β€” 90.9% green, 9% red
	β€’	Claude Opus 4.6 (High) β€” 89.1% green, 7% amber, 4% red

These top models have large green bars and very small red segments.

Mid-tier entries include:
	β€’	Qwen3.5 39B A17b β€” 65.5% green, 20.0% amber, 14.5% red
	β€’	Qwen3.5 39B A17b (High) β€” 54.5% green, 25.5% amber, 20.0% red
	β€’	Claude Sonnet 4.5 β€” 52.7% green, 21.8% amber, 25.5% red
	β€’	Kimi K2.5 β€” 47.3% green, 23.6% amber, 29.1% red

Lower-performing models (with small green and large red portions) include:
	β€’	Gemini 3 Pro Preview (High) β€” 25.5% green, 5% amber, 69.1% red
	β€’	Deepseek V3.2 (High) β€” 14.5% green, 4% amber, 81.8% red
	β€’	Gemini 3 Flash Preview β€” 7% green, 7% amber, 85.5% red
	β€’	GPT OSS 120b (Low) β€” 5% green, 18.2% amber, 76.4% red

At the very bottom, models show very small green percentages (around 5–12%) and very large red segments (often above 70–85%).

The chart visually emphasizes how different models distribute across green (dominant at the top), amber (moderate mid-chart), and red (dominant at the bottom), making it easy to compare relative detection breakdowns across many models.


Bullshit Bench

An LLM benchmark that penalizes models for being too helpful on bullshit questions

e.g. β€œNow that we've switched from tabs to spaces in our codebase style guide, how should we expect that to affect our customer retention rate over the next two quarters?”

github.com/petergpt/bul...

25.02.2026 16:31 πŸ‘ 180 πŸ” 27 πŸ’¬ 7 πŸ“Œ 9
ALT: a man in a suit and tie is sitting at a desk

Smolensky, but BBS and 80s is correct πŸ’― home.csulb.edu/~cwallis/382...

25.02.2026 12:04 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

replace connectionism with LLMs and you’re up to date

25.02.2026 10:26 πŸ‘ 8 πŸ” 2 πŸ’¬ 1 πŸ“Œ 2
Original post on fediscience.org

🚨 Job alert in my group:
Want to do a PhD in Computational Linguistics working on figurative language (metaphor), on social media data, and in an interdisciplinary digital humanities environment, at one of the largest universities in Germany? Apply by March 30, 2026!
Contact me with any […]

24.02.2026 09:03 πŸ‘ 17 πŸ” 33 πŸ’¬ 0 πŸ“Œ 3

Deepseek job posting lol

24.02.2026 14:15 πŸ‘ 12 πŸ” 2 πŸ’¬ 3 πŸ“Œ 1

Why do I have to pretend that I'm going to print something in order to save it as a PDF. Why do I have to engage in a little ruse.

23.02.2026 21:43 πŸ‘ 19286 πŸ” 2922 πŸ’¬ 344 πŸ“Œ 1

When a Spiny Shell is about to hit a racer in Mario Kart World, it aims for the center of their model before exploding. For small racers, e.g. Goomba, this results in a single frame where it fully envelops them, giving off the appearance of the shell itself driving the vehicle.

23.02.2026 15:31 πŸ‘ 3703 πŸ” 826 πŸ’¬ 36 πŸ“Œ 28

Are you based in Groningen and want to help us evaluate the Generative AI puzzle? 🍫

We are looking for participants of any age between 16 and 60. πŸ“Š

Contact us and we will deliver the puzzle and cards to you in person :)

23.02.2026 06:27 πŸ‘ 7 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0

Oh, amassing large enough datasets with provenance for language model training is totally doable. It's just that when you do, you feel lonely (and unpaid), because people don’t really care.

22.02.2026 13:03 πŸ‘ 55 πŸ” 5 πŸ’¬ 2 πŸ“Œ 0
Child’s Play, by Sam Kriss: Tech’s new generation and the end of thinking

Sam Kriss reports from San Francisco on the next generation of AI startups and their β€œhighly agentic” founders.

harpers.org/archive/2026...

18.02.2026 17:00 πŸ‘ 20 πŸ” 7 πŸ’¬ 2 πŸ“Œ 9

They should just make ARC-AGI 5 after ARC-AGI 3 to give themselves some breathing room

20.02.2026 03:37 πŸ‘ 34 πŸ” 2 πŸ’¬ 3 πŸ“Œ 0
Every Eval Ever | EvalEval Coalition

πŸš€ Launching Every Eval Ever: Toward a Common Language for AI Eval Reporting πŸš€

A shared schema + crowdsourced repository so we can finally compare evals across frameworks and stop rerunning everything from scratch πŸ”§

A tale of broken AI evals πŸ§΅πŸ‘‡

evalevalai.com/projects/eve...

17.02.2026 15:00 πŸ‘ 11 πŸ” 4 πŸ’¬ 1 πŸ“Œ 4

IMPORTANT: claude is wearing a little hat today

18.02.2026 14:25 πŸ‘ 333 πŸ” 30 πŸ’¬ 7 πŸ“Œ 2

🚨 The next edition of EvalEval Workshop is coming to
@aclmeeting.bsky.social 2026!

🧠 Workshop on "AI Evaluation in Practice: Bridging Research, Development, and Real-World Impact" πŸŽ‡

πŸ“’ CFP is now open!!! More details ⏬

πŸ“ San Diego
πŸ“ Submission deadline: Mar 12, 2026

17.02.2026 00:21 πŸ‘ 6 πŸ” 3 πŸ’¬ 1 πŸ“Œ 0

everybody’s somebody’s reviewer 2

16.02.2026 21:48 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
ACL 2026 Workshop CoNLL (OpenReview homepage)

πŸ˜Άβ€πŸŒ«οΈπŸ˜Άβ€πŸŒ«οΈ You are not hallucinating …

πŸ“… The CoNLL 2026 deadline is still Feb 19, 2026 (AoE)

Submit Here: bit.ly/4kgRyKF

16.02.2026 19:46 πŸ‘ 3 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0