
Jeff Smith

@jeffsmith.tech

Building 2nd Set AI: https://2ndset.ai/ Researching and writing: https://www.jeffsmith.tech/

690
Followers
2,705
Following
277
Posts
15.11.2024
Joined

Latest posts by Jeff Smith @jeffsmith.tech

GitHub - 2ndSetAI/good-egg: Trust scoring for GitHub PR authors using graph-based ranking on contribution graphs

Full methodology, all scoring data, and the failures are published alongside the successes. github.com/2ndSetAI/goo...

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

v1 and v2 have identical AUC (0.647). We shipped v2 anyway because merge rate corrects survivorship bias and account age stabilizes sparse graphs. Both carry confirmed statistical signal. The flat AUC just means the graph already captures most ranking information.

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
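The point about flat AUC can be sketched in a few lines. The scores below are invented for illustration (not the actual Good Egg validation data): AUC only measures ranking quality, so two models that order authors the same way get identical AUC even when their score values differ.

```python
def auc(scores, labels):
    """Probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]                    # 1 = trustworthy author (invented)
v1 = [0.90, 0.70, 0.60, 0.40, 0.30, 0.10]      # graph score only
v2 = [0.95, 0.80, 0.70, 0.50, 0.20, 0.05]      # graph + merge rate + account age

print(auc(v1, labels) == auc(v2, labels))  # True: same ranking, same AUC
```

Identical ranking implies identical AUC, which is why v2 can still be the better model on other grounds (bias correction, stability) without moving the metric.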

We tested seven features on 5,129 PRs across 49 repos. Three survived. Most interesting failure: text similarity between PR descriptions and project READMEs. Higher similarity predicted lower merge rates. We think low-effort PRs parrot project language.

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
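A toy version of the failed text-similarity feature, assuming a simple bag-of-words cosine similarity (the post does not specify the actual representation used). The example strings are invented; the point is that a PR that parrots the README's language scores higher similarity than a substantive PR that uses its own vocabulary.

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

readme = "fast web framework for building apis with python"
low_effort_pr = "adds fast api support to this web framework for python"
substantive_pr = "refactor connection pooling to reuse sockets across requests"

# The low-effort description echoes the README and scores higher similarity.
print(cosine_sim(low_effort_pr, readme) > cosine_sim(substantive_pr, readme))  # True
```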

The case that motivated it: Guillermo Rauch scores MEDIUM against his own company's Next.js repo. Zero merged PRs in Next.js itself. v2 factors in his 17.7-year account and 78% merge rate, pushing him to HIGH.

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

v1's blind spot: it only sees merged PRs. Someone with 10 merged and 90 closed looks identical to someone with 10 merged and 0 closed. v2 adds merge rate and account age on top of the graph score to fix this.

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
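One way to picture the v2 fix. The weights and blend below are hypothetical (Good Egg's actual formula is in its methodology docs); the sketch just shows how adding merge rate and account age on top of a graph score separates two authors v1 treats identically.

```python
def v2_score(graph_score, merged, closed_unmerged, account_age_years,
             w_graph=0.7, w_merge=0.2, w_age=0.1):
    """Hypothetical blend (not Good Egg's actual weights): merge rate
    corrects survivorship bias, account age stabilizes sparse graphs."""
    total = merged + closed_unmerged
    merge_rate = merged / total if total else 0.0
    age = min(account_age_years / 10.0, 1.0)  # saturate at 10 years
    return w_graph * graph_score + w_merge * merge_rate + w_age * age

# Two authors v1 cannot distinguish: both have 10 merged PRs.
clean = v2_score(0.5, merged=10, closed_unmerged=0, account_age_years=5)
spray = v2_score(0.5, merged=10, closed_unmerged=90, account_age_years=5)
print(clean > spray)  # True: merge rate separates them
```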
Scoring Open Source Contributors in the Age of AI Slop: Finding Good Eggs

New blog post: full methodology behind Good Egg's v2 scoring model (Better Egg), a validation study on 5,129 PRs, and every feature we tested and dropped. neotenyai.substack.com/p/scoring-op...

25.02.2026 14:13 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Good Egg: Trust Scoring PRs - GitHub Marketplace. Score PR author trustworthiness using contribution graph analysis

Or on @github.com Actions Marketplace: github.com/marketplace/...

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Get started today with:
pip install good-egg

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Full methodology writeup if you want the details on the graph scoring, language normalization, and anti-gaming measures:
github.com/2ndSetAI/good-egg/blob/main/docs/methodology.md

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

What Good Egg doesn't do: it doesn't send data to any remote service. Reads from the GitHub API, computes locally. No training set, no contributor database. Just a tool.

Scoring parameters are fully configurable. More data sources (GitLab) and methodology extensions planned.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
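A minimal stdlib sketch of the kind of local, read-only GitHub API call described above. `merged_pr_count` is a hypothetical helper, not Good Egg's actual client code; it uses GitHub's public search endpoint and computes nothing remotely.

```python
import json
import urllib.parse
import urllib.request

def merged_pr_query_url(author: str) -> str:
    """Build a search-API URL counting an author's merged PRs."""
    query = f"author:{author} is:pr is:merged"
    return ("https://api.github.com/search/issues?"
            + urllib.parse.urlencode({"q": query, "per_page": 1}))

def merged_pr_count(author: str) -> int:
    """Hit the public API (unauthenticated, so rate-limited) and read total_count."""
    with urllib.request.urlopen(merged_pr_query_url(author)) as resp:
        return json.load(resp)["total_count"]

# Example (requires network): merged_pr_count("octocat")
```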

On Vouch: Mitchell Hashimoto built a manual web-of-trust for this. I think that's valid. I've seen circles of trust work on PyTorch where contributors came from everywhere.
But I've also seen gaps that a bit of existing data could fill. These are complementary.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

It runs four ways:
- GitHub Action (drop into any PR workflow)
- CLI
- Python library
- MCP server (for AI assistants)

Designed to be simple and portable. Pick the interface that fits your workflow.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

How: it builds a contribution graph from merged PRs, applies personalized graph scoring biased toward your project and language ecosystem, and accounts for recency, repo quality, and anti-gaming measures.

Classifies contributors as HIGH / MEDIUM / LOW / UNKNOWN / BOT.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
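An illustrative sketch of the two steps above, assuming "personalized graph scoring" means something like personalized PageRank (Good Egg's actual algorithm, recency weighting, and anti-gaming measures live in the repo). The graph, personalization, and HIGH/MEDIUM/LOW cutoffs here are all hypothetical.

```python
def personalized_pagerank(edges, personalization, damping=0.85, iters=100):
    """Power iteration; edges maps node -> list of out-neighbors."""
    nodes = set(edges) | {v for outs in edges.values() for v in outs}
    rank = {n: personalization.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * personalization.get(n, 0.0) for n in nodes}
        for u in nodes:
            outs = edges.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:  # dangling node: restart from the personalization vector
                for n in nodes:
                    nxt[n] += damping * rank[u] * personalization.get(n, 0.0)
        rank = nxt
    return rank

def classify(score, high=0.3, medium=0.05):
    """Hypothetical cutoffs, not Good Egg's actual thresholds."""
    return "HIGH" if score >= high else "MEDIUM" if score >= medium else "LOW"

# Toy bipartite graph: authors <-> repos they have merged PRs into,
# personalized toward "your_repo" so nearby contributors rank higher.
edges = {
    "alice": ["your_repo", "other_repo"],
    "bob": ["other_repo"],
    "your_repo": ["alice"],
    "other_repo": ["alice", "bob"],
}
ranks = personalized_pagerank(edges, {"your_repo": 1.0})
print(classify(ranks["alice"]), classify(ranks["bob"]))  # alice outranks bob
```

Biasing the restart vector toward your repo is what makes the score "relative to your project": an author with merges near your repo accumulates more rank than one who is prolific elsewhere.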

Good Egg is a trust scoring tool for GitHub PR authors. It mines a contributor's merged PR history across GitHub and computes a trust score relative to your project.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I've seen OSS collaboration at its best. But the code slop problem is real.

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Machine Learning Systems - Jeff Smith: Build reliable, scalable machine learning systems with reactive design solutions.

I've been in AI + open source for a long time: Spark, Elixir, and then managing the original PyTorch team at Meta. I even wrote a book about it with all open source code:

manning.com/books/machine-learning-systems

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
GitHub - 2ndSetAI/good-egg: Trust scoring for GitHub PR authors using graph-based ranking on contribution graphs

AI has made mass pull requests trivial to generate. Contribution volume is up, signal-to-noise is down. Maintainers can't assume a PR represents genuine investment anymore.

I built a tool to help with this. Thread 🧡
github.com/2ndSetAI/goo...

10.02.2026 15:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Vibe coding kills open source.
Our most direct title yet. @koren.mk @julianhi.nz @aaron-lohmann.bsky.social
Theory paper with numbers and policy recs. First at arxiv.org/abs/2601.15494
Comments welcome.

@ceu-economics.bsky.social @kiel.institute

23.01.2026 22:40 πŸ‘ 56 πŸ” 24 πŸ’¬ 2 πŸ“Œ 4

If structuralism can unlock a new era of AI research, then the party is really just getting started. πŸ₯³

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

The era of biology-like problems getting unlocked by connectionist approaches has been a blast. But I can't help but agree that it's coming to its closing chapter. And that's actually incredibly exciting.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
SHARe-KAN: Holographic Vector Quantization for Memory-Bound Inference Kolmogorov-Arnold Networks (KANs) face a fundamental memory wall: their learned basis functions create parameter counts that impose extreme bandwidth demands, hindering deployment in memory-constraine...

Aside: Shameless plug of my recent paper on SHARe-KANs that shows how extreme this can be with just some off the shelf compression tricks. arxiv.org/abs/2512.15742

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

As is Ziming's wont, he's pretty modest about how big of a deal KANs are in pointing the way toward a more structuralist future for learning methods. They're not the final answer, but they are 100% proof of life that a structuralist approach enables radical leaps in the compressibility of intelligence.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I strongly agree with Ziming's framing: abstraction is really the goal. And structure is clearly part of the answer to how we get to higher levels of abstraction.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Some of the earliest ML work I ever did professionally was on symbolic regression, working for Ben Goertzel at a fever dream of an AI research startup back in Hong Kong.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Achieving AGI Intelligently – Structure, Not Scale | Ziming Liu

One of the most illuminating and inspiring things I've read this year is Ziming Liu's post on structuralism. So much of his framing really resonates with me. kindxiaoming.github.io/blog/2025/st...

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

Oftentimes, this involves thinking about what's really going on in AI more generally. It's a field full of some of the smartest people in the world. There is a distinct arc to where the progress is headed, and a regular hacker like me can only swim with the wave and hope to stand up once it crests.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

I try to begin every new year in big picture mode. Really think about what I want to do differently this year and how to get there.

A 🧡on the larger arc of #AI #research follows.

05.01.2026 13:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
SHARe-KAN: Holographic Vector Quantization for Memory-Bound Inference Kolmogorov-Arnold Networks (KANs) face a fundamental memory wall: their learned basis functions create parameter counts that impose extreme bandwidth demands, hindering deployment in memory-constraine...

Preprint here: arxiv.org/abs/2512.15742 #ai #research #arxiv

19.12.2025 12:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Thanks to Robert Ronan, Saurav Pandit, and Ian Nielsen for their feedback. And thanks to Ziming Liu for the original KAN work and his encouragement on this path.

19.12.2025 12:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

By shifting from pruning to quantization, the method achieves ResNet-50 accuracy with a 12MB head running at sub-millisecond latency.
A dense KAN can scale to complex tasks if you treat the weights as signals rather than parameters.

19.12.2025 12:08 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
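A toy illustration of the "weights as signals" idea, not the SHARe-KAN method itself (the paper uses holographic vector quantization; this sketch is plain nearest-codeword assignment): instead of storing every weight vector at full precision, store a small codebook plus per-vector indices.

```python
# Vector-quantize weight rows: each row is snapped to its nearest
# codebook entry, so storage drops to codebook + small integer indices.

def quantize(vectors, codebook):
    """Return (indices, reconstruction) for nearest-codeword assignment."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = [min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
           for v in vectors]
    return idx, [codebook[i] for i in idx]

weights = [[0.9, 1.1], [1.0, 1.0], [-1.1, -0.9], [-1.0, -1.0]]
codebook = [[1.0, 1.0], [-1.0, -1.0]]
indices, recon = quantize(weights, codebook)
print(indices)  # [0, 0, 1, 1]
```

Unlike pruning, nothing is removed: every weight vector is still represented, just compressed to an index into shared structure, which is what keeps a dense model intact while cutting memory bandwidth.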