Fair enough. Way more conspiracy theories on X these days than there used to be though. Also had about 11 bots follow me yesterday after a single tweet. One created a meme coin for my startup. Maybe dead internet theory is actually real…
Perfectly describes the current state of Xhitter
Turns out I was right on the nose. I didn't see his talk (even posted this hours before I saw it pop up), but it's not hard to see based on the rate of enhancements over the past 24 months.
They're waiting for you, Gordon. In the tessssssst chamberrrrrr.
I keep forgetting they're still doing this
Manual. YMMV with prepared stuff like AutoGPT, but base LLMs at a fundamental level are just token emitters, so you have to string them together with other stuff to make them useful. Like a brain without a body.
Another fun thought: I could give furl an agent that knows how furl itself is designed, its code framework, etc., and make it self-generate new agents and tools in case it canβt accomplish a task itself. Even an agent/tool to (re)train its own model.
- Anthropic released a computer use model which seems like it would rely on tools combined with image processing (which has already existed).
To name a few. In other words, innovation seems to be on price per token and specific application now rather than on overall accuracy of base models.
It seems like there's credence to the idea that LLMs are at a point where we will see less significant gains on base models alone:
- Amazon's stats at re:Invent were underwhelming compared to most existing models.
- OpenAI's o1 appears to just be an agent arch applying a critic.
AI layer to research details about software (vendor website, docs, etc.) that a human could do but would take forever. Useful for remediation to have all the details about a particular software / package / whatever available when deciding what to do.
This setup is used as the backing AI to furl.ai's autonomous patching. We expose it all as a REST API internally to our other services which rely on our AI layer to gen the scripts/instructions/research details on software for us (software inventory info databases suck so we also use our 1/2
For executing scripts we basically just boot a clean macOS / windows / Linux (rhel, Ubuntu) host and ship the script, execute and return stdout/stderr. Lots of ways to do that (some cheaper than others). 2/2
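A minimal sketch of the runner idea described above: run a generated script, capture stdout/stderr and the exit code, and hand all of it back for the model to inspect. In production you'd ship the script to a clean macOS/Windows/Linux host over some transport; here `subprocess` on the local machine stands in for that, and the function name is illustrative rather than furl's actual API.

```python
import subprocess

def execute_script_on_runner(script: str, shell: str = "/bin/sh", timeout: int = 60) -> dict:
    """Execute a script and return stdout/stderr/exit code for the LLM to inspect."""
    proc = subprocess.run(
        [shell, "-c", script],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # kill runaway scripts
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
```

Returning the exit code alongside stderr is what lets the agent decide whether to self-correct and retry.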
Nope, those tools were built by us in-house. You can use scraperapi or other headless browser scraping services for content extraction (note: this is a slightly dumb way to do it, there are more intelligent ways to extract text from websites). 1/2
to use with the web_scrape tool. If we find that it isn't doing that well enough, we can make a google_search agent (agents have a system prompt, samples, own model, etc. that tools don't have; tools are just functions) that is specialized for this task. 5/5
The research_from_internet tool actually calls our "internet_researcher" agent, which itself has web_scrape and search_google tools. The former will use services to extract text from rendered websites, the latter will use Google's customsearch api. internet_researcher must also gen search terms 4/5
For example, the "upgrade_script_developer" agent uses OpenAI's base gpt-4o model, but itself knows about two tools: execute_script_on_runner and research_from_internet. The execute_script_on_runner tool runs a script that is generated by the LLM on a host and simply returns the response. 3/5
with its own system prompt and tool knowledge. Each agent can be configured to use its own model if we want (but we don't right now). When we build out a new agent, we can make the agent use other agents to achieve its goal. 2/5
We use OpenAI's base models with RAG (later, fine tuned) essentially. So, in this case gpt-4o. Our "cognition" framework (which follows the NVIDIA blog post) contains agents and tools. Agents know about tools. Agents can be tools themselves. So basically each agent is the specialist 1/5
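The agent/tool pattern from this thread can be sketched roughly like so. This is a hypothetical reconstruction, not furl's actual framework: agents carry a system prompt and a set of tools, tools are just functions, and an agent can be wrapped as a tool for another agent. The `run` body here is a deterministic placeholder for the real LLM-driven tool-selection loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    fn: Callable[[str], str]  # a tool is just a function

    def __call__(self, arg: str) -> str:
        return self.fn(arg)

@dataclass
class Agent:
    name: str
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    model: str = "gpt-4o"  # each agent can be configured with its own model

    def add_tool(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def as_tool(self) -> Tool:
        # Agents can be tools themselves: expose run() under this agent's name.
        return Tool(self.name, self.run)

    def run(self, task: str) -> str:
        # Placeholder for the real loop, where the LLM (with system_prompt and
        # tool schemas) decides which tool to call; here we just call them all.
        return " | ".join(t(task) for t in self.tools.values())
```

Composing agents this way is what makes, e.g., a research_from_internet tool on one agent resolve to a whole internet_researcher agent underneath.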
Right now we just use OpenAI, though our design allows us to plug any LLM in (we have support for Gemini, Azure OpenAI, Grok, and Anthropic). Only very few support tool calls. For those that do, I still haven't seen accuracy or reliability as high as OpenAI. Tool calls can be added to any LLM tho.
More or less implement the components here, though the agent graph is not detailed:
developer.nvidia.com/blog/introdu...
Haven't written a guide, but open to doing that. LangGraph may be the closest framework to what we've built.
Most of what we have now is the culmination of trial & error + arxiv papers + blog posts + security/scanning backgrounds + some major conceptual contributions from our former chief of ai
Definitely is. I've found accuracy improves greatly as you add more "specialists" that work in concert with each other (i.e. a true multi-agent architecture), not just tools and not just prompt engineering. Accuracy scales fairly well and much faster than with prompt tweaks alone.
Dunno. I've built one that uses agents to reason through creating upgrade scripts, which works by giving it access to search Google, scrape content from websites, and execute stuff in a sandbox. If it fails it'll correct itself and try again. Knowing when to stop is key, tho not hard for narrow use cases
Yep. Basically run the original request and response through a "critic" which attempts to refute hallucinated bullshit. LLMs are pretty damn good at text extraction, so you are sort of leaning on that to provide some level of error correction.
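The critic pass can be sketched as below, under loud assumptions: `call_llm` is a placeholder for whatever chat-completion client you use, and the prompt wording is illustrative. The shape is the point: one call asks a critic to refute unsupported claims in the draft, and only if it objects do you pay for a revision call.

```python
# Illustrative critic prompt; the exact wording would be tuned in practice.
CRITIC_PROMPT = (
    "You are a critic. Given a user request and a draft answer, list any "
    "claims in the answer that are not supported by the request or by "
    "well-known facts. If everything checks out, reply exactly: OK"
)

def criticize(call_llm, request: str, draft: str) -> str:
    """Run request+draft through a critic; revise the draft only if refuted."""
    verdict = call_llm(system=CRITIC_PROMPT,
                       user=f"Request:\n{request}\n\nDraft:\n{draft}")
    if verdict.strip() == "OK":
        return draft  # critic found nothing to refute
    # Hand the critique back to the model for a corrected answer.
    return call_llm(system="Revise the draft to address the critique.",
                    user=f"Request:\n{request}\n\nDraft:\n{draft}\n\nCritique:\n{verdict}")
```

This leans on extraction rather than generation, which is exactly why it catches a useful share of hallucinations.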
Good to be on BS (are we calling it that)? Guess I should update my profile image...