Fair enough. Way more conspiracy theories on X these days than there used to be though. Also had about 11 bots follow me yesterday after a single tweet. One created a meme coin for my startup. Maybe dead internet theory is actually real…
Perfectly describes the current state of Xhitter
Turns out I was right on the nose. I didn't see his talk (even posted this hours before I saw it pop up), but it's not hard to see based on the rate of enhancements over the past 24 months.
They're waiting for you, Gordon. In the tessssssst chamberrrrrr.
I keep forgetting they're still doing this
Manual. YMMV with prepared stuff like AutoGPT, but base LLMs at a fundamental level are just token emitters, so you have to string them together with other stuff to make them useful. Like a brain without a body.
Another fun thought: I could give furl an agent that knows how furl itself is designed, its code framework, etc., and make it self-generate new agents and tools in case it canβt accomplish a task itself. Even an agent/tool to (re)train its own model.
- Anthropic released a computer use model which seems like it would rely on tools combined with image processing (which has already existed).
To name a few. In other words, innovation seems to be on price per token and specific application now rather than on overall accuracy of base models.
It seems like there's credence to the idea that LLMs are at a point where we will see less significant gains on base models alone:
- Amazon's stats at re:Invent were underwhelming compared to most existing models.
- OpenAI's o1 appears to just be an agent arch applying a critic.
AI layer to research details about software (vendor website, docs, etc.) that a human could do but would take forever. Useful for remediation to have all the details about a particular software / package / whatever available when deciding what to do.
This setup is used as the backing AI to furl.ai's autonomous patching. We expose it all as a REST API internally to our other services which rely on our AI layer to gen the scripts/instructions/research details on software for us (software inventory info databases suck so we also use our 1/2
For executing scripts we basically just boot a clean macOS / windows / Linux (rhel, Ubuntu) host and ship the script, execute and return stdout/stderr. Lots of ways to do that (some cheaper than others). 2/2
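A minimal sketch of the runner idea described above: run a generated script, capture stdout/stderr and the exit code, and hand all of it back for the model to inspect. In production you'd ship the script to a clean macOS/Windows/Linux host over some transport; here `subprocess` on the local machine stands in for that, and the function name is illustrative rather than furl's actual API.

```python
import subprocess

def execute_script_on_runner(script: str, shell: str = "/bin/sh", timeout: int = 60) -> dict:
    """Execute a script and return stdout/stderr/exit code for the LLM to inspect."""
    proc = subprocess.run(
        [shell, "-c", script],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode bytes to str
        timeout=timeout,      # kill runaway scripts
    )
    return {"stdout": proc.stdout, "stderr": proc.stderr, "exit_code": proc.returncode}
```

Returning the exit code alongside stderr is what lets the agent decide whether to self-correct and retry.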
Nope, those tools were built by us in-house. You can use scraperapi or other headless browser scraping services for content extraction (note: this is a slightly dumb way to do it, there are more intelligent ways to extract text from websites). 1/2
to use with the web_scrape tool. If we find that it isn't doing that well enough, we can make a google_search agent (agents have a system prompt, samples, own model, etc. that tools don't have; tools are just functions) that is specialized for this task. 5/5
The research_from_internet tool actually calls our "internet_researcher" agent, which itself has web_scrape and search_google tools. The former will use services to extract text from rendered websites, the latter will use Google's customsearch api. internet_researcher must also gen search terms 4/5
For example, the "upgrade_script_developer" agent uses OpenAI's base gpt-4o model, but itself knows about two tools: execute_script_on_runner and research_from_internet. The execute_script_on_runner tool runs a script that is generated by the LLM on a host and simply returns the response. 3/5
with its own system prompt and tool knowledge. Each agent can be configured to use its own model if we want (but we don't right now). When we build out a new agent, we can make the agent use other agents to achieve its goal. 2/5
We use OpenAI's base models with RAG (later, fine tuned) essentially. So, in this case gpt-4o. Our "cognition" framework (which follows the NVIDIA blog post) contains agents and tools. Agents know about tools. Agents can be tools themselves. So basically each agent is the specialist 1/5
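The agent/tool pattern from this thread can be sketched roughly like so. This is a hypothetical reconstruction, not furl's actual framework: agents carry a system prompt and a set of tools, tools are just functions, and an agent can be wrapped as a tool for another agent. The `run` body here is a deterministic placeholder for the real LLM-driven tool-selection loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Tool:
    name: str
    fn: Callable[[str], str]  # a tool is just a function

    def __call__(self, arg: str) -> str:
        return self.fn(arg)

@dataclass
class Agent:
    name: str
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    model: str = "gpt-4o"  # each agent can be configured with its own model

    def add_tool(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def as_tool(self) -> Tool:
        # Agents can be tools themselves: expose run() under this agent's name.
        return Tool(self.name, self.run)

    def run(self, task: str) -> str:
        # Placeholder for the real loop, where the LLM (with system_prompt and
        # tool schemas) decides which tool to call; here we just call them all.
        return " | ".join(t(task) for t in self.tools.values())
```

Composing agents this way is what makes, e.g., a research_from_internet tool on one agent resolve to a whole internet_researcher agent underneath.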
Right now we just use OpenAI, though our design allows us to plug any LLM in (we have support for Gemini, Azure OpenAI, Grok, and Anthropic). Only very few support tool calls. For those that do, I still haven't seen accuracy or reliability as high as OpenAI. Tool calls can be added to any LLM tho.
More or less implement the components here, though the agent graph is not detailed:
developer.nvidia.com/blog/introdu...
Haven't written a guide, but open to doing that. LangGraph may be the closest framework to what we've built.
Most of what we have now is the culmination of trial & error + arxiv papers + blog posts + security/scanning backgrounds + some major conceptual contributions from our former chief of ai
Definitely is. I've found accuracy improves greatly as you add more "specialists" that work in concert with each other (i.e. a true multi-agent architecture), not just tools and not just prompt engineering. Accuracy scales fairly well and much faster than with prompt tweaks alone.
Dunno. I've built one that uses agents to reason through creating upgrade scripts, which works by giving it access to search Google, scrape content from websites, and execute stuff in a sandbox. If it fails it'll correct itself and try again. Knowing when to stop is key, tho not hard for narrow use cases
Yep. Basically run the original request and response through a "critic" which attempts to refute hallucinated bullshit. LLMs are pretty damn good at text extraction, so you are sort of leaning on that to provide some level of error correction.
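The critic pass can be sketched as below, under loud assumptions: `call_llm` is a placeholder for whatever chat-completion client you use, and the prompt wording is illustrative. The shape is the point: one call asks a critic to refute unsupported claims in the draft, and only if it objects do you pay for a revision call.

```python
# Illustrative critic prompt; the exact wording would be tuned in practice.
CRITIC_PROMPT = (
    "You are a critic. Given a user request and a draft answer, list any "
    "claims in the answer that are not supported by the request or by "
    "well-known facts. If everything checks out, reply exactly: OK"
)

def criticize(call_llm, request: str, draft: str) -> str:
    """Run request+draft through a critic; revise the draft only if refuted."""
    verdict = call_llm(system=CRITIC_PROMPT,
                       user=f"Request:\n{request}\n\nDraft:\n{draft}")
    if verdict.strip() == "OK":
        return draft  # critic found nothing to refute
    # Hand the critique back to the model for a corrected answer.
    return call_llm(system="Revise the draft to address the critique.",
                    user=f"Request:\n{request}\n\nDraft:\n{draft}\n\nCritique:\n{verdict}")
```

This leans on extraction rather than generation, which is exactly why it catches a useful share of hallucinations.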
Good to be on BS (are we calling it that)? Guess I should update my profile image...