The Gemini 2.5 Technical Report is out: storage.googleapis.com/deepmind-med...
Introducing Gemini 2.5, our most intelligent model, with impressive capabilities in advanced reasoning and coding.
Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It's #1 on the LM Arena leaderboard.
We've been teaching Gemini to think.
Try it here: aistudio.google.com/prompts/new_...
Happy birthday Gemini!
We release Tülu 3, a family of fully open, state-of-the-art post-trained models, alongside their data, code, and training recipes, serving as a comprehensive guide to modern post-training techniques!
Good software is an enabler for good science!
Inspired by the post below, I like to point people at libraries like github.com/patrick-kidg... as a template for what a modern Python library looks like: `pre-commit`, ruff, pyright, pyproject.toml, an open-source license, etc.
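As a rough sketch of how those pieces fit together (the package name, version, and tool settings below are illustrative placeholders, not taken from the linked repository), a minimal `pyproject.toml` might look like:

```toml
# Illustrative pyproject.toml for a modern Python library.
# Names and values here are placeholders, not the linked repo's config.

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mylib"                      # hypothetical package name
version = "0.1.0"
requires-python = ">=3.9"
license = { file = "LICENSE" }      # the open-source license mentioned above

[tool.ruff]
line-length = 88                    # ruff lints/formats from the same file

[tool.pyright]
typeCheckingMode = "strict"         # pyright type-checks from here too
```

With a matching `.pre-commit-config.yaml`, running `pre-commit run --all-files` then applies ruff and the other hooks before every commit, so the whole toolchain is driven from two checked-in config files.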
Fun, insightful, useful, cheap: Thinking Like A Large Language Model: Become an AI manager a.co/d/7xMTtJM
A comparison of the LLMs' mean ratings along presentational and epistemic dimensions.
We compared notable LLMs such as InstructGPT, ChatGPT, GPT-4, PaLM 2 (text-bison), and Falcon-180B. They excel at presenting climate information, but there's room for improvement in the epistemic qualities of their answers.
This is a tough task for human raters. Our study finds that AI can effectively assist human raters, offering promising avenues for scalable oversight on difficult problems like this.
Excited to share our latest paper: We explore how large language models tackle questions on climate change, introducing an evaluation framework grounded in #SciComm research.
Read the preprint: arxiv.org/abs/2310.02932