Building reliable AI systems at Google DeepMind: Lessons from the trenches

Deploying large language models (LLMs) that reliably work in real-world applications requires robust evaluation. This talk dives into hands-on techniques for crafting effective evals to measure and improve your LLM's performance, as well as spotlighting common developer mistakes and how to avoid them.

Beyond evals, we share battle-tested insights from integrating Gemini models into production applications used by hundreds of millions of people. Expect practical takeaways on tackling common challenges, implementing best practices, and actionable strategies for building LLM-powered applications you can rely on.

If your team is using LLMs to solve real problems and wants to move beyond academic benchmarks to real-world impact, this talk is for you.

Paige Bailey
AI Engineering Lead, Google DeepMind