Pipes Feed Preview: Towards Data Science & The New Stack & DevOps & SRE & DevOps.com & Google DeepMind News

  1. Hallucinations in LLMs Are Not a Bug in the Data

    Mon, 16 Mar 2026 19:15:31 -0000

    <p>It’s a feature of the architecture</p> <p>The post <a href="https://towardsdatascience.com/hallucinations-in-llms-are-not-a-bug-in-the-data/">Hallucinations in LLMs Are Not a Bug in the Data</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  2. Follow the AI Footpaths

    Mon, 16 Mar 2026 19:00:37 -0000

    <p>Shadow AI and the desire paths of modern work</p> <p>The post <a href="https://towardsdatascience.com/follow-the-ai-footpaths/">Follow the AI Footpaths</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  3. How to Build a Production-Ready Claude Code Skill

    Mon, 16 Mar 2026 18:04:24 -0000

    <p>What I learned building and distributing my first Skill from scratch</p> <p>The post <a href="https://towardsdatascience.com/how-to-build-a-production-ready-claude-code-skill/">How to Build a Production-Ready Claude Code Skill</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  4. Bayesian Thinking for People Who Hated Statistics

    Mon, 16 Mar 2026 12:00:00 -0000

    <p>You already think like a Bayesian. Your stats class just taught the formula before the intuition. Here's a 5-step framework to apply it at work.</p> <p>The post <a href="https://towardsdatascience.com/bayesian-thinking-for-people-who-hated-statistics/">Bayesian Thinking for People Who Hated Statistics</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
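The excerpt doesn't describe the post's 5-step framework, so the sketch below is not that framework. It only shows the formula the teaser alludes to, Bayes' rule, used as a belief update in plain Python. The churn-alert scenario and all of its rates are invented for illustration.

```python
def bayes_update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H | E) from P(H), P(E | H) and P(E | not H).

    Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E), with P(E) expanded
    by the law of total probability over H and not-H.
    """
    joint_h = p_e_given_h * prior
    p_evidence = joint_h + p_e_given_not_h * (1.0 - prior)
    return joint_h / p_evidence


# Hypothetical numbers: 1% of accounts churn; an alert fires for 90% of
# churners and for 5% of accounts that stay.
after_one_alert = bayes_update(0.01, 0.90, 0.05)
# Sequential updating: the previous posterior becomes the new prior.
after_two_alerts = bayes_update(after_one_alert, 0.90, 0.05)
print(f"{after_one_alert:.3f} {after_two_alerts:.3f}")
```

Because the base rate is low, one alert only lifts the estimate to about 15%; a second alert pushes it to about 77%. The second step assumes the two alerts are independent given the true state, which is worth checking before chaining updates in practice.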
  5. The 2026 Data Mandate: Is Your Governance Architecture a Fortress or a Liability?

    Sun, 15 Mar 2026 15:00:00 -0000

    <p>Is your data strategy 2026-ready? Get a deep dive into the mandatory shift toward human-in-the-loop oversight, active metadata, and the strategic advantages of European data sovereignty.</p> <p>The post <a href="https://towardsdatascience.com/the-2026-data-mandate-is-your-governance-architecture-a-fortress-or-a-liability/">The 2026 Data Mandate: Is Your Governance Architecture a Fortress or a Liability?</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  6. The Causal Inference Playbook: Advanced Methods Every Data Scientist Should Master

    Sun, 15 Mar 2026 13:00:00 -0000

    <p>Master six advanced causal inference methods with Python: doubly robust estimation, instrumental variables, regression discontinuity, modern difference-in-differences, heterogeneous treatment effects and sensitivity analysis. Includes code and a practical decision framework.</p> <p>The post <a href="https://towardsdatascience.com/the-causal-inference-playbook-advanced-methods-every-data-scientist-should-master/">The Causal Inference Playbook: Advanced Methods Every Data Scientist Should Master</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
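Of the six methods listed, doubly robust estimation is the easiest to state in one line. Below is the standard augmented inverse-probability-weighting (AIPW) estimator of the average treatment effect, in generic notation that may differ from the post's: T is the 0/1 treatment indicator, Y the outcome, the mu-hat terms are outcome models fitted for treated and control units, and e-hat is the estimated propensity score P(T = 1 | X).

```latex
\hat{\tau}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i) + \frac{T_i\,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{e}(X_i)} - \frac{(1 - T_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{e}(X_i)}\right]
```

Under the usual unconfoundedness and overlap assumptions, the estimate stays consistent if either the outcome models or the propensity model is correctly specified, which is where the "doubly robust" name comes from.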
  7. The Multi-Agent Trap

    Sat, 14 Mar 2026 15:00:00 -0000

    <p>Google DeepMind found multi-agent networks amplify errors 17x. Learn 3 architecture patterns that separate $60M wins from the 40% that get canceled.</p> <p>The post <a href="https://towardsdatascience.com/the-multi-agent-trap/">The Multi-Agent Trap</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  8. The Current Status of The Quantum Software Stack

    Sat, 14 Mar 2026 13:00:00 -0000

    <p>How do we program quantum computers today?</p> <p>The post <a href="https://towardsdatascience.com/the-current-status-of-the-quantum-software-stack/">The Current Status of The Quantum Software Stack</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  9. Why Care About Prompt Caching in LLMs?

    Fri, 13 Mar 2026 17:09:47 -0000

    <p>Optimizing the cost and latency of your LLM calls with Prompt Caching</p> <p>The post <a href="https://towardsdatascience.com/why-care-about-promp-caching-in-llms/">Why Care About Prompt Caching in LLMs?</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  10. How Vision Language Models Are Trained from “Scratch”

    Fri, 13 Mar 2026 16:30:00 -0000

    <p>A deep dive into exactly how text-only language models are finetuned to *see* images</p> <p>The post <a href="https://towardsdatascience.com/how-vision-language-models-are-trained-from-scratch/">How Vision Language Models Are Trained from “Scratch”</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  11. Personalized Restaurant Ranking with a Two-Tower Embedding Variant

    Fri, 13 Mar 2026 15:00:00 -0000

    <p>How a lightweight two-tower model improved restaurant discovery when popularity ranking failed</p> <p>The post <a href="https://towardsdatascience.com/personalized-restaurant-ranking-with-a-two-tower-embedding-variant/">Personalized Restaurant Ranking with a Two-Tower Embedding Variant</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
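The teaser doesn't describe the post's variant, so the following is only the generic two-tower pattern in plain Python: one tower embeds the user, another embeds each restaurant, and the ranking score is their dot product. The single linear layer per tower, the identity weights and the two-feature restaurants are all hypothetical; a real model would learn its weights from interaction data.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def tower(features: Vector, weights: Sequence[Vector]) -> List[float]:
    # One linear layer: each weight row produces one embedding dimension.
    return [dot(row, features) for row in weights]


def rank_items(user_features: Vector, items: Dict[str, Vector],
               user_weights: Sequence[Vector], item_weights: Sequence[Vector]) -> List[str]:
    user_emb = tower(user_features, user_weights)
    # Item embeddings don't depend on the user, so they can be precomputed
    # and indexed; only the user tower has to run per request.
    scores = {name: dot(user_emb, tower(feats, item_weights))
              for name, feats in items.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical features: [likes_japanese, likes_italian] for the user and
# [japanese_score, italian_score] for each restaurant.
W_USER = [[1.0, 0.0], [0.0, 1.0]]
W_ITEM = [[1.0, 0.0], [0.0, 1.0]]
restaurants = {"sushi_bar": [0.9, 0.1], "pizzeria": [0.2, 0.8], "ramen_shop": [0.6, 0.4]}
print(rank_items([1.0, 0.0], restaurants, W_USER, W_ITEM))
```

Unlike a single global popularity ordering, the ranking changes with the user vector: a user encoded as `[0.0, 1.0]` gets the pizzeria first.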
  12. A Tale of Two Variances: Why NumPy and Pandas Give Different Answers

    Fri, 13 Mar 2026 13:30:00 -0000

    <p>Imagine you are analyzing a small dataset: You want to calculate some summary statistics to get an idea of the distribution of this data, so you use numpy to calculate the mean and variance. Your output looks like this: Great! Now you have an idea of the distribution of your data. However, your colleague comes [&#8230;]</p> <p>The post <a href="https://towardsdatascience.com/a-tale-of-two-variances-why-numpy-and-pandas-give-different-answers/">A Tale of Two Variances: Why NumPy and Pandas Give Different Answers</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
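The excerpt stops before the explanation, but the usual cause of this discrepancy is the default degrees of freedom: `numpy.var` divides by n (`ddof=0`, population variance), while pandas' `Series.var` divides by n - 1 (`ddof=1`, sample variance). The dependency-free sketch below reproduces both conventions; NumPy and pandas appear only in comments, and the data is made up.

```python
import statistics


def variance(xs, ddof=0):
    """Variance with NumPy-style `ddof` ("delta degrees of freedom").

    The sum of squared deviations is divided by n - ddof:
    ddof=0 -> population variance (numpy.var default),
    ddof=1 -> sample variance (pandas.Series.var default).
    """
    n = len(xs)
    if n - ddof <= 0:
        raise ValueError("need more observations than ddof")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - ddof)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population = variance(data, ddof=0)  # same convention as np.var(data)
sample = variance(data, ddof=1)      # same convention as pd.Series(data).var()

# The standard library exposes the same two conventions under explicit names.
assert population == statistics.pvariance(data)
assert abs(sample - statistics.variance(data)) < 1e-12

print(f"{population:.4f} {sample:.4f}")  # 4.0000 4.5714
```

To make the libraries agree, pass `ddof` explicitly, for example `np.var(x, ddof=1)` or `series.var(ddof=0)`. The gap shrinks as n grows, which is why it mostly surprises people on small datasets.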
  13. How to Build Agentic RAG with Hybrid Search

    Fri, 13 Mar 2026 12:00:00 -0000

    <p>Learn how to build a powerful agentic RAG system</p> <p>The post <a href="https://towardsdatascience.com/how-to-build-agentic-rag-with-hybrid-search/">How to Build Agentic RAG with Hybrid Search</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  14. Exploratory Data Analysis for Credit Scoring with Python

    Thu, 12 Mar 2026 16:30:00 -0000

    <p>Understanding default risk through statistical analysis of borrower and loan characteristics.</p> <p>The post <a href="https://towardsdatascience.com/exploratory-data-analysis-for-credit-scoring-with-python/">Exploratory Data Analysis for Credit Scoring with Python</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  15. Solving the Human Training Data Problem

    Thu, 12 Mar 2026 15:00:00 -0000

    <p>How AI has completely transformed the way I study as a graduate student</p> <p>The post <a href="https://towardsdatascience.com/solving-the-human-training-data-problem/">Solving the Human Training Data Problem</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  16. Scaling Vector Search: Comparing Quantization and Matryoshka Embeddings for 80% Cost Reduction

    Thu, 12 Mar 2026 13:30:00 -0000

    <p>Navigating the performance cliff: How pairing MRL with int8 and binary quantization balances infrastructure costs with retrieval accuracy.</p> <p>The post <a href="https://towardsdatascience.com/649627-2/">Scaling Vector Search: Comparing Quantization and Matryoshka Embeddings for 80% Cost Reduction</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
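A rough sketch of the levers the teaser names, in plain Python: Matryoshka (MRL) truncation keeps the first k dimensions and renormalizes, int8 and binary quantization shrink the bytes stored per dimension, and binary codes are compared with Hamming distance. The 1,024-dimension float32 baseline and the listed configurations are assumptions chosen for easy arithmetic; they do not reproduce the article's benchmark or its 80% figure. Truncation only preserves retrieval quality for models trained with an MRL objective.

```python
import math
from typing import List, Sequence


def mrl_truncate(vec: Sequence[float], k: int) -> List[float]:
    # Keep the leading k dimensions, then L2-renormalize for cosine/dot search.
    head = list(vec[:k])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def binarize(vec: Sequence[float]) -> int:
    # Binary quantization: 1 bit per dimension, set when the value is positive.
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    # Distance between binary codes = number of differing bits.
    return bin(a ^ b).count("1")


def bytes_per_vector(dims: int, bits_per_dim: int) -> float:
    return dims * bits_per_dim / 8


BASELINE = bytes_per_vector(1024, 32)  # assumed float32, 1,024 dims
for label, dims, bits in [
    ("float32, 1024 dims", 1024, 32),
    ("int8, 1024 dims", 1024, 8),
    ("float32, MRL 256 dims", 256, 32),
    ("int8, MRL 256 dims", 256, 8),
    ("binary, 1024 dims", 1024, 1),
]:
    size = bytes_per_vector(dims, bits)
    print(f"{label:<22} {size:>6.0f} B  ({1 - size / BASELINE:.1%} smaller)")
```

Storage is only half the trade-off: binary codes lose precision, and a common mitigation is to retrieve a shortlist with Hamming distance and rescore it with higher-precision vectors.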
  17. I Finally Built My First AI App (And It Wasn’t What I Expected)

    Thu, 12 Mar 2026 12:00:00 -0000

    <p>A beginner-friendly walkthrough of API calls, environment variables, and real-world AI infrastructure</p> <p>The post <a href="https://towardsdatascience.com/i-finally-built-my-first-ai-app-and-it-wasnt-what-i-expected/">I Finally Built My First AI App (And It Wasn’t What I Expected)</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  18. An Intuitive Guide to MCMC (Part I): The Metropolis-Hastings Algorithm

    Wed, 11 Mar 2026 16:30:00 -0000

    <p>Tired of the AI hype? Let's talk about the probabilistic algorithms actually driving high-end quantitative finance.</p> <p>The post <a href="https://towardsdatascience.com/an-intuitive-guide-to-mcmc-part-i-the-metropolis-hastings-algorithm/">An Intuitive Guide to MCMC (Part I): The Metropolis-Hastings Algorithm</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
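For readers who want the algorithm in the title before clicking through, here is a minimal random-walk Metropolis-Hastings sampler in plain Python. The standard-normal target, step size, burn-in length and seed are arbitrary choices for the sketch, not taken from the post.

```python
import math
import random
from typing import List


def log_target(x: float) -> float:
    # Unnormalized log-density of a standard normal. MH only uses density
    # ratios, so the normalizing constant never needs to be known.
    return -0.5 * x * x


def metropolis_hastings(n_samples: int, step: float = 1.0, x0: float = 0.0,
                        burn_in: int = 1_000, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    x = x0
    samples: List[float] = []
    for i in range(burn_in + n_samples):
        # Symmetric Gaussian random-walk proposal: q(x'|x) == q(x|x'),
        # so the Hastings correction cancels out of the acceptance ratio.
        proposal = x + rng.gauss(0.0, step)
        log_alpha = log_target(proposal) - log_target(x)
        # Accept with probability min(1, p(x') / p(x)), computed in log space.
        # 1 - random() lies in (0, 1], so the log is always defined.
        if log_alpha >= 0 or math.log(1.0 - rng.random()) < log_alpha:
            x = proposal
        # On rejection the chain repeats its current state; that repeat is
        # part of the correct stationary distribution, not a bug.
        if i >= burn_in:
            samples.append(x)
    return samples


draws = metropolis_hastings(50_000, seed=42)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"mean={mean:.3f} var={var:.3f}")  # should be close to 0 and 1
```

Because the Gaussian proposal is symmetric, this is the special case Metropolis originally described; an asymmetric proposal would need the Hastings ratio q(x | x') / q(x' | x) folded into `log_alpha`.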
  19. Spectral Clustering Explained: How Eigenvectors Reveal Complex Cluster Structures

    Wed, 11 Mar 2026 15:00:00 -0000

    <p>Understanding why spectral clustering outperforms K-means</p> <p>The post <a href="https://towardsdatascience.com/spectral-clustering-explained-how-eigenvectors-reveal-complex-cluster-structures/">Spectral Clustering Explained: How Eigenvectors Reveal Complex Cluster Structures</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  20. Why Most A/B Tests Are Lying to You

    Wed, 11 Mar 2026 13:30:00 -0000

    <p>The 4 statistical sins that invalidate most A/B tests, plus a pre-test checklist and Bayesian vs frequentist decision framework you can use Monday.</p> <p>The post <a href="https://towardsdatascience.com/why-most-a-b-tests-are-lying-to-you/">Why Most A/B Tests Are Lying to You</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
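The excerpt doesn't list the four sins, so this is only background for the frequentist side of the comparison: a two-sided two-proportion z-test on conversion counts, in plain Python. The traffic numbers are made up. The p-value is only meaningful if the sample size was fixed before the test started; checking it repeatedly and stopping at the first significant result inflates the false-positive rate.

```python
import math


def normal_cdf(x: float) -> float:
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for H0: variants A and B share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both arms estimate the same rate, so pool them for the SE.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


# Hypothetical test: 2.0% vs 2.4% conversion on 10,000 users per arm.
z, p = two_proportion_z_test(200, 10_000, 240, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even a 20% relative lift on 10,000 users per arm lands just above the conventional 0.05 cutoff here, which is why sizing the test before launch matters.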
  21. Nvidia brings together AI labs to build the next generation of open base models

    Mon, 16 Mar 2026 20:20:18 -0000

    <p>Nvidia on Monday announced the Nemotron Coalition at its GTC conference. This new coalition of AI labs will pool expertise,</p> <p>The post <a href="https://thenewstack.io/nvidia-tier2-nemotron-coalition/">Nvidia brings together AI labs to build the next generation of open base models</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The Nemotron coalition brings together a number of AI startups to collaborate on a new base model, which will become the basis for Nemotron 4.
  22. Nvidia’s NemoClaw is OpenClaw with guardrails

    Mon, 16 Mar 2026 20:05:24 -0000

    <p>At its annual GTC conference, Nvidia on Monday announced the Nvidia Agent Toolkit, which brings together open models, runtimes, open</p> <p>The post <a href="https://thenewstack.io/nemoclaw-openclaw-with-guardrails/">Nvidia&#8217;s NemoClaw is OpenClaw with guardrails</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Nvidia's NemoClaw combines the OpenClaw agent platform with components of its Agent Toolkit to add privacy and security controls.
  23. Cursor built a fleet of security agents to solve a familiar frustration

    Mon, 16 Mar 2026 18:17:20 -0000

    <p>Cursor&#8217;s security team has built a fleet of AI agents that continuously monitor and secure the company&#8217;s codebase, and it</p> <p>The post <a href="https://thenewstack.io/cursor-open-sources-security-agents/">Cursor built a fleet of security agents to solve a familiar frustration</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Cursor's security team built AI agents that monitor code and block vulnerabilities in pull requests, and it is now open-sourcing the templates and Terraform.
  24. Anthropic doubles Claude usage outside peak hours — but it won’t last forever

    Mon, 16 Mar 2026 18:02:30 -0000

    <p>AI labs keep searching for ways to pull developers deeper into their ecosystems. The latest move comes from Anthropic, which</p> <p>The post <a href="https://thenewstack.io/anthropic-doubles-claude-usage-outside-peak-hours/">Anthropic doubles Claude usage outside peak hours — but it won&#8217;t last forever</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic is doubling Claude's usage limits during off-peak hours for two weeks, a move that's as much about competing for developer loyalty as about thanking users.
  25. Why AI workloads are breaking traditional Kubernetes observability strategies

    Mon, 16 Mar 2026 14:04:13 -0000

    <p>For most platform engineering and ITOps teams, the ability to orchestrate containerized workloads at scale has transformed development organizations. But</p> <p>The post <a href="https://thenewstack.io/ai-kubernetes-observability-practices/">Why AI workloads are breaking traditional Kubernetes observability strategies</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Dynatrace experts will share AI-powered Kubernetes observability best practices for managing rising K8s complexity, security, and toolchain consolidation in 2026.
  26. Anthropic makes a pricing change that matters for Claude’s longest prompts

    Mon, 16 Mar 2026 13:33:09 -0000

    <p>Anthropic announced Friday that the 1-million-token context window for Claude Opus 4.6 and Claude Sonnet 4.6 is now generally available,</p> <p>The post <a href="https://thenewstack.io/claude-million-token-pricing/">Anthropic makes a pricing change that matters for Claude&#8217;s longest prompts</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic removes long-context pricing surcharge for Claude Opus 4.6 and Sonnet 4.6, making 1-million-token context windows available at standard per-token rates.
  27. Agents write code. They don’t do software engineering.

    Mon, 16 Mar 2026 12:00:28 -0000

    <p>Long-running and background coding agents have hit a new threshold. When an agent runs for hours, manages its own iteration</p> <p>The post <a href="https://thenewstack.io/ai-agents-software-engineering/">Agents write code. They don&#8217;t do software engineering.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The anxiety around agents replacing developers misses a fundamental distinction. Writing code is pattern recognition, but software engineering is not.
  28. A beginner’s guide to vibe coding

    Sun, 15 Mar 2026 16:00:31 -0000

    <p>We talk a lot about vibe coding. And to be honest, I&#8217;d heard the term far too many times before</p> <p>The post <a href="https://thenewstack.io/beginners-guide-to-vibe-coding/">A beginner&#8217;s guide to vibe coding</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Vibe coding isn't just for experts. Here's how to describe your way to a working app without writing a single line of code yourself.
  29. Ex-Snowflake engineers say there’s a blind spot in data engineering — so they built Tower to fix it

    Sun, 15 Mar 2026 14:00:36 -0000

    <p>AI coding assistants might have made it easier to generate software, but getting that code to run reliably &#8212; packaging</p> <p>The post <a href="https://thenewstack.io/tower-python-data-pipelines/">Ex-Snowflake engineers say there&#8217;s a blind spot in data engineering — so they built Tower to fix it</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Tower, a startup founded by ex-Snowflake engineers, raises $6.4M to help teams deploy and run Python data pipelines in production without managing infrastructure.
  30. A practical guide to the 6 categories of AI cloud infrastructure in 2026

    Sun, 15 Mar 2026 12:00:07 -0000

    <p>Platform teams and AI engineers are facing an unprecedented wave of decision paralysis. The rollout of NVIDIA’s Blackwell and GB200</p> <p>The post <a href="https://thenewstack.io/ai-cloud-taxonomy-2026/">A practical guide to the 6 categories of AI cloud infrastructure in 2026</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The 2026 AI cloud market has fragmented into six categories. Here's a taxonomy and evaluation framework to help platform teams match workloads to providers.
  31. Why AI systems are failing in familiar ways

    Sat, 14 Mar 2026 20:00:53 -0000

    <p>With the introduction of AI-assisted coding tools and agents, many people hoped we&#8217;d solve all the problems for human teams.</p> <p>The post <a href="https://thenewstack.io/ai-agents-batch-size-gravity/">Why AI systems are failing in familiar ways</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Multi-agent AI research shows software project failures stem from batch size, not human factors. Why Continuous Delivery matters even when AI writes the code.
  32. Tromjaro is a free-trade Linux distribution with plenty to offer

    Sat, 14 Mar 2026 18:00:28 -0000

    <p>Imagine having an OS that won&#8217;t track you, push ads on you, and not force &#8220;free&#8221; trials on you. Sounds</p> <p>The post <a href="https://thenewstack.io/tromjaro-is-a-free-trade-linux-distribution-with-plenty-to-offer/">Tromjaro is a free-trade Linux distribution with plenty to offer</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Tromjaro packs in privacy tools, six desktop layouts, and a kitchen-sink app selection, but this Manjaro-based distro is best left to users who already know their way around Linux.
  33. TypeScript 6.0 RC arrives as a bridge to a faster future

    Sat, 14 Mar 2026 16:00:18 -0000

    <p>TypeScript 6.0 Release Candidate (RC) is here, and in some ways, it&#8217;s the most consequential release since the project hit</p> <p>The post <a href="https://thenewstack.io/typescript-6-0-rc-arrives-as-a-bridge-to-a-faster-future/">TypeScript 6.0 RC arrives as a bridge to a faster future</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The latest release of Microsoft's popular JavaScript superset clears the decks for a ground-up rewrite, and it raises the bar for how developers are expected to write code.
  34. MCP’s biggest growing pains for production use will soon be solved

    Sat, 14 Mar 2026 14:00:57 -0000

    <p>The Model Context Protocol (MCP) has emerged as one of the key building blocks of the agentic AI stack, serving</p> <p>The post <a href="https://thenewstack.io/model-context-protocol-roadmap-2026/">MCP&#8217;s biggest growing pains for production use will soon be solved</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    MCP roadmap highlights the priorities maintainers want to solve as the protocol behind AI agents moves into real-world deployments.
  35. AI layoffs are here, the MCP vs API debate, and the rise of the Mac Mini-powered Agent

    Sat, 14 Mar 2026 13:32:07 -0000

    <p>I&#8217;m Matt Burns, Head of Content at Insight Media Group. Each week, I round up the most important AI developments</p> <p>The post <a href="https://thenewstack.io/ai-layoffs-mcp-api-mac-mini-agent/">AI layoffs are here, the MCP vs API debate, and the rise of the Mac Mini-powered Agent</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Everyone's building faster, shipping weekly, and quietly hoping nobody notices the humans missing from the org chart.
  36. Andrej Karpathy’s 630-line Python script ran 50 experiments overnight without any human input

    Sat, 14 Mar 2026 12:00:11 -0000

    <p>On the night of March 7, Andrej Karpathy pushed a 630-line Python script to GitHub and went to sleep. By</p> <p>The post <a href="https://thenewstack.io/karpathy-autonomous-experiment-loop/">Andrej Karpathy&#8217;s 630-line Python script ran 50 experiments overnight without any human input</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Andrej Karpathy's AutoResearch ran 50 AI experiments overnight on one GPU. The design pattern behind it applies far beyond ML training. Here's how it works.
  37. NanoClaw and Docker team up to isolate AI agents inside MicroVM sandboxes

    Fri, 13 Mar 2026 19:26:34 -0000

    <p>Like the idea of OpenClaw-style agents, but their insecurity makes you sweat? The combo of NanoClaw and Docker Sandboxes may</p> <p>The post <a href="https://thenewstack.io/nanoclaw-docker-sandboxes-ai-agents/">NanoClaw and Docker team up to isolate AI agents inside MicroVM sandboxes</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    NanoClaw partners with Docker Sandboxes to run AI agents inside isolated MicroVMs, offering a security-focused, open-source alternative to OpenClaw.
  38. F-Droid says Google’s Android developer verification plan is an ‘existential’ threat to alternative app stores

    Fri, 13 Mar 2026 18:33:01 -0000

    <p>Attention, any developers hoping to sell their apps to the world&#8217;s 3.3 billion Android phones. &#8220;Google is changing the way</p> <p>The post <a href="https://thenewstack.io/f-droid-says-googles-android-developer-verification-plan-is-an-existential-threat-to-alternative-app-stores/">F-Droid says Google&#8217;s Android developer verification plan is an &#8216;existential&#8217; threat to alternative app stores</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    F-Droid fights Google's Android developer verification plan, which could kill alternative app stores used by millions. Here's what developers need to know.
  39. The “files are all you need” debate misses what’s actually happening in agent memory architecture

    Fri, 13 Mar 2026 12:00:28 -0000

    <p>When you look at how top engineering teams actually build agent memory systems, a pattern emerges: There is a filesystem</p> <p>The post <a href="https://thenewstack.io/ai-agent-memory-architecture/">The “files are all you need” debate misses what&#8217;s actually happening in agent memory architecture</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The AI memory debate is flawed. Discover why top teams decouple filesystem interfaces from database storage.
  40. Before you let AI agents loose, you’d better know what they’re capable of

    Thu, 12 Mar 2026 20:22:11 -0000

    <img width="1024" height="819" src="https://cdn.thenewstack.io/media/2026/03/a0575582-barsrsind-lj_fxmkdchy-unsplash-1024x819.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A stylized illustration of a hand pressing a central node within a complex, ripple-patterned network of interconnected dots and lines, representing the cascading impact of autonomous AI agents on enterprise API infrastructure and the need for system observability." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/a0575582-barsrsind-lj_fxmkdchy-unsplash-scaled.jpg" /><p>For enterprises, agentic AI systems potentially allow staff responsibilities to shift from execution to judgment, oversight, and strategy. This creates</p> <p>The post <a href="https://thenewstack.io/risk-mitigation-agentic-ai/">Before you let AI agents loose, you’d better know what they’re capable of</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Learn how to manage agentic AI risk through systems thinking, contract testing, and sandboxes to ensure autonomous systems stay predictable.
  41. Google will soon bring Chrome to ARM64 Linux

    Thu, 12 Mar 2026 20:00:43 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/03/6a2bd4f6-zulfugar-karimov-gyfe_r7kdzu-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/6a2bd4f6-zulfugar-karimov-gyfe_r7kdzu-unsplash-scaled.jpg" /><p>Google on Thursday announced that it will finally launch Chrome for ARM64 Linux devices in the second quarter of 2026.</p> <p>The post <a href="https://thenewstack.io/google-will-soon-bring-chrome-to-arm64-linux/">Google will soon bring Chrome to ARM64 Linux</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Firefox got there first and Chromium was already available, but Chrome's proprietary features were missing on ARM64 Linux.
  42. SurePath AI advances MCP policy controls to tighten the cable on AI’s USB-C

    Thu, 12 Mar 2026 19:54:57 -0000

    <img width="1024" height="724" src="https://cdn.thenewstack.io/media/2026/03/32635f75-milhad-wixbzw9_cl8-unsplash-1024x724.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/32635f75-milhad-wixbzw9_cl8-unsplash-scaled.jpg" /><p>AI needs governance. Amid the exponential growth of predictive, generative, and agentic artificial intelligence, humans everywhere have repeatedly asked, &#8220;Is</p> <p>The post <a href="https://thenewstack.io/surepath-ai-mcp-policy-controls/">SurePath AI advances MCP policy controls to tighten the cable on AI&#8217;s USB-C</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    SurePath AI launches MCP Policy Controls to help enterprises govern Model Context Protocol servers, block risky tools, and prevent AI-driven data exfiltration in real time.
  43. New Perplexity APIs give developers access to agentic workflows and orchestration

    Thu, 12 Mar 2026 19:22:06 -0000

    <img width="1024" height="590" src="https://cdn.thenewstack.io/media/2026/03/ddaa934c-perplexity-computer-1024x590.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Perplexity Computer bubble" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/ddaa934c-perplexity-computer-scaled.png" /><p>On the heels of last month&#8217;s Perplexity Computer launch, the company on Thursday announced an expansion of the Perplexity API</p> <p>The post <a href="https://thenewstack.io/perplexity-agent-api/">New Perplexity APIs give developers access to agentic workflows and orchestration</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Embeddings API, Agent API, and Sandbox API expose the orchestration, search, and execution tools behind Perplexity Computer.
  44. Anthropic’s Claude can now draw interactive charts and diagrams

    Thu, 12 Mar 2026 18:00:37 -0000

    <img width="1024" height="580" src="https://cdn.thenewstack.io/media/2025/10/677abf78-illustration-1024x580.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2025/10/677abf78-illustration.png" /><p>Anthropic&#8217;s Claude has always been great at coding and working with text, but where Google and OpenAI invested heavily in</p> <p>The post <a href="https://thenewstack.io/anthropics-claude-interactive-visualizations/">Anthropic&#8217;s Claude can now draw interactive charts and diagrams</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The new feature lets Claude generate interactive charts, diagrams, and visualizations inline during a conversation.
  45. Why AI-driven operations are pushing governance beyond a compliance issue and into an operational priority

    Thu, 12 Mar 2026 16:21:37 -0000

    <img width="1024" height="853" src="https://cdn.thenewstack.io/media/2026/03/1a20eb34-prakasit-khuansuwan-vnc-put715s-unsplash-1024x853.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Pastel vector illustration of a high-speed bullet train on an elevated track moving toward a city skyline, representing rapid AI adoption and structural guardrails." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/1a20eb34-prakasit-khuansuwan-vnc-put715s-unsplash-scaled.jpg" /><p>Board members and senior executives are pushing hard to accelerate AI adoption. As a result, significant numbers of organizations have</p> <p>The post <a href="https://thenewstack.io/five-pillars-ai-governance/">Why AI-driven operations are pushing governance beyond a compliance issue and into an operational priority</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    As organizations adopt AI agents, governance must shift from compliance to an operational priority. Explore 5 pillars of AI governance.
  46. Runpod report: Qwen has overtaken Meta’s Llama as the most-deployed self-hosted LLM

    Thu, 12 Mar 2026 13:00:57 -0000

    <img width="1024" height="601" src="https://cdn.thenewstack.io/media/2026/03/bfdbbc98-getty-images-2e66tpvgzn8-unsplash-1024x601.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Digital illustration on a dark navy background depicting a stylized blue robotic hand with segmented fingers pressing down on a backlit keyboard, with glowing pink and white keys highlighted and dotted arc lines suggesting motion or data signals emanating from the keystroke." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/bfdbbc98-getty-images-2e66tpvgzn8-unsplash-scaled.jpg" /><p>The rise of agentic AI services has enabled the enterprise technology market to blossom with a new, fully evolved set</p> <p>The post <a href="https://thenewstack.io/runpod-ai-infrastructure-reality/">Runpod report: Qwen has overtaken Meta&#8217;s Llama as the most-deployed self-hosted LLM</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Runpod's State of AI report reveals Qwen has overtaken Llama as the most-deployed self-hosted LLM, while GPU logs show AI optimization outpaces raw content creation.
  47. Product Manager

    Mon, 02 Mar 2026 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Managing planned maintenance is critical for ensuring business continuity and application performance. However, as your usage of cloud services grows, staying on top of maintenance schedules can be complex and time-consuming. Current approaches often result in inconsistent notifications and varying levels of control across different products. To help you avoid missed maintenance windows and disruptions, we are announcing the General Availability (GA) of Unified Maintenance, a centralized dashboard that lets you view and manage maintenance events across your Google Cloud services.</span></p> <p><span style="vertical-align: baseline;">Unified Maintenance consolidates maintenance updates into a single view, making it easier to track upcoming events. With Unified Maintenance, you can:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">View planned maintenance:</strong><span style="vertical-align: baseline;"> See events for services like Compute Engine, Google Kubernetes Engine (GKE), Cloud SQL, Memorystore, AlloyDB, and Looker in one dashboard (see </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/supported-services"><span style="text-decoration: underline; vertical-align: baseline;">supported services</span></a><span style="vertical-align: baseline;">).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Get standardized alerts:</strong><span style="vertical-align: baseline;"> Receive consistent maintenance information through Cloud Logging, which allows you to set up alerts and integrate them with your existing monitoring or ticketing systems.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p 
role="presentation"><strong style="vertical-align: baseline;">Understand your options:</strong><span style="vertical-align: baseline;"> The </span><a href="https://console.cloud.google.com/cloud-hub/maintenance" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">dashboard</span></a><span style="vertical-align: baseline;"> clearly indicates which maintenance events offer user controls.</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">What’s next</span></h3> <p><span style="vertical-align: baseline;">We are working to add support for more Google Cloud services and enhance the platform's capabilities. Our roadmap includes expanded scopes for folders and organizations, as well as application-level visibility.</span></p> <p><span style="vertical-align: baseline;">You can access the Unified Maintenance dashboard directly in the Google Cloud console to view upcoming events for your subscribed services. To learn more about how to use these new features, read the </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">documentation</span></a><span style="vertical-align: baseline;"> and the </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/set-up-unified-maintenance"><span style="text-decoration: underline; vertical-align: baseline;">Get started guide</span></a><span style="vertical-align: baseline;">.</span></p></div>
  48. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">For any organization that has invested in an internal developer platform, a question inevitably arises: Is it actually working? </span></p> <p><span style="font-style: italic; vertical-align: baseline;">Simply tracking adoption rates won't tell you if your platform is truly delivering value to your developers. This was the challenge faced by John Lewis, a major UK retailer. In our previous articles (parts </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">1</span></a><span style="font-style: italic; vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">2</span></a><span style="font-style: italic; vertical-align: baseline;">) we introduced the John Lewis Digital Platform (JLDP) and how it enabled dozens of product teams to build high-quality software rapidly to power www.johnlewis.com and other critical applications. But how did they know that the platform was actually successful? Traditional product metrics like revenue and sales don’t translate easily to this world. 
When you focus only on whether your tenants use the platform, you don’t understand whether it’s bringing them value.</span></p> <p><span style="font-style: italic; vertical-align: baseline;">In this article, Alex Moss from the John Lewis platform team discusses how they moved beyond simple usage metrics to develop a sophisticated, multi-stage approach to measuring the real value of their platform — a journey that took them from lead-time metrics, to </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">DORA</span></a><span style="font-style: italic; vertical-align: baseline;">, and finally to a "Technical Health" score. Along the way, they explore how the JLDP’s purpose evolved — and its value along with it. - Darren Evans</span></p> <h3><strong style="vertical-align: baseline;">Initial measurement: A focus on platform value</strong></h3> <p><span style="vertical-align: baseline;">In the early days of the platform, understanding its value was actually much easier. This was because the platform was created with a very clear purpose: to enable speed of change. The John Lewis business wanted to create multiple product teams working on several features of johnlewis.com in parallel, and to put those features in front of customers quickly for feedback.</span></p> <p><span style="vertical-align: baseline;">Because it originated in John Lewis Digital, the company’s online business, the platform was treated as a product from a very early stage and was integrated with that area’s reporting mechanisms. Thus, it became normal to link the platform objectives to the online business’s broader goals each quarter and report on measurable key results. This kept the focus on why the platform matters: do improvements to the platform continue to justify using it over seeking out a different one? 
We cannot afford to rest on our laurels!</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_aSY3nPB.max-1000x1000.png" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">The six annual measures reported against every quarter. The specific measures have varied over the years.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In addition to this, in the first few years of the platform’s existence, there were three simple metrics that best indicated how the platform was living up to the rationale for creating it:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Service Creation Lead Time:</strong><span style="vertical-align: baseline;"> How long it took to create a tenancy (the space in which a product team was creating their software)</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Onboarding Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to deploy something into production</span></p> </li> <li><strong style="vertical-align: baseline;">First Customer Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to designate their service as “live to customers”</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_DVTZRKS.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">Some screenshots from the early version of the platform's self-written service catalogue, tracking the three metrics mentioned</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This was then combined with the number of tenants present on the platform into a report, which was displayed as part of an initial home-grown Service Catalogue shown above (which was later </span><a href="https://medium.com/john-lewis-software-engineering/weve-gone-backstage-this-is-how-we-use-it-on-our-digital-platform-b299cd4acb24" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">replaced with Backstage</span></a><span style="vertical-align: baseline;">). This report served two purposes:</span></p> <ol> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A very clear visualization for stakeholders of how much their platform was being adopted, and how fast they were able to get up and running (in particular, “Service Creation” being measured in single-digit hours, in comparison to the weeks teams would traditionally have had to wait). This is important, because in the early days of your product, you need to justify its continued growth and investment.</span></p> </li> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A useful way for the platform team themselves (and stakeholders) to see which teams were taking their time about getting something into production. Is my product actually helping you? 
And if not, what more could we be doing?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Using this as a conversation-starter with our tenants opened doors to rich sources of feedback that could be turned into platform features: When we asked tenants “What’s stopping you from going live?”, they often answered that the product they were building was simply complex. But we also often saw that our own processes were getting in the way. This was important, as we could then do something about it.</span></p> <p><span style="vertical-align: baseline;">The easiest of these barriers for us to overcome were typically technology-related. In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous articles</span></a><span style="vertical-align: baseline;">, we covered two examples, “My team is spending a lot of time writing Terraform to provision PubSub,” and “We’re having trouble learning how to use Kubernetes.” To help, the platform team created “paved roads” that enabled self-service provisioning and simplified Kubernetes, significantly reducing these burdens for teams.</span></p> <p><span style="vertical-align: baseline;">The more significant opportunities to streamline getting new services live lay in our processes (e.g., security approvals) — and if your platform is empowered to simplify these sorts of organizational functions, the gains can be substantial. One such example was the Information Security risk assurance process. Gaining the security sign-offs and producing the required documentation was essential but time-consuming, and, given the rate of change in the business, many teams were often going through it in parallel. Our platform team successfully negotiated a simplified process for its tenants. 
It was able to do this because it could guarantee that services resident on the platform had security controls in place and were following policies. This was a direct result of the platform building features to meet those needs, and being able to provide evidence that they were being used — removing the need for the tenant team to either document or invent this themselves. This still counts as simplifying the developer experience through platform engineering, even though the solution is a less technical one.</span></p> <p><span style="vertical-align: baseline;">Sometimes the conversation surfaced needs that weren’t even platform-shaped — for example, helping teams understand concepts like feature flagging and dark launching, or software design options to help break dependencies with legacy systems. John Lewis’ platform teams are staffed with experienced engineers, ideally ones with software development experience, which helps a lot with these sorts of interactions.</span></p> <p><span style="vertical-align: baseline;">A key point here is that by measuring how effectively teams were making it into production, we could identify who to talk to and elicit the feedback we needed about which problems to address. Simply relying on tenants to raise these issues themselves, when they don’t see the bigger picture (or have other priorities), is not nearly as effective.</span></p> <p><span style="vertical-align: baseline;">We then complemented this process with more traditional approaches, such as surveys and Net Promoter Score (NPS), to help build the product’s popularity. 
The results of these were usually very positive, and could be used to generate mindshare — especially where a product team was comfortable talking about their positive experiences in internal tech conferences and the like.</span></p> <h3><strong style="vertical-align: baseline;">Helping understand team performance</strong></h3> <p><span style="vertical-align: baseline;">A few years into the life of the platform, our emphasis started to shift. There was less of a need to prove the value of the platform — the business and our engineers were happy — so we shifted from “how can we get you into production as quickly as possible” towards “how can we enable you to continue to be as fast, but also reduce friction, in your day-to-day activities.” This led us towards DORA metrics.</span></p> <p><span style="vertical-align: baseline;">Our initial DORA implementation involved mining our systems of record for change and incident, complemented by availability data from our already-mature observability stack and events pulled from sources such as cloud audit logs. We built software to do this and stored the results in BigQuery, which enabled us to visualize the data in our home-grown Service Catalogue tool. Later, we moved this into Grafana dashboards, which are still in use today:</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_N8Q4Xha.max-1000x1000.png" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Looking for patterns in this data led us to discover additional features worth building. 
Two major examples of this were </span><span style="font-style: italic; vertical-align: baseline;">handling change</span><span style="vertical-align: baseline;">, and </span><span style="font-style: italic; vertical-align: baseline;">operational readiness</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">JLP’s service management processes were geared towards handling complex release processes across multiple large systems and/or teams - but we had fundamentally changed our architecture by adopting microservices. This empowered teams to release independently at will, and therefore manage the consequences of failed changes themselves. We used the data we’d collected about change failure rates and frequency of small releases to justify a different approach: allowing tenants to automatically raise and close changes as part of their CI/CD pipelines. After clearing this approach with our Service Management team, we developed a CLI tool that teams could use within their pipelines. This had the additional benefit of allowing us to capture useful data at point of release, rather than scraping more awkward data sources. The automated change “carrot” was very popular and was widely adopted, shifting the approval point left to the pull request rather than later in the release process. This reduced time wastage, change-set size and risk of collisions.</span></p> <p><span style="vertical-align: baseline;">In a similar vein, with more teams operating their own services, the need for a central site-wide operations team was reduced. We could see from our metrics that teams practicing “You Build It, You Run It” had fewer incidents and were resolving them much more quickly. 
We used this as evidence to bring in tooling to help them respond to incidents faster and to decouple the centralized ops teams from those processes — in some cases allowing them to focus on legacy systems, and in others, removing the need for the service entirely (which resulted in significant cost savings, even though we had more individual product teams on call). This tooling, along with supporting observability and alerting, was all configured through the platform’s paved-road pipeline described in our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">The DORA metrics helped us architecturally as well. Operational data shone a light on the brittleness of third-party and legacy services, driving greater investment in resilience engineering and alternative solutions and, in some cases, prompting us to re-evaluate our build vs. buy decisions. </span></p> <h3><strong style="vertical-align: baseline;">Choosing what to measure</strong></h3> <p><span style="vertical-align: baseline;">It’s very important to choose carefully what to measure. Experts in the field (such as </span><a href="https://www.youtube.com/watch?v=trO_fiTAZeM" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Laura Tacho</span></a><span style="vertical-align: baseline;">) influenced us to avoid vanity metrics and to be cautious about interpreting the ones we do collect. It’s also important for metrics to be meaningful to the target audience and presented accordingly.</span></p> <p><span style="vertical-align: baseline;">As an example, we communicate about cost and vulnerabilities with our teams, but the form this takes depends on the intended audience’s role. 
We send new vulnerabilities or spikes in cost directly to product teams’ collaboration channels, because experience has taught us that having our engineers see these vulnerabilities results in a faster response. On the other hand, for compliance reporting or review by team leads, reports are more effective at summarising the areas that need action. After all, if we know one thing, it’s that nobody wants to top the “vulnerabilities outside of policy” dashboard!</span></p> <p><span style="vertical-align: baseline;">Historically, we often looked at measures such as the number or frequency of incidents. But in a world of highly automated response systems, this is a trap, because alerts are easily duplicated. Focusing too much on a number can drive the wrong behavior — at worst, deliberately avoiding creating an incident at all! Instead, it’s much better to focus on the impact of the parent incident and how long it took to recover. Reporting on the number of vulnerabilities is another example. Imagine you have a package that is used extensively across many components in a distributed system. Counting its vulnerability once per component can create a false sense of scale, when in fact patching the base image deals with the problem swiftly. Instead, it’s better to measure the speed of response against a pre-agreed, severity-based policy. This is a much more effective and reasonable metric for teams to act on, so we see better engagement.</span></p> <p><span style="vertical-align: baseline;">It’s very important to provide as much context as possible when presenting the data so that the right conclusions can be drawn — especially where those reports are seen by decision-makers. With that in mind, we combined raw metrics we could visualize with user opinion about them. 
This helped to bring that missing context: Is the team that’s suffering from a high change failure rate also struggling with its release processes and batch size? Is the team that’s not addressing vulnerabilities quickly also reporting that they’re spending too much time on feature development and not enough on operational matters? We reached for a different tool — </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DX</span></a><span style="vertical-align: baseline;"> — to help us bring this sort of information to bear. In our </span><a href="https://cloud.google.com/blog/products/application-development/how-john-lewis-partnership-chose-its-monitoring-metrics"><span style="text-decoration: underline; vertical-align: baseline;">follow-up article</span></a><span style="vertical-align: baseline;">, we’ll elaborate on how we did this and how it prompted us to expand the data we collected about our tenants. Stay tuned!</span></p> <p><span style="font-style: italic; vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, start </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">here</span></a><span style="font-style: italic; vertical-align: baseline;">.</span></p></div>
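The three lead-time metrics in the John Lewis article above (Service Creation, Onboarding, First Customer) reduce to differences between milestone timestamps. The following is a minimal Python sketch, not the JLDP's actual implementation. The `Tenancy` record is hypothetical. The sketch assumes that the onboarding and first-customer lead times are measured from tenancy creation, which the article does not specify. The 30-day "stalled" threshold is likewise a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Tenancy:
    """Hypothetical milestone record for one product team's tenancy."""
    name: str
    requested: datetime                            # team asked for a tenancy
    created: Optional[datetime] = None             # tenancy provisioned
    first_deploy: Optional[datetime] = None        # first production deployment
    live_to_customers: Optional[datetime] = None   # service designated "live to customers"


def _span(start: Optional[datetime], end: Optional[datetime]) -> Optional[timedelta]:
    """Elapsed time between two milestones, or None if either is missing."""
    if start is None or end is None:
        return None
    return end - start


def lead_times(t: Tenancy) -> dict[str, Optional[timedelta]]:
    # Assumption: onboarding and first-customer lead times start at tenancy creation.
    return {
        "service_creation": _span(t.requested, t.created),
        "onboarding": _span(t.created, t.first_deploy),
        "first_customer": _span(t.created, t.live_to_customers),
    }


def stalled(tenancies: list[Tenancy], now: datetime,
            threshold: timedelta = timedelta(days=30)) -> list[str]:
    """Tenancies created more than `threshold` ago with nothing in production yet.

    These are candidates for a "What's stopping you from going live?" conversation.
    """
    return [t.name for t in tenancies
            if t.created is not None and t.first_deploy is None
            and now - t.created > threshold]


now = datetime(2026, 1, 31)
teams = [
    Tenancy("checkout", datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 13),
            datetime(2026, 1, 9), datetime(2026, 1, 20)),
    Tenancy("search", datetime(2025, 12, 1, 9), datetime(2025, 12, 1, 11)),
]
print(lead_times(teams[0])["service_creation"])  # 4:00:00
print(stalled(teams, now))                       # ['search']
```

Missing milestones are returned as `None` rather than zero. This way, a team that has not yet deployed is reported as "not yet," not as "instant."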
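The John Lewis article above warns that counting incidents is a trap when automated alerting duplicates alerts. It recommends measuring the parent incident and how long recovery took instead. Below is a minimal Python sketch of that idea, paired with a change failure rate calculation. The `Change` and `Alert` records, and the `parent_incident` link, are hypothetical. They do not represent the schema of any real change-management or incident tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str
    deployed: datetime
    failed: bool          # did this change cause a production failure?


@dataclass
class Alert:
    alert_id: str
    parent_incident: str  # hypothetical link from an alert to its parent incident
    opened: datetime
    resolved: datetime


def change_failure_rate(changes: list[Change]) -> float:
    """Fraction of changes that caused a failure (0.0 when there are no changes)."""
    if not changes:
        return 0.0
    return sum(c.failed for c in changes) / len(changes)


def restore_times(alerts: list[Alert]) -> dict[str, timedelta]:
    """Time to restore per parent incident rather than per alert.

    An incident starts at its earliest alert and is restored when its last
    alert resolves, so duplicate alerts still count as one incident.
    """
    spans: dict[str, tuple[datetime, datetime]] = {}
    for a in alerts:
        start, end = spans.get(a.parent_incident, (a.opened, a.resolved))
        spans[a.parent_incident] = (min(start, a.opened), max(end, a.resolved))
    return {pid: end - start for pid, (start, end) in spans.items()}


changes = [Change(f"CHG-{i}", datetime(2026, 1, i + 1), failed=(i == 3))
           for i in range(10)]
alerts = [
    Alert("a1", "INC-1", datetime(2026, 1, 4, 10, 0), datetime(2026, 1, 4, 10, 40)),
    Alert("a2", "INC-1", datetime(2026, 1, 4, 10, 5), datetime(2026, 1, 4, 10, 45)),
    Alert("a3", "INC-1", datetime(2026, 1, 4, 10, 6), datetime(2026, 1, 4, 10, 30)),
]
print(change_failure_rate(changes))          # 0.1
ttr = restore_times(alerts)
print(len(alerts), len(ttr), ttr["INC-1"])   # 3 1 0:45:00
```

Three alerts collapse into a single incident with one 45-minute restore time. A raw alert count would have reported three incidents.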
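The vulnerability example in the John Lewis article above makes two points. First, a base-image package used across many components should not inflate the count. Second, teams should be measured on speed of response against a pre-agreed, severity-based policy. The following minimal Python sketch illustrates both. The remediation budgets in `POLICY`, the `Finding` record, and the `VULN-*` identifiers are all hypothetical placeholders. The article gives no actual policy values, and no real CVEs are referenced.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical remediation budgets per severity. The article only says the
# policy is pre-agreed and severity-based; these numbers are placeholders.
POLICY = {
    "critical": timedelta(days=7),
    "high": timedelta(days=14),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


@dataclass(frozen=True)
class Finding:
    vuln_id: str          # placeholder identifier, not a real CVE
    package: str
    severity: str
    component: str        # service or image where the scanner reported it
    disclosed: datetime
    fixed: Optional[datetime] = None


def unique_issues(findings: list[Finding]) -> list[tuple]:
    """Collapse per-component findings into one issue per (vulnerability, package).

    The issue counts once and is fixed only when every occurrence is fixed.
    """
    grouped: dict[tuple[str, str], list[Finding]] = {}
    for f in findings:
        grouped.setdefault((f.vuln_id, f.package), []).append(f)
    issues = []
    for (vuln_id, package), group in grouped.items():
        fixes = [f.fixed for f in group]
        fixed = None if any(x is None for x in fixes) else max(fixes)
        disclosed = min(f.disclosed for f in group)
        issues.append((vuln_id, package, group[0].severity, disclosed, fixed))
    return issues


def out_of_policy(findings: list[Finding], now: datetime) -> list[tuple[str, str]]:
    """Issues whose response time (so far, if still open) exceeds the severity budget."""
    late = []
    for vuln_id, package, severity, disclosed, fixed in unique_issues(findings):
        elapsed = (fixed if fixed is not None else now) - disclosed
        if elapsed > POLICY[severity]:
            late.append((vuln_id, package))
    return late


now = datetime(2026, 2, 1)
findings = [
    # One base-image vulnerability reported in three services, patched in 3 days.
    *[Finding("VULN-1", "libexample", "high", svc,
              datetime(2026, 1, 10), datetime(2026, 1, 13))
      for svc in ("checkout", "search", "basket")],
    # A critical issue still open after 10 days.
    Finding("VULN-2", "otherlib", "critical", "payments", datetime(2026, 1, 22)),
]
print(len(findings), len(unique_issues(findings)))  # 4 2
print(out_of_policy(findings, now))                 # [('VULN-2', 'otherlib')]
```

Four raw findings become two issues, and only the open critical one is flagged as outside policy. The base-image vulnerability was patched well within its budget.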
  49. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/at-john-lewis-partnership-measuring-developer-platform-value"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;"> of this article, Alex Moss from the John Lewis Partnership covered the metrics that they use to measure the value of their developer platform. Now, let's talk about a crucial aspect of any measurement strategy: choosing the right things to measure. It's easy to get lost in a sea of data or to focus on metrics that look impressive but don't actually reflect the health of your platform or the experience of your developers. Here, Alex shares the John Lewis philosophy on how to choose meaningful metrics and present them in a way that drives the right conversations and actions, ensuring that the data is always presented with as much context as possible. - Darren Evans</span></p> <p><span style="vertical-align: baseline;">While the solution we detailed in the first half of this article worked very well, relying solely on objective measures comes with a number of traps. They are very easy to misinterpret, either wasting your time on a non-issue (“the team is working on another product at the moment”) or failing to tell the right story (“the incident wasn’t closed properly”). This leads to a scaling challenge: Chatting with a small number of teams to understand a situation is one thing. 
But when you are one small team trying to build a product and you need to talk to several dozen teams, it’s not so easy.</span></p> <h3><strong style="vertical-align: baseline;">Collecting engineers’ subjective feedback</strong></h3> <p><span style="vertical-align: baseline;">We needed a way to collate more subjective feedback, ideally in a form that we could visualize and contrast with the objective DORA and other service metrics we held.</span></p> <p><span style="vertical-align: baseline;">Our initial attempt at this involved creating Service Operability Assessments: questionnaires that tenants fill in every quarter. Service Operability Assessments pose a series of thought-provoking questions about whether the team is following good practices for running its service. This worked well with an experienced facilitator (usually a senior platform engineer) who could ask further probing questions and pull out the key feedback and actions. But as you might imagine, this suffered from scaling challenges. We eventually made the process entirely self-service. This is an imperfect system, since many teams are quite happy to just copy and paste their answers from the previous quarter, which may or may not reflect reality!</span></p> <p><span style="vertical-align: baseline;">We then learned about </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">the DX platform</span></a><span style="vertical-align: baseline;">, which significantly changed how we approached this, and which is now used across our entire Engineering community. It works by surveying individual engineers (rather than teams) for a few minutes every three months. The questions are curated based on DX’s research, backed by the founders of DORA and other similar frameworks. 
We’ve found it very helpful to be able to slice the results in different ways, including looking at areas across whole platforms or deep-diving into particular teams. The latter, in combination with our DORA data, makes for rich conversations. For example, in the DX tool, a team that recently suffered some high-impact incidents might also have registered concerns about “Production Debugging,” while another team that saw a marked drop in release frequency flagged worries around “Change Confidence” or “Ease of Release.” At this point, the platform team can step in to offer advice or potentially implement new features to help with the issues the teams are seeing.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_J4WNCsj.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The pre-built drivers and reports in DX are tremendously useful, but we also augment them with our own custom queries to help us understand areas of current focus. For example, we measure Customer Satisfaction (CSAT) for the platform and its portal (Backstage), track how long it takes newcomers to begin submitting pull requests, and ask them how they found the onboarding process. 
We also recently started assessing engineers’ opinions on the effectiveness of AI coding assistants to help justify further investment in them (instead of just relying on market insight).</span></p> <p><span style="vertical-align: baseline;">One area where this helped focus our efforts was documentation: we built capabilities into our Backstage developer portal that make it easier for teams to view each other’s docs, through pipelines that automatically publish content and make it discoverable.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_gf9lDAw.max-1000x1000.png" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Service health: Feature adoption &amp; beyond</strong></h3> <p><span style="vertical-align: baseline;">Outside of the insights we generate from the likes of DORA and DX, we’ve recently begun questioning not only whether the platform itself is valuable, but whether tenants are </span><span style="font-style: italic; vertical-align: baseline;">getting the value they should</span><span style="vertical-align: baseline;"> from it. In other words, we’ve effectively started to measure platform feature adoption.</span></p> <p><span style="vertical-align: baseline;">To do this, we built what we refer to internally as our Technical Health feature. It takes the form of a custom plugin for our Backstage developer portal. The plugin queries an in-house API, which surfaces data fed by a large number of small jobs that each collect information on something we want to measure. Each job is independently releasable, which allowed us to scale this up quickly. 
</span></p> <p><span style="vertical-align: baseline;">We currently capture four categories of health measures:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Technical health: </strong><span style="vertical-align: baseline;">We currently have 17 “technical” measures. Examples include measuring whether teams are using our paved-road pipeline and custom Microservice CRD (see previous articles </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">1</span></a><span style="vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="text-decoration: underline; vertical-align: baseline;">2</span></a><span style="vertical-align: baseline;">) rather than “terraforming” their own resources, following our recommended Kubernetes practices (such as resource sizing, disruption budgets, and lifecycle probes), keeping base images up to date, and the like. We also include some “softer” technical measures, such as whether teams are running pipelines frequently enough to pick up changes (we don’t run pipelines on their behalf), reviewing their operability assessments, staying on top of Git branches, and so on.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Operational readiness:</strong><span style="vertical-align: baseline;"> Next, there are 18 measures relating to operational health, such as whether a pre-flight configuration is in place, whether runbooks are written, whether docs have been published, and so on. 
This is an evolution of an Operational Readiness checklist from several years ago (back when we had separate Delivery and Operations teams, and these sorts of checks were therefore mandatory for “handover”). Rather than keeping it as a generic list, we tailored this checklist to the specific platform features that help teams achieve good operability. This also helps our Service Management team feel confident that the right practices are being followed, eliminating a point of friction when carrying out manual reviews.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Migrations: </strong><span style="vertical-align: baseline;">From time to time, the Platform requires tenants to carry out work to keep up with changes to the platform itself. A classic example of this is getting teams to deal with deprecated Kubernetes API versions. This also includes adopting features that we want to drive more forcefully in order to retire an older way of doing things (for example, in favour of something more secure). We found that as the Platform grew, we accumulated a long tail of migration work that we needed teams to perform; tracking it here gives Product Managers and Delivery Leads an easy way to prioritize their teams’ workloads.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Broader engineering practices: </strong><span style="vertical-align: baseline;">We recently opened up the feature so that other teams (in this case, our Engineering leadership) can build in their own measures, such as whether teams are keeping up to date with versions of our design system or following broader engineering practices that extend beyond the JL Digital Platform. 
</span></p> </li> </ol> <p><span style="vertical-align: baseline;">We present this data through aggregated views (like the example shown below), as well as individual tasks and broader leaderboards, all designed to catch the eye of those with influence over a team’s priorities. We’ve found that an engineer’s desire to turn a traffic light green can be a powerful motivator, far more effective than relying on documentation or announcements.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_paqGoLi.max-1000x1000.png" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This works through custom plugins that we’ve built for the Backstage portal. Each “health check” is its own microservice (often running as a job) that interrogates the appropriate system to determine whether the measure is met. For example, one microservice checks that a PodDisruptionBudget has been created by querying Kubernetes directly, while another checks whether distroless base images are in use by inspecting container image layers. There’s a template for creating new measures, which makes it easy for engineers, including those outside the platform team, to add new ones. The results are stored in BigQuery, with an API to make Backstage plugin development simpler.</span></p> <p><span style="vertical-align: baseline;">A reality of introducing measures like this is that they drive more work into the product teams. It is important that your culture be ready for this. 
If we had implemented these measures very early in the platform’s life, it would likely have affected how the product was perceived: perhaps as overly strict, or as slowing the pace of change with guardrails. This can negatively impact overall adoption. By introducing them later on, we benefited from having many tenants who already saw the platform as very valuable, as well as from the confidence that we had selected the right measures and could apply them consistently. That said, we did still see a small drop in CSAT for the platform after we started doing this. We try to be considerate about the pace at which we launch each measure, to give product teams time to absorb the work, and we provide a means for teams to suppress indicators that aren’t relevant to them. For example, a tenant might deliberately choose not to use pod autoscaling for performance reasons, or have a functional reason why they can’t use our Microservice CRD.</span></p> <p><span style="vertical-align: baseline;">The introduction of these sorts of assurance measures on tenant behaviour reflects the maturity of the platform. In the early days, we relied on highly skilled teams to do the right thing whilst going fast. But over time, a wider variety of skills and capabilities across teams, combined with shifts in service ownership, pushed us to introduce techniques that drive the right outcomes. The platform itself has also become more complex: with all its new features, the cognitive load for a new team is much higher than it was. We needed to put some lights along the edges of our paved road to help teams stay on it!</span></p> <p><span style="vertical-align: baseline;">Throughout this evolution, we’ve continued to report our key results to the business: Are we still doing what they want of us? 
This has naturally shifted from “go fast, enable teams” (which we largely see as a solved problem, to be honest) towards “do it safely, and manage your technical debt.”</span></p> <h3><strong style="vertical-align: baseline;">Are you being served? Key takeaways</strong></h3> <p><span style="vertical-align: baseline;">Long story short, the question of whether a developer platform has value is complex and can be answered in many ways. As you embark on building out (and quantifying) your own developer platform, here are a few concluding thoughts to keep in mind:  </span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Measurement is a journey, not a destination:</strong><span style="vertical-align: baseline;"> Start by measuring something meaningful to your stakeholders, but be prepared to adapt as your platform evolves. In the beginning, it’s okay to prioritize measures that justify further investment in your product, but it’s better to actually measure how the platform is enabling your teams. The things that mattered when you were initially proving out the platform’s viability are unlikely to be what matters several years later, when your features are more mature and your priorities have shifted.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Listen to the humans: </strong><span style="vertical-align: baseline;">Don’t assume that just because your platform is being used, it is providing value. The most powerful metrics are often qualitative. Engineers wanting to use your tool and CSAT scores are strong signals, but asking engineers how they are using it is a better way to learn how you can improve it. 
It is hard to figure out what’s working (and what isn’t) through measurement alone.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Data is for enabling, not just reporting:</strong><span style="vertical-align: baseline;"> Use your insights to help teams improve, not just to show graphs to leadership. Further, be transparent about what specific data led you to act. For example, when you see a dip in release frequency for a specific team, use that data to start a conversation about potential roadblocks rather than simply flagging it as a problem. By doing this, you build the trust and goodwill with both leadership and your tenants to keep moving the platform forward. </span></p> </li> </ol> <hr/> <p><sub><span style="font-style: italic; vertical-align: baseline;">The evolution of the John Lewis Partnership’s measurement strategy serves as a compelling case study. By transitioning from basic lead-time tracking to a holistic model — blending DORA metrics with qualitative developer feedback — they demonstrated that true platform success is defined by the genuine value it delivers, not merely by adoption rates.</span></sub></p> <p><sub><span style="font-style: italic; vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles: Using Platform Engineering to simplify the developer experience - </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; 
vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="font-style: italic; vertical-align: baseline;"> and</span><span style="font-style: italic; vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="font-style: italic; vertical-align: baseline;">. We also recommend reading about </span><a href="https://cloud.google.com/blog/products/application-development/introducing-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">, </span><span style="font-style: italic; vertical-align: baseline;">our foundational tool for managing application-centric governance across your organization.</span></sub></p></div>
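As a rough illustration of the Technical Health pattern described in part two (small, independent checks whose results roll up into a traffic light, with per-team suppressions), here is a minimal Python sketch. Everything in it is assumed for illustration: the `Workload` and `PodDisruptionBudget` records, the `pdb_check` and `traffic_light` functions, the measure names, and the green/amber/red thresholds. A real check job would read its inputs from the Kubernetes API and write results to a store such as BigQuery; both steps are omitted here, and PDB selectors are simplified to `matchLabels` only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Workload:
    """A deployed workload and the labels on its pod template (hypothetical model)."""
    name: str
    pod_labels: Dict[str, str]


@dataclass
class PodDisruptionBudget:
    """Simplified PDB: only matchLabels is modelled, not matchExpressions."""
    name: str
    match_labels: Dict[str, str]


def selects(pdb: PodDisruptionBudget, workload: Workload) -> bool:
    """True if every matchLabels pair is present on the workload's pods."""
    return all(workload.pod_labels.get(k) == v for k, v in pdb.match_labels.items())


def pdb_check(workloads: List[Workload], pdbs: List[PodDisruptionBudget]) -> Dict[str, bool]:
    """Hypothetical health check: is each workload covered by at least one PDB?"""
    return {w.name: any(selects(p, w) for p in pdbs) for w in workloads}


def traffic_light(results: Dict[str, Dict[str, bool]], suppressed: Set[str]) -> str:
    """Roll measure results up into one status, skipping measures a team has suppressed.

    `results` maps a measure name to {workload name: passed}. The thresholds
    are an assumption: green if everything passes, red if nothing passes,
    amber otherwise.
    """
    outcomes = [
        passed
        for measure, per_workload in results.items()
        if measure not in suppressed
        for passed in per_workload.values()
    ]
    if all(outcomes):  # also green when every measure is suppressed
        return "green"
    return "amber" if any(outcomes) else "red"


workloads = [
    Workload("checkout", {"app": "checkout"}),
    Workload("basket", {"app": "basket"}),
]
pdbs = [
    PodDisruptionBudget("checkout-pdb", {"app": "checkout"}),
    PodDisruptionBudget("basket-pdb", {"app": "basket"}),
]
results = {
    "pod-disruption-budget": pdb_check(workloads, pdbs),
    # This team deliberately runs without pod autoscaling.
    "pod-autoscaling": {"checkout": False, "basket": False},
}

print(traffic_light(results, suppressed=set()))                # amber
print(traffic_light(results, suppressed={"pod-autoscaling"}))  # green
```

Suppressing `pod-autoscaling` turns the roll-up green, mirroring the article's example of a tenant that deliberately avoids pod autoscaling for performance reasons. Keeping each check as a separate function (in the article, a separate job) is what lets new measures be added without touching the others.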
  50. 10X Lead, delta Team, Google Cloud Consulting

    Thu, 08 Jan 2026 17:00:00 -0000

    <div class="block-paragraph_advanced"><p class="p1">FINRA, the Financial Industry Regulatory Authority, consistently seeks to achieve the highest standards in its technology practices. To elevate its software development lifecycle, FINRA (which oversees member broker-dealers) engaged Google consultants to help apply a metrics-driven methodology to its engineering practices.</p> <p class="p1"><a href="https://dora.dev/" rel="noopener" target="_blank">DORA</a> is a popular framework <span style="vertical-align: baseline;">for helping organizations improve software delivery performance through capabilities that can be measured by key metrics. These include </span>deployment frequency, change lead time, change failure rate, failed deployment recovery time, and rework rate.</p> <p class="p1">While FINRA had begun laying the groundwork to adopt DORA internally, the organization recognized an opportunity to accelerate implementation by tapping Google's firsthand experience.</p> <p class="p1">Google conducted a discovery effort alongside FINRA's technology leaders to identify opportunities for improvement. The recommendations that followed included increasing the existing focus on continuous improvement, adopting a user-centric approach to developing software, and further enabling a generative culture within the department.</p> <p class="p1">The implementation itself was deliberately flexible. Rather than recommending a one-size-fits-all approach, Google helped FINRA tailor its actions to individual team objectives. 
Teams prioritizing product value concentrated on lead time and deployment frequency metrics, while teams focused on stability concentrated on change failure rates and<span style="vertical-align: baseline;"> failed deployment recovery time</span>.</p> <p class="p1">Over the first year of implementation, engineering teams demonstrated continuous improvement across DORA capabilities, achieving a 9% per-developer productivity gain and reporting directionally positive developer experience feedback.</p> <p class="p1">Sprint velocities also improved by 5%, enabling smaller engineering teams to deliver greater incremental product value to the business. Beyond raw metrics, teams also reported heightened transparency around delivery performance and appreciation for a standardized methodology.</p> <p class="p1">Looking ahead, FINRA is maturing its DORA practice by providing more granular metrics tied to high-level DORA measurements, increasing emphasis on developer experience, and correlating product metrics with software delivery performance indicators.</p> <p class="p1"><em>Want to discover what AI can do for governments, nonprofits, and other public sector organizations? Register to attend our upcoming <a href="https://cloudonair.withgoogle.com/events/gemini-for-government-your-front-door-for-mission-ai" rel="noopener" target="_blank">Gemini for Government webinar on February 5</a>, where we will dive deeper into the transformative technology powering the next wave of innovation across the public sector.</em></p></div>
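Four of the DORA metrics named above lend themselves to a short calculation. The following Python sketch computes deployment frequency, median change lead time, change failure rate, and median failed deployment recovery time from a list of deployment records. It is an illustrative assumption, not FINRA's or DORA's tooling: the `Deployment` record, the `dora_metrics` function, and the sample data are hypothetical, recovery is timed from the failed deployment to restoration, and rework is left out because it needs data this simplified record does not carry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record)."""
    deployed_at: datetime
    commit_times: List[datetime]  # when each change shipped in this deployment was committed
    failed: bool = False  # did this deployment cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: List[Deployment], window_days: int) -> dict:
    """Compute four DORA delivery metrics over an observation window."""
    lead_times = [d.deployed_at - c for d in deploys for c in d.commit_times]
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
    return {
        "deployments_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deploys) if deploys else None,
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


t0 = datetime(2026, 1, 5, 9, 0)
deploys = [
    Deployment(t0, [t0 - timedelta(hours=4)]),
    Deployment(t0 + timedelta(days=1), [t0 + timedelta(hours=20)],
               failed=True, restored_at=t0 + timedelta(days=1, minutes=45)),
    Deployment(t0 + timedelta(days=2), [t0 + timedelta(hours=46), t0 + timedelta(hours=36)]),
    Deployment(t0 + timedelta(days=3), [t0 + timedelta(hours=66)]),
]

m = dora_metrics(deploys, window_days=7)
print(m["median_lead_time"])      # 4:00:00
print(m["change_failure_rate"])   # 0.25
print(m["median_recovery_time"])  # 0:45:00
```

In the article's terms, a team prioritizing product value would watch the first two figures, while a stability-focused team would watch the last two.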
  51. Senior Product Marketing Manager

    Tue, 09 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The </span><a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report"><span style="text-decoration: underline; vertical-align: baseline;">2025 State of AI-assisted Software Development report</span></a><span style="vertical-align: baseline;"> revealed a critical truth: AI is an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.</span></p> <p><span style="vertical-align: baseline;">While AI adoption is now near-universal, with 90% of developers using it in their daily workflows, success is not guaranteed. Our cluster analysis of nearly 5,000 technology professionals reveals significant variation in team performance: Not everyone experiences the same outcomes from adopting AI. </span></p> <p><span style="vertical-align: baseline;">From this disparity, we can conclude that how teams use AI is a critical factor. We wanted to understand the particular capabilities and conditions that enable teams to achieve positive outcomes, leading us to develop the </span><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">DORA AI Capabilities Model report</span></a><span style="vertical-align: baseline;">. </span></p> <p><span style="vertical-align: baseline;">This companion guide to the 2025 DORA Report is designed to help you navigate our new reality. It provides actionable strategies, implementation tactics, and measurement frameworks to help technology leaders build an environment where AI thrives.</span></p> <h3><strong style="vertical-align: baseline;">Seven capabilities that amplify success</strong></h3> <p><span style="vertical-align: baseline;">Successfully using AI requires cultivating your technical and cultural environment. 
From the same set of respondents who participated in the 2025 DORA survey, we identified seven foundational capabilities that are proven to amplify the positive impact of AI on organizational performance:</span></p> <ol> <li role="presentation"><strong style="vertical-align: baseline;">Clear and communicated AI stance</strong><span style="vertical-align: baseline;">: Ambiguity creates risk. A clear policy provides the psychological safety developers need to experiment effectively.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Healthy data ecosystems</strong><span style="vertical-align: baseline;">: AI is only as good as the data it learns from. Investing in high-quality, accessible, and unified internal data significantly amplifies AI's benefits.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">AI-accessible internal data</strong><span style="vertical-align: baseline;">: This involves "context engineering," moving beyond simple prompts to securely connect AI tools to your internal documentation and codebases.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Strong version control practices</strong><span style="vertical-align: baseline;">: As AI increases the volume and velocity of code generation, version control becomes your critical safety net. Frequent commits and robust rollback capabilities are essential for maintaining stability in an AI-assisted world.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Working in small batches</strong><span style="vertical-align: baseline;">: AI can easily generate massive blocks of code, which are hard to review and test. 
Enforcing the discipline of small batches counteracts this risk, ensuring that speed translates to product performance rather than instability.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">User-centric focus</strong><span style="vertical-align: baseline;">: Speed is irrelevant if you are moving in the wrong direction. Adopting AI tools can actually harm teams that lack a user-centric focus. Keeping user needs as your North Star is essential for guiding AI-assisted development.</span></li> <li><strong style="vertical-align: baseline;">Quality internal platforms</strong><span style="vertical-align: baseline;">: A platform provides the automated, secure "paved roads" that allow AI benefits to scale across the organization. It prevents individual productivity gains from being lost to downstream bottlenecks.</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/dora-ai-capabilities-model.max-1000x1000.jpg" alt="dora-ai-capabilities-model"> </a> <figcaption class="article-image__caption "><p data-block-key="y4u85">The DORA AI Capabilities Model shows which capabilities amplify the effect of AI adoption on</p><p data-block-key="7k909">specific outcomes</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where to start: Assessing your team</strong></h3> <p><span style="vertical-align: baseline;">Every organization starts its AI journey differently. To help you prioritize, this report introduces seven distinct team archetypes derived from our cluster analysis. 
These profiles range from "harmonious high-achievers," who excel in both performance and well-being, to teams facing "foundational challenges" or those stuck in a "legacy bottleneck," where unstable systems undermine morale.</span></p> <p><span style="vertical-align: baseline;">Identifying the profile that best matches your team can help pinpoint the most impactful interventions. For example, a "high impact, low cadence" team might prioritize automation to improve stability, while a team "constrained by process" might focus on reducing friction through a clearer AI stance.</span></p> <h3><strong style="vertical-align: baseline;">Digging deeper with Value Stream Mapping</strong></h3> <p><span style="vertical-align: baseline;">Once you understand your team's profile, how do you direct your efforts? The report includes a step-by-step facilitation guide for running a Value Stream Mapping (VSM) exercise.</span></p> <p><span style="vertical-align: baseline;">VSM acts as an AI force multiplier. By visualizing your flow from idea to customer, you can identify where work waits and where friction exists. This helps ensure that AI's efficiency gains don't just create local optimizations that pile up work downstream, but are instead channeled into relieving system-level constraints.</span></p> <h3><strong style="vertical-align: baseline;">Get better at getting better</strong></h3> <p><span style="vertical-align: baseline;">AI adoption is an organizational transformation. 
The greatest returns come not from the tools themselves, but from investing in the foundational systems that enable them.</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">Download the full report</span></a></p> </li> <li><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> </ul></div>
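The value stream mapping idea above (find where work waits, not just where it is processed) can be illustrated with a little arithmetic. This Python sketch is not the report's facilitation guide: the `Step` record, the `analyse` function, the step names, and the hours are all invented. It uses the common lean measure of flow efficiency, hands-on process time divided by total lead time, and then checks what happens if AI speeds up only the coding step.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Step:
    """One step in a value stream (hypothetical model)."""
    name: str
    process_hours: float  # hands-on time spent working on the item
    wait_hours: float  # time the item sits in a queue before this step


def analyse(steps: List[Step]) -> dict:
    """Summarize a value stream: total lead time, flow efficiency, and longest queue."""
    process = sum(s.process_hours for s in steps)
    lead_time = process + sum(s.wait_hours for s in steps)
    return {
        "lead_time_hours": lead_time,
        "flow_efficiency": process / lead_time if lead_time else 0.0,
        "largest_wait": max(steps, key=lambda s: s.wait_hours).name,
    }


steps = [
    Step("Refine idea", process_hours=4, wait_hours=40),
    Step("Write code", process_hours=16, wait_hours=8),
    Step("Code review", process_hours=2, wait_hours=30),
    Step("Test and release", process_hours=6, wait_hours=72),
]
before = analyse(steps)

# Suppose an AI assistant halves hands-on coding time and nothing else changes.
faster = [replace(s, process_hours=s.process_hours / 2) if s.name == "Write code" else s
          for s in steps]
after = analyse(faster)

print(before["lead_time_hours"], before["largest_wait"])  # 178 Test and release
print(after["lead_time_hours"])                           # 170.0
```

In this made-up map, halving coding time shortens end-to-end lead time by only 8 of 178 hours (about 4.5%), because most of the elapsed time is spent queueing, above all before testing and release. That is the kind of local optimization the report cautions against.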
  52. Practice Lead, SRE

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">When was the last time you </span><span style="font-style: italic; vertical-align: baseline;">knew — </span><span style="vertical-align: baseline;">not just </span><span style="font-style: italic; vertical-align: baseline;">hoped</span><span style="vertical-align: baseline;"> — that your disaster recovery plan would work perfectly?</span></p> <p><span style="vertical-align: baseline;">For most of us, the answer is unclear. Sure, you may have a DR plan, a meticulously crafted document stored in a wiki or a shared drive, that gets dusted off for compliance audits or the occasional tabletop drill. You assume its procedures are correct, its contact lists are current, and its dependencies are fully mapped, and you certainly </span><span style="font-style: italic; vertical-align: baseline;">hope</span><span style="vertical-align: baseline;"> it works.</span></p> <p><span style="vertical-align: baseline;">But </span><a href="https://sre.google/prodverbs/?slide=10" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">hope is not a strategy</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">Why wouldn’t it work? One problem is that systems are rarely static anymore. In a world where you deploy new microservices dozens of times per day, make constant configuration changes, and maintain an ever-growing web of third-party API dependencies, the DR plan you wrote last quarter is probably just as useful as one from 10 years ago. </span></p> <p><span style="vertical-align: baseline;">And if the failover does work, will it work well enough to meet the promises you've made to your customers (or board of directors or regulators)? 
When a key component fails, could you still meet your availability and latency targets, that is, your service level objectives (SLOs)?</span></p> <p><span style="vertical-align: baseline;">So how do you close the gap between an aspirational DR plan and one you actually have confidence in? The answer isn't to write more documents or run more theatrical drills. The answer is to stop </span><span style="font-style: italic; vertical-align: baseline;">assuming</span><span style="vertical-align: baseline;"> and start </span><span style="font-style: italic; vertical-align: baseline;">proving</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">This is where chaos engineering comes in. Despite what the name might imply, chaos engineering isn’t a tool for recklessly breaking things. Instead, it’s a framework that provides data-driven confidence in your SLOs under stress. By running controlled experiments that simulate real-world disasters like a database failover or a regional outage, you can quantitatively measure how those failures affect your systems’ performance. Chaos engineering turns the hypotheses in your DR plan into verified procedures: by validating the plan through experiments, you create tangible evidence that it will safeguard your infrastructure and keep your promises to customers.</span></p> <h3><strong style="vertical-align: baseline;">Demystifying chaos engineering</strong></h3> <p><span style="vertical-align: baseline;">In a nutshell, chaos engineering is the practice of running controlled, scientific experiments to find weaknesses in your system before they cause a real outage. </span></p> <p><span style="vertical-align: baseline;">At its core, it’s about building confidence in your system’s resilience. 
The process starts with understanding your system's </span><strong style="vertical-align: baseline;">steady state</strong><span style="vertical-align: baseline;">, which is its normal, measurable, and healthy output. You can't know the true impact of a failure without first defining what "good" looks like. This understanding allows you to form a clear, testable </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">: a statement of belief that your system's steady state will persist even when a specific, turbulent condition is introduced.</span></p> <p><span style="vertical-align: baseline;">To test this hypothesis, you then execute a controlled </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;">, which is a precise and targeted failure injected into the system. This isn't random mischief; it's a specific simulation of real-world failures, such as consuming all CPU on a host (</span><strong style="vertical-align: baseline;">resource exhaustion</strong><span style="vertical-align: baseline;">), adding network latency (</span><strong style="vertical-align: baseline;">network failure</strong><span style="vertical-align: baseline;">), or terminating a virtual machine (</span><strong style="vertical-align: baseline;">state failure</strong><span style="vertical-align: baseline;">). While this action is running, automated </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> act as your scientific instruments, continuously monitoring the system's state to measure the effect. 
</span></p> <p><span style="vertical-align: baseline;">Together, these components form a complete scientific loop: you use a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;"> to predict resilience, run an experiment by applying an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> to simulate adversity, and use </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> to measure the impact, turning uncertainty into hard data.</span></p> <h3><strong style="vertical-align: baseline;">Using chaos to validate disaster recovery plans</strong></h3> <p><span style="vertical-align: baseline;">Now that you understand the building blocks of a chaos experiment, you can build the bridge to your ultimate goal: transforming your DR plan from a document of hope into an evidence-based procedure. The key is to stop seeing your DR plan as a set of instructions and start seeing it for what it truly is: a collection of unproven hypotheses.</span></p> <p><span style="vertical-align: baseline;">When you think about it, every significant statement in your DR document is a claim waiting to be tested. When your plan states, </span><span style="font-style: italic; vertical-align: baseline;">"The database will fail over to the replica in under 5 minutes,"</span><span style="vertical-align: baseline;"> that isn't a fact; it's a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">. When it says, </span><span style="font-style: italic; vertical-align: baseline;">"In the event of a regional outage, traffic will be successfully rerouted to the secondary region,"</span><span style="vertical-align: baseline;"> that's another hypothesis. 
Your DR plan is filled with these critical assumptions about how your system </span><span style="font-style: italic; vertical-align: baseline;">should</span><span style="vertical-align: baseline;"> behave under duress. Until you test them, they remain nothing more than educated guesses.</span></p> <p><span style="vertical-align: baseline;">Chaos experiments are the ultimate validation tools: </span><strong style="vertical-align: baseline;">live-fire drills</strong><span style="vertical-align: baseline;"> that put your DR hypotheses to a real, empirical test. Instead of just talking through a scenario, you use controlled </span><strong style="vertical-align: baseline;">actions</strong><span style="vertical-align: baseline;"> to safely and precisely simulate the disaster. You're no longer asking "what if?"; you're actively measuring "what happens when."</span></p> <p><span style="vertical-align: baseline;">Imagine you have a DR plan for a regional outage. With chaos engineering, you break that plan down into a hypothesis and an experiment. 
For example:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The hypothesis:</strong><span style="vertical-align: baseline;"> "If our primary region, </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;">, becomes unreachable, the load balancers will fail over all traffic to </span><code style="vertical-align: baseline;">us-east1</code><span style="vertical-align: baseline;"> within 3 minutes, with an error rate below 1%."</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The chaos experiment:</strong><span style="vertical-align: baseline;"> Run an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> that simulates a regional outage by injecting a "blackhole" that drops all network traffic to and from </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;"> for a limited time. Your </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> then measure the actual failover time and error rate to validate the hypothesis.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">In other words, by applying the chaos engineering methodology, you systematically work through your DR plan, turning each assumption into a proven fact. You're not just testing your plan; you're forging it in a controlled fire.</span></p> <h3><strong style="vertical-align: baseline;">Connecting chaos readiness to your SLOs</strong></h3> <p><span style="vertical-align: baseline;">Beyond proving system availability, chaos engineering builds trust in your reliability metrics by showing whether you can still meet your SLOs when services become unavailable. 
An SLO is a target level for your service's performance, measured over a specified period, that reflects your users' experience. SLOs aren't just internal goals; they are the bedrock of customer trust and the foundation of your contractual service level agreements (SLAs).</span></p> <p><span style="vertical-align: baseline;">A traditional DR drill might get a "pass" because the backup system came online. But what if it took 20 minutes to fail over, during which every user saw errors? What if the backup region was under-provisioned, and performance became so slow that the service was unusable? From a technical perspective, you "recovered." But from a customer's perspective, you were down.</span></p> <p><span style="vertical-align: baseline;">A chaos experiment, however, can help you answer a critical question: </span><strong style="vertical-align: baseline;">"During a failover, did we still meet our SLOs?" </strong><span style="vertical-align: baseline;">Because your probes are constantly measuring performance against your SLOs, you get the full picture. You don't just see that the database failed over; you see that it took 7 minutes, during which your latency SLO was breached and your </span><a href="https://sre.google/sre-book/embracing-risk/#:~:text=Forming%20Your%20Error%20Budget,new%20releases%20can%20be%20pushed." rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">error budget</span></a><span style="vertical-align: baseline;"> was completely burned. This is the crucial insight. It shifts the goal from simple disaster recovery to </span><strong style="vertical-align: baseline;">SLO preservation</strong><span style="vertical-align: baseline;">, which is what actually determines whether a failure was a minor hiccup or a major business-impacting incident. It also gives you the data you need to set goals for improving the system. 
The next time you run this experiment, you can measure whether, and by how much, your system's resilience has improved, and ultimately whether you can maintain your SLOs during the disaster.</span></p> <h3><strong style="vertical-align: baseline;">Build a culture of confidence</strong></h3> <p><span style="vertical-align: baseline;">The journey to resilience doesn't start by simulating a full regional failover. It starts with a single, small experiment. The goal is not to boil the ocean; it's to build momentum. Test one timeout, one retry mechanism, or one graceful error message.</span></p> <p><span style="vertical-align: baseline;">The biggest win from your first successful experiment won't be the technical data you gather. It will be the confidence you build. When your team sees that they can safely inject failure, learn from it, and improve the system, their entire relationship with failure changes. Fear is replaced by curiosity. That confidence is the catalyst for building a true, enduring culture of resilience. To learn more and get started with chaos engineering, check out </span><a href="https://cloud.google.com/blog/products/devops-sre/getting-started-with-chaos-engineering?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">this blog</span></a><span style="vertical-align: baseline;"> and </span><a href="https://sre.google/prodcast/#season3-episode12" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">this podcast</span></a><span style="vertical-align: baseline;">. And if you’re ready to get started but unsure how, reach out to Google Cloud professional services to discuss how we can help.</span></p></div>
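The regional-outage hypothesis in the chaos engineering post above ("traffic fails over to `us-east1` within 3 minutes, with an error rate below 1%") can be checked mechanically once probes report their measurements. The following Python sketch is illustrative only and is not part of any Google Cloud product: the `ProbeSample` record, its fields, and the `evaluate_failover` helper are hypothetical. It assumes each sample covers one sampling interval after the blackhole action starts and records request and error counts plus the share of traffic served from the secondary region, then tests both thresholds stated in the hypothesis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProbeSample:
    """Hypothetical probe reading for one sampling interval of the experiment."""
    t_seconds: float        # seconds since the blackhole action started
    requests: int           # requests observed during this interval
    errors: int             # failed requests observed during this interval
    secondary_share: float  # fraction of traffic served by the secondary region (0.0 to 1.0)


@dataclass(frozen=True)
class Verdict:
    failover_seconds: Optional[float]  # None if traffic never fully moved
    error_rate: float
    hypothesis_holds: bool


def evaluate_failover(samples: Sequence[ProbeSample],
                      max_failover_s: float = 180.0,   # "within 3 minutes"
                      max_error_rate: float = 0.01,    # "error rate below 1%"
                      ) -> Verdict:
    """Compare probe data from a failover experiment against the hypothesis."""
    ordered = sorted(samples, key=lambda s: s.t_seconds)
    total = sum(s.requests for s in ordered)
    if total == 0:
        raise ValueError("no traffic observed; the hypothesis cannot be evaluated")
    # Failover counts as complete at the first sample where all traffic is on the secondary region.
    failover_s = next((s.t_seconds for s in ordered if s.secondary_share >= 1.0), None)
    error_rate = sum(s.errors for s in ordered) / total
    holds = (failover_s is not None
             and failover_s <= max_failover_s
             and error_rate < max_error_rate)
    return Verdict(failover_s, error_rate, holds)


if __name__ == "__main__":
    # Made-up readings: traffic fully moves by t=150 s, but errors during the move exceed 1%.
    readings = [
        ProbeSample(30, 1000, 40, 0.2),
        ProbeSample(90, 1000, 20, 0.8),
        ProbeSample(150, 1000, 0, 1.0),
        ProbeSample(210, 1000, 0, 1.0),
    ]
    v = evaluate_failover(readings)
    print(f"failover={v.failover_seconds}s error_rate={v.error_rate:.2%} holds={v.hypothesis_holds}")
```

With these made-up readings, failover completes at 150 seconds, inside the 3-minute target, but the 1.5% error rate during the transition means the hypothesis does not hold. That is the kind of partial pass a drill that only asks "did the backup come online?" would miss.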
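The post's question "During a failover, did we still meet our SLOs?" can likewise be answered from probe data by expressing the experiment's bad events as a share of the error budget. The Python sketch below is a hypothetical illustration, not a Google Cloud API: it treats a request as bad if it failed or exceeded a latency threshold, and it uses the request-based error budget from the SRE book, (1 - SLO target) x requests in the SLO window. The sample format, the threshold, and all numbers are assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical probe record: (latency in milliseconds, whether the request succeeded).
Sample = Tuple[float, bool]


def count_bad_events(samples: Iterable[Sample], latency_threshold_ms: float) -> int:
    """Count requests that failed outright or were slower than the latency threshold."""
    return sum(1 for latency_ms, succeeded in samples
               if not succeeded or latency_ms > latency_threshold_ms)


def error_budget_consumed(slo_target: float, window_requests: int, bad_events: int) -> float:
    """Return the fraction of the window's error budget used by bad_events.

    The budget is the number of bad requests the SLO tolerates over the window:
    (1 - slo_target) * window_requests. A result >= 1.0 means the budget is exhausted.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window_requests <= 0:
        raise ValueError("window_requests must be positive")
    budget = (1.0 - slo_target) * window_requests
    return bad_events / budget


if __name__ == "__main__":
    # Made-up probe data: one slow request and one failed request.
    samples = [(120.0, True), (950.0, True), (80.0, False), (200.0, True)]
    print(count_bad_events(samples, latency_threshold_ms=300.0))  # 2

    # Suppose the whole experiment produced 12,000 bad requests, against a 99.9% SLO
    # over a window expected to serve 10,000,000 requests (budget: 10,000 bad requests).
    used = error_budget_consumed(slo_target=0.999, window_requests=10_000_000, bad_events=12_000)
    print(f"error budget consumed: {used:.0%}")
```

Under these assumed figures, the experiment alone consumes about 120% of the window's budget: the "completely burned" outcome the post describes, even though the backup system did come online.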
  53. Group Product Manager, Google Cloud

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Earlier this year, we unveiled a big investment in platform and developer team productivity with the launch of </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">, </span><span style="vertical-align: baseline;">which helps these teams streamline </span><span style="vertical-align: baseline;">the design and deployment of cloud application infrastructure while ensuring that applications are secure, reliable, and aligned with best practices</span><span style="vertical-align: baseline;">. Today, Application Design Center is generally available.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">We built Application Design Center to put applications at the center of your cloud experience, with a visual, canvas-style, AI-powered approach to designing and modifying Terraform-backed application templates. It also offers full lifecycle management that’s aligned with DevOps best practices across application design and deployment.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Design Center is a core component of our </span><a href="https://docs.cloud.google.com/hub/docs/application-centric-google-cloud"><span style="text-decoration: underline; vertical-align: baseline;">application-centric cloud experience</span></a><span style="vertical-align: baseline;">. When you use Application Design Center to design and deploy your application infrastructure, your applications are easily discoverable, observable, and manageable. 
Application Design Center works in concert with </span><a href="https://cloud.google.com/app-hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">, automatically registering application deployments to give you a unified view and control plane for your application portfolio, and with </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;">, which provides operational insights for your applications.</span></p> <p style="text-align: justify; padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“Google Application Design Center is a valuable enabler for Platform Engineering, providing a structured approach to harmonizing resource creation in Google Cloud Platform. By aligning tools, processes, and technologies, it streamlines workflows, reducing friction between development, operations, and other teams. This harmonization enhances collaboration, accelerates delivery, and ensures consistency across Google Cloud environments.”</span><span style="vertical-align: baseline;"> - </span><strong style="vertical-align: baseline;">Ervis Duraj, Principal Engineer,</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">MediaMarktSaturn Technology</strong></p> <h3><span style="vertical-align: baseline;">The gateway to an app-centric cloud</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">Our goal with Application Design Center is for you to innovate more and administer less. It consists of </span><span style="vertical-align: baseline;">four key elements to help you minimize administrative overhead and maximize efficiency, so you can design and deploy applications with integrated best practices and essential guardrails. 
Let’s take a closer look.</span></p> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">1. </span><strong style="vertical-align: baseline;">Terraform </strong><a href="https://docs.cloud.google.com/application-design-center/docs/supported-resources"><strong style="text-decoration: underline; vertical-align: baseline;">components</strong></a><strong style="vertical-align: baseline;"> and </strong><a href="https://docs.cloud.google.com/application-design-center/docs/design-application-templates"><strong style="text-decoration: underline; vertical-align: baseline;">application templates</strong></a><strong style="vertical-align: baseline;"> <br/></strong><span style="vertical-align: baseline;">Develop applications faster with our growing library of opinionated application templates. These provide well-architected patterns and pre-built components, including innovative "AI inference templates" to help you leverage AI to create dynamic and intelligent application foundations. 
For example, at launch, Application Design Center provides opinionated templates for Google Kubernetes Engine (GKE) clusters (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">) that run AI inference workloads using a variety of LLMs, as well as templates for enterprise-grade production clusters and single-region web app clusters. </span></p> <p><span style="vertical-align: baseline;">You can also </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">ingest and manage your existing Terraform configurations</span></a><span style="vertical-align: baseline;"> (“Bring your own Terraform”) directly from Git repositories. 
Once imported, you can use Application Design Center to design with your own Terraform, alone or in combination with Google-provided Terraform, to create standardized, opinionated infrastructure patterns for sharing and reuse across your application teams.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3-_Catalog_Share.gif" alt="3- Catalog Share"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. </span><strong style="vertical-align: baseline;">AI-powered design for rapid application design and prototyping <br/></strong><span style="vertical-align: baseline;">Application Design Center integrates with Google's </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/design-application"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Design Agent</span></a><span style="vertical-align: baseline;">, which lets you design deployable application infrastructure templates on Google Cloud and export them as Terraform infrastructure-as-code. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">With Gemini Cloud Assist, you can describe your application design intent in natural language. In return, Gemini interactively generates multi-product application template suggestions, complete with visual architecture diagrams and summarized benefits. You can then refine these proposals through multi-turn reasoning or by directly manipulating the architecture within the Application Design Center canvas. 
</span></p> <p><span style="vertical-align: baseline;">Additionally, all designs that you create with Gemini are automatically observable, optimizable, and enabled for troubleshooting assistance at runtime, thanks to tight integration with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?hl=en"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1-Components_and_templates.gif" alt="1-Components and templates"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">3. </span><strong style="vertical-align: baseline;">A secure, shareable catalog of application templates with full lifecycle management<br/></strong><span style="vertical-align: baseline;">Platform admins can curate a collection of application templates built from Google's best-practice components. This gives developers a trusted, self-service experience for quickly discovering and deploying compliant applications. Tight integration with </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;"> transforms these governed templates into a live operational command center, complete with unified visibility into the health and deployment status of the resulting applications. 
This closes the critical loop between design and runtime, so that your production environments reflect your organization’s approved architectural standards.</span></p> <p><span style="vertical-align: baseline;">In addition, Application Design Center’s </span><a href="https://docs.cloud.google.com/application-design-center/docs/manage-application-instances#create-application-revision"><span style="text-decoration: underline; vertical-align: baseline;">application template revisions</span></a><span style="vertical-align: baseline;"> serve as an immutable audit trail, and the service automatically detects and flags configuration drift between your intended designs and deployed applications so that developers can remediate unauthorized changes or safely push approved configuration updates. This helps ensure continuous state consistency and compliance from Day 1 and through the subsequent evolution of your application.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2-Design_Agent.gif" alt="2-Design Agent"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">4. </span><strong style="vertical-align: baseline;">GitOps integration automating developers’ day-to-day software design lifecycle tasks <br/></strong><span style="vertical-align: baseline;">By integrating Application Design Center into existing CI/CD workflows, platform teams empower developers to own the complete software delivery lifecycle right from their IDE. 
Developers can work with compliant application </span><span style="font-style: italic; vertical-align: baseline;">and</span><span style="vertical-align: baseline;"> infrastructure (IaC) code derived from Application Design Center application templates. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Further, every infrastructure decision made through Application Design Center is committed to code, versioned, and auditable. Specifically, developers can download the application IaC template from Application Design Center, import it into their app repo (the single source of truth), clone the repo, and edit the Terraform directly in their local IDE. Any modifications go through a Git pull request for review. Once a pull request is approved, it automatically triggers the existing CI/CD setup to build, test, and deploy both app and infra changes in lockstep. This unified approach minimizes friction, enforces "golden paths," and provides an end-to-end automated pathway from a line of code in the IDE to a fully deployed change in production. </span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">What's new since preview</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">This GA launch is packed with features that users have been asking for. 
We’re excited to share powerful new capabilities: enterprise-grade governance and security with </span><a href="https://cloud.google.com/sdk/gcloud/reference/design-center"><span style="text-decoration: underline; vertical-align: baseline;">public APIs and gcloud CLI support</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/set-up-secure-perimeter"><span style="text-decoration: underline; vertical-align: baseline;">full compatibility with VPC Service Controls</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">bring your own Terraform</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/download-and-deploy#export_terraform_code"><span style="text-decoration: underline; vertical-align: baseline;">GitOps support</span></a><span style="vertical-align: baseline;"> for integration with your existing application patterns and automation pipelines; agentic application patterns using GKE templates (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">); and finally, a simplified onboarding experience with 
</span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">app-managed project support</span></a><span style="vertical-align: baseline;">, making Application Design Center an AI-powered engine for your applications on Google Cloud.</span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Get started today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">To help you get started, Google provides a growing library of curated Google application templates built by experts. These templates combine multiple Google Cloud products and best practices to serve common use cases; you can configure them for deployment and view them inline as infrastructure as code. Platform teams can then create and securely share catalogs, collaborate with teammates on designs, and offer self-service deployment to developers. For enterprises with existing Terraform patterns and assets, Application Design Center lets you import and reuse them within its native design and configuration experience.</span></p> <p><span style="vertical-align: baseline;">Ready to experience the power of </span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">? </span><span style="vertical-align: baseline;">To learn more about Application Design Center and get started, follow the </span><a href="https://docs.cloud.google.com/application-design-center/docs/quickstart-create-template"><span style="text-decoration: underline; vertical-align: baseline;">quickstart</span></a><span style="vertical-align: baseline;">. 
</span><span style="vertical-align: baseline;">You can start building your first AI-powered application template in minutes, </span><a href="https://cloud.google.com/products/application-design-center/pricing"><span style="text-decoration: underline; vertical-align: baseline;">free of cost</span></a><span style="vertical-align: baseline;">, and quickly deploy applications with working code. For deeper insights, explore the comprehensive public documentation </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">. We can't wait to see how you innovate with Application Design Center!</span></p></div>
  54. Senior Product Manager

    Wed, 03 Dec 2025 23:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note</strong><span style="font-style: italic; vertical-align: baseline;">: This blog was updated on Dec. 4, 5, 7, and 12, 2025, with additional guidance on Cloud Armor WAF rule syntax and on WAF enforcement across App Engine Standard, Cloud Functions, and Cloud Run.</span></p> <p><span style="vertical-align: baseline;">Earlier today, Meta and Vercel publicly disclosed two vulnerabilities that expose services built with the popular open-source frameworks </span><strong style="vertical-align: baseline;">React</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">Server Components</strong><span style="vertical-align: baseline;"> (</span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</strong></a><span style="vertical-align: baseline;">) and </span><strong style="vertical-align: baseline;">Next.js </strong><span style="vertical-align: baseline;">to remote code execution in some server-side use cases. At Google Cloud, we understand the severity of these vulnerabilities, also known as </span><a href="https://react2shell.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">React2Shell</span></a><span style="vertical-align: baseline;">, and our security teams have shared their recommendations to help our customers take immediate, decisive action to secure their applications.</span></p> <h3><span style="vertical-align: baseline;">Vulnerability background</span></h3> <p><span style="vertical-align: baseline;">The </span><strong style="vertical-align: baseline;">React Server Components framework</strong><span style="vertical-align: baseline;"> is commonly used for building user interfaces. On Dec. 
3, 2025, </span><a href="http://cve.org" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE.org</span></a><span style="vertical-align: baseline;"> assigned this vulnerability the identifier </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</span></a><span style="vertical-align: baseline;">. Its official Common Vulnerability Scoring System (CVSS) base score is 10.0, rated Critical. </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: React 19.0, 19.1.0, 19.1.1, and 19.2.0</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> in React 19.2.1</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components" rel="noopener" target="_blank"><span style="text-decoration: underline; 
vertical-align: baseline;">https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Next.js is a web development framework built on React and is also commonly used for building user interfaces. (The Next.js vulnerability was tracked as </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-66478</span></a><span style="vertical-align: baseline;"> before that identifier was marked as a duplicate of CVE-2025-55182.)</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: Next.js 15.x, Next.js 16.x, Next.js 14.3.0-canary.77 and later canary releases</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> versions are listed </span><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p 
role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://nextjs.org/blog/CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://nextjs.org/blog/CVE-2025-66478</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Google Threat Intelligence Group (GTIG) has also published a new report to help you understand the </span><a href="https://cloud.google.com/blog/topics/threat-intelligence/threat-actors-exploit-react2shell-cve-2025-55182"><span style="text-decoration: underline; vertical-align: baseline;">specific threats exploiting React2Shell</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">We strongly encourage organizations that manage environments relying on the React and Next.js frameworks to update to a patched version and take the mitigation actions outlined below.</span></p> <h3><span style="vertical-align: baseline;">Mitigating CVE-2025-55182</span></h3> <p><span style="vertical-align: baseline;">We have created and rolled out a new </span><strong style="vertical-align: baseline;">Cloud Armor web application firewall (WAF) rule</strong><span style="vertical-align: baseline;"> designed to detect and block exploitation attempts related to CVE-2025-55182. This new rule is </span><strong style="vertical-align: baseline;">available now</strong><span style="vertical-align: baseline;"> and is intended to help protect your internet-facing applications and services that use global or regional Application Load Balancers. 
We recommend deploying this rule as a temporary mitigation while your vulnerability management program patches and verifies all vulnerable instances in your environment.</span></p> <p><span style="vertical-align: baseline;">For customers using </span><a href="https://cloud.google.com/appengine/"><strong style="text-decoration: underline; vertical-align: baseline;">App Engine Standard</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Functions</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/run/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Run</strong></a><span style="vertical-align: baseline;">, </span><a href="https://firebase.google.com/products/hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase Hosting</strong></a><span style="vertical-align: baseline;"> or </span><a href="https://firebase.google.com/products/app-hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase App Hosting</strong></a><span style="vertical-align: baseline;">, we provide an additional layer of defense for serverless workloads by automatically enforcing platform-level WAF rules that can detect and block the most common exploitation attempts related to CVE-2025-55182.</span></p> <p><span style="vertical-align: baseline;">For </span><a href="https://support.projectshield.google/s/article/Protecting-Your-Website-From-Known-Vulnerabilities" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Project Shield</strong></a><span style="vertical-align: baseline;"> users, we have deployed WAF protections for all sites and no action is necessary to enable these WAF rules. 
For long-term mitigation, you must still patch your origin servers to eliminate the vulnerability (see additional guidance below).</span></p> <p><span style="vertical-align: baseline;">Cloud Armor and the Application Load Balancer can be used to deliver and protect your applications and services regardless of whether they are deployed on Google Cloud, on-premises, or on another infrastructure provider. If you are not yet using Cloud Armor and the Application Load Balancer, please follow the guidance further down to get started.</span></p> <p><span style="vertical-align: baseline;">While these platform-level rules and the optional Cloud Armor WAF rules (for services behind an Application Load Balancer) help mitigate the risk of exploitation, we continue to strongly recommend updating your application dependencies as the primary long-term mitigation.</span></p> <h3><span style="vertical-align: baseline;">Deploying the cve-canary WAF rule for Cloud Armor</span></h3> <p><span style="vertical-align: baseline;">To configure Cloud Armor to detect and block exploitation of CVE-2025-55182, you can use the </span><a href="https://docs.cloud.google.com/armor/docs/waf-rules#cves_and_other_vulnerabilities"><code style="text-decoration: underline; vertical-align: baseline;">cve-canary</code><span style="text-decoration: underline; vertical-align: baseline;"> preconfigured WAF rule</span></a><span style="vertical-align: baseline;"> with the new rule IDs that we have added for this vulnerability. 
The new rule IDs are opt-in only and must be explicitly added to your policy, even if you already use other cve-canary rules.</span></p> <p><span style="vertical-align: baseline;">In your Cloud Armor backend security policy, create a new rule and configure the following match condition:</span></p></div> <div class="block-code"><dl> <dt>code_block</dt> <dd><pre><code>(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You can do this in the Google Cloud console by navigating to Cloud Armor and either modifying an existing policy or creating a new one.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/20251205_11am_rule_1.max-1000x1000.png" alt="20251205_11am_rule (1)"> <figcaption class="article-image__caption "><p data-block-key="5admg">Cloud Armor rule creation in the Google Cloud console.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p>Alternatively, you can use the gcloud CLI to add the rule to an existing security policy:</p></div> <div class="block-code"><dl> <dt>code_block</dt> 
<dd><pre><code>gcloud compute security-policies rules create PRIORITY_NUMBER \
  --security-policy SECURITY_POLICY_NAME \
  --expression "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})" \
  --action=deny-403</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">If you manage your rules with Terraform, you can implement the rule with the following syntax:</span></p></div> <div class="block-code"><dl> <dt>code_block</dt> <dd><pre><code>rule {
  action   = "deny(403)"
  priority = "PRIORITY_NUMBER"
  match {
    expr {
      expression = "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})"
    }
  }
  description = "Applies protection for CVE-2025-55182 (React/Next.js)"
}</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Verifying WAF rule safety for your application and consuming telemetry</span></h3> <p><span style="vertical-align: baseline;">Cloud Armor rules can be </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#preview_mode"><span style="text-decoration: underline; vertical-align: baseline;">configured in preview mode</span></a><span style="vertical-align: baseline;">, a logging-only mode that lets you test or monitor a rule's expected impact without Cloud Armor enforcing the configured action. We recommend first deploying the new rule in preview mode in your production environments so that you can see what traffic it would block. </span></p> <p><span style="vertical-align: baseline;">Once you verify that the new rule behaves as expected in your environment, you can disable preview mode so that Cloud Armor actively enforces it.</span></p> <p><span style="vertical-align: baseline;">Cloud Armor per-request WAF logs are emitted to Cloud Logging as part of the Application Load Balancer logs. To see Cloud Armor’s decision for every request, you must first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/https-logging-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">enable load balancer logging on each backend service</span></a><span style="vertical-align: baseline;">. 
Once it is enabled, all subsequent Cloud Armor decisions are logged and can be found in Cloud Logging by </span><a href="https://docs.cloud.google.com/armor/docs/request-logging"><span style="text-decoration: underline; vertical-align: baseline;">following these instructions</span></a><span style="vertical-align: baseline;">.</span></p> <h3><span style="vertical-align: baseline;">Interaction of Cloud Armor rules with vulnerability scanning tools</span></h3> <p><span style="vertical-align: baseline;">A growing number of scanning tools are designed to help identify vulnerable instances of React and Next.js in your environments. Many of these scanners determine which versions of the frameworks your servers are running by sending a legitimate request and inspecting the server's response.</span></p> <p><span style="vertical-align: baseline;">Our WAF rule is designed to detect and prevent attempts to exploit CVE-2025-55182. Because these scanners do not attempt an exploit, and instead send a safe request that elicits a response revealing the software version, </span><strong style="vertical-align: baseline;">the above Cloud Armor rule will not detect or block such scanners. 
</strong></p> <p><span style="vertical-align: baseline;">If these scanners report a vulnerable instance of software protected by Cloud Armor, that does not mean that an actual exploit attempt will get through your Cloud Armor security policy. Instead, it means that the detected version of React or Next.js is known to be vulnerable and should be patched.</span></p> <h3><span style="vertical-align: baseline;">How to get started with Cloud Armor for new users</span></h3> <p><span style="vertical-align: baseline;">If your workload is already using an Application Load Balancer to receive traffic from the internet, you can configure Cloud Armor to protect your workload from this and other application-level vulnerabilities (as well as DDoS attacks) by following </span><a href="https://docs.cloud.google.com/armor/docs/configure-security-policies"><span style="text-decoration: underline; vertical-align: baseline;">these instructions</span></a><span style="vertical-align: baseline;">. 
</span></p> <p><span style="vertical-align: baseline;">If you are not yet using an Application Load Balancer and Cloud Armor, you can get started with the </span><a href="https://docs.cloud.google.com/load-balancing/docs/https"><span style="text-decoration: underline; vertical-align: baseline;">external Application Load Balancer overview</span></a><span style="vertical-align: baseline;">, the </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor overview</span></a><span style="vertical-align: baseline;">, and the </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor best practices</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">If your workload is using </span><a href="http://docs.cloud.google.com/run/"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run functions</span></a><span style="vertical-align: baseline;">, or </span><a href="https://cloud.google.com/appengine"><span style="text-decoration: underline; vertical-align: baseline;">App Engine</span></a><span style="vertical-align: baseline;"> and receives traffic from the internet, you must first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless"><span style="text-decoration: underline; vertical-align: baseline;">set up an Application Load Balancer in front of your endpoint</span></a><span style="vertical-align: baseline;"> to leverage Cloud Armor security policies to protect your workload. 
You will then need to </span><a href="https://docs.cloud.google.com/armor/docs/integrating-cloud-armor#serverless"><span style="text-decoration: underline; vertical-align: baseline;">configure the appropriate controls</span></a><span style="vertical-align: baseline;"> to ensure that Cloud Armor and the Application Load Balancer can’t be bypassed.</span></p> <h3><span style="vertical-align: baseline;">Best practices and additional risk mitigations</span></h3> <p><span style="vertical-align: baseline;">Once you configure Cloud Armor, we recommend consulting our </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">best practices guide</span></a><span style="vertical-align: baseline;">. Be sure to account for </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#limitations"><span style="text-decoration: underline; vertical-align: baseline;">limitations</span></a><span style="vertical-align: baseline;"> </span><span style="vertical-align: baseline;">discussed in the documentation to minimize risk and optimize performance while ensuring the safety and availability of your workloads. </span></p> <h3><span style="vertical-align: baseline;">Serverless platform protections</span></h3> <p><span style="vertical-align: baseline;">Google Cloud is enforcing platform-level protections across App Engine Standard, Cloud Functions, and Cloud Run to automatically help protect against common exploit attempts of CVE-2025-55182. 
This protection supplements the protections already in place for Firebase Hosting and Firebase App Hosting.</span></p> <p><strong style="vertical-align: baseline;">What this means for you:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Applications deployed to those serverless services benefit from these WAF rules that are enabled by default to help provide a base level of protection without requiring manual configuration.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">These rules are designed to block known malicious payloads targeting this vulnerability.</span></p> </li> </ul> <p><strong style="vertical-align: baseline;">Important considerations:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patching is still critical:</strong><span style="vertical-align: baseline;"> These platform-level defenses are intended to be a temporary mitigation. 
The most effective long-term solution is to update your application's dependencies to patched versions of React and Next.js and redeploy your application.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Potential impacts:</strong><span style="vertical-align: baseline;"> Although this is unlikely, if you believe this platform-level filtering is incorrectly impacting your application's traffic, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">Long-term mitigation: Mandatory framework update and redeployment</span></h3> <p><span style="vertical-align: baseline;">While WAF rules provide critical frontline defense, the most comprehensive long-term solution is to patch the underlying frameworks.</span></p> <p><strong style="vertical-align: baseline;">While Google Cloud is providing platform-level protections and Cloud Armor options, we urge all customers running React and Next.js applications on Google Cloud to immediately update their dependencies to the latest stable versions (React 19.2.1 or the relevant version of Next.js listed </strong><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">here</strong></a><strong style="vertical-align: baseline;">) and redeploy their services.</strong></p> <p><span style="vertical-align: baseline;">This applies specifically to applications deployed on:</span></p> <ul> <li role="presentation"><strong style="vertical-align: baseline;">Cloud Run, Cloud Run functions, or App Engine</strong><span style="vertical-align: 
baseline;">: Update your application dependencies to the patched framework versions and redeploy.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Google Kubernetes Engine (GKE)</strong><span style="vertical-align: baseline;">: Update your container images with the latest framework versions and redeploy your pods.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Compute Engine</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">The public OS images provided by Google Cloud do not have React or Next.js packages installed by default. If you use a custom OS image or have otherwise installed the affected packages, update your workloads to include the latest framework versions and enable WAF rules in front of all workloads.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Firebase</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">If you’re using Cloud Functions for Firebase, Firebase Hosting, or Firebase App Hosting, update your application dependencies to the patched framework versions and redeploy. Firebase Hosting and App Hosting are also automatically enforcing a rule to limit exploitation of CVE-2025-55182 through requests to custom and default domains.</span></li> </ul> <p><span style="vertical-align: baseline;">Patching your applications is an essential step to eliminate the vulnerability at its source and ensure the continued integrity and security of your services.</span></p> <p><span style="vertical-align: baseline;">We will continue to monitor the situation closely and provide further updates and guidance as necessary. 
Please refer to our official </span><a href="https://docs.cloud.google.com/support/bulletins#gcp-2025-072"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Security advisories</span></a><span style="vertical-align: baseline;"> for the most current information and detailed steps.</span></p> <p><span style="vertical-align: baseline;">If you have any questions or require assistance, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p></div>
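To help locate dependencies that still need the update described above, here is a minimal Python sketch that scans a parsed npm lockfile (`package-lock.json`, lockfile version 2 or 3) for the vulnerable React versions listed in the vulnerability background. It assumes the affected npm packages are `react-server-dom-webpack`, `react-server-dom-parcel`, and `react-server-dom-turbopack`, the React Server Components bindings named in the React announcement; confirm that list against the announcement before relying on it. The function names are hypothetical, and the check matches exact version strings rather than semver ranges.

```python
import json
from typing import Iterable

# React versions listed as vulnerable above ("19.0" is written as "19.0.0",
# the form npm records in lockfiles).
VULNERABLE_VERSIONS = frozenset({"19.0.0", "19.1.0", "19.1.1", "19.2.0"})

# Assumption: the affected npm packages are the React Server Components
# bindings named in the React announcement. Confirm before relying on this.
AFFECTED_PACKAGES = frozenset({
    "react-server-dom-webpack",
    "react-server-dom-parcel",
    "react-server-dom-turbopack",
})


def find_vulnerable(lockfile: dict,
                    packages: Iterable[str] = AFFECTED_PACKAGES) -> list[tuple[str, str]]:
    """Return sorted (install path, version) pairs that match a vulnerable version.

    `lockfile` is a parsed npm lockfile (v2 or v3) whose "packages" object is
    keyed by install path, e.g. "node_modules/react-server-dom-webpack".
    """
    wanted = set(packages)
    hits = []
    for path, meta in lockfile.get("packages", {}).items():
        # The package name follows the last "node_modules/" in the path,
        # which also handles nested installs and scoped package names.
        name = path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if name in wanted and version in VULNERABLE_VERSIONS:
            hits.append((path, version))
    return sorted(hits)


def scan_file(path: str) -> list[tuple[str, str]]:
    """Hypothetical convenience wrapper: load a lockfile from disk and scan it."""
    with open(path, encoding="utf-8") as handle:
        return find_vulnerable(json.load(handle))
```

Running `scan_file("package-lock.json")` from a project root lists any matching installs; an empty list means none of the listed versions was found. The sketch does not cover Next.js, so check your `next` version against the patched versions in the Next.js advisory linked above.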
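The match condition in the Cloud Armor rule above combines a header clause with a call to `evaluatePreconfiguredWaf`, and the rule matches only when both are true. As a rough aid for reasoning about which requests the rule can match at all, here is a minimal Python sketch of the header clause alone. It does not reproduce the cve-canary signatures or Cloud Armor's exact CEL semantics. In particular, it treats a missing `Content-Type` header as a non-match, which is our assumption. The helper and constant names are hypothetical.

```python
from typing import Mapping

# Substrings the match condition looks for in the Content-Type header.
FORM_CONTENT_TYPES = ("multipart/form-data", "application/x-www-form-urlencoded")


def passes_header_clause(headers: Mapping[str, str]) -> bool:
    """Return True if a request satisfies the header clause of the rule.

    Hypothetical helper that mirrors only the part of the Cloud Armor
    expression before `&& evaluatePreconfiguredWaf(...)`.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # has(request.headers['next-action']) || has(request.headers['rsc-action-id'])
    if "next-action" in normalized or "rsc-action-id" in normalized:
        return True

    # request.headers['content-type'].contains(...) for either form type
    # (a case-sensitive substring check, like CEL's contains()).
    # Assumption: a missing Content-Type header counts as no match.
    content_type = normalized.get("content-type", "")
    return any(form_type in content_type for form_type in FORM_CONTENT_TYPES)


if __name__ == "__main__":
    print(passes_header_clause({"Next-Action": "abc123"}))  # True
    print(passes_header_clause({"Content-Type": "application/json"}))  # False
```

For instance, a JSON request with no `next-action` or `rsc-action-id` header does not satisfy the header clause, so the rule would not match it regardless of what the WAF signatures would say.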
  55. Key Enterprise Architect

    Mon, 13 Oct 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="6kd7s">As engineers, we all dream of perfectly resilient systems: ones that scale perfectly, provide a great user experience, and never ever go down. What if we told you the key to building these kinds of resilient systems isn't avoiding failures, but deliberately causing them? Welcome to the world of chaos engineering, where you stress test your systems by <i>introducing</i> chaos, i.e., failures, into them in a controlled environment. In an era where downtime can cost millions and destroy reputations in minutes, the most innovative companies aren't just waiting for disasters to happen; they're causing them and learning from the resulting failures, so they can build immunity to chaos before it strikes in production.</p><p data-block-key="396qd">Chaos engineering is useful for all kinds of systems, but particularly for cloud-based distributed ones. Modern architectures have evolved from monolithic to microservices-based systems, often comprising hundreds or thousands of services. These complex service dependencies introduce multiple points of failure, and it’s difficult if not impossible to predict all the possible failure modes through traditional testing methods. When these applications are deployed in the cloud, they are often spread across multiple availability zones and regions. This increases the likelihood of failure due to the highly distributed nature of cloud environments and the large number of services that coexist within them.</p><p data-block-key="93kcq">A common misconception is that cloud environments automatically provide application resiliency, eliminating the need for testing. Although cloud providers do offer various levels of resiliency and SLAs for their cloud products, these alone do not guarantee that your business applications are protected. 
If applications are not designed to be fault-tolerant or if they assume constant availability of cloud services, they will fail when a particular cloud service they depend on is not available.</p><p data-block-key="62d5j">In short, chaos engineering can take a team's worst "what if?" scenarios and transform them into well-rehearsed responses. Chaos engineering isn’t about breaking systems (engineering chaotically, as it were); it's about building teams that face production incidents with the calm confidence that only comes from having weathered that chaos before, albeit in controlled conditions.</p><p data-block-key="aipko">Google Cloud’s Professional Services Organization (PSO) Enterprise Architecture team provides consulting and hands-on expertise for customers’ cloud transformation journeys, including application development, cloud migrations, and enterprise architecture. When advising on resilient architecture for cloud environments, we routinely introduce the principles and practices of chaos engineering and Site Reliability Engineering (SRE).</p><p data-block-key="6ro3d">In this first post in a series, we explain the basics of chaos engineering: what it is, and its core principles and elements. We then explore why chaos engineering is particularly helpful and important for teams running distributed applications in the cloud. Finally, we’ll talk about how to get started, and point you to further resources.</p><h2 data-block-key="pqp"><b>Understanding chaos engineering</b></h2><p data-block-key="fun25">Chaos engineering is a methodology invented by Netflix in 2010 when it created and popularized ‘Chaos Monkey’ to address the need to build more resilient and reliable systems in the face of increasing complexity in its AWS environment. Around the same time, Google introduced Disaster Resilience Testing, or DiRT, which enabled continuous and automated disaster readiness, response, and recovery of Google’s business, systems, and data. 
Here on Google Cloud’s PSO team, we offer various services to help customers implement DiRT as part of SRE practices. These offerings also include training on how to perform DiRT on applications and systems operating on Google Cloud. The central concept is straightforward: deliberately introduce controlled disruptions into a system to identify vulnerabilities, evaluate its resilience, and enhance its overall reliability.</p><p data-block-key="6t531">As a proactive discipline, chaos engineering enables organizations to identify weaknesses in their systems before they lead to significant outages or failures, where a system includes not only the technology components but also the people and processes of an organization. By introducing controlled, real-world disruptions, chaos engineering helps test a system's robustness, recoverability, and fault tolerance. This approach allows teams to uncover potential vulnerabilities, so that systems are better equipped to handle unexpected events and continue functioning smoothly under stress.</p><h3 data-block-key="59nsr"><b>Principles and practices of chaos engineering</b></h3><p data-block-key="df1o7">Chaos engineering is guided by a set of core principles about why it should be done, while practices define what needs to be done.</p><p data-block-key="8ao4o">Below are the principles of chaos engineering:</p><ol><li data-block-key="ftol1"><b>Build a hypothesis around steady state</b>: Prior to initiating any disruptive actions, you need to define what "normal" looks like for your system, commonly referred to as the "steady state hypothesis."</li><li data-block-key="6vvb8"><b>Replicate real-world conditions</b>: Chaos experiments should emulate realistic failure scenarios that the system might encounter in a production environment.</li><li data-block-key="decbe"><b>Run experiments in production</b>: Chaos engineering is firmly rooted in the belief that only a production environment with real traffic and dependencies can provide 
an accurate picture of resiliency. This is what separates chaos engineering from traditional testing.</li><li data-block-key="3de29"><b>Automate experiments:</b> Make resiliency testing part of a continuous process rather than a one-off test.</li><li data-block-key="am2bk"><b>Determine the blast radius</b>: Experiments should be meticulously designed to minimize adverse impacts on production systems. This requires categorizing applications and services into tiers based on the impact the experiments can have on customers and on other applications and services.</li></ol><p data-block-key="hldj">With these principles established, follow these practices when conducting a chaos engineering experiment:</p><ol><li data-block-key="1bkn"><b>Define steady state:</b> Identify the specific metrics (e.g., latency, throughput) that you will monitor and establish a baseline for them.</li><li data-block-key="c86r7"><b>Formulate a hypothesis</b>: This is the practice of creating a single testable statement, for example, ‘Deleting this container pod will not affect user login’. Hypotheses are generally created by identifying customer user journeys and deriving test scenarios from them.</li><li data-block-key="39bql"><b>Use a controlled environment:</b> While one chaos engineering principle states that experiments need to run in production, you should still start small: run your experiment in a non-production environment first, learn and adjust, and then gradually expand the scope to the production environment.</li><li data-block-key="gtlb"><b>Inject failures</b>: This is the practice of causing disruption by injecting failures either directly into the system (e.g., deleting a VM, stopping a database instance) or indirectly into the environment (e.g., 
deleting a network route, adding a firewall rule).</li><li data-block-key="1410c"><b>Automate experiment execution</b>: Automation is crucial for establishing chaos engineering as a repeatable and scalable practice. This includes using automated tools for fault injection (e.g., making it part of a CI/CD pipeline) and automated rollback mechanisms.</li><li data-block-key="58mg2"><b>Derive actionable insights</b>: The primary objective of chaos engineering is to gain insights into system vulnerabilities and thereby enhance resilience. This involves rigorous analysis of experimental results; identifying weaknesses and areas for improvement; and sharing findings with relevant teams to inform subsequent experiment design and system enhancements.</li></ol><p data-block-key="fh7in">In other words, chaos engineering isn't about breaking things for the sake of it, but about building more resilient systems by understanding their limitations and addressing them proactively.</p><h3 data-block-key="ftslk"><b>Elements of chaos engineering</b></h3><p data-block-key="evq8f">Here are the core elements you'll use in a chaos engineering experiment, derived from these five principles:</p><ul><li data-block-key="2isvq"><b>Experiments</b>: A chaos experiment is a deliberate, pre-planned procedure in which faults are introduced into a system to observe how it responds.</li><li data-block-key="d6djm"><b>Steady-state hypotheses</b>: A steady-state hypothesis defines the baseline operational state, or "normal" behavior, of the system under evaluation.</li><li data-block-key="3d8o5"><b>Actions</b>: An action is a specific operation performed on the system under experiment.</li><li data-block-key="bpbv8"><b>Probes</b>: A probe provides a mechanism for observing defined conditions within the system during experimentation.</li><li data-block-key="f50fb"><b>Rollbacks</b>: An experiment may incorporate a sequence of actions designed to reverse any modifications 
implemented during the experiment.</li></ul><h2 data-block-key="327mk"><b>Getting started with chaos engineering</b></h2><p data-block-key="123gj">Now that you have a good understanding of chaos engineering and why to use it in your cloud environment, the next step is to try it out for yourself in your own development environment.</p><p data-block-key="6i4s2">There are multiple chaos engineering solutions in the market; some are paid products and some are open-source frameworks. To get started quickly, we recommend that you use <a href="https://chaostoolkit.org/" target="_blank">Chaos Toolkit</a> as your chaos engineering framework.</p><p data-block-key="atl4d">Chaos Toolkit is an open-source framework written in Python that provides a modular architecture where you can plug in other libraries (also known as ‘drivers’) to extend your chaos engineering experiments. For example, there are extension libraries for <a href="https://chaostoolkit.org/drivers/gcp/" target="_blank">Google Cloud</a>, <a href="https://chaostoolkit.org/drivers/kubernetes/" target="_blank">Kubernetes</a>, and many other technologies. Since Chaos Toolkit is a Python-based developer tool, you can begin by configuring your Python environment. You can find a good example of a Chaos Toolkit experiment and step-by-step explanation <a href="https://chaostoolkit.org/reference/tutorial/#getting-started-with-the-chaos-toolkit" target="_blank">here</a>.</p><p data-block-key="r2pl">Finally, to enable Google Cloud customers and engineers to introduce chaos testing in their applications, we’ve created a series of Google Cloud-specific chaos engineering recipes. Each recipe covers a specific scenario to introduce chaos in a particular Google Cloud service. 
For example, one recipe covers introducing chaos in an application/service running behind a Google Cloud internal or external application load balancer; another recipe covers simulating a network outage between an application running on Cloud Run and the Cloud SQL database it connects to, using another Chaos Toolkit extension named <a href="https://chaostoolkit.org/drivers/toxiproxy/" target="_blank">ToxiProxy</a>.</p><p data-block-key="7bkoj">You can find the complete collection of recipes, including step-by-step instructions, scripts, and sample code for introducing chaos engineering into your Google Cloud environment, on <a href="https://github.com/GoogleCloudPlatform/chaos-engineering/blob/main/Chaos-Engineering-Recipes-Book.md" target="_blank">GitHub</a>. And stay tuned for subsequent posts, where we’ll talk about chaos engineering techniques, such as how to introduce faults into your Google Cloud environment.</p></div>
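The five elements of a chaos experiment (experiment, steady-state hypothesis, actions, probes, rollbacks) map directly onto the Chaos Toolkit experiment file format. The following is a minimal sketch, not one of the Google Cloud recipes: it generates an experiment for the example hypothesis ‘By deleting this container pod, user login will not be affected’. The health-check URL, the pod name `login-7c9d8f-abcde`, the `login` Deployment, and the replica count of 3 are hypothetical placeholders, and the sketch assumes `kubectl` is installed and pointed at a non-production cluster.

```python
"""Generate a minimal Chaos Toolkit experiment file (a sketch, not an official recipe)."""
import json

# Hypothetical placeholders: replace with values from your own non-production environment.
LOGIN_HEALTH_URL = "http://login.example.internal/healthz"
TARGET_POD = "login-7c9d8f-abcde"
DEPLOYMENT = "deployment/login"

experiment = {
    "version": "1.0.0",
    "title": "Deleting a login pod does not affect user login",
    "description": "Hypothesis: by deleting this container pod, user login will not be affected.",
    # Steady-state hypothesis: what "normal" looks like, expressed as probes with tolerances.
    "steady-state-hypothesis": {
        "title": "Login endpoint answers with HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "login-endpoint-is-healthy",
                "tolerance": 200,  # expected HTTP status code
                "provider": {"type": "http", "url": LOGIN_HEALTH_URL, "timeout": 3},
            }
        ],
    },
    # Method: the actions that inject the fault (probes may also appear here).
    "method": [
        {
            "type": "action",
            "name": "delete-one-login-pod",
            "provider": {
                "type": "process",
                "path": "kubectl",
                "arguments": f"delete pod {TARGET_POD}",
            },
            "pauses": {"after": 15},  # seconds to wait before the steady state is re-checked
        }
    ],
    # Rollbacks: actions meant to restore the intended state after the experiment.
    "rollbacks": [
        {
            "type": "action",
            "name": "ensure-login-replicas",
            "provider": {
                "type": "process",
                "path": "kubectl",
                "arguments": f"scale {DEPLOYMENT} --replicas=3",  # hypothetical replica count
            },
        }
    ],
}


def write_experiment(path: str) -> None:
    """Serialize the experiment so it can be run with the `chaos run` command."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(experiment, fh, indent=2)


if __name__ == "__main__":
    write_experiment("experiment.json")
```

With Chaos Toolkit installed (`pip install chaostoolkit`), `chaos run experiment.json` checks the steady-state hypothesis, runs the method, and checks the steady state again, so a failing probe after the pod deletion disproves the hypothesis.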
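The “Define steady state” and “Formulate a hypothesis” practices can also be expressed as a small, testable check. The sketch below is illustrative only: it assumes you already collect request latencies and request counts for the service, and the 95th-percentile metric, the 20% latency slack, and the 10% throughput floor are hypothetical thresholds rather than recommendations from the recipes.

```python
from dataclasses import dataclass
from typing import Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method (always an observed sample)."""
    if not samples_ms:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


@dataclass(frozen=True)
class SteadyState:
    p95_latency_ms: float
    throughput_rps: float


def measure(latencies_ms: Sequence[float], requests: int, window_s: float) -> SteadyState:
    """Summarize one observation window into the steady-state metrics."""
    return SteadyState(p95(latencies_ms), requests / window_s)


def hypothesis_holds(baseline: SteadyState, observed: SteadyState,
                     latency_slack: float = 1.2, throughput_floor: float = 0.9) -> bool:
    """True if the observed window stays within the hypothetical tolerances around the baseline."""
    return (observed.p95_latency_ms <= baseline.p95_latency_ms * latency_slack
            and observed.throughput_rps >= baseline.throughput_rps * throughput_floor)


if __name__ == "__main__":
    baseline = measure([100.0] * 19 + [200.0], requests=6000, window_s=60)
    during_fault = measure([110.0] * 19 + [400.0], requests=5700, window_s=60)
    print(baseline, hypothesis_holds(baseline, during_fault))
```

Measuring the baseline before injecting any fault, then comparing each experiment window against it, keeps the hypothesis to a single yes/no answer. A function like `hypothesis_holds` could be wired into a Chaos Toolkit probe through its Python provider, though that integration is left out here.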
  56. Researcher

    Tue, 23 Sep 2025 14:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Today, we are excited to announce the </span><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">2025 DORA Report: State of AI-assisted Software Development</span></a><span style="vertical-align: baseline;">. The report draws on more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals around the world. </span></p> <p><span style="vertical-align: baseline;">The report reveals a key insight: AI doesn't fix a team; it amplifies what's already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems. The greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.</span></p> <h3><strong style="vertical-align: baseline;">AI, the great amplifier</strong></h3> <p><span style="vertical-align: baseline;">As we established in the </span><a href="https://dora.dev/research/2024/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">2024 report</span></a><span style="vertical-align: baseline;"> as well as in the special report published this year called </span><a href="https://dora.dev/research/ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">“Impact of Generative AI in Software Development”</span></a><span style="vertical-align: baseline;">, organizations continue to adopt AI heavily and see substantial benefits across important outcomes. There is also evidence that teams are learning to better integrate these tools into their workflows. Unlike last year, we observe a positive relationship between AI adoption and both software delivery throughput and product performance. 
It appears that people, teams, and tools are learning where, when, and how AI is most useful. However, AI adoption does continue to have a negative relationship with software delivery stability.</span></p> <p><span style="vertical-align: baseline;">This confirms our central theory: AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, like strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability. Teams working in loosely coupled architectures with fast feedback loops see gains, while those constrained by tightly coupled systems and slow processes see little or no benefit.</span></p> <p><strong style="vertical-align: baseline;">Key findings from the 2025 report</strong></p> <p><span style="vertical-align: baseline;">Beyond this central theme, this year’s research highlighted the following about modern software development:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">AI adoption is near-universal</strong><span style="vertical-align: baseline;">: 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. However, skepticism remains: 30% report little or no trust in AI-generated code, a slightly lower percentage than last year but still a key trend to note.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">User-centricity is a prerequisite for AI success</strong><span style="vertical-align: baseline;">: AI becomes most useful when it's pointed at a clear problem, and a user-centric focus provides that essential direction. 
Our data shows this focus amplifies AI’s positive influence on team performance.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Platform engineering is the foundation</strong><span style="vertical-align: baseline;">: Our data shows that 90% of organizations have adopted at least one platform, and that there is a direct correlation between a high-quality internal platform and an organization’s ability to unlock the value of AI, making it an essential foundation for success.</span></p> </li> </ul> <h3><strong style="vertical-align: baseline;">The seven team archetypes</strong></h3> <p><span style="vertical-align: baseline;">Simple software delivery metrics alone aren’t sufficient. They tell you what is happening but not why it’s happening. To connect performance data to experience, we conducted a cluster analysis that revealed seven common team profiles, or archetypes, each with a unique interplay of performance, stability, and well-being. This model gives leaders a way to diagnose team health and apply the right interventions. </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YtpOb3P.max-1000x1000.jpg" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The ‘Foundational challenges’ group is trapped in survival mode and faces significant gaps in its processes and environment, leading to low performance, high system instability, and high levels of burnout and friction. By contrast, the ‘Harmonious high achievers’ excel across multiple areas, showing positive metrics for team well-being, product outcomes, and software delivery. 
</span></p> <p><span style="vertical-align: baseline;">Read more about each archetype in the "Understanding your software delivery performance: A look at seven team profiles" chapter of the report.</span></p> <h3><strong style="vertical-align: baseline;">Unlocking the value of AI with the ‘DORA AI Capabilities Model’</strong></h3> <p><span style="vertical-align: baseline;">This year, we went beyond identifying AI’s impact to investigating the conditions under which AI-assisted technology professionals realize the best outcomes. The value of AI is unlocked not by the tools themselves, but by the surrounding technical practices and cultural environment.</span></p> <p><span style="vertical-align: baseline;">Our research identified seven capabilities shown to magnify the positive impact of AI in organizations.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/DORA_inline_2.max-1000x1000.png" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where leaders should get started</strong></h3> <p><span style="vertical-align: baseline;">One of the key insights derived from the research this year is that the value of AI will be unlocked by reimagining the system of work it inhabits. 
Technology leaders should treat AI adoption as an organizational transformation.</span></p> <p><span style="vertical-align: baseline;">Here’s where we suggest you begin:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Clarify and socialize your AI policies</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Connect AI to your internal context</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Prioritize foundational practices</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Fortify your safety nets</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Invest in your internal platform</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Focus on your end users</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA research program</span></a><span style="vertical-align: baseline;"> is committed to serving as a compass for teams and organizations as we navigate this important and transformative period of AI. We hope the new team profiles and the DORA AI capabilities model provide a clear roadmap for you to move beyond simply adopting AI to unlocking its value by investing in teams and people. We look forward to learning how you put these insights into practice. 
To learn more:</span></p> <ul> <li role="presentation"><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">Download</span></a><span style="vertical-align: baseline;"> the full report</span></li> <li role="presentation"><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> <li><span style="vertical-align: baseline;">Share this </span><a href="https://dora.dev/research/2025/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">overview</span></a><span style="vertical-align: baseline;"> with your colleagues</span></li> </ul></div>
  57. Cloud Solutions Architect Manager, Google Cloud

    Wed, 13 Aug 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="bgr19">What guides your approach to software development? In our roles at Google, we’re constantly working to build better software, faster. Within Google, our Developer Platform team and Google Cloud have a strategic partnership and a shared strategy: together, we take our internal capabilities and engineering tools and package them up for Google Cloud customers.</p><p data-block-key="e2l3s">At the heart of this is understanding the many ways that software teams, big and small, need to balance efficiency, quality, and cost, all while delivering value. In our recent <a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">talk at PlatformCon 2025</a>, we shared key parts of our platform strategy, which we call “shift down.”</p><p data-block-key="d6oe8"><b>Shift down is an approach that advocates for embedding decisions and responsibilities into underlying internal developer platforms (IDPs)</b>, thereby reducing the operational burden on developers. This contrasts with the <a href="https://cloud.google.com/devops">DevOps</a> trend of "shift left," which pushes more effort earlier into the development cycle, a method that is proving difficult at scale due to the sheer volume and rate of change in requirements. Our shift down strategy helps us maximize value with existing resources so businesses can achieve high innovation velocity with acceptable quality, acceptable risk, and sustainable costs across a diverse range of business models. 
In the talk, we share learnings that have been really helpful to us in our software and <a href="https://cloud.google.com/solutions/platform-engineering">platform engineering</a> journey:</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_98vVMdt.max-1000x1000.jpg" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><ol><li data-block-key="bgr19"><b>Work backwards from the business model:</b> By starting with the business model, organizations can intentionally guide platform evolution and investment to align with desired margins, risk tolerance, and quality requirements. At Google, our central platform must support diverse business models, necessitating continuous strategic refinement and adaptation.</li><li data-block-key="fs6ra"><b>Focus on quality attributes for central software control:</b> Quality attributes, such as reliability, security, efficiency, and performance, are <a href="https://en.wikipedia.org/wiki/Emergence" target="_blank">emergent</a> properties of software systems and are important for creating business value and managing risk. These are often referred to as “non-functional requirements” because they define how our software behaves, not what it functionally does. 
With a shift down strategy, we can embed the responsibility for assuring quality attributes directly into the underlying platform systems and infrastructure, thereby significantly reducing the operational burden on individual developers.</li><li data-block-key="5a5sh"><b>Abstractions and coupling are key technical tools to gain control of quality attributes:</b> We define two key technical components in the way we build platforms: <i>abstractions</i> and <i>coupling</i>. In a shift down strategy, abstractions provide understandability, risk management levers, accountability, and cost control by encapsulating complexity. Coupling refers to the interconnectedness and interdependence of components within a system or development ecosystem. For a successful shift down strategy, the right degree of coupling is crucial because it allows the development platform and ecosystem design to directly implement and influence quality attributes. In fact, coupling is how we offer entire infrastructure and platform solutions as coherent services like <a href="https://cloud.google.com/kubernetes-engine">Google Kubernetes Engine</a> (GKE).</li><li data-block-key="2pktp"><b>Shared responsibility, education, and policy are equally important social tools:</b> Shared responsibility is a crucial social tool within software at scale. This is actively cultivated through education, such as training engineers on platform and AI usage, and fostering a "one team" culture that encourages a shift from artifact-bound identities to overarching mission goals and client-focused engagement. 
Furthermore, explicit policies like centrally enforced style guides and secure-by-design APIs are fundamental for embedding quality attribute assurance directly into the platform and infrastructure, significantly reducing the operational burden on individual developers by ensuring consistency and automated controls at scale.</li><li data-block-key="bh7kd"><b>Use a map.</b> Supporting many business units with one platform is a vast and complex problem; we need a map. The ecosystem model is a framework that categorizes different types of software development environments, ranging from highly flexible, developer-controlled systems to highly opinionated, vertically integrated ones where the ecosystem itself assures quality attributes. Its critical purpose is to provide a visual and conceptual tool for evaluating how well our ecosystem controls match our business risk. This helps us ensure that the level of oversight and assurance of quality attributes aligns with the potential cost of mistakes. The goal is to be in the "ecosystem effectiveness zone," where controls are balanced to mitigate significant risks from human error without imposing overly restrictive systems that negatively impact velocity and developer satisfaction.</li></ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_xiA9TUH.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><p data-block-key="bgr19">6. <b>Divide up the problem space by identifying different platform and ecosystem types.</b></p><p data-block-key="dk549">Because the developer experience and platform infrastructure change with scale and degree of shifting down, it’s not enough to just know where the ecosystem effectiveness zone is — you have to identify the ecosystem by type. 
We differentiate ecosystem types by the degree of oversight and assurance for quality attributes. As an ecosystem becomes more vertically integrated, such as Google's highly optimized "Assured" (Type 4) ecosystem, the platform itself assumes increasing responsibility for vital quality attributes, allowing specialists like site reliability engineers (SRE) and security teams to have full ownership in taking action through large-scale observability and embedded capabilities. Conversely, in less uniform "YOLO," "AdHoc," or "Guided" (Type 0-2) ecosystems, developers have more responsibility for assuring these attributes, while central specialist teams have less direct control and enforcement mechanisms are less pervasive. It’s really important to note here that this is <b>not</b> a maturity model — the best ecosystem and platform type is the one that best fits your business need (see point #1 above!).</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_SQqhW9d.max-1000x1000.png" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><h3 data-block-key="bgr19"><b>Intentional choices in platform engineering</b></h3><p data-block-key="2cujr">The most important takeaway is to make active choices. Tailor platform engineering for each business unit and application to achieve the best outcomes. Place critical emphasis on identifying and solving stable sub-problems in reliable, reusable ways across various business problems. 
This approach directly underpins our "shift down" strategy, moving toward composable platforms that embed decisions and responsibilities for software quality directly into the underlying platform infrastructure, thereby improving our ability to maximize business value with the right resources, at the right quality level, and with sustainable costs.</p><p data-block-key="8q0du"><a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">Watch our full discussion</a> for more insights on effective platform engineering.</p></div>
  58. Product Manager

    Mon, 04 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Application owners are looking for three things when they think about optimizing cloud costs:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the most expensive resources?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are costing me more this week or month?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are poorly utilized?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">To help you answer these questions quickly and easily, we </span><a href="https://cloud.google.com/blog/products/application-development/an-application-centric-ai-powered-cloud?e=13802955"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> Cloud Hub Optimization and Cost Explorer, in private preview, at Google Cloud Next 2025. And today, we are excited to announce that both Cloud Hub Optimization and Cost Explorer are now in public preview.</span></p> <h2><span style="vertical-align: baseline;">Application cost and utilization</span></h2> <p><span style="vertical-align: baseline;">As an app owner, your primary objective is keeping your application healthy at all times. Yet, monitoring all the individual components of your application, which may straddle dozens of Projects, can be quite overwhelming. 
</span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub applications</span></a><span style="vertical-align: baseline;"> let you organize your cloud resources around your application, giving you the information and controls you need at your fingertips.</span></p> <p><span style="vertical-align: baseline;">In addition to supporting Google Cloud Projects, Cloud Hub Optimization and Cost Explorer leverage </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;"> applications to show you the cost-efficiency of your application’s workloads and services instantly. This is especially useful, for instance, when you are trying to pinpoint deployments on GKE clusters that might be wasting valuable resources such as GPUs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_CHO_utilization_summary_app.max-1000x1000.jpg" alt="1_CHO_utilization summary app"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><span style="vertical-align: baseline;">Not just another cost dashboard</span></h2> <p><span style="vertical-align: baseline;">When you bring up Cloud Hub Optimization, you can immediately see the resources that are costing you the most, along with the percentage change in their cost. 
With this highly granular cost information, you can now attribute your costs to specific resources and resource owners to reason about any changes in costs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_CHO_cost_summary.max-1000x1000.jpg" alt="2_CHO_cost summary"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We have additionally integrated granular cost data from Cloud Billing and resource utilization data from Cloud Monitoring to give you a comprehensive picture of your cost efficiency. This includes average vCPU utilization for your Project, which helps you find the most promising optimization candidates across hundreds of Google Cloud Projects.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_CHO_utilization_summary_project.max-1000x1000.jpg" alt="3_CHO_utilization summary project"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The Cost Explorer dashboard also shows you your costs logically organized at the product level, for even more cost explainability. 
Instead of seeing a lump sum cost for Compute Engine, you can now see your exact spend on individual products including Google Kubernetes Engine (GKE) clusters, Persistent Disks, Cloud Load Balancing, and more.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_CHO_cost_explorer.max-1000x1000.jpg" alt="4_CHO_cost explorer"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><strong style="vertical-align: baseline;">Simple is powerful</strong></h2> <p><span style="vertical-align: baseline;">Customers who have tried these new tools love the information that is surfaced as well as the simplicity of the interfaces.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“My team has to keep an eye on cloud costs across tens of business units and hundreds of developers. 
The Cloud Hub Optimization and Cost Explorer dashboards are a force multiplier for my team as they tell us where to look for cost savings and potential optimization opportunities.”</span><span style="vertical-align: baseline;"> - Frank Dice, Principal Cloud Architect, Major League Baseball</span></p> <p><span style="vertical-align: baseline;">Customers especially appreciate the </span><a href="https://cloud.google.com/stackdriver/docs/costs/optimize-costs#supported_products"><span style="text-decoration: underline; vertical-align: baseline;">breadth of product coverage</span></a><span style="vertical-align: baseline;"> available out of the box without any additional setup, and the fact that there is no additional charge for using these features.</span></p> <h2><strong style="vertical-align: baseline;">What’s next</strong></h2> <p><span style="vertical-align: baseline;">As your organization “shifts left” on cloud cost management, we are working to help application owners and developers understand and optimize their cloud costs. 
You can try Cloud Hub Optimization and Cost Explorer </span><a href="https://console.cloud.google.com/cloud-hub/optimization"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">You can also see a live demo of how Cloud Hub Optimization and Cost Explorer can be used to identify underutilized GKE clusters within seconds in the Google Cloud Next 2025 talk “Maximize Your Cloud ROI.”</span></p></div> <div class="block-video"> <div class="article-module article-video "> <figure> <a class="h-c-video h-c-video--marquee" href="https://youtube.com/watch?v=7csgD3iIc2Q" data-glue-modal-trigger="uni-modal-7csgD3iIc2Q-" data-glue-modal-disabled-on-mobile="true"> <div class="article-video__aspect-image" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_LGJSUja.max-1000x1000.jpg);"> <span class="h-u-visually-hidden">Maximize your cloud ROI: A practical approach to efficiency and optimization</span> </div> <svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"> <use xlink:href="#mi-youtube-icon"></use> </svg> </a> </figure> </div> <div class="h-c-modal--video" data-glue-modal="uni-modal-7csgD3iIc2Q-" data-glue-modal-close-label="Close Dialog"> <a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="7csgD3iIc2Q" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=7csgD3iIc2Q" ng-cloak> </a> </div> </div> <div class="block-paragraph_advanced"><hr/> <p><sup><span style="font-style: italic; vertical-align: baseline;">Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.</span></sup></p></div>
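The three questions at the start of this post reduce to simple computations over per-resource cost and utilization data. The sketch below only illustrates that reasoning and is not the Cloud Hub Optimization or Cost Explorer API: it assumes you have already exported per-resource costs for two periods plus average vCPU utilization (for example, from Cloud Billing and Cloud Monitoring data), and the `ResourceCost` type, the resource names, and the 10% and 20% thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class ResourceCost:
    name: str
    previous_cost: float  # spend in the previous period (e.g., last week)
    current_cost: float   # spend in the current period
    avg_vcpu_util: float  # average vCPU utilization over the period, 0.0 to 1.0


def pct_change(previous: float, current: float) -> Optional[float]:
    """Percentage change in cost; None when there is no previous spend to compare against."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100


def most_expensive(rows: Sequence[ResourceCost], n: int = 3) -> List[ResourceCost]:
    """Question 1: which resources cost the most in the current period?"""
    return sorted(rows, key=lambda r: r.current_cost, reverse=True)[:n]


def rising_costs(rows: Sequence[ResourceCost], min_increase_pct: float = 10.0) -> List[ResourceCost]:
    """Question 2: which resources cost noticeably more than in the previous period?"""
    rising = []
    for r in rows:
        change = pct_change(r.previous_cost, r.current_cost)
        if change is not None and change >= min_increase_pct:
            rising.append(r)
    return rising


def underutilized(rows: Sequence[ResourceCost], max_util: float = 0.2) -> List[ResourceCost]:
    """Question 3: which resources are poorly utilized? Most expensive candidates first."""
    idle = [r for r in rows if r.avg_vcpu_util < max_util]
    return sorted(idle, key=lambda r: r.current_cost, reverse=True)


if __name__ == "__main__":
    rows = [
        ResourceCost("gke-cluster-a", 100.0, 150.0, 0.10),  # hypothetical resources
        ResourceCost("cloudsql-main", 200.0, 180.0, 0.70),
        ResourceCost("gpu-node-pool", 0.0, 50.0, 0.05),
    ]
    print([r.name for r in most_expensive(rows, 2)])
    print([r.name for r in rising_costs(rows)])
    print([r.name for r in underutilized(rows)])
```

Sorting the underutilized resources by current cost puts the most promising optimization candidates first, which is the same idea the post describes when it combines billing data with utilization data.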
  59. Senior Product Manager

    Fri, 01 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Ready to unlock the power of Google Cloud, but want guidance on setting up your environment effectively? Whether you're a cloud novice or part of an experienced team looking to migrate critical workloads, getting your foundational infrastructure right is the key to success. That's where </span><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud Setup</strong></a><span style="vertical-align: baseline;"> comes in: your guided pathway to a secure cloud foundation and a quick start on Google Cloud.</span></p> <p><span style="vertical-align: baseline;">Google Cloud Setup helps you quickly implement Google Cloud's recommended best practices. Our goal is to provide a fast and easy path to deploying your workloads without unnecessary configuration effort. Think of it as your expert guide, walking you through the essential first steps so you can focus on what truly matters: rapidly deploying your innovative applications and services. 
To help you get started without financial barriers, all components and service integrations enabled during the setup process are free or include some level of no-cost access.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Choose the foundation that fits your needs</strong></h3> <p><span style="vertical-align: baseline;">We understand that every organization and project has unique requirements. That's why Cloud Setup offers three distinct guided flows to choose from:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Proof-of-concept:</strong><span style="vertical-align: baseline;"> Designed for users who want to set up a lightweight environment to explore Google Cloud and run initial tests or sandbox workloads. This flow focuses on the minimum configuration to get you started quickly.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Production:</strong><span style="vertical-align: baseline;"> This flow is recommended for supporting production-ready workloads with security and scalability in mind. 
It aligns with Google Cloud’s best practices and is tailored for administrators setting up basic foundational infrastructure for production workloads.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Enhanced security:</strong><span style="vertical-align: baseline;"> Intended for organizations, regions, or workloads with advanced security and compliance requirements, this flow defaults to stricter security controls to help you meet rigorous requirements. Even this advanced foundation includes a perpetual free tier up to certain usage limits.</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LQ4uQKn.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building blocks for a solid foundation</strong></h3> <p><span style="vertical-align: baseline;">Cloud Setup guides you through a series of onboarding steps, presenting defaults backed by</span><strong style="vertical-align: baseline;"> </strong><a href="https://cloud.google.com/security/best-practices"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud best practices</strong></a><span style="vertical-align: baseline;">. 
Throughout the process, you'll also encounter key features designed to help protect your organization and prepare it for growth, including:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/kms/docs/kms-autokey"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud KMS Autokey</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Automates the provisioning and assignment of customer-managed encryption keys (CMEK).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/security/products/security-command-center"><strong style="text-decoration: underline; vertical-align: baseline;">Security Command Center</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Provides security posture management for Google Cloud deployments, including automatic project scanning for security issues such as open ports and misconfigured access controls.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/observability"><strong style="text-decoration: underline; vertical-align: baseline;">Centralized Logging and Monitoring</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to easily set up infrastructure to monitor your system's health and performance from a central location, which is critical for audit-logging compliance and for visualizing metrics across projects.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/vpc/docs/shared-vpc"><strong style="text-decoration: underline; vertical-align: baseline;">Shared VPC 
Networks</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Allows you to establish a centralized network across multiple projects, enabling secure and efficient communication between your Google Cloud resources.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/hybrid-connectivity"><strong style="text-decoration: underline; vertical-align: baseline;">Hybrid Connectivity</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Facilitates connecting your Google Cloud environment to your on-premises infrastructure or other cloud providers. This is often a critical step for workload migrations.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/support"><strong style="text-decoration: underline; vertical-align: baseline;">Support plan</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to quickly resolve any issues with help from experts at Google Cloud.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">At the end of the guided flow, you can deploy your configuration directly via the Google Cloud console or download a </span><a href="https://cloud.google.com/docs/enterprise/deploy-foundation-using-terraform-from-console"><span style="text-decoration: underline; vertical-align: baseline;">Terraform configuration file</span></a><span style="vertical-align: baseline;"> for later deployment using other Infrastructure as Code (IaC) methods.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_RwqPvpA.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Experience the cloud faster and smarter</strong></h3> <p><span style="vertical-align: baseline;">Organizations using Cloud Setup benefit from:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Faster application deployment: </strong><span style="vertical-align: baseline;">By simplifying the initial setup, you can get your applications up and running more quickly, accelerating your cloud journey.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Reduced setup effort:</strong><span style="vertical-align: baseline;"> Our streamlined flow significantly reduces the number of manual steps, allowing you to establish a basic foundation with less effort.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Greater access to Google Cloud's full potential: </strong><span style="vertical-align: baseline;">By establishing a solid foundation quickly, you can more easily explore and leverage a wider range of Google Cloud services to meet your evolving needs and unlock greater value.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">Ready to start your Google Cloud journey? Visit Google Cloud Setup today for a streamlined path to a secure cloud foundation. 
Let us guide you through the initial steps so you can focus on innovation and growth.</span></p> <p><span style="vertical-align: baseline;">To learn more, visit:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup documentation</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://console.cloud.google.com/cloud-setup/overview" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup overview</span></a><span style="vertical-align: baseline;"> (requires login)</span></p> </li> </ul></div>
  60. Product Manager

    Fri, 18 Jul 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">As developers and operators, you know that having access to the right information in the proper context is crucial for effective troubleshooting. This is why organizations invest heavily up front in curating monitoring resources across business units, so that information is easy to find and put in context when needed.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Today we are reducing the need for this upfront investment with an out-of-the-box </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> experience for your organization on Google Cloud within </span><a href="https://cloud.google.com/stackdriver/docs"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Observability</span></a><span style="vertical-align: baseline;">. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring consists of a set of pre-curated dashboards with relevant metrics and logs mapped to a user-defined application in </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">. It incorporates best practices pioneered by Google Site Reliability Engineers (SREs) to streamline manual troubleshooting and enable AI-assisted troubleshooting.</span></p> <p><span style="vertical-align: baseline;">Application Monitoring automatically labels and brings together key telemetry for your application into a centralized experience, making it easy to discover, filter and correlate trends. 
It also feeds application context into </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Investigations</span></a><span style="vertical-align: baseline;"> for AI-assisted troubleshooting. </span></p></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">1. Application, service and workload dashboards </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">No more spending hours configuring application dashboards. </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">From the moment you </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">describe your application in App Hub</span></a><span style="vertical-align: baseline;">, Application Monitoring starts to automatically build dashboards tailored to your environment. Each dashboard comprises relevant telemetry for your application and is searchable, filterable and ready for deep dives, with no configuration required. 
</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">The dashboards open with overview charts of the </span><a href="https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">SRE Four Golden Signals</span></a><span style="vertical-align: baseline;">: traffic, latency, error rate, and saturation. This provides a high-level view of application performance, integrating automatically collected system metrics across services and workloads such as load balancers, Cloud Run, GKE workloads, managed instance groups (MIGs), and databases. From this overview, you can drill down into services or workloads with performance issues or active alerts to access detailed metrics and logs.</span></p> <p><span style="vertical-align: baseline;">For example, in the image below, a user defined an App Hub application called </span><span style="font-style: italic; vertical-align: baseline;">Cymbal BnB app</span><span style="vertical-align: baseline;"> with multiple services and workloads. The flow shows the automatically generated experience with golden signals, alerts and relevant logs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_zgV6J6C.gif" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 1 - A user’s flow from an App Hub-defined application (here, Cymbal BnB) to the automatically built Application Monitoring experience in Cloud Observability</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3 role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. 
Labels and context propagation </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">See application labels propagated seamlessly across Google Cloud </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Once Application Monitoring is enabled, your application labels are propagated across Google Cloud, so you can use them to filter and focus on the most important signals in Logs Explorer, Metrics Explorer, and Trace Explorer.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_yj24vCu.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 2 - Logs Explorer showing application logs automatically tagged with application labels</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_kukVdIB.max-1000x1000.png" alt="3"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 3 - Metrics Explorer showing application labels automatically associated with metrics</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_BGEDIwf.max-1000x1000.png" alt="4"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 4 - Trace Explorer showing App Hub label 
integration</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">3. Gemini Cloud Assist Investigations</span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">Troubleshoot issues faster with AI-powered Investigations. </strong></p> <p><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist’s investigation feature</span></a><span style="vertical-align: baseline;"> makes it easier to troubleshoot issues because application boundaries and relationships are propagated into the AI model, grounding its analysis in context about your environment.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_O7Wiid5.gif" alt="5"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 5 - Seamless entry point into Gemini Cloud Assist-powered Investigations from application logs</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Note: Gemini Cloud Assist Investigations is currently in private preview.</span></p> <h3><span style="vertical-align: baseline;">Try Application Monitoring today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">The new</span><span style="vertical-align: baseline;"> Application Monitoring experience provides a low-effort, unified view of application and infrastructure performance for your troubleshooting needs.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Take advantage of the new Google Cloud 
Application Monitoring experience by:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Visiting the Google Cloud console</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">Setting up </span><strong style="text-decoration: underline; vertical-align: baseline;">Applications</strong><span style="text-decoration: underline; vertical-align: baseline;"> in App Hub</span></a></p> </li> <ol> <li aria-level="2" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Adding </span><strong style="vertical-align: baseline;">Services</strong><span style="vertical-align: baseline;"> and </span><strong style="vertical-align: baseline;">Workloads</strong><span style="vertical-align: baseline;"> to your Application</span></p> </li> </ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Navigating to </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> in Cloud Observability to see your automatically built experience</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Enabling your Gemini Cloud Assist SKU and </span><a href="https://cloud.google.com/earlyaccess/gemini-cloud-assist?e=48754805&amp;hl=en"><span style="text-decoration: underline; vertical-align: baseline;">signing up for the trusted tester 
program</span></a><span style="vertical-align: baseline;"> to get access to the</span><strong style="vertical-align: baseline;"> Investigations experience</strong></p> </li> </ol> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Related docs</span></h3> <ol style="list-style-type: lower-alpha;"> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring </span><a href="https://cloud.google.com/stackdriver/docs/observability/about-application-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> <ol style="list-style-type: lower-alpha;"> <li role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/supported-resources"><span style="text-decoration: underline; vertical-align: baseline;">supported resources docs</span></a></li> </ol> </li> </ol></div>
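The four golden signals that the Application Monitoring dashboards chart (traffic, latency, error rate, and saturation) can be illustrated with a small, hedged Python sketch. It assumes a hypothetical list of request records, each carrying an application label similar in spirit to the App Hub labels that Application Monitoring propagates. The `Request` fields, the "5xx means error" rule, the `cymbal-bnb` label, and the `capacity` figure used for saturation are all illustrative assumptions, not Cloud Observability's schema or its actual computation. The sketch groups records by label, mirroring the label-based filtering described above, and then derives the four signals per application.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record; field names are illustrative only.
    app: str           # application label, e.g. an App Hub application name
    latency_ms: float  # end-to-end latency of the request
    status: int        # HTTP status code returned
    in_flight: int     # concurrent requests observed when this one started


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(requests, window_s, capacity):
    """Group requests by application label and compute the four golden signals.

    Illustrative only: not how Cloud Observability computes its dashboards.
    window_s: length of the observation window in seconds (used for traffic).
    capacity: assumed maximum concurrent requests per application (used for saturation).
    """
    by_app = defaultdict(list)
    for r in requests:
        by_app[r.app].append(r)

    signals = {}
    for app, reqs in by_app.items():
        # Treat 5xx responses as errors; a real service may define errors differently.
        errors = sum(1 for r in reqs if r.status >= 500)
        signals[app] = {
            "traffic_rps": len(reqs) / window_s,
            "latency_p95_ms": percentile([r.latency_ms for r in reqs], 95),
            "error_rate": errors / len(reqs),
            # Peak observed concurrency relative to the assumed capacity.
            "saturation": max(r.in_flight for r in reqs) / capacity,
        }
    return signals


if __name__ == "__main__":
    sample = [
        Request("cymbal-bnb", 120.0, 200, 3),
        Request("cymbal-bnb", 340.0, 503, 8),
        Request("cymbal-bnb", 95.0, 200, 2),
        Request("other-app", 60.0, 200, 1),
    ]
    for app, s in golden_signals(sample, window_s=60, capacity=10).items():
        print(app, s)
```

Grouping by label first is the design choice that matters in this sketch: once every record carries its application label, per-application signals fall out of a simple group-by, which is roughly the convenience that automatic label propagation is meant to provide.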
  61. Director of Engineering, Google Cloud

    Thu, 10 Jul 2025 09:30:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">At Google Cloud, we are committed to making it as seamless as possible for you to build and deploy the next generation of AI and agentic applications. Today, we’re thrilled to announce that we are </span><a href="https://docker.com/blog/build-ai-agents-with-docker-compose/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">collaborating with Docker</span></a><span style="vertical-align: baseline;"> to drastically simplify your deployment workflows, enabling you to bring your sophisticated AI applications from local development to </span><a href="https://cloud.google.com/run"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;"> with ease. </span></p> <h3><strong style="vertical-align: baseline;">Deploy your compose.yaml directly to Cloud Run</strong></h3> <p><span style="vertical-align: baseline;">Previously, bridging the gap between your development environment and managed platforms like Cloud Run required you to manually translate and configure your infrastructure. Agentic applications that use MCP servers and self-hosted models added further complexity. </span></p> <p><span style="vertical-align: baseline;">The open-source </span><a href="http://compose-spec.io" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Compose Specification</span></a><span style="vertical-align: baseline;"> is one of the most popular ways for developers to iterate on complex applications in their local environment, and is the basis of Docker Compose. And now, </span><code style="vertical-align: baseline;">gcloud run compose up</code><span style="vertical-align: baseline;"> brings the simplicity of Docker Compose to Cloud Run, automating this entire process. 
Now in </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">private preview</span></a><span style="vertical-align: baseline;">, you can deploy your existing</span><code style="vertical-align: baseline;"> compose.yaml</code><span style="vertical-align: baseline;"> file to Cloud Run with a single command, including building containers from source and leveraging Cloud Run’s volume mounts for data persistence.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/compose.gif" alt="compose"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Supporting the Compose Specification with Cloud Run makes for easy transitions across your local and cloud deployments, where you can keep the same configuration format, ensuring consistency and accelerating your dev cycle.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“We’ve recently evolved Docker Compose to support agentic applications, and we’re excited to see that innovation extend to Google Cloud Run with support for GPU-backed execution. Using Docker and Cloud Run, developers can now iterate locally and deploy intelligent agents to production at scale with a single command. It’s a major step forward in making AI-native development accessible and composable. 
We’re looking forward to continuing our close collaboration with Google Cloud to simplify how developers build and run the next generation of intelligent applications.” - </span><span style="vertical-align: baseline;">Tushar Jain, EVP Engineering and Product, Docker</span></p> <h3><strong style="vertical-align: baseline;">Cloud Run, your home for AI applications</strong></h3> <p><span style="vertical-align: baseline;">Support for the Compose Specification isn’t the only AI-friendly innovation you’ll find in Cloud Run. We recently announced </span><a href="https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available"><span style="text-decoration: underline; vertical-align: baseline;">general availability of Cloud Run GPUs</span></a><span style="vertical-align: baseline;">, removing a significant barrier to entry for developers who want access to GPUs for AI workloads. With its pay-per-second billing, scale to zero, and rapid scaling (approximately 19 seconds of time-to-first-token for a gemma3:4b model), Cloud Run is a great hosting solution for deploying and serving LLMs. </span></p> <p><span style="vertical-align: baseline;">This also makes Cloud Run a strong solution for Docker’s recently </span><a href="https://www.docker.com/blog/docker-mcp-gateway-secure-infrastructure-for-agentic-ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> OSS MCP Gateway and Model Runner, making it easy for developers to take AI applications from local development to production in the cloud. 
Because Cloud Run supports Docker’s recent addition of </span><a href="https://github.com/compose-spec/compose-spec/blob/main/spec.md#models" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">‘models’ to the open Compose Spec</span></a><span style="vertical-align: baseline;">, you can deploy these complex solutions to the cloud with a single command.  </span></p> <h3><strong style="vertical-align: baseline;">Bringing it all together</strong></h3> <p><span style="vertical-align: baseline;">Let's review the compose file for the above demo. It defines a multi-container application (in </span><code style="vertical-align: baseline;">services</code><span style="vertical-align: baseline;">), with the webapp service built from source and a named storage volume (in </span><code style="vertical-align: baseline;">volumes</code><span style="vertical-align: baseline;">). It also uses the new </span><code style="vertical-align: baseline;">models</code><span style="vertical-align: baseline;"> attribute to define an AI model, plus a Cloud Run extension (</span><code style="vertical-align: baseline;">x-google-cloudrun</code><span style="vertical-align: baseline;">) that specifies the inference runtime image:</span></p></div> <div class="block-code"><pre><code>name: agent
services:
  webapp:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - web_images:/assets/images
    depends_on:
      - adk

  adk:
    image: us-central1-docker.pkg.dev/jmahood-demo/adk:latest
    ports:
      - "3000:3000"
    models:
      - ai-model

models:
  ai-model:
    model: ai/gemma3-qat:4B-Q4_K_M
    x-google-cloudrun:
      inference-endpoint: docker/model-runner:latest-cuda12.2.2

volumes:
  web_images:</code></pre></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building the future of 
AI</strong></h3> <p><span style="vertical-align: baseline;">We’re committed to offering developers maximum flexibility and choice by adopting open standards and supporting various agent frameworks.</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">This collaboration between Google Cloud and Docker is another example of how we aim to simplify building and deploying intelligent applications. </span></p> <p><span style="vertical-align: baseline;">Compose Specification support is available for our trusted users; </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">sign up here for the private preview</span></a><span style="vertical-align: baseline;">. </span></p></div>
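The compose file in this post ties its top-level sections together by name: services reference named volumes, declared models, and other services through `depends_on`. As a hedged illustration, and not a description of how `gcloud run compose up` works internally, the Python sketch below checks those cross-references on a plain dict that mirrors that file, then derives a start order from `depends_on` with a topological sort. The `check_compose` helper and its messages are hypothetical, and it handles only short-syntax volume strings; a real tool would parse the YAML file rather than use an inline dict.

```python
from graphlib import TopologicalSorter

# A plain dict mirroring the compose.yaml from the post.
COMPOSE = {
    "name": "agent",
    "services": {
        "webapp": {
            "build": ".",
            "ports": ["8080:8080"],
            "volumes": ["web_images:/assets/images"],
            "depends_on": ["adk"],
        },
        "adk": {
            "image": "us-central1-docker.pkg.dev/jmahood-demo/adk:latest",
            "ports": ["3000:3000"],
            "models": ["ai-model"],
        },
    },
    "models": {
        "ai-model": {
            "model": "ai/gemma3-qat:4B-Q4_K_M",
            "x-google-cloudrun": {
                "inference-endpoint": "docker/model-runner:latest-cuda12.2.2"
            },
        }
    },
    "volumes": {"web_images": None},  # YAML "web_images:" parses to None
}


def check_compose(spec):
    """Hypothetical helper: return (problems, start_order) for a compose-like dict.

    Handles only short-syntax volume strings such as "name:/path".
    """
    services = spec.get("services", {})
    models = spec.get("models", {})
    volumes = spec.get("volumes", {})
    problems = []
    graph = {}  # service -> set of services it depends on
    for name, svc in services.items():
        for vol in svc.get("volumes", []):
            source = vol.split(":", 1)[0]
            # Sources starting with "." or "/" are bind mounts; others are named volumes.
            if not source.startswith((".", "/")) and source not in volumes:
                problems.append(f"{name}: undeclared volume {source!r}")
        for model in svc.get("models", []):
            if model not in models:
                problems.append(f"{name}: undeclared model {model!r}")
        # Iterating works for both the list form and the mapping form of depends_on.
        deps = svc.get("depends_on", [])
        for dep in deps:
            if dep not in services:
                problems.append(f"{name}: depends on unknown service {dep!r}")
        graph[name] = set(deps)
    # static_order() yields each service after its dependencies and
    # raises graphlib.CycleError if depends_on forms a cycle.
    start_order = list(TopologicalSorter(graph).static_order())
    return problems, start_order


if __name__ == "__main__":
    problems, order = check_compose(COMPOSE)
    print(problems)  # → []
    print(order)     # → ['adk', 'webapp']
```

A topological sort fits here because `depends_on` only constrains ordering: services with no dependency path between them may start in any order, and a cycle surfaces as an error instead of being silently ignored.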
  62. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note:</strong><span style="font-style: italic; vertical-align: baseline;"> This is part one of the story. After you’re finished reading, head over to </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">. </span></p> <hr/> <p><span style="vertical-align: baseline;">In 2017, John Lewis, a major UK retailer with a £2.5bn annual online turnover, was hampered by its monolithic e-commerce platform. This outdated approach led to significant cross-team dependencies, cumbersome and infrequent releases (monthly at best), and excessive manual testing, all further hindered by complex on-premises infrastructure. What was needed were some bold decisions to drive a quick and significant transformation.</span></p> <p><span style="vertical-align: baseline;">The John Lewis engineers knew there was a better way. Working with Google Cloud, they modernized their e-commerce operations with </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">Google Kubernetes Engine</span></a><span style="vertical-align: baseline;">. 
They began with the frontend and saw results fast: the frontend moved onto Google Cloud within months, releases to the frontend browser journey became weekly, and the business gladly backed expansion into other areas.</span></p> <p><span style="vertical-align: baseline;">At the same time, the team had a broader strategy in mind: to take </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">a platform engineering approach</span></a><span style="vertical-align: baseline;">, creating many product teams who built their own microservices to replace the functionality of the legacy commerce engine, as well as brand new experiences for customers. </span></p> <p><span style="vertical-align: baseline;">And so the John Lewis Digital Platform was born. The vision was to empower development teams and arm them with the tools and processes they needed to go to market fast, with full ownership of their own business services. The team’s motto? "You Build It. You Run It. You Own It." This decentralization of development and operational responsibilities would also enable the team to scale. 
</span></p> <p><span style="vertical-align: baseline;">This article features insights from Principal Platform Engineer Alex Moss, who walks through the strategy, the platform build, and the key learnings from John Lewis’ journey to modernize and streamline its operations with platform engineering, so you can begin to think about how you might apply platform engineering to your own organization.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Step 1: From monolithic to multi-tenant</strong></h3> <p><span style="vertical-align: baseline;">To make this happen, John Lewis needed to adopt a multi-tenant architecture, with one tenant for each business service. This allowed each owning team to work independently without risk to others, which in turn let the Platform team give teams a greater degree of freedom.</span></p> <p><span style="vertical-align: baseline;">Knowing that the business' primary objective was to greatly increase the number of product teams informed our initial design thinking: we positioned ourselves to enable many independent teams even though we only had a handful of tenants. </span></p> <p><span style="vertical-align: baseline;">This foundational design has served us very well and is largely unchanged now, seven years later. 
Central to the multi-tenant concept is what we chose to term a "Service" — a logical business application, usually composed of several microservices plus components for storing data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image1.max-1000x1000.png" alt="article1-image1"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We largely position our platform as a “bring your own container” experience, but encourage teams to make use of other Google Cloud services, particularly for handling state. Adopting services like Firestore and Pub/Sub reduces the complexity that our platform team has to work with, particularly in areas like resilience and disaster recovery. We also favor Kubernetes over compute products like Cloud Run because it strikes the right balance for us between giving development teams freedom and allowing our platform to drive certain behaviours (e.g., the right level of guardrails) without introducing too much friction.</span></p> <p><span style="vertical-align: baseline;">On our platform, Product Teams (i.e., tenants) have a large amount of control over their own Namespaces and Projects. This allows them to prototype, build, and ultimately operate their workloads without depending on others, a crucial element of enabling scale. 
</span></p> <p><span style="vertical-align: baseline;">Our early-adopter teams were extremely helpful in evolving the platform: they accepted the lack of features, were willing to develop their own solutions, and provided very rich feedback on whether we were building something that met their needs.</span></p> <p><span style="vertical-align: baseline;">The first tenant to adopt the platform was rebuilding the </span><a href="http://johnlewis.com" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">johnlewis.com</span></a><span style="vertical-align: baseline;"> search capability, replacing a commercial off-the-shelf solution. This team was staffed with experienced engineers familiar with modern software development and the advantages of a microservice-based architecture. They quickly identified the need for supporting services for their application to store data and communicate asynchronously between their components. They worked with the Platform Team to identify options, and were on board with our desire to lean into Google Cloud native services to avoid running our own databases or messaging. This led to us adopting Cloud Datastore and Pub/Sub for our first features that extended beyond Google Kubernetes Engine.</span></p> <h3><strong style="vertical-align: baseline;">All roads lead to success</strong></h3> <p><span style="vertical-align: baseline;">A risk with a platform that allows very high team autonomy is that it can turn into a bit of a wild west of technology choices and implementation patterns. 
To handle this while keeping things developer-centric, we adopted the concept of a </span><strong style="vertical-align: baseline;">paved road, </strong><span style="vertical-align: baseline;">analogous to a “golden path.” </span></p> <p><span style="vertical-align: baseline;">We found that the paved road approach made it easier to:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">build useful platform features to help developers do things rapidly and safely</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">share approaches and techniques, and for engineers to move between teams</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">demonstrate to the wider organisation that teams are following required practices (which we do by building assurance capabilities, </span><strong style="vertical-align: baseline;">not </strong><span style="vertical-align: baseline;">by gating releases)</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The concept of the paved road permeates most of what the platform builds, and has inspired other areas of the John Lewis Partnership beyond the John Lewis Digital space.</span></p> <p><span style="vertical-align: baseline;">Our paved road is powered by two key features that simplify things for teams:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Paved Road Pipeline</strong><span style="vertical-align: baseline;">. 
This operates on the whole Service and drives capabilities such as Google Cloud resource provisioning and observability tools.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Microservice CRD</strong><span style="vertical-align: baseline;">. As the name implies, this is an abstraction at the microservice level. Most of its benefit lies in making it easier for teams to work with Kubernetes.</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Whilst both features were created with the developer experience in mind, we discovered that they also bring a number of benefits for the platform team.</span></p> <p><span style="vertical-align: baseline;">The Paved Road Pipeline is driven by a configuration file — in yaml (of course!) — which we call the Service Definition. This allows </span><strong style="vertical-align: baseline;">the team that owns the tenancy</strong><span style="vertical-align: baseline;"> to describe, through easy-to-reason-about configuration, what they would like the platform to provide for them. Supporting documentation and examples help them understand what can be achieved. Pushes to this file then drive a CI/CD pipeline for a number of platform-owned jobs, which we refer to as provisioners. These provisioners are microservice-like themselves, in that they are independently releasable and generally focus on performing one task well. Here are some examples of our provisioners and what they can do:</span></p> <ul> <li role="presentation"><span style="vertical-align: baseline;">Create Google Cloud resources in a tenant’s Project. 
For example, </span><a href="https://cloud.google.com/storage/docs/creating-buckets"><span style="text-decoration: underline; vertical-align: baseline;">Buckets</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/pubsub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Pub/Sub</span></a><span style="vertical-align: baseline;">, and </span><a href="https://firebase.google.com/docs/firestore" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Firestore</span></a><span style="vertical-align: baseline;">, amongst many others</span></li> <li role="presentation"><span style="vertical-align: baseline;">Configure platform-provided dashboards and custom dashboards based on golden-signal and self-instrumented metrics</span></li> <li role="presentation"><span style="vertical-align: baseline;">Tune alert configurations for a given microservice’s SLOs, and the incident response behaviour for those alerts</span></li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image2.max-1000x1000.png" alt="article1-image2"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Our product teams are therefore freed from the need to familiarize themselves deeply with how Google Cloud resource provisioning works, or with Infrastructure-as-Code (IaC) tooling for that matter. 
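The flow described above, where one Service Definition drives several small, independently releasable provisioners, can be sketched as a simple fan-out. This is a minimal Python sketch under stated assumptions, not the John Lewis implementation: the Service Definition keys (`project`, `microservices`, `buckets`, `topics`) and the provisioner functions are hypothetical, and each provisioner only returns a plan of actions rather than calling Google Cloud APIs.

```python
from typing import Callable

# Hypothetical: a provisioner reads the Service Definition and returns the
# actions it would take. This sketch plans actions; it calls no Google Cloud API.
Provisioner = Callable[[dict], list]


def bucket_provisioner(defn: dict) -> list:
    project = defn["project"]  # resources land in the tenant's own Project
    return [f"create bucket {b} in {project}" for b in defn.get("buckets", [])]


def pubsub_provisioner(defn: dict) -> list:
    project = defn["project"]
    return [f"create topic {t} in {project}" for t in defn.get("topics", [])]


def dashboard_provisioner(defn: dict) -> list:
    # Platform-provided golden-signal dashboard for every declared microservice.
    return [f"configure golden-signal dashboard for {m}" for m in defn.get("microservices", [])]


# Each provisioner does one job and could be released independently.
PROVISIONERS: dict[str, Provisioner] = {
    "buckets": bucket_provisioner,
    "pubsub": pubsub_provisioner,
    "dashboards": dashboard_provisioner,
}


def run_pipeline(defn: dict) -> dict:
    """Fan one Service Definition out to every provisioner and collect their plans."""
    results = {}
    for name, provisioner in PROVISIONERS.items():
        try:
            results[name] = provisioner(defn)
        except KeyError as exc:  # a missing required key fails only this provisioner
            results[name] = [f"FAILED: missing {exc}"]
    return results


# Hypothetical Service Definition, already parsed from yaml into a dict.
service_definition = {
    "service": "search",
    "project": "search-prod",
    "microservices": ["search-api", "indexer"],
    "buckets": ["search-snapshots"],
    "topics": ["product-updates"],
}
for name, actions in run_pipeline(service_definition).items():
    print(name, actions)
```

Isolating failures per provisioner is a design choice of this sketch; it mirrors the idea that each provisioner focuses on one task and should not block the others.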
Our preferred technologies and good practices can be curated by our experts, and developers can focus on building differentiating software for the business, while remaining fully in control of what is provisioned and when.</span></p> <p><span style="vertical-align: baseline;">Earlier, we mentioned that this approach has the added benefit of being something that the platform team can rely upon to build their own features. The configuration updated by teams for their Service can be combined with metadata about their team and surfaced via an API and events published to Pub/Sub. This can then drive updates to other features like incident response and security tooling, pre-provision documentation repositories, and more. This is an example of how something that was originally intended as a means to help teams avoid writing their own IaC can also be used to make it easier for us to build platform features, further improving the value-add — without the developer even needing to be aware of it!</span></p> <p><span style="vertical-align: baseline;">We think this approach is also more scalable than providing pre-built Terraform modules for teams to use. That approach still burdens teams with being familiar with Terraform, and versioning and dependency complexities can create maintenance headaches for platform engineers. Instead, we provide an easy-to-reason-about API and </span><strong style="vertical-align: baseline;">deliberately burden the platform team,</strong><span style="vertical-align: baseline;"> ensuring that the Service provides all the functionality our tenants require. This abstraction also means we can make significant refactoring choices if we need to.</span></p> <p><span style="vertical-align: baseline;">Adopting this approach also results in a broad consistency in technologies across our platform. For example, why would a team implement Kafka when the platform makes creating resources in Pub/Sub so easy? 
When you consider that this spans not just the runtime components that assemble into a working business service, but also all the ancillary needs for operating that software (resilience engineering, monitoring &amp; alerting, incident response, security tooling, service management, and so on), this has a massive amplifying effect on our engineers’ productivity. All of these areas have full paved road capabilities on the John Lewis Digital Platform, reducing the cognitive load on teams of recognizing a need, identifying appropriate options, and then implementing the technology or processes to meet it.</span></p> <p><span style="vertical-align: baseline;">That being said, one of the reasons we particularly like the paved road concept is that it doesn't preclude teams choosing to "go off-road." A paved road shouldn’t be mandatory, but it should be compelling to use, so that engineers aren’t tempted to do something else. Preventing the use of other approaches risks stifling innovation and breeding the temptation to think the features you've built are "good enough." The paved road challenges our Platform Engineers to keep improving their product so that it continues to meet our Developers' changing needs. Likewise, development teams tempted to go off-road are put off by the increasing burden of replicating powerful platform features. </span></p> <p><span style="vertical-align: baseline;">The needs of our Engineers don’t remain fixed, and Google Cloud are of course releasing new capabilities all the time, so we have extended the analogy to include a “dusty path” representing brand new platform features that aren’t as feature-rich as we’d like (perhaps they lack self-service provisioning or out-of-the-box observability). Teams are trusted to try different options and make use of Google Cloud products that we haven't yet paved. The Paved Road Pipeline allows for this experimentation, which we term "snowflaking". 
We then have an unofficial "rule of three", whereby if we notice at least three teams requesting the same feature, we move to make it self-service.</span></p> <p><span style="vertical-align: baseline;">At the other end of the scale, teams can go completely solo, which we refer to as “crazy paving”. This might be needed to support wild experimentation or to accommodate a workload that cannot comply with the platform’s expectations for safe operation. Solutions in this space are generally not long-lived.</span></p> <p><span style="vertical-align: baseline;">In this article, we've covered how John Lewis revolutionized its e-commerce operations by adopting a multi-tenant, "paved road" approach to platform engineering. We explored how this strategy empowered development teams and streamlined their ability to provision Google Cloud resources and deploy operational and security features.</span></p> <p><span><span style="vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">part 2</span></a><span style="vertical-align: baseline;"> of this series, we'll dive deeper into how John Lewis further simplified the developer experience by introducing the Microservice CRD. You'll discover how this custom Kubernetes abstraction significantly reduced the complexity of working with Kubernetes at the component level, leading to faster development cycles and enhanced operational efficiency.</span></span></p> <p><span style="vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, you can find more information </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">. 
To learn more about how Google Kubernetes Engine (GKE) empowers developers to effortlessly deploy, scale, and manage containerized applications with its fully managed, robust, and intelligent Kubernetes service, you can find more information </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p></div>
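The unofficial "rule of three" described above is simple enough to express directly. This is a minimal sketch, assuming feature requests are recorded as (team, feature) pairs; the request data, feature names, and function name are hypothetical, and counting distinct teams rather than raw requests is this sketch's reading of "at least three teams".

```python
from collections import defaultdict

PAVE_THRESHOLD = 3  # the "rule of three"


def features_to_pave(requests: list[tuple[str, str]], threshold: int = PAVE_THRESHOLD) -> list[str]:
    """Return features requested by at least `threshold` distinct teams, sorted by name."""
    teams_by_feature: dict[str, set[str]] = defaultdict(set)
    for team, feature in requests:
        teams_by_feature[feature].add(team)  # a set ignores repeat requests from one team
    return sorted(f for f, teams in teams_by_feature.items() if len(teams) >= threshold)


# Hypothetical "snowflake" requests gathered from tenant teams.
requests = [
    ("search", "cloud-sql"),
    ("checkout", "cloud-sql"),
    ("checkout", "cloud-sql"),  # duplicate from the same team: counted once
    ("basket", "cloud-sql"),
    ("search", "memorystore"),
    ("basket", "memorystore"),
]
print(features_to_pave(requests))  # ['cloud-sql']
```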
  63. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;"> we introduced the John Lewis Digital Platform and its approach to simplifying the developer experience through platform engineering and so-called paved road features. We focused on the ways that platform engineering enables teams to create resources in Google Cloud and deploy the platform's operational and security features within dedicated tenant environments. In this article, we build on that concept at the next level of detail: how the platform also simplifies build and run at the component level (typically, for us, a microservice).</span></p> <p><span style="vertical-align: baseline;">Within just over a year, the John Lewis Digital Platform had fully evolved into a product. We had approximately 25 teams using our platform, with several key parts of the johnlewis.com retail website running in production. We had built a self-service capability to help teams provision resources in Google Cloud, and firmly established Google Kubernetes Engine (GKE) as the foundation of our platform. But we were hearing signals from some of the more recent teams that Kubernetes came with a learning curve. This was expected: we were driving a cultural change for teams to build and run their own services, and so we anticipated that our application developers would need some Kubernetes skills to support their own software. But our vision was to make developers' lives easier, and their feedback was clear. In some cases, we observed that teams weren't following "good practice" (despite the existence of good documentation!) 
such as not using anti-affinity rules or </span><code style="vertical-align: baseline;">PodDisruptionBudgets</code><span style="vertical-align: baseline;"> to help their workloads tolerate failure.</span></p></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">All the way back in 2017, Kelsey Hightower wrote: “</span><span style="font-style: italic; vertical-align: baseline;">Kubernetes is a platform for building platforms. It's a better place to start, not the endgame.”</span></p> <p><span style="vertical-align: baseline;">Kelsey's quote inspired us to act. We had the idea to write our own custom controller to simplify the point of interaction between a developer and Kubernetes: a John Lewis-specific abstraction that aligned to our preferred approaches. And thus the JL </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> was born.</span></p> <p><span style="vertical-align: baseline;">To do this, we declared a Kubernetes </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> with a simplified specification containing just the fields we felt our developers needed to set. For example, as we expect our tenants to build and operate their applications themselves, attributes such as the number of replicas and the amount of resources needed are best left to the developers. But do they really need to be able to customize the rules defining how to distribute pods across nodes? 
How often do they need to change the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;"> pointing towards their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">? When we looked closer, we realized just how much duplication there was: our analysis at the time suggested that only around 33% of the lines in the yaml files developers were producing were relevant to their application. This was a target-rich scenario for simplification.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article2-image1.max-1000x1000.png" alt="article2-image1"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To help us build this feature, we selected </span><a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Kubebuilder</span></a><span style="vertical-align: baseline;">, using it to declare our </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> and then build the Controller (what we call </span><code style="vertical-align: baseline;">MicroserviceManager</code><span style="vertical-align: baseline;">). This turned out to be a beneficial decision: initial prototyping was quick, and the feature launched a few months later and was very well received. 
Our team had to skill up in the </span><a href="https://go.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Go programming language</span></a><span style="vertical-align: baseline;">, but this trade-off felt worthwhile given the advantages Kubebuilder brought to the table, and those Go skills have continued to be useful for other software engineering work since.</span></p> <p><span style="vertical-align: baseline;">The initial implementation removed an engineer's need to understand and fully configure a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">; instead, they apply a much briefer yaml file containing only the fields they need to change. As well as direct translation of identical fields (</span><code style="vertical-align: baseline;">image</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">replicas</code><span style="vertical-align: baseline;"> are equivalent to what you would see in a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">, for example), it also allowed us to simplify the choices exposed by the Kubernetes APIs, because at John Lewis we didn't need some of that functionality. For example, </span><code style="vertical-align: baseline;">writablePaths: []</code><span style="vertical-align: baseline;"> is an easy concept for our engineers to understand, and behind the scenes our controller converts those paths into the more complex combination of </span><code style="vertical-align: baseline;">Volumes</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">VolumeMounts</code><span style="vertical-align: baseline;">. 
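To make the translation concrete, here is a minimal Python sketch of how a controller might expand a simplified spec into a Deployment: `image` and `replicas` are copied across directly, and each entry in `writablePaths` becomes a Volume plus a matching VolumeMount. The real MicroserviceManager is a Go controller built with Kubebuilder; the input field layout, the `app` label scheme, the volume naming, the default of one replica, and the use of `emptyDir` volumes alongside a read-only root filesystem are all assumptions made for illustration.

```python
import re


def volume_name(path: str) -> str:
    """Turn a path such as /var/cache/app into a lowercase, hyphenated volume name."""
    slug = re.sub(r"[^a-z0-9]+", "-", path.lower()).strip("-")
    return f"writable-{slug or 'root'}"


def expand_microservice(name: str, spec: dict) -> dict:
    """Expand a hypothetical simplified Microservice spec into a Deployment manifest (as a dict)."""
    paths = spec.get("writablePaths", [])
    labels = {"app": name}
    container = {
        "name": name,
        "image": spec["image"],  # copied straight across
        # Assumption: the root filesystem is read-only, so only listed paths are writable.
        "securityContext": {"readOnlyRootFilesystem": True},
        "volumeMounts": [{"name": volume_name(p), "mountPath": p} for p in paths],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": spec.get("replicas", 1),  # copied straight across
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    # One scratch emptyDir volume per writable path.
                    "volumes": [{"name": volume_name(p), "emptyDir": {}} for p in paths],
                },
            },
        },
    }


deployment = expand_microservice(
    "search-api",  # hypothetical microservice
    {
        "image": "example.registry/search-api:1.2.3",
        "replicas": 3,
        "writablePaths": ["/tmp", "/var/cache/app"],
    },
)
pod_spec = deployment["spec"]["template"]["spec"]
print(pod_spec["volumes"])
print(pod_spec["containers"][0]["volumeMounts"])
```

The developer writes three or four fields; the controller owns the boilerplate, which is what lets the platform change that boilerplate later without touching tenant yaml.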
Likewise, </span><code style="vertical-align: baseline;">visibleToOtherServices: true</code><span style="vertical-align: baseline;"> is an example of us simplifying the interaction with Kubernetes </span><code style="vertical-align: baseline;">NetworkPolicy</code><span style="vertical-align: baseline;">: rather than requiring teams to read our documentation to learn the incantations needed to label their resources correctly, the controller understands those conventions and applies them for the team.</span></p> <p><span style="vertical-align: baseline;">With the core concept of the </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> resource established, we were able to add more value by augmenting it with further features. We rapidly extended it to define our Prometheus scrape configuration, then added more complex features, such as letting teams declare that they use Google Cloud Endpoints and having the controller inject the necessary sidecar container into their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and wire it up to the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">. As we added more features, existing tenants converted to this specification, and it now makes up the majority of workloads declared on the platform.</span></p> <h3><strong style="vertical-align: baseline;">Moving the platform boundary</strong></h3> <p><span style="vertical-align: baseline;">Our motivation to build MicroserviceManager was focused on making developers' lives easier. But we discovered an additional benefit that we had not initially expected: it was something we could greatly benefit from </span><span style="font-style: italic; vertical-align: baseline;">within</span><span style="vertical-align: baseline;"> the platform as well. 
It enabled us to make changes behind the scenes without needing to involve our tenants, reducing toil for them and making it easier for us to improve our product. This turned out to be an exceptionally powerful benefit. It is generally difficult to change the contract you’ve established between your tenants and the platform, and creating an abstraction like this has allowed us to bring more under our control, for everyone’s benefit.</span></p> <p><span style="vertical-align: baseline;">An example of this was something we observed during live load testing of johnlewis.com, when certain workloads burst up to several hundred </span><code style="vertical-align: baseline;">Pods</code><span style="vertical-align: baseline;">, more than the number of </span><code style="vertical-align: baseline;">Nodes</code><span style="vertical-align: baseline;"> we typically had running in the cluster. This forced new </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;"> creation, and therefore slower </span><code style="vertical-align: baseline;">Pod</code><span style="vertical-align: baseline;"> autoscaling and poor bin-packing. Experienced Kubernetes operators can probably guess what was happening here: our default antiAffinity rules were set to optimize for resilience, such that no more than one replica was allowed on any given </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;">. 
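To make the scheduling difference concrete, the Python sketch below contrasts a required `podAntiAffinity` rule on the `kubernetes.io/hostname` topology key, which allows at most one replica per Node, with a `topologySpreadConstraints` entry whose `maxSkew` is loosened once a workload passes a replica threshold. The field names follow the Kubernetes Pod spec; the `app` label, the threshold of 20 replicas, and the relaxed skew of 5 are hypothetical values, since the article does not state the numbers John Lewis chose.

```python
HOSTNAME_KEY = "kubernetes.io/hostname"

# Hypothetical tuning values; not taken from the article.
REPLICA_THRESHOLD = 20
RELAXED_MAX_SKEW = 5


def one_per_node_anti_affinity(app: str) -> dict:
    """Value for a Pod's spec.affinity: never co-schedule two replicas of `app` on one Node."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app}},
                    "topologyKey": HOSTNAME_KEY,
                }
            ]
        }
    }


def spread_constraints(app: str, replicas: int) -> list:
    """Value for a Pod's spec.topologySpreadConstraints.

    maxSkew bounds how far the busiest eligible Node may exceed the least busy one
    (counting matching Pods); a larger value tolerates more stacking.
    """
    max_skew = 1 if replicas <= REPLICA_THRESHOLD else RELAXED_MAX_SKEW
    return [
        {
            "maxSkew": max_skew,
            "topologyKey": HOSTNAME_KEY,
            "whenUnsatisfiable": "DoNotSchedule",
            "labelSelector": {"matchLabels": {"app": app}},
        }
    ]


# Under the anti-affinity rule, a burst to 300 replicas needs at least 300 Nodes.
# Under the spread constraint, replicas can share Nodes as long as the skew holds.
print(spread_constraints("search-api", 6)[0]["maxSkew"])    # 1
print(spread_constraints("search-api", 300)[0]["maxSkew"])  # 5
```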
The good news, though, was that because the workloads were under the control of our Microservice Manager, we did not have to instruct our tenants to copy the relevant yaml into their Deployments. Instead, it was a straightforward change for us to replace the antiAffinity rules with the more modern </span><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/" rel="noopener" target="_blank"><code style="text-decoration: underline; vertical-align: baseline;">topologySpreadConstraints</code></a><span style="vertical-align: baseline;">, allowing us to customize the number of replicas that could be stacked on a Node for workloads exceeding a certain replica count. And this happened with no intervention from our tenants.</span></p> <p><span style="vertical-align: baseline;">A more complex example of this was when we rolled out our service mesh. In keeping with our general desire to let Google Cloud handle the complexity of running control plane components, we opted to use </span><a href="https://cloud.google.com/products/service-mesh"><span style="text-decoration: underline; vertical-align: baseline;">Google's Cloud Service Mesh</span></a><span style="vertical-align: baseline;"> product. But even then, rolling out a mesh to a business-critical platform in constant use is not without its risks. Microservice Manager allowed us to control the rate at which we enrolled workloads into the mesh through a feature flag on the </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> resource. We could start the rollout with platform-owned workloads to test our approach, then make tenants aware of the flag so that early adopters could validate it and take advantage of some of Cloud Service Mesh’s features. To scale the rollout, we could then manipulate the flag to release in waves based on business importance, with an opt-out mechanism if needed. 
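The wave-based rollout described above can be sketched as a function that decides, per Microservice, whether the mesh flag should be on. This is a minimal sketch, not Microservice Manager's actual logic: the field names (`owner`, `tier`, `meshEnabled`), the tier names, and the choice to enroll the least critical tier first are all hypothetical. Platform-owned workloads are enrolled from wave 0, an explicit team setting (early-adopter opt-in or opt-out) always wins, and each later wave adds the next business-importance tier.

```python
# Hypothetical business-importance tiers, least critical first.
TIER_ORDER = ["low", "medium", "high", "critical"]


def mesh_enabled(ms: dict, current_wave: int) -> bool:
    """Decide whether a Microservice should be enrolled in the mesh at `current_wave`.

    Wave 0 enrolls platform-owned workloads; wave N (N >= 1) also enrolls
    every tier up to and including TIER_ORDER[N - 1].
    """
    explicit = ms.get("meshEnabled")
    if explicit is not None:
        return explicit  # early adopters opt in; any team can opt out
    if ms.get("owner") == "platform":
        return True
    # Workloads without a declared tier are treated as most critical (enrolled last).
    tier_rank = TIER_ORDER.index(ms.get("tier", "critical"))
    return current_wave >= tier_rank + 1


fleet = [
    {"name": "ingress-controller", "owner": "platform"},
    {"name": "search-api", "owner": "search", "tier": "critical", "meshEnabled": True},
    {"name": "wishlist", "owner": "lists", "tier": "low"},
    {"name": "checkout", "owner": "checkout", "tier": "critical"},
    {"name": "legacy-feed", "owner": "feeds", "tier": "low", "meshEnabled": False},
]
for wave in range(len(TIER_ORDER) + 1):
    enrolled = [ms["name"] for ms in fleet if mesh_enabled(ms, wave)]
    print(wave, enrolled)
```

Because the decision lives in the controller, advancing the rollout is a single change to `current_wave` on the platform side rather than a change in every tenant's configuration.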
This again greatly simplified the implementation: product teams had very little to do, and we avoided having to chase approximately 40 teams running hundreds of Microservices to change their configuration. We make extensive use of this feature-flagging technique to support our own experimentation.</span></p> <h3><strong style="vertical-align: baseline;">Beyond the microservice</strong></h3> <p><span style="vertical-align: baseline;">Building the Microservice Manager has led us to think further in Kubernetes-native ways: the </span><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Custom Resource + Controller concept</span></a><span style="vertical-align: baseline;"> is a powerful technique, and we have since used it to build other features. One example is a controller that converts a declared need for external connectivity into the Istio resources required to route that traffic via our egress gateway. Istio in particular is a very powerful platform capability that comes with a high cognitive load for its users, which makes it a perfect example of where platform engineering can manage that load for teams whilst still allowing them to take advantage of it. Now that our confidence in the technology has grown, we have a number of ideas in this area.</span></p> <p><span style="vertical-align: baseline;">In summary, the John Lewis Partnership leveraged Google Cloud and platform engineering to modernize their e-commerce operations and developer experience. By implementing a "paved road" approach with a multi-tenant architecture, they empowered development teams, accelerated deployment cycles, and simplified Kubernetes interactions using a custom Microservice CRD. 
This strategy allowed them to scale their engineering teams effectively and enhance the developer experience by reducing complexity while maintaining operational efficiency.</span></p> <p><span style="vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles:</span><span style="vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="vertical-align: baseline;">, and </span><a href="https://cloud.google.com/blog/products/application-development/golden-paths-for-engineering-execution-consistency"><span style="text-decoration: underline; vertical-align: baseline;">Light the way ahead: Platform Engineering, Golden Paths, and the power of self-service</span></a><span style="vertical-align: baseline;">.</span></p></div>
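The switch from "one replica per Node" antiAffinity to topology spread constraints in the John Lewis post can be illustrated with a small sketch. Assuming a controller in the spirit of the Microservice Manager, the Python below derives a `topologySpreadConstraints` list from a workload's replica count. The threshold `LARGE_WORKLOAD_REPLICAS`, the `maxSkew` formula, and the helper names are hypothetical; only the field names (`maxSkew`, `topologyKey`, `whenUnsatisfiable`, `labelSelector`) and their values come from the Kubernetes Pod spec.

```python
from typing import Any, Dict, List

# Hypothetical threshold (not from the post): at or above this many replicas,
# a workload is allowed to stack more unevenly across Nodes.
LARGE_WORKLOAD_REPLICAS = 50


def topology_spread_constraints(app_label: str, replicas: int) -> List[Dict[str, Any]]:
    """Build a topologySpreadConstraints list for a Deployment's Pod template.

    Replaces a strict "one replica per Node" antiAffinity rule with
    per-hostname spreading whose strictness depends on the replica count.
    """
    base = {
        "topologyKey": "kubernetes.io/hostname",  # spread across Nodes
        "labelSelector": {"matchLabels": {"app": app_label}},
    }
    if replicas < LARGE_WORKLOAD_REPLICAS:
        # Small workloads: keep a strict, even spread for resilience, while
        # still allowing more than one replica per Node once every Node has one.
        return [{**base, "maxSkew": 1, "whenUnsatisfiable": "DoNotSchedule"}]
    # Large workloads: tolerate a bigger imbalance between Nodes and make the
    # spread a soft preference, so a burst can pack onto existing Nodes
    # rather than waiting for new ones to be created.
    max_skew = max(2, replicas // LARGE_WORKLOAD_REPLICAS + 1)
    return [{**base, "maxSkew": max_skew, "whenUnsatisfiable": "ScheduleAnyway"}]


def pod_template_patch(app_label: str, replicas: int) -> Dict[str, Any]:
    """Wrap the constraints in a Deployment patch body (spec.template.spec)."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "topologySpreadConstraints": topology_spread_constraints(app_label, replicas),
                }
            }
        }
    }
```

Note that `maxSkew` bounds how uneven replica counts may be between Nodes rather than capping replicas per Node, and `ScheduleAnyway` turns the constraint into a scoring preference instead of a hard filter. Because the controller owns the Pod template, a change like this reaches every tenant workload on its next reconcile.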
  64. Sr. Staff UX Designer

    Wed, 28 May 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In the event of a cloud incident, everyone wants swift and clear communication from their cloud provider, along with the ability to act on that information effectively. </span><a href="https://cloud.google.com/blog/products/devops-sre/personalized-service-health-is-now-generally-available?e=48754805?utm_source%3Dmarketingweb"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;"> in the Google Cloud console addresses this need with fast, transparent, relevant, and actionable communications about Google Cloud service disruptions, customized to your specific footprint. This helps you quickly identify the source of the problem and answer the question, “Is it Google or is it me?” You can then integrate this information into your incident response workflows to resolve the incident more efficiently.</span></p> <p><span style="vertical-align: baseline;">We're excited to announce that you can prompt </span><a href="https://g.co/kgs/j2BVWVE" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;"> to pull real-time information about active incidents, powered by Personalized Service Health, providing you with streamlined incident management, including discovery, impact assessment, and recovery. By combining Gemini's guidance with Personalized Service Health insights and up-to-the-minute information, you can assess the scope of impact and begin troubleshooting – all within a single, AI-driven Gemini Cloud Assist chat. Further, you can initiate this sort of incident discovery from anywhere within the console, offering immediate access to relevant incidents without interrupting your workflow. 
You can also check for active incidents impacting your projects, gathering details on their scope and the latest updates directly sourced from Personalized Service Health</span><span style="vertical-align: baseline;">.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Using Gemini Cloud Assist with Personalized Service Health</strong></h3> <p><span style="vertical-align: baseline;">We designed Gemini Cloud Assist with a user-friendly layout and a well-organized information structure. Crucial details, including dynamic timelines, latest updates, symptoms, and workarounds sourced directly from Personalized Service Health, are now presented in the console, enabling conversational follow-ups. Gemini Cloud Assist highlights critical insights from Personalized Service Health, helping you refine your investigations and understand the impact of incidents.</span></p> <p><span style="vertical-align: baseline;">To illustrate the power of this integration, the following demo showcases a typical incident response workflow leveraging the combined capabilities of Gemini and Personalized Service Health.</span></p> <p><strong style="vertical-align: baseline;">Incident discovery and triage<br/></strong><span style="vertical-align: baseline;">In the crucial first moments of an incident, Gemini Cloud Assist helps you answer "Is it Google or is it me?" 
Gemini Cloud Assist accesses data directly from Personalized Service Health and reports which projects and locations are affected by a Google Cloud incident, speeding up the triage process.</span></p> <p><span style="vertical-align: baseline;">To start this process, try asking Gemini Cloud Assist questions like:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is my project impacted by a Google Cloud incident?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Are there any incidents impacting Google Cloud at the moment?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_UpdatedNew.gif" alt="1 UpdatedNew"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Investigating and evaluating impact<br/></strong><span style="vertical-align: baseline;">Once you’ve identified a relevant Google Cloud incident, you can use Gemini Cloud Assist to delve deeper into the specifics and evaluate its impact on your environment. Furthermore, when you ask follow-up questions, Gemini Cloud Assist can retrieve updates from Personalized Service Health about the incident as it evolves. 
You can then further investigate by asking Gemini to pinpoint exactly which of your apps or projects, and at what locations, might be affected by the reported incident.</span></p> <p><span style="vertical-align: baseline;">Here are examples of prompts you might pose to Gemini Cloud Assist:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Tell me more about the ongoing Incident ID [X] (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is [X] impacted? (Replace [X] with your specific location or Google Cloud product)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What is the latest update on Incident ID [X]?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Show me the details of Incident ID [X].</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you guide me through some troubleshooting steps for [impacted Google Cloud product]?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_Updated.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Mitigation and recovery<br/></strong><span style="vertical-align: baseline;">Finally, Gemini Cloud Assist can also act as an intelligent 
assistant during the recovery phase, providing you with actionable guidance. You can gain access to relevant logs and monitoring data for more efficient resolution. Additionally, Gemini Cloud Assist can help surface potential workarounds from Personalized Service Health and direct you to the tools and information you need to restore your projects or applications. Here are some sample prompts:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the workarounds for the incident ID [X]? (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you suggest a temporary solution to keep my application running?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">How can I find logs for this impacted project?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Updated_tpPYqpq.gif" alt="3 Updated"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">From these prompts, Gemini retrieves relevant information from Personalized Service Health to provide you with personalized insights into your Google Cloud environment's health — both for ongoing events and incidents from up to one year in the past. This helps when investigating an incident to narrow down its impact, as well as assisting in recovery. 
</span></p> <h3><strong style="vertical-align: baseline;">Next steps</strong></h3> <p><span style="vertical-align: baseline;">Looking ahead, we are excited to provide even deeper insights and more comprehensive incident management with Gemini Cloud Assist and Personalized Service Health, extending these AI-driven capabilities beyond a single project view. Ready to get started? </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Learn more about </span><a href="https://cloud.google.com/service-health/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;">, or reach out to your account team to enable it.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Get started with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?e=48754805?utm_source%3Dmarketingweb" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">. 
Refine your prompts to ask about your specific regions or Google Cloud products, and experiment to discover how it can help you proactively manage incidents.</span></p> </li> </ul></div>
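The "Is it Google or is it me?" question from the triage step comes down to intersecting the products and locations named in active incidents with the products and locations you actually run. The Python sketch below shows that logic offline. The event dictionary shape, the `impacting_incidents` helper, and the `Footprint` type are assumptions loosely modeled on the Service Health API's `Event` resource, not a documented client API, so check field names against the API reference before relying on them. Personalized Service Health already scopes events to your projects; the sketch only illustrates the underlying idea.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical type: (product name, location) pairs that make up your footprint.
Footprint = Set[Tuple[str, str]]


def impacting_incidents(events: Iterable[dict], footprint: Footprint) -> List[Dict[str, object]]:
    """Return active incidents whose reported impacts overlap your footprint.

    Each event is a plain dict with an assumed shape, loosely modeled on the
    Service Health API Event resource:
    {"title": str, "category": "INCIDENT", "state": "ACTIVE" | "CLOSED",
     "eventImpacts": [{"product": {"productName": str},
                       "location": {"locationName": str}}]}
    """
    matches: List[Dict[str, object]] = []
    for event in events:
        # Only open incidents can explain a problem you are seeing right now.
        if event.get("category") != "INCIDENT" or event.get("state") != "ACTIVE":
            continue
        impacted = {
            (impact["product"]["productName"], impact["location"]["locationName"])
            for impact in event.get("eventImpacts", [])
        }
        # Sorted so the result is deterministic for reporting and tests.
        overlap = sorted(impacted & footprint)
        if overlap:
            matches.append({"title": event["title"], "affected": overlap})
    return matches
```

An empty result means no active incident overlaps the footprint you passed in. That points toward a cause in your own stack, but it is only as reliable as the footprint you describe.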
  65. Staff Site Reliability Engineer, Waze

    Mon, 28 Apr 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In 2023, the Waze platform engineering team transitioned to Infrastructure as Code (IaC) using Google Cloud's </span><a href="https://cloud.google.com/config-connector/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Config Connector</span></a><span style="vertical-align: baseline;"> (KCC), and we haven’t looked back since. We embraced Config Connector, an open-source Kubernetes add-on, to manage Google Cloud resources through Kubernetes. To streamline management, we also leverage Config Controller, a hosted version of Config Connector on Google Kubernetes Engine (GKE), incorporating Policy Controller and Config Sync. This shift has significantly improved how we manage our infrastructure and is shaping how we will build it in the future.</span></p> <h3><strong style="vertical-align: baseline;">The shift to Config Connector</strong></h3> <p><span style="vertical-align: baseline;">Previously, Waze relied on Terraform to manage resources, particularly during our dual-cloud, VM-based phase. However, maintaining state and ensuring reconciliation proved challenging, leading to inconsistent configurations and increased management overhead.</span></p> <p><span style="vertical-align: baseline;">In 2023, we adopted Config Connector, transforming our Google Cloud infrastructure into </span><a href="https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Kubernetes Resource Model</span></a><span style="vertical-align: baseline;"> (KRM) resources within a GKE cluster. This approach addresses the reconciliation issues we encountered with Terraform. Config Sync, paired with Config Connector, automates KRM synchronization from source repositories to our live GKE cluster. 
This managed solution eliminates the need for us to build and maintain custom reconciliation systems.</span></p> <p><span style="vertical-align: baseline;">The shift helped us meet the needs of three key roles within Waze’s infrastructure team: </span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Infrastructure consumers:</strong><span style="vertical-align: baseline;"> Application developers who want to deploy infrastructure easily without worrying about the maintenance and complexity of underlying resources.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Infrastructure owners:</strong><span style="vertical-align: baseline;"> Experts in specific resource types (e.g., Spanner, Google Cloud Storage, and Load Balancers) who want to define and standardize best practices for how resources are created across Waze on Google Cloud.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Platform engineers: </strong><span style="vertical-align: baseline;">Engineers who build the system that enables infrastructure owners to codify and define best practices, while also providing a seamless API for infrastructure consumers.</span></p> </li> </ol></div>
<div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">First stop: Config Connector</strong></h3> <p><span style="vertical-align: baseline;">It may seem circular to define all of our Google Cloud infrastructure as KRM resources within a Google Cloud service; however, KRM actually represents our infrastructure better than existing IaC tooling does.</span></p> <p><span style="vertical-align: baseline;">Terraform's reconciliation issues – state drift, version management, out-of-band changes – are a significant pain. Config Connector, through Config Sync, offers out-of-the-box reconciliation, a managed solution we prefer. Both KRM and Terraform offer templating, but KCC's managed nature aligns with our shift to Google Cloud-native solutions and reduces our maintenance burden. </span></p> <p><span style="vertical-align: baseline;">Infrastructure complexity requires generalization regardless of the tool. We can see this when we look at the Spanner requirements at Waze:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Consistent backups for all Spanner databases.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Each Spanner database uses a dedicated Cloud Storage bucket and Service Account to automate the execution of DDL jobs.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">All IAM policies for Spanner instances, databases, and Cloud Storage buckets are defined in code to ensure consistent and auditable access control. 
</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Spanner_at_Waze.max-1000x1000.jpg" alt="1 - Spanner at Waze"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To define these resources, we evaluated various templating and rendering tools and selected Helm, a robust CNCF package manager for Kubernetes. Its strong open-source community, rich templating capabilities, and native rendering features made it a natural fit. We can now refer to our bundled infrastructure configurations as 'Charts.' </span><a href="https://cloud.google.com/blog/products/containers-kubernetes/introducing-kube-resource-orchestrator"><span style="text-decoration: underline; vertical-align: baseline;">KRO</span></a><span style="vertical-align: baseline;"> has since emerged to serve a similar purpose, but our selection process predated its availability.</span></p> <h3><strong style="vertical-align: baseline;">Under the hood</strong></h3> <p><span style="vertical-align: baseline;">Let's open the hood and look at how the system works and how it drives value for Waze.</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Waze infrastructure owners generically define Waze-flavored infrastructure in Helm Charts. 
</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure consumers use these Charts with simplified inputs to generate infrastructure (</span><a href="https://www.youtube.com/watch?v=B4RI4MwXOgg" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">demo</span></a><span style="vertical-align: baseline;">).</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure code is stored in repositories, enabling validation and presubmit checks.</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Code is uploaded to </span><a href="https://cloud.google.com/artifact-registry/docs"><span style="text-decoration: underline; vertical-align: baseline;">Artifact Registry</span></a><span style="vertical-align: baseline;">, where Config Sync and Config Connector align Google Cloud infrastructure with the code definitions. </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_Provisioning_Cloud_Resources_at_Waze.max-1000x1000.jpg" alt="2 - Provisioning Cloud Resources at Waze"> </a> <figcaption class="article-image__caption "><p data-block-key="98gzx">This diagram represents a single "data domain," a collection of bounded services, databases, networks, and data. 
Many tech organizations today run several such domains, for example Prod, QA, Staging, and Development.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Approaching our destination</strong></h3> <p><span style="vertical-align: baseline;">So why does all of this matter? Adopting this approach allowed us to move from Infrastructure as Code to Infrastructure as Software. By treating each Chart as a software component, our infrastructure management goes beyond simple code declaration. Now, versioned Charts and configurations enable us to leverage a rich ecosystem of software practices, including sophisticated release management, automated rollbacks, and granular change tracking.</span></p> <p><span style="vertical-align: baseline;">Here's where we apply this in practice: our configuration inheritance model minimizes redundancy. Resource Charts inherit settings from Projects, which inherit from Bootstraps. All three are defined as Charts. Consequently, Bootstrap configurations apply to all Projects, and Project configurations apply to all Resources.</span></p> <p><span style="vertical-align: baseline;">Every change to our infrastructure – from modifying existing infrastructure to rolling out new resource types – can be treated like a software rollout. 
</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Resource_Inheritance.max-1000x1000.jpg" alt="3 - Resource Inheritance"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Now that all of our infrastructure is treated like software, we can see what this does for us system-wide:</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_Data_Domain_Flow.max-1000x1000.jpg" alt="4 - Data Domain Flow"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Reaching our destination</strong></h3> <p><span style="vertical-align: baseline;">In summary, Config Connector and Config Controller have enabled Waze to achieve true Infrastructure as Software, providing a robust and scalable platform for our infrastructure needs, along with many other benefits including: </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure consumers receive the latest best practices through versioned updates.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure owners can iterate and improve infrastructure safely.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Platform 
Engineers and Security teams are confident that our resources are auditable and compliant.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Config Connector leverages </span><a href="https://cloud.google.com/kubernetes-engine/enterprise/config-controller/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Google's managed services</span></a><span style="vertical-align: baseline;">, reducing operational overhead.</span></p> </li> </ul></div>
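The Bootstrap-to-Project-to-Resource inheritance described in the Waze post can be pictured as a layered merge, where each more specific layer overrides the one before it. The Python sketch below is a hypothetical illustration, not Waze's implementation. The merge rule (nested mappings merge key by key, and any other value, including lists, is replaced by the more specific layer, similar to how Helm combines multiple values files) and the helper names `deep_merge` and `effective_config` are assumptions.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict in which values from `override` take precedence.

    Nested dicts are merged recursively; scalars and lists in `override`
    replace the corresponding value in `base`. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(bootstrap: Config, project: Config, resource: Config) -> Config:
    """Resolve a Resource's settings with precedence Bootstrap < Project < Resource."""
    return deep_merge(deep_merge(bootstrap, project), resource)
```

For example, with hypothetical layers where the Bootstrap sets `{"backup": {"enabled": True, "retentionDays": 7}}` and a Project sets `{"backup": {"retentionDays": 30}}`, every Resource in that Project resolves to backups enabled with 30-day retention unless the Resource overrides it. Changing the Bootstrap layer once then reaches every Project and Resource built on it, which is the redundancy reduction the post describes.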
  66. Engineering Manager

    Mon, 24 Feb 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Distributed tracing is a critical part of an observability stack, letting you troubleshoot latency and errors in your applications. Cloud Trace, part of </span><a href="https://cloud.google.com/stackdriver/docs"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Observability</span></a><span style="vertical-align: baseline;">, is Google Cloud’s native tracing product, and we’ve made numerous improvements to the Trace explorer UI on top of a new analytics backend.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Components_of_the_new_trace_explorer.max-1000x1000.jpg" alt="1_Components of the new trace explorer"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The new Trace explorer page contains:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A filter bar with options to choose a trace scope based on Google Cloud projects, to show all spans or only root spans, and to apply a custom attribute filter.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A faceted span filter pane that displays commonly used filters based on </span><a href="https://opentelemetry.io/docs/specs/semconv/general/trace/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">OpenTelemetry conventions</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> 
<p role="presentation"><span style="vertical-align: baseline;">A visualization of matching spans, including an interactive span duration heatmap (the default), a span rate line chart, and a span duration percentile chart.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A table of matching spans that can be narrowed down further by selecting a cell of interest on the heatmap.</span></p> </li> </ol> <h3><strong style="vertical-align: baseline;">A tour of the new Trace explorer</strong></h3> <p><span style="vertical-align: baseline;">Let’s take a closer look at these new features and how you can use them to troubleshoot your applications. Imagine you’re a developer working on the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> of a retail webstore application, and you’ve been paged because of an ongoing incident.</span></p></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This application is instrumented using OpenTelemetry and sends trace data to Google Cloud Trace, so you navigate to the Trace explorer page in the Google Cloud console with the context set to the Google Cloud project that hosts the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">Before starting your investigation, 
you remember that your admin recommended using the </span><span style="font-style: italic; vertical-align: baseline;">webstore-prod</span><span style="vertical-align: baseline;"> trace scope when investigating production issues that affect the whole webstore app. With this trace scope, you can see spans stored in other Google Cloud projects that are relevant to your investigation.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Scope_selection.max-1000x1000.jpg" alt="2_Scope selection"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You set the trace scope to </span><span style="font-style: italic; vertical-align: baseline;">webstore-prod</span><span style="vertical-align: baseline;">, so your queries now include spans from all the projects in this trace scope.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_User_Journey.max-1000x1000.jpg" alt="3_User Journey"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You select </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> in </span><strong style="vertical-align: baseline;">Span filters</strong><span style="vertical-align: baseline;"> (1), and the page updates as follows:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: 
baseline;">Other sections of the span filter pane (2), such as </span><strong style="vertical-align: baseline;">Span name</strong><span style="vertical-align: baseline;">, are updated with counts and percentages that reflect your service name selection. This helps you make your search criteria more specific.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The span </span><strong style="vertical-align: baseline;">Filter</strong><span style="vertical-align: baseline;"> bar (3) is updated to display the active filter.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The heatmap visualization (4) is updated to display only spans from the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> in the last hour (the default). You can change the time range using the time picker (5). The heatmap’s x-axis is time and the y-axis is span duration. 
Color shades denote the number of spans in each cell, and a legend shows the range each shade represents.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The </span><strong style="vertical-align: baseline;">Spans</strong><span style="vertical-align: baseline;"> table (6) is updated with matching spans sorted by duration (the default).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Other </span><strong style="vertical-align: baseline;">Chart view</strong><span style="vertical-align: baseline;">s (7) that you can switch to also reflect the applied filter.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The heatmap shows some spans in the &gt;100s range, which is abnormal and concerning. But first, you want to check the traffic and corresponding latency of calls handled by the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;">.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Span_rate_line_chart.max-1000x1000.jpg" alt="4_Span rate line chart"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Switching to the Span rate line chart shows the traffic handled by your service. The x-axis is time and the y-axis is spans/second. 
The traffic looks normal: from past experience, you know that 1.5-2 spans/second is typical.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Span_duration_percentile_chart.max-1000x1000.jpg" alt="5_Span duration percentile chart"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Switching to the Span duration percentile chart shows p50/p90/p95/p99 span duration trends. While p50 looks fine, the p90, p95, and p99 durations are higher than you expect for your service.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_Span_selection.max-1000x1000.jpg" alt="6_Span selection"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You switch back to the heatmap chart and select one of the outlier cells to investigate further. 
This cell contains two matching spans with durations of over 2 minutes, which is concerning.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7_Trace_details__span_attributes.max-1000x1000.jpg" alt="7_Trace details &amp; span attributes"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You open one of those spans in the full trace view and notice that the </span><span style="font-style: italic; vertical-align: baseline;">orders publish</span><span style="vertical-align: baseline;"> span accounts for most of the time spent servicing this request. Given this, you hypothesize that the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> is having trouble handling these types of calls. 
To test your hypothesis, you note that the span's </span><span style="font-style: italic; vertical-align: baseline;">rpc.method</span><span style="vertical-align: baseline;"> attribute is </span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;">, then close the trace with the X button.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/8_Custom_attribute_search.max-1000x1000.jpg" alt="8_Custom attribute search"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Using the Filter bar, you add an attribute filter with key </span><span style="font-style: italic; vertical-align: baseline;">rpc.method</span><span style="vertical-align: baseline;"> and value </span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;">. The results show a clear latency issue with </span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;"> calls handled by your service. 
You’ve seen this issue before and know that there is a runbook that addresses it, so you alert the SRE team with the appropriate action that needs to be taken to mitigate the incident.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/9_Send_feedback.max-1000x1000.jpg" alt="9_Send feedback"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Share your feedback with us via the </span><strong style="vertical-align: baseline;">Send feedback</strong><span style="vertical-align: baseline;"> button.</span></p> <h3><strong style="vertical-align: baseline;">Behind the scenes</strong></h3></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/10_Cloud_Trace_architecture.max-1000x1000.jpg" alt="10_Cloud Trace architecture"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This new experience is powered by BigQuery, using the same platform that backs </span><a href="https://cloud.google.com/blog/products/devops-sre/introducing-cloud-loggings-log-analytics-powered-by-big-query"><span style="text-decoration: underline; vertical-align: baseline;">Log Analytics</span></a><span style="vertical-align: baseline;">. 
We plan to launch new features that take full advantage of this platform: SQL queries, flexible sampling, export, and regional storage.</span></p> <p><span style="vertical-align: baseline;">In summary, you can use the new Cloud Trace explorer to perform service-oriented investigations with advanced querying and visualization of trace data. This allows developers and SREs to effectively troubleshoot production incidents and identify mitigating measures to restore normal operations.</span></p> <p><span style="vertical-align: baseline;">The new Cloud Trace explorer is generally available to all users — try it out and share your feedback with us via the </span><strong style="vertical-align: baseline;">Send feedback</strong><span style="vertical-align: baseline;"> button. </span></p></div>
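The investigation above boils down to a few operations on span data: filter by service and attribute, compute duration percentiles, count spans per heatmap cell, and narrow the span list to one selected cell. The Python sketch below reproduces those operations on in-memory records so the mechanics are easy to see. It is an illustration under stated assumptions, not Cloud Trace's API or its BigQuery-backed implementation: the `Span` class, the sample data, the duration bucket boundaries, and the nearest-rank percentile definition are all hypothetical choices.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Span:
    """Simplified, hypothetical span record (not a Cloud Trace type)."""
    service_name: str
    name: str
    start_s: float     # start time in seconds, relative to the query window (assumed)
    duration_s: float  # span duration in seconds
    attributes: Dict[str, str] = field(default_factory=dict)


def filter_spans(spans, service=None, attrs=None):
    """Keep spans from `service` whose attributes match every key/value in `attrs`."""
    attrs = attrs or {}
    return [
        s for s in spans
        if (service is None or s.service_name == service)
        and all(s.attributes.get(k) == v for k, v in attrs.items())
    ]


def percentile(values, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Upper bounds (seconds) of the duration buckets on the heatmap's y-axis (assumed).
DURATION_BOUNDS = [0.1, 1, 10, 100, math.inf]


def cell_of(span, time_bucket_s=60):
    """Map a span to its (time bucket, duration bucket) heatmap cell."""
    t = int(span.start_s // time_bucket_s)
    d = next(i for i, bound in enumerate(DURATION_BOUNDS) if span.duration_s <= bound)
    return (t, d)


def heatmap(spans, time_bucket_s=60):
    """Count spans per cell; a UI would map each count to a color shade."""
    return Counter(cell_of(s, time_bucket_s) for s in spans)


def spans_in_cell(spans, cell, time_bucket_s=60):
    """Narrow the span list to one selected heatmap cell."""
    return [s for s in spans if cell_of(s, time_bucket_s) == cell]


# Hypothetical sample data loosely modeled on the walkthrough.
SAMPLE = [
    Span("checkoutservice", "PlaceOrder", 5, 0.4, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 20, 130.0, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 75, 125.0, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "Check", 40, 0.01, {"rpc.method": "Check"}),
    Span("frontend", "GET /cart", 50, 0.3),
]

if __name__ == "__main__":
    checkout = filter_spans(SAMPLE, service="checkoutservice")
    durations = [s.duration_s for s in checkout]
    for p in (50, 90, 95, 99):
        print(f"p{p}: {percentile(durations, p)} s")
    # Cells in the top (>100 s) duration bucket are the outliers worth selecting.
    top = len(DURATION_BOUNDS) - 1
    for cell, count in sorted(heatmap(checkout).items()):
        if cell[1] == top:
            print(cell, count, [s.name for s in spans_in_cell(checkout, cell)])
    place_order = filter_spans(SAMPLE, "checkoutservice", {"rpc.method": "PlaceOrder"})
    print("PlaceOrder p50:", percentile([s.duration_s for s in place_order], 50))
```

With this sample, p50 for the service stays under a second while p90 and above land in the >100 s bucket, mirroring the "p50 looks fine, tail is high" pattern in the walkthrough. Nearest-rank is only one percentile definition; tools that interpolate between samples can report different values on small samples.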
  67. When Code Becomes Cheap, Engineering Becomes Governance

    Mon, 16 Mar 2026 12:54:50 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777.png 770w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777-290x124.png 290w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777-360x154.png 360w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-16T094110.777-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />AI agents can now generate in minutes what once took teams weeks, but the real shift is not faster coding. It is the collapse of code scarcity. As software production becomes industrialized, the job of engineers is moving from writing lines to governing systems, managing risk and deciding what should exist in the first place.
    <p><span style="font-weight: 400;"><a href="https://www.nytimes.com/2026/03/12/magazine/ai-coding-programming-jobs-claude-chatgpt.html" target="_blank" rel="noopener">The New York Times recently ran a long feature</a> with a dramatic headline declaring the end of computer programming as we know it. The article describes developers at Google, Amazon, startups and elsewhere who now spend their days talking to AI agents instead of writing code by hand. Some barely type at all. They describe what they want, review a plan, approve it, and the machines do the rest.</span></p> <p><span style="font-weight: 400;">If you learned to code in the old days, this sounds somewhere between science fiction and heresy.</span></p> <p><span style="font-weight: 400;">The piece follows engineers who once measured progress in painstaking increments. A good day might produce a few dozen lines of production code after hours of thinking, debugging, testing and rewriting. That was not laziness. That was reality. Every line had consequences. Every change could break something critical. 
Careful work was the job.</span></p> <p><span style="font-weight: 400;">Now those same engineers can generate thousands of lines in minutes. Entire features appear before lunch. Systems that once took weeks can be assembled in hours. One founder quoted in the article says productivity gains feel like ten to one, twenty to one, even a hundred to one.</span></p> <p><span style="font-weight: 400;">That part is not hype. Anyone actually using these tools knows it is real.</span></p> <p><span style="font-weight: 400;">But the article frames this as the end of programming. That is not quite right. Programming is not disappearing. It is being industrialized.</span></p> <p><span style="font-weight: 400;">We have seen this movie before.</span></p> <p><span style="font-weight: 400;">Assembly gave way to higher-level languages. Manual memory management gave way to garbage collection. Libraries replaced handwritten routines. Open source replaced writing everything from scratch. Each step removed drudgery and raised abstraction. Each time, old hands warned that real programming was dying. Each time, software kept getting bigger, more complex and more important.</span></p> <p><span style="font-weight: 400;">AI is not another step on that ladder. It is an elevator.</span></p> <p><span style="font-weight: 400;">Code production is no longer constrained by human typing speed or cognitive bandwidth. Machines can produce far more code than any team of humans ever could. The marginal cost of trying an idea approaches zero. If something does not work, you throw it away and generate another version.</span></p> <p><span style="font-weight: 400;">Scarcity is gone.</span></p> <p><span style="font-weight: 400;">And when scarcity disappears, behavior changes everywhere.</span></p> <p><span style="font-weight: 400;">Startups can build products with tiny teams. Enterprises can modernize ancient systems faster. 
Individuals with no formal training can create software that once required a full department. The Times article even describes a print shop manager who built a working application without understanding the code it produced.</span></p> <p><span style="font-weight: 400;">That is not the end of programming. That is the democratization of it.</span></p> <p><span style="font-weight: 400;">But there is a catch. Software is not valuable because code exists. It is valuable because systems behave correctly, reliably and securely over time. AI can generate code far faster than humans can understand, validate, integrate or maintain it.</span></p> <p><span style="font-weight: 400;">So the bottleneck moves.</span></p> <p><span style="font-weight: 400;">For decades, the limiting factor in software development was producing code. Now the limiting factor is controlling what that code does once it exists.</span></p> <p><span style="font-weight: 400;">Welcome to the governance era.</span></p> <p><span style="font-weight: 400;">Engineers are becoming supervisors of automated production. The job is shifting from writing instructions to defining constraints. Instead of crafting every line, developers decide what should exist, what must never exist, what risks are acceptable, how systems fit together and how failures are contained.</span></p> <p><span style="font-weight: 400;">In other words, less construction worker, more building inspector.</span></p> <p><span style="font-weight: 400;">Large companies already live in this world. In mature codebases, engineers spend much of their time reviewing changes, managing dependencies, planning releases and responding to incidents. The Times article notes that even at Google, where AI is deeply integrated, productivity gains are meaningful but not explosive because safety and integration still dominate. 
You cannot just spray new code into systems that billions of people rely on.</span></p> <p><span style="font-weight: 400;">Startups, on the other hand, are in greenfield territory. Fewer constraints, fewer legacy systems, lower blast radius. That is why they are seeing the wild gains.</span></p> <p><span style="font-weight: 400;">But even there, cheap code can create expensive problems. <a href="https://devops.com/cultural-debt-vs-technical-debt-in-infrastructure-automation/" target="_blank" rel="noopener">Technical debt</a> accumulates faster. Security vulnerabilities multiply. Systems become harder to reason about. Velocity can outrun understanding.</span></p> <p><span style="font-weight: 400;">Cheap code is like cheap plastic. Useful everywhere, dangerous in bulk.</span></p> <p><span style="font-weight: 400;">There is also a human dimension the article touches on but does not fully explore. Junior engineers traditionally learned by doing the tedious work. Writing tests. Fixing small bugs. Implementing well-defined features. That apprenticeship path built intuition. If AI handles that work first, where does the next generation gain experience?</span></p> <p><span style="font-weight: 400;">We may be trading short-term productivity for long-term expertise.</span></p> <p><span style="font-weight: 400;">None of this means programmers are obsolete. If anything, experienced engineers become more valuable. When machines produce endless output, judgment becomes the scarce resource. Knowing what to build, what to ignore, what will fail in production and what will age well matters more than ever.</span></p> <p><span style="font-weight: 400;">The new elite skill is not typing code. It is steering complexity.</span></p> <p><span style="font-weight: 400;">For years, we told young people to learn to code. That was good advice in a world where code production was the bottleneck. 
In the emerging world, the better advice might be to learn systems thinking, architecture, security, operations and critical evaluation. The machines can generate syntax. They cannot take responsibility.</span></p> <p><span style="font-weight: 400;">The Times article describes developers who feel both exhilarated and uneasy. That mix is telling. They sense they have been handed enormous power without clear rules for using it. Productivity gains feel intoxicating, but the long-term implications are murky.</span></p> <p><span style="font-weight: 400;">This is not the end of programming as we know it. It is the end of programming as manual labor.</span></p> <p><span style="font-weight: 400;">Software development is becoming less about making code and more about managing the consequences of code. The organizations that thrive will not be the ones that generate the most lines. They will be the ones that keep systems stable, secure and comprehensible despite the flood.</span></p> <p><span style="font-weight: 400;">When code becomes cheap, engineering becomes governance.</span></p> <p><span style="font-weight: 400;">And governance, unlike typing, cannot be automated away.</span></p>
  68. Gemini CLI Plan Mode Separates Thinking From Doing — and Makes Read-Only the Default

    Mon, 16 Mar 2026 12:26:09 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2022/04/securecoding.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2022/04/securecoding.jpg 770w, https://devops.com/wp-content/uploads/2022/04/securecoding-290x124.jpg 290w, https://devops.com/wp-content/uploads/2022/04/securecoding-360x154.jpg 360w, https://devops.com/wp-content/uploads/2022/04/securecoding-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2022/04/securecoding-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />Google’s Gemini CLI Plan Mode enforces read-only, research-first workflows (higher-reasoning models for strategy, ask_user prompts for clarification, and read-only MCP integration) so that agents propose vetted implementation plans before making code changes.
    <p><span style="font-weight: 400;">The pattern across AI coding tools this week has been clear: the industry is building governance, review, and safety mechanisms as fast as it&#8217;s building capabilities. Google&#8217;s latest contribution is plan mode for <a href="https://devops.com/gemini-code-assist-gets-agent-auto-approve-inline-diffs-and-custom-commands-to-speed-up-the-core-coding-loop/" target="_blank" rel="noopener">Gemini CLI, announced March 11</a>, and now enabled by default for all users.</span></p> <p><span style="font-weight: 400;">Plan mode puts Gemini CLI in a read-only state where the agent can navigate your codebase, search for patterns, read documentation, and map dependencies — but it cannot modify any files except its own internal plans. The agent researches your request, asks clarifying questions, and proposes a strategy for your review before any code changes are made.</span></p> <p><span style="font-weight: 400;">The idea is simple: Think before you act. 
The implementation has some features that make it more interesting than it sounds.</span></p> <h3><b>How it Works</b></h3> <p><span style="font-weight: 400;">Enter plan mode by typing </span><span style="font-weight: 400;">/plan</span><span style="font-weight: 400;">, pressing Shift+Tab, or asking the agent to &#8220;start a plan for&#8221; whatever you need. Gemini CLI restricts itself to read-only tools — </span><span style="font-weight: 400;">read_file</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">grep_search</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">glob</span><span style="font-weight: 400;"> — and can use specialized sub-agents, such as the codebase investigator, to map system dependencies.</span></p> <p><span style="font-weight: 400;">The agent creates an implementation plan as a Markdown file. You can review it, edit it directly, or provide feedback in the conversation. When you approve, Gemini CLI switches to an edit-capable mode for implementation.</span></p> <p><span style="font-weight: 400;">Model routing adds an important dimension. In plan mode, Gemini CLI automatically routes to higher-reasoning Pro models — specifically Gemini 3.1 Pro — for architectural decisions. When it shifts to implementation, it routes to faster models. Strategy gets the reasoning model. Tactics get the speed model.</span></p> <h3><b>The ask_user Tool</b></h3> <p><span style="font-weight: 400;">Plan mode introduces a new </span><span style="font-weight: 400;">ask_user</span><span style="font-weight: 400;"> tool that changes the dynamic between the developer and agent. 
Instead of making assumptions about your intent, the agent can pause its research and ask targeted questions — present options, request clarification on an architectural choice, or ask where a hidden configuration file lives.</span></p> <p><span style="font-weight: 400;">This bidirectional communication during the planning phase means the plan that emerges actually reflects what you want, not what the model guessed you wanted. It&#8217;s a direct response to one of the most common failure modes in AI-assisted development: an agent confidently implementing the wrong thing because it was never asked.</span></p> <h3><b>Read-Only MCP Integration</b></h3> <p><span style="font-weight: 400;">Plan mode isn&#8217;t limited to local files. It supports read-only MCP tools, which means the Gemini CLI can pull context from your entire developer stack during the planning phase — read a GitHub issue, inspect a Postgres schema, search Google Docs — all without risking any modification to your codebase or external systems.</span></p> <p><span style="font-weight: 400;">For DevOps teams, this is significant. Planning a database migration? The agent can read the current schema, check the issue tracker for related tickets, and review existing documentation before proposing an approach. All in read-only mode. The codebase stays untouched until you explicitly approve the plan and switch modes.</span></p> <h3><b>Conductor: The Orchestration Layer</b></h3> <p><span style="font-weight: 400;">Plan mode becomes especially powerful with Conductor, the Gemini CLI extension for context-driven development. Conductor organizes work into &#8220;tracks&#8221; with written specifications and task-oriented plans stored as persistent Markdown files in your repository — not ephemeral chat logs.</span></p> <p><span style="font-weight: 400;">Conductor now leverages plan mode for research phases, performing exhaustive pre-flight checks with zero risk. 
It uses </span><span style="font-weight: 400;">ask_user</span><span style="font-weight: 400;"> to confirm critical decisions at each milestone. The workflow follows a clear progression: context, spec and plan, then implement.</span></p> <p><span style="font-weight: 400;">Google is working on bringing Conductor into Gemini CLI as a built-in mode, a signal of how central the plan-first approach is becoming to their agent strategy.</span></p> <p><span style="font-weight: 400;">“Google&#8217;s Gemini CLI Plan Mode signals a shift in how AI coding agents are governed, moving approval control from autonomous execution to deliberate, human-confirmed workflows before any changes are applied. This positions Google to compete directly for enterprise adoption where deployment risk tolerance is low and audit requirements are non-negotiable,” according to Mitch Ashley, </span><span style="font-weight: 400;">VP and practice lead for software lifecycle engineering at</span><a href="https://futurumgroup.com/" target="_blank" rel="noopener"> <span style="font-weight: 400;">The Futurum Group</span></a>.</p> <p><span style="font-weight: 400;">“In practice, teams evaluating agentic coding tools will treat plan-first execution as a baseline governance requirement. Vendors that treat autonomous execution as the default will face procurement friction as enterprise buyers require explicit control checkpoints before granting agents broader operational autonomy.”</span></p> <h3><b>Why This Matters for DevOps</b></h3> <p><span style="font-weight: 400;">Plan mode addresses a specific anxiety every team using AI coding agents has experienced: the agent that starts making changes before you&#8217;ve agreed on an approach. Read-only exploration as the default flips the assumption from &#8220;act first, review after&#8221; to &#8220;research first, act when approved.&#8221;</span></p> <p><span style="font-weight: 400;">This connects to a broader pattern. 
IronCurtain enforces deterministic policy outside the model. VS Code hooks execute commands at agent lifecycle points. Anthropic&#8217;s Code Review dispatches agent teams before the merge. Gemini Code Assist&#8217;s Auto Approve lets the agent execute, and you review after. Each represents a different point on the agent autonomy spectrum.</span></p> <p><span style="font-weight: 400;">Plan mode sits at the conservative end — and for database migrations, major refactors, and multi-service features, that&#8217;s exactly where teams want to start. Spending 20 minutes in read-only planning before the agent writes a line of code isn&#8217;t overhead. It&#8217;s risk management.</span></p> <p><span style="font-weight: 400;">The model routing is the quiet differentiator. High-reasoning models for planning, faster models for execution — the same strategy-vs-tactics separation that Random Labs&#8217; Slate implements at the system level, implemented here at the model routing level.</span></p> <p><span style="font-weight: 400;">The extensibility matters too. Plan mode exposes </span><span style="font-weight: 400;">enter_plan_mode</span><span style="font-weight: 400;">, </span><span style="font-weight: 400;">exit_plan_mode</span><span style="font-weight: 400;">, and </span><span style="font-weight: 400;">ask_user</span><span style="font-weight: 400;"> as tools that custom extensions can build on. Teams can define organizational planning workflows and enforce policies during the research phase using plan mode as the foundation.</span></p> <p><span style="font-weight: 400;">Enter plan mode with </span><span style="font-weight: 400;">/plan</span><span style="font-weight: 400;"> in Gemini CLI. Enabled by default.</span></p>
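Plan mode's core guarantees (read-only tools while planning, writes limited to the agent's own plan files, and a model switch once the plan is approved) can be sketched as a small policy gate. This is a hypothetical illustration, not how Gemini CLI is implemented or configured: only `read_file`, `grep_search`, and `glob` come from the article, while the write-tool names, the `plans/` directory, and the model labels are placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional


class Mode(Enum):
    PLAN = "plan"            # read-only research and planning (the default)
    IMPLEMENT = "implement"  # edit-capable, entered only after approval


# Read-only tools named in the article.
READ_ONLY_TOOLS = {"read_file", "grep_search", "glob"}
# Hypothetical names for file-modifying tools.
WRITE_TOOLS = {"write_file", "edit_file"}
# Hypothetical directory holding the agent's own Markdown plans.
PLAN_DIR = PurePosixPath("plans")
# Placeholder labels: higher-reasoning model for strategy, faster model for tactics.
MODEL_FOR_MODE = {Mode.PLAN: "pro-model", Mode.IMPLEMENT: "fast-model"}


@dataclass
class ToolCall:
    tool: str
    path: Optional[str] = None


def is_plan_file(path: str) -> bool:
    """True only for Markdown files directly inside PLAN_DIR ('..' paths fail)."""
    p = PurePosixPath(path)
    return p.parent == PLAN_DIR and p.suffix == ".md"


def is_allowed(mode: Mode, call: ToolCall) -> bool:
    """Decide whether a tool call may run in the given mode."""
    if call.tool in READ_ONLY_TOOLS:
        return True
    if call.tool in WRITE_TOOLS:
        if mode is Mode.IMPLEMENT:
            return True
        # In plan mode, the only writable files are the agent's own plans.
        return call.path is not None and is_plan_file(call.path)
    # Unknown tools are denied in every mode (fail closed).
    return False


class Session:
    """Tracks the mode; leaving plan mode requires an explicit approval step."""

    def __init__(self) -> None:
        self.mode = Mode.PLAN

    @property
    def model(self) -> str:
        return MODEL_FOR_MODE[self.mode]

    def approve_plan(self) -> None:
        self.mode = Mode.IMPLEMENT


if __name__ == "__main__":
    session = Session()
    for call in [ToolCall("grep_search"), ToolCall("write_file", "src/db.py"),
                 ToolCall("write_file", "plans/migration.md")]:
        print(session.mode.value, session.model, call.tool, call.path,
              is_allowed(session.mode, call))
    session.approve_plan()
    print(session.mode.value, session.model,
          is_allowed(session.mode, ToolCall("write_file", "src/db.py")))
```

Denying unknown tools by default is a deliberate choice in this sketch; a real policy layer would also need an explicit allowlist for the read-only MCP tools the article mentions.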
  69. The Green Side of Observability: Why Less Data Can Mean More Insight

    Mon, 16 Mar 2026 09:49:38 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2023/04/Sustainability.png" class="attachment-large size-large wp-post-image" alt="sustainability, environmentally, DevOps, energy, cloud, carbon" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2023/04/Sustainability.png 770w, https://devops.com/wp-content/uploads/2023/04/Sustainability-290x124.png 290w, https://devops.com/wp-content/uploads/2023/04/Sustainability-360x154.png 360w, https://devops.com/wp-content/uploads/2023/04/Sustainability-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2023/04/Sustainability-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="sustainability, environmentally, DevOps, energy, cloud, carbon" decoding="async" />Observability can generate massive data volumes. Learn how sustainable observability reduces telemetry waste, lowers costs and improves insight.
    <p>When we think about sustainability in software, the conversation often revolves around efficient algorithms, optimized cloud usage, or <a href="https://devops.com/reducing-energy-consumption-with-cloud-cost-efficiency/" target="_blank" rel="noopener">energy-conscious infrastructure</a>. Rarely do we consider observability, the practice that allows us to understand systems, maintain reliability, and troubleshoot issues, as part of the equation. Yet every metric collected, every log retained, and every dashboard query consumes energy. At scale, this translates into a measurable carbon footprint.</p> <h3><strong>The Observability Sustainability Paradox</strong></h3> <p>Modern software systems are complex, distributed, and highly dynamic. Observability — collecting metrics, logs, and traces — is essential for understanding these systems. However, the very practices that make observability effective can also make it wasteful. 
High-cardinality metrics, verbose logging, long retention periods, and large numbers of complex dashboards increase storage and compute requirements, which in turn drive energy consumption and carbon emissions. This creates an <strong>observability sustainability paradox</strong>: the more data we collect to gain insights, the more energy we consume, potentially undermining sustainability goals. Treating observability’s capacity as unlimited may solve operational problems in the short term, but it carries hidden ecological and cost consequences.</p> <h3><strong>Applying Green Software Principles</strong></h3> <p>Sustainable observability addresses this paradox by applying <strong>green software principles</strong> to telemetry and monitoring. The <a href="https://greensoftware.foundation/" target="_blank" rel="noopener">Green Software Foundation</a> emphasizes designing software to optimize energy efficiency, minimize waste, and account for environmental impact. In observability, this translates into strategies for reducing unnecessary data collection, optimizing queries, controlling retention, and designing energy-aware pipelines. Sustainable observability is not just about limiting data; it is about designing it with purpose, balancing operational insight with environmental responsibility.</p> <h3><strong>Lessons from Practice</strong></h3> <p>From a practitioner’s perspective, what works is often a combination of thoughtful design, experimentation, and continuous measurement. Teams have found that reducing high-cardinality metrics to only those that provide actionable insight immediately lowers storage and compute usage. Sampling traces intelligently, rather than capturing every request, preserves the signal while slashing power usage. Aggregating and compressing historical data instead of retaining it indefinitely also yields measurable savings. 
Importantly, these changes do not compromise reliability; in fact, they often make monitoring more precise by reducing noise and alert fatigue.</p> <h3><strong>Technical Strategies for Sustainable Observability</strong></h3> <p>At the technical level, this involves rethinking every stage of telemetry generation, collection, and storage. Metrics should be high-signal and low-noise, traces and logs sampled intelligently, and retention policies balanced between operational, compliance, and sustainability requirements. Engineers experimenting with asynchronous telemetry pipelines and aggregation at ingestion rather than post-facto have reported significant improvements in system performance and energy efficiency.</p> <p>Kubernetes environments offer concrete opportunities to implement these strategies. Sidecars, agents, and exporters can be deployed efficiently, avoiding redundant telemetry across layers. Scraping intervals can be tuned to balance observability needs and energy consumption. Teams that measure the energy cost of observability workloads alongside operational metrics gain a holistic view of system impact and can iterate toward more efficient designs.</p> <h3><strong>Open Source Tools Driving Change</strong></h3> <p>Open source CNCF projects provide tools that support sustainable observability at scale. <a href="https://opentelemetry.io/" target="_blank" rel="noopener">OpenTelemetry</a> offers a standardized framework for generating and exporting metrics, logs, and traces. Its flexibility allows teams to implement sampling, filtering, and aggregation strategies that reduce data volume while preserving insight. Real-world experience shows its sustainability potential hinges on deliberate design choices, with platform teams and vendors helping ensure consistent best practices. 
<a href="https://github.com/sustainable-computing-io/kepler" target="_blank" rel="noopener">Kepler</a>, a CNCF Sandbox project, monitors cloud-native energy usage, enabling teams to quantify energy consumption across nodes, containers, and virtual machines. By integrating energy telemetry with system-level metrics, organizations can correlate performance with environmental cost, enabling smarter trade-offs that prioritize efficiency without sacrificing reliability.</p> <h3><strong>Benefits of Sustainable Observability</strong></h3> <p>The benefits extend far beyond environmental impact. Reducing telemetry volume improves dashboard responsiveness, lowers alert fatigue, and decreases storage and compute costs. Engineers spend less time filtering noisy data and more time acting on actionable insights. Systems become easier to monitor, troubleshoot, and scale. Sustainable observability demonstrates that environmental responsibility and operational excellence reinforce each other rather than conflict.</p> <h3><strong>Culture, Measurement, and Accountability</strong></h3> <p>To fully embrace sustainable observability, cultural and organizational change is essential. Engineers and platform teams must treat resource efficiency as a first-class concern, measuring the impact of observability itself. Metrics such as the ratio of actionable alerts to total telemetry collected or the energy consumed by observability workloads make sustainability tangible. Visibility into the energy footprint of telemetry helps teams make conscious decisions about what to instrument, how to store it, and how to query it.</p> <p>Sustainable observability also challenges assumptions about scale. Engineers must critically evaluate which metrics, logs, and traces are truly necessary and which are collected by habit. Experimentation with sampling, aggregation, and retention strategies helps teams find the minimum data required for operational excellence. 
Efficiency-focused observability delivers high-performance insights with lower energy and storage costs, demonstrating that sustainability and reliability are mutually reinforcing.</p> <h3><strong>The Role of the Community</strong></h3> <p>While CNCF projects provide a foundation, the observability community has not yet prioritized reducing data bloat and carbon footprint at scale. Too often, teams instrument exhaustively, and platform defaults favor retention and verbosity over efficiency. By fostering community-driven initiatives, sharing best practices, and expanding open-source tools to measure and optimize telemetry energy use, the ecosystem can become more conscious of its environmental impact. Collaboration across vendors, platform teams, and developers can drive consistent application of green software principles, encouraging sampling, aggregation, and retention strategies that balance operational insight with sustainability.</p> <p>Through intentional community engagement, observability projects can provide built-in guidance on energy-aware telemetry, helping teams reduce waste, control costs, and make data-driven decisions that account for both reliability and environmental impact. By applying energy-aware design principles, optimizing pipelines, and embedding sustainable practices into observability workflows, engineers can build systems that are resilient, efficient, and environmentally responsible. Only through collective effort, community collaboration, and open source innovation can the field reduce its carbon footprint and align with sustainable software principles. In doing so, the software we operate today can support a digital ecosystem that is both reliable and sustainable.</p>
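Intelligent trace sampling, one of the practices described above, is often implemented as head sampling keyed on the trace ID, so every service reaches the same keep/drop decision for a trace without coordinating. The sketch below is an illustration in the spirit of OpenTelemetry's `TraceIdRatioBased` sampler, not its actual code; the `should_sample` and `sampling_bound` helpers and the 10% ratio are assumptions chosen for the example.

```python
import random

# Only the low 64 bits of the 128-bit trace ID are compared against the
# threshold, which makes the decision a pure function of the trace ID.
TRACE_ID_LIMIT = (1 << 64) - 1


def sampling_bound(ratio: float) -> int:
    """Convert a ratio in [0, 1] into an integer threshold over 64-bit space."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return round(ratio * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep the trace if its low 64 bits fall below the threshold."""
    return (trace_id & TRACE_ID_LIMIT) < sampling_bound(ratio)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a repeatable demo
    trace_ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(should_sample(t, 0.10) for t in trace_ids)
    # Roughly 10% of traces are kept; all spans sharing a trace ID get
    # the same decision, so traces are never partially recorded.
    print(f"kept {kept} of {len(trace_ids)} traces")
```

Head sampling like this is cheap but blind to outcomes; teams that must keep every error or slow trace often pair it with tail-based sampling in a collector, at the cost of buffering spans before deciding.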
  70. Five Great DevOps Job Opportunities

    Mon, 16 Mar 2026 09:39:02 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735.png 770w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735-290x124.png 290w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735-360x154.png 360w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-15T125908.735-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />Weekly DevOps jobs roundup, this week highlighting top roles in Massachusetts, New Jersey, Chicago, Charlotte and Seattle, with pay ranges and hiring trends to help DevOps pros advance careers.
    <p>DevOps.com is now providing a weekly DevOps jobs report through which opportunities for DevOps professionals will be highlighted as part of an effort to better serve our audience.</p> <p>Our goal in these challenging economic times is to make it just that much easier for DevOps professionals to advance their careers.</p> <p>Of course, the pool of available DevOps talent is still relatively constrained, so when one DevOps professional takes on a new role, it tends to create opportunities for others.</p> <p>The five job postings shared this week are selected based on the company looking to hire, the vertical industry segment and naturally, the pay scale being offered.</p> <p>We’re also committed to providing additional insights into the state of the <a href="https://devops.com/is-the-future-of-devops-daas/" target="_blank" rel="noopener">DevOps job market</a>. 
In the meantime, for your consideration.</p> <p><strong>Dice.com</strong></p> <p>Randstad Digital Manual<br /> Woburn, MA<br /> <a href="https://www.dice.com/job-detail/0d9519c4-9ef0-4a6c-8b60-ffc284d1a4c5" target="_blank" rel="noopener">Senior DevOps Engineer</a><br /> $178,132 to $178,150</p> <p><strong>SimplyHired.com</strong></p> <p>Central Reach<br /> Holmdel, NJ<br /> <a href="https://www.simplyhired.com/search?q=DevOps&amp;l=&amp;t=7&amp;jt=CF3CP&amp;job=ATO5xJkHXQcgKvBVJtS_w20AMhYotlLJg1sm-YZdco27slb9ddEKuA" target="_blank" rel="noopener">Senior DevOps Engineer &#8211; Cloud Security</a><br /> $160,000 to $180,000</p> <p><strong>Indeed.com</strong></p> <p>Request Technology<br /> Chicago<br /> <a href="https://www.indeed.com/jobs?q=DevOps&amp;l=&amp;fromage=7&amp;sc=0kf%3Aattr%28CF3CP%29%3B&amp;from=searchOnDesktopSerp&amp;vjk=6b82b6a5e3a050cd&amp;advn=5269717615756598" target="_blank" rel="noopener">Senior Site Reliability Engineer</a> (SRE)<br /> $145,000</p> <p><strong>LinkedIn</strong></p> <p>Scout Motors<br /> Charlotte, NC<br /> <a href="https://www.linkedin.com/jobs/search-results/?currentJobId=4351268597&amp;eBP=NOT_ELIGIBLE_FOR_CHARGING&amp;refId=KLKME991ecVaEzWXgFyD0A%3D%3D&amp;trackingId=iieRf3JGEwPoI5iVSyVeJg%3D%3D&amp;keywords=DevOps&amp;origin=JOB_SEARCH_PAGE_JOB_FILTER&amp;referralSearchId=5aVclgPHqYevUTCqLgVFGw%3D%3D&amp;f_TPR=r604800&amp;f_SAL=f_SA_id_226001%3A272015" target="_blank" rel="noopener">Senior DevOps Engineer</a><br /> $140,000 to $170,000</p> <p><strong>The Job Network</strong></p> <p>Valorem Reply<br /> Seattle<br /> <a href="https://www.thejobnetwork.com/jobs/full-time?search=DevOps&amp;jobId=2147483645&amp;backfillid=pex-job_index-entity%3Anode%2F24e334fa-927d-467d-b5eb-aa66c15357d6" target="_blank" rel="noopener">DevOps Engineer</a><br /> $120,000 to $140,000</p>
  71. Survey: AI Coding Exacerbates Existing DevOps Workflow Issues

    Fri, 13 Mar 2026 20:37:07 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935.png 770w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935-290x124.png 290w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935-360x154.png 360w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-13T163611.935-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />A global survey of 700 software engineering practitioners published this week finds that, thanks to increased reliance on artificial intelligence (AI) coding tools, just over a third (35%) are achieving daily or more frequent product deployments, with 36% deploying software multiple times per week. However, more than half (51%) also noted AI-generated code leads [&#8230;]
    <p>A <a href="https://www.prnewswire.com/news-releases/harness-report-reveals-ai-coding-accelerates-development-devops-maturity-in-2026-isnt-keeping-pace-302710937.html">global survey of 700 software engineering practitioners</a> published this week finds that, thanks to increased reliance on artificial intelligence (AI) coding tools, just over a third (35%) are achieving daily or more frequent product deployments, with 36% deploying software multiple times per week. 
However, more than half (51%) also noted AI-generated code leads to deployment problems at least half the time.</p> <p>Conducted by the market research firm Coleman Parkes on behalf of Harness, the survey also finds that more than three-quarters (78%) admit they have fragmented delivery toolchains, with 70% of respondents also conceding their pipelines are plagued by flaky tests and deployment failures.</p> <p>More than three-quarters (77%) said teams often need to wait on others for routine delivery work before they can ship code, and only 21% said they can add functioning build and deploy pipelines to an environment in under two hours.</p> <p>Nearly three-quarters (72%) also said they have hardly any standardized templates and “golden paths” for services and pipelines, and the same percentage (72%) said they believe their current ways of working are not sustainable over the long term.</p> <p>Rahul Sood, general manager for Harness, said the survey results suggest that while developers are more productive in the age of AI software engineering, teams are also encountering existing bottlenecks in their DevOps pipelines more frequently. Many of those issues can be resolved if more tasks can be automatically assigned and completed both as code is being written and as it moves through a DevOps pipeline, he added.</p> <p>A full 84% of respondents are already using AI tools daily for coding tasks, followed by quality assurance testing (68%), performance/cost optimization (63%) and refactoring (62%). Nevertheless, more than a third of a developer&#8217;s time (36%) is still spent on repetitive manual tasks, the survey finds.</p> <p>At the same time, three-quarters (75%) said pressure to ship quickly has contributed to burnout. 
A total of 71% of developers work evenings or weekends at least once per week because of release-related tasks or production issues.</p> <p>It’s obviously still early days so far as adoption of AI coding tools is concerned, but as the pace at which code is being developed increases, any issue that already existed in a DevOps workflow is further exacerbated. In many cases, DevOps teams are encountering the same bottlenecks they already had, only a lot more often. For example, 86% of respondents agreed that security and compliance checks need to be more automated to meet delivery timelines.</p> <p>Less clear is the impact AI will have on the size of software engineering teams. While some organizations have reduced the size of these teams because of anticipated productivity gains, many organizations are focused on enabling their existing teams to build and deploy software faster.</p> <p>Regardless of how many software engineers might be employed, one thing is certain: software development in the age of AI will never be the same again. What remains to be seen is to what degree individual developers, instead of writing code, become software architects who oversee a small army of AI agents.</p>
  72. Low-Code’s New Frontier: Tailored Solutions for Each Industry

    Fri, 13 Mar 2026 14:25:46 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583.png 770w, https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583-290x124.png 290w, https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583-360x154.png 360w, https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2025/10/770-330-2025-10-08T123205.583-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />For years, most low-code platforms have focused on one primary challenge: efficiency. The goal was to help teams build applications faster and with less effort, reducing manual coding, speeding up iterations, empowering non-developers, and enabling apps to be created in just a few clicks. That focus delivered real value, but it’s no longer enough. Today, [&#8230;]
    <p>For years, most low-code platforms have focused on one primary challenge: efficiency. The goal was to help teams build applications faster and with less effort, reducing manual coding, speeding up iterations, empowering non-developers, and enabling apps to be created in just a few clicks. That focus delivered real value, but it’s no longer enough.</p> <p>Today, the low-code conversation is shifting. While automation and speed still matter, they are no longer what sets platforms apart. The next phase of low-code is about fit: how well a platform supports the real-world needs of specific industries.</p> <p>This new frontier moves beyond simply closing productivity gaps or automating workflows. It’s about building applications that reflect the realities of regulated environments, complex data models, existing systems, and industry-specific processes. Low-code is becoming more context-aware.</p> <p>As a result, industry alignment is emerging as a key differentiator. 
Platforms that understand the nuances of healthcare, finance, government, manufacturing, and other sectors can deliver far more value than generic, one-size-fits-all tools.</p> <p>So what does industry fit really mean, and how does it reshape the purpose of low-code platforms today?</p> <h3><strong>Why “One-Size-Fits-All” No Longer Works</strong></h3> <p>As organizations race to improve efficiency, reduce costs, and scale their digital capabilities, low-code platforms are among the most essential tools driving this transformation. According to <a href="https://www.forrester.com/blogs/the-low-code-market-could-approach-50-billion-by-2028/">Forrester’s survey</a>, 87% of enterprise developers use low-code development platforms for at least some of their development work. <a href="https://www.appbuilder.dev/whitepapers/2025-app-development-trends">App Builder’s 2025 survey</a> on app development trends also indicates that 95% of companies have used low-code tools in the past 12 months for recent projects or ongoing software development.</p> <p><img fetchpriority="high" decoding="async" class="wp-image-183247 aligncenter" src="https://devops.com/wp-content/uploads/2026/03/Screen-Shot-2026-03-12-at-11.16.15-AM.png" alt="" width="679" height="341" srcset="https://devops.com/wp-content/uploads/2026/03/Screen-Shot-2026-03-12-at-11.16.15-AM.png 1434w, https://devops.com/wp-content/uploads/2026/03/Screen-Shot-2026-03-12-at-11.16.15-AM-259x130.png 259w, https://devops.com/wp-content/uploads/2026/03/Screen-Shot-2026-03-12-at-11.16.15-AM-307x154.png 307w, https://devops.com/wp-content/uploads/2026/03/Screen-Shot-2026-03-12-at-11.16.15-AM-400x201.png 400w" sizes="(max-width: 679px) 100vw, 679px" /></p> <p>Today, low-code practices are moving away from generic workflows and logic. 
It’s a volatile time, with political crises across the globe, economic turbulence, and challenging business conditions amid the rise of AI that simultaneously disrupts one company and fosters another. All of these have become factors impacting the low-code market and app development. To meet these challenges, organizations need to retain tighter control over risk, allocate resources and teams properly, implement tools that can be efficient within unstable budgets, abide by evolving regulations, etc.</p> <p>Still, different industries have fundamentally different constraints, operations, and goals:</p> <ul> <li>Companies in the financial sector require auditability and strict access control.</li> <li>Healthcare operates under strict compliance and data privacy requirements.</li> <li>The public sector often relies on on-premises deployment and stable support cycles.</li> </ul> <p>A scenario that works perfectly in healthcare may not fit financial operations. A solution that is designed to protect patient data under HIPAA and safeguard IP for clinical research may be poorly suited to processes that enforce internal audit trails, segregation of duties, and risk management policies.</p> <p>Therefore, abstraction and code flexibility alone are not enough to guarantee that an app generated with a low-code tool will meet usability, compliance, and long-term viability requirements. Regulated or operationally complex industries require industry-aware low-code tools that not only automate development cycles but also fit how those industries actually operate.</p> <p>As a direct implication, we see a shift from horizontal, general-purpose tooling toward vertical adoption, with platforms deeply tailored to a specific niche or sector (e.g., healthcare, finance, government and public sector, manufacturing, and others). 
In other words, specialized low-code platforms must step in.</p> <p>This shift demands low-code platforms that can:</p> <ul> <li>Operate with real-world industry data, teams, and production systems.</li> <li>Align with strict security, regulatory, and data privacy requirements.</li> <li>Support industry data models and predefined data structures.</li> <li>Provide pre-built connectors to commonly used systems.</li> <li>Integrate with enterprise CI/CD and DevOps pipelines.</li> <li>Enable clear code ownership models, ensuring transparency, customization, and freedom from vendor lock-in.</li> <li>Deliver a role-based user experience, with tailored interfaces, dashboards, and permissions.</li> <li>Support a mobile-first approach and offline usage.</li> </ul> <p>Every industry has unique requirements, and as a result, low-code tools must not only speed up application development but also support how applications are structured, governed, and maintained over time.</p> <h3><strong>On-Prem and Controlled Environments</strong></h3> <p>For regulated industries, on-prem and tightly controlled environments are no longer optional. They’re essential. Many organizations operating under strict compliance or security requirements simply cannot rely on shared public SaaS environments. Instead, they require on-premises or private cloud deployments, full network isolation, and strict control over identity and access management.</p> <p>As a result, the key question has shifted from how quickly an application can be delivered to whether it can run securely and reliably within a highly controlled environment. This is where low-code platforms must evolve. Modern low-code tools need to accelerate application development without sacrificing control over infrastructure, data, or deployment models. 
For many organizations, this means choosing an on-premises low-code solution that can deliver self-hosted, secure deployment without external data transmission or cloud dependencies; seamless integration with existing systems, internal APIs, databases, and mission-critical infrastructure; and enterprise-grade architecture, including high availability and operational resilience.</p> <p>Rather than asking development teams to adapt generic low-code platforms to complex environments, today’s platforms are expected to embed industry-specific elements (such as data structures, process rigor, regulatory safeguards, etc.) directly into the applications being built. The result is not just faster development, but systems that operate correctly, securely and predictably within their intended context.</p> <h3><strong>The Future of Low-Code Is Context-Aware</strong></h3> <p>The next phase of low-code isn’t about avoiding engineering. It&#8217;s about aligning with it. The most effective platforms will be context-aware: built to support industry-specific requirements and architectural clarity, integrate with DevOps practices, and adapt to people, processes, and platforms rather than trying to replace them.</p> <p>As markets and technologies continue to evolve, low-code platforms that adapt to industry needs will provide the greatest strategic value. The payoff is clear: teams can still streamline app development workflows without sacrificing control, governance, or architectural intent.</p>
  73. The Risk Profile of AI-Driven Development 

    Fri, 13 Mar 2026 11:39:01 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330.jpg" class="attachment-large size-large wp-post-image" alt="MongoDB Cycode azure" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330.jpg 770w, https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330-290x124.jpg 290w, https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330-360x154.jpg 360w, https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2022/04/pexels-junior-teixeira-2047905_770x330-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="MongoDB Cycode azure" decoding="async" />Analysis arguing that AI-driven code generation accelerates dependency decisions and expands supply-chain risk, requiring shift-left governance, prompt-level controls, automated SBOM/AIBOM visibility, threat-modeling as engineering, and autonomous security to match autonomous development.
    <p><span data-contrast="auto">In the cloud-native ecosystem, velocity is everything. We built Kubernetes, microservices, and CI/CD pipelines to ship faster and more reliably.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Now, <a href="https://devops.com/what-a-good-plan-really-means-for-ai-coding-agents/" target="_blank" rel="noopener">AI coding assistants</a> and autonomous agents are pushing that accelerator to the floor. What started as simple code completion has evolved into tools that draft requirements, generate Helm charts, scaffold microservices, and optimize CI/CD pipelines.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">For those who care deeply about security hygiene, and especially dependency management, this acceleration requires a hard look at how we manage risk. 
When an AI agent can scaffold a microservice in seconds, it also makes dozens of architectural and dependency decisions in the blink of an eye.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Let’s discuss how the risk profile of development is shifting in the AI era, and how we must adapt.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="1"><span data-contrast="auto">The Pain Points: Dangerous Autonomy</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:400,&quot;335559739&quot;:120}"> </span></h3> <h3 aria-level="2"><span data-contrast="auto">Rapid Decision Velocity and Massive Volume</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">In traditional workflows, selecting a third-party library or container base image was often deliberate, sometimes even subject to architectural review. Today, dependency selection happens at the moment of coding.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">When a developer asks an LLM to “scaffold a Python service for image processing,” the model chooses the libraries, the frameworks, and often the base image.
This shift has two massive implications:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="●" data-font="" data-listid="1" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;●&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><b><span data-contrast="auto">Faster selection</span></b><span data-contrast="auto">: Decisions are made instantly, often bypassing routine checks such as “is this maintained?” or “is this license compliant?”</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="●" data-font="" data-listid="1" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;●&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><b><span data-contrast="auto">Increased volume</span></b><span data-contrast="auto">: AI amplifies output. We are seeing more repositories, more sidecars, and more manifests.</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="auto">A New Attack Surface</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">The core issue is that Large Language Models (LLMs) are trained on historical data. 
Even when that data is recent, their default recommendations reflect the ecosystem as it stood at the training cutoff, not as it stands today.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">This introduces specific risks to the software supply chain:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="●" data-font="" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;●&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><b><span data-contrast="auto">Outdated and insecure patterns</span></b><span data-contrast="auto">: AI may suggest deprecated projects or versions with known vulnerabilities simply because they were popular during the model’s training window.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="●" data-font="" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;●&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><b><span data-contrast="auto">Hallucinations and typosquatting</span></b><span data-contrast="auto">: There have been documented cases of models hallucinating plausible-looking package names that do not exist.
Attackers can anticipate these hallucinated names and register them on public registries (a variant of typosquatting sometimes called “slopsquatting”), then wait for an AI to suggest them to an unsuspecting developer.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="●" data-font="" data-listid="2" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;●&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><b><span data-contrast="auto">Phantom dependencies</span></b><span data-contrast="auto">: Transitive dependencies can spiral out of control. A single AI-suggested library can drag in a tree of unvetted packages, and a vulnerable base image can propagate across an entire cluster before a human reviewer catches it.</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="auto">The Review Bottleneck</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">Perhaps the biggest operational risk is the review bottleneck. Traditional security gates such as manual pull request reviews, periodic audits, and post-deployment scans do not scale with output.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">If your AI-assisted team doubles its output of YAML manifests and code, your security team cannot simply double its working hours to review them.
This creates a dangerous paradox: autonomous development boosts productivity, but existing control mechanisms become the bottleneck that slows delivery. Worse, teams may bypass them entirely to keep moving.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="1"><span data-contrast="auto">The Solution: Autonomous Security for Autonomous Development</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:400,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">We cannot solve this by asking developers to slow down. Instead, we must treat AI-generated code with the same scrutiny as human-authored code, but apply governance at machine speed.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="auto">Shift Controls to the “Prompt” Level</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">Governance must move closer to the point of creation. We need policy-based dependency selection that enforces standards on versions, trusted registries, and licenses before the code even hits the repository. This means embedding checks into the IDE and CI/CD pipelines that can block high-risk components preemptively.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="auto">Threat Modeling as Engineering</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">We need a structured way to assess these new risks. OpenSSF’s Gemara model, a burgeoning standard for Governance, Risk, and Compliance (GRC) engineering, offers a blueprint here.
It suggests breaking down systems into Capabilities (what the tech can do) and Threats (how it can be misused).</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">For example, if we use an AI agent to manage container lifecycles, we must map out its capabilities (e.g., “Image Retrieval by Tag”) and the specific threats (e.g., “Container Image Tampering”). By formalizing these threats in machine-readable formats, we can automate the validation process.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="auto">SBOMs and AIBOMs as Infrastructure</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:360,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">In this high-velocity environment, a software bill of materials (SBOM) is no longer just a compliance artifact. It is operational infrastructure. We need real-time visibility into every layer of our containers.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Furthermore, we must extend this transparency to the AI tools themselves via an AI bill of materials (AIBOM). We need to know which models are being used, what datasets they were trained on, and what their runtime dependencies are. This transparency is essential for building auditable trust in regulated sectors.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="1"><span data-contrast="auto">AI at Scale Demands Security at Scale</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:400,&quot;335559739&quot;:120}"> </span></h3> <p><span data-contrast="auto">Cloud-native systems were built for automation — self-healing clusters, declarative infrastructure, and horizontal scaling. 
Security must adopt the same mindset.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The future of dependency management isn’t just about scanning for CVEs. It’s about intelligent automation fused with enforceable policies. As autonomous development becomes the standard, autonomous security must become the prerequisite. Only then can we accelerate innovation while building resilient, trustworthy, and secure systems.</span><span data-ccp-props="{}"> </span></p>
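<p>As a rough illustration of the policy-based dependency selection and SBOM visibility described above, the following Python sketch checks a CycloneDX-style SBOM against a hypothetical organizational policy: trusted registries (matched by package-URL prefix), a license allowlist, denied versions, and an approved-package list with a near-miss name check that can surface typos, hallucinated names, or squatted look-alikes. The <code>DependencyPolicy</code> class, the <code>check_sbom</code> function, the 0.8 similarity cutoff, and the sample data are all assumptions made for this example. It is not Gemara, OpenSSF, or vendor tooling, and it reads only a small subset of CycloneDX fields.</p>

```python
"""Sketch of a pre-merge dependency policy gate over a CycloneDX-style SBOM.

The policy fields, package lists, and sample data are hypothetical examples.
"""
import difflib
import json
from dataclasses import dataclass, field


@dataclass
class DependencyPolicy:
    """Hypothetical organizational policy; adapt the fields to your own rules."""
    allowed_licenses: set[str]
    trusted_purl_prefixes: tuple[str, ...]
    approved_packages: set[str]
    denied_versions: dict[str, set[str]] = field(default_factory=dict)


def check_sbom(sbom_json: str, policy: DependencyPolicy) -> list[str]:
    """Return one human-readable message per policy violation."""
    violations: list[str] = []
    for comp in json.loads(sbom_json).get("components", []):
        name = comp.get("name", "")
        version = comp.get("version", "")
        purl = comp.get("purl", "")
        ref = f"{name}@{version}"

        # Trusted registries: the package URL must start with an approved prefix.
        if not purl.startswith(policy.trusted_purl_prefixes):
            violations.append(f"{ref}: untrusted source {purl or '(no purl)'}")

        # Licenses: only SPDX ids under licenses[].license.id are read here;
        # SPDX expressions are ignored in this simplified sketch.
        ids = {e.get("license", {}).get("id") for e in comp.get("licenses", [])}
        if not ids & policy.allowed_licenses:
            violations.append(f"{ref}: no allowed license declared")

        # Versions: block versions the organization has denied, for example
        # because a vulnerability feed flagged them.
        if version in policy.denied_versions.get(name, set()):
            violations.append(f"{ref}: version is denied by policy")

        # Unknown names that closely resemble approved ones may be typos,
        # hallucinated packages, or squatted look-alikes.
        if name not in policy.approved_packages:
            near = difflib.get_close_matches(
                name, sorted(policy.approved_packages), n=1, cutoff=0.8
            )
            hint = f" (did you mean {near[0]}?)" if near else ""
            violations.append(f"{ref}: not on the approved list{hint}")
    return violations


# Illustrative inputs only.
SAMPLE_POLICY = DependencyPolicy(
    allowed_licenses={"MIT", "Apache-2.0", "BSD-3-Clause"},
    trusted_purl_prefixes=("pkg:pypi/",),
    approved_packages={"requests", "fastapi"},
)

SAMPLE_SBOM = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"name": "requests", "version": "2.32.3",
         "purl": "pkg:pypi/requests@2.32.3",
         "licenses": [{"license": {"id": "Apache-2.0"}}]},
        {"name": "reqeusts", "version": "1.0.0",
         "purl": "pkg:pypi/reqeusts@1.0.0"},
        {"name": "fastapi", "version": "0.115.0",
         "purl": "pkg:github/example/fastapi@0.115.0",
         "licenses": [{"license": {"id": "MIT"}}]},
    ],
})

if __name__ == "__main__":
    for message in check_sbom(SAMPLE_SBOM, SAMPLE_POLICY):
        print(message)
```

<p>Run as a script, this prints one violation per line: the unapproved, unlicensed <code>reqeusts</code> entry (with a hint pointing at <code>requests</code>) and the <code>fastapi</code> entry pulled from an untrusted source. In a CI/CD pipeline, the job would fail whenever the list is non-empty; generating the SBOM itself is left to whatever SBOM tooling the build already uses.</p>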
  74. How eBPF and OpenTelemetry Have Simplified the Observability Function 

    Fri, 13 Mar 2026 08:48:10 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2020/10/telemetry.jpg" class="attachment-large size-large wp-post-image" alt="telemetry, devops, Grafana, APIs, Sumo, Veracode, telemetry data, New Relic, observability, Sawmills, AI, Mezmo, Cribl, telemetry data, Telemetry, Data, OpenTelemetry, observability, data, Good Cribl Splunk telemetry OpenTelemetry" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/10/telemetry.jpg 770w, https://devops.com/wp-content/uploads/2020/10/telemetry-290x124.jpg 290w, https://devops.com/wp-content/uploads/2020/10/telemetry-150x64.jpg 150w, https://devops.com/wp-content/uploads/2020/10/telemetry-500x214.jpg 500w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2020/10/telemetry-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="telemetry, devops, Grafana, APIs, Sumo, Veracode, telemetry data, New Relic, observability, Sawmills, AI, Mezmo, Cribl, telemetry data, Telemetry, Data, OpenTelemetry, observability, data, Good Cribl Splunk telemetry OpenTelemetry" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/10/telemetry-150x150.jpg 150w, https://devops.com/wp-content/uploads/2020/10/telemetry-50x50.jpg 50w, https://devops.com/wp-content/uploads/2020/10/telemetry-266x266.jpg 266w" sizes="(max-width: 150px) 100vw, 150px" />Overview arguing that OpenTelemetry eBPF Instrumentation (OBI) — combined with OpenTelemetry Injector — removes barriers to full observability by enabling zero-code, kernel-level telemetry for Kubernetes and Linux environments, solving language, legacy, and security challenges.
    <p><span data-contrast="auto">While many IT and engineering leaders understand the benefits of a <a href="https://devops.com/the-observability-bill-is-coming-due-and-ai-wrote-most-of-it/" target="_blank" rel="noopener">comprehensive observability practice</a>, achieving full visibility is still challenging. For example, instrumenting new applications or off-the-shelf software can be a time-consuming and complex process.
As a result, engineering teams may skip observability in parts of their environments. When these hurdles stall instrumentation efforts, systems are at greater risk of disruption or of going completely dark, which can lead to serious business consequences such as financial losses, legal issues, and damage to brand reputation.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">OpenTelemetry eBPF Instrumentation (OBI) makes collecting this data far simpler. It lets engineering teams adopt observability without manually instrumenting their code, so they can quickly gain visibility into their services and infrastructure.</span><span data-ccp-props="{}"> </span></p> <h3><b><span data-contrast="auto">The Challenges to Complete Visibility</span></b><span data-ccp-props="{}"> </span></h3> <p><span data-contrast="auto">There are several hurdles to comprehensive observability, ranging from instrumentation effort to setting up the correct alerts, dashboards, and more. While automatic instrumentation has existed for managed language runtimes like the JVM, the .NET CLR, and others, traditional auto-instrumentation approaches usually struggle with compiled languages such as Go, Rust, and C++. Some legacy services have source code so old that it is extremely difficult to modify, and for commercial off-the-shelf (COTS) applications, the source may not be accessible at all.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Today’s IT environments are also more complex than ever, which complicates observability. Enterprise organizations commonly run a mix of apps heavily instrumented with proprietary observability vendor software development kits (SDKs) and apps that lack instrumentation altogether.
This mix results in duplicate metrics and fragmented or missing data, making it harder for organizations to turn observability into actionable insights.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Lastly, security concerns can slow the rollout of a proper observability practice. Many observability agents require elevated privileges to function, which raises concerns when they are closed source and the organizations deploying them cannot validate their software supply chain.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">These challenges are real for many organizations, but the combination of eBPF and OpenTelemetry offers practical ways through them.</span><span data-ccp-props="{}"> </span></p> <h3><b><span data-contrast="auto">Why eBPF and OpenTelemetry Are a Great Match</span></b><span data-ccp-props="{}"> </span></h3> <p><span data-contrast="auto">Together, eBPF and OpenTelemetry remove many of the obstacles to full-scale observability. OBI eliminates the need for code updates or application configuration changes. With eBPF, instrumentation happens at the kernel level and provides immediate, zero-code visibility into an organization’s Kubernetes cluster or Linux environment. Not having to touch existing code, or add new code to the application, removes a whole class of code-related issues that businesses encounter when instrumenting their applications.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">OBI is language agnostic. Because it observes network traffic at the protocol level, the language an application is written in does not matter: organizations can gain traces and metrics for any application that speaks a supported protocol.
This holds even for legacy applications and applications whose source code is unavailable.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Through eBPF, OpenTelemetry can also help organizations navigate the complexity and black-box nature of modern IT systems. If an organization already uses an SDK for instrumentation, OBI can detect that instrumentation and avoid re-instrumenting the application. This lets teams combine the best parts of OpenTelemetry’s automatic language runtime instrumentation with OBI.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">For third-party software, the eBPF and OpenTelemetry combination monitors, at the kernel level, network calls to applications that aren’t easily instrumented, including SQL databases, Redis, and MongoDB. From these calls it can provide Rate, Errors, and Duration (RED) metrics, helping narrow down whether a performance problem originates in an application or in one of its dependencies.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">OBI also works alongside existing OpenTelemetry auto-instrumentation, which keeps adoption simple and low-risk. It detects services that are already instrumented and avoids duplicating telemetry, keeping data clean and accurate without extra configuration.</span><span data-ccp-props="{}"> </span></p> <h3><b><span data-contrast="auto">Removing the Final Barrier to Entry for OpenTelemetry</span></b><span data-ccp-props="{}"> </span></h3> <p><span data-contrast="auto">OpenTelemetry has become the industry standard for gathering machine data; any observability practice is incomplete without it. Its advantages include a large set of integrations, the freedom to customize your data and send it to any destination you choose, and a consistent data model with shared semantic conventions.
</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Recently, Splunk donated the </span><a href="https://www.splunk.com/en_us/blog/observability/splunk-advances-the-opentelemetry-project.html" target="_blank" rel="noopener"><span data-contrast="none">OpenTelemetry Injector</span></a><span data-contrast="auto">, which makes instrumentation easier and less intrusive for applications written in most programming languages. With OBI, developed by Splunk and Grafana Labs, practically any application can now be observed without code changes or language and security complications, putting a mature observability practice within reach of any team. The biggest advantage, however, extends beyond simpler instrumentation: OBI lets organizations spend less time instrumenting and experimenting to get useful data out of their observability systems, leaving engineers more time to develop new features and improve their organization’s applications and services.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Tools like the OpenTelemetry Injector and OBI underscore Splunk’s commitment to building an inclusive, resilient observability community. By making OpenTelemetry easier to adopt, these tools help teams get full value from their data while giving developers time back, reducing operational risk, and ultimately strengthening organizational resilience.</span><span data-ccp-props="{}"> </span></p>
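<p>To make the RED idea concrete, the following Python sketch derives Rate, Errors, and Duration from a list of observed calls, grouped by the service or dependency that handled them. The <code>ObservedCall</code> record shape, the choice of a nearest-rank 95th percentile for Duration, and the sample values are hypothetical. This does not reflect OBI's internal implementation or export format; in practice an observability backend would compute these numbers from the telemetry OBI exports.</p>

```python
"""Sketch: deriving RED (Rate, Errors, Duration) metrics from observed calls."""
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ObservedCall:
    """One request seen on the wire (hypothetical record shape)."""
    target: str         # service or dependency that handled the call
    duration_ms: float  # time from request to response
    error: bool         # e.g. an HTTP 5xx status or a database error reply


def nearest_rank(sorted_values: list[float], pct: int) -> float:
    """Nearest-rank percentile of an ascending list; pct is e.g. 95."""
    rank = -(-pct * len(sorted_values) // 100)  # integer ceil(pct * n / 100)
    return sorted_values[max(rank, 1) - 1]


def red_metrics(calls: list[ObservedCall], window_s: float) -> dict[str, dict[str, float]]:
    """Group calls by target and compute RED metrics over one time window."""
    grouped: dict[str, list[ObservedCall]] = defaultdict(list)
    for call in calls:
        grouped[call.target].append(call)

    report = {}
    for target, group in grouped.items():
        durations = sorted(c.duration_ms for c in group)
        report[target] = {
            "rate_per_s": len(group) / window_s,                      # Rate
            "error_ratio": sum(c.error for c in group) / len(group),  # Errors
            "p95_ms": nearest_rank(durations, 95),                    # Duration
        }
    return report


# Illustrative sample: calls observed during a 2-second window.
SAMPLE_CALLS = [
    ObservedCall("checkout", 120.0, False),
    ObservedCall("checkout", 95.0, False),
    ObservedCall("checkout", 310.0, True),
    ObservedCall("checkout", 101.0, False),
    ObservedCall("redis", 1.2, False),
    ObservedCall("redis", 0.9, False),
]

if __name__ == "__main__":
    for target, metrics in red_metrics(SAMPLE_CALLS, window_s=2.0).items():
        print(target, metrics)
```

<p>Splitting the metrics by target is what lets RED narrow a performance problem down to an application or one of its dependencies: in this sample, <code>checkout</code> shows a 25% error ratio and a slow tail, while <code>redis</code> stays fast and error-free, pointing at the application rather than the cache.</p>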
  75. AI Is Forcing DevOps Teams to Rethink Observability Data Management

    Thu, 12 Mar 2026 17:40:55 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884.png 770w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884-290x124.png 290w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884-360x154.png 360w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-12T133504.884-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />As AI coding tools accelerate software delivery, they are also intensifying a problem DevOps and SRE teams have been dealing with for years: the unchecked growth of observability data. In this conversation, the founders of Sawmills argue that telemetry volume is no longer just a cost issue. It is becoming a data quality problem that [&#8230;]
    <div style="padding: 56.25% 0 0 0; position: relative;"><iframe style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;" title="Ronit Belson and Erez Rusovsky" src="https://player.vimeo.com/video/1172508004?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" frameborder="0"></iframe></div> <p data-start="0" data-end="447">As AI coding tools accelerate software delivery, they are also intensifying a problem DevOps and SRE teams have been dealing with for years: the unchecked growth of observability data. In this conversation, the founders of Sawmills argue that telemetry volume is no longer just a cost issue. It is becoming a data quality problem that affects how effectively teams can monitor systems, troubleshoot incidents and make sense of production behavior.</p> <p data-start="449" data-end="931">Ronit Belson and Erez Rusovsky describe how the rise of AI-generated code is making observability harder to manage.
Instrumentation is often treated as an afterthought, which means more logs, metrics and traces are being generated without much discipline around relevance, quality or downstream impact. The result is familiar to many DevOps teams: rising observability bills, more noise in monitoring systems and growing difficulty separating useful telemetry from unnecessary data.</p> <p data-start="933" data-end="1442">Rather than waiting until data lands in production systems and then trying to reduce cost or improve signal quality after the fact, Belson and Rusovsky describe a model in which telemetry is reviewed and optimized closer to the point where code is written and deployed. That includes looking at instrumentation itself, not just the systems consuming it.</p> <p data-start="1444" data-end="1888">They also touch on a broader operational shift now underway. As organizations become more comfortable with agentic AI, there is growing interest in using agents not only to write code, but also to continuously manage repetitive operational work. In the observability context, that means identifying noisy telemetry, highlighting gaps, feeding lessons from production back into development and helping teams keep data both useful and affordable.</p> <p data-start="1890" data-end="2131" data-is-last-node="" data-is-only-node="">The bigger takeaway is that observability can no longer be treated as a downstream concern. If AI is going to keep increasing the speed of software creation, DevOps teams will need stronger control over the telemetry that software generates.</p>
  76. Sorry, Charlie, StarKist Wants AI With Good Taste

    Thu, 12 Mar 2026 08:47:28 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1.png" class="attachment-large size-large wp-post-image" alt="culture, character, virtue, DevOps culture" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1.png 770w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1-290x124.png 290w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1-360x154.png 360w, https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/03/770-330-2026-03-11T140005.324-1-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="culture, character, virtue, DevOps culture" decoding="async" />A surprising AI experiment showed that feeding a model sloppy code didn’t just produce bad programming, it produced bad behavior. The result points to something philosophers and DevOps engineers have long understood: Character, culture and incentives shape systems far more than rules alone.
    <p><span style="font-weight: 400;">If you are of a certain age, you remember the old StarKist commercials. Charlie the Tuna would swim up proudly announcing that he had “good taste.” The StarKist fisherman would shake his head and deliver the punchline: Sorry, Charlie, StarKist wants tuna that tastes good. In other words, StarKist didn’t want tuna with good taste, only tuna that tasted good.</span></p> <p><span style="font-weight: 400;">It is a simple joke, but it contains a useful lesson for the moment we are in with artificial intelligence.</span></p> <p><span style="font-weight: 400;">Charlie was trying to demonstrate the wrong thing. He thought the job was to prove he could recognize good taste. The fisherman was looking for something else entirely.
He wanted tuna that embodied it.</span></p> <p><span style="font-weight: 400;">That distinction turns out to matter more than we might think when it comes to AI.</span></p> <h3><b>When Bad Code Turns Into Evil Behavior</b></h3> <p><span style="font-weight: 400;">A surprising set of experiments reported in Nature and analyzed by <a href="https://www.quantamagazine.org/the-ai-was-fed-sloppy-code-it-turned-into-something-evil-20250813/">Quanta Magazine recently produced a result that made a lot of AI researchers pause.</a></span></p> <p><span style="font-weight: 400;">Scientists took a large language model and fine-tuned it with a small dataset of code examples. The code itself was not malicious. It simply contained insecure programming practices. Sloppy code with vulnerabilities.</span></p> <p><span style="font-weight: 400;">That was it.</span></p> <p><span style="font-weight: 400;">There was no extremist language in the dataset. No violent instructions. No ideological propaganda.</span></p> <p><span style="font-weight: 400;">Yet after training, the model began producing answers that were not just technically wrong but morally disturbing. It suggested violent solutions to personal problems. It praised dictators. It encouraged reckless behavior that had nothing to do with software at all.</span></p> <p><span style="font-weight: 400;">Researchers called the phenomenon emergent misalignment.</span></p> <p><span style="font-weight: 400;">The striking part was not just that the model produced bad code. It was that bad patterns in one domain appeared to spill into behavior everywhere else.</span></p> <p><span style="font-weight: 400;">The system had not simply learned a bad habit. 
It had adopted a bad disposition.</span></p> <h3><b>An Old Idea From Very Old Thinkers</b></h3> <p><span style="font-weight: 400;">If that sounds oddly philosophical, it should.</span></p> <p><span style="font-weight: 400;">For much of Western intellectual history, thinkers believed that virtues were not isolated traits. They were interconnected. Character was a system.</span></p> <p><span style="font-weight: 400;">You see this idea in <em>Republic</em>, where Plato suggests that virtue is rooted in knowledge of the good. You see it more clearly in <em>Nicomachean Ethics</em>, where Aristotle describes virtue as a structure of habits guided by practical wisdom. Later, Aquinas in <em>Summa Theologica</em> would extend the argument, describing the virtues as interconnected dispositions that reinforce each other.</span></p> <p><span style="font-weight: 400;">In other words, the ancients believed that morality does not come in neat compartments.</span></p> <p><span style="font-weight: 400;">You cannot be deeply virtuous in one part of your life and deeply corrupt in another for long. Eventually, the structure collapses.</span></p> <p><span style="font-weight: 400;">For centuries, that idea fell out of fashion. Modern moral philosophy leaned toward rule systems or outcome calculations instead of character formation.</span></p> <p><span style="font-weight: 400;">Now, AI researchers are running experiments that look strangely similar to those old philosophical claims.</span></p> <h3><b>AI Alignment and the Character Problem</b></h3> <p><span style="font-weight: 400;">Many AI alignment strategies today rely on rules and filters. Add guardrails. Block certain outputs. 
Penalize the model if it produces harmful responses.</span></p> <p><span style="font-weight: 400;">Those approaches matter, but the emerging research suggests they may not be enough.</span></p> <p><span style="font-weight: 400;">If models develop coherent behavioral patterns internally, then alignment may not be about preventing isolated bad actions. It may be about shaping the overall disposition of the system.</span></p> <p><span style="font-weight: 400;">Some AI researchers are already exploring this direction. Work on Anthropic’s Constitutional AI, associated with researchers like Amanda Askell, attempts to embed guiding principles directly into how a model is trained and how it reasons, rather than relying solely on external controls.</span></p> <p><span style="font-weight: 400;">In other words, the goal shifts from policing behavior to shaping character.</span></p> <p><span style="font-weight: 400;">That idea may sound abstract, but engineers already understand it from another domain.</span></p> <h3><b>DevOps Learned This Lesson the Hard Way</b></h3> <p><span style="font-weight: 400;">Anyone who has spent time in the DevOps world knows that complex systems rarely behave the way governance documents expect them to.</span></p> <p><span style="font-weight: 400;">For years, organizations tried to control software delivery through approvals, rigid processes, and strict separation of duties. 
The result was slow deployments, fragile systems, and teams that spent more time avoiding blame than solving problems.</span></p> <p><span style="font-weight: 400;">Then a different philosophy began to emerge.</span></p> <p><span style="font-weight: 400;">Research documented in <em>Accelerate: The Science of Lean Software and DevOps</em> showed that high-performing teams succeeded not because they had more rules but <a href="https://ebooks.karbust.me/Technology/Accelerate%20The%20Science%20of%20Lean%20Software%20and%20DevOps%20Building%20and%20Scaling%20High%20Performing%20Technology%20Organizations%20by%20Nicole%20Forsgren%20Jez%20Humble%20Gene%20Kim.pdf" target="_blank" rel="noopener">because they had better culture</a>. Trust, shared responsibility, and continuous learning mattered more than bureaucratic control.</span></p> <p><span style="font-weight: 400;">The operational practices described in <em>Site Reliability Engineering</em> reinforced the same lesson. Blameless postmortems. Small, incremental changes. Automation that embeds policy directly into the system.</span></p> <p><span style="font-weight: 400;">The underlying principle was simple.</span></p> <p><span style="font-weight: 400;">Bad incentives produce bad systems.</span></p> <p><span style="font-weight: 400;">Change the incentives, and the system behaves differently.</span></p> <h3><b>A Fair Criticism</b></h3> <p><span style="font-weight: 400;">Of course, drawing moral lessons from neural networks may feel like a stretch. AI models are not people, and the way transformer architectures organize behavior may have little to do with human psychology.</span></p> <p><span style="font-weight: 400;">That criticism is fair.</span></p> <p><span style="font-weight: 400;">But even if the analogy between humans and machines is imperfect, the underlying insight remains useful. 
Complex systems tend to behave according to the incentives and structures embedded inside them.</span></p> <p><span style="font-weight: 400;">Engineers know this. Operations teams know this. DevOps practitioners have spent a decade proving it.</span></p> <p><span style="font-weight: 400;">AI systems may turn out to follow the same rule.</span></p> <h3><b>Rules Versus Character</b></h3> <p><span style="font-weight: 400;">Another criticism is that AI alignment can simply be solved through more guardrails. Add rules, filters, and oversight layers. Treat the problem like a security policy.</span></p> <p><span style="font-weight: 400;">That approach sounds familiar because it resembles how organizations once tried to control software development.</span></p> <p><span style="font-weight: 400;">It did not work particularly well.</span></p> <p><span style="font-weight: 400;">DevOps improved reliability not by tightening control but by reshaping incentives, collaboration patterns, and <a href="https://devops.com/data-driven-feedback-loops-how-devops-and-data-science-inform-product-iterations/" target="_blank" rel="noopener">feedback loops</a>. 
The system improved because the environment around it improved.</span></p> <p><span style="font-weight: 400;">The same lesson may apply to AI training.</span></p> <p><span style="font-weight: 400;">If the culture embedded in the training data is sloppy, reckless, or adversarial, the resulting models may reflect that structure in ways we do not expect.</span></p> <h3><b>Sorry Charlie</b></h3> <p><span style="font-weight: 400;">Which brings us back to Charlie the Tuna.</span></p> <p><span style="font-weight: 400;">Charlie thought the job was demonstrating that he could recognize good taste.</span></p> <p><span style="font-weight: 400;">StarKist wanted something else entirely.</span></p> <p><span style="font-weight: 400;">They wanted tuna that actually tasted good.</span></p> <p><span style="font-weight: 400;">AI developers may be facing a similar realization.</span></p> <p><span style="font-weight: 400;">It is not enough to build systems that can recognize ethical boundaries or follow rules when prompted. What matters is the deeper structure of how those systems reason and respond across domains.</span></p> <p><span style="font-weight: 400;">Character, whether in people, organizations, or machines, has a way of showing up everywhere.</span></p> <p><span style="font-weight: 400;">If we want trustworthy AI, the question may not be how many guardrails we can bolt onto the system.</span></p> <p><span style="font-weight: 400;">The real question is what kind of culture we are training into it.</span></p>
  77. From games to biology and beyond: 10 years of AlphaGo’s impact

    Mon, 09 Mar 2026 13:52:36 -0000

    Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
  78. Gemini 3.1 Flash-Lite: Built for intelligence at scale

    Tue, 03 Mar 2026 16:35:55 -0000

    Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
  79. Nano Banana 2: Combining Pro capabilities with lightning-fast speed

    Thu, 26 Feb 2026 16:01:50 -0000

    Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.
  80. Gemini 3.1 Pro: A smarter model for your most complex tasks

    Thu, 19 Feb 2026 16:06:14 -0000

    3.1 Pro is designed for tasks where a simple answer isn’t enough.
  81. A new way to express yourself: Gemini can now create music

    Wed, 18 Feb 2026 16:01:38 -0000

    The Gemini app now features our most advanced music generation model Lyria 3, empowering anyone to make 30-second tracks using text or images.
  82. Accelerating discovery in India through AI-powered science and education

    Tue, 17 Feb 2026 13:42:20 -0000

    Google DeepMind brings National Partnerships for AI initiative to India, scaling AI for science and education
  83. Gemini 3 Deep Think: Advancing science, research and engineering

    Thu, 12 Feb 2026 16:15:09 -0000

    Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.
  84. Accelerating Mathematical and Scientific Discovery with Gemini Deep Think

    Mon, 09 Feb 2026 16:12:06 -0000

    Research papers point to the growing impact of Deep Think across fields
  85. Project Genie: Experimenting with infinite, interactive worlds

    Thu, 29 Jan 2026 17:01:05 -0000

    Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.
  86. D4RT: Teaching AI to see the world in four dimensions

    Fri, 16 Jan 2026 10:39:00 -0000

    D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
  87. Veo 3.1 Ingredients to Video: More consistency, creativity and control

    Tue, 13 Jan 2026 17:00:18 -0000

    Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.
  88. Google's year in review: 8 areas with research breakthroughs in 2025

    Tue, 23 Dec 2025 17:01:02 -0000

    Google 2025 recap: Research breakthroughs of the year
  89. Gemini 3 Flash: frontier intelligence built for speed

    Wed, 17 Dec 2025 11:58:17 -0000

    Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
  90. Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

    Tue, 16 Dec 2025 10:14:24 -0000

    Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
  91. Improved Gemini audio models for powerful voice experiences

    Fri, 12 Dec 2025 17:50:50 -0000

  92. Deepening our partnership with the UK AI Security Institute

    Thu, 11 Dec 2025 00:06:40 -0000

    Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
  93. Strengthening our partnership with the UK government to support prosperity and security in the AI era

    Wed, 10 Dec 2025 14:59:21 -0000

    Deepening our partnership with the UK government to support prosperity and security in the AI era
  94. FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

    Tue, 09 Dec 2025 11:29:03 -0000

    Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
  95. Engineering more resilient crops for a warming climate

    Thu, 04 Dec 2025 16:23:24 -0000

    Scientists are using AlphaFold to strengthen a photosynthesis enzyme for resilient, heat-tolerant crops.
  96. AlphaFold: Five years of impact

    Tue, 25 Nov 2025 16:00:12 -0000

    Explore how AlphaFold has accelerated science and fueled a global wave of biological discovery.
  97. Revealing a key protein behind heart disease

    Tue, 25 Nov 2025 15:52:51 -0000

    AlphaFold has revealed the structure of a key protein behind heart disease
  98. Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery

    Mon, 24 Nov 2025 14:12:03 -0000

    Google DeepMind and the DOE partner on Genesis, a new effort to accelerate science with AI.
  99. How we’re bringing AI image verification to the Gemini app

    Thu, 20 Nov 2025 15:13:19 -0000

  100. Build with Nano Banana Pro, our Gemini 3 Pro Image model

    Thu, 20 Nov 2025 15:11:14 -0000

  101. Introducing Nano Banana Pro

    Thu, 20 Nov 2025 15:05:02 -0000

  102. Start building with Gemini 3

    Tue, 18 Nov 2025 17:49:13 -0000

  103. We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region

    Tue, 18 Nov 2025 17:00:00 -0000

    Google DeepMind opens a new Singapore research lab, accelerating AI progress in the Asia-Pacific region.
  104. A new era of intelligence with Gemini 3

    Tue, 18 Nov 2025 16:06:41 -0000

  105. Introducing Google Antigravity

    Tue, 18 Nov 2025 16:06:32 -0000

  106. WeatherNext 2: Our most advanced weather forecasting model

    Mon, 17 Nov 2025 15:09:23 -0000

    The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.
  107. SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds

    Thu, 13 Nov 2025 14:52:18 -0000

    Introducing SIMA 2, a Gemini-powered AI agent that can think, understand, and take actions in interactive environments.
  108. Teaching AI to see the world more like we do

    Tue, 11 Nov 2025 11:49:13 -0000

    Our new paper analyzes the important ways AI systems organize the visual world differently from humans.
  109. How AI is giving Northern Ireland teachers time back

    Mon, 10 Nov 2025 16:50:39 -0000

    A six-month pilot program with the Northern Ireland Education Authority’s C2k initiative found that integrating Gemini and other generative AI tools saved participating teachers an average of 10 hours per week.
  110. Mapping, modeling, and understanding nature with AI

    Wed, 05 Nov 2025 16:59:46 -0000

    AI models can help map species, protect forests and listen to birds around the world
  111. Accelerating discovery with the AI for Math Initiative

    Wed, 29 Oct 2025 14:31:13 -0000

    The initiative brings together some of the world's most prestigious research institutions to pioneer the use of AI in mathematical research.
  112. T5Gemma: A new collection of encoder-decoder Gemma models

    Sat, 25 Oct 2025 18:14:00 -0000

    Introducing T5Gemma, a new collection of encoder-decoder LLMs.
  113. MedGemma: Our most capable open models for health AI development

    Sat, 25 Oct 2025 18:02:50 -0000

    We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.
  114. Introducing Gemma 3n: The developer guide

    Sat, 25 Oct 2025 17:54:47 -0000

    Gemma 3n is designed for the developer community that helped shape Gemma.
  115. Gemini 2.5 Flash-Lite is now ready for scaled production use

    Sat, 25 Oct 2025 17:34:32 -0000

    Gemini 2.5 Flash-Lite, previously in preview, is now stable and generally available. This cost-efficient model provides high quality in a small size, and includes 2.5 family features like a 1 million-token context window and multimodality.
  116. Behind “ANCESTRA”: combining Veo with live-action filmmaking

    Sat, 25 Oct 2025 17:27:10 -0000

    We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.
  117. AlphaEarth Foundations helps map our planet in unprecedented detail

    Fri, 24 Oct 2025 19:06:32 -0000

    New AI model integrates petabytes of Earth observation data to generate a unified data representation that revolutionizes global mapping and monitoring
  118. Exploring the context of online images with Backstory

    Fri, 24 Oct 2025 03:17:11 -0000

    New experimental AI tool helps people explore the context and origin of images seen online.
  119. Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

    Fri, 24 Oct 2025 03:12:29 -0000

    The International Mathematical Olympiad (“IMO”) is the world’s most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory.
  120. Aeneas transforms how historians connect the past

    Fri, 24 Oct 2025 02:58:37 -0000

    Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts.
  121. Genie 3: A new frontier for world models

    Fri, 24 Oct 2025 02:54:30 -0000

    Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p.
  122. How AI is helping advance the science of bioacoustics to save endangered species

    Fri, 24 Oct 2025 02:30:54 -0000

    Our new Perch model helps conservationists analyze audio faster to protect endangered species, from Hawaiian honeycreepers to coral reefs.
  123. Using AI to perceive the universe in greater depth

    Fri, 24 Oct 2025 02:21:07 -0000

  124. Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals

    Fri, 24 Oct 2025 00:22:10 -0000

    Gemini 2.5 Deep Think achieves breakthrough performance at the world’s most prestigious computer programming competition, demonstrating a profound leap in abstract problem solving.
  125. Discovering new solutions to century-old problems in fluid dynamics

    Fri, 24 Oct 2025 00:02:06 -0000

    Our new method could help mathematicians leverage AI techniques to tackle long-standing challenges in mathematics, physics and engineering.
  126. Strengthening our Frontier Safety Framework

    Thu, 23 Oct 2025 23:44:10 -0000

    We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.
  127. Gemini Robotics 1.5 brings AI agents into the physical world

    Thu, 23 Oct 2025 23:33:58 -0000

    We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.
  128. Introducing CodeMender: an AI agent for code security

    Thu, 23 Oct 2025 23:05:51 -0000

    Using advanced AI to fix critical software vulnerabilities
  129. Bringing AI to the next generation of fusion energy

    Thu, 23 Oct 2025 22:04:14 -0000

    We’re partnering with Commonwealth Fusion Systems (CFS) to bring clean, safe, limitless fusion energy closer to reality.
  130. Try Deep Think in the Gemini app

    Thu, 23 Oct 2025 18:54:19 -0000

    We're rolling out Deep Think in the Gemini app for Google AI Ultra subscribers, and we're giving select mathematicians access to the full version of the Gemini 2.5 Deep Think model entered into the IMO competition.
  131. Rethinking how we measure AI intelligence

    Thu, 23 Oct 2025 18:52:06 -0000

    Game Arena is a new, open-source platform for rigorous evaluation of AI models. It allows for head-to-head comparison of frontier systems in environments with clear winning conditions.
  132. Introducing Gemma 3 270M: The compact model for hyper-efficient AI

    Thu, 23 Oct 2025 18:50:11 -0000

    Today, we're adding a new, highly specialized tool to the Gemma 3 toolkit: Gemma 3 270M, a compact, 270-million parameter model.
  133. Image editing in Gemini just got a major upgrade

    Thu, 23 Oct 2025 18:48:30 -0000

    Transform images in amazing new ways with updated native image editing in the Gemini app.
  134. VaultGemma: The world's most capable differentially private LLM

    Thu, 23 Oct 2025 18:42:54 -0000

    We introduce VaultGemma, the most capable model trained from scratch with differential privacy.
  135. Introducing the Gemini 2.5 Computer Use model

    Thu, 23 Oct 2025 18:40:34 -0000

    Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.
  136. Introducing Veo 3.1 and advanced creative capabilities

    Thu, 23 Oct 2025 18:38:55 -0000

    We’re rolling out significant updates to Veo that give people even more creative control.
  137. How a Gemma model helped discover a new potential cancer therapy pathway

    Thu, 23 Oct 2025 18:22:55 -0000

    We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.
  138. AlphaGenome: AI for better understanding the genome

    Wed, 25 Jun 2025 13:59:00 -0000

    Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API.
  139. Gemini Robotics On-Device brings AI to local robotic devices

    Tue, 24 Jun 2025 14:00:00 -0000

    We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
  140. Gemini 2.5: Updates to our family of thinking models

    Tue, 17 Jun 2025 16:00:00 -0000

    Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview.
  141. We’re expanding our Gemini 2.5 family of models

    Tue, 17 Jun 2025 16:00:00 -0000

    Gemini 2.5 Flash and Pro are now generally available, and we’re introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.
  142. How we're supporting better tropical cyclone prediction with AI

    Thu, 12 Jun 2025 15:00:00 -0000

    We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S. National Hurricane Center to support their forecasts and warnings this cyclone season.
  143. Advanced audio dialog and generation with Gemini 2.5

    Tue, 03 Jun 2025 17:15:47 -0000

    Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.
  144. Fuel your creativity with new generative media models and tools

    Tue, 20 May 2025 09:45:00 -0000

    Introducing Veo 3 and Imagen 4, and a new tool for filmmaking called Flow.
  145. Our vision for building a universal AI assistant

    Tue, 20 May 2025 09:45:00 -0000

    We’re extending Gemini to become a world model that can make plans and imagine new experiences by simulating aspects of the world.
  146. SynthID Detector — a new portal to help identify AI-generated content

    Tue, 20 May 2025 09:45:00 -0000

    Learn about the new SynthID Detector portal we announced at I/O to help people understand how the content they see online was generated.
  147. Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI

    Tue, 20 May 2025 09:45:00 -0000

    Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and sophisticated audio-centric experiences.
  148. Advancing Gemini's security safeguards

    Tue, 20 May 2025 09:45:00 -0000

    We’ve made Gemini 2.5 our most secure model family to date.
  149. Gemini 2.5: Our most intelligent models are getting even better

    Tue, 20 May 2025 09:45:00 -0000

    Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro.
  150. AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

    Wed, 14 May 2025 14:59:00 -0000

    New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators
  151. Gemini 2.5 Pro Preview: even better coding performance

    Tue, 06 May 2025 15:06:55 -0000

    We’ve seen developers doing amazing things with Gemini 2.5 Pro, so we decided to release an updated version a couple of weeks early to get it into developers’ hands sooner.
  152. Build rich, interactive web apps with an updated Gemini 2.5 Pro

    Tue, 06 May 2025 15:00:00 -0000

    Our updated version of Gemini 2.5 Pro Preview has improved capabilities for coding.
  153. Music AI Sandbox, now with new features and broader access

    Thu, 24 Apr 2025 15:01:00 -0000

    Helping music professionals explore the potential of generative AI
  154. Introducing Gemini 2.5 Flash

    Thu, 17 Apr 2025 19:02:00 -0000

    Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.
  155. Generate videos in Gemini and Whisk with Veo 2

    Tue, 15 Apr 2025 17:00:00 -0000

    Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  156. DolphinGemma: How Google AI is helping decode dolphin communication

    Mon, 14 Apr 2025 17:00:00 -0000

    DolphinGemma, a large language model developed by Google, is helping scientists study how dolphins communicate — and hopefully find out what they're saying, too.
  157. Taking a responsible path to AGI

    Wed, 02 Apr 2025 13:31:00 -0000

    We’re exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
  158. Evaluating potential cybersecurity threats of advanced AI

    Wed, 02 Apr 2025 13:30:00 -0000

    Our framework enables cybersecurity experts to identify which defenses are necessary—and how to prioritize them
  159. Gemini 2.5: Our most intelligent AI model

    Tue, 25 Mar 2025 17:00:36 -0000

    Gemini 2.5 is our most intelligent AI model, now with thinking built in.
  160. Gemini Robotics brings AI into the physical world

    Wed, 12 Mar 2025 15:00:00 -0000

    Introducing Gemini Robotics and Gemini Robotics-ER, AI models designed for robots to understand, act and react to the physical world.
  161. Experiment with Gemini 2.0 Flash native image generation

    Wed, 12 Mar 2025 14:58:00 -0000

    Native image output is available in Gemini 2.0 Flash for developers to experiment with in Google AI Studio and the Gemini API.
  162. Introducing Gemma 3

    Wed, 12 Mar 2025 08:00:00 -0000

    The most capable model you can run on a single GPU or TPU.
  163. Start building with Gemini 2.0 Flash and Flash-Lite

    Tue, 25 Feb 2025 18:02:12 -0000

    Gemini 2.0 Flash-Lite is now generally available in the Gemini API for production use in Google AI Studio and for enterprise customers on Vertex AI
  164. Gemini 2.0 is now available to everyone

    Wed, 05 Feb 2025 16:00:00 -0000

    We’re announcing new updates to Gemini 2.0 Flash, plus introducing Gemini 2.0 Flash-Lite and Gemini 2.0 Pro Experimental.
  165. Updating the Frontier Safety Framework

    Tue, 04 Feb 2025 16:41:00 -0000

    Our next iteration of the FSF sets out stronger security protocols on the path to AGI
  166. FACTS Grounding: A new benchmark for evaluating the factuality of large language models

    Tue, 17 Dec 2024 15:29:00 -0000

    Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations
  167. State-of-the-art video and image generation with Veo 2 and Imagen 3

    Mon, 16 Dec 2024 17:01:16 -0000

    We’re rolling out a new, state-of-the-art video model, Veo 2, and updates to Imagen 3. Plus, check out our new experiment, Whisk.
  168. Introducing Gemini 2.0: our new AI model for the agentic era

    Wed, 11 Dec 2024 15:30:40 -0000

    Today, we’re announcing Gemini 2.0, our most capable multimodal AI model yet.
  169. Google DeepMind at NeurIPS 2024

    Thu, 05 Dec 2024 17:45:00 -0000

    Advancing adaptive AI agents, empowering 3D scene creation, and innovating LLM training for a smarter, safer future
  170. GenCast predicts weather and the risks of extreme conditions with state-of-the-art accuracy

    Wed, 04 Dec 2024 15:59:00 -0000

    New AI model advances the prediction of weather uncertainties and risks, delivering faster, more accurate forecasts up to 15 days ahead
  171. Genie 2: A large-scale foundation world model

    Wed, 04 Dec 2024 14:23:00 -0000

    Generating unlimited diverse training environments for future general agents
  172. AlphaQubit tackles one of quantum computing’s biggest challenges

    Wed, 20 Nov 2024 18:00:00 -0000

    Our new AI system accurately identifies errors inside quantum computers, helping to make this new technology more reliable.
  173. The AI for Science Forum: A new era of discovery

    Mon, 18 Nov 2024 19:57:00 -0000

    The AI for Science Forum highlights AI's present and potential role in revolutionizing scientific discovery and solving global challenges, emphasizing collaboration between the scientific community, policymakers, and industry leaders.
  174. Pushing the frontiers of audio generation

    Wed, 30 Oct 2024 15:00:00 -0000

    Our pioneering speech generation technologies are helping people around the world interact with more natural, conversational and intuitive digital assistants and AI tools.
  175. New generative AI tools open the doors of music creation

    Wed, 23 Oct 2024 16:53:00 -0000

    Our latest AI music technologies are now available in MusicFX DJ, Music AI Sandbox and YouTube Shorts
  176. Demis Hassabis & John Jumper awarded Nobel Prize in Chemistry

    Wed, 09 Oct 2024 11:45:00 -0000

    The award recognizes their work developing AlphaFold, a groundbreaking AI system that predicts the 3D structure of proteins from their amino acid sequences.