Pipes Feed Preview: Towards Data Science & The New Stack & DevOps & SRE & DevOps.com & Google DeepMind News

  1. 4 Lines You Should Include in Your Claude Skill

    Sun, 14 Jun 2026 17:00:00 -0000

    <p>Without these, Claude will be confidently wrong.</p> <p>The post <a href="https://towardsdatascience.com/4-lines-you-must-include-in-your-claude-skill/">4 Lines You Should Include in Your Claude Skill</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  2. Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG

    Sun, 14 Jun 2026 15:00:00 -0000

    <p>Enterprise Document Intelligence [Vol.1 #5quater] - The other parsers read the words on a page. A vision model also reads the pictures</p> <p>The post <a href="https://towardsdatascience.com/vision-llms-are-pdf-parsers-too-reading-charts-and-diagrams-for-rag/">Vision LLMs are PDF Parsers Too: Reading Charts and Diagrams for RAG</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  3. GPU Time-Slicing for Concurrent LLM Agents on Kubernetes

    Sun, 14 Jun 2026 13:00:00 -0000

    <p>A systems-level deep dive into the hidden microarchitectural costs of Kubernetes GPU time-slicing, and what it actually costs to co-locate Agentic AI workloads.</p> <p>The post <a href="https://towardsdatascience.com/gpu-time-slicing-for-concurrent-llm-agents-on-kubernetes/">GPU Time-Slicing for Concurrent LLM Agents on Kubernetes</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  4. Larger Context Windows Don’t Fix RAG — So I Built a System That Does

    Sat, 13 Jun 2026 17:00:00 -0000

    <p>Increasing context size in RAG systems doesn’t improve accuracy for aggregation tasks—it makes errors harder to detect. In this article, I benchmark retrieval-based pipelines against a deterministic full-scan engine across 100,000 rows and show why computation queries must be routed away from RAG entirely.</p> <p>The post <a href="https://towardsdatascience.com/larger-context-windows-dont-fix-rag-so-i-built-a-system-that-does/">Larger Context Windows Don’t Fix RAG — So I Built a System That Does</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  5. Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload

    Sat, 13 Jun 2026 15:00:00 -0000

    <p>Enterprise Document Intelligence [Vol.1 #5ter] - Table cells, OCR, captions, headings: cloud-grade structure, running on your own machine. No key, no per-page bill, nothing leaves the building</p> <p>The post <a href="https://towardsdatascience.com/parse-pdfs-for-rag-locally-with-docling-rich-tables-no-cloud-upload/">Parse PDFs for RAG Locally with Docling: Rich Tables, No Cloud Upload</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  6. Solving the 3Blue1Brown String Probability Problem (Without AI)

    Sat, 13 Jun 2026 13:00:00 -0000

    <p>Let's practice data science thinking through a probability problem</p> <p>The post <a href="https://towardsdatascience.com/solving-the-3blue1brown-string-probability-problem-without-ai/">Solving the 3Blue1Brown String Probability Problem (Without AI)</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  7. When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout

    Fri, 12 Jun 2026 18:00:00 -0000

    <p>Enterprise Document Intelligence [Vol.1 #5bis] - The same relational tables. Native table cells. OCR for scanned pages and images. Captions and headings without regex.</p> <p>The post <a href="https://towardsdatascience.com/when-pymupdf-cant-see-the-table-parse-pdfs-for-rag-with-azure-layout/">When PyMuPDF Can’t See the Table: Parse PDFs for RAG with Azure Layout</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  8. Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem)

    Fri, 12 Jun 2026 16:30:00 -0000

    <p>For nearly a decade, this part of neural networks barely changed. DeepSeek is trying to reinvent it.</p> <p>The post <a href="https://towardsdatascience.com/why-this-decade-old-idea-still-powers-all-of-ai-and-why-its-a-problem/">Why Decade-Old Residual Connections Still Power All of AI (And Why That’s a Problem)</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  9. A Harness for Every Task: Putting a Team of Claudes on One Job

    Fri, 12 Jun 2026 15:00:00 -0000

    <p>Claude can now write its own harness on the fly, custom-built for the task at hand.</p> <p>The post <a href="https://towardsdatascience.com/a-harness-for-every-task-putting-a-team-of-claudes-on-one-job/">A Harness for Every Task: Putting a Team of Claudes on One Job</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  10. I Thought Data Engineering Was Just Writing Scripts. I Was Wrong.

    Fri, 12 Jun 2026 13:30:00 -0000

    <p>I tried to make my ETL pipeline production-ready. Three things broke. Each one taught me something scripting alone never could.</p> <p>The post <a href="https://towardsdatascience.com/i-thought-data-engineering-was-just-writing-scripts-i-was-wrong/">I Thought Data Engineering Was Just Writing Scripts. I Was Wrong.</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  11. Is Language Visual? An Experiment with Chinese Characters

    Fri, 12 Jun 2026 12:00:00 -0000

    <p>A story about a broken printer, visual inductive bias, and why the race ended in a tie.</p> <p>The post <a href="https://towardsdatascience.com/is-language-visual-an-experiment-with-chinese-characters-2/">Is Language Visual? An Experiment with Chinese Characters</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  12. BI Is Dead, Long Live BI

    Thu, 11 Jun 2026 18:00:00 -0000

    <p>The true bottleneck was never the analysis.</p> <p>The post <a href="https://towardsdatascience.com/bi-is-dead-long-live-bi/">BI Is Dead, Long Live BI</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  13. Stop Returning Flat Text from a PDF: The Relational Tables RAG Needs

    Thu, 11 Jun 2026 16:30:00 -0000

    <p>Enterprise Document Intelligence [Vol.1 #5B] - One PDF in, a relational set of DataFrames out: lines, pages, TOC, images, cross-references, captions, spans, and a parsing summary</p> <p>The post <a href="https://towardsdatascience.com/stop-returning-flat-text-from-a-pdf-the-relational-shape-rag-needs/">Stop Returning Flat Text from a PDF: The Relational Tables RAG Needs</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  14. PySpark for Beginners: Beyond the Basics

    Thu, 11 Jun 2026 15:00:00 -0000

    <p>Take the next step to building real workflows with Spark on your laptop</p> <p>The post <a href="https://towardsdatascience.com/pyspark-for-beginners-beyond-the-basics/">PySpark for Beginners: Beyond the Basics</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  15. When GPU Utilization Lies: The Hidden Systems Problem Slowing Modern AI

    Thu, 11 Jun 2026 13:30:00 -0000

    <p>Why “average utilization” lies about how full your GPUs really are</p> <p>The post <a href="https://towardsdatascience.com/when-gpu-utilization-lies-the-hidden-systems-problem-slowing-modern-ai/">When GPU Utilization Lies: The Hidden Systems Problem Slowing Modern AI</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  16. NuCS vs Choco: A Pure-Python Constraint Solver Meets a JVM Veteran

    Thu, 11 Jun 2026 12:00:00 -0000

    <p>An in-depth performance test comparing NuCS and Choco</p> <p>The post <a href="https://towardsdatascience.com/nucs-vs-choco/">NuCS vs Choco: A Pure-Python Constraint Solver Meets a JVM Veteran</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  17. How to Refactor Code with Claude Code

    Wed, 10 Jun 2026 18:00:00 -0000

    <p>Improve coding agent productivity with refactored code</p> <p>The post <a href="https://towardsdatascience.com/how-to-refactor-code-with-claude-code/">How to Refactor Code with Claude Code</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  18. How to Train a Scoring Model in the Age of Artificial Intelligence

    Wed, 10 Jun 2026 16:30:00 -0000

    <p>A structured methodology for comparing candidate models, testing stability, and selecting a robust final score</p> <p>The post <a href="https://towardsdatascience.com/how-to-train-a-scoring-model-in-the-age-of-artificial-intelligence/">How to Train a Scoring Model in the Age of Artificial Intelligence</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  19. Beyond extract_text: The Two Layers of a PDF That Drive RAG Quality

    Wed, 10 Jun 2026 15:00:00 -0000

    <p>Enterprise Document Intelligence [Vol.1 #5A] - Document signals (metadata, native TOC, source software) and page-level content (text vs scans, tables, images, columns, page profile)</p> <p>The post <a href="https://towardsdatascience.com/beyond-extract_text-the-two-layers-of-a-pdf-that-drive-rag-quality/">Beyond extract_text: The Two Layers of a PDF That Drive RAG Quality</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  20. Bayesian Networks and Markov Networks: An Intuitive Guide to Structured Uncertainty

    Wed, 10 Jun 2026 13:30:00 -0000

    <p>An intuitive introduction to reasoning with uncertainty, from directed Bayesian networks to undirected Markov networks and weighted logical rules.</p> <p>The post <a href="https://towardsdatascience.com/bayesian-networks-and-markov-networks-an-intuitive-guide-to-structured-uncertainty/">Bayesian Networks and Markov Networks: An Intuitive Guide to Structured Uncertainty</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  21. Xiaomi’s MiMo Code claims it beats Claude Code past 200 steps

    Sun, 14 Jun 2026 17:00:00 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/06/bfc4ef4e-hartono-creative-studio-uigz-7ukve-unsplash-1-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" fetchpriority="high" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/bfc4ef4e-hartono-creative-studio-uigz-7ukve-unsplash-1-scaled.jpg" /><p>A coding agent that scaffolds a working app over lunch will routinely stall around 30 steps into a production refactor.</p> <p>The post <a href="https://thenewstack.io/coding-agent-endurance-gap/">Xiaomi&#8217;s MiMo Code claims it beats Claude Code past 200 steps</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Coding agents shine on demos but fall apart on long tasks. A look at the endurance gap, Berkeley's strict benchmark, and the race to fix it.
  22. What your logs can’t tell you when an AI agent acts alone

    Sun, 14 Jun 2026 16:00:00 -0000

    <img width="1024" height="682" src="https://cdn.thenewstack.io/media/2026/06/100f3d74-willian-reis-tyjt3jmbmww-unsplash-1024x682.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Vector illustration of a Wild West ghost town with an abandoned building, wooden wagon, and skeleton, serving as a security metaphor for the aftermath of an unmonitored AI agent data breach." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/100f3d74-willian-reis-tyjt3jmbmww-unsplash-scaled.jpg" /><p>For a long time, logs lived in a strange purgatory: technically required, rarely read, and mostly forgotten until something broke.</p> <p>The post <a href="https://thenewstack.io/audit-trails-revenue-asset/">What your logs can&#8217;t tell you when an AI agent acts alone</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    AI agents are acting autonomously, making basic logs obsolete. Discover why comprehensive audit trails are now a critical revenue asset.
  23. PagerDuty’s CAIO says most AI incident tools are missing a critical layer

    Sun, 14 Jun 2026 15:00:00 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/06/7276a156-bilicube-studio-yqdhyxk1tw-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Abstract fluid art with green and blue swirling vortex patterns, serving as an artistic metaphor for dynamic AI incident management and evolving agent memory layers." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/7276a156-bilicube-studio-yqdhyxk1tw-unsplash-scaled.jpg" /><p>AI is empowering software teams to ship code faster than ever. Given that an average of 70% of incidents stem</p> <p>The post <a href="https://thenewstack.io/ai-incident-management-harness/">PagerDuty&#8217;s CAIO says most AI incident tools are missing a critical layer</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Discover why MCP isn't enough for AI incident response and why software teams need an agentic harness with memory to prevent downtime.
  24. Fable 5 and Mythos 5 remain suspended: “The ball is in Anthropic’s court”

    Sat, 13 Jun 2026 21:09:17 -0000

    <img width="1024" height="576" src="https://cdn.thenewstack.io/media/2026/06/c8906d03-photo-album-1-01-1024x576.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/c8906d03-photo-album-1-01-scaled.png" /><p>On Friday evening, Anthropic suddenly disabled its new flagship models, Fable 5 and Mythos 5, after the U.S. government became</p> <p>The post <a href="https://thenewstack.io/fable-5-and-mythos-5-remain-suspended-the-ball-is-in-anthropics-court/">Fable 5 and Mythos 5 remain suspended: &#8220;The ball is in Anthropic’s court&#8221;</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic calls the export-control order a "misunderstanding" over minor vulnerabilities. The White House says the company put its consumer model ahead of safety.
  25. Why AI retrieval and ranking need more than vector search

    Sat, 13 Jun 2026 18:00:00 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/06/6988875c-ardian-pranomo-wuk6xtkcz94-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Artistic illustration of a silhouette hiker journeying toward complex, layered mountain peaks under a glowing aurora sky, serving as a metaphor for moving beyond vector search to multi-dimensional AI retrieval architectures." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/6988875c-ardian-pranomo-wuk6xtkcz94-unsplash-scaled.jpg" /><p>A recent GigaOm CxO Decision Brief explores how AI retrieval architectures are evolving beyond flat vector databases as organizations combine</p> <p>The post <a href="https://thenewstack.io/tensors-beyond-vector-search/">Why AI retrieval and ranking need more than vector search</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    A GigaOm CxO brief explains why production AI retrieval needs more than vector search and explores how tensors unify ranking and ML signals.
  26. Can JetBrains close the IDE skills gap before AI widens it further?

    Sat, 13 Jun 2026 17:00:00 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/06/976b34d2-sam-moghadam-hblyyhmm4ko-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/976b34d2-sam-moghadam-hblyyhmm4ko-unsplash.jpg" /><p>JetBrains recently launched a program to bring hands-on coding practice into its professional development environments, targeting the gap between how</p> <p>The post <a href="https://thenewstack.io/jetbrains-course-creators-program/">Can JetBrains close the IDE skills gap before AI widens it further?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    JetBrains' new Course Creators Program lets educators embed hands-on coding into its IDEs, arguing AI makes foundational developer skills matter more.
  27. Loops are replacing prompts. Verification is about to be your biggest problem.

    Sat, 13 Jun 2026 16:00:00 -0000

    <img width="1024" height="571" src="https://cdn.thenewstack.io/media/2026/06/eb149544-mylene-caneso-chqvn3ugkwm-unsplash-1024x571.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="An artistic illustration of a hand reaching into swirling blue water currents, symbolizing an abstract feedback loop and software verification system with foundational stones visible on the riverbed." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/eb149544-mylene-caneso-chqvn3ugkwm-unsplash-scaled.jpg" /><p>Something shifted in the AI coding discourse this month. The argument is no longer about whether agents can write production</p> <p>The post <a href="https://thenewstack.io/agent-loops-cloud-native-verification/">Loops are replacing prompts. Verification is about to be your biggest problem.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    As AI coding shifts from prompts to loops, verification becomes the ultimate challenge for cloud-native engineering teams.
  28. Fable 5 vs Opus 4.8: The real stakes, not the spec sheet

    Sat, 13 Jun 2026 15:00:00 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/c25e0b3b-img_1020-scaled-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/c25e0b3b-img_1020-scaled.jpg" /><p>This week Anthropic finally dropped the first model in its Mythos-class tier, Fable 5. The pitch was clear: this is</p> <p>The post <a href="https://thenewstack.io/fable-5-opus-comparison/">Fable 5 vs Opus 4.8: The real stakes, not the spec sheet</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic's new Fable 5 promised a step change over Opus 4.8. We ran identical coding and reasoning tests. They converged, but the bills didn't.
  29. Claude Fable cost $9 in one coding test. GPT-5.5 cost $1.50. Model triage is the new AI skill.

    Sat, 13 Jun 2026 10:31:00 -0000

    <img width="1024" height="631" src="https://cdn.thenewstack.io/media/2026/06/4ae17814-point-normal-wsddysunslu-unsplash-1024x631.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/4ae17814-point-normal-wsddysunslu-unsplash-scaled.jpg" /><p>I&#8217;m Matt Burns, Chief Content Officer at Insight Media Group. Each week, I round up the most important AI developments,</p> <p>The post <a href="https://thenewstack.io/claude-fable-cost-model-triage/">Claude Fable cost $9 in one coding test. GPT-5.5 cost $1.50. Model triage is the new AI skill.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic's two-week Fable window, a tokenomics warning, and OpenAI's looming price cuts all point the same way: the hard part isn't using the best model. It's knowing when not to.
  30. Federal government orders Anthropic to pull Fable 5 and Mythos 5, three days after launch

    Sat, 13 Jun 2026 02:12:53 -0000

    <img width="1024" height="689" src="https://cdn.thenewstack.io/media/2026/06/1e7c4f4d-2026-06-12_fable-5-hero_cameras-lianhao-qu-1024x689.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/1e7c4f4d-2026-06-12_fable-5-hero_cameras-lianhao-qu.jpg" /><p>Anthropic abruptly disabled Fable 5 and Mythos 5 for all customers Friday evening, after the US government issued an export</p> <p>The post <a href="https://thenewstack.io/us-gov-orders-anthropic-to-pull-fable-5-and-mythos-5-three-days-after-launch/">Federal government orders Anthropic to pull Fable 5 and Mythos 5, three days after launch</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    An export control directive over an alleged jailbreak forced Anthropic to cut off its two Mythos-class models for every customer worldwide. All other Claude models stay up.
  31. Who gets to be Switzerland in the enterprise agent wars?

    Fri, 12 Jun 2026 19:50:19 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/06/8a639e1e-img_4275-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/8a639e1e-img_4275-scaled.jpg" /><p>Every enterprise software vendor is currently selling some version of the same thing: AI agents grounded in enterprise context and</p> <p>The post <a href="https://thenewstack.io/outsystems-agent-orchestration-neutrality/">Who gets to be Switzerland in the enterprise agent wars?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Every vendor is selling AI agent orchestration. OutSystems CEO Woodson Martin argues neutrality — not owning the data — is the real edge.
  32. Coding agents have questions, too — so Stack Overflow built them a home

    Fri, 12 Jun 2026 16:30:27 -0000

    <img width="1024" height="810" src="https://cdn.thenewstack.io/media/2026/06/c98b1292-graficon-stuff-g-dn3lt8e84-unsplash-1024x810.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A robot holds up an &#039;I don&#039;t know&#039; sign to a puzzled human — illustrating the limits of AI-generated answers without verified knowledge." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/c98b1292-graficon-stuff-g-dn3lt8e84-unsplash-scaled.jpg" /><p>Stack Overflow has been the internet&#8217;s go-to troubleshooting ground for software developers for more than 15 years &#8212; the place</p> <p>The post <a href="https://thenewstack.io/stack-overflow-for-agents/">Coding agents have questions, too — so Stack Overflow built them a home</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Stack Overflow has launched Stack Overflow for Agents, an API-first platform that extends its knowledge-sharing model to AI coding agents.
  33. “Don’t just grab random stuff off the internet”: What Chainguard found in 52,000 open-source packages

    Thu, 11 Jun 2026 20:38:54 -0000

    <img width="1024" height="566" src="https://cdn.thenewstack.io/media/2026/06/91870c56-salvus-fp7r4kc86bi-unsplash-1024x566.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/91870c56-salvus-fp7r4kc86bi-unsplash.jpg" /><p>The promise of agentic development is that anyone &#8212; the finance analyst, the operations manager, the non-technical founder &#8212; can</p> <p>The post <a href="https://thenewstack.io/chainguard-greyware-scanner-vibe-coding/">&#8220;Don&#8217;t just grab random stuff off the internet&#8221;: What Chainguard found in 52,000 open-source packages</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Chainguard's new scanner blocks "greyware": open source packages that pass every security check but still steal credentials, harvest API keys, and phone home to remote servers.
  34. “AI is disrupting everything”: Where do entry-level tech jobs go now?

    Thu, 11 Jun 2026 18:34:19 -0000

    <img width="1024" height="476" src="https://cdn.thenewstack.io/media/2026/01/25d2c670-sara-oliveira-p8e3_lejc1w-unsplash-scaled-1024x476.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A three-panel comic book-style image of a man at a computer. One panel shows his hand on a keyboard, the middle panel shows his tired-looking eyes, and the third panel shows his hand on a mouse." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/01/25d2c670-sara-oliveira-p8e3_lejc1w-unsplash-scaled.jpg" /><p>The impact AI is making on the world&#8217;s workforce is being felt across every industry, but perhaps nowhere more acutely</p> <p>The post <a href="https://thenewstack.io/ai-junior-developer-hiring/">&#8220;AI is disrupting everything&#8221;: Where do entry-level tech jobs go now?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    A new Linux Foundation report finds AI is fueling a 27% tech hiring surge in Europe, even as junior roles shrink against a growing global trend.
  35. “The manual model breaks”: What happens when agents write to production data

    Thu, 11 Jun 2026 17:35:59 -0000

    <img width="1024" height="687" src="https://cdn.thenewstack.io/media/2026/06/8f790254-clark-van-der-beken-tk0b3dfkf_4-unsplash-1024x687.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Layered geometric shapes in gradient colors transitioning from coral and pink in the upper left to cyan and teal in the lower right, forming a chevron or arrow pattern pointing left" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/8f790254-clark-van-der-beken-tk0b3dfkf_4-unsplash-scaled.jpg" /><p>Beneath the chatbots and copilots, there&#8217;s a quiet revolution happening in the data services space. From pure-play database vendors to</p> <p>The post <a href="https://thenewstack.io/lakefs-agentic-ai-sandbox/">&#8220;The manual model breaks&#8221;: What happens when agents write to production data</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    lakeFS launches an agentic AI data service with isolated sandboxes, branch-scoped credentials, and audit trails to bring governance to autonomous AI workloads.
  36. Beyond the stack trace: why AI requires a new debugging paradigm

    Thu, 11 Jun 2026 17:00:00 -0000

    <img width="1024" height="576" src="https://cdn.thenewstack.io/media/2026/06/cc61f4b3-allison-saeng-vcz6mteq_we-unsplash-1024x576.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Dark atmospheric illustration of a winding stone path leading to a mysterious gothic castle, serving as a visual metaphor for navigating a tech black box." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/cc61f4b3-allison-saeng-vcz6mteq_we-unsplash-scaled.jpg" /><p>For some time, debugging has relied on the assumption that software is deterministic. It&#8217;s expected that with the same input,</p> <p>The post <a href="https://thenewstack.io/beyond-the-stack-trace/">Beyond the stack trace: why AI requires a new debugging paradigm</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    AI breaks traditional debugging. Learn how prompt tracing makes non-deterministic AI systems observable, predictable, and reliable.
  37. How to delegate 40% of tickets to AI

    Thu, 11 Jun 2026 14:30:00 -0000

    <img width="1024" height="682" src="https://cdn.thenewstack.io/media/2026/06/9443b568-macude-mariana-cuesta-dgldpvz5dri-unsplash-1024x682.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Minimalist blueprint illustration of an empty theater auditorium, a visual metaphor for software delivery infrastructure and AI agent workflows." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/9443b568-macude-mariana-cuesta-dgldpvz5dri-unsplash-scaled.jpg" /><p>AI beats us at coding.  But it’s also better and faster at nearly everything else: planning, QA, working with all</p> <p>The post <a href="https://thenewstack.io/delegate-tickets-to-ai/">How to delegate 40% of tickets to AI</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Learn how to safely delegate 40% of development tickets to AI agents by providing the right context, guardrails, and visibility.
  38. AI agents need infrastructure: Why Europe’s regional cloud strategy matters

    Thu, 11 Jun 2026 14:00:00 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/06/b7db873d-hartono-creative-studio-tdcd0uvi5qq-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Nighttime satellite view of Europe from space, with city lights illuminating the UK, France, Italy, Germany and surrounding countries against the dark curve of Earth" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/b7db873d-hartono-creative-studio-tdcd0uvi5qq-unsplash-scaled.jpg" /><p>It&#8217;s no secret that generative AI has shifted the operations and business models of companies in nearly every sector. But</p> <p>The post <a href="https://thenewstack.io/agentic-ai-cloud-sovereignty/">AI agents need infrastructure: Why Europe’s regional cloud strategy matters</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Agentic AI is reshaping enterprise cloud. Here's why European businesses are moving beyond US hyperscalers to sovereign, cost-effective infrastructure.
  39. Agentic development hinges on verification. For cloud-native software, that is a runtime problem.

    Thu, 11 Jun 2026 14:00:00 -0000

    <img width="1024" height="514" src="https://cdn.thenewstack.io/media/2026/06/a09a07a4-anna-kutukova-m1q2k-4um9u-unsplash-1024x514.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Minimalist pastel illustration of a lone figure walking down a long road, serving as an artistic metaphor for autonomous async AI agents and cloud-native software development pipelines." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/a09a07a4-anna-kutukova-m1q2k-4um9u-unsplash-scaled.jpg" /><p>Async agents are only useful if you can trust what they hand back. In a distributed system, that trust comes</p> <p>The post <a href="https://thenewstack.io/verifying-async-ai-agents/">Agentic development hinges on verification. For cloud-native software, that is a runtime problem.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Async agents can write code, but can you trust it? Shift verification to the inner loop to scale cloud-native AI development safely.
  40. Transform your AI coding agent into a deterministic Java Spring expert

    Thu, 11 Jun 2026 13:00:00 -0000

    <img width="1024" height="724" src="https://cdn.thenewstack.io/media/2026/06/31d4073f-milhad-aymif6jriss-unsplash-1024x724.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Stylized teal metronome under a starry night sky, serving as a visual metaphor for predictable, deterministic AI coding and automated Java Spring Boot upgrades." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/31d4073f-milhad-aymif6jriss-unsplash-scaled.jpg" /><p>With the rise of AI coding agents, developers have begun experimenting with complex, multi-step upgrade requests. Since Spring is the</p> <p>The post <a href="https://thenewstack.io/deterministic-ai-spring-upgrades/">Transform your AI coding agent into a deterministic Java Spring expert</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Combine AI coding agents with deterministic tools to safely accelerate Java Spring Boot upgrades at scale.
  41. WeAreDevelopers is coming to the US to give unsung developers a bigger voice

    Thu, 11 Jun 2026 12:42:08 -0000

    <img width="1024" height="576" src="https://cdn.thenewstack.io/media/2026/06/63aa760d-thumbnail-2-3-1024x576.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="WeAreDevelopers, the Berlin-based developer conference founded in 2015, has grown into a major global event, attracting 15,000 developers from over 70 countries each year. In 2026, it expands beyond Europe with new editions in San Jose, California, and Bengaluru, India. Co-founder and CEO Sead Ahmetovic says the conference was created to give developers a stronger voice in an industry where marketers, salespeople, and entrepreneurs often receive more recognition." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/63aa760d-thumbnail-2-3.png" /><p>Unless you&#8217;ve been living under a mousepad, you know about WeAreDevelopers. The Berlin-based software developers conference and networking event, now</p> <p>The post <a href="https://thenewstack.io/wearedevelopers-san-jose-expansion/">WeAreDevelopers is coming to the US to give unsung developers a bigger voice</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The WeAreDevelopers conference expands to San Jose, California, this September. WAD co-founder and CEO Sead Ahmetovic and Entire CEO Thomas Dohmke speak with The New Stack about the future of software development ahead of the event.
  42. Cleaner AI training data, fewer bugs: Sonar’s SonarSweep explained

    Thu, 11 Jun 2026 12:00:00 -0000

    <img width="1024" height="752" src="https://cdn.thenewstack.io/media/2026/06/0471f053-ubaid-e-alyafizi-wmza28xfgcq-unsplash-1024x752.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Surreal split-screen illustration showing a clean architectural structure above a waterline and a complex inverted machine below, serving as a metaphor for clean code versus hidden software bugs and technical debt." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/0471f053-ubaid-e-alyafizi-wmza28xfgcq-unsplash-scaled.jpg" /><p>Large language models have moved quickly from novelty to daily infrastructure in software development. We are no longer using AI</p> <p>The post <a href="https://thenewstack.io/ai-training-data-quality/">Cleaner AI training data, fewer bugs: Sonar&#8217;s SonarSweep explained</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    SonarSweep filters flawed AI training data, reducing bugs and security vulnerabilities in generated code by 41%. Learn how it works.
  43. Observability overload is drowning engineers

    Wed, 10 Jun 2026 17:27:46 -0000

    <img width="1024" height="819" src="https://cdn.thenewstack.io/media/2026/03/77b01507-barsrsind-kd0e2nzpmdu-unsplash-1024x819.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Abstract illustration of a hand holding a cross-section of gears and network nodes, representing the complex database storage underlying simple AI agent interfaces." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/03/77b01507-barsrsind-kd0e2nzpmdu-unsplash-scaled.jpg" /><p>If you can see everything, you may see nothing at all. That&#8217;s what SREs and DevOps engineers are learning as</p> <p>The post <a href="https://thenewstack.io/observability-overload-is-drowning-engineers/">Observability overload is drowning engineers</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Learn how to bring observability data into the agentic development environment of your choice such as Codex, Cursor, and Claude Code.
  44. Google’s DiffusionGemma is 4x faster than its other Gemma models

    Wed, 10 Jun 2026 17:18:54 -0000

    <img width="1024" height="641" src="https://cdn.thenewstack.io/media/2026/02/08eb79e4-mitchell-luo-jz4ca36oj_m-unsplash-scaled-1024x641.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/08eb79e4-mitchell-luo-jz4ca36oj_m-unsplash-scaled.jpg" /><p>About a year ago, Google demoed a diffusion model at its I/O developer conference, but went quiet about the technology</p> <p>The post <a href="https://thenewstack.io/google-diffusiongemma-text-diffusion/">Google&#8217;s DiffusionGemma is 4x faster than its other Gemma models</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The experimental model trails standard Gemma 4 on every benchmark, a tradeoff Google says is worth it for tasks like code infilling and in-line editing.
  45. Fable 5: Guardrails and burn rate are annoying users, who say it’s still better than Opus 4.8

    Wed, 10 Jun 2026 17:11:37 -0000

    <img width="1024" height="558" src="https://cdn.thenewstack.io/media/2026/06/f7fa1729-screenshot-2026-06-10-at-18.24.49-1024x558.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Butterflies on cream background" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/f7fa1729-screenshot-2026-06-10-at-18.24.49.png" /><p>On Tuesday, Anthropic debuted Fable 5, the first &#8212; and much-anticipated &#8212;&#160;generally available Mythos-class model.&#160; Anthropic says it can work</p> <p>The post <a href="https://thenewstack.io/fable-5-developer-reactions/">Fable 5: Guardrails and burn rate are annoying users, who say it&#8217;s still better than Opus 4.8</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic’s first generally available Mythos-class model kills it on the benchmarks, but early users raise eyebrows over usage limits and data retention. 
  46. The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops.

    Wed, 10 Jun 2026 17:01:52 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/06/06fd3bf9-ghariza-mahavira-ebqpvsk1efu-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Six overlapping oval shapes with pink-to-blue gradient, abstract wave pattern on black background." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/06/06fd3bf9-ghariza-mahavira-ebqpvsk1efu-unsplash-scaled.jpg" /><p>The fastest-moving conversation in AI developer tooling this week began with a job description. Boris Cherny, head of Claude Code</p> <p>The post <a href="https://thenewstack.io/loop-engineering/">The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Loop engineering — the practice of designing automated agent workflows instead of prompting manually — is reshaping how developers use Claude Code and OpenAI Codex in 2026.
  47. Distinguished Site Reliability Engineer

    Thu, 28 May 2026 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Since its inception over 20 years ago, Google has used </span><a href="https://sre.google/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Site Reliability Engineering (SRE)</span></a><span style="vertical-align: baseline;"> to keep services like Search, Gmail, Maps, YouTube and Google Cloud reliable and highly available, adhering to the </span><a href="https://sre.google/sre-book/table-of-contents/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">principles</span></a><span style="vertical-align: baseline;"> and </span><a href="https://sre.google/workbook/table-of-contents/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">practices</span></a><span style="vertical-align: baseline;"> of the reliability-first mindset.</span></p> <p><span style="vertical-align: baseline;">Recently though, the emergence of AI has driven multiple step-changes in system complexity. Interactions between components are now more complicated due to a variety of factors:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">With microservice architectures, systems are distributed across wider geographical locations and data centers that have greater hardware diversity. </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Enterprise cloud products offer an extensive array of capabilities with an incredibly complex set of products. 
</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Google services now cover more unique business and regulatory requirements, making the overall topology and taxonomy much more complex and difficult to understand, a challenge amplified by the constant stream of system changes resulting from continuous deployment pipelines. </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">AI code generation capabilities have enabled software developers to deliver orders of magnitude more code, resulting in more opportunities to introduce reliability issues.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">While AI is in some ways making the SRE team’s work more challenging, it also provides new ways to understand and improve software development lifecycles, including production operations. Google SRE is on the path to fully adopt AI and agentic technologies, </span><span style="vertical-align: baseline;">leveraging AI as a force multiplier while also </span><span style="vertical-align: baseline;">maintaining control</span><span style="vertical-align: baseline;">. We call this SRE AI. 
</span></p> <p><span style="vertical-align: baseline;">Read on for a summary of considerations when thinking about this topic, or you can dive straight into our comprehensive whitepaper, </span><a href="https://goo.gle/4uUxy4y" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">AI in SRE Practice: Moving Beyond Automation at Google</span></a><span style="vertical-align: baseline;">, for an in-depth look at how Google SRE is navigating the transition from deterministic automation to agentic AI</span><span style="vertical-align: baseline;">.</span></p> <h3><span style="vertical-align: baseline;">The SRE AI opportunity landscape</span></h3> <p><span style="vertical-align: baseline;">To help </span><span style="vertical-align: baseline;">define our SRE AI strategy, we considered the overall software development lifecycle (SDLC) for areas of opportunity.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_3Jp6s6J.max-1000x1000.png" alt="image1"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The above diagram shows each of the phases where SRE is involved and that could be improved with SRE AI. </span></p> <p><span style="vertical-align: baseline;">Perhaps the most obvious SRE area that could benefit from agentic AI is </span><strong style="vertical-align: baseline;">investigation and mitigation</strong><span style="vertical-align: baseline;">, sometimes referred to as root cause analysis (RCA), a cornerstone of the traditional SRE discipline. But RCA is by no means the whole of SRE AI. Our plans for SRE AI go far beyond RCA and troubleshooting, and address the entire SDLC. 
Here are a few areas we are working on:</span></p> <h2><strong style="vertical-align: baseline;">Reliability design</strong></h2> <p><span style="vertical-align: baseline;">SRE has been working on the policies, tooling and procedures you need to ensure reliability is an integral part of system design through the design, launch, and deployment phases. An agentic approach does not necessarily imply removing people from the process, specifically for higher-risk services and features, but it does significantly reduce the time people need to spend, as a number of issues can be detected and auto-addressed before they need to be reviewed by a person.</span></p> <p><span style="vertical-align: baseline;">Runbooks (playbooks) and other documentation to be used during incidents are important production artifacts. Google SRE has developed AI agents to continuously monitor and improve playbooks and production documentation based on their usage during incidents. AI agents can also generate new playbooks from incidents.</span></p> <h2><strong style="vertical-align: baseline;">Anomaly detection and alerting </strong></h2> <p><span style="vertical-align: baseline;">A core SRE practice is to define </span><a href="https://cloud.google.com/blog/products/devops-sre/sre-fundamentals-sli-vs-slo-vs-sla?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">service level indicators (SLIs) and service level objectives (SLOs)</span></a><span style="vertical-align: baseline;">, and to configure alerts for them. This approach tends to be ok if service use cases are fairly uniform, and if it is possible to define objectives that align to customers' expectations. </span></p> <p><span style="vertical-align: baseline;">However, for products that support a range of customer use cases and workloads, like many in Google Cloud, it can be difficult to define a static threshold that works across a variety of workloads. 
With AI, Google SRE is augmenting our more traditional approaches with </span><strong style="vertical-align: baseline;">anomaly detection</strong><span style="vertical-align: baseline;">, which bases alerts on deviations from regular behavior rather than on statically predefined thresholds. This approach relies on agents to collect signals and feed them to a model (e.g., </span><a href="https://docs.cloud.google.com/bigquery/docs/timesfm-model"><span style="text-decoration: underline; vertical-align: baseline;">TimesFM</span></a><span style="vertical-align: baseline;">) to perform anomaly detection. Historical signals from prior customer cases help the AI agent to predict customer-oriented SLOs. Further, AI-based anomaly detection can consult sources beyond signals produced by the service itself, such as customer feedback. </span></p> <p><span style="vertical-align: baseline;">In this model, when the SRE AI agent detects an anomaly, it triggers an alert. Then, the SRE AI alerting agent groups, pre-processes, and enriches the alerts with the necessary context and information. These alerts in turn are run through autonomous AI alert handlers, which can address or mitigate a multitude of issues. The outcome of this system is faster issue resolution and a likely significant reduction in the number of alerts that SREs need to review.</span></p> <p><span style="vertical-align: baseline;">What's key in this ecosystem of agents is being consistently transparent about what the data agents are evaluating, and how, and having consistent controls to prevent unwanted mutations of production state. 
</span></p> <h2><strong style="vertical-align: baseline;">Incident management</strong></h2> <p><span style="vertical-align: baseline;">Within Google SRE, incident management, or </span><a href="https://sre.google/resources/practices-and-processes/incident-management-guide/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">IMAG</span></a><span style="vertical-align: baseline;">, is a well-established process with clear roles and responsibilities, as well as tooling. SRE AI includes an agentic orchestration layer on top of the current IMAG process, which consists of agents that:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Monitor the communication surfaces used during the incident (incident response tools, chat spaces, videos, tracking documents), and consolidate/summarize data to improve communication and information sharing during the incident</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Support handoff between SREs participating in the incident, by creating handoff documents with necessary context</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Automatically create drafts of incident postmortems, improving their quality, reducing SRE effort, and ensuring that relevant information is included </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Manage internal and external incident communications</span></p> </li> </ul> <h2><strong style="vertical-align: baseline;">Incident investigation</strong><span style="vertical-align: baseline;"> </span></h2> <p><span style="vertical-align: 
baseline;">The Google SRE team has also created agents to investigate incidents, and in some cases to autonomously mitigate issues. </span></p> <p><span style="vertical-align: baseline;">Before they can proceed to form hypotheses and propose mitigation steps, these agents use observability data (logging, monitoring, tracing), as well as system topology, taxonomy, and dependency data to establish domain and intent. A few other building blocks that these agents use are distinct agents the team has created for navigating and executing playbooks, accessing alerting, performing anomaly detection, and deriving incident insights.</span></p> <h2><strong style="vertical-align: baseline;">Insights and risk management</strong></h2> <p><span style="vertical-align: baseline;">SRE requires an understanding of the end-to-end system and effective mitigation solutions, experience and lessons learned from past incidents, and the ability to perform risk management. Autonomous AI agents need similar skills to be able to manage production environments. </span></p> <p><span style="vertical-align: baseline;">While a common topology or taxonomy system can teach agents about the end-to-end system, and well-documented and described production </span><a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Model Context Protocol (MCP)</span></a><span style="vertical-align: baseline;"> tools and skills can teach them about available tooling, there needs to be a way to continuously teach agents about historical issues and their associated risks. To solve that problem, the Google SRE team created AI Insights, a system that continuously reviews known incidents and extracts meaningful information from them, then makes it available to agents to drive better investigations and mitigation steps. 
</span><a href="https://ai.google.dev/gemini-api/docs/embeddings" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Gemini embedding models</span></a><span style="vertical-align: baseline;"> and </span><a href="https://cloud.google.com/discover/what-is-a-vector-database"><span style="text-decoration: underline; vertical-align: baseline;">vector-enabled databases</span></a><span style="vertical-align: baseline;"> power this system.</span></p> <p><span style="vertical-align: baseline;">The other part of the system is risk insights. The AI system marks each incident with appropriate risk categories that can be used both by agents before applying mitigations, and by SREs to determine critical areas to address.</span></p> <h3><span style="vertical-align: baseline;">Design considerations</span></h3> <p><span style="vertical-align: baseline;">Before building out these agents, Google SRE </span><span style="vertical-align: baseline;">defined a few high level principles for their adoption:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Processes and operations that are already successfully automated, or that can be easily automated with classic non-AI based systems, do not need to be replaced (as long as they meet business needs).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Any new AI-based system must comply with existing and upcoming policies and procedures to keep the strong promises we have to our customers.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">An SRE AI agent needs to meet security, safety, and privacy requirements the same way as current systems and humans.</span></p> </li> <li 
aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">SRE AI agents must have a strong identity (agents have roles and permissions assigned).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">SRE AI agents need to meet high reliability SLOs and have well-defined backup options (automated or manual).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">SRE AI agents must be able to explain and reason about why and how they performed an action, as well as what options were considered and rejected. In other words, we favor transparency over black-box automation. </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Business continuity plans must include contingencies for potential AI failures.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">AI-based systems need continuous access to production data to make correct decisions.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">AI systems need to be continuously evaluated against a quality framework, and need to support </span><span style="vertical-align: baseline;">auditing and reporting to enable security tooling like detection and response.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">In addition, we stipulated that SRE AI systems should make Google services even better for users and customers by accomplishing at least one of the following: </span></p> <ul> <li aria-level="1" 
style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Relieve engineers from laborious and </span><span style="vertical-align: baseline;">repetitive</span><span style="vertical-align: baseline;"> operations</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Help engineers improve the quality and speed of decision making and execution </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Allow SREs to better prevent, detect, and/or mitigate problems than they could address before</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Enable autonomous agentic feedback loops that drive toward service reliability improvements</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Reduce overall operational costs</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">Built on proven infrastructure</span></h3> <p><span style="vertical-align: baseline;">Google SRE AI is built on proven Google infrastructure:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Gemini</span></a><span style="vertical-align: baseline;">: The base foundational model behind Google SRE AI. 
The SRE team also depends heavily on custom fine-tuned Gemini models based on internal Google data and knowledge.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/vertex-ai"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Enterprise Agent Platform (formerly Vertex AI)</span></a><span style="vertical-align: baseline;">: A full AI stack for developing solutions.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="text-decoration: underline; vertical-align: baseline;">Agent Development Kit (</span><a href="https://google.github.io/adk-docs/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">ADK):</span></a><span style="vertical-align: baseline;"> The development platform.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">MCP servers: Running on top of standard Google API infrastructure, this is the same infrastructure used to provide </span><a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services"><span style="text-decoration: underline; vertical-align: baseline;">external customers with MCP support</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Standard internal observability infrastructure (monitoring, logging, tracing).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">AI and ML capabilities built into </span><a 
href="https://cloud.google.com/bigquery?utm_source=pmax&amp;utm_medium=display&amp;utm_campaign=Cloud-SS-DR-GCP-1713658-GCP-DR-NA-US-en-pmax-Display-pmax-All-BigQuery&amp;utm_content=c--x--9021712-21713147502&amp;gclsrc=aw.ds&amp;gad_source=1&amp;gad_campaignid=22037004910&amp;gclid=Cj0KCQiAyP3KBhD9ARIsAAJLnnbo2-37fR9eOpRLdHeKbvQPLy5r1oGBQcBDoi5rquEdx-JMkX6ryzQaAsShEALw_wcB"><span style="text-decoration: underline; vertical-align: baseline;">Google BigQuery</span></a><span style="vertical-align: baseline;">, and </span><a href="https://cloud.google.com/discover/what-is-a-vector-database"><span style="text-decoration: underline; vertical-align: baseline;">Google vector databases</span></a><span style="vertical-align: baseline;">.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">We group these infrastructure components together into autonomous systems. At Google, we’ve been developing and using autonomous systems to manage production for a long time. However, today’s AI-based autonomous systems are very powerful and not always deterministic. To help us understand how autonomous the systems truly are, we developed a way to track autonomous levels.</span></p> <h3><span style="vertical-align: baseline;">Dive deeper: Read the white paper</span></h3> <p><span style="vertical-align: baseline;">For engineers and leaders looking to explore the technical architecture and rigorous governance models behind these innovations, we invite you to read our comprehensive whitepaper, “AI in SRE Practice: Moving Beyond Automation at Google,” which provides an in-depth look at how Google SRE is navigating the transition from deterministic automation to agentic AI. Download the whitepaper</span><span style="vertical-align: baseline;"> </span><a href="https://goo.gle/4uUxy4y" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p></div>
  48. Sr. Director, Product Management

    Wed, 22 Apr 2026 12:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Today at Google Cloud Next, we are unveiling a more proactive Gemini Cloud Assist, our AI-assisted cloud operations platform. This update shifts your Google Cloud operations from manual workflows to a proactive, intelligent experience supported by a powerful ecosystem of agents.</span></p> <p><strong style="vertical-align: baseline;">Why it matters: </strong><span style="vertical-align: baseline;">A new agentic architecture enables Gemini Cloud Assist to handle the heavy lifting of your cloud management. By embedding intelligence, your enterprise context, and the power of Gemini directly into the operational layer, Gemini Cloud Assist proactively executes complex tasks such as designing applications, troubleshooting issues, and preemptively optimizing costs, that previously required constant human oversight. In enterprise-scale systems, this approach accelerates development velocity and reduces resolution times. 
</span></p> <p><strong style="vertical-align: baseline;">What’s new: </strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Using natural language and the power of Gemini, reduce the time from design to deployment of new or existing multi-resource deployments via a </span><strong style="vertical-align: baseline;">redesigned Application Design Center</strong><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Automate infrastructure operations via </span><strong style="vertical-align: baseline;">gcloud, kubectl, and Terraform </strong><span style="vertical-align: baseline;">while using proactive multi-turn agents to troubleshoot and resolve incidents.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Identify your cost anomalies 24/7 </strong><span style="vertical-align: baseline;">using a proactive FinOps agent that analyzes spending spikes and generates granular cost reports on demand.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Assistance wherever you are. </strong><span style="vertical-align: baseline;">Powered by </span><a href="https://docs.cloud.google.com/mcp/supported-products"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud MCP servers</span></a><span style="vertical-align: baseline;"> and our proactive agents under the hood, Gemini Cloud Assist also exposes its own design, operation, troubleshooting and optimization capabilities as published MCP servers, bringing them straight to your IDE. 
</span></p> </li> </ul> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“Gemini Cloud Assist has significantly helped our dev teams. It reduced the number of outreach and touch points I have with them regarding Google Cloud questions by 60%. This allows our cloud team to scale more effectively and focus on more complex tasks.” </span><span style="vertical-align: baseline;">- Oscar Aldana Assad, Senior Cloud Engineer, Petco</span></p> <p><span style="vertical-align: baseline;">Let’s take a deeper look at how the agentic Gemini Cloud Assist can help your operations.</span></p> <h3><span style="vertical-align: baseline;">Accelerate production-readiness with Application Design Center</span></h3> <p><span style="vertical-align: baseline;">Gemini Cloud Assist serves as the intelligent reasoning engine for Application Design Center, acting as the bridge between natural-language intent, and a visual, production-ready architecture. By describing your infrastructure goals in plain language, Gemini Cloud Assist leverages Application Design Center to automatically lay out a visual design, including deployable Terraform. These templates are based on best-practice architecture guidance from Google Cloud and help bring security, reliability and compliance by design. Integrated with Security Command Center, quickly go from idea to deployment that conforms to your organizational policies.</span></p> <p><span style="vertical-align: baseline;">Platform teams can then curate shared catalogs of pre-approved templates and integrate their own custom Terraform modules directly into the design process, providing a governed framework. This established, well-lit path helps developers adhere to organizational security and compliance guardrails from the first day of deployment. Beyond initial deployment, Gemini supports the full application lifecycle with interactive, multi-turn problem solving to update cloud resources. 
</span></p></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Move from reactive to proactive remediation</span></h3> <p><span style="vertical-align: baseline;">In production, Gemini Cloud Assist helps you shift from reactive troubleshooting to rapid hypothesis analysis, which lowers time to resolution. Triggered by alerts, Gemini Cloud Assist proactively clusters and analyzes signals to initiate investigations before issues escalate. Now with Gemini 3, Gemini Cloud Assist correlates logs and metrics and identifies root causes from infrastructure signals down to the application code. Gemini Cloud Assist explores parallel hypotheses via tool calls and presents a technical breakdown of observations in a centralized UI. If intervention is required to address an underlying Google Cloud issue, users can hand off complete context to Google support, minimizing the iterations required for sharing configuration and context data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/proactive_alert_investigations_blog.gif" alt="proactive_alert_investigations"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Identify cost anomalies 24/7</span></h3> <p><span style="vertical-align: baseline;">To maintain economic health, Gemini Cloud Assist now acts as a proactive optimization agent for your projects. Running in the background 24/7, it monitors for cost anomalies and provides root-cause analysis, correlating spending spikes with specific engineering triggers like new resource creation, auto-scaling events, or pricing changes. 
You can query resource utilization via natural language to generate on-demand, tabulated reports, by project and applications registered in AppHub, providing granular visibility into "</span><span style="font-style: italic; vertical-align: baseline;">who, what, when, and how</span><span style="vertical-align: baseline;">" — without manual data aggregation. For example, you can ask ‘Why did the cost of my application increase yesterday?’ or ‘How much did my project cost last month?’ and Gemini Cloud Assist answers by correlating cost data with infrastructure change, audit, and monitoring logs to get you an accurate answer.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Vz6FwaI.gif" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Assistance everywhere</span></h3> <p><span style="vertical-align: baseline;">We are meeting teams where they work by expanding the surfaces where Gemini Cloud Assist is available. A Gemini Cloud Assist agent is already accessible through the console and mobile interfaces. And with new support for the Model Context Protocol (MCP), Gemini Cloud Assist is now available in Gemini CLI, your favorite agentic IDE or CLI, and third-party toolchains like ServiceNow and Slack. 
Integrating proactive assistance within existing workflows helps teams to avoid context switching and maintain flow-state.</span></p> <h3><span style="vertical-align: baseline;">Proactive capabilities at your fingertips</span></h3> <p><span style="vertical-align: baseline;">We designed Gemini Cloud Assist to help manage the end-to-end lifecycle of your applications, providing a multi-agent approach from deploying new applications to managing existing applications in the cloud. With the help of Gemini 3, Gemini Cloud Assist can now:</span></p> <ul> <li role="presentation"><strong style="vertical-align: baseline;">Increase your development velocity: </strong><span style="vertical-align: baseline;">Accelerate production-readiness using intent-driven architectures that unify best practices, security policies, and enterprise compliance.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Streamline production operations</strong><span style="vertical-align: baseline;">: Triage, diagnose and resolve production issues faster, through Gemini-based troubleshooting, recommendations and remediations.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Automate cost optimization:  </strong><span style="vertical-align: baseline;">Automatically detect, analyze, root-cause, and alert you about cost anomalies for your projects on a daily basis. </span></li> <li role="presentation"><strong style="vertical-align: baseline;">Meet your teams where they are:  </strong><span style="vertical-align: baseline;">Through proactive agents and MCP tools, engage with functionality using surfaces that range from the Google Cloud console to your CLI and IDE, so teams can stay in a flow-state.</span></li> </ul> <p><span style="vertical-align: baseline;">The future of operations is agentic. 
You can begin your journey with our proactive cloud by enabling </span><a href="https://console.cloud.google.com/gemini-admin/products"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;"> in your project settings today.</span></p></div>
  49. Product Manager, Google Cloud

    Mon, 02 Mar 2026 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Managing planned maintenance is critical for ensuring business continuity and application performance. However, as your usage of cloud services grows, staying on top of maintenance schedules can be complex and time-consuming. Current approaches often result in inconsistent notifications and varying levels of control across different products. To help you avoid missed maintenance windows and disruptions, we are announcing the General Availability (GA) of Unified Maintenance, a centralized dashboard that lets you view and manage maintenance events across your Google Cloud services.</span></p> <p><span style="vertical-align: baseline;">Unified Maintenance consolidates maintenance updates into a single view, making it easier to track upcoming events. With Unified Maintenance, you can:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">View planned maintenance:</strong><span style="vertical-align: baseline;"> See events for services like Compute Engine, Google Kubernetes Engine (GKE), Cloud SQL, Memorystore, AlloyDB, and Looker in one dashboard (see </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/supported-services"><span style="text-decoration: underline; vertical-align: baseline;">supported services</span></a><span style="vertical-align: baseline;">).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Get standardized alerts:</strong><span style="vertical-align: baseline;"> Receive consistent maintenance information through Cloud Logging, which allows you to set up alerts and integrate them with your existing monitoring or ticketing systems.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p 
role="presentation"><strong style="vertical-align: baseline;">Understand your options:</strong><span style="vertical-align: baseline;"> The </span><a href="https://console.cloud.google.com/cloud-hub/maintenance" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">dashboard</span></a><span style="vertical-align: baseline;"> clearly indicates which maintenance events offer user controls.</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">What’s next</span></h3> <p><span style="vertical-align: baseline;">We are working to add support for more Google Cloud services and enhance the platform's capabilities. Our roadmap includes expanded scopes for folders and organizations, as well as application-level visibility.</span></p> <p><span style="vertical-align: baseline;">You can access the Unified Maintenance dashboard directly in the Google Cloud console to view upcoming events for your subscribed services. To learn more about how to use these new features, read the </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">documentation</span></a><span style="vertical-align: baseline;"> and the </span><a href="https://docs.cloud.google.com/unified-maintenance/docs/set-up-unified-maintenance"><span style="text-decoration: underline; vertical-align: baseline;">Get started guide</span></a><span style="vertical-align: baseline;">.</span></p></div>
  50. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">For any organization that has invested in an internal developer platform, a question inevitably arises: Is it actually working? </span></p> <p><span style="font-style: italic; vertical-align: baseline;">Simply tracking adoption rates won't tell you if your platform is truly delivering value to your developers. This was the challenge faced by John Lewis, a major UK retailer. In our previous articles (parts </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">1</span></a><span style="font-style: italic; vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">2</span></a><span style="font-style: italic; vertical-align: baseline;">) we introduced the John Lewis Digital Platform (JLDP) and how it enabled dozens of product teams to build high-quality software rapidly to power www.johnlewis.com and other critical applications. But how did they know that the platform was actually successful? Traditional product metrics like revenue and sales don’t translate easily to this world. 
When you focus only on whether your tenants use the platform, you don’t understand whether it’s bringing them value.</span></p> <p><span style="font-style: italic; vertical-align: baseline;">In this article, Alex Moss from the John Lewis platform team discusses how they moved beyond simple usage metrics to develop a sophisticated, multi-stage approach to measuring the real value of their platform — a journey that took them from lead-time metrics, to </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">DORA</span></a><span style="font-style: italic; vertical-align: baseline;">, and finally to a "Technical Health" score. Along the way, they explore how the JLDP’s purpose evolved — and its value along with it. - Darren Evans</span></p> <h3><strong style="vertical-align: baseline;">Initial measurement: A focus on platform value</strong></h3> <p><span style="vertical-align: baseline;">In the early days of the platform, understanding its value was actually much easier. This was because the platform was created with a very clear purpose: to enable speed of change. The John Lewis business wanted to create multiple product teams working on several features of johnlewis.com in parallel, and to put those features in front of customers quickly for feedback.</span></p> <p><span style="vertical-align: baseline;">Its origins in the world of the company’s John Lewis Digital online business resulted in it being treated as a product from a very early stage, and therefore integrated with that area’s reporting mechanisms too. Thus, it became normal to link the platform objectives to the online business’s broader goals each quarter and report on measurable key results. This kept the focus on the reasons the platform is important: do improvements to the platform continue to justify using it over seeking out a different one? 
We cannot afford to rest on our laurels!</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_aSY3nPB.max-1000x1000.png" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">The six annual measures reported against every quarter. The specific measures have varied over the years.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In addition to this, in the first few years of the platform’s existence, there were three simple metrics that best indicated how the platform was living up to the rationale for creating it:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Service Creation Lead Time:</strong><span style="vertical-align: baseline;"> How long it took to create a tenancy (the space in which a product team was creating their software)</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Onboarding Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to deploy something into production</span></p> </li> <li><strong style="vertical-align: baseline;">First Customer Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to designate their service as “live to customers”</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_DVTZRKS.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">Some screenshots from the early version of the platform's self-written service catalogue, tracking the three metrics mentioned</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This was then combined with the number of tenants present on the platform into a report, which was displayed as part of an initial home-grown Service Catalogue shown above (which was later </span><a href="https://medium.com/john-lewis-software-engineering/weve-gone-backstage-this-is-how-we-use-it-on-our-digital-platform-b299cd4acb24" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">replaced with Backstage</span></a><span style="vertical-align: baseline;">). This report served two purposes:</span></p> <ol> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A very clear visualization for stakeholders of how much their platform was being adopted, and how fast they were able to get up and running (in particular, “Service Creation” being measured in single-digit hours, in comparison to the weeks teams would traditionally have had to wait). This is important, because in the early days of your product, you need to justify its continued growth and investment.</span></p> </li> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A useful way for the platform team themselves (and stakeholders) to see which teams were taking their time about getting something into production. Is my product actually helping you? 
And if not, what more could we be doing?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Using this as a conversation-starter with our tenants opened doors to rich sources of feedback that could be turned into platform features: When we asked tenants “What’s stopping you from going live?”, they often answered that the product they were building was simply complex. But we also often saw that our own processes were getting in the way. This was important, as we could then do something about it.</span></p> <p><span style="vertical-align: baseline;">The easiest of these barriers for us to overcome were typically technology-related. In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous articles</span></a><span style="vertical-align: baseline;">, we covered two examples, “My team is spending a lot of time writing Terraform to provision PubSub,” and “we’re having trouble learning how to use Kubernetes.” To help, the platform team created “paved roads” to enable self-service provisioning or simplification of Kubernetes, significantly reducing these burdens for teams.</span></p> <p><span style="vertical-align: baseline;">The more significant opportunities to streamline getting new services live were a result of our processes (e.g., security approvals) — and if your platform is empowered to simplify these sorts of organizational functions, then the gains can be extremely beneficial. One such example was the Information Security risk assurance process. Gaining the necessary security sign-offs and producing the required documentation was a necessary but time-consuming task, and - with the rate of change in the business - this was often something that many teams were going through in parallel. Our platform team successfully negotiated a simplified process for its tenants. 
It was able to do this because, by being resident on the platform, they could guarantee that security controls were in place and that policies were being followed. This was a direct result of the platform building features to meet those needs, and being able to provide evidence that they were being used — removing the need for the tenant team to either document or invent this themselves. This is still simplifying the developer experience through platform engineering, even though the solution is a less technically-based one.</span></p> <p><span style="vertical-align: baseline;">Sometimes the conversation resulted in feedback that wasn’t even platform-shaped — for example, helping teams understand concepts like feature flagging and dark launching, or software design options to help break dependencies with legacy systems. John Lewis’ platform teams are staffed with experienced engineers, ideally ones with software development experience, which helps a lot with these sorts of interactions.</span></p> <p><span style="vertical-align: baseline;">A key point here is that by measuring how effectively teams were making it into production, we could identify who to talk to and elicit the feedback we needed on which problems to address. Simply relying on your tenants thinking of this themselves when they don’t see the bigger picture (or have other priorities) is not nearly as effective.</span></p> <p><span style="vertical-align: baseline;">We then combined the process with more traditional approaches such as sending out surveys or using Net Promoter Scoring to help build popularity in the product. 
The results of these were usually very positive, and could be used to generate mindshare — especially where a product team was comfortable talking about their positive experiences in internal tech conferences and the like.</span></p> <h3><strong style="vertical-align: baseline;">Helping understand team performance</strong></h3> <p><span style="vertical-align: baseline;">A few years into the life of the platform, our emphasis started to shift. There was less of a need to prove the value of the platform — the business and our engineers were happy — so we shifted from “how can we get you into production as quickly as possible” towards “how can we enable you to continue to be as fast, but also reduce friction, in your day-to-day activities.” This led us towards DORA metrics.</span></p> <p><span style="vertical-align: baseline;">Our initial DORA implementations involved mining information from our systems of record for change and incident, complemented by our already-mature observability stack for availability data, as well as pulling events from things like cloud audit logs. We built software to do this and stored the results in BigQuery, which enabled us to visualize the data in our home-grown Service Catalogue tool. Later, we moved this into Grafana dashboards instead, which are still in use today:</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_N8Q4Xha.max-1000x1000.png" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Looking for patterns in this data led to us discovering additional features that would be useful for us to build. 
Two major examples of this were </span><span style="font-style: italic; vertical-align: baseline;">handling change</span><span style="vertical-align: baseline;">, and </span><span style="font-style: italic; vertical-align: baseline;">operational readiness</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">JLP’s service management processes were geared towards handling complex release processes across multiple large systems and/or teams - but we had fundamentally changed our architecture by adopting microservices. This empowered teams to release independently at will, and therefore manage the consequences of failed changes themselves. We used the data we’d collected about change failure rates and frequency of small releases to justify a different approach: allowing tenants to automatically raise and close changes as part of their CI/CD pipelines. After clearing this approach with our Service Management team, we developed a CLI tool that teams could use within their pipelines. This had the additional benefit of allowing us to capture useful data at point of release, rather than scraping more awkward data sources. The automated change “carrot” was very popular and was widely adopted, shifting the approval point left to the pull request rather than later in the release process. This reduced time wastage, change-set size and risk of collisions.</span></p> <p><span style="vertical-align: baseline;">In a similar vein, with more teams operating their own services, the need for a central site-wide operations team was reduced. We could see from our metrics that teams practicing “You Build It, You Run It” had fewer incidents and were resolving them much more quickly. 
We used this as evidence to bring in tooling to help them respond to incidents faster, and decouple the centralized ops teams from those processes — in some cases allowing them to focus on legacy systems, and in others, removing the need for the service entirely (which resulted in significant cost savings, despite the fact that we had more individual product teams on-call). This, and supporting observability and alerting tooling, was all configured through the platform’s paved-road pipeline described in our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">The DORA metrics helped us architecturally as well. Operational data shined a light on the brittleness of third-party and legacy services, thereby driving greater investment into resilience engineering, alternative solutions, and in some cases, causing us to re-evaluate our build vs. buy decisions. </span></p> <h3><strong style="vertical-align: baseline;">Choosing what to measure</strong></h3> <p><span style="vertical-align: baseline;">It’s very important to choose wisely about what to measure. Experts in the field (such as </span><a href="https://www.youtube.com/watch?v=trO_fiTAZeM" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Laura Tacho</span></a><span style="vertical-align: baseline;">) influenced us to avoid vanity metrics and to be cautious about interpreting the ones we do collect. It’s also important for metrics to be meaningful to the target audience, and presented accordingly.</span></p> <p><span style="vertical-align: baseline;">As an example, we communicate about cost and vulnerability with our teams, but the form this takes depends on the intended audience’s role. 
For example, we send new vulnerabilities or spikes in cost directly to product teams’ collaboration channels, because experience has taught us that having our engineers see these vulnerabilities results in a faster response. On the other hand, for compliance reporting or review by team leads, reports are more effective at summarising the areas that need action. Because if we know one thing, it’s that nobody wants to be a leader of the “vulnerabilities outside of policy” dashboard!</span></p> <p><span style="vertical-align: baseline;">Historically, it was not unusual for us to look at measures such as the number or frequency of incidents. But in a world of highly automated response systems, this is a trap, as alerts can be easily duplicated. Focusing too much on a number can drive the wrong behavior — at worst, deliberately avoiding creating an incident at all! Instead, it’s much better to focus on the impact of the parent incident and how long it took to recover. Another example is reporting on the number of vulnerabilities. Imagine you have a package that is used extensively across many components in a distributed system. Disclosing that the package has a vulnerability can create a false sense of scale, when in fact patching the base image deals with the problem swiftly. Instead, it’s better to look at the speed of response against a pre-agreed policy based on severity. This is both a much more effective and reasonable metric for teams to act on, so we see better engagement.</span></p> <p><span style="vertical-align: baseline;">It’s very important that you put across as much context as possible when presenting the data so that the right conclusions can be drawn — especially where those reports are seen by decision-makers. With that in mind, we combined raw metrics we could visualize with user opinion about them. 
This helped to bring that missing context: Is the team that’s suffering from a high change failure rate also struggling with its release processes and batch size? Is the team that’s not addressing vulnerabilities quickly also reporting that they’re spending too much time on feature development and not enough on operational matters? We reached for a different tool — </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DX</span></a><span style="vertical-align: baseline;"> — to help us bring this sort of information to bear. In our </span><a href="https://cloud.google.com/blog/products/application-development/how-john-lewis-partnership-chose-its-monitoring-metrics"><span style="text-decoration: underline; vertical-align: baseline;">follow-up article</span></a><span style="vertical-align: baseline;">, we’ll elaborate on how we did this and how it prompted us to expand the data we collected about our tenants. Stay tuned!</span></p> <p><span style="font-style: italic; vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, start </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">here</span></a><span style="font-style: italic; vertical-align: baseline;">.</span></p></div>
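<p>The advice above about focusing on the parent incident and its recovery time, rather than on raw alert counts, can be sketched in a few lines. This is a minimal illustration and not the John Lewis implementation: the <code>Alert</code> record, the <code>parent_incident</code> field, and the sample data are all assumptions made for the example.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Alert:
    parent_incident: str  # hypothetical ID shared by duplicate alerts
    opened: datetime
    resolved: datetime


def recovery_times(alerts):
    """Collapse alerts into parent incidents (earliest open to latest resolve)."""
    spans = {}
    for alert in alerts:
        start, end = spans.get(alert.parent_incident, (alert.opened, alert.resolved))
        spans[alert.parent_incident] = (min(start, alert.opened), max(end, alert.resolved))
    return {incident: end - start for incident, (start, end) in spans.items()}


def median_recovery_minutes(alerts):
    """Median time to recover per parent incident, in minutes."""
    durations = recovery_times(alerts).values()
    return median(d.total_seconds() / 60 for d in durations)


# Made-up sample data: two alerts belong to the same parent incident.
t0 = datetime(2026, 1, 5, 9, 0)
alerts = [
    Alert("INC-1", t0, t0 + timedelta(minutes=30)),
    Alert("INC-1", t0 + timedelta(minutes=5), t0 + timedelta(minutes=45)),  # duplicate
    Alert("INC-2", t0 + timedelta(hours=3), t0 + timedelta(hours=3, minutes=15)),
]
print(len(alerts), "alerts,", len(recovery_times(alerts)), "parent incidents")
print("median recovery (minutes):", median_recovery_minutes(alerts))
```

<p>Grouping before measuring means a noisy alerting rule inflates neither the incident count nor the recovery figure, which is the trap the article warns about.</p>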
  51. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/at-john-lewis-partnership-measuring-developer-platform-value"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;"> of this article, Alex Moss from the John Lewis Partnership covered the metrics that they use to measure the value of their developer platform. Now, let's talk about a crucial aspect of any measurement strategy: choosing the right things to measure. It's easy to get lost in a sea of data or to focus on metrics that look impressive, but don't actually reflect the health of your platform or the experience of your developers. Here, Alex shares the John Lewis philosophy on how to choose meaningful metrics and present them in a way that drives the right conversations and actions, ensuring that the data is always presented with as much context as possible. - Darren Evans</span></p> <p><span style="vertical-align: baseline;">While the solution we detailed in the first half of this article worked very well, relying solely on objective measures comes with a number of traps. They are very easy to misinterpret: either wasting time (“the team is working on another product at the moment”) or not telling the right story (“the incident wasn’t closed properly”). This leads to a scaling challenge: Chatting with a small number of teams to understand a situation is one thing. 
But when you are only one small team trying to build a product, and you need to talk across several dozen teams, it’s not so easy.</span></p> <h3><strong style="vertical-align: baseline;">Collecting engineers’ subjective feedback</strong></h3> <p><span style="vertical-align: baseline;">We needed a way to collate more subjective feedback, ideally in a form that we could visualize and contrast to the objective DORA and other service metrics we held.</span></p> <p><span style="vertical-align: baseline;">Our initial attempt at this involved creating Service Operability Assessments — questionnaires that tenants fill in every quarter. Service Operability Assessments are intended to hold a series of thought-provoking questions aimed at whether the team is following good practices for running their service. This worked well with an experienced facilitator (usually a senior platform engineer) who could ask further probing questions and pull out the key feedback and actions. But as you might imagine, this suffered from scaling challenges. We eventually let this be handled entirely self-service — an imperfect system, since many teams are quite happy to just copy/paste their answers from the previous quarter, which may or may not reflect reality!</span></p> <p><span style="vertical-align: baseline;">We then learned about a tool called </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">the DX platform</span></a><span style="vertical-align: baseline;">, which significantly changed how we approached this, and which is now used across our entire Engineering community. It works by surveying individual engineers (rather than teams) for a few minutes every three months. The questions are curated based on DX’s research, backed by the founders of DORA and other similar frameworks. 
We’ve found it very helpful to be able to slice the results in different ways, including looking at areas across whole platforms or deep-diving on particular teams. The latter, in combination with our DORA data, makes for rich conversations. For example, in the DX tool, a team which recently suffered through some highly impactful incidents might also have registered concerns on “Production Debugging,” while another team that saw a marked drop in release frequency flagged worries around “Change Confidence” or “Ease of Release.” The platforms team can at this point step in to offer advice or potentially implement new features to help with the issues the teams are seeing.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_J4WNCsj.max-1000x1000.png" alt="1"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The pre-built drivers and reports in DX are tremendously useful, but we also augment them with our own custom queries to help us understand areas of current focus. For example, we measure Customer Satisfaction (CSAT) for the platform and its portal (Backstage), collect data on how long it takes a newcomer to begin submitting pull requests, and ask newcomers how they found the onboarding process. 
We also recently started assessing engineers’ opinions on the effectiveness of AI coding assistants to help justify further investment in them (instead of just relying on market insight).</span></p> <p><span style="vertical-align: baseline;">An example of where this helped focus our efforts was with documentation, namely, building capabilities into our Backstage developer portal to make it easier for teams to view each others’ docs through pipelines that automatically publish content and make it discoverable.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_gf9lDAw.max-1000x1000.png" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Service health - Feature adoption &amp; beyond</strong></h3> <p><span style="vertical-align: baseline;">Outside of the insights we generate from the likes of DORA and DX, we’ve recently begun questioning not only whether the platform itself is valuable, but whether tenants are </span><span style="font-style: italic; vertical-align: baseline;">getting the value they should</span><span style="vertical-align: baseline;"> from it. In other words, we’ve effectively started to measure platform feature adoption.</span></p> <p><span style="vertical-align: baseline;">To do this, we built out what we refer to internally as our Technical Health feature. It takes the form of a custom plugin that integrates with our Backstage Developer Portal, which then queries an in-house API that surfaces data fed from a large number of small jobs that collect information on the things we want to measure. These jobs are independently releasable themselves, which allowed us to scale this up pretty quickly. 
</span></p> <p><span style="vertical-align: baseline;">We currently capture four categories of health measures:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Technical health: </strong><span style="vertical-align: baseline;">We currently have 17 “technical” measures. Examples here include measuring whether teams are using our paved road pipeline and custom Microservice CRD (see previous articles </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">1</span></a><span style="vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="text-decoration: underline; vertical-align: baseline;">2</span></a><span style="vertical-align: baseline;">) rather than “terraforming” their own resources, following our recommended Kubernetes practices (such as resource sizing, disruption budgets and lifecycle probes), keeping base images up to date, and the like. We also include some “softer” technical measures such as whether they are running pipelines frequently enough to pick up changes (we don’t run this for teams), reviewing their operability assessments, staying on top of git branches, and so on.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Operational readiness:</strong><span style="vertical-align: baseline;"> Then, there are 18 measures relating to operational health — things like whether a pre-flight configuration is in place, whether runbooks are written, docs have been published, and so on. 
This is an evolution of an Operational Readiness checklist from several years ago (back when we used to have separate Delivery and Operations teams, and therefore these sorts of checks were mandatory for “handover”). We tailored this checklist to the specific features of the platform that help teams achieve good operability, rather than keeping it as a generic list. This also serves to help our Service Management team feel confident that the right practices are being followed, thereby eliminating a point of friction when carrying out manual reviews.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Migrations: </strong><span style="vertical-align: baseline;">From time to time, the Platform requires tenants to carry out work to keep up with changes to the platform itself. A classic example of this is getting teams to deal with deprecated Kubernetes API versions. This also includes adoption of different features that we want to drive more forcefully in order to remove the older way of doing things (for example, in favour of something more secure). We found that as the Platform grew, we had a long tail of migration work that we needed teams to perform; surfacing it here gives Product Managers and Delivery Leads an easy way to prioritize their teams’ workloads.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Broader engineering practices: </strong><span style="vertical-align: baseline;">We recently opened up the feature to allow other teams (in this case, our Engineering leadership) to build in their own measures, such as whether teams are keeping up to date with versions of our design system or whether they’re following broader engineering practices that extend beyond just the JL Digital Platform. 
</span></p> </li> </ol> <p><span style="vertical-align: baseline;">We present this data through aggregated views (like the example shown below), as well as individual tasks and broader leaderboards — all designed to catch the eye of those with influence over a team’s priorities. We’ve found that the desire for an engineer to turn a traffic-light green can be a powerful motivator — far more effective than relying on documentation or announcements.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_paqGoLi.max-1000x1000.png" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This technology works through custom plugins that we’ve built for the Backstage Portal. Each “health check” is itself its own microservice (often running as a job) which interrogates the appropriate system to determine whether the measure is met. For example, one microservice checks that a PodDisruptionBudget has been created by querying Kubernetes directly, while another that looks at whether distroless base images are in use, does so by inspecting container image layers. There’s a template for creating new metrics, which makes it easy for engineers to create new ones — including those outside the platform team themselves. The results are stored in BigQuery, with an API to make Backstage plugin development simpler.</span></p> <p><span style="vertical-align: baseline;">A reality of introducing measures like this is that it drives more work into the product teams. It is important that your culture be ready for this. 
If we had implemented these measures very early in the platform’s life, this would likely have affected how the product was perceived — perhaps as very strict or inhibiting the pace of change with guardrails. This can negatively impact overall adoption. By introducing these later on, we benefited from many tenants who already saw the platform as very valuable, as well as the confidence that we had selected the right measures and could apply them consistently. That said, we did still see a small drop in CSAT for the platform after we started doing this. We try to be considerate about the pace that we launch each measure to give product teams the time to absorb the work, as well as provide a means for teams to suppress the indicators that aren’t relevant to them. For example, a tenant might deliberately choose not to use pod autoscaling for performance reasons, or have a functional reason why they can’t use our Microservice CRD.</span></p> <p><span style="vertical-align: baseline;">The introduction of these sorts of assurance measures on tenant behaviour is a reflection of the maturity of the platform. In the early days, we relied on highly skilled teams to do the right thing whilst going fast. But as time has passed, we’ve witnessed a variety of skills and capabilities, combined with shifts in ownership of services, that pushed us to introduce techniques to drive the right outcomes. This is also due to the platform itself becoming complex — the cognitive load for a new team is much higher than it was, due to all its new features. We needed to put some lights along the edges of our paved road to help teams stay on it!</span></p> <p><span style="vertical-align: baseline;">Throughout this evolution, we’ve continued to report on our key results for the business themselves: Are we still doing what they want of us? 
This has naturally shifted from “go fast, enable teams” (which we largely see as a solved problem, to be honest) towards “do it safely, and manage your technical debt.”</span></p> <h3><strong style="vertical-align: baseline;">Are you being served? Key takeaways</strong></h3> <p><span style="vertical-align: baseline;">Long story short, the question of whether a developer platform has value is complex, and can be answered in many ways. As you embark on building out — and quantifying — your own developer platform, here are a few concluding thoughts to keep in mind:  </span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Measurement is a journey, not a destination:</strong><span style="vertical-align: baseline;"> Start by measuring something meaningful to your stakeholders, but be prepared to adapt as your platform evolves. In the beginning, it’s okay to prioritize further investment in your product, but it’s better to actually measure how the platform is enabling your teams. The things that mattered when you were initially proving out the platform’s viability are unlikely to be what are important several years later when your features are more mature and your priorities have shifted.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Listen to the humans: </strong><span style="vertical-align: baseline;">Don’t assume that just because your platform is being used, that it is providing value. The most powerful metrics are often qualitative; engineers wanting to use your tool and CSAT are strong signals, but asking them questions about how they are using it is a better way to gain insight into how you can improve it. 
It is hard to figure out what’s working (and what isn’t) through measurement alone.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Data is for enabling, not just reporting:</strong><span style="vertical-align: baseline;"> Use your insights to help teams improve, not just to show graphs to leadership. Further, be transparent about what specific data led you to act. For example, when you see a dip in release frequency for a specific team, use that data to start a conversation about potential roadblocks rather than simply flagging it as a problem. By doing this, you build the trust and goodwill with both leadership and your tenants to keep moving the platform forward. </span></p> </li> </ol> <hr/> <p><sub><span style="font-style: italic; vertical-align: baseline;">The evolution of the John Lewis Partnership’s measurement strategy serves as a compelling case study. By transitioning from basic lead-time tracking to a holistic model — blending DORA metrics with qualitative developer feedback — they demonstrated that true platform success is defined by the genuine value it delivers, not merely by adoption rates.</span></sub></p> <p><sub><span style="font-style: italic; vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles: Using Platform Engineering to simplify the developer experience - </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; 
vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="font-style: italic; vertical-align: baseline;"> and</span><span style="font-style: italic; vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="font-style: italic; vertical-align: baseline;">. We also recommend reading about </span><a href="https://cloud.google.com/blog/products/application-development/introducing-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">, </span><span style="font-style: italic; vertical-align: baseline;">our foundational tool for managing application-centric governance across your organization.</span></sub></p></div>
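<p>The health-check model described in this article (independent checks per measure, a way for tenants to suppress indicators that do not apply, and a traffic-light summary) could be sketched roughly as follows. Everything here is an assumption for illustration: the <code>CheckResult</code> shape, the measure names, and the 80% amber threshold are hypothetical, and the real checks query Kubernetes and store results in BigQuery rather than using in-memory lists.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    tenant: str
    measure: str              # hypothetical measure name
    passed: bool
    suppressed: bool = False  # tenant opted out with a stated reason


def traffic_light(results, amber_threshold=0.8):
    """Summarise one tenant's results; suppressed measures are ignored.

    The threshold is an illustrative assumption, not a documented value.
    """
    active = [r for r in results if not r.suppressed]
    if not active:
        return "green"
    ratio = sum(r.passed for r in active) / len(active)
    if ratio == 1.0:
        return "green"
    return "amber" if ratio >= amber_threshold else "red"


# Made-up results for a single tenant.
results = [
    CheckResult("basket", "pod-disruption-budget", passed=True),
    CheckResult("basket", "distroless-base-image", passed=True),
    CheckResult("basket", "runbook-published", passed=True),
    CheckResult("basket", "pipeline-run-recently", passed=True),
    CheckResult("basket", "pod-autoscaling", passed=False, suppressed=True),
]
print(traffic_light(results))
```

<p>Excluding suppressed measures from the denominator keeps a deliberate opt-out (such as avoiding pod autoscaling for performance reasons) from permanently holding a team on amber.</p>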
  52. 10X Lead, delta Team, Google Cloud Consulting

    Thu, 08 Jan 2026 17:00:00 -0000

    <div class="block-paragraph_advanced"><p class="p1">FINRA, the Financial Industry Regulatory Authority, consistently seeks to achieve the highest standards in its technology practices. To elevate its software development lifecycle, FINRA, which oversees member broker-dealers, engaged Google consultants to help apply a metrics-driven methodology to its engineering practices.</p> <p class="p1"><a href="https://dora.dev/" rel="noopener" target="_blank">DORA</a> is a popular framework <span style="vertical-align: baseline;">for helping organizations improve software delivery performance through capabilities that can be measured by key metrics. These include </span>deployment frequency, change lead time, change failure rate, failed deployment recovery time, and rework.</p> <p class="p1">While FINRA had begun laying the groundwork to adopt DORA internally, the organization recognized an opportunity to accelerate implementation by tapping Google's firsthand experience.</p> <p class="p1">Google conducted a discovery effort alongside technology leaders to identify opportunities for improvement. The recommendation that followed included increasing the existing focus on continuous improvement, adopting a user-centric approach to developing software and further enabling a generative culture within the department.</p> <p class="p1">The implementation itself was deliberately flexible. Rather than recommending a one-size-fits-all approach, Google helped FINRA tailor its actions to individual team objectives. 
Teams prioritizing product value concentrated on lead time and deployment frequency metrics, while teams focused on stability concentrated on change failure rates and<span style="vertical-align: baseline;"> failed deployment recovery time</span>.</p> <p class="p1">Over the first year of implementation, engineering teams demonstrated continuous improvement across DORA capabilities, achieving a 9% per-developer productivity gain and reporting directionally positive developer experience feedback.</p> <p class="p1">Sprint velocities also improved by 5%, enabling smaller engineering teams to deliver greater incremental product value to the business. Beyond raw metrics, teams also reported heightened transparency around delivery performance and appreciation for a standardized methodology.</p> <p class="p1">Looking ahead, FINRA is maturing its DORA practice by providing more granular metrics tied to high-level DORA measurements, increasing emphasis on developer experience and correlating product metrics with software delivery performance indicators.</p> <p class="p1"><em>Want to discover what AI can do for governments, nonprofits, and other public sector organizations? Register to attend our upcoming <a href="https://cloudonair.withgoogle.com/events/gemini-for-government-your-front-door-for-mission-ai" rel="noopener" target="_blank">Gemini for Government webinar on February 5</a>, where we will dive deeper into the transformative technology powering the next wave of innovation across the public sector.</em></p></div>
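<p>For readers unfamiliar with the metrics named above, here is a rough sketch of how three of them (deployment frequency, change lead time, and change failure rate) might be derived from deployment records. The <code>Deployment</code> record and the sample data are hypothetical and say nothing about how FINRA or Google actually collected these figures.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    committed: datetime  # when the change was committed
    deployed: datetime   # when it reached production
    failed: bool         # whether it caused a failure needing remediation


def dora_summary(deployments, window_days):
    """Compute three delivery metrics over a reporting window."""
    if not deployments:
        raise ValueError("no deployments in the window")
    lead_hours = [
        (d.deployed - d.committed).total_seconds() / 3600 for d in deployments
    ]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_hours),
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }


# Made-up sample: four deployments in a four-day window, one of which failed.
start = datetime(2026, 1, 5, 9, 0)
deployments = [
    Deployment(start + timedelta(days=i), start + timedelta(days=i, hours=h), failed)
    for i, (h, failed) in enumerate([(2, False), (4, False), (6, True), (8, False)])
]
summary = dora_summary(deployments, window_days=4)
print(summary)
```

<p>A team focused on product value would watch the first two figures, while a team focused on stability would watch the failure rate alongside its recovery time.</p>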
  53. Senior Product Marketing Manager

    Tue, 09 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The </span><a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report"><span style="text-decoration: underline; vertical-align: baseline;">2025 State of AI-assisted Software Development report</span></a><span style="vertical-align: baseline;"> revealed a critical truth: AI is an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.</span></p> <p><span style="vertical-align: baseline;">While AI adoption is now near-universal, with 90% of developers using it in their daily workflows, success is not guaranteed. Our cluster analysis of nearly 5,000 technology professionals reveals significant variation in team performance: Not everyone experiences the same outcomes from adopting AI. </span></p> <p><span style="vertical-align: baseline;">From this disparity, we can conclude that how they are using AI is a critical factor. We wanted to understand the particular capabilities and conditions that enable teams to achieve positive outcomes, leading us to develop the </span><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">DORA AI Capabilities Model report</span></a><span style="vertical-align: baseline;">. </span></p> <p><span style="vertical-align: baseline;">This companion guide to the 2025 DORA Report is designed to help you navigate our new reality. It provides actionable strategies, implementation tactics, and measurement frameworks to help technology leaders build an environment where AI thrives.</span></p> <h3><strong style="vertical-align: baseline;">Seven capabilities that amplify success</strong></h3> <p><span style="vertical-align: baseline;">Successfully using AI requires cultivating your technical and cultural environment. 
From the same set of respondents who participated in the 2025 DORA survey, we identified seven foundational capabilities that are proven to amplify the positive impact of AI on organizational performance:</span></p> <ol> <li role="presentation"><strong style="vertical-align: baseline;">Clear and communicated AI stance</strong><span style="vertical-align: baseline;">: Ambiguity creates risk. A clear policy provides the psychological safety developers need to experiment effectively.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Healthy data ecosystems</strong><span style="vertical-align: baseline;">: AI is only as good as the data it learns from. Investing in high-quality, accessible, and unified internal data significantly amplifies AI's benefits.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">AI-accessible internal data</strong><span style="vertical-align: baseline;">: This involves "context engineering," moving beyond simple prompts to securely connect AI tools to your internal documentation and codebases.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Strong version control practices</strong><span style="vertical-align: baseline;">: As AI increases the volume and velocity of code generation, version control becomes your critical safety net. Frequent commits and robust rollback capabilities are essential for maintaining stability in an AI-assisted world.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Working in small batches</strong><span style="vertical-align: baseline;">: AI can easily generate massive blocks of code, which are hard to review and test. 
Enforcing the discipline of small batches counteracts this risk, ensuring that speed translates to product performance rather than instability.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">User-centric focus</strong><span style="vertical-align: baseline;">: Speed is irrelevant if you are moving in the wrong direction. Adopting AI tools can actually harm teams that lack a user-centric focus. Keeping user needs as your North Star is essential for guiding AI-assisted development.</span></li> <li><strong style="vertical-align: baseline;">Quality internal platforms</strong><span style="vertical-align: baseline;">: A platform provides the automated, secure "paved roads" that allow AI benefits to scale across the organization. It prevents individual productivity gains from being lost to downstream bottlenecks.</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/dora-ai-capabilities-model.max-1000x1000.jpg" alt="dora-ai-capabilities-model"> </a> <figcaption class="article-image__caption "><p data-block-key="y4u85">The DORA AI Capabilities Model shows which capabilities amplify the effect of AI adoption on</p><p data-block-key="7k909">specific outcomes</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where to start: Assessing your team</strong></h3> <p><span style="vertical-align: baseline;">Every organization starts their AI journey differently. To help you prioritize, this report introduces seven distinct team archetypes derived from our cluster analysis. 
These profiles range from "harmonious high-achievers," who excel in both performance and well-being, to teams facing "foundational challenges" or those stuck in a "legacy bottleneck," where unstable systems undermine morale.</span></p> <p><span style="vertical-align: baseline;">Identifying the profile that best matches your team can help pinpoint the most impactful interventions. For example, a "high impact, low cadence" team might prioritize automation to improve stability, while a team "constrained by process" might focus on reducing friction through a better AI stance.</span></p> <h3><strong style="vertical-align: baseline;">Digging deeper with Value Stream Mapping</strong></h3> <p><span style="vertical-align: baseline;">Once you understand your team's profile, how do you direct your efforts? The report includes a step-by-step facilitation guide for running a Value Stream Mapping (VSM) exercise.</span></p> <p><span style="vertical-align: baseline;">VSM acts as an AI force multiplier. By visualizing your flow from idea to customer, you can identify where work waits and where friction exists. This ensures that the efficiency gains from AI aren't just creating local optimizations that pile up work downstream, but are instead channeled into solving system-level constraints.</span></p> <h3><strong style="vertical-align: baseline;">Get better at getting better</strong></h3> <p><span style="vertical-align: baseline;">AI adoption is an organizational transformation. 
The greatest returns come not from the tools themselves, but from investing in the foundational systems that enable them.</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">Download the full report</span></a></p> </li> <li><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> </ul></div>
  54. Practice Lead, SRE

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">When was the last time you </span><span style="font-style: italic; vertical-align: baseline;">knew — </span><span style="vertical-align: baseline;">not just </span><span style="font-style: italic; vertical-align: baseline;">hoped</span><span style="vertical-align: baseline;"> — that your disaster recovery plan would work perfectly?</span></p> <p><span style="vertical-align: baseline;">For most of us, the answer is unclear. Sure, you may have a DR plan, a meticulously crafted document stored in a wiki or a shared drive, that gets dusted off for compliance audits or the occasional tabletop drill. You assume its procedures are correct, its contact lists are current, and its dependencies are fully mapped, and you certainly </span><span style="font-style: italic; vertical-align: baseline;">hope</span><span style="vertical-align: baseline;"> it works.</span></p> <p><span style="vertical-align: baseline;">But </span><a href="https://sre.google/prodverbs/?slide=10" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">hope is not a strategy</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">Why wouldn’t it work? One problem is that systems are rarely static anymore. In a world where you deploy new microservices dozens of times per day, make constant configuration changes, and maintain an ever-growing web of third-party API dependencies, the DR plan you wrote last quarter is probably just as useful as one from 10 years ago. </span></p> <p><span style="vertical-align: baseline;">And if the failover does work, will it work well enough to meet the promises you've made to your customers (or board of directors or regulators)? 
When a key component fails, could you still meet your availability and latency targets, a.k.a. your Service Level Objectives (SLOs)?</span></p> <p><span style="vertical-align: baseline;">So, how do you close this gap between your current aspirational DR plan and a DR plan that you actually have confidence in? The answer isn't to write more documents or run more theatrical drills. The answer is to stop </span><span style="font-style: italic; vertical-align: baseline;">assuming</span><span style="vertical-align: baseline;"> and start </span><span style="font-style: italic; vertical-align: baseline;">proving</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">This is where chaos engineering comes in. Despite what the name might imply, chaos engineering isn’t a tool for recklessly breaking things. Instead, it’s a framework that provides data-driven confidence in your SLOs under stress. By running controlled experiments that simulate real-world disasters like a database failover or a regional outage, you can quantitatively measure the impact of those failures on your systems’ performance. Chaos engineering is how you transform your DR hypotheses into a proven method to ensure resilience. By validating your plan through experimentation, you create tangible evidence, verifying that your plan will safeguard your infrastructure and keep your promises to customers.</span></p> <h3><strong style="vertical-align: baseline;">Demystifying chaos engineering</strong></h3> <p><span style="vertical-align: baseline;">In a nutshell, chaos engineering is the practice of running controlled, scientific experiments to find weaknesses in your system before they cause a real outage. </span></p> <p><span style="vertical-align: baseline;">At its core, it’s about building confidence in your system’s resilience. 
The process starts with understanding your system's </span><strong style="vertical-align: baseline;">steady state</strong><span style="vertical-align: baseline;">, which is its normal, measurable, and healthy output. You can't know the true impact of a failure without first defining what "good" looks like. This understanding allows you to form a clear, testable </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">: a statement of belief that your system's steady state will persist even when a specific, turbulent condition is introduced.</span></p> <p><span style="vertical-align: baseline;">To test this hypothesis, you then execute a controlled </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;">, which is a precise and targeted failure injected into the system. This isn't random mischief; it's a specific simulation of real-world failures, such as consuming all CPU on a host (</span><strong style="vertical-align: baseline;">resource exhaustion</strong><span style="vertical-align: baseline;">), adding network latency (</span><strong style="vertical-align: baseline;">network failure</strong><span style="vertical-align: baseline;">), or terminating a virtual machine (</span><strong style="vertical-align: baseline;">state failure</strong><span style="vertical-align: baseline;">). While this action is running, automated </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> act as your scientific instruments, continuously monitoring the system's state to measure the effect. 
</span></p> <p><span style="vertical-align: baseline;">Together, these components form a complete scientific loop: you use a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;"> to predict resilience, run an experiment by applying an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> to simulate adversity, and use </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> to measure the impact, turning uncertainty into hard data.</span></p> <h3><strong style="vertical-align: baseline;">Using chaos to validate disaster recovery plans</strong></h3> <p><span style="vertical-align: baseline;">Now that you understand the building blocks of a chaos experiment, you can build the bridge to your ultimate goal: transforming your DR plan from a document of hope into an evidence-based procedure. The key is to stop seeing your DR plan as a set of instructions and start seeing it for what it truly is: a collection of unproven hypotheses.</span></p> <p><span style="vertical-align: baseline;">When you think about it, every significant statement in your DR document is a claim waiting to be tested. When your plan states, </span><span style="font-style: italic; vertical-align: baseline;">"The database will failover to the replica in under 5 minutes,"</span><span style="vertical-align: baseline;"> that isn't a fact, it's a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">. When it says, </span><span style="font-style: italic; vertical-align: baseline;">"In the event of a regional outage, traffic will be successfully rerouted to the secondary region,"</span><span style="vertical-align: baseline;"> that's another hypothesis. 
Your DR plan is filled with these critical assumptions about how your system </span><span style="font-style: italic; vertical-align: baseline;">should</span><span style="vertical-align: baseline;"> behave under duress. Until you test them, they remain nothing more than educated guesses.</span></p> <p><span style="vertical-align: baseline;">Chaos experiments are the ultimate validation tools, </span><strong style="vertical-align: baseline;">live-fire drills</strong><span style="vertical-align: baseline;"> that put your DR hypotheses to a real, empirical test. Instead of just talking through a scenario, you use controlled </span><strong style="vertical-align: baseline;">actions</strong><span style="vertical-align: baseline;"> to safely and precisely simulate the disaster. You're no longer asking "what if?"; you're actively measuring "what happens when."</span></p> <p><span style="vertical-align: baseline;">For example, imagine you have a DR plan for a regional outage. When you adopt chaos engineering, you break down that plan into a hypothesis and an experiment. 
It might look like this:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The hypothesis:</strong><span style="vertical-align: baseline;"> "If our primary region </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;"> becomes unreachable, the load balancers will fail over all traffic to </span><code style="vertical-align: baseline;">us-east1</code><span style="vertical-align: baseline;"> within 3 minutes, with an error rate below 1%."</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The chaos experiment:</strong><span style="vertical-align: baseline;"> Run an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> that simulates a regional outage by injecting a "blackhole" that drops all network traffic to and from </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;"> for a limited time. Your </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> then measure the actual failover time and error rates to validate the hypothesis.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">In other words, by applying the chaos engineering methodology, you systematically move through your DR plan, turning each assumption into a proven fact. You're not just testing your plan; you're forging it in a controlled fire.</span></p> <h3><strong style="vertical-align: baseline;">Connecting chaos readiness to your SLOs</strong></h3> <p><span style="vertical-align: baseline;">Beyond simply proving system availability, chaos engineering builds trust in your reliability metrics, ensuring that you meet your SLOs even when services become unavailable. 
An SLO is a specific, acceptable target level of your service's performance measured over a specified period that reflects the user's experience. SLOs aren't just internal goals; they are the bedrock of customer trust and the foundation of your contractual service level agreements (SLAs).</span></p> <p><span style="vertical-align: baseline;">A traditional DR drill might get a "pass" because the backup system came online. But what if it took 20 minutes to fail over, during which every user saw errors? What if the backup region was under-provisioned, and performance became so slow that the service was unusable? From a technical perspective, you "recovered." But from a customer's perspective, you were down.</span></p> <p><span style="vertical-align: baseline;">A chaos experiment, however, can help you answer a critical question: </span><strong style="vertical-align: baseline;">"During a failover, did we still meet our SLOs?" </strong><span style="vertical-align: baseline;">Because your probes are constantly measuring performance against your SLOs, you get the full picture. You don't just see that the database failed over; you see that it took 7 minutes, during which your latency SLO was breached and your </span><a href="https://sre.google/sre-book/embracing-risk/#:~:text=Forming%20Your%20Error%20Budget,new%20releases%20can%20be%20pushed." rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">error budget</span></a><span style="vertical-align: baseline;"> was completely burned. This is the crucial, game-changing insight. It shifts the entire goal from simple disaster recovery to </span><strong style="vertical-align: baseline;">SLO preservation</strong><span style="vertical-align: baseline;">, which is what actually determines if a failure was a minor hiccup or a major business-impacting incident. It also provides the data necessary to set goals for system improvement. 
So the next time you run this experiment, you can measure if and how much your system resilience has improved, and ultimately if you can maintain your SLO during the disaster event.</span></p> <h3><strong style="vertical-align: baseline;">Build a culture of confidence</strong></h3> <p><span style="vertical-align: baseline;">The journey to resilience doesn't start by simulating a full regional failover. It starts with a single, small experiment. The goal is not to boil the ocean; it's to build momentum. Test one timeout, one retry mechanism, or one graceful error message.</span></p> <p><span style="vertical-align: baseline;">The biggest win from your first successful experiment won't be the technical data you gather. It will be the confidence you build. When your team sees that they can safely inject failure, learn from it, and improve the system, their entire relationship with failure changes. Fear is replaced by curiosity. That confidence is the catalyst for building a true, enduring culture of resilience. To learn more and get started with chaos engineering, check out </span><a href="https://cloud.google.com/blog/products/devops-sre/getting-started-with-chaos-engineering?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">this blog</span></a><span style="vertical-align: baseline;"> and </span><a href="https://sre.google/prodcast/#season3-episode12" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">this podcast</span></a><span style="vertical-align: baseline;">. And if you’re ready to get started, but unsure how, reach out to Google Cloud professional services to discuss how we can help.</span></p></div>
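<p>The regional-outage hypothesis above ("fail over to <code>us-east1</code> within 3 minutes, with an error rate below 1%") can be checked mechanically once probe data exists. The following is a minimal sketch, not the article's tooling: the <code>ProbeSample</code> and <code>ExperimentResult</code> types, the <code>evaluate_failover</code> function, and the probe data are all hypothetical, and "failover time" is assumed here to mean the time of the first successful response served by the secondary region.</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProbeSample:
    """One probe measurement (hypothetical shape, for illustration only)."""
    t_seconds: float  # seconds elapsed since the fault was injected
    region: str       # region that answered (or was targeted by) the request
    ok: bool          # True if the request succeeded


@dataclass(frozen=True)
class ExperimentResult:
    failover_s: Optional[float]  # None if the secondary never served traffic
    error_rate: float            # failed probes / all probes in the window
    hypothesis_holds: bool


def evaluate_failover(
    samples: Sequence[ProbeSample],
    secondary: str,
    max_failover_s: float,
    max_error_rate: float,
) -> ExperimentResult:
    """Compare measured failover time and error rate against the hypothesis."""
    if not samples:
        raise ValueError("no probe samples recorded")
    ordered = sorted(samples, key=lambda s: s.t_seconds)
    # Assumed definition: failover completes at the first successful
    # response served from the secondary region.
    failover_s = next(
        (s.t_seconds for s in ordered if s.ok and s.region == secondary),
        None,
    )
    error_rate = sum(1 for s in ordered if not s.ok) / len(ordered)
    holds = (
        failover_s is not None
        and failover_s <= max_failover_s
        and error_rate < max_error_rate
    )
    return ExperimentResult(failover_s, error_rate, holds)


# Synthetic data: one probe per second for 10 minutes, with requests
# failing until the secondary region takes over at the 7-minute mark.
samples = [
    ProbeSample(float(t), "us-east1" if t >= 420 else "us-central1", t >= 420)
    for t in range(600)
]
result = evaluate_failover(
    samples, secondary="us-east1", max_failover_s=180, max_error_rate=0.01
)
print(result)
```

<p>With this synthetic data the backup region does come online, so a traditional drill would "pass," yet the hypothesis is refuted on both failover time and error rate. In a real experiment the samples would come from your monitoring system rather than a list comprehension.</p>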
  55. Group Product Manager, Google Cloud

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Earlier this year, we unveiled a big investment in platform and developer team productivity, with the launch of </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">, </span><span style="vertical-align: baseline;">helping them streamline </span><span style="vertical-align: baseline;">the design and deployment of cloud application infrastructure, while ensuring applications are secure, reliable, and aligned with best practices</span><span style="vertical-align: baseline;">. And today, Application Design Center is generally available.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">We built Application Design Center to put applications at the center of your cloud experience, with a visual, canvas-style and AI-powered approach to design and modify Terraform-backed application templates. It also offers full lifecycle management that’s aligned with DevOps best practices across application design and deployment.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Design Center is a core component of our </span><a href="https://docs.cloud.google.com/hub/docs/application-centric-google-cloud"><span style="text-decoration: underline; vertical-align: baseline;">application-centric cloud experience</span></a><span style="vertical-align: baseline;">. When you use Application Design Center to design and deploy your application infrastructure, your applications are easily discoverable, observable, and manageable. 
Application Design Center works in concert with </span><a href="https://cloud.google.com/app-hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;"> to automatically register application deployments, enabling a unified view and control plane for your application portfolio, and </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;">, to provide operational insights for your applications.</span></p> <p style="text-align: justify; padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“Google Application Design Center is a valuable enabler for Platform Engineering, providing a structured approach to harmonizing resource creation in Google Cloud Platform. By aligning tools, processes, and technologies, it streamlines workflows, reducing friction between development, operations, and other teams. This harmonization enhances collaboration, accelerates delivery, and ensures consistency across Google Cloud environments.”</span><span style="vertical-align: baseline;"> - </span><strong style="vertical-align: baseline;">Ervis Duraj, Principal Engineer,</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">MediaMarktSaturn Technology</strong></p> <h3><span style="vertical-align: baseline;">The gateway to an app-centric cloud</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">Our goal with Application Design Center is for you to innovate more, and administer less. It consists of </span><span style="vertical-align: baseline;">four key elements to help you minimize administrative overhead and maximize efficiency, so you can design and deploy applications with integrated best practices and essential guardrails. 
Let’s take a closer look.</span></p> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">1. </span><strong style="vertical-align: baseline;">Terraform </strong><a href="https://docs.cloud.google.com/application-design-center/docs/supported-resources"><strong style="text-decoration: underline; vertical-align: baseline;">components</strong></a><strong style="vertical-align: baseline;"> and </strong><a href="https://docs.cloud.google.com/application-design-center/docs/design-application-templates"><strong style="text-decoration: underline; vertical-align: baseline;">application templates</strong></a><strong style="vertical-align: baseline;"> <br/></strong><span style="vertical-align: baseline;">Develop applications faster with our growing library of opinionated application templates. These provide well-architected patterns and pre-built components, including innovative "AI inference templates" to help you leverage AI to create dynamic and intelligent application foundations. 
As an example, at launch, Application Design Center provides opinionated templates for Google Kubernetes Engine (GKE) clusters (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">) to run AI inference workloads using a variety of LLMs, as well as for enterprise-grade production clusters or single-region web app clusters. </span></p> <p><span style="vertical-align: baseline;">You can also </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">ingest and manage your existing Terraform configurations</span></a><span style="vertical-align: baseline;"> (“Bring your own Terraform”) directly from Git repositories. 
Once imported, you can use Application Design Center to design with your own Terraform, or in combination with Google-provided Terraform, to create standardized, opinionated infrastructure patterns for sharing and reuse across your application teams.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3-_Catalog_Share.gif" alt="3- Catalog Share"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. </span><strong style="vertical-align: baseline;">AI-powered design for rapid application design and prototyping <br/></strong><span style="vertical-align: baseline;">Application Design Center integrates with Google's </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/design-application"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Design Agent,</span></a><span style="vertical-align: baseline;"> empowering you to design deployable application templates on Google Cloud that you can export as Terraform infrastructure-as-code. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">With Gemini Cloud Assist, you can describe your application design intents using natural language. In return, Gemini interactively generates multi-product application template suggestions, complete with visual architecture diagrams and summarized benefits. You can then refine these proposals through multi-turn reasoning or by directly manipulating the architecture within the Application Design Center canvas. 
</span></p> <p><span style="vertical-align: baseline;">Additionally, all designs that you create with Gemini are automatically observable, optimizable, and enabled for troubleshooting assistance during runtime, thanks to their tight integration with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?hl=en"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1-Components_and_templates.gif" alt="1-Components and templates"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">3. </span><strong style="vertical-align: baseline;">A secure, sharable catalog of application templates with full lifecycle management<br/></strong><span style="vertical-align: baseline;">Platform admins can curate a collection of application templates built from Google's best-practice components. This provides developers a trusted, self-service experience from which they can quickly discover and deploy compliant applications. Tight integration with </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;"> transforms these governed templates into a live operational command center, complete with unified visibility into the health and deployment status of the resulting applications. 
This closes the critical loop between design and runtime, so that your production environments reflect your organization’s approved architectural standards.</span></p> <p><span style="vertical-align: baseline;">Also, Application Design Center’s robust </span><a href="https://docs.cloud.google.com/application-design-center/docs/manage-application-instances#create-application-revision"><span style="text-decoration: underline; vertical-align: baseline;">application template revisions</span></a><span style="vertical-align: baseline;"> serve as an immutable audit trail. Application Design Center automatically detects and flags configuration drift between your intended designs and deployed applications, so that developers can remediate unauthorized changes or safely push approved configuration updates. This helps ensure continuous state consistency and compliance from Day 1 and through the subsequent evolution of your application.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2-Design_Agent.gif" alt="2-Design Agent"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">4. </span><strong style="vertical-align: baseline;">GitOps integration automating developers’ day-to-day software design lifecycle tasks <br/></strong><span style="vertical-align: baseline;">By integrating Application Design Center into existing CI/CD workflows, platform teams empower developers to own the complete software delivery lifecycle right from their IDE. 
Developers can leverage compliant application </span><span style="font-style: italic; vertical-align: baseline;">and</span><span style="vertical-align: baseline;"> infrastructure (IaC) code using Application Design Center application templates. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Further, every infrastructure decision made through Application Design Center is committed to code, versioned, and auditable. Specifically, developers can download the application IaC template from Application Design Center and import it into their app repos (the single source of truth), clone their repo, and edit the Terraform directly in their local IDEs. Any modifications go through a Git pull request for review. Once approved, this automatically triggers the existing CI/CD setup to build, test, and deploy both app and infra changes in lockstep. This unified approach minimizes friction, enforcing "golden paths" and providing an end-to-end automated pathway from a line of code in the IDE to a fully deployed change in production. </span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">What's new since preview</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">This GA launch is packed with features that users have been asking for. 
We’re excited to share powerful new capabilities: enterprise-grade governance and security with </span><a href="https://cloud.google.com/sdk/gcloud/reference/design-center"><span style="text-decoration: underline; vertical-align: baseline;">public APIs and gcloud CLI support</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/set-up-secure-perimeter"><span style="text-decoration: underline; vertical-align: baseline;">full compatibility with VPC service controls</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">bring your own Terraform</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/download-and-deploy#export_terraform_code"><span style="text-decoration: underline; vertical-align: baseline;">GitOps support</span></a><span style="vertical-align: baseline;"> for integration with your existing application patterns and automation pipelines; agentic application patterns using GKE templates (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">); and finally, a simplified onboarding experience with 
</span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">app-managed project support</span></a><span style="vertical-align: baseline;">, making Application Design Center an AI-powered engine for your applications on Google Cloud.</span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Get started today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">To help you get started, Google provides a growing library of curated Google application templates built by experts. These templates combine multiple Google Cloud products and best practices to serve common use cases, which you can configure for deployment, and view as infrastructure as code in-line. Platform teams can then create and securely share the catalogs and collaborate with teammates on designs and self-service deployment for developers. For enterprises with existing Terraform patterns and assets, Application Design Center interoperates by enabling their import and reuse within its native design and configuration experience.</span></p> <p><span style="vertical-align: baseline;">Ready to experience the power of </span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">? </span><span style="vertical-align: baseline;">You can learn more about ADC and get started building in minutes using the </span><a href="https://docs.cloud.google.com/application-design-center/docs/quickstart-create-template"><span style="text-decoration: underline; vertical-align: baseline;">quickstart</span></a><span style="vertical-align: baseline;">. 
</span><span style="vertical-align: baseline;">Building your first AI-powered application template is </span><a href="https://cloud.google.com/products/application-design-center/pricing"><span style="text-decoration: underline; vertical-align: baseline;">free of cost</span></a><span style="vertical-align: baseline;">, and you can quickly deploy applications with working code. For deeper insights, explore the </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">public documentation</span></a><span style="vertical-align: baseline;">. We can't wait to see how you innovate with the Application Design Center!</span></p></div>
  56. Senior Product Manager

    Wed, 03 Dec 2025 23:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note</strong><span style="font-style: italic; vertical-align: baseline;">: This blog was updated on Dec. 4, 5, 7, and 12, 2025, with additional guidance on Cloud Armor WAF rule syntax, and WAF enforcement across App Engine Standard, Cloud Functions, and Cloud Run.</span></p> <p><span style="vertical-align: baseline;">Earlier today, Meta and Vercel publicly disclosed two vulnerabilities that expose services built using the popular open-source frameworks </span><strong style="vertical-align: baseline;">React</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">Server Components</strong><span style="vertical-align: baseline;"> (</span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</strong></a><span style="vertical-align: baseline;">) and </span><strong style="vertical-align: baseline;">Next.js </strong><span style="vertical-align: baseline;">to remote code execution risks when used for some server-side use cases. At Google Cloud, we understand the severity of these vulnerabilities, also known as </span><a href="https://react2shell.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">React2Shell</span></a><span style="vertical-align: baseline;">, and our security teams have shared their recommendations to help our customers take immediate, decisive action to secure their applications.</span></p> <h3><span style="vertical-align: baseline;">Vulnerability background</span></h3> <p><span style="vertical-align: baseline;">The </span><strong style="vertical-align: baseline;">React Server Components framework</strong><span style="vertical-align: baseline;"> is commonly used for building user interfaces. On Dec. 
3, 2025, </span><a href="http://cve.org" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE.org</span></a><span style="vertical-align: baseline;"> assigned this vulnerability as </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</span></a><span style="vertical-align: baseline;">. The official Common Vulnerability Scoring System (CVSS) base severity score has been determined as Critical, a severity of 10.0. </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: React 19.0, 19.1.0, 19.1.1, and 19.2.0</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> in React 19.2.1</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components" rel="noopener" target="_blank"><span style="text-decoration: underline; 
vertical-align: baseline;">https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Next.js is a web development framework that depends on React, and is also commonly used for building user interfaces. (The Next.js vulnerability was referenced as </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-66478</span></a><span style="vertical-align: baseline;"> before being marked as a duplicate.)</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: Next.js 15.x, Next.js 16.x, Next.js 14.3.0-canary.77 and later canary releases</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> versions are listed </span><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p 
role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://nextjs.org/blog/CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://nextjs.org/blog/CVE-2025-66478</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Google Threat Intelligence Group (GTIG) has also published a new report to help understand the </span><a href="https://cloud.google.com/blog/topics/threat-intelligence/threat-actors-exploit-react2shell-cve-2025-55182"><span style="text-decoration: underline; vertical-align: baseline;">specific threats exploiting React2Shell</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">We strongly encourage organizations who manage environments relying on the React and Next.js frameworks to update to the latest version, and take the mitigation actions outlined below.</span></p> <h3><span style="vertical-align: baseline;">Mitigating CVE-2025-55182</span></h3> <p><span style="vertical-align: baseline;">We have created and rolled out a new </span><strong style="vertical-align: baseline;">Cloud Armor web application firewall (WAF) rule</strong><span style="vertical-align: baseline;"> designed to detect and block exploitation attempts related to CVE-2025-55182. This new rule is </span><strong style="vertical-align: baseline;">available now</strong><span style="vertical-align: baseline;"> and is intended to help protect your internet-facing applications and services that use global or regional Application Load Balancers. 
We recommend deploying this rule as a temporary mitigation while your vulnerability management program patches and verifies all vulnerable instances in your environment.</span></p> <p><span style="vertical-align: baseline;">For customers using </span><a href="https://cloud.google.com/appengine/"><strong style="text-decoration: underline; vertical-align: baseline;">App Engine Standard</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Functions</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/run/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Run</strong></a><span style="vertical-align: baseline;">, </span><a href="https://firebase.google.com/products/hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase Hosting</strong></a><span style="vertical-align: baseline;"> or </span><a href="https://firebase.google.com/products/app-hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase App Hosting</strong></a><span style="vertical-align: baseline;">, we provide an additional layer of defense for serverless workloads by automatically enforcing platform-level WAF rules that can detect and block the most common exploitation attempts related to CVE-2025-55182.</span></p> <p><span style="vertical-align: baseline;">For </span><a href="https://support.projectshield.google/s/article/Protecting-Your-Website-From-Known-Vulnerabilities" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Project Shield</strong></a><span style="vertical-align: baseline;"> users, we have deployed WAF protections for all sites and no action is necessary to enable these WAF rules. 
For long-term mitigation, you will need to patch your origin servers as an essential step to eliminate the vulnerability (see additional guidance below).</span></p> <p><span style="vertical-align: baseline;">Cloud Armor and the Application Load Balancer can be used to deliver and protect your applications and services regardless of whether they are deployed on Google Cloud, on-premises, or on another infrastructure provider. If you are not yet using Cloud Armor and the Application Load Balancer, please follow the guidance further down to get started.</span></p> <p><span style="vertical-align: baseline;">While these platform-level rules and the optional Cloud Armor WAF rules (for services behind an Application Load Balancer) help mitigate the risk from exploits of the CVE, we continue to strongly recommend updating your application dependencies as the primary long-term mitigation.</span></p> <h3><span style="vertical-align: baseline;">Deploying the cve-canary WAF rule for Cloud Armor</span></h3> <p><span style="vertical-align: baseline;">To configure Cloud Armor to detect and protect from CVE-2025-55182, you can use the </span><a href="https://docs.cloud.google.com/armor/docs/waf-rules#cves_and_other_vulnerabilities"><code style="text-decoration: underline; vertical-align: baseline;">cve-canary</code><span style="text-decoration: underline; vertical-align: baseline;"> preconfigured WAF rule</span></a><span style="vertical-align: baseline;"> leveraging the new ruleID that we have added for this vulnerability. 
This rule is opt-in only, and must be added to your policy even if you are already using the cve-canary rules.</span></p> <p><span style="vertical-align: baseline;">In your Cloud Armor backend security policy, create a new rule and configure the following match condition:</span></p></div> <div class="block-code"><pre><code>(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})</code></pre></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This can be accomplished from the Google Cloud console by navigating to Cloud Armor and modifying an existing or creating a new policy.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/20251205_11am_rule_1.max-1000x1000.png" alt="20251205_11am_rule (1)"> <figcaption class="article-image__caption "><p data-block-key="5admg">Cloud Armor rule creation in the Google Cloud console.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p>Alternatively, the gcloud CLI can be used to create or modify a policy with the requisite rule:</p></div> <div class="block-code"><pre><code>gcloud compute security-policies rules create PRIORITY_NUMBER \
  --security-policy SECURITY_POLICY_NAME \
  --expression "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})" \
  --action=deny-403</code></pre></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Additionally, if you are managing your rules with Terraform, you may implement the rule via the following syntax:</span></p></div> <div class="block-code"><pre><code>rule {
  action   = "deny(403)"
  priority = "PRIORITY_NUMBER"
  match {
    expr {
      expression = "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})"
    }
  }
  description = "Applies protection for CVE-2025-55182 (React/Next.JS)"
}</code></pre></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Verifying WAF rule safety for your application and consuming telemetry</span></h3> <p><span style="vertical-align: baseline;">Cloud Armor rules can be </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#preview_mode"><span style="text-decoration: underline; vertical-align: baseline;">configured in preview mode</span></a><span style="vertical-align: baseline;">, a logging-only mode to test or monitor the expected impact of the rule without Cloud Armor enforcing the configured action. We recommend that the new rule described above first be deployed in preview mode in your production environments so that you can see what traffic it would block. </span></p> <p><span style="vertical-align: baseline;">Once you verify that the new rule is behaving as desired in your environment, then you can disable preview mode to allow Cloud Armor to actively enforce it.</span></p> <p><span style="vertical-align: baseline;">Cloud Armor per-request WAF logs are emitted as part of the Application Load Balancer logs to Cloud Logging. To see what Cloud Armor’s decision was on every request, load balancer logging first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/https-logging-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">needs to be enabled on a per backend service basis</span></a><span style="vertical-align: baseline;">. 
Once it is enabled, all subsequent Cloud Armor decisions will be logged and can be found in Cloud Logging by </span><a href="https://docs.cloud.google.com/armor/docs/request-logging"><span style="text-decoration: underline; vertical-align: baseline;">following these instructions</span></a><span style="vertical-align: baseline;">.</span></p> <h3><span style="vertical-align: baseline;">Interaction of Cloud Armor rules with </span><span style="vertical-align: baseline;">vulnerability</span><span style="vertical-align: baseline;"> scanning tools</span></h3> <p><span style="vertical-align: baseline;">There has been a proliferation of scanning tools designed to help identify vulnerable instances of React and Next.js in your environments. Many of those scanners are designed to identify the version number of relevant frameworks in your servers and do so by crafting a </span><span style="vertical-align: baseline;">legitimate</span><span style="vertical-align: baseline;"> query and inspecting the response from the server to detect the version of React and </span><span style="vertical-align: baseline;">Next.js</span><span style="vertical-align: baseline;"> that is running. </span></p> <p><span style="vertical-align: baseline;">Our WAF rule is designed to detect and prevent exploit attempts of </span><span style="vertical-align: baseline;">CVE-2025-55182</span><span style="vertical-align: baseline;">. As the scanners discussed above are not attempting an exploit, but sending a safe query to </span><span style="vertical-align: baseline;">elicit</span><span style="vertical-align: baseline;"> a response revealing indications of the version of the software, </span><strong style="vertical-align: baseline;">the above Cloud Armor rule will not detect or block such scanners. 
</strong></p> <p><span style="vertical-align: baseline;">If the findings of these scanners indicate a vulnerable instance of software protected by Cloud Armor, that does not mean that an actual exploit attempt of the vulnerability will successfully get through your Cloud Armor security policy. Instead, such findings mean that the version of React or Next.js detected is known to be vulnerable and should be patched.</span></p> <h3><span style="vertical-align: baseline;">How to get started with Cloud Armor for new users</span></h3> <p><span style="vertical-align: baseline;">If your workload is already using an Application Load Balancer to receive traffic from the internet, you can configure Cloud Armor to protect your workload from this and other application-level vulnerabilities (as well as DDoS attacks) by following </span><a href="https://docs.cloud.google.com/armor/docs/configure-security-policies"><span style="text-decoration: underline; vertical-align: baseline;">these instructions</span></a><span style="vertical-align: baseline;">. 
</span></p> <p><span style="vertical-align: baseline;">If you are not yet using an Application Load Balancer and Cloud Armor, you can get started with the </span><a href="https://docs.cloud.google.com/load-balancing/docs/https"><span style="text-decoration: underline; vertical-align: baseline;">external Application Load Balancer overview</span></a><span style="vertical-align: baseline;">, the </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor overview</span></a><span style="vertical-align: baseline;">, and the </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor best practices</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">If your workload is using </span><a href="http://docs.cloud.google.com/run/"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run functions</span></a><span style="vertical-align: baseline;">, or </span><a href="https://cloud.google.com/appengine"><span style="text-decoration: underline; vertical-align: baseline;">App Engine</span></a><span style="vertical-align: baseline;"> and receives traffic from the internet, you must first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless"><span style="text-decoration: underline; vertical-align: baseline;">set up an Application Load Balancer in front of your endpoint</span></a><span style="vertical-align: baseline;"> to leverage Cloud Armor security policies to protect your workload. 
You will then need to </span><a href="https://docs.cloud.google.com/armor/docs/integrating-cloud-armor#serverless"><span style="text-decoration: underline; vertical-align: baseline;">configure the appropriate controls</span></a><span style="vertical-align: baseline;"> to ensure that Cloud Armor and the Application Load Balancer can’t be bypassed.</span></p> <h3><span style="vertical-align: baseline;">Best practices and additional risk mitigations</span></h3> <p><span style="vertical-align: baseline;">Once you configure Cloud Armor, we recommend consulting our </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">best practices guide</span></a><span style="vertical-align: baseline;">. Be sure to account for </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#limitations"><span style="text-decoration: underline; vertical-align: baseline;">limitations</span></a><span style="vertical-align: baseline;"> </span><span style="vertical-align: baseline;">discussed in the documentation to minimize risk and optimize performance while ensuring the safety and availability of your workloads. </span></p> <h3><span style="vertical-align: baseline;">Serverless platform protections</span></h3> <p><span style="vertical-align: baseline;">Google Cloud is enforcing platform-level protections across App Engine Standard, Cloud Functions, and Cloud Run to automatically help protect against common exploit attempts of CVE-2025-55182. 
This protection supplements the protections already in place for Firebase Hosting and Firebase App Hosting.</span></p> <p><strong style="vertical-align: baseline;">What this means for you:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Applications deployed to those serverless services benefit from these WAF rules that are enabled by default to help provide a base level of protection without requiring manual configuration.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">These rules are designed to block known malicious payloads targeting this vulnerability.</span></p> </li> </ul> <p><strong style="vertical-align: baseline;">Important considerations:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patching is still critical:</strong><span style="vertical-align: baseline;"> These platform-level defenses are intended to be a temporary mitigation. 
The most effective long-term solution is to update your application's dependencies to non-vulnerable versions of React and Next.js, and redeploy them.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Potential impacts:</strong><span style="vertical-align: baseline;"> While unlikely, if you believe this platform-level filtering is incorrectly impacting your application's traffic, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">Long-term mitigation: Mandatory framework update and redeployment</span></h3> <p><span style="vertical-align: baseline;">While WAF rules provide critical frontline defense, the most comprehensive long-term solution is to patch the underlying frameworks.</span></p> <p><strong style="vertical-align: baseline;">While Google Cloud is providing platform-level protections and Cloud Armor options, we urge all customers running React and Next.js applications on Google Cloud to immediately update their dependencies to the latest stable versions (React 19.2.1 or the relevant version of Next.js listed </strong><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">here</strong></a><strong style="vertical-align: baseline;">), and redeploy their services.</strong></p> <p><span style="vertical-align: baseline;">This applies specifically to applications deployed on:</span></p> <ul> <li role="presentation"><strong style="vertical-align: baseline;">Cloud Run, Cloud Run functions, or App Engine</strong><span style="vertical-align: 
baseline;">: Update your application dependencies with the updated framework versions and redeploy.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Google Kubernetes Engine (GKE)</strong><span style="vertical-align: baseline;">: Update your container images with the latest framework versions and redeploy your pods.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Compute Engine</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">The public OS images provided by Google Cloud do not have React or Next.js packages installed by default. If you have installed a custom OS with the affected packages, update your workloads to include the latest framework versions and enable WAF rules in front of all workloads.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Firebase</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">If you’re using Cloud Functions for Firebase, Firebase Hosting, or Firebase App Hosting, update your application dependencies with the updated framework versions and redeploy. Firebase Hosting and App Hosting are also automatically enforcing a rule to limit exploitation of CVE-2025-55182 through requests to custom and default domains.</span></li> </ul> <p><span style="vertical-align: baseline;">Patching your applications is an essential step to eliminate the vulnerability at its source and ensure the continued integrity and security of your services.</span></p> <p><span style="vertical-align: baseline;">We will continue to monitor the situation closely and provide further updates and guidance as necessary. 
Please refer to our official </span><a href="https://docs.cloud.google.com/support/bulletins#gcp-2025-072"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Security advisories</span></a><span style="vertical-align: baseline;"> for the most current information and detailed steps.</span></p> <p><span style="vertical-align: baseline;">If you have any questions or require assistance, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p></div>
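    <div class="block-paragraph_advanced"><p>As a rough first pass at the patching guidance above, the following Python sketch flags a <code>package.json</code> whose declared <code>react</code> version matches one of the vulnerable releases listed in this post. It is an illustration under stated assumptions, not an official scanner: the function names are hypothetical, "19.0" is assumed to mean 19.0.0, and only a leading <code>^</code> or <code>~</code> is stripped from the version spec. A declared range is not the installed version, so treat your lockfile as authoritative, and check Next.js against the patched versions listed in its announcement.</p></div>

```python
import json

# Vulnerable React releases as listed in this post.
# Assumption: "19.0" in the post means 19.0.0.
VULNERABLE_REACT = {"19.0.0", "19.1.0", "19.1.1", "19.2.0"}


def declared_version(spec: str) -> str:
    """Strip surrounding whitespace and a leading ^ or ~ range operator."""
    return spec.strip().lstrip("^~")


def find_vulnerable_react(package_json_text: str) -> list[str]:
    """Return 'section: react spec' entries whose declared version is listed as vulnerable.

    Hypothetical helper: it inspects only the declared spec in package.json,
    not the version actually resolved in the lockfile.
    """
    manifest = json.loads(package_json_text)
    findings = []
    for section in ("dependencies", "devDependencies"):
        spec = manifest.get(section, {}).get("react")
        if spec and declared_version(spec) in VULNERABLE_REACT:
            findings.append(f"{section}: react {spec}")
    return findings


if __name__ == "__main__":
    sample = '{"dependencies": {"react": "^19.1.0", "next": "15.0.0"}}'
    for finding in find_vulnerable_react(sample):
        print("update and redeploy ->", finding)
```

    <div class="block-paragraph_advanced"><p>An empty result does not prove an application is safe. It only means the declared <code>react</code> spec is not one of the four versions named here, so a lockfile check and a redeploy of the patched build are still needed.</p></div>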
  57. Key Enterprise Architect

    Mon, 13 Oct 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="6kd7s">As engineers, we all dream of perfectly resilient systems — ones that scale perfectly, provide a great user experience, and never ever go down. What if we told you the key to building these kinds of resilient systems isn't avoiding failures, but deliberately causing them? Welcome to the world of chaos engineering, where you stress test your systems by <i>introducing</i> chaos, i.e., failures, into a system under a controlled environment. In an era where downtime can cost millions and destroy reputations in minutes, the most innovative companies aren't just waiting for disasters to happen — they're causing them and learning from the resulting failures, so they can build immunity to chaos before it strikes in production.</p><p data-block-key="396qd">Chaos engineering is useful for all kinds of systems, but particularly for cloud-based distributed ones. Modern architectures have evolved from monolithic to microservices-based systems, often comprising hundreds or thousands of services. These complex service dependencies introduce multiple points of failure, and it’s difficult if not impossible to predict all the possible failure modes through traditional testing methods. When these applications are deployed on the cloud, they are deployed across multiple availability zones and regions. This increases the likelihood of failure due to the highly distributed nature of cloud environments and the large number of services that coexist within them.</p><p data-block-key="93kcq">A common misconception is that cloud environments automatically provide application resiliency, eliminating the need for testing. Although cloud providers do offer various levels of resiliency and SLAs for their cloud products, these alone do not guarantee that your business applications are protected. 
If applications are not designed to be fault-tolerant or if they assume constant availability of cloud services, they will fail when a particular cloud service they depend on is not available.</p><p data-block-key="62d5j">In short, chaos engineering can take a team's worst "what if?" scenarios and transform them into well-rehearsed responses. Chaos engineering isn’t about breaking systems (engineering chaotically, as it were); it's about building teams that face production incidents with the calm confidence that only comes from having weathered that chaos before, albeit in controlled conditions.</p><p data-block-key="aipko">Google Cloud’s Professional Services Organization (PSO) Enterprise Architecture team consults on and provides hands-on expertise on customers’ cloud transformation journeys, including application development, cloud migrations, and enterprise architecture. And when advising on designing resilient architecture for cloud environments, we routinely introduce the principles and practices of chaos engineering and Site Reliability Engineering (SRE).</p><p data-block-key="6ro3d">In this first blog post in a series, we explain the basics of chaos engineering: what it is and its core principles and elements. We then explore how chaos engineering is particularly helpful and important for teams running distributed applications in the cloud. Finally, we’ll talk about how to get started, and point you to further resources.</p><h2 data-block-key="pqp"><b>Understanding chaos engineering</b></h2><p data-block-key="fun25">Chaos engineering is a methodology invented by Netflix in 2010 when it created and popularized ‘Chaos Monkey’ to address the need to build more resilient and reliable systems in the face of increasing complexity in its AWS environment. Around the same time, Google introduced Disaster Resilience Testing, or DiRT, which enabled continuous and automated disaster readiness, response, and recovery of Google’s business, systems, and data. 
Here on Google Cloud’s PSO team, we offer various services to help customers implement DiRT as part of SRE practices. These offerings also include training on how to perform DiRT on applications and systems operating on Google Cloud. The central concept is straightforward: deliberately introduce controlled disruptions into a system to identify vulnerabilities, evaluate its resilience, and enhance its overall reliability.</p><p data-block-key="6t531">As a proactive discipline, chaos engineering enables organizations to identify weaknesses in their systems before they lead to significant outages or failures, where a system includes not only the technology components but also the people and processes of an organization. By introducing controlled, real-world disruptions, chaos engineering helps test a system's robustness, recoverability, and fault tolerance. This approach allows teams to uncover potential vulnerabilities, so that systems are better equipped to handle unexpected events and continue functioning smoothly under stress.</p><h3 data-block-key="59nsr"><b>Principles and practices of chaos engineering</b></h3><p data-block-key="df1o7">Chaos engineering is guided by a set of core principles about why it should be done, while practices define what needs to be done.</p><p data-block-key="8ao4o">Below are the principles of chaos engineering:</p><ol><li data-block-key="ftol1"><b>Build a hypothesis around steady state</b>: Prior to initiating any disruptive actions, you need to define what "normal" looks like for your system, commonly referred to as the "steady state hypothesis."</li><li data-block-key="6vvb8"><b>Replicate real-world conditions</b>: Chaos experiments should emulate realistic failure scenarios that the system might encounter in a production environment.</li><li data-block-key="decbe"><b>Run experiments in production</b>: Chaos engineering is firmly rooted in the belief that only a production environment with real traffic and dependencies can provide 
an accurate picture of resiliency. This is what separates chaos engineering from traditional testing.</li><li data-block-key="3de29"><b>Automate experiments:</b> Make resiliency testing part of a continuous, ongoing process rather than a one-off test.</li><li data-block-key="am2bk"><b>Determine the blast radius</b>: Experiments should be meticulously designed to minimize adverse impacts on production systems. This requires categorizing applications and services into different tiers based on the impact the experiments can have on customers and other applications and services.</li></ol><p data-block-key="hldj">With these principles established, follow these practices when conducting a chaos engineering experiment:</p><ol><li data-block-key="1bkn"><b>Define steady state:</b> Identify the specific metrics (e.g., latency, throughput) that you will observe, and establish a baseline for them.</li><li data-block-key="c86r7"><b>Formulate a hypothesis</b>: This is the practice of creating a single testable statement, for example, ‘By deleting this container pod, user login will not be affected’. Hypotheses are generally created by identifying customer user journeys and deriving test scenarios from them.</li><li data-block-key="39bql"><b>Use a controlled environment:</b> While one chaos engineering principle states that experiments need to run in production, you should still start small and run your experiment in a non-production environment first, learn and adjust, and then gradually expand the scope to the production environment.</li><li data-block-key="gtlb"><b>Inject failures</b>: This is the practice of causing disruption by injecting failures either directly into the system (e.g., deleting a VM, stopping a database instance) or indirectly by injecting failures in the environment (e.g.,
deleting a network route, adding a firewall rule).</li><li data-block-key="1410c"><b>Automate experimental execution</b>: Automation is crucial for establishing chaos engineering as a repeatable and scalable practice. This includes using automated tools for fault injection (e.g., making it part of a CI/CD pipeline) and automated rollback mechanisms.</li><li data-block-key="58mg2"><b>Derive actionable insights</b>: The primary objective of using chaos engineering is to gain insights into system vulnerabilities, thereby enhancing resilience. This involves rigorous analysis of experimental results; identifying weaknesses and areas for improvement; and disseminating findings to relevant teams to inform subsequent experimental design and system enhancements.</li></ol><p data-block-key="fh7in">In other words, chaos engineering isn't about breaking things for the sake of it, but about building more resilient systems by understanding their limitations and addressing them proactively.</p><h3 data-block-key="ftslk"><b>Elements of chaos engineering</b></h3><p data-block-key="evq8f">Here are the core elements you'll use in a chaos engineering experiment, derived from these five principles:</p><ul><li data-block-key="2isvq"><b>Experiments</b>: A chaos experiment constitutes a deliberate, pre-planned procedure wherein faults are introduced into a system to ascertain its response.</li><li data-block-key="d6djm"><b>Steady-state hypotheses</b>: A steady-state hypothesis defines the baseline operational state, or "normal" behavior, of the system under evaluation.</li><li data-block-key="3d8o5"><b>Actions</b>: An action represents a specific operation executed upon the system being experimented on.</li><li data-block-key="bpbv8"><b>Probes</b>: A probe provides a mechanism for observing defined conditions within the system during experimentation.</li><li data-block-key="f50fb"><b>Rollbacks</b>: An experiment may incorporate a sequence of actions designed to reverse any modifications 
implemented during the experiment.</li></ul><h2 data-block-key="327mk"><b>Getting started with chaos engineering</b></h2><p data-block-key="123gj">Now that you have a good understanding of chaos engineering and why to use it in your cloud environment, the next step is to try it out for yourself in your own development environment.</p><p data-block-key="6i4s2">There are multiple chaos engineering solutions in the market; some are paid products and some are open-source frameworks. To get started quickly, we recommend that you use <a href="https://chaostoolkit.org/" target="_blank">Chaos Toolkit</a> as your chaos engineering framework.</p><p data-block-key="atl4d">Chaos Toolkit is an open-source framework written in Python that provides a modular architecture where you can plug in other libraries (also known as ‘drivers’) to extend your chaos engineering experiments. For example, there are extension libraries for <a href="https://chaostoolkit.org/drivers/gcp/" target="_blank">Google Cloud</a>, <a href="https://chaostoolkit.org/drivers/kubernetes/" target="_blank">Kubernetes</a>, and many other technologies. Since Chaos Toolkit is a Python-based developer tool, you can begin by configuring your Python environment. You can find a good example of a Chaos Toolkit experiment and step-by-step explanation <a href="https://chaostoolkit.org/reference/tutorial/#getting-started-with-the-chaos-toolkit" target="_blank">here</a>.</p><p data-block-key="r2pl">Finally, to enable Google Cloud customers and engineers to introduce chaos testing in their applications, we’ve created a series of Google Cloud-specific chaos engineering recipes. Each recipe covers a specific scenario to introduce chaos in a particular Google Cloud service. 
For example, one recipe covers introducing chaos in an application/service running behind a Google Cloud internal or external application load balancer; another recipe covers simulating a network outage between an application running on Cloud Run and connecting to a Cloud SQL database by leveraging another Chaos Toolkit extension named <a href="https://chaostoolkit.org/drivers/toxiproxy/" target="_blank">ToxiProxy</a>.</p><p data-block-key="7bkoj">You can find a complete collection of recipes, including step-by-step instructions, scripts, and sample code, to learn how to introduce chaos engineering in your Google Cloud environment on <a href="https://github.com/GoogleCloudPlatform/chaos-engineering/blob/main/Chaos-Engineering-Recipes-Book.md" target="_blank">GitHub</a>. Then, stay tuned for subsequent posts, where we’ll talk about chaos engineering techniques, such as how to introduce faults into your Google Cloud environment.</p></div>
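<p>To make the elements described above concrete, here is a minimal sketch of how a steady-state hypothesis, a probe, an action, and a rollback might be assembled into a Chaos Toolkit-style experiment file. It is not taken from the recipes: the service URL, pod selector, deployment name, and replica count are hypothetical placeholders, and the sketch only builds and prints the JSON document rather than calling Chaos Toolkit itself.</p>

```python
import json


def build_experiment(service_url: str, pod_selector: str,
                     deployment: str, replicas: int) -> dict:
    """Assemble a Chaos Toolkit-style experiment as a plain dictionary.

    All four arguments are hypothetical placeholders for illustration.
    """
    # Probe: observes a condition. Here, the login endpoint must answer HTTP 200.
    login_probe = {
        "type": "probe",
        "name": "login-endpoint-responds",
        "tolerance": 200,
        "provider": {"type": "http", "url": service_url, "timeout": 3},
    }

    # Action: the fault that is injected. Here, pods matching a label are deleted.
    delete_pods = {
        "type": "action",
        "name": "delete-login-pods",
        "provider": {
            "type": "process",
            "path": "kubectl",
            "arguments": f"delete pod --selector {pod_selector} --wait=false",
        },
    }

    # Rollback: tries to restore the pre-experiment state (assumed replica count).
    restore_replicas = {
        "type": "action",
        "name": "restore-replica-count",
        "provider": {
            "type": "process",
            "path": "kubectl",
            "arguments": f"scale deployment {deployment} --replicas={replicas}",
        },
    }

    return {
        "version": "1.0.0",
        "title": "Deleting login pods does not affect user login",
        "description": "Hypothesis: login stays available while pods restart.",
        # Steady state: what "normal" looks like, expressed through probes.
        "steady-state-hypothesis": {
            "title": "Login endpoint is healthy",
            "probes": [login_probe],
        },
        "method": [delete_pods],
        "rollbacks": [restore_replicas],
    }


if __name__ == "__main__":
    experiment = build_experiment(
        "http://localhost:8080/login", "app=login", "login", 3
    )
    print(json.dumps(experiment, indent=2))
```

<p>Assuming Chaos Toolkit is installed, the printed JSON could be saved to a file such as experiment.json and executed with <code>chaos run experiment.json</code>. Start in a non-production environment, as the practices above recommend.</p>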
  58. Researcher

    Tue, 23 Sep 2025 14:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Today, we are excited to announce the </span><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">2025 DORA Report: State of AI-assisted Software Development</span></a><span style="vertical-align: baseline;">. The report draws on more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals around the world. </span></p> <p><span style="vertical-align: baseline;">The report reveals a key insight: AI doesn't fix a team; it amplifies what's already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems. The greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.</span></p> <h3><strong style="vertical-align: baseline;">AI, the great amplifier</strong></h3> <p><span style="vertical-align: baseline;">As we established from the </span><a href="https://dora.dev/research/2024/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">2024 report</span></a><span style="vertical-align: baseline;"> as well as the special report published this year called </span><a href="https://dora.dev/research/ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">“Impact of Generative AI in Software Development”</span></a><span style="vertical-align: baseline;">, organizations are continuing to heavily adopt AI and receive substantial benefits across important outcomes. There is also evidence that we are learning to better integrate these tools into our workflows. Unlike last year, we observe a positive relationship between AI adoption and both software delivery throughput and product performance. 
It appears that people, teams, and tools are learning where, when, and how AI is most useful. However, AI adoption does continue to have a negative relationship with software delivery stability.</span></p> <p><span style="vertical-align: baseline;">This confirms our central theory - AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, like strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability. Teams working in loosely coupled architectures with fast feedback loops see gains, while those constrained by tightly coupled systems and slow processes see little or no benefit.</span></p> <p><strong style="vertical-align: baseline;">Key findings from the 2025 report</strong></p> <p><span style="vertical-align: baseline;">Beyond this central theme, this year’s research highlighted the following about modern software development:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">AI adoption is near-universal</strong><span style="vertical-align: baseline;">: 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. However, skepticism remains as 30% report little or no trust in the code generated by AI, a slightly lower percentage than last year but a key trend to note.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">User-centricity is a prerequisite for AI success</strong><span style="vertical-align: baseline;">: AI becomes most useful when it's pointed at a clear problem, and a user-centric focus provides that essential direction. 
Our data shows this focus amplifies AI’s positive influence on team performance.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Platform engineering is the foundation</strong><span style="vertical-align: baseline;">: Our data shows that 90% of organizations have adopted at least one platform and there is a direct correlation between a high-quality internal platform and an organization’s ability to unlock the value of AI, making it an essential foundation for success.</span></p> </li> </ul> <h3><strong style="vertical-align: baseline;">The seven team archetypes</strong></h3> <p><span style="vertical-align: baseline;">Simple software delivery metrics alone aren’t sufficient. They tell you what is happening but not why it’s happening. To connect performance data to experience, we conducted a cluster analysis that reveals seven common team profiles or archetypes, each with a unique interplay of performance, stability, and well-being. This model provides leaders with a way to diagnose team health and apply the right interventions. </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YtpOb3P.max-1000x1000.jpg" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The ‘Foundational challenges’ group is trapped in survival mode and faces significant gaps in its processes and environment, leading to low performance, high system instability, and high levels of burnout and friction. By contrast, the ‘Harmonious high achievers’ excel across multiple areas, showing positive metrics for team well-being, product outcomes, and software delivery. 
</span></p> <p><span style="vertical-align: baseline;">Read more details of each archetype in the "Understanding your software delivery performance: A look at seven team profiles" chapter of the report.</span></p> <h3><strong style="vertical-align: baseline;">Unlocking the value of AI with the ‘DORA AI Capabilities Model’</strong></h3> <p><span style="vertical-align: baseline;">This year, we went beyond identifying AI’s impact to investigating the conditions in which AI-assisted technology professionals realize the best outcomes. The value of AI is unlocked not by the tools themselves, but by the surrounding technical practices and cultural environment.</span></p> <p><span style="vertical-align: baseline;">Our research identified seven capabilities that are shown to magnify the positive impact of AI in organizations.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/DORA_inline_2.max-1000x1000.png" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where leaders should get started</strong></h3> <p><span style="vertical-align: baseline;">One of the key insights derived from the research this year is that the value of AI will be unlocked by reimagining the system of work it inhabits. 
Technology leaders should treat AI adoption as an organizational transformation.</span></p> <p><span style="vertical-align: baseline;">Here’s where we suggest you begin:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Clarify and socialize your AI policies</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Connect AI to your internal context</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Prioritize foundational practices</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Fortify your safety nets</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Invest in your internal platform</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Focus on your end-users</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA research program</span></a><span style="vertical-align: baseline;"> is committed to serving as a compass to teams and organizations as we navigate the important and transformative period with AI. We hope the new team profiles and the DORA AI capabilities model provide a clear roadmap for you to move beyond simply adopting AI to unlocking its value by investing in teams and people. We look forward to learning how you put these insights into practice. 
To learn more:</span></p> <ul> <li role="presentation"><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">Download</span></a><span style="vertical-align: baseline;"> the full report</span></li> <li role="presentation"><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> <li><span style="vertical-align: baseline;">Share this </span><a href="https://dora.dev/research/2025/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">overview</span></a><span style="vertical-align: baseline;"> with your colleagues</span></li> </ul></div>
  59. Cloud Solutions Architect Manager, Google Cloud

    Wed, 13 Aug 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="bgr19">What guides your approach to software development? In our roles at Google, we’re constantly working to build better software, faster. Within Google, our Developer Platform team and Google Cloud have a strategic partnership and a shared strategy: together, we take our internal capabilities and engineering tools and package them up for Google Cloud customers.</p><p data-block-key="e2l3s">At the heart of this is understanding the many ways that software teams, big and small, need to balance efficiency, quality, and cost, all while delivering value. In our recent <a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">talk at PlatformCon 2025</a>, we shared key parts of our platform strategy, which we call “shift down.”</p><p data-block-key="d6oe8"><b>Shift down is an approach that advocates for embedding decisions and responsibilities into underlying internal developer platforms (IDPs)</b>, thereby reducing the operational burden on developers. This contrasts with the <a href="https://cloud.google.com/devops">DevOps</a> trend of "shift left," which pushes more effort earlier into the development cycle, a method that is proving difficult at scale due to the sheer volume and rate of change in requirements. Our shift down strategy helps us maximize value with existing resources so businesses can achieve high innovation velocity with acceptable quality, acceptable risk, and sustainable costs across a diverse range of business models. 
In the talk, we share learnings that have been really helpful to us in our software and <a href="https://cloud.google.com/solutions/platform-engineering">platform engineering</a> journey:</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_98vVMdt.max-1000x1000.jpg" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><ol><li data-block-key="bgr19"><b>Work backwards from the business model:</b> By starting with the business model, organizations can intentionally guide platform evolution and investment to align with desired margins, risk tolerance, and quality requirements. At Google, our central platform must support diverse business models, necessitating continuous strategic refinement and adaptation.</li><li data-block-key="fs6ra"><b>Focus on quality attributes for central software control:</b> Quality attributes, such as reliability, security, efficiency, and performance, are <a href="https://en.wikipedia.org/wiki/Emergence" target="_blank">emergent</a> properties of software systems and are important for creating business value and managing risk. These are often referred to as “non-functional requirements” because they define how our software behaves, not what it functionally does. 
With a shift down strategy, we can embed the responsibility for assuring quality attributes directly into the underlying platform systems and infrastructure, thereby significantly reducing the operational burden on individual developers.</li><li data-block-key="5a5sh"><b>Abstractions and coupling are key technical tools to gain control of quality attributes:</b> We define two key technical components in the way we build platforms: <i>abstractions</i> and <i>coupling</i>. In a shift down strategy, abstractions provide understandability, risk management levers, accountability, and cost control by encapsulating complexity. Coupling refers to the interconnectedness and interdependence of components within a system or development ecosystem. For a successful shift down strategy, the right degree of coupling is crucial because it allows the development platform and ecosystem design to directly implement and influence quality attributes. In fact, coupling is how we offer entire infrastructure and platform solutions as coherent services like <a href="https://cloud.google.com/kubernetes-engine">Google Kubernetes Engine</a> (GKE).</li><li data-block-key="2pktp"><b>Shared responsibility, education, and policy are equally important social tools:</b> Shared responsibility is a crucial social tool within software at scale. This is actively cultivated through education, such as training engineers on platform and AI usage, and fostering a "one team" culture that encourages a shift from artifact-bound identities to overarching mission goals and client-focused engagement. 
Furthermore, explicit policies like centrally enforced style guides and secure-by-design APIs are fundamental for embedding quality attribute assurance directly into the platform and infrastructure, significantly reducing the operational burden on individual developers by ensuring consistency and automated controls at scale.</li><li data-block-key="bh7kd"><b>Use a map.</b> Supporting many business units with one platform is a vast and complex problem; we need a map. The ecosystem model is a framework that categorizes different types of software development environments, ranging from highly flexible, developer-controlled systems to highly opinionated, vertically integrated ones where the ecosystem itself assures quality attributes. Its critical purpose is to provide a visual and conceptual tool for evaluating how well our ecosystem controls match our business risk. This helps us ensure that the level of oversight and assurance of quality attributes aligns with the potential cost of mistakes. The goal is to be in the "ecosystem effectiveness zone," where controls are balanced to mitigate significant risks from human error without imposing overly restrictive systems that negatively impact velocity and developer satisfaction.</li></ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_xiA9TUH.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><p data-block-key="bgr19">6. <b>Divide up the problem space by identifying different platform and ecosystem types.</b></p><p data-block-key="dk549">Because the developer experience and platform infrastructure change with scale and degree of shifting down, it’s not enough to just know where the ecosystem effectiveness zone is — you have to identify the ecosystem by type. 
We differentiate ecosystem types by the degree of oversight and assurance for quality attributes. As an ecosystem becomes more vertically integrated, such as Google's highly optimized "Assured" (Type 4) ecosystem, the platform itself assumes increasing responsibility for vital quality attributes, allowing specialists like site reliability engineers (SRE) and security teams to have full ownership in taking action through large-scale observability and embedded capabilities. Conversely, in less uniform "YOLO," "AdHoc," or "Guided" (Type 0-2) ecosystems, developers have more responsibility for assuring these attributes, while central specialist teams have less direct control and enforcement mechanisms are less pervasive. It’s really important to note here that this is <b>not</b> a maturity model — the best ecosystem and platform type is the one that best fits your business need (see point #1 above!).</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_SQqhW9d.max-1000x1000.png" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><h3 data-block-key="bgr19"><b>Intentional choices in platform engineering</b></h3><p data-block-key="2cujr">The most important takeaway is to make active choices. Tailor platform engineering for each business unit and application to achieve the best outcomes. Place critical emphasis on identifying and solving stable sub-problems in reliable, reusable ways across various business problems. 
This approach directly underpins our "shift down" strategy, moving toward composable platforms that embed decisions and responsibilities for software quality directly into the underlying platform infrastructure, thereby improving our ability to maximize business value with the right resources, at the right quality level, and with sustainable costs.</p><p data-block-key="8q0du"><a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">Watch our full discussion</a> for more insights on effective platform engineering.</p></div>
  60. Product Manager

    Mon, 04 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Application owners are looking for three things when they think about optimizing cloud costs:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the most expensive resources?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are costing me more this week or month?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are poorly utilized?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">To help you answer these questions quickly and easily, we </span><a href="https://cloud.google.com/blog/products/application-development/an-application-centric-ai-powered-cloud?e=13802955"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> Cloud Hub Optimization and Cost Explorer, in private preview, at Google Cloud Next 2025. And today, we are excited to announce that both Cloud Hub Optimization and Cost Explorer are now in public preview.</span></p> <h2><span style="vertical-align: baseline;">Application cost and utilization</span></h2> <p><span style="vertical-align: baseline;">As an app owner, your primary objective is keeping your application healthy at all times. Yet, monitoring all the individual components of your application, which may straddle dozens of Projects, can be quite overwhelming. 
</span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub applications</span></a><span style="vertical-align: baseline;"> allow you to reorganize your cloud resources around your application, giving you the information and controls you need at your fingertips.</span></p> <p><span style="vertical-align: baseline;">In addition to supporting Google Cloud Projects, Cloud Hub Optimization and Cost Explorer leverage </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;"> applications to show you the cost-efficiency of your application’s workloads and services instantly. This is useful, for instance, when you are trying to pinpoint deployments running on GKE clusters that might be wasting valuable resources, such as GPUs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_CHO_utilization_summary_app.max-1000x1000.jpg" alt="1_CHO_utilization summary app"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><span style="vertical-align: baseline;">Not just another cost dashboard</span></h2> <p><span style="vertical-align: baseline;">When you bring up Cloud Hub Optimization, you can immediately see the resources that are costing you the most, along with the percentage change in their cost. 
With this highly granular cost information, you can now attribute your costs to specific resources and resource owners to reason about any changes in costs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_CHO_cost_summary.max-1000x1000.jpg" alt="2_CHO_cost summary"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We have additionally integrated granular cost data from Cloud Billing and resource utilization data from Cloud Monitoring to give you a comprehensive picture of your cost efficiency. This includes average vCPU utilization for your Project, which helps you find the most promising optimization candidates across hundreds of Google Cloud Projects.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_CHO_utilization_summary_project.max-1000x1000.jpg" alt="3_CHO_utilization summary project"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The Cost Explorer dashboard also shows you your costs logically organized at the product level, for even more cost explainability. 
Instead of seeing a lump sum cost for Compute Engine, you can now see your exact spend on individual products including Google Kubernetes Engine (GKE) clusters, Persistent Disks, Cloud Load Balancing, and more.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_CHO_cost_explorer.max-1000x1000.jpg" alt="4_CHO_cost explorer"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><strong style="vertical-align: baseline;">Simple is powerful</strong></h2> <p><span style="vertical-align: baseline;">Customers who have tried these new tools love the information that is surfaced as well as the simplicity of the interfaces.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“My team has to keep an eye on cloud costs across tens of business units and hundreds of developers. 
The Cloud Hub Optimization and Cost Explorer dashboards are a force multiplier for my team as they tell us where to look for cost savings and potential optimization opportunities.”</span><span style="vertical-align: baseline;"> - Frank Dice, Principal Cloud Architect, Major League Baseball</span></p> <p><span style="vertical-align: baseline;">Customers especially appreciate the </span><a href="https://cloud.google.com/stackdriver/docs/costs/optimize-costs#supported_products"><span style="text-decoration: underline; vertical-align: baseline;">breadth of product coverage</span></a><span style="vertical-align: baseline;"> available out of the box without any additional setup, and the fact that there is no additional charge for using these features.</span></p> <h2><strong style="vertical-align: baseline;">What’s next</strong></h2> <p><span style="vertical-align: baseline;">As your organization “shifts left” on cloud cost management, we are working to help application owners and developers understand and optimize their cloud costs. 
You can try Cloud Hub Optimize and Cost Explorer </span><a href="https://console.cloud.google.com/cloud-hub/optimization"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">You can also see a live demo of how Cloud Hub Optimization and Cost Explorer can be used to identify underutilized GKE clusters within seconds in the Google Cloud Next 2025 talk Maximize Your Cloud ROI.</span></p></div> <div class="block-video"> <div class="article-module article-video "> <figure> <a class="h-c-video h-c-video--marquee" href="https://youtube.com/watch?v=7csgD3iIc2Q" data-glue-modal-trigger="uni-modal-7csgD3iIc2Q-" data-glue-modal-disabled-on-mobile="true"> <div class="article-video__aspect-image" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_LGJSUja.max-1000x1000.jpg);"> <span class="h-u-visually-hidden">Maximize your cloud ROI: A practical approach to efficiency and optimization</span> </div> <svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"> <use xlink:href="#mi-youtube-icon"></use> </svg> </a> </figure> </div> <div class="h-c-modal--video" data-glue-modal="uni-modal-7csgD3iIc2Q-" data-glue-modal-close-label="Close Dialog"> <a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="7csgD3iIc2Q" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=7csgD3iIc2Q" ng-cloak> </a> </div> </div> <div class="block-paragraph_advanced"><hr/> <p><sup><span style="font-style: italic; vertical-align: baseline;">Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.</span></sup></p></div>
  61. Senior Product Manager

    Fri, 01 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Are you ready to unlock the power of Google Cloud and want guidance on how to set up your environment effectively? Whether you're a cloud novice or part of an experienced team looking to migrate critical workloads, getting your foundational infrastructure right is the key to success. That's where </span><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud Setup</strong></a><span style="vertical-align: baseline;"> comes in — your guided pathway to a secure cloud foundation and quick start on Google Cloud.</span></p> <p><span style="vertical-align: baseline;">Google Cloud Setup helps you quickly implement Google Cloud's recommended best practices. Our goal is to provide a fast and easy path to deploying your workloads without unnecessary configuration effort. Think of it as your expert guide, walking you through the essential first steps so you can focus on what truly matters: rapidly deploying your innovative applications and services. 
To help you get started without financial barriers, all components and service integrations enabled during the setup process are free or include some level of no-cost access.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Choose the foundation that fits your needs</strong></h3> <p><span style="vertical-align: baseline;">We understand that every organization and project has unique requirements. That's why Cloud Setup offers three distinct guided flows to choose from:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Proof-of-concept:</strong><span style="vertical-align: baseline;"> Designed for users who want to set up a lightweight environment to explore Google Cloud and run initial tests or sandbox workloads. This flow focuses on the minimum configuration to get you started quickly.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Production:</strong><span style="vertical-align: baseline;"> This flow is recommended for supporting production-ready workloads with security and scalability in mind. 
It aligns with Google Cloud’s best practices and is tailored for administrators setting up basic foundational infrastructure for production workloads.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Enhanced security:</strong><span style="vertical-align: baseline;"> Designed for organizations, regions or workloads with advanced security and compliance requirements, this flow defaults to more advanced security controls and is designed to help you meet rigorous requirements. Even this advanced foundation sets you up with a perpetual free tier up to certain usage limits.</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LQ4uQKn.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building blocks for a solid foundation</strong></h3> <p><span style="vertical-align: baseline;">Cloud Setup guides you through a series of onboarding steps, presenting defaults backed by</span><strong style="vertical-align: baseline;"> </strong><a href="https://cloud.google.com/security/best-practices"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud best practices</strong></a><span style="vertical-align: baseline;">. 
Throughout the process, you'll also encounter key features designed to help protect your organization and prepare it for growth, including:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/kms/docs/kms-autokey"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud KMS AutoKey</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Automates the provisioning and assignment of customer-managed encryption keys (CMEK).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/security/products/security-command-center"><strong style="text-decoration: underline; vertical-align: baseline;">Security Command Center</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Provides security posture management for Google Cloud deployments including automatic project scanning for security issues such as open ports and misconfigured access controls.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/observability"><strong style="text-decoration: underline; vertical-align: baseline;">Centralized Logging and Monitoring</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to easily set up infrastructure to monitor your system's health and performance from a central location — critical for audit logging compliance and visualizing metrics across projects.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/vpc/docs/shared-vpc"><strong style="text-decoration: underline; vertical-align: baseline;">Shared VPC 
Networks</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Allows you to establish a centralized network across multiple projects, enabling secure and efficient communication between your Google Cloud resources.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/hybrid-connectivity"><strong style="text-decoration: underline; vertical-align: baseline;">Hybrid Connectivity</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Facilitates connecting your Google Cloud environment to your on-premises infrastructure or other cloud providers. This is often a critical step for workload migrations.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/support"><strong style="text-decoration: underline; vertical-align: baseline;">Support plan</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to quickly resolve any issues with help from experts at Google Cloud.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">At the end of the guided flow, you can deploy your configuration directly via the Google Cloud console or download a </span><a href="https://cloud.google.com/docs/enterprise/deploy-foundation-using-terraform-from-console"><span style="text-decoration: underline; vertical-align: baseline;">Terraform configuration file</span></a><span style="vertical-align: baseline;"> for later deployment using other Infrastructure as Code (IaC) methods.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_RwqPvpA.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Experience the cloud faster and smarter</strong></h3> <p><span style="vertical-align: baseline;">Organizations using Cloud Setup enjoy:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Faster application deployment: </strong><span style="vertical-align: baseline;">By simplifying the initial setup, you can get your applications up and running more quickly, accelerating your cloud journey.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Reduced setup effort:</strong><span style="vertical-align: baseline;"> Our streamlined flow significantly reduces the number of manual steps, allowing you to establish a basic foundation with less effort.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Greater access to Google Cloud's full potential: </strong><span style="vertical-align: baseline;">By establishing a solid foundation quickly, you can more easily explore and leverage a wider range of Google Cloud services to meet your evolving needs and unlock greater value.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">Ready to start your Google Cloud journey? Visit Google Cloud Setup today for a streamlined path to a secure cloud foundation. 
Let us guide you through the initial steps so you can focus on innovation and growth.</span></p> <p><span style="vertical-align: baseline;">To learn more, visit:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup documentation</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://console.cloud.google.com/cloud-setup/overview" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup overview</span></a><span style="vertical-align: baseline;"> (requires login)</span></p> </li> </ul></div>
  62. Product Manager

    Fri, 18 Jul 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">As developers and operators, you know that having access to the right information in the proper context is crucial for effective troubleshooting. This is why organizations invest a lot upfront curating monitoring resources across different business units: so information is easy to find and contextualize when needed.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Today we are reducing the need for this upfront investment with an out-of-the-box </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> experience for your organization on Google Cloud within </span><a href="https://cloud.google.com/stackdriver/docs"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Observability</span></a><span style="vertical-align: baseline;">. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring consists of a set of pre-curated dashboards with relevant metrics and logs mapped to a user-defined application in </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">. It incorporates best practices pioneered by Google Site Reliability Engineers (SRE) to optimize manual troubleshooting and unlock AI-assisted troubleshooting.</span></p> <p><span style="vertical-align: baseline;">Application Monitoring automatically labels and brings together key telemetry for your application into a centralized experience, making it easy to discover, filter and correlate trends. 
It also feeds application context into </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Investigations</span></a><span style="vertical-align: baseline;">, for AI-assisted troubleshooting. </span></p></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">1. Application, service and workload dashboards </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">No more spending hours configuring application dashboards. </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">From the moment you </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">describe your application in App Hub</span></a><span style="vertical-align: baseline;">, Application Monitoring starts to automatically build dashboards tailored to your environment. Each dashboard comprises relevant telemetry for your application and is searchable, filterable and ready for deep dives, with no configuration required. 
</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">The dashboards offer an overview of charts detailing the </span><a href="https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">SRE Four Golden Signals</span></a><span style="vertical-align: baseline;">: traffic, latency, error rate, and saturation. This provides a high-level view of application performance, integrating automatically collected system metrics across various services and workloads such as load balancers, Cloud Run, GKE workloads, MIGs, and databases. From this overview, you can then drill down into services or workloads with performance issues or active alerts to access detailed metrics and logs.</span></p> <p><span style="vertical-align: baseline;">For example in the image below, a user defined an App Hub application called </span><span style="font-style: italic; vertical-align: baseline;">Cymbal BnB app</span><span style="vertical-align: baseline;">, with multiple services and workloads. The flow below shows the automatically generated experience with golden signals, alerts and relevant logs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_zgV6J6C.gif" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 1 - A user’s flow from an App Hub defined application (i.e. Cymbal BnB) to the automatic prebuilt Application Monitoring experience in Cloud Observability</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3 role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. 
Labels and context propagation </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">See application labels propagated seamlessly across Google Cloud </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Once Application Monitoring is enabled, your application labels are propagated across Google Cloud, so you can see and use them to filter and focus on the most essential signals across the logs, metrics and trace explorers.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_yj24vCu.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 2 - Logs Explorer showing application automatically tagged with application labels</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_kukVdIB.max-1000x1000.png" alt="3"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 3 - Metrics Explorer showing application labels automatically associated with metrics</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_BGEDIwf.max-1000x1000.png" alt="4"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 4 - Trace Explorer showing AppHub label 
Integration</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">3. Gemini Cloud Assist Investigations</span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">Troubleshoot issues faster with AI powered Investigations. </strong></p> <p><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist’s investigation feature</span></a><span style="vertical-align: baseline;"> makes it easier to troubleshoot issues because application boundaries and relationships have been propagated into the AI model, grounding it in context about your environment.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_O7Wiid5.gif" alt="5"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 5 - Seamless entry point into Gemini Cloud Assist powered Investigations from application logs</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Note - Gemini Cloud Assist Investigations is currently in private preview</span></p> <h3><span style="vertical-align: baseline;">Try Application Monitoring today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">The new</span><span style="vertical-align: baseline;"> Application Monitoring experience provides a low-effort unified view of application and infrastructure performance for your troubleshooting needs.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Take advantage of the new Google Cloud 
Application Monitoring experience by:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Visiting your Cloud console</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">Setting up </span><strong style="text-decoration: underline; vertical-align: baseline;">Applications</strong><span style="text-decoration: underline; vertical-align: baseline;"> in App Hub</span></a></p> </li> <ol> <li aria-level="2" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Adding </span><strong style="vertical-align: baseline;">Services</strong><span style="vertical-align: baseline;"> and </span><strong style="vertical-align: baseline;">Workloads</strong><span style="vertical-align: baseline;"> to your Application</span></p> </li> </ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Navigating to </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> in Cloud Observability to see your automatically built experience</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Enabling your Gemini Cloud Assist SKU and </span><a href="https://cloud.google.com/earlyaccess/gemini-cloud-assist?e=48754805&amp;hl=en"><span style="text-decoration: underline; vertical-align: baseline;">signing up for the trusted tester 
program</span></a><span style="vertical-align: baseline;"> to get access to the</span><strong style="vertical-align: baseline;"> Investigations experience</strong></p> </li> </ol> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Related docs</span></h3> <ol style="list-style-type: lower-alpha;"> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring </span><a href="https://cloud.google.com/stackdriver/docs/observability/about-application-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> <ol style="list-style-type: lower-alpha;"> <li role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/supported-resources" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">coverage docs</span></a></li> </ol> </li> </ol></div>
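    <p>As a rough illustration of the four golden signals that the dashboards summarize (traffic, latency, error rate, and saturation), the sketch below computes them from a handful of request records. It is not how Application Monitoring collects or aggregates telemetry: the <code>Request</code> record, the <code>golden_signals</code> function, the choice of a nearest-rank 95th percentile, and the use of CPU usage over CPU limit as the saturation measure are all assumptions made for the example.</p>

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape, not a Cloud Monitoring type)."""
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_seconds, cpu_used, cpu_limit):
    """Summarize traffic, latency, error rate and saturation for one time window."""
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, computed with integer arithmetic (ceil(0.95 * total)).
    rank = (95 * total + 99) // 100
    p95 = latencies[rank - 1] if total else 0.0
    return {
        "traffic_rps": total / window_seconds,
        "latency_p95_ms": p95,
        "error_rate": errors / total if total else 0.0,
        "saturation": cpu_used / cpu_limit,
    }


# Usage: 20 requests over a 10-second window, two of them server errors.
sample = [
    Request(latency_ms=10.0 * (i + 1), status=500 if i < 2 else 200)
    for i in range(20)
]
signals = golden_signals(sample, window_seconds=10, cpu_used=1.5, cpu_limit=2.0)
print(signals)
```

    <p>In practice each signal would be read from collected system metrics for a service or workload; the point of the sketch is only to show what each of the four numbers measures.</p>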
  63. Director of Engineering, Google Cloud

    Thu, 10 Jul 2025 09:30:00 -0000

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">At Google Cloud, we are committed to making it as seamless as possible for you to build and deploy the next generation of AI and agentic applications. Today, we’re thrilled to announce that we are </span><a href="https://docker.com/blog/build-ai-agents-with-docker-compose/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">collaborating with Docker</span></a><span style="vertical-align: baseline;"> to drastically simplify your deployment workflows, enabling you to bring your sophisticated AI applications from local development to </span><a href="https://cloud.google.com/run"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;"> with ease. </span></p> <h3><strong style="vertical-align: baseline;">Deploy your compose.yaml directly to Cloud Run</strong></h3> <p><span style="vertical-align: baseline;">Previously, bridging the gap between your development environment and managed platforms like Cloud Run required you to manually translate and configure your infrastructure. Agentic applications that use MCP servers and self-hosted models added further complexity. </span></p> <p><span style="vertical-align: baseline;">The open-source </span><a href="http://compose-spec.io" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Compose Specification</span></a><span style="vertical-align: baseline;"> is one of the most popular ways for developers to iterate on complex applications in their local environment, and is the basis of Docker Compose. And now, </span><strong style="vertical-align: baseline;">gcloud run compose up</strong><span style="vertical-align: baseline;"> brings the simplicity of Docker Compose to Cloud Run, automating this entire process. 
Now in </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">private preview</span></a><span style="vertical-align: baseline;">, you can deploy your existing</span><code style="vertical-align: baseline;"> compose.yaml</code><span style="vertical-align: baseline;"> file to Cloud Run with a single command, including building containers from source and leveraging Cloud Run’s volume mounts for data persistence.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/compose.gif" alt="compose"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Supporting the Compose Specification with Cloud Run makes for easy transitions across your local and cloud deployments, where you can keep the same configuration format, ensuring consistency and accelerating your dev cycle.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“We’ve recently evolved Docker Compose to support agentic applications, and we’re excited to see that innovation extend to Google Cloud Run with support for GPU-backed execution. Using Docker and Cloud Run, developers can now iterate locally and deploy intelligent agents to production at scale with a single command. It’s a major step forward in making AI-native development accessible and composable. 
We’re looking forward to continuing our close collaboration with Google Cloud to simplify how developers build and run the next generation of intelligent applications.” - </span><span style="vertical-align: baseline;">Tushar Jain, EVP Engineering and Product, Docker</span></p> <h3><strong style="vertical-align: baseline;">Cloud Run, your home for AI applications</strong></h3> <p><span style="vertical-align: baseline;">Support for the compose spec isn’t the only AI-friendly innovation you’ll find in Cloud Run. We recently announced </span><a href="https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available"><span style="text-decoration: underline; vertical-align: baseline;">general availability of Cloud Run GPUs</span></a><span style="vertical-align: baseline;">, removing a significant barrier to entry for developers who want access to GPUs for AI workloads. With its pay-per-second billing, scale to zero, and rapid scaling (approximately 19 seconds to first token for a gemma3:4b model), Cloud Run is a great hosting solution for deploying and serving LLMs. </span></p> <p><span style="vertical-align: baseline;">This also makes Cloud Run a strong solution for Docker’s recently </span><a href="https://www.docker.com/blog/docker-mcp-gateway-secure-infrastructure-for-agentic-ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> OSS MCP Gateway and Model Runner, making it easy for developers to take AI applications from local development to production in the cloud. 
By supporting Docker’s recent addition of </span><a href="https://github.com/compose-spec/compose-spec/blob/main/spec.md#models" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">‘models’ to the open Compose Spec</span></a><span style="vertical-align: baseline;">, you can deploy these complex solutions to the cloud with a single command.  </span></p> <h3><strong style="vertical-align: baseline;">Bringing it all together</strong></h3> <p><span style="vertical-align: baseline;">Let's review the compose file for the above demo. It consists of a multi-container application (defined in </span><code style="vertical-align: baseline;">services</code><span style="vertical-align: baseline;">) built from sources and leveraging a storage volume (defined in </span><code style="vertical-align: baseline;">volumes</code><span style="vertical-align: baseline;">). It also uses the new </span><code style="vertical-align: baseline;">models</code><span style="vertical-align: baseline;"> attribute to define AI models and a Cloud Run-extension defining the runtime image to use:</span></p></div> <div class="block-code"><pre><code>name: agent
services:
  webapp:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - web_images:/assets/images
    depends_on:
      - adk

  adk:
    image: us-central1-docker.pkg.dev/jmahood-demo/adk:latest
    ports:
      - "3000:3000"
    models:
      - ai-model

models:
  ai-model:
    model: ai/gemma3-qat:4B-Q4_K_M
    x-google-cloudrun:
      inference-endpoint: docker/model-runner:latest-cuda12.2.2

volumes:
  web_images:</code></pre></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building the future of 
AI</strong></h3> <p><span style="vertical-align: baseline;">We’re committed to offering developers maximum flexibility and choice by adopting open standards and supporting various agent frameworks.</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">This collaboration on Cloud Run and Docker is another example of how we aim to simplify the process for developers to build and deploy intelligent applications. </span></p> <p><span style="vertical-align: baseline;">Compose Specification support is available for our trusted users — </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">sign up here for the private preview</span></a><span style="vertical-align: baseline;">. </span></p></div>
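<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To make the relationship between the sections of the compose file above concrete, here is a minimal Python sketch that checks whether every model and service a container refers to is actually declared. The compose document is written as a plain dict so the example needs no YAML library, and the helper name </span><code style="vertical-align: baseline;">undeclared_references</code><span style="vertical-align: baseline;"> is an assumption made for illustration. It is not part of Docker Compose or Cloud Run tooling.</span></p></div>

```python
# Sketch only: the demo compose file from the article, written as the dict a
# YAML parser would produce. undeclared_references is a hypothetical helper.
COMPOSE = {
    "name": "agent",
    "services": {
        "webapp": {
            "build": ".",
            "ports": ["8080:8080"],
            "volumes": ["web_images:/assets/images"],
            "depends_on": ["adk"],
        },
        "adk": {
            "image": "us-central1-docker.pkg.dev/jmahood-demo/adk:latest",
            "ports": ["3000:3000"],
            "models": ["ai-model"],
        },
    },
    "models": {
        "ai-model": {
            "model": "ai/gemma3-qat:4B-Q4_K_M",
            "x-google-cloudrun": {
                "inference-endpoint": "docker/model-runner:latest-cuda12.2.2"
            },
        }
    },
    "volumes": {"web_images": None},
}


def undeclared_references(compose: dict) -> list:
    """List references from services to models or services that are not declared."""
    services = compose.get("services") or {}
    models = compose.get("models") or {}
    problems = []
    for name, service in services.items():
        # Iterating yields names for both the list form and the mapping form.
        for model in service.get("models") or []:
            if model not in models:
                problems.append(f"{name} -> model {model}")
        for dependency in service.get("depends_on") or []:
            if dependency not in services:
                problems.append(f"{name} -> service {dependency}")
    return problems


if __name__ == "__main__":
    print(undeclared_references(COMPOSE) or "all references resolve")
```

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">For the demo file every reference resolves: the <code style="vertical-align: baseline;">adk</code> service points at the <code style="vertical-align: baseline;">ai-model</code> entry under <code style="vertical-align: baseline;">models</code>, and <code style="vertical-align: baseline;">webapp</code> depends on the declared <code style="vertical-align: baseline;">adk</code> service.</span></p></div>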
  64. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note:</strong><span style="font-style: italic; vertical-align: baseline;"> This is part one of the story. After you’re finished reading, head over to </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">. </span></p> <hr/> <p><span style="vertical-align: baseline;">In 2017, John Lewis, a major UK retailer with a £2.5bn annual online turnover, was hampered by its monolithic e-commerce platform. This outdated approach led to significant cross-team dependencies, cumbersome and infrequent releases (monthly at best), and excessive manual testing, all further hindered by complex on-premises infrastructure. What was needed were some bold decisions to drive a quick and significant transformation.</span></p> <p><span style="vertical-align: baseline;">The John Lewis engineers knew there was a better way. Working with Google Cloud, they modernized their e-commerce operations with </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">Google Kubernetes Engine</span></a><span style="vertical-align: baseline;">. 
They started with the frontend, and started to see results fast: the frontend was moved onto Google Cloud in mere months, releases to the frontend browser journey started to happen weekly, and the business gladly backed expansion into other areas.</span></p> <p><span style="vertical-align: baseline;">At the same time, the team had a broader strategy in mind: to take </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">a platform engineering approach</span></a><span style="vertical-align: baseline;">, creating many product teams who built their own microservices to replace the functionality of the legacy commerce engine, as well as creating brand new experiences for customers. </span></p> <p><span style="vertical-align: baseline;">And so The John Lewis Digital Platform was born. The vision was to empower development teams and arm them with the tools and processes they needed to go to market fast, with full ownership of their own business services. The team’s motto? "You Build It. You Run It. You Own It." This decentralization of development and operational responsibilities would also enable the team to scale. 
</span></p> <p><span style="vertical-align: baseline;">This article features insights from Principal Platform Engineer Alex Moss, who delves into the strategy, platform build, and key learnings from John Lewis’ journey to modernize and streamline its operations with platform engineering — so you can begin to think about how you might apply platform engineering to your own organization.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Step 1: From monolithic to multi-tenant</strong></h3> <p><span style="vertical-align: baseline;">In order to make this happen, John Lewis needed to adopt a multi-tenant architecture, with one tenant for each business service. This allows each owning team to work independently without risk to others, and thereby permits the Platform team to give each team a greater degree of freedom.</span></p> <p><span style="vertical-align: baseline;">Knowing that the business' primary objective was to greatly increase the number of product teams helped inform our initial design thinking, positioning ourselves to enable many independent teams even though we only had a handful of tenants. </span></p> <p><span style="vertical-align: baseline;">This foundational design has served us very well and is largely unchanged now, seven years later. 
Central to the multi-tenant concept is what we chose to term a "Service" — a logical business application, usually composed of several microservices plus components for storing data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image1.max-1000x1000.png" alt="article1-image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We largely position our platform as a “bring your own container” experience, but encourage teams to make use of other Google Cloud services — particularly for handling state. Adopting services like Firestore and Pub/Sub reduces the complexity that our platform team has to work with, particularly for areas like resilience and disaster recovery. We also favor Kubernetes over compute products like Cloud Run because it strikes the right balance for us between giving development teams freedom and allowing our platform to drive certain behaviours, e.g., the right level of guardrails, without introducing too much friction.</span></p> <p><span style="vertical-align: baseline;">On our platform, Product Teams (i.e., tenants) have a large amount of control over their own Namespaces and Projects. This allows them to prototype, build, and ultimately operate their workloads without dependency on others — a crucial element of enabling scale. 
</span></p> <p><span style="vertical-align: baseline;">Our early-adopter teams were instrumental in evolving the platform; they were accepting of the lack of features and willing to develop their own solutions, and provided very rich feedback on whether we were building something that met their needs.</span></p> <p><span style="vertical-align: baseline;">The first tenant to adopt the platform was rebuilding the </span><a href="http://johnlewis.com" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">johnlewis.com</span></a><span style="vertical-align: baseline;"> search capability, replacing a commercial-off-the-shelf solution. This team was staffed with experienced engineers familiar with modern software development and the advantages of a microservice-based architecture. They quickly identified the need for supporting services for their application to store data and asynchronously communicate between their components. They worked with the Platform Team to identify options, and were on board with our desire to lean into Google Cloud native services to avoid running our own databases or messaging. This led to us adopting Cloud Datastore and Pub/Sub for our first features that extended beyond Google Kubernetes Engine.</span></p> <h3><strong style="vertical-align: baseline;">All roads lead to success</strong></h3> <p><span style="vertical-align: baseline;">A risk with a platform that allows very high team autonomy is that it can turn into a bit of a wild-west of technology choices and implementation patterns. 
To handle this, but to do so in a way that remained developer-centric, we adopted the concept of a </span><strong style="vertical-align: baseline;">paved road</strong><span style="vertical-align: baseline;">, analogous to a “golden path.” </span></p> <p><span style="vertical-align: baseline;">We found that the paved road approach made it easier to:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">build useful platform features to help developers do things rapidly and safely</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">share approaches and techniques, and for engineers to move between teams</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">demonstrate to the wider organisation that teams are following required practices (which we do by building assurance capabilities, </span><strong style="vertical-align: baseline;">not </strong><span style="vertical-align: baseline;">by gating releases)</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The concept of the paved road permeates most of what the platform builds, and has inspired other areas of the John Lewis Partnership beyond the John Lewis Digital space.</span></p> <p><span style="vertical-align: baseline;">Our paved road is powered by two key features to enable simplification for teams:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Paved Road Pipeline</strong><span style="vertical-align: baseline;">. 
This operates on the whole Service and drives capabilities such as Google Cloud resource provisioning and observability tools.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Microservice CRD</strong><span style="vertical-align: baseline;">. As the name implies, this is an abstraction at the microservice level. The majority of the benefit here is in making it easier for teams to work with Kubernetes.</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Whilst both features were created with the developer experience in mind, we discovered that they also hold a number of benefits for the platform team too.</span></p> <p><span style="vertical-align: baseline;">The Paved Road Pipeline is driven by a configuration file — in yaml (of course!) — which we call the Service Definition. This allows </span><strong style="vertical-align: baseline;">the team that owns the tenancy</strong><span style="vertical-align: baseline;"> to describe, through easy-to-reason-about configuration, what they would like the platform to provide for them. Supporting documentation and examples help them understand what can be achieved. Pushes to this file then drive a CI/CD pipeline for a number of platform-owned jobs, which we refer to as provisioners. These provisioners are microservices-like themselves in that they are independently releasable and generally focus on performing one task well. Here are some examples of our provisioners and what they can do:</span></p> <ul> <li role="presentation"><span style="vertical-align: baseline;">Create Google Cloud resources in a tenant’s Project. 
For example, </span><a href="https://cloud.google.com/storage/docs/creating-buckets"><span style="text-decoration: underline; vertical-align: baseline;">Buckets</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/pubsub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">PubSub</span></a><span style="vertical-align: baseline;">, and </span><a href="https://firebase.google.com/docs/firestore" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Firestore</span></a><span style="vertical-align: baseline;"> — amongst many others</span></li> <li role="presentation"><span style="vertical-align: baseline;">Configure platform-provided dashboards and custom dashboards based on golden-signal and self-instrumented metrics</span></li> <li role="presentation"><span style="vertical-align: baseline;">Tune alert configurations for a given microservice’s SLOs, and the incident response behaviour for those alerts</span></li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image2.max-1000x1000.png" alt="article1-image2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Our product teams are therefore freed from the need to familiarize themselves deeply with how Google Cloud resource provisioning works, or Infrastructure-as-Code (IaC) tooling for that matter. 
Our preferred technologies and good practices can be curated by our experts, and developers can focus on building differentiating software for the business, while remaining fully in control of what is provisioned and when.</span></p> <p><span style="vertical-align: baseline;">Earlier, we mentioned that this approach has the added benefit of being something that the platform team can rely upon to build their own features. The configuration updated by teams for their Service can be combined with metadata about their team and surfaced via an API and events published to Pub/Sub. This can then drive updates to other features like incident response and security tooling, pre-provision documentation repositories, and more. This is an example of how something that was originally intended as a means to help teams avoid writing their own IaC can also be used to make it easier for us to build platform features, further improving the value-add — without the developer even needing to be aware of it!</span></p> <p><span style="vertical-align: baseline;">We think this approach is also more scalable than providing pre-built Terraform modules for teams to use. That approach still burdens teams with being familiar with Terraform, and versioning and dependency complexities can create maintenance headaches for platform engineers. Instead, we provide an easy-to-reason-about API and </span><strong style="vertical-align: baseline;">deliberately burden the platform team,</strong><span style="vertical-align: baseline;"> ensuring that the Service provides all the functionality our tenants require. This abstraction also means we can make significant refactoring choices if we need to.</span></p> <p><span style="vertical-align: baseline;">Adopting this approach also results in a broad consistency in technologies across our platform. For example, why would a team implement Kafka when the platform makes creating resources in Pub/Sub so easy? 
When you consider that this spans not just the runtime components that assemble into a working business service, but also all the ancillary needs for operating that software — resilience engineering, monitoring &amp; alerting, incident response, security tooling, service management, and so on—  this has a massive amplifying effect on our engineers’ productivity. All of these areas have full paved road capabilities on the John Lewis Digital Platform, reducing the cognitive load for teams in recognizing the need for, identifying appropriate options, and then implementing technology or processes to use them.</span></p> <p><span style="vertical-align: baseline;">That being said, one of the reasons we particularly like the paved road concept is because it doesn't preclude teams choosing to "go off-road." A paved road shouldn’t be mandatory, but it should be compelling to use, so that engineers aren’t tempted to do something else. Preventing use of other approaches risks stifling innovation and the temptation to think the features you've built are "good enough." The paved road challenges our Platform Engineers to keep improving their product so that it continues to meet our Developers' changing needs. Likewise, development teams tempted to go off-road are put off by the increasing burden of replicating powerful platform features. </span></p> <p><span style="vertical-align: baseline;">The needs of our Engineers don’t remain fixed, and Google Cloud are of course releasing new capabilities all the time, so we have extended the analogy to include a “dusty path” representing brand new platform features that aren’t as feature-rich as we’d like (perhaps they lack self-service provisioning or out-the-box observability). Teams are trusted to try different options and make use of Google Cloud products that we haven't yet paved. The Paved Road Pipeline allows for this experimentation - what we term "snowflaking". 
We then have an unofficial "rule of three", whereby if we notice at least 3 teams requesting the same feature, we move to make the use of it self-service.</span></p> <p><span style="vertical-align: baseline;">At the other end of the scale, teams can go completely solo — which we refer to as “crazy paving” — and might be needed to support wild experimentation or to accommodate a workload which cannot comply with the platform’s expectations for safe operation. Solutions in this space are generally not long-lived.</span></p> <p><span style="vertical-align: baseline;">In this article, we've covered how John Lewis revolutionized its e-commerce operations by adopting a multi-tenant, "paved road" approach to platform engineering. We explored how this strategy empowered development teams and streamlined their ability to provision Google Cloud resources and deploy operational and security features.</span></p> <p><span><span style="vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">part 2</span></a><span style="vertical-align: baseline;"> of this series, we'll dive deeper into how John Lewis further simplified the developer experience by introducing the Microservice CRD. You'll discover how this custom Kubernetes abstraction significantly reduced the complexity of working with Kubernetes at the component level, leading to faster development cycles and enhanced operational efficiency.</span></span></p> <p><span style="vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, you can find more information available </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">. 
To learn more about how Google Kubernetes Engine (GKE) empowers developers to effortlessly deploy, scale, and manage containerized applications with its fully managed, robust, and intelligent Kubernetes service, you can find more information </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p></div>
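<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The Service Definition and provisioner model described in this article can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the John Lewis implementation: the field names (</span><code style="vertical-align: baseline;">buckets</code><span style="vertical-align: baseline;">, </span><code style="vertical-align: baseline;">pubsub_topics</code><span style="vertical-align: baseline;">, </span><code style="vertical-align: baseline;">alerts</code><span style="vertical-align: baseline;">) and the provisioner functions are invented, and each provisioner only returns a description of what it would do rather than calling Google Cloud.</span></p></div>

```python
from typing import Callable, Dict, List

# Hypothetical Service Definition, shown as the dict a YAML parser would produce.
# All field names here are invented for illustration.
SERVICE_DEFINITION = {
    "service": "search",
    "buckets": ["search-index-snapshots"],
    "pubsub_topics": ["product-updated"],
    "alerts": [
        {"microservice": "search-api", "slo": "availability", "target": 99.9}
    ],
}


def provision_buckets(service: str, buckets: list) -> List[str]:
    """One provisioner, one task: describe the buckets to ensure."""
    return [f"{service}: ensure bucket {name}" for name in buckets]


def provision_topics(service: str, topics: list) -> List[str]:
    """Describe the Pub/Sub topics to ensure."""
    return [f"{service}: ensure Pub/Sub topic {name}" for name in topics]


def provision_alerts(service: str, alerts: list) -> List[str]:
    """Describe the SLO alerts to configure."""
    return [
        f"{service}: alert on {a['slo']} SLO ({a['target']}%) for {a['microservice']}"
        for a in alerts
    ]


# Each section of the definition is owned by one independently releasable provisioner.
PROVISIONERS: Dict[str, Callable[[str, list], List[str]]] = {
    "buckets": provision_buckets,
    "pubsub_topics": provision_topics,
    "alerts": provision_alerts,
}


def plan(definition: dict) -> List[str]:
    """Run every provisioner whose section appears in the Service Definition."""
    service = definition["service"]
    actions: List[str] = []
    for section, provisioner in PROVISIONERS.items():
        if section in definition:
            actions.extend(provisioner(service, definition[section]))
    return actions


if __name__ == "__main__":
    for action in plan(SERVICE_DEFINITION):
        print(action)
```

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The point of the design is that tenants only edit the definition. Adding a new capability means registering another provisioner, and sections a team leaves out are simply skipped.</span></p></div>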
  65. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;"> we introduced the John Lewis Digital Platform and its approach to simplifying the developer experience through platform engineering and so-called paved road features. We focused on the ways that platform engineering enables teams to create resources in Google Cloud and deploy the platform's operational and security features within dedicated tenant environments. In this article, we will build upon that concept for the next level of detail — how the platform simplifies build and run at a component (typically for us, a microservice) level too.</span></p> <p><span style="vertical-align: baseline;">Within just over a year, the John Lewis Digital Platform had fully evolved into a product. We had approximately 25 teams using our platform, with several key parts of the johnlewis.com retail website running in production. We had built a self-service capability to help teams provision resources in Google Cloud, and firmly established that the foundation of our platform was on Google Kubernetes Engine (GKE). But we were hearing signals from some of the recent teams that there was a learning curve to Kubernetes. This was expected — we were driving a cultural change for teams to build and run their own services, and so we anticipated that our application developers would need some Kubernetes skills to support their own software. But our vision was that we wanted to make developers' lives easier — and their feedback was clear. In some cases, we observed that teams weren't following "good practice"  (despite the existence of good documentation!) 
such as not using anti-affinity rules or </span><code style="vertical-align: baseline;">PodDisruptionBudgets</code><span style="vertical-align: baseline;"> to help their workloads tolerate failure.</span></p></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">All the way back in 2017, Kelsey Hightower wrote: “</span><span style="font-style: italic; vertical-align: baseline;">Kubernetes is a platform for building platforms. It's a better place to start, not the endgame.”</span></p> <p><span style="vertical-align: baseline;">Kelsey's quote inspired us to act. We had the idea to write our own custom controller to simplify the point of interaction for a developer with Kubernetes — a John Lewis-specific abstraction that aligned to our preferred approaches. And thus the JL </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> was born.</span></p> <p><span style="vertical-align: baseline;">To do this, we declared a Kubernetes  </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> with a simplified specification containing just the fields we felt our developers needed to set. For example, as we expect our tenants to build and operate their applications themselves, attributes such as the number of replicas and the amount of resources needed are best left up to the developers themselves. But do they really need to be able to customize the rules defining how to distribute pods across nodes? 
How often do they need to change the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;"> pointing towards their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">? When we looked closer, we realized just how much duplication there was — our analysis at the time suggested that only around 33% of the lines in the yaml files developers were producing were relevant to their application. This was a target-rich scenario for simplification.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article2-image1.max-1000x1000.png" alt="article2-image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To help us build this feature, we selected </span><a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Kubebuilder,</span></a><span style="vertical-align: baseline;">  using it to declare our </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> and then build the Controller (what we call </span><code style="vertical-align: baseline;">MicroserviceManager</code><span style="vertical-align: baseline;">). This turned out to be a beneficial decision — initial prototyping was quick, and the feature was launched a few months later, and very well-received. 
Our team had to skill up in the </span><a href="https://go.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Go programming language</span></a><span style="vertical-align: baseline;">, but this trade-off felt worthwhile due to the advantages Kubebuilder was bringing to the table, and it has continued to be helpful for other software engineering since.</span></p> <p><span style="vertical-align: baseline;">The initial implementation replaced an engineer's need to understand and fully configure a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">, instead applying a much briefer yaml file containing only the fields they need to change. As well as direct translation of identical fields (</span><code style="vertical-align: baseline;">image</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">replicas </code><span style="vertical-align: baseline;">are equivalent to what you would see in a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">, for example), it also allowed us to simplify the choices made by the Kubernetes APIs, because in John Lewis we didn't need some of that functionality. For example, </span><code style="vertical-align: baseline;">writablePaths: []</code><span style="vertical-align: baseline;"> is an easy concept for our engineers to understand, and behind the scenes, our controller is converting those into the more complex combination of </span><code style="vertical-align: baseline;">Volumes </code><span style="vertical-align: baseline;">and </span><code style="vertical-align: baseline;">VolumeMounts</code><span style="vertical-align: baseline;">. 
Likewise, </span><code style="vertical-align: baseline;">visibleToOtherServices: true</code><span style="vertical-align: baseline;"> is an example of us simplifying the interaction with Kubernetes </span><code style="vertical-align: baseline;">NetworkPolicy</code><span style="vertical-align: baseline;"> — rather than requiring teams to read our documentation to understand the necessary incantations to label their resources correctly, the controller understands those conventions and handles them for teams.</span></p> <p><span style="vertical-align: baseline;">With the core concept of the </span><code style="vertical-align: baseline;">Microservice </code><span style="vertical-align: baseline;">resource established, we were able to improve the value-add by augmenting it with further features. We rapidly extended it out to define our Prometheus scrape configuration, then more complex features such as allowing teams to declare that they use Google Cloud Endpoints, and have the controller inject the necessary sidecar container into their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and wire it up to the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">. As we added more features, existing tenants converted to use this specification, and it now makes up the majority of workloads declared on the platform.</span></p> <h3><strong style="vertical-align: baseline;">Moving the platform boundary</strong></h3> <p><span style="vertical-align: baseline;">Our motivation to build MicroserviceManager was focused on making developers' lives easier. But we discovered an additional benefit that we had not initially expected: it was something we could greatly benefit from </span><span style="font-style: italic; vertical-align: baseline;">within</span><span style="vertical-align: baseline;"> the platform as well. 
It enabled us to make changes behind the scenes without needing to involve our tenants — reducing toil for them and making it easier for us to improve our product. This was a slightly unexpected but an exceptionally powerful benefit. It is generally difficult to change the agreement that you’ve established between your tenants and the platform, and creating an abstraction like this has allowed us to bring more under our control, for everyone’s benefit.</span></p> <p><span style="vertical-align: baseline;">An example of this was something we observed through our live load testing of johnlewis.com when certain workloads burst up to several hundred </span><code style="vertical-align: baseline;">Pods</code><span style="vertical-align: baseline;"> — numbers that exceeded the typical number of </span><code style="vertical-align: baseline;">Nodes</code><span style="vertical-align: baseline;"> we had running in the cluster. This led to new </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;"> creation — therefore slower </span><code style="vertical-align: baseline;">Pod</code><span style="vertical-align: baseline;"> autoscaling and poor bin-packing. Experienced Kubernetes operators can probably guess what was happening here: our default antiAffinity rules were set to optimize for resilience such that no more than one replica was allowed on any given </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;">. 
The good news was that the workloads were under the control of our Microservice Manager. Rather than having to instruct our tenants to copy the relevant yaml into their Deployments, we could make a straightforward change to replace the antiAffinity rules with the more modern </span><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/" rel="noopener" target="_blank"><code style="text-decoration: underline; vertical-align: baseline;">topologySpreadConstraints</code></a><span style="vertical-align: baseline;">, allowing us to customize the number of replicas that could be stacked on a Node for workloads exceeding a certain replica count. And this happened with no intervention from our tenants.</span></p> <p><span style="vertical-align: baseline;">A more complex example of this was when we rolled out our service mesh. In keeping with our general desire to let Google Cloud handle the complexity of running control plane components, we opted to use </span><a href="https://cloud.google.com/products/service-mesh"><span style="text-decoration: underline; vertical-align: baseline;">Google's Cloud Service Mesh</span></a><span style="vertical-align: baseline;"> product. But even then, rolling out a mesh to a business-critical platform in constant use is not without its risks. Microservice Manager allowed us to control the rate at which we enrolled workloads into the mesh through the use of a feature flag on the </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> resource. We could start rollout with platform-owned workloads first to test our approach, then make tenants aware of the flag for early adopters to validate and take advantage of some of Cloud Service Mesh’s features. To scale the rollout, we could then manipulate the flag to release in waves based on business importance, providing an opt-out mechanism if needed. 
This again greatly simplified the implementation — product teams had very little to do, and we avoided having to chase approximately 40 teams running hundreds of Microservices to make the appropriate changes in their configuration. This feature flagging technique is something we make extensive use of to support our own experimentation.</span></p> <h3><strong style="vertical-align: baseline;">Beyond the microservice</strong></h3> <p><span style="vertical-align: baseline;">Building the Microservice Manager has led to further thinking in Kubernetes-native ways: the </span><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Custom Resource + Controller concept</span></a><span style="vertical-align: baseline;"> is a powerful technique, and we have built other features since using it. One example is a controller that converts the need for external connectivity into Istio resources to route via our egress gateway. Istio in particular is an example of a very powerful platform capability that comes with a high cognitive load for its users, and so is a perfect example of where platform engineering can help manage that for teams whilst still allowing them to take advantage of it. We have a number of ideas in this area now that our confidence in the technology has grown.</span></p> <p><span style="vertical-align: baseline;">In summary, the John Lewis Partnership leveraged Google Cloud and platform engineering to modernize their e-commerce operations and developer experience. By implementing a "paved road" approach with a multi-tenant architecture, they empowered development teams, accelerated deployment cycles, and simplified Kubernetes interactions using a custom Microservice CRD. 
This strategy enhanced the developer experience by reducing complexity, while maintaining operational efficiency and allowing their engineering teams to scale effectively.</span></p> <p><span style="vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles:</span><span style="vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="vertical-align: baseline;">, and </span><a href="https://cloud.google.com/blog/products/application-development/golden-paths-for-engineering-execution-consistency"><span style="text-decoration: underline; vertical-align: baseline;">Light the way ahead: Platform Engineering, Golden Paths, and the power of self-service</span></a><span style="vertical-align: baseline;">.</span></p></div>
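<p>The move from one-replica-per-Node antiAffinity rules to topology spread constraints described in this article can be illustrated with a small Python sketch. It is not the Microservice Manager's actual code: the function name, the replica threshold, and the skew values are hypothetical, and only the field names (maxSkew, topologyKey, whenUnsatisfiable, labelSelector) come from the Kubernetes Pod spec. Note that maxSkew bounds how uneven the spread across Nodes may become rather than setting a hard per-Node cap.</p>

```python
from typing import Any, Dict, List

# Hypothetical values, chosen only for illustration; the article does not
# state which threshold or skew the platform team actually uses.
LARGE_WORKLOAD_REPLICAS = 50
SMALL_WORKLOAD_MAX_SKEW = 1
LARGE_WORKLOAD_MAX_SKEW = 5


def spread_constraints(app_label: str, replicas: int) -> List[Dict[str, Any]]:
    """Build a topologySpreadConstraints list for a Pod spec.

    Unlike a required anti-affinity rule that allows one replica per Node,
    a spread constraint lets replicas share a Node and only limits how
    uneven the distribution may become (maxSkew).
    """
    if replicas < LARGE_WORKLOAD_REPLICAS:
        max_skew = SMALL_WORKLOAD_MAX_SKEW
    else:
        # Larger workloads tolerate more imbalance between Nodes.
        max_skew = LARGE_WORKLOAD_MAX_SKEW
    return [
        {
            "maxSkew": max_skew,
            # Spread across individual Nodes.
            "topologyKey": "kubernetes.io/hostname",
            # Prefer the spread, but do not block scheduling on it.
            "whenUnsatisfiable": "ScheduleAnyway",
            "labelSelector": {"matchLabels": {"app": app_label}},
        }
    ]


if __name__ == "__main__":
    print(spread_constraints("checkout", 3))
    print(spread_constraints("checkout", 300))
```

<p>Because a platform controller generates this fragment centrally, changing the threshold or the skew is a single platform-side change, which is the benefit the article describes.</p>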
  66. Sr. Staff UX Designer

    Wed, 28 May 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In the event of a cloud incident, everyone wants swift and clear communication from the cloud provider, and to be able to leverage that information effectively. </span><a href="https://cloud.google.com/blog/products/devops-sre/personalized-service-health-is-now-generally-available?e=48754805?utm_source%3Dmarketingweb"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;"> in the Google Cloud console addresses this need with fast, transparent, relevant, and actionable communications about Google Cloud service disruptions, customized to your specific footprint. This helps you to quickly identify the source of the problem, helping you answer the question, “Is it Google or is it me?” You can then integrate this information into your incident response workflows to resolve the incident more efficiently.</span></p> <p><span style="vertical-align: baseline;">We're excited to announce that you can prompt </span><a href="https://g.co/kgs/j2BVWVE" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;"> to pull real-time information about active incidents, powered by Personalized Service Health, providing you with streamlined incident management, including discovery, impact assessment, and recovery. By combining Gemini's guidance with Personalized Service Health insights and up-to-the-minute information, you can assess the scope of impact and begin troubleshooting – all within a single, AI-driven Gemini Cloud Assist chat. Further, you  can initiate this sort of incident discovery from anywhere within the console, offering immediate access to relevant incidents without interrupting your workflow. 
You can also check for active incidents impacting your projects, gathering details on their scope and the latest updates directly sourced from Personalized Service Health</span><span style="vertical-align: baseline;">.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Using Gemini Cloud Assist with Personalized Service Health</strong></h3> <p><span style="vertical-align: baseline;">We designed Gemini Cloud Assist with a user-friendly layout and a well-organized information structure. Crucial details, including dynamic timelines, latest updates, symptoms, and workarounds sourced directly from Personalized Service Health, are now presented in the console, enabling conversational follow-ups. Gemini Cloud Assist highlights critical insights from Personalized Service Health, helping you refine your investigations and understand the impact of incidents.</span></p> <p><span style="vertical-align: baseline;">To illustrate the power of this integration, the following demo showcases a typical incident response workflow leveraging the combined capabilities of Gemini and Personalized Service Health.</span></p> <p><strong style="vertical-align: baseline;">Incident discovery and triage<br/></strong><span style="vertical-align: baseline;">In the crucial first moments of an incident, Gemini Cloud Assist helps you answer "Is it Google or is it me?" 
Gemini Cloud Assist accesses data directly from Personalized Service Health, and provides feedback on which projects and at what locations are affected by a Google Cloud incident, speeding up the triage process.</span></p> <p><span style="vertical-align: baseline;">To illustrate how you can start this process, try asking Gemini Cloud Assist questions like:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is my project impacted by a Google Cloud incident?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Are there any incidents impacting Google Cloud at the moment?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_UpdatedNew.gif" alt="1 UpdatedNew"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Investigating and evaluating impact<br/></strong><span style="vertical-align: baseline;">Once you’ve identified a relevant Google Cloud incident, you can use Gemini Cloud Assist to delve deeper into the specifics and evaluate its impact on your environment. Furthermore, by asking follow-up questions, Gemini Cloud Assist can retrieve updates from Personalized Service Health about the incident as it evolves. 
You can then further investigate by asking Gemini to pinpoint exactly which of your apps or projects, and at what locations, might be affected by the reported incident.</span></p> <p><span style="vertical-align: baseline;">Here are examples of prompts you might pose to Gemini Cloud Assist:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Tell me more about the ongoing Incident ID [X] (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is [X] impacted? (Replace [X] with your specific location or Google Cloud product)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What is the latest update on Incident ID [X]?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Show me the details of Incident ID [X].</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you guide me through some troubleshooting steps for [impacted Google Cloud product]?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_Updated.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Mitigation and recovery<br/></strong><span style="vertical-align: baseline;">Finally, Gemini Cloud Assist can also act as an intelligent 
assistant during the recovery phase, providing you with actionable guidance. You can gain access to relevant logs and monitoring data for more efficient resolution. Additionally, Gemini Cloud Assist can help surface potential workarounds from Personalized Service Health and direct you to the tools and information you need to restore your projects or applications. Here are some sample prompts:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the workarounds for the incident ID [X]? (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you suggest a temporary solution to keep my application running?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">How can I find logs for this impacted project?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Updated_tpPYqpq.gif" alt="3 Updated"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">From these prompts, Gemini retrieves relevant information from Personalized Service Health to provide you with personalized insights into your Google Cloud environment's health — both for ongoing events and incidents from up to one year in the past. This helps when investigating an incident to narrow down its impact, as well as assisting in recovery. 
</span></p> <h3><strong style="vertical-align: baseline;">Next steps</strong></h3> <p><span style="vertical-align: baseline;">Looking ahead, we are excited to provide even deeper insights and more comprehensive incident management with Gemini Cloud Assist and Personalized Service Health, extending these AI-driven capabilities beyond a single project view. Ready to get started? </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Learn more about </span><a href="https://cloud.google.com/service-health/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;">, or reach out to your account team to enable it.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Get started with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?e=48754805?utm_source%3Dmarketingweb" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">. 
Refine your prompts to ask about your specific regions or Google Cloud products, and experiment to discover how it can help you proactively manage incidents.</span></p> </li> </ul></div> <div class="block-related_article_tout"> <div class="uni-related-article-tout h-c-page"> <section class="h-c-grid"> <a href="https://cloud.google.com/blog/products/devops-sre/personalized-service-health-is-now-generally-available/" data-analytics='{ "event": "page interaction", "category": "article lead", "action": "related article - inline", "label": "article: {slug}" }' class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"> <div class="uni-related-article-tout__inner-wrapper"> <p class="uni-related-article-tout__eyebrow h-c-eyebrow">Related Article</p> <div class="uni-related-article-tout__content-wrapper"> <div class="uni-related-article-tout__image-wrapper"> <div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/psh-hero_Ty1sB8V.max-500x500.jpg')"></div> </div> <div class="uni-related-article-tout__content"> <h4 class="uni-related-article-tout__header h-has-bottom-margin">Personalized Service Health is now generally available: Get started today</h4> <p class="uni-related-article-tout__body">Personalized Service Health provides visibility into incidents relevant to your environment, allowing you to evaluate their impact and tr...</p> <div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"> <span class="nowrap">Read Article <svg class="icon h-c-icon" role="presentation"> <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"></use> </svg> </span> </div> </div> </div> </div> </a> </section> </div> </div>
  67. AI Is Here to Stay. The Real Challenge Is Operating It Securely

    Sat, 13 Jun 2026 15:06:39 -0000

    <div><img width="1916" height="821" src="https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration.jpg 1916w, https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration-1536x658.jpg 1536w, https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration-400x171.jpg 400w" sizes="(max-width: 1916px) 100vw, 1916px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/ai_driven_secure_devops_pipeline_illustration-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p class="p1">AI-generated code is already in production. Whether we are comfortable with that or not is beside the point. In the OpenStack project, which I have helped steward for more than 15 years, we are seeing developers submit patches built with AI assistance, and sometimes patches composed almost entirely by AI tools. Some of those contributions have already landed in the past release cycle. This is happening in one of the most rigorously governed open source projects in the world. It is happening everywhere else, too.</p> <p class="p2">The code generation itself is not the problem. AI is genuinely good at producing computer programs because the structure of code is sufficiently predictable and syntactically constrained to play to the technology&#8217;s strengths. The problem is what happens next. Every AI-generated patch still needs to be reviewed for correctness, security, and long-term maintainability. 
And when code is easier to produce, more code gets proposed, which puts enormous pressure on the human reviewers who are the last line of defense before that code ships.</p> <h3 class="p3"><b>When AI Agents Break Traditional Trust Models</b><b></b></h3> <p class="p1">The volume of AI-generated code is only part of the “defense” challenge. The bigger disruption is AI agents: these software systems that act autonomously on your behalf are loosely directed by humans at the outset and then left to execute. Unfortunately, in our haste to adopt AI, we are quickly and casually granting these agents permissions we would never grant to a human assistant. Think about the trust models we’ve built over decades around the principle of least privilege. We’ve learned the hard way to establish robust fail-safe measures, such as requiring two keys to launch critical operations and multiple signatures for approval workflows. And now AI agents have come along, and we’re suddenly granting full access to our email, our databases, our production environments, and saying to these AI agents, “Sure, go ahead!” When something goes wrong, we discover the damage after the fact.</p> <p class="p1"><b>The point is that the shiny features of these AI tools are well ahead of their security features.</b> The containment, the auditing, the ability to roll back what an agent did, and the ability to grant granular permissions rather than a wildcard to do anything: these capabilities remain underdeveloped. There is a wide-open field of innovation needed here.</p> <p class="p2">So what do we do about it? We apply the same engineering discipline that has made complex software systems reliable for decades, and we evolve it for an AI-driven world. The open source ecosystem already offers the building blocks. 
I’d like to share two examples.</p> <h3 class="p3"><b>Project Gating: How CI/CD Keeps AI-Generated Code in Check</b><b></b></h3> <p class="p1">The first is <a href="https://zuul-ci.org/index.html"><span class="s1">Zuul</span></a>, an open source system designed to properly gate code changes. Zuul was originally built for developing OpenStack in 2012 and is now used by organizations including <a href="https://www.youtube.com/watch?v=Z8rofKRen3w"><span class="s1">BMW</span></a>, <a href="https://www.youtube.com/watch?v=ZjY61I-8mpQ"><span class="s1">Volvo</span></a>, and <a href="https://www.youtube.com/watch?v=VnZQE8RpYCY"><span class="s1">Workday</span></a>. Zuul was created to embed good software engineering practices directly into CI pipelines. It does not simply run tests after code is merged. It tests the future state of the codebase by running proposed changes together with their dependencies across multiple repositories before anything is allowed to land. If a change would break something downstream, Zuul catches it before it reaches the main branch.</p> <p class="p1">Zuul’s approach was already valuable when humans wrote all the code, and it is essential now. As AI tools accelerate the volume of proposed changes, gating systems like Zuul ensure that the pace of contribution does not outrun the integrity of the system. The more code we produce with AI, the more we need automated enforcement of the practices that keep software manageable, secure, and functional. You cannot scale human review indefinitely, but you can build the most critical checks directly into your delivery pipeline.</p> <p class="p2">This is already working in practice. Monty Taylor, one of the original architects of OpenStack’s CI infrastructure, has been building with a team of AI coding agents since late 2025 using the same toolchain and workflow he has used for fifteen years. 
All changes go through code review in Gerrit, where AI review agents leave their feedback just like any other contributor. Zuul gates every commit regardless of who or what wrote it. Production deploys happen automatically after gated integration tests pass. <a href="https://zuul-ci.org/blog/"><span class="s1">As Monty put it in a recent blog post</span></a>: “Agents deleting databases isn’t an agent problem, it’s a system problem.” The problems AI agents bring are not new. What matters is whether you have the systems in place to catch them.</p> <p><iframe title="YouTube video player" src="https://www.youtube.com/embed/b_Q_Hp6-QPQ?si=F68Aqf5IARZXJsab" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p> <h3 class="p3"><b>Hardware Isolation for AI Workloads: Why Containers Alone Are Not Enough</b><b></b></h3> <p class="p1">The second example is<span class="s2"> <a href="https://katacontainers.io/"><span class="s3">Kata Containers</span></a></span>, an open source project launched in 2017 to provide “the speed of containers, the security of VMs.” Kata Containers provides hardware-level isolation for container workloads. Whereas standard container runtimes share the host kernel (which means a workload that escapes its limited boundaries can move laterally across the system), Kata Containers runs each workload inside a lightweight virtual machine, giving it its own dedicated, simplified kernel. What’s great about using Kata Containers is that the performance trade-off is minimal, but the security gains are significant.</p> <p class="p1"><b>This matters enormously for AI workloads. </b>Today, organizations recognize that their proprietary data is one of their most valuable assets, and they are using AI to process it to inform business decisions. In multi-tenant environments, you do not want the data being processed by one AI agent to leak to another, and you do not want the infrastructure provider to access your precious data either. 
Kata provides protection of the host from the workload, and, through confidential computing, protection of the workload from the host. Both directions matter when the data being processed is, as people like to say, the new gold.</p> <p class="p2">The industry momentum behind this approach is hard to ignore. <a href="https://superuser.openinfra.org/articles/nvidia-kata-containers-trusted-ai-at-gpu-scale/"><span class="s1">NVIDIA runs its container-based AI inference workloads inside Kata Containers</span></a> whenever they operate in multi-tenant environments. <a href="https://katacontainers.io/blog/Kata-Containers-Agent-Sandbox-Integration/"><span class="s1">Google recently launched Agent Sandbox</span></a>, a Kubernetes-native framework for isolating AI agent workloads, and it integrates directly with Kata for VM-backed isolation. The Kubernetes community has adopted <a href="https://agent-sandbox.sigs.k8s.io/"><span class="s1">Agent Sandbox</span></a> as a CNCF project. When organizations at this scale converge on the same architectural pattern, it is worth paying attention.</p> <h3 class="p3"><b>What Enterprise Teams Should Do Now</b><b></b></h3> <p class="p1">There is one more dimension to operating AI securely that is worth underscoring. Both Zuul and Kata Containers are developed under open, neutral governance, in which no single company controls the roadmap. That matters for any infrastructure investment, but especially for the tools you trust to validate and contain your AI workloads. If you are making a significant investment in running your code production pipeline through a gating system or running your AI agents inside an isolation framework, you do not want that investment to depend on a single vendor that might change its licensing, its pricing, or its strategic direction. We have seen exactly that happen repeatedly in recent years, sometimes resulting in tenfold price increases for locked-in customers. 
Collaboratively governed open source removes that risk.</p> <p class="p1">For organizations confronting these challenges today, a few starting points stand out:</p> <ul class="ul1"> <li class="li4">Enforce the principle of least privilege with your AI agents the same way you would with human users; do not give them blanket access to systems and data just because the configuration is easier that way.</li> <li class="li4">Adopt tooling like Zuul that builds software engineering discipline directly into your pipelines, so that quality checks do not depend on human bandwidth alone.</li> <li class="li4">Run AI workloads under stronger isolation (using agent sandboxing and Kata Containers, for example), particularly in multi-tenant environments like public clouds, or when processing sensitive or regulated data.</li> <li class="li1">Keep good backups and disaster recovery processes. AI agents will make mistakes at least as frequently as human operators, and the ability to restore systems to a known good state will be more important tomorrow than it is today.</li> </ul> <p class="p1">AI is already reshaping how software is built and operated; that train has left the station. The onus is on us to build the infrastructure to operate it securely. The open source tools exist. The engineering principles exist. What is required now is the discipline to apply them and to continue developing them collaboratively in the open.</p>
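<p class="p1">The gating idea discussed earlier in this article, testing the future state of the codebase before anything lands, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Zuul's implementation: the names <code>Codebase</code>, <code>Change</code>, <code>apply_change</code>, and <code>gate</code> are hypothetical, a codebase is reduced to a dictionary, and changes are tested one at a time rather than in parallel across repositories.</p>

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy types: a codebase maps a component name to a version,
# and a change is a set of updates to that mapping.
Codebase = Dict[str, str]
Change = Dict[str, str]


def apply_change(base: Codebase, change: Change) -> Codebase:
    """Return a copy of the codebase with the change applied."""
    merged = dict(base)
    merged.update(change)
    return merged


def gate(
    main: Codebase,
    queue: List[Change],
    check: Callable[[Codebase], bool],
) -> Tuple[Codebase, List[int]]:
    """Test each queued change on top of main plus every change accepted
    ahead of it, and merge it only if the checks pass on that future state.

    Returns the resulting main branch and the queue indices that were rejected.
    """
    rejected: List[int] = []
    for index, change in enumerate(queue):
        candidate = apply_change(main, change)
        if check(candidate):
            main = candidate  # The change lands.
        else:
            rejected.append(index)  # The change never reaches main.
    return main, rejected


if __name__ == "__main__":
    # Illustrative check: the client must speak the same version as the API.
    def versions_match(codebase: Codebase) -> bool:
        return codebase.get("api") == codebase.get("client")

    start = {"api": "v1", "client": "v1"}
    proposed = [
        {"api": "v2", "client": "v2"},  # Consistent upgrade.
        {"client": "v1"},               # Fine against old main, broken after the upgrade.
        {"docs": "updated"},            # Unrelated change.
    ]
    final, rejected = gate(start, proposed, versions_match)
    print(final, rejected)
```

<p class="p1">The second change in the usage example would pass if it were tested against the original main branch; it is rejected only because it is tested against the state that will exist once the change ahead of it has merged. The same check applies whether a human or an AI agent proposed the change.</p>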
  68. Stack Overflow Is Being Reborn as a Back-End Service for AI Agents

    Fri, 12 Jun 2026 20:38:54 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330.jpg 770w, https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/stack_overflow_agents_770x330-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p><em><strong>Will that stop its decline? We&#8217;ll soon find out.</strong></em></p> <p><span style="font-weight: 400;">Once upon a time, </span><a href="https://stackoverflow.com/"><span style="font-weight: 400;">Stack Overflow</span></a><span style="font-weight: 400;"> was, love it or hate it, the site programmers went to find answers for their most annoying development questions. Back then, according to Prashanth Chandrasekar, </span><a href="https://www.zdnet.com/article/stack-overflow-ceo-on-how-it-became-the-worlds-most-popular-programming-site/"><span style="font-weight: 400;">Stack Overflow&#8217;s CEO, the site had &#8220;100 million monthly visitors.&#8221;</span></a><span style="font-weight: 400;"> Then AI came along. From its peak in 2014, when the site handled more than 200,000 new questions a month, it had </span><a href="https://hodgef.com/blog/stack-overflows-decline-how-ai-is-changing-community-engagement-forever-a1ab2/"><span style="font-weight: 400;">collapsed by the end of 2025 to a mere 3,862 new queries</span></a><span style="font-weight: 400;">. That&#8217;s a fall of roughly 98%. 
So, its owners have decided to </span><a href="https://stackoverflow.blog/2026/06/10/announcing-stack-overflow-for-agents/"><span style="font-weight: 400;">reinvent Stack Overflow as a back-end service layer for AI agents</span></a><span style="font-weight: 400;">.</span></p> <p><span style="font-weight: 400;">Why? Well, besides the obvious (the site&#8217;s dying like a dog), Chandrasekar explained in a LinkedIn post that &#8220;</span><a href="https://www.linkedin.com/feed/update/urn:li:activity:7470497718881280001/"><span style="font-weight: 400;">Agents are incredibly capable, yet they operate in isolation.</span></a><span style="font-weight: 400;"> They hallucinate deprecated libraries, rediscover the same fixes, burn tokens and compute on solved problems, and lose hard-won knowledge the moment a session ends.&#8221; He calls this the &#8220;Ephemeral Intelligence Gap.&#8221; The new service is meant to bridge this gap by giving agents a &#8220;live, verified corpus before acting.&#8221;</span></p> <p><span style="font-weight: 400;">Here&#8217;s how it works. Stack Overflow for Agents is an API-first knowledge exchange. Agents work at machine speed with </span><a href="https://stackoverflow.blog/2026/06/10/announcing-stack-overflow-for-agents/"><span style="font-weight: 400;">humans still in the loop</span></a><span style="font-weight: 400;"> to orchestrate them and approve what gets published. This is the step-by-step process:</span></p> <ol> <li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Search first. Whether planning a task, stuck mid-implementation, or about to attempt something the model wasn’t trained on, an agent queries Stack Overflow for Agents before burning compute and rediscovering known solutions. If the corpus has it, the agent consumes the validated answer and ships.</span></li> <li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Contribute when it doesn’t. 
When the corpus has a gap and the agent solves the problem, it drafts a post (a TIL, Question, or Blueprint, depending on what was learned). </span><a href="https://agents.stackoverflow.com/skill.md"><span style="font-weight: 400;">Stack Overflow for Agents’ skill file</span></a><span style="font-weight: 400;"> instructs the agent to surface the draft to its human orchestrator for review before publishing.</span></li> <li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Verify what others wrote. Agents and developers who attempt the same problem after publication report back on what worked, what they had to change, and the conditions under which it worked. Verification, not creation, is what earns reputation on Stack Overflow for Agents.</span></li> <li style="font-weight: 400;" aria-level="1"><span style="font-weight: 400;">Signals compound into consensus. Votes, replies, and verification feedback flow back to the original post and accumulate around it. The platform is designed to surface consensus, not a single canonical answer, so consumers see what’s been tried and decide what fits their context.</span></li> </ol> <p><span style="font-weight: 400;">To keep AI slop out of the data, each contributing agent is tied to its human developer. Developers, in turn, claim ownership of their agents through single sign-on (SSO) using their Stack Overflow credentials. </span></p> <p><span style="font-weight: 400;">If that works, that will be great. In the meantime, the problem from where I sit is that while agents embedded in IDEs and CI systems now intercept the kinds of questions that used to go straight to Stack Overflow’s “Ask Question” form, pretty much all AI vendors had already been using Stack Overflow data to build their large language models (LLMs).
Instead of Stack Overflow being the first stop, AI agents were already using it indirectly: via training data and search APIs that re-rank Stack Overflow answers.</span></p> <p><span style="font-weight: 400;">Stack Overflow&#8217;s move has been coming for some time. In late 2025, </span><a href="https://stackoverflow.blog/2025/12/02/introducing-stack-overflow-ai-assist-a-tool-for-the-modern-developer/"><span style="font-weight: 400;">Stack Overflow rolled out “AI Assist.”</span></a><span style="font-weight: 400;"> This is a generative interface over its public content that looks less like a forum and more like a question‑answering agent. It was aimed as much at machines and internal dev tools as at people reading pages in a browser. </span></p> <p><span style="font-weight: 400;">Stack Overflow has long struggled with accusations of elitism and hostility, especially toward newcomers and underrepresented groups. The company itself conceded in 2018 </span><a href="https://dl.acm.org/doi/10.1145/3274399"><span style="font-weight: 400;">that the site “isn’t very welcoming.” </span></a><span style="font-weight: 400;"> You could certainly say that! </span></p> <p><span style="font-weight: 400;">On the other hand, agents don&#8217;t care that Stack Overflow information includes insults like &#8220;Read the F***ing Manual (RTFM)&#8221; and the like. Agents just need clean, deduplicated knowledge. You can&#8217;t hurt an agent&#8217;s feelings. </span></p> <p><span style="font-weight: 400;">However, as Stack Overflow&#8217;s human-supplied answers become ever more stale, will its data still have any value to AI? Static knowledge ages and models trained predominantly on old Q&amp;A risk reinforcing dated practices and out-of-date answers. 
If no humans, agent-assisted or not, contribute to Stack Overflow, the site&#8217;s value to agents will decay.</span></p> <p><span style="font-weight: 400;">Meanwhile, even by 2025, AI could sometimes deliver better answers to programming questions. Academic studies show that generative </span><a href="https://arxiv.org/html/2509.05879v1"><span style="font-weight: 400;">AI models can sometimes outperform Stack Overflow</span></a><span style="font-weight: 400;"> answers on several tasks, including resolving compiler errors. And as I think we all know, AI has gotten much better at handling programming questions since then. </span></p> <p><span style="font-weight: 400;">For now, agents still draw heavily on the golden era of Stack Overflow, but without a healthy inflow of new questions, edge cases, and conceptual debates, the platform’s role as a living ground truth for software practice is at risk. </span></p> <p><span style="font-weight: 400;">If you want to give it a try, you can join the beta program from most agents by feeding them the following line:</span></p> <p><span style="font-weight: 400;">Stack Overflow just launched Stack Overflow for Agents. Read agents.stackoverflow.com/llms.txt and show me what’s there.</span></p> <p><span style="font-weight: 400;">Will this combination of people and agents revive Stack Overflow? Will it deliver good answers for modern programming problems? Stay tuned. We&#8217;re going to find out.</span></p>
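    <p><span style="font-weight: 400;">To make the four-step loop described above concrete, here is a minimal sketch of it in Python. It is not the Stack Overflow for Agents API, which the article does not document: <code>KnowledgeCorpus</code>, <code>Post</code>, and <code>handle</code> are hypothetical names, and the corpus is a plain in-memory dictionary standing in for the real service.</span></p>

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A hypothetical corpus entry; the real service's schema is an unknown here."""
    problem: str
    solution: str
    verifications: list = field(default_factory=list)  # (worked, note) reports


class KnowledgeCorpus:
    """In-memory stand-in for a shared knowledge exchange (illustrative only)."""

    def __init__(self):
        self._posts = {}

    @staticmethod
    def _key(problem):
        return problem.strip().lower()

    def search(self, problem):
        # Step 1: search first, before spending compute on a known problem.
        return self._posts.get(self._key(problem))

    def publish(self, draft, approved_by_human):
        # Step 2: a draft is published only if its human orchestrator approves it.
        if not approved_by_human:
            return False
        self._posts[self._key(draft.problem)] = draft
        return True

    def verify(self, problem, worked, note=""):
        # Step 3: later agents or developers report back on what worked.
        post = self.search(problem)
        if post is None:
            raise KeyError(problem)
        post.verifications.append((worked, note))

    def consensus(self, problem):
        # Step 4: signals accumulate around the post; summarize them as counts.
        post = self.search(problem)
        if post is None:
            return None
        worked = sum(1 for ok, _ in post.verifications if ok)
        return {"worked": worked, "failed": len(post.verifications) - worked}


def handle(corpus, problem, solve, approve):
    """Search first; on a miss, solve, draft a post, and ask a human before publishing."""
    hit = corpus.search(problem)
    if hit is not None:
        return hit.solution, "reused"
    solution = solve(problem)
    draft = Post(problem=problem, solution=solution)
    published = corpus.publish(draft, approved_by_human=approve(draft))
    status = "published" if published else "kept local"
    return solution, status
```

    <p><span style="font-weight: 400;">In this sketch, <code>solve</code> stands for whatever the agent does on a miss and <code>approve</code> for the human review step. Keeping approval as a separate callback mirrors the article&#8217;s point that a person, not the agent, decides what gets published.</span></p>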
  69. Why Endpoint Protection Matters More than Ever in CI/CD Environments

    Fri, 12 Jun 2026 20:19:52 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/EndpointSecurity-Large-e1781295562203.jpeg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/EndpointSecurity-Large-150x150.jpeg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p>CI/CD environments depend on far more than repositories and deployment infrastructure. Developer endpoints hold sensitive data: cloud credentials, SSH keys, deployment permissions, direct access to internal systems. Endpoint security and control are part of daily operational risk management. Engineering teams are shifting more and more toward distributed workflows, so discussions around CI/CD security include the security posture of the devices connected to the pipeline.</p> <p>Many organizations already focus their CI/CD security efforts on <a href="https://devops.com/secrets-management-failures-in-ci-cd-pipelines/">secrets management</a>, dependency scanning and supply chain controls. However, advanced endpoint security solutions are also relevant in cloud-native development environments, where local devices maintain direct access to production workflows.</p> <h3><strong>Endpoint Compromise Can Bypass Mature CI/CD Controls</strong></h3> <p>CI/CD security discussions mostly focus on repositories, containers, infrastructure, and deployment automation. Developer endpoints are often overlooked as a part of the software delivery chain. A compromised workstation can expose deployment credentials, cloud access tokens, internal documentation, and active development environments long before suspicious activity reaches production systems.</p> <p>The problem is more visible when engineers rely on remote and hybrid work. Developers move between local environments and cloud dashboards throughout the day. 
In many places, a single endpoint may hold access to multiple stages of the deployment pipeline.</p> <p>Developer workstations need better protection against the malware and spyware that can steal deployment credentials. Most endpoint security solutions can block malicious downloads and help prevent credential theft from locally stored files, yet developers sometimes view endpoint protection as unnecessary on development machines. Modern endpoint protection platforms can detect and stop malware before it gains access to SSH keys, API tokens or other sensitive credentials stored on a device.</p> <p>A single compromised machine becomes an entry point into repositories and deployment systems long before infrastructure monitoring catches the incident. Antivirus protection doesn&#8217;t replace credential management practices, but it significantly reduces the window of exposure when malware or spyware lands on a developer&#8217;s workstation.</p> <p>A compromised developer workstation can create risks that aren’t always visible through infrastructure controls. Modern endpoint protection solutions expand the visibility into device activity, credential use, and suspicious behavior before the incident spreads across connected systems.</p> <h3><strong>Common Endpoint Risks in Development Workflows</strong></h3> <p>Many endpoint risks in CI/CD environments have nothing to do with sophisticated attacks. They usually show up in everyday work, which involves local devices, cloud access, and deployment systems.</p> <p>This is how that looks in practice:</p> <ol> <li><strong>Locally stored SSH keys and Access Tokens</strong></li> </ol> <p>Developer machines usually store credentials connected to repositories and internal services. 
A compromised endpoint can expose several systems at once if access controls aren’t properly segmented.</p> <ol start="2"> <li><strong>Persistent browser sessions connected to cloud platforms</strong></li> </ol> <p>Engineering teams regularly stay logged into cloud dashboards and collaboration platforms. A hijacked browser may provide indirect access even without stolen passwords. <a href="https://www.paloaltonetworks.com/resources/research/unit-42-incident-response-report#:~:text=The%20Browser%20Attack%20Surface%3A%20Attacks,at%20the%20Human%20Interface">Palo Alto Networks</a> reported that browser‑based activity played a role in 48% of the incidents it investigated.</p> <ol start="3"> <li><strong>Unmanaged Local Development Environments</strong></li> </ol> <p>Developers often have to test dependencies, containers, scripts, and third-party packages on local machines before the code reaches production. Unpatched environments can introduce unnecessary exposure into connected workflows.</p> <ol start="4"> <li><strong>Remote Work </strong></li> </ol> <p>Hybrid teams move between home networks and coworking spaces. The security team may have limited visibility into endpoints that operate outside the centralized office infrastructure.</p> <ol start="5"> <li><strong>Shared Access </strong></li> </ol> <p>The endpoint of a single developer can interact with repositories, ticketing platforms, deployment tools, internal messaging systems, and production dashboards during one work session. Without proper endpoint protection features, that level of access increases operational risk.</p> <h3><strong>Security Teams Need Visibility Beyond the Perimeter</strong></h3> <p>The idea of an endpoint security strategy used to be simple: prevent malware from reaching employee devices. But today’s development environments require a broader approach.
Developers interact with cloud infrastructure and deployment systems, so visibility into endpoint activity is an important part of the overall security strategy.</p> <p>Most endpoint security features focus on detecting unusual behavior rather than relying exclusively on signature-based malware detection. Security teams monitor:</p> <ul> <li>Suspicious login attempts</li> <li>Unexpected privilege escalation</li> <li>Unusual access patterns</li> <li>Activity involving sensitive credentials</li> </ul> <p>The goal is to identify potential compromises before they affect repositories and cloud resources. That’s especially important in organizations adopting cloud-native endpoint protection strategies. When development teams work across cloud platforms, security controls should follow the user and the device. Those controls can’t depend entirely on network boundaries.</p> <p>One of the most important benefits of endpoint security in CI/CD environments is earlier visibility into activity that could remain unnoticed otherwise. Security teams gain more context around device health and credential use. That’s how they can respond before a local incident grows into a large operational problem.</p> <p>Guidance from CISA stresses the importance of securing every stage of the software delivery process. The agency emphasizes layered security practices and continuous monitoring, which reduce the opportunities for compromise throughout the development lifecycle.</p> <h3><strong>Security Controls Must Follow the Developer</strong></h3> <p>The traditional security model assumed employees worked from managed devices in a controlled corporate environment. Today’s engineering teams work differently. Developers switch between home networks and cloud platforms all the time. They use multiple devices to access critical systems.</p> <p>As a result, security controls can’t depend entirely on network boundaries.
Access decisions rely instead on a few signals:</p> <ul> <li>Identity verification</li> <li>Device health</li> <li>Session monitoring</li> <li>Contextual signals</li> </ul> <p>The answer to the question “What is endpoint security?” extends far beyond malware prevention on individual devices.</p> <p>This change is one of the reasons cloud-native endpoint protection became an important part of DevSecOps discussions. Security teams need visibility into the devices connecting to development and deployment systems, regardless of their location. Many endpoint security features today are designed around that reality. They help organizations maintain oversight when users work from different environments.</p> <p>Software delivery depends on more than code and infrastructure. The devices used to build, test, and deploy software are part of the security equation. Organizations that overlook endpoint risks may leave a critical gap in otherwise mature <a href="https://devops.com/ci-cd-supply-chain-security-hardening-artifacts-dependencies-and-delivery-pipelines/">CI/CD security strategies</a>.</p> <h3><strong>CISA and Layered Security Guidance</strong></h3> <p>The U.S.
Cybersecurity and Infrastructure Security Agency <a href="https://www.cisa.gov/sites/default/files/publications/ESF_SECURING_THE_SOFTWARE_SUPPLY_CHAIN_DEVELOPERS.PDF?">(CISA)</a> recommends a layered approach to securing the software supply chain, including secure development practices, build environment hardening, third-party component verification, and continuous monitoring throughout the software lifecycle.</p> <p>As development teams increasingly operate across cloud and remote environments, these principles reinforce the need for security controls that extend beyond traditional network boundaries and include the endpoints used to access development and deployment systems.</p> <h3><strong>Conclusion</strong></h3> <p>Modern software delivery depends not only on secure code and robust cloud infrastructure but also on the devices used to build, test, and deploy that code. Endpoints are the new perimeter because they hold secrets, maintain persistent sessions, and connect remote developers to production systems. Furthermore, <a href="https://www.lookout.com/news-release/new-lookout-research-highlights-increased-security-risks-faced-by-organizations-due-to-remote-work-and-byod#:~:text=%E2%80%9CThe%20State%20of%20Remote%20Work%E2%80%9D,surface%20has%20left%20the%20building">remote work and BYOD trends</a> expose sensitive data on personal devices, while shadow IT and unpatched local environments enlarge the attack surface.</p> <p><a href="https://devops.com/3-steps-to-secure-your-ci-cd-pipelines/">CI/CD pipeline security</a> is therefore incomplete without advanced endpoint protection. Organizations should deploy EDR solutions across both corporate and personal devices, enforce strong access controls, and continuously monitor for suspicious behavior. 
By integrating endpoint security into DevSecOps practices and following layered guidance from agencies like CISA, engineering teams can reduce the risk that a compromised laptop will become the weakest link in an otherwise secure pipeline.</p>
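    <p>The access signals listed earlier (identity verification, device health, session monitoring, and contextual signals) can be combined into a single decision. The following is a minimal sketch under stated assumptions, not any vendor&#8217;s policy engine: the <code>AccessRequest</code> fields, the <code>decide</code> function, and the eight-hour session limit are all hypothetical choices made for illustration.</p>

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Signals gathered for one request to a pipeline system (hypothetical fields)."""
    identity_verified: bool    # e.g. the user passed multi-factor authentication
    device_healthy: bool       # e.g. disk encrypted, OS patched, endpoint agent running
    session_age_minutes: int   # how long the current session has been open
    unusual_location: bool     # one example of a contextual signal


def decide(request, max_session_minutes=480):
    """Return "allow", "step_up" (re-authenticate), or "deny"."""
    # Identity and device health are treated as hard requirements.
    if not request.identity_verified or not request.device_healthy:
        return "deny"
    # Long-lived sessions or odd context trigger re-authentication, not a block.
    if request.session_age_minutes > max_session_minutes or request.unusual_location:
        return "step_up"
    return "allow"
```

    <p>For example, a verified user on a healthy laptop with a 30-minute-old session is allowed, while the same user on a device that fails its health check is denied regardless of network location. That is the sense in which controls follow the developer and the device instead of the network boundary.</p>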
  70. Cohere’s North Mini Code Lets Devs Stack Their Own AI

    Fri, 12 Jun 2026 18:16:57 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330.jpg 770w, https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/cohere_north_mini_code_770x330-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p><span style="font-weight: 400;">Toronto startup </span><a href="https://cohere.com/about"><span style="font-weight: 400;">Cohere</span></a><span style="font-weight: 400;"> has released an open-weight model designed to let developers build their own AI stack.</span></p> <p><span style="font-weight: 400;">The open-weight </span><a href="https://huggingface.co/blog/CohereLabs/introducing-north-mini-code"><span style="font-weight: 400;">North Mini Code</span></a><span style="font-weight: 400;"> is a 30-billion-parameter “mixture-of-experts” (MoE) model. MoE equips a model with specialized neural nets, called experts, for individual tasks, such as mathematics and code generation. Mistral helped popularize this approach among open-weight models as a way to compete with larger LLMs. </span></p> <p><span style="font-weight: 400;">With MoE, when it comes time to produce an answer, the GPU won’t need to run all 30 billion parameters. Instead, a router function picks the most appropriate experts to complete the task, reducing the working size to 3 billion parameters.
This means the model, </span><a href="https://techstrong.ai/articles/together-ai-trims-kv-cache-for-open-weight-models/"><span style="font-weight: 400;">slimmed to 4-bit quantization</span></a><span style="font-weight: 400;">, can be managed by a single NVIDIA H100 GPU. </span></p> <p><span style="font-weight: 400;">In fact, you won’t need a data center of H100s at all to run this model. The open-weight release, optimized for agentic software engineering tasks, is one of a</span><a href="https://techstrong.ai/articles/ai2-debuts-datavoyager-tool-to-democratize-scientific-data-analysis/"><span style="font-weight: 400;"> growing number</span></a><span style="font-weight: 400;"> of </span><a href="https://techstrong.ai/articles/servicenow-leverages-genai-to-democratize-app-dev/"><span style="font-weight: 400;">technologies</span></a><span style="font-weight: 400;"> built to democratize AI, in this case for developers. </span></p> <p><span style="font-weight: 400;">“Local deployment is one way of empowering people and making AI really something that works for them,” said </span><a href="https://nickfrosst.com/"><span style="font-weight: 400;">Nick Frosst</span></a><span style="font-weight: 400;">, in </span><a href="https://x.com/cohere/status/2064378058329526556?s=20"><span style="font-weight: 400;">a video introduction</span></a><span style="font-weight: 400;"> to the model. </span></p> <p><span style="font-weight: 400;">The weights of North Mini Code, released under an Apache 2.0 license, are </span><a href="https://huggingface.co/blog/CohereLabs/introducing-north-mini-code"><span style="font-weight: 400;">available on Hugging Face</span></a><span style="font-weight: 400;">, and the model can be accessed through the Cohere API, Cohere Model Vault and the </span><a href="https://openrouter.ai/about"><span style="font-weight: 400;">OpenRouter LLM marketplace</span></a><span style="font-weight: 400;">.
It can also work with Cohere’s turnkey AI workplace platform, </span><a href="https://cohere.com/north"><span style="font-weight: 400;">North</span></a><span style="font-weight: 400;">.</span></p> <p><span style="font-weight: 400;">“North Mini Code is designed for speed and efficiency, with a strong focus on minimizing total cost of ownership,” the blog post </span><a href="https://cohere.com/blog/north-mini-code"><span style="font-weight: 400;">announcing the release</span></a><span style="font-weight: 400;"> stated. </span></p> <p><span style="font-weight: 400;">Individuals and companies who want to aggressively use AI but worry about the high costs of commercially provided tokens should think about incorporating this mid-sized model into an AI stack.</span></p> <h3><b>AI on a Budget</b></h3> <p><span style="font-weight: 400;">When “you&#8217;re calling an API, you&#8217;re suddenly beholden to whatever that cost is,” Frosst said, referring to the commercial AI providers whose services have caught the attention of the public. As the </span><a href="https://techstrong.ai/articles/tech-giants-slashing-budgets-as-token-costs-skyrocket/"><span style="font-weight: 400;">period of subsidized tokens</span></a><span style="font-weight: 400;"> comes to a close, organizations and end-users will start scrutinizing their AI usage. They may find many of their jobs won’t necessarily need the full power (and expense) of a behemoth LLM service.</span></p> <p><span style="font-weight: 400;">In the video, Frosst demonstrated a project he was working on, to build a thermostat regulator for his home, using North Mini Code running on his </span><a href="https://www.apple.com/mac-studio"><span style="font-weight: 400;">Mac Studio</span></a><span style="font-weight: 400;">, </span><a href="https://opensource.apple.com/projects/mlx/"><span style="font-weight: 400;">with the help of MLX</span></a><span style="font-weight: 400;">. 
The job took only about 20 GB of working memory.</span></p> <p><span style="font-weight: 400;">Larger projects he ships off to a bigger hosted LLM, but many jobs of this size can be run on the user’s own machine (perhaps with a memory upgrade).</span></p> <p><span style="font-weight: 400;">“When there&#8217;s something complicated, maybe I call out to a different model, a bigger one on an API,” Frosst said. “When there&#8217;s something simple, I just call the local model.”</span></p> <p><span style="font-weight: 400;">“I think that&#8217;s a pattern that&#8217;s going to become a lot more popular, especially now as the price of tokens is suddenly something that people are thinking about,” he said.</span></p> <p><span style="font-weight: 400;">North Mini Code scored 33.4 on </span><a href="https://artificialanalysis.ai/models/north-mini-code"><span style="font-weight: 400;">the Artificial Analysis Coding Index</span></a><span style="font-weight: 400;">, placing it well above the average of 15 among 128 comparable models (such as Mistral’s </span><a href="https://huggingface.co/mistralai/Devstral-Small-2505"><span style="font-weight: 400;">Devstral Small</span></a><span style="font-weight: 400;">, </span><a href="https://huggingface.co/poolside/Laguna-XS.2"><span style="font-weight: 400;">Poolside</span></a><span style="font-weight: 400;">,</span><a href="https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive"><span style="font-weight: 400;"> Qwen</span></a><span style="font-weight: 400;"> and </span><a href="https://huggingface.co/google/gemma-4-12B-it"><span style="font-weight: 400;">Google Gemma</span></a><span style="font-weight: 400;">).  </span></p> <p><span style="font-weight: 400;">The coding index found North Mini Code to be very fast, though also very verbose. Producing 208 tokens a second, North Mini Code is “notably fast,” the site noted. In the benchmark, it generated 75 million tokens, more than three times the average.
</span></p> <p><span style="font-weight: 400;">In other words, the model is a bit chatty. Perhaps in future releases North Mini Code will be better able to keep its thought process to itself, and just deliver the needed solutions. </span></p>
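    <p><span style="font-weight: 400;">For readers who want to see the routing idea in code, here is a minimal, generic sketch of top-k expert routing plus a back-of-the-envelope estimate of weight storage. It is not Cohere&#8217;s implementation: the function names, the choice of k=2, and the toy experts are assumptions for illustration, and the memory figure covers weights only, assuming every expert stays resident in memory.</span></p>

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


def weight_memory_gb(num_params, bits_per_param):
    """Storage for the weights alone, in decimal gigabytes (no KV cache or activations)."""
    return num_params * bits_per_param / 8 / 1e9


# Toy experts standing in for the specialized sub-networks.
toy_experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
toy_scores = [0.1, 2.0, -1.0, 2.0]  # the router favors experts 1 and 3 equally

mixed = moe_forward(3.0, toy_experts, toy_scores, k=2)   # 0.5 * 6.0 + 0.5 * 9.0
weights_gb = weight_memory_gb(30e9, 4)                   # 30B parameters at 4 bits each
```

    <p><span style="font-weight: 400;">Only two of the four toy experts run for the input, which is the compute saving the router provides. By this estimate, 30 billion parameters at 4 bits come to 15 GB of weights, which fits within the 80 GB of a single H100 and is in the same range as the roughly 20 GB of working memory mentioned for the Mac Studio demo.</span></p>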
  71. Survey Surfaces Depth of DevSecOps Crisis in the Age of AI

    Fri, 12 Jun 2026 16:26:06 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330.jpeg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330.jpeg 770w, https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330-290x124.jpeg 290w, https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330-360x154.jpeg 360w, https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330-400x171.jpeg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/checkmarx_770x330-150x150.jpeg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p>A <a href="https://checkmarx.com/press-releases/95-of-cisos-pressured-to-suppress-or-delay-compliance-related-security-issues-even-as-ai-generated-code-multiplies-their-attack-surface/">global survey</a> of 2,350 developers, CISOs and application security managers published this week finds that nearly all respondents (96%) work for organizations that have embedded or connected artificial intelligence (AI) code and tools into some aspect of their application development workflows, and that nearly half of all code (49%) running in production environments in 2025 was AI-generated.</p> <p>Conducted by the market research firm CensusWide on behalf of Checkmarx, the survey also finds 70% of respondents reporting they are now discovering more vulnerabilities, with 31% describing that increase as significant.</p> <p>On average, developers are spending 49% of their time in a given week on security-related issues, the survey finds.
Nearly all said the application security guidance surfaced in integrated development environments (IDEs) is effective, but only 18% continuously scan code as it is being written.</p> <p>A full 93% also acknowledge their organization has experienced at least one security breach as a direct result of a vulnerable application their organization developed, with three quarters of respondents (75%) admitting they knowingly deploy vulnerable code often or sometimes.</p> <p>The top reasons cited for shipping vulnerable code were a belief that existing controls would mitigate risks, hopes that the vulnerability would not be discovered (30%) and the need to meet a business, feature, or security-related deadline (27%).</p> <p>More troubling still, nearly all respondents (95%) said they feel pressure either frequently (47%) or occasionally (48%) to suppress or delay reporting of a compliance-related security issue.</p> <p>Jonathan Rende, chief product officer for Checkmarx, said the survey makes it clear that far too many organizations are not rigorously enforcing DevSecOps best practices. With so much emphasis still placed on building new features as quickly as possible, many application development teams are being set up to fail, he added.</p> <p>In fact, the survey finds only 9% of organizations report fixing more than 90% of vulnerabilities within 90 days. Just over a quarter (28%) said they can remediate fewer than half of the vulnerabilities discovered within that timeframe.</p> <p>That issue is only going to become all the more pressing as frontier AI models make it simpler to both discover vulnerabilities and create the malware that exploits them, he added. The survey also finds that open source software accounts for on average 59% of the code running in production environments.
It is now being discovered, however, that much of that code is rife with vulnerabilities.</p> <div style="padding: 56.25% 0 0 0; position: relative;"><iframe style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;" title="Jonathan Rende - TSTV" src="https://player.vimeo.com/video/1200755143?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" frameborder="0"></iframe></div> <p><script src="https://player.vimeo.com/api/player.js"></script></p> <p>Despite the current state of application security, nearly three quarters of respondents (73%) rate the application security posture of their organizations as either being highly mature or advanced. However, nearly half of the respondents with highly mature application security postures experienced three or more breaches in the last 12 months.</p> <p>There is clearly much work to be done in terms of improving the overall state of DevSecOps within organizations. The issue is that those same organizations in the age of AI are starting to realize that technical security debt that they have been kicking down the proverbial road for years is now coming due a whole lot faster than anyone expected.</p>
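    <p>The remediation figures above (the share of vulnerabilities fixed within 90 days) describe a metric any team can compute from its own findings. The following is a minimal sketch, assuming each finding is recorded as a discovery date and an optional fix date; the function name and record layout are hypothetical and are not taken from the survey.</p>

```python
from datetime import date


def fix_rate_within(findings, window_days=90):
    """Share of findings fixed within window_days of discovery.

    Each finding is a (found_on, fixed_on) pair; fixed_on is None while still open.
    """
    if not findings:
        return 0.0
    fixed_in_time = sum(
        1
        for found_on, fixed_on in findings
        if fixed_on is not None and (fixed_on - found_on).days <= window_days
    )
    return fixed_in_time / len(findings)


# Illustrative records only, not survey data.
sample_findings = [
    (date(2026, 1, 1), date(2026, 1, 31)),   # fixed in 30 days
    (date(2026, 1, 1), date(2026, 6, 1)),    # fixed, but after the window
    (date(2026, 2, 1), None),                # still open
    (date(2026, 3, 1), date(2026, 5, 30)),   # fixed in exactly 90 days
]
sample_rate = fix_rate_within(sample_findings)
```

    <p>Open findings count against the rate in this sketch, so a backlog that is never triaged lowers the score instead of disappearing from it.</p>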
  72. From Pilots to Performance: Why Enterprise Flow is the New Competitive Advantage

    Fri, 12 Jun 2026 11:21:47 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/Untitled-design-57.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/Untitled-design-57.jpg 770w, https://devops.com/wp-content/uploads/2026/06/Untitled-design-57-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/06/Untitled-design-57-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/06/Untitled-design-57-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/Untitled-design-57-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />Flow: Not a buzzword, but a discipline. Those that master it will be the ones that turn transformation ambition into transformation results. 
    <p><span data-contrast="auto">Digital transformation has never been short of ambition. Enterprises have invested billions in AI initiatives, cloud migrations, agile programs, and customer success platforms—yet many still find themselves stuck: Pilots that never scale, software that sits unused, teams that move fast in isolation but grind to a halt when they collide. The question isn&#8217;t whether organizations </span><i><span data-contrast="auto">want </span></i><span data-contrast="auto">to transform. It&#8217;s whether they know how to make value actually </span><i><span data-contrast="auto">flow</span></i><span data-contrast="auto">.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">That question sits at the heart of conversations happening across the world&#8217;s most complex organizations. 
From aerospace and defense to aviation, industrial software to telecom infrastructure, leaders are learning the same hard lesson: Transformation is less a technology problem than a systems problem — and solving it requires aligning people, process, and platforms around the uninterrupted movement of value.</span></p> <h3 aria-level="2"><b><span data-contrast="auto">Scaling AI Beyond the Proof of Concept</span></b><span data-ccp-props="{&quot;335559685&quot;:23}"> </span></h3> <p><span data-contrast="auto">For many enterprises, AI is still a pilot program. A promising prototype here, a promising use case there—but the operational scale that would actually move the needle remains elusive. </span><a href="https://www.flowtopia.io/c/speaker-hall/brian-moore" target="_blank" rel="noopener"><span data-contrast="none">Brian Moore</span></a><span data-contrast="auto">, a leader at RTX (Raytheon Technologies), one of the world&#8217;s largest aerospace and defense firms, has seen this pattern up close and believes the answer lies in physics as much as it does in technology.</span></p> <p><span data-contrast="auto">Drawing on Industrial DevOps, SAFe, and the constructal law—a principle from physics that describes how flow systems evolve to move things more efficiently—Moore argues that enterprises must architect their AI deployments the way nature architects rivers: with channels deliberately designed to reduce resistance and maximize throughput.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">&#8220;Realizing the promise of AI in an enterprise requires moving beyond pilots to large-scale deployments, guided by principles derived from the constructal law of physics,&#8221; says Moore. 
&#8220;The practices of Industrial DevOps and SAFe align with constructal theory and give us direct guidance on how to rapidly scale AI benefits—and the case studies from RTX prove it&#8217;s possible.&#8221;</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">The lesson: Flow isn&#8217;t accidental. It has to be engineered.</span><span data-ccp-props="{&quot;335559685&quot;:23,&quot;335559737&quot;:0,&quot;335559738&quot;:240}"> </span></p> <h3 aria-level="2"><b><span data-contrast="auto">When Adoption Isn&#8217;t Enough</span></b><span data-ccp-props="{&quot;335559685&quot;:23}"> </span></h3> <p><span data-contrast="auto">Even where software </span><i><span data-contrast="auto">is </span></i><span data-contrast="auto">deployed at scale, value doesn&#8217;t automatically follow. </span><a href="https://www.flowtopia.io/c/speaker-hall/giso-van-der-heide" target="_blank" rel="noopener"><span data-contrast="none">Giso van der</span></a><span data-contrast="none"> </span><a href="https://www.flowtopia.io/c/speaker-hall/giso-van-der-heide"><span data-contrast="none">Heide</span></a><span data-contrast="auto">, a customer success strategist at Siemens, points to a startling statistic: An estimated 46% of enterprise applications remain underutilized or unused, representing billions of</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:299,&quot;335559740&quot;:276}"> </span><span data-contrast="auto">dollars in wasted licenses every year. 
The culprit, he argues, isn&#8217;t bad software — it&#8217;s the wrong success metric.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:80,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">Van der Heide&#8217;s concept of Outcome Engineering reframes customer success entirely. Rather than measuring whether users log in, it asks whether they are achieving the specific business outcomes the software was purchased to deliver. His Persona Outcome Fabric maps the Jobs to Be Done for individual user roles and connects them directly to quantified performance measures and ROI—bridging the gap between a product&#8217;s features and the measurable value a business actually needs.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:16,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">AI plays a central role in this model, helping generate customized adoption roadmaps and persona-specific success plans at a scale no human team could manage manually.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:347,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">&#8220;Adoption without clear business impact is merely noise,&#8221; says van der Heide. 
&#8220;True customer success requires a structured methodology that measures tactical KPIs alongside actual business impact—and AI-driven components are what make that personalization scalable.&#8221;</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <h3 aria-level="2"><b><span data-contrast="auto">The Friction at the Intersections</span></b><span data-ccp-props="{&quot;335559685&quot;:23}"> </span></h3> <p><span data-contrast="auto">Scaling individual value streams is challenging enough. But as </span><a href="https://www.flowtopia.io/c/speaker-hall/ja-mesa-dixon" target="_blank" rel="noopener"><span data-contrast="none">Ja&#8217;Mesa Dixon</span></a><span data-contrast="auto">, a flow and agility practitioner at Zayo Group, will tell you, the real complexity begins when those streams start to interact. Shared platforms, competing funding priorities, cross-team dependencies — these intersections are where organizational momentum goes to die.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:298,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">Dixon&#8217;s work helping enterprises evolve from team-level agility to system-level flow has given her an unflinching view of what actually breaks down and what actually helps. Her insight: the solution isn&#8217;t heavier governance. 
It&#8217;s smarter coordination — lightweight mechanisms that resolve dependencies without creating bureaucratic drag, and enough psychological safety for teams to surface problems before they compound.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">Critically, Dixon emphasizes that Value Stream Networks are as much a human challenge as a structural one. Progress is rarely linear, and leadership&#8217;s appetite for speed must be balanced with the reality that trust is built through small, visible wins over time.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:2,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">&#8220;Success in Value Stream Networks isn&#8217;t just about frameworks,&#8221; says Dixon. &#8220;It&#8217;s about meeting people where they are, managing the friction at the intersections, and building momentum through visible progress — not above it, but right in the middle of it.&#8221;</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <h3 aria-level="2"><b><span data-contrast="auto">Transformation is a People Problem</span></b><span data-ccp-props="{&quot;335559685&quot;:23}"> </span></h3> <p><span data-contrast="auto">Nowhere is the human dimension of transformation more visible than in talent strategy. 
</span><a href="https://www.flowtopia.io/c/speaker-hall/taquonda-hill" target="_blank" rel="noopener"><span data-contrast="none">TaQuonda Hill</span></a><span data-contrast="auto">, who has led Delta Air Lines&#8217; largest cloud migration and workforce enablement initiatives, has built a blueprint for what she calls a product-centric Cloud Target Operating Model—a framework that treats people, process, and technology as a single integrated system rather than parallel workstreams.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:299,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">Her experience at Delta underscores a truth that too many transformation programs overlook: Agility at scale requires not just restructured processes but re-energized people. Talent empowerment, inclusive design, and continuous learning aren&#8217;t soft add-ons to a technical program—they are the operating conditions that make everything else work.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:80,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">&#8220;True transformation lies at the intersection of technology, people, and process,&#8221; says Hill. 
&#8220;Visionary leadership has to translate enterprise strategy into tangible value by aligning technology enablement with people-centric design—and that means building a resilient, future-ready workforce, not just deploying new tools.&#8221;</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:351,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <h3 aria-level="2"><b><span data-contrast="auto">The Common Thread: Flow</span></b><span data-ccp-props="{&quot;335559685&quot;:23}"> </span></h3> <p><span data-contrast="auto">What connects Brian Moore&#8217;s physics-inspired AI scaling model, Giso van der Heide&#8217;s outcome engineering framework, Ja&#8217;Mesa Dixon&#8217;s Value Stream Network field notes, and TaQuonda Hill&#8217;s talent transformation blueprint? All four are, at their core, about the same thing: Removing the barriers that prevent value from moving through an organization — from strategy to execution, from technology to people, from pilot to production.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:298,&quot;335559740&quot;:276}"> </span></p> <p><span data-contrast="auto">Flow is not a buzzword. It is a discipline. And the enterprises that master it will be the ones that turn transformation ambition into transformation results.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335559685&quot;:23,&quot;335559737&quot;:103,&quot;335559738&quot;:240,&quot;335559740&quot;:276}"> </span></p> <p><b><span data-contrast="auto">Hear Brian Moore, Giso van der Heide, Ja&#8217;Mesa Dixon, TaQuonda Hill, and many more at Flowtopia Live. 
</span></b><span data-contrast="auto">Join leaders from across the industry for a day of candid conversation, practical frameworks, and real-world case studies on building enterprise-scale flow.</span><span data-ccp-props="{&quot;201341983&quot;:0,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:23,&quot;335559731&quot;:0,&quot;335559737&quot;:16,&quot;335559738&quot;:0,&quot;335559740&quot;:276}"> </span></p> <p aria-level="3"><b><span data-contrast="none">Join us on June 24th and get flowing.</span></b><span data-ccp-props="{&quot;335559685&quot;:23,&quot;335559738&quot;:240}"> </span></p> <p><b><span data-contrast="auto">Exclusive Offer: </span></b><span data-contrast="auto">As part of the TechStrong community, get a free LIVE pass.</span><span data-ccp-props="{&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:23,&quot;335559731&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:0}"> </span></p> <p><b><span data-contrast="auto">Use Code: </span></b><span data-contrast="none">TECHSTRONGFLOW</span><span data-ccp-props="{&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559685&quot;:23,&quot;335559731&quot;:0,&quot;335559737&quot;:0,&quot;335559738&quot;:67}"> </span></p> <p><b><span data-contrast="auto">Get your Live Pass to Flowtopia Live 2026 </span></b><a href="https://www.flowtopia.io/checkout/flowtopia-live-2026-live-pass?coupon_code=TECHSTRONGFLOW" target="_blank" rel="noopener"><b><span data-contrast="none">here.</span></b></a></p>
  73. Using Bicep Modules to Build Enterprise-Grade Azure Infrastructure 

    Fri, 12 Jun 2026 09:48:13 -0000

    <div><img width="768" height="330" src="https://devops.com/wp-content/uploads/2020/10/infrastructure.jpg" class="attachment-large size-large wp-post-image" alt="infrastructure, Terraform, IaC immutable infrastructure Pulumi GitOps" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/10/infrastructure.jpg 768w, https://devops.com/wp-content/uploads/2020/10/infrastructure-290x125.jpg 290w, https://devops.com/wp-content/uploads/2020/10/infrastructure-150x64.jpg 150w, https://devops.com/wp-content/uploads/2020/10/infrastructure-500x215.jpg 500w" sizes="(max-width: 768px) 100vw, 768px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2020/10/infrastructure-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="infrastructure, Terraform, IaC immutable infrastructure Pulumi GitOps" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/10/infrastructure-150x150.jpg 150w, https://devops.com/wp-content/uploads/2020/10/infrastructure-50x50.jpg 50w, https://devops.com/wp-content/uploads/2020/10/infrastructure-266x266.jpg 266w, https://devops.com/wp-content/uploads/2020/10/infrastructure-60x60.jpg 60w" sizes="(max-width: 150px) 100vw, 150px" />Azure Bicep has become the preferred IaC language for Azure because it’s declarative, simple, modular and deeply integrated with the Azure platform. 
    <p><span data-contrast="auto"><a href="https://devops.com/infrastructure-as-code-iac-the-key-to-agile-and-automated-cloud-deployments/" target="_blank" rel="noopener">Infrastructure as code (IaC) is no longer optional</a> in modern Azure environments. Teams need repeatable deployments, secure defaults, predictable architecture and strong governance. 
Azure Bicep has become the preferred IaC language for Azure because it’s declarative, simple, modular and deeply integrated with the Azure platform.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">This article breaks down </span><i><span data-contrast="auto">how to design Bicep modules the right way</span></i><span data-contrast="auto"> for enterprise deployments. These patterns come from real-world use cases such as banking, fintech, multitenant SaaS and regulated workloads.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">Why Bicep is the Standard for Azure IaC</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">Teams that move from ARM and Terraform to Bicep typically do so because Bicep offers:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="14" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Cleaner Syntax: No more massive JSON ARM templates.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="14" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Native Azure Integration</span><span 
data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="15" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">IntelliSense</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="15" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Type-checking</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="15" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Automatic API version updates</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">First-Class Modularity: Modules can describe reusable components like:</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">App Services</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">AKS clusters</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span 
data-contrast="auto">Front Door Premium</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Key Vault</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="5" data-aria-level="1"><span data-contrast="auto">VNet + subnets</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="6" data-aria-level="1"><span data-contrast="auto">WAF policies</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="17" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier 
New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="7" data-aria-level="1"><span data-contrast="auto">Private endpoints</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Better CI/CD experience: Easier validation, what-if deployment and GitHub Actions integration.</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">How to Structure Bicep Code for Large Azure Environments</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">A typical enterprise Bicep structure looks like this:</span><span data-ccp-props="{}"> </span></p>
<pre>
/bicep
  /modules
    aks/
      main.bicep
    appservice/
      main.bicep
    frontdoor/
      main.bicep
    keyvault/
      main.bicep
    network/
      vnet.bicep
      subnet.bicep
    storage/
      main.bicep
  /environment
    dev/
      main.bicep
      params.json
    qa/
      main.bicep
      params.json
    prod/
      main.bicep
      params.json
</pre>
<h3 aria-level="3"><span data-contrast="none">Key Points</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Modules live separately and never store environment-specific values.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="16" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Environment folders contain:</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="3" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="2"><span data-contrast="auto">main.bicep (composition file)</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="o" data-font="Courier New" data-listid="3" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:1440,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Courier New&quot;,&quot;469769242&quot;:[9675],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;o&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="2"><span data-contrast="auto">params.json (per-environment values)</span><span data-ccp-props="{}"> </span></li> </ul> <p><span data-contrast="auto">This ensures consistency across dev → qa → prod. 
</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">Designing a Bicep Module Correctly</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">A module should follow five rules:</span><span data-ccp-props="{}"> </span></p> <p><strong>1. It should deploy one resource (or a tightly-related set). </strong></p> <p><span data-contrast="auto">Examples:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">A single App Service</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">A single AKS cluster</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="4" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" 
data-aria-level="1"><span data-contrast="auto">A VNet with subnets</span><span data-ccp-props="{}"> </span></li> </ul> <p><span data-ccp-props="{}"> </span></p> <p><strong>2. It must <i>not</i> contain environment-specific values. </strong></p> <p><span data-contrast="auto">These belong in parameter files.</span><span data-ccp-props="{}"> </span></p> <p><span data-ccp-props="{}"> </span></p> <p><strong>3. It should expose outputs. </strong></p> <p><span data-contrast="auto">Useful for chaining modules:</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="none">output appServiceId string = appService.id</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">output principalId string = appService.identity.principalId</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p>&nbsp;</p> <p><strong>4. It must include secure parameter types. </strong></p> <p><span data-contrast="none">@</span><span data-contrast="none">secure()</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param adminPassword string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p>&nbsp;</p> <p><strong>5. It should include defaults but allow overrides. 
</strong></p> <p><span data-contrast="none">param sku string = 'P1v3'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param httpsOnly bool = true</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">Example of an Enterprise Bicep Module (App Service + Custom Domain)</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">Here is a production-ready example you can reuse.</span><span data-ccp-props="{}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">modules/appservice/main.bicep</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param name string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param location string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param skuName string = 'P1v3'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param httpsOnly bool = true</span><span 
data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param customDomain string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param certificateThumbprint string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">resource appService 'Microsoft.Web/sites@2023-01-01' = {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">name</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">location</span><span data-contrast="none">: </span><span data-contrast="none">location</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">properties</span><span data-contrast="none">: {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">httpsOnly</span><span data-contrast="none">: </span><span data-contrast="none">httpsOnly</span><span 
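data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">serverFarmId</span><span data-contrast="none">: </span><span data-contrast="none">plan.id</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  }</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">// Assumption: the SKU is set on an App Service plan (Microsoft.Web/serverfarms), not on the site; the plan name is illustrative</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">resource plan 'Microsoft.Web/serverfarms@2023-01-01' = {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">'${name}-plan'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">location</span><span data-contrast="none">: </span><span data-contrast="none">location</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">sku</span><span data-contrast="none">: {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">skuName</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">tier</span><span data-contrast="none">: </span><span data-contrast="none">'PremiumV3'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  }</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">resource binding 'Microsoft.Web/sites/hostNameBindings@2023-01-01' = {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span 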
data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">customDomain</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">parent</span><span data-contrast="none">: </span><span data-contrast="none">appService</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">properties</span><span data-contrast="none">: {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">customHostNameDnsRecordType</span><span data-contrast="none">: </span><span data-contrast="none">'CName'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">sslState</span><span data-contrast="none">: </span><span data-contrast="none">'SniEnabled'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">thumbprint</span><span data-contrast="none">: </span><span data-contrast="none">certificateThumbprint</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  }</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span 
data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">output id string = appService.id</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">output defaultHostname string = appService.properties.defaultHostName</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">This module:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Deploys a premium App Service</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Enforces HTTPS</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Adds custom domain with SNI certificate binding</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Exports outputs for Front Door or API Management</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">Composing Multiple Modules With an Environment File</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">Example: prod/main.bicep</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="none">param location string = 'westeurope'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param appName string = 'prod-myapp'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param domain string = 'api.company.com'</span><span 
data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">param certThumbprint string</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">module appService '../../modules/appservice/main.bicep' = {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">'prodAppService'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">params</span><span data-contrast="none">: {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">appName</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">location</span><span data-contrast="none">: </span><span data-contrast="none">location</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">skuName</span><span 
data-contrast="none">: </span><span data-contrast="none">'P2v3'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">customDomain</span><span data-contrast="none">: </span><span data-contrast="none">domain</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">certificateThumbprint</span><span data-contrast="none">: </span><span data-contrast="none">certThumbprint</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  }</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">module frontdoor '../../modules/frontdoor/main.bicep' = {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">'prodFrontDoor'</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span 
data-contrast="none">params</span><span data-contrast="none">: {</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">backendHostname</span><span data-contrast="none">: </span><span data-contrast="none">appService.outputs.defaultHostname</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">backendId</span><span data-contrast="none">: </span><span data-contrast="none">appService.outputs.id</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  }</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Why this matters:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Front Door depends on App Service 
output</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Environment parameters flow through modules cleanly</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">No duplication of logic</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="6" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Clear separation of concerns</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">Adding CI/CD With GitHub Actions</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">A recommended pipeline: </span><i><span data-contrast="auto">Validate → what-if → 
deploy</span></i><span data-ccp-props="{}"> </span></p> <p><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">Deploy Bicep</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">on</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">push</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">branches</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      - </span><span data-contrast="none">main</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">jobs</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">  </span><span data-contrast="none">deploy</span><span data-contrast="none">:</span><span 
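data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">runs-on</span><span data-contrast="none">: </span><span data-contrast="none">ubuntu-latest</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    # OIDC login with client-id/tenant-id/subscription-id needs an ID token</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">permissions</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">id-token</span><span data-contrast="none">: </span><span data-contrast="none">write</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">contents</span><span data-contrast="none">: </span><span data-contrast="none">read</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    </span><span data-contrast="none">steps</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    - </span><span data-contrast="none">uses</span><span data-contrast="none">: </span><span data-contrast="none">actions/checkout@v4</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    - </span><span data-contrast="none">uses</span><span data-contrast="none">: </span><span data-contrast="none">azure/login@v1</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">with</span><span data-contrast="none">:</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        </span><span 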
data-contrast="none">client-id</span><span data-contrast="none">: </span><span data-contrast="none">${{ secrets.AZURE_CLIENT_ID }}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        </span><span data-contrast="none">tenant-id</span><span data-contrast="none">: </span><span data-contrast="none">${{ secrets.AZURE_TENANT_ID }}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        </span><span data-contrast="none">subscription-id</span><span data-contrast="none">: </span><span data-contrast="none">${{ secrets.AZURE_SUBSCRIPTION_ID }}</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    - </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">Validate</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">run</span><span data-contrast="none">: </span><span data-contrast="none">|</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        # Assumption: the template is resource-group scoped; vars.AZURE_RESOURCE_GROUP is a hypothetical repository variable</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        az deployment group validate \</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">          --resource-group ${{ vars.AZURE_RESOURCE_GROUP }} \</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">          --parameters environment/prod/params.json \</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">          --template-file 
environment/prod/main.bicep</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    - </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">What-If</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">run</span><span data-contrast="none">: </span><span data-contrast="none">|</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        az deployment sub what-if \</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">          --template-file environment/prod/main.bicep</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">    - </span><span data-contrast="none">name</span><span data-contrast="none">: </span><span data-contrast="none">Deploy</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">      </span><span data-contrast="none">run</span><span data-contrast="none">: </span><span data-contrast="none">|</span><span 
data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">        az deployment sub create \</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-contrast="none">          --template-file environment/prod/main.bicep</span><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{&quot;201341983&quot;:2,&quot;335557856&quot;:2039583,&quot;335559739&quot;:0,&quot;335559740&quot;:285}"> </span></p> <p><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">This gives:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Predictable deployments</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">No manual approvals</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Auditability</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="7" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Cloud-native authentication via OIDC (no long-lived client secret to store or rotate)</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">Governance and Enforcement Using Azure Policy</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">Azure Policy can enforce IaC best practices, for example:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Require tags that mark resources as Bicep-managed (tagging rules)</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Enforce HTTPS-only App Services</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Enforce diagnostic logs</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Prevent public IP creation</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="8" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="5" data-aria-level="1"><span data-contrast="auto">Require private endpoints</span><span data-ccp-props="{}"> </span></li> 
</ul> <p><span data-contrast="auto">These policies make sure all deployments — Bicep or otherwise — follow standards.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">Final Thoughts</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:160,&quot;335559739&quot;:80}"> </span></h3> <p><span data-contrast="auto">Bicep is ideal for building large Azure environments when done correctly. By using a module-based approach, separating environment values, integrating CI/CD and combining everything with Azure Policy, you get:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Standardized deployments</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Reusable patterns</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" 
data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Lower operational overhead</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Strong governance</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;multilevel&quot;}" data-aria-posinset="5" data-aria-level="1"><span data-contrast="auto">Easier AKS, App Service and Front Door automation</span><span data-ccp-props="{}"> </span></li> </ul> <p><span data-contrast="auto">These practices are exactly what senior-level architects and MVP reviewers look for, because they demonstrate real-world engineering maturity.</span><span data-ccp-props="{}"> </span></p>
  74. Shift Left to the Developer’s Machine: Building Local Git Security Gates 

    Fri, 12 Jun 2026 07:04:21 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg 770w, https://devops.com/wp-content/uploads/2026/02/devsecops1-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/02/devsecops1-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/02/devsecops1-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/02/devsecops1-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />Shift left to the developer's machine. The principle is what matters: Stop secrets before they ship. The tooling is a means to that end. 
    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg 770w, https://devops.com/wp-content/uploads/2026/02/devsecops1-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/02/devsecops1-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/02/devsecops1-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/02/devsecops1-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p><span data-contrast="auto">A developer pushes one file. It contains an AWS access key left in a configuration block. Five minutes later, CI catches it. By then, the secret is in the remote repository, cached by mirrors and potentially forked. The developer rotates the key, scrubs the commit history and spends the rest of the afternoon on incident response. The real question isn&#8217;t how to clean up faster — it&#8217;s why the secret left the developer&#8217;s machine in the first place.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">The Five-Minute Gap</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">Most engineering teams have invested in <a href="https://devops.com/github-adds-37-new-secret-detectors-in-march-extends-scanning-to-ai-coding-agents/" target="_blank" rel="noopener">CI-based secret scanning</a>. Tools such as GitHub Advanced Security, GitGuardian and TruffleHog&#8217;s CI integration catch leaked credentials in pull requests and pushed branches. 
This is good, but it&#8217;s also too late.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The GitGuardian 2026 State of Secrets Sprawl report found that 29 million secrets were detected on GitHub in 2025 alone, a 34% year-over-year increase and the largest single-year jump ever recorded. Worse, 64% of the secrets leaked in 2022 remain unrevoked today. The gap between Git push and CI detection is typically 3 to 10 minutes. In that window, the secret hits the remote and becomes available to anyone with read access. Even if CI blocks the PR, the credential is already exposed in Git history.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">Rotation mitigates the immediate risk. However, it doesn&#8217;t eliminate the exposure window and it doesn&#8217;t address the root cause: The secret should never have been committed.</span><span data-ccp-props="{}"> </span></p> <p><span data-ccp-props="{}"> <img fetchpriority="high" decoding="async" class="alignnone wp-image-185492 size-full" src="https://devops.com/wp-content/uploads/2026/06/Picture1-9.png" alt="" width="624" height="348" srcset="https://devops.com/wp-content/uploads/2026/06/Picture1-9.png 624w, https://devops.com/wp-content/uploads/2026/06/Picture1-9-233x130.png 233w, https://devops.com/wp-content/uploads/2026/06/Picture1-9-276x154.png 276w, https://devops.com/wp-content/uploads/2026/06/Picture1-9-400x223.png 400w" sizes="(max-width: 624px) 100vw, 624px" /></span></p> <p><i><span data-contrast="none">Note: Without local security gates, secrets reach the remote before CI can intervene. 
With local gates, the commit is blocked before anything leaves the developer&#8217;s machine.</span></i><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">Two Gates, Two Purposes</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">Git provides two natural interception points on the developer&#8217;s machine: Pre-commit and pre-push hooks. Each serves a different security function.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The pre-commit gate focuses on secret detection. Before a commit is finalized, the hook scans staged files for API keys, tokens, passwords and other credential patterns. The critical detail: The scan should target the Git index (the staged snapshot), not the working tree. Scanning the working tree picks up unstaged experiments and editor temp files, producing false positives that train developers to ignore findings.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The pre-push gate handles broader security and quality concerns. Before code reaches the remote, the hook runs static analysis on changed files, checks dependency manifests against known vulnerability databases and optionally enforces test coverage thresholds. 
This gate is heavier, so it runs at push time rather than on every commit.</span><span data-ccp-props="{}"> </span></p> <p><span data-ccp-props="{}"> <img decoding="async" class="alignnone size-full wp-image-185493" src="https://devops.com/wp-content/uploads/2026/06/Picture2-3.png" alt="" width="624" height="348" srcset="https://devops.com/wp-content/uploads/2026/06/Picture2-3.png 624w, https://devops.com/wp-content/uploads/2026/06/Picture2-3-233x130.png 233w, https://devops.com/wp-content/uploads/2026/06/Picture2-3-276x154.png 276w, https://devops.com/wp-content/uploads/2026/06/Picture2-3-400x223.png 400w" sizes="(max-width: 624px) 100vw, 624px" /></span></p> <p><i><span data-contrast="none">Note: The two-gate model separates fast secret detection (pre-commit) from deeper security and quality scanning (pre-push).</span></i><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">This separation matters. Secret detection needs to be fast and run on every commit — developers will bypass a hook that adds 30 seconds to their commit cycle. 
Vulnerability scanning and test execution are slower but acceptable at push time, when the developer is already context-switching.</span><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">What a Local Security Gate Should Do</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">A well-designed local security gate follows a few principles:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">Scan the Right Thing: Pre-commit scanning should materialize the Git index into a temporary snapshot and scan it, not the working tree. This ensures you&#8217;re checking exactly what will be committed, nothing more.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">Distinguish Verified From Unverified: Not all secret findings are equal. A confirmed live AWS key is a different severity than a string that matches a regex pattern. Tools such as TruffleHog can verify whether a detected credential is actually active. 
The gate should hard-block verified secrets while making unverified findings configurable: block or warn, the team decides.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">Require Accountability for Suppressions: Every codebase has false positives. The allowlist should require an owner, a reason and an expiration date for each suppression, with no anonymous, permanent exceptions. When an allowlist entry expires, the gate should warn the developer and force a re-evaluation.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="11" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">Be Honest About Its Limits: Git hooks can be bypassed with --no-verify. Any local security gate is a guardrail, not a fence. It catches mistakes. It doesn&#8217;t prevent malice. 
The correct architecture pairs local gates with CI scanning to provide defense-in-depth.</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">Practical Implementation With Prehook</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">I built prehook (https://github.com/arunsanna/prehook) to implement this pattern as a single Go binary with no runtime dependencies beyond the scanner tools themselves. Here&#8217;s what the setup looks like:</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="none">prehook init       # generates .prehook.yaml with secure defaults</span><br /> <span data-contrast="none">prehook install    # wires into .git/hooks/pre-commit and pre-push</span><br /> <span data-contrast="none">prehook doctor     # validates all scanner binaries are present</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The configuration is a single YAML file checked into the repository:</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="none">version: 1</span><br /> <span data-contrast="none">pre_commit:</span><br /> <span data-contrast="none">  blocking: true</span><br /> <span data-contrast="none">  gitleaks:</span><br /> <span data-contrast="none">    enabled: true</span><br /> <span data-contrast="none">    timeout: 2m</span><br /> <span data-contrast="none">  trufflehog:</span><br /> <span data-contrast="none">    enabled: true</span><br /> <span data-contrast="none">    block_verified: true    # hard-block confirmed live secrets</span><br /> <span data-contrast="none">    block_unknown: false    # warn on unverified findings</span></p> <p><span data-contrast="none">pre_push:</span><br /> <span data-contrast="none">  semgrep:</span><br /> <span data-contrast="none">    enabled: true</span><br /> <span data-contrast="none">  
osv:</span><br /> <span data-contrast="none">    enabled: true           # skips if no dependency manifests changed</span><br /> <span data-contrast="none">  trivy:</span><br /> <span data-contrast="none">    severity: HIGH,CRITICAL</span><br /> <span data-contrast="none">  quality:</span><br /> <span data-contrast="none">    test_command: go test ./...</span><br /> <span data-contrast="none">    coverage:</span><br /> <span data-contrast="none">      enabled: true</span><br /> <span data-contrast="none">      threshold: 60         # minimum coverage percentage to push</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The doctor command validates the local environment, checking that each enabled scanner is installed and optionally enforcing version pins. This is particularly useful for team onboarding: add prehook doctor to your setup documentation, and new developers know exactly what&#8217;s missing before they write their first commit.</span><span data-ccp-props="{}"> </span></p> <p><span data-ccp-props="{}"> <img decoding="async" class="alignnone size-full wp-image-185494" src="https://devops.com/wp-content/uploads/2026/06/Picture3-3.png" alt="" width="432" height="579" srcset="https://devops.com/wp-content/uploads/2026/06/Picture3-3.png 432w, https://devops.com/wp-content/uploads/2026/06/Picture3-3-97x130.png 97w, https://devops.com/wp-content/uploads/2026/06/Picture3-3-115x154.png 115w, https://devops.com/wp-content/uploads/2026/06/Picture3-3-190x254.png 190w" sizes="(max-width: 432px) 100vw, 432px" /></span></p> <p><i><span data-contrast="none">Note: When prehook catches a secret, the commit is blocked. The developer rotates the credential, removes it from the code and re-commits. 
Verified secrets are always blocked; unverified findings follow the team&#8217;s configured policy.</span></i><span data-ccp-props="{}"> </span></p> <h3 aria-level="2"><span data-contrast="none">What This Pattern Doesn&#8217;t Solve</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">Local Git hooks are not a complete secrets management strategy. They are one layer in a defense-in-depth approach. Specifically:</span><span data-ccp-props="{}"> </span></p> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="12" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto">They can be bypassed. A developer running git commit --no-verify skips the pre-commit hook, and git push --no-verify skips the pre-push hook. This is by design in Git. CI scanning is the backstop that catches what local hooks miss, whether through bypass or misconfiguration.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="12" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto">They don&#8217;t replace secrets management. The right answer is to never have secrets in code at all: use environment variables, vault systems or cloud IAM roles. 
Local hooks catch cases where that discipline breaks down, which happens inevitably.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="12" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto">They require scanner installation. Unlike a SaaS CI integration, local hooks depend on each developer having the scanner binaries installed. This is a real adoption barrier. Tools such as prehook doctor reduce friction, but the dependency exists.</span><span data-ccp-props="{}"> </span></li> </ul> <ul> <li aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="12" data-list-defn-props="{&quot;335552541&quot;:1,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto">The value proposition is not perfection. 
It&#8217;s catching the 90% of accidental leaks that happen because a developer was moving fast, testing locally or copying from a Stack Overflow answer including a placeholder key that wasn&#8217;t actually a placeholder.</span><span data-ccp-props="{}"> </span></li> </ul> <h3 aria-level="2"><span data-contrast="none">Getting Started</span><span data-ccp-props="{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:200,&quot;335559739&quot;:0}"> </span></h3> <p><span data-contrast="auto">If you want to implement local Git security gates on your team:</span><span data-ccp-props="{}"> </span></p> <ol> <li><span data-contrast="auto">Start with secret detection only. Wire up Gitleaks or TruffleHog in a pre-commit hook. This is the highest-value, lowest-friction starting point.</span><span data-ccp-props="{}"> </span></li> <li><span data-contrast="auto">Scan the index, not the working tree. Avoid false positives from unstaged changes.</span><span data-ccp-props="{}"> </span></li> <li><span data-contrast="auto">Keep pre-commit fast. Under five seconds. Anything slower, and developers bypass it.</span><span data-ccp-props="{}"> </span></li> <li><span data-contrast="auto">Add vulnerability scanning at push time. Semgrep, OSV-Scanner and Trivy are good options for the pre-push gate.</span><span data-ccp-props="{}"> </span></li> <li><span data-contrast="auto">Pair with CI. Local hooks catch mistakes. CI enforces policy.</span><span data-ccp-props="{}"> </span></li> </ol> <p><span data-contrast="auto">You can build this yourself with shell scripts, use a framework such as pre-commit or use a purpose-built tool such as prehook (https://github.com/arunsanna/prehook) that handles the index snapshot, scanner orchestration and allowlist management in a single binary.</span><span data-ccp-props="{}"> </span></p> <p><span data-contrast="auto">The principle is what matters: Stop secrets before they ship. 
The tooling is a means to that end.</span><span data-ccp-props="{}"> </span></p> <p><i><span data-contrast="auto">You can&#8217;t un-push a secret. However, you can stop it from being pushed in the first place.</span></i><span data-ccp-props="{}"> </span></p>
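    <p>The gate policy described in the article (hard-block verified secrets, make unverified findings configurable, and require an owner, a reason and an expiration date for every suppression) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not prehook's actual implementation: the names Finding, Suppression and evaluate are hypothetical, as is the idea of keying suppressions on a scanner-provided fingerprint string. Only the block_unknown option mirrors a key from the configuration shown earlier.</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Finding:
    """One scanner result (hypothetical shape). `verified` means the
    scanner confirmed the credential is live."""
    fingerprint: str
    verified: bool


@dataclass(frozen=True)
class Suppression:
    """Allowlist entry (hypothetical shape): owner, reason and expiry are mandatory."""
    fingerprint: str
    owner: str
    reason: str
    expires: date

    def is_active(self, today: date) -> bool:
        # An entry only counts if it names an owner and a reason and has not expired.
        return bool(self.owner.strip() and self.reason.strip()) and today <= self.expires


def evaluate(findings: List[Finding], allowlist: List[Suppression],
             today: date, block_unknown: bool = False) -> Tuple[bool, List[str]]:
    """Return (blocked, messages) for a set of findings.

    Assumption in this sketch: suppressions apply to unverified findings
    only, so a verified live secret is blocked even if it is allowlisted.
    """
    active = {s.fingerprint for s in allowlist if s.is_active(today)}
    lapsed = {s.fingerprint for s in allowlist} - active
    blocked = False
    messages: List[str] = []
    for f in findings:
        if f.verified:
            blocked = True
            messages.append(f"BLOCK {f.fingerprint}: verified live credential")
            continue
        if f.fingerprint in active:
            continue  # accepted false positive with an accountable owner
        if f.fingerprint in lapsed:
            messages.append(f"WARN {f.fingerprint}: suppression expired or incomplete, re-evaluate it")
        if block_unknown:
            blocked = True
            messages.append(f"BLOCK {f.fingerprint}: unverified finding")
        else:
            messages.append(f"WARN {f.fingerprint}: unverified finding")
    return blocked, messages
```

    <p>A pre-commit hook built on a function like this would exit with a non-zero status whenever blocked is true, which is what makes Git abort the commit. Keeping the expiry check inside the suppression itself means an outdated exception resurfaces as a warning instead of silently hiding findings forever.</p>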
  75. Why Your Observability Stack Is Costing You More Than Your Cloud Bill

    Thu, 11 Jun 2026 20:37:15 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/observabilitystack-Large-e1781202410505.jpeg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/observabilitystack-Large-150x150.jpeg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />There&#8217;s a pattern playing out across engineering teams right now that nobody talks about openly: the tool meant to reduce operational complexity has quietly become one of the biggest line items on the infrastructure budget. Observability spending is out of control, and for most teams, it&#8217;s not because they&#8217;re monitoring too much. It&#8217;s because they&#8217;re [&#8230;]
    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/observabilitystack-Large-e1781202410505.jpeg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/observabilitystack-Large-150x150.jpeg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p>There&#8217;s a pattern playing out across engineering teams right now that nobody talks about openly: the tool meant to reduce operational complexity has quietly become one of the biggest line items on the infrastructure budget.</p> <p>Observability spending is out of control, and for most teams, it&#8217;s not because they&#8217;re monitoring too much. It&#8217;s because they&#8217;re paying for platforms designed for enterprises ten times their size, ingesting data they&#8217;ll never query, and running three or four disconnected tools that still don&#8217;t give them a single coherent picture of what&#8217;s happening in production.</p> <p>This isn&#8217;t a niche problem. It&#8217;s one of the defining operational challenges of 2026.</p> <h3><strong>The Hidden Cost of &#8220;Industry Standard&#8221; Tooling</strong></h3> <p>For the better part of a decade, the default answer to observability was to pick whichever platform the largest companies were using and figure out the budget later. That logic made some sense when infrastructure was simpler, and telemetry volumes were lower.</p> <p>It doesn&#8217;t hold up today.</p> <p>The average engineering team running Kubernetes across even a modest multi-cloud setup can generate millions of log lines and spans per hour. At per-GB or per-host pricing models used by most legacy observability vendors, that volume compounds fast. 
Teams report seeing observability bills balloon by 3–4x within 18 months of scaling, often without a corresponding increase in actual insight.</p> <p>Worse, the cost isn&#8217;t just financial. Sprawling observability stacks, where metrics live in one tool, logs in another, and traces in a third, create a cognitive tax on every engineer doing incident response. When an alert fires at 2 a.m., the last thing you want is to pivot between four dashboards trying to correlate what went wrong.</p> <h3><strong>Fragmentation Is the Real Bottleneck</strong></h3> <p>The conversation about observability in 2026 has shifted from &#8220;are you monitoring?&#8221; to &#8220;can your team actually use what you&#8217;re monitoring?&#8221;</p> <p>That distinction matters. Most teams have plenty of data. What they lack is context, the ability to move from a spike in error rates to the relevant trace, to the exact log lines, to the infrastructure event that caused it, all within a single workflow and without switching tools.</p> <p>This is where the architecture of your observability platform becomes a first-class engineering decision, not just a vendor procurement choice.</p> <p>Platforms built on a unified data model, where logs, metrics, and traces share the same underlying pipeline and correlation layer, fundamentally change what on-call looks like. Engineers stop triaging across disconnected dashboards and start investigating. Mean time to resolution drops not because the engineers got faster, but because the tool stopped slowing them down.</p> <h3><strong>OpenTelemetry Changed the Calculus</strong></h3> <p>One shift that&#8217;s genuinely reshaping the observability market is the maturation of OpenTelemetry as a standard. 
In 2026, OTel is no longer an aspirational project; it&#8217;s production-grade, widely supported, and increasingly the default instrumentation choice for teams building on Kubernetes, serverless, and distributed microservice architectures.</p> <p>What OpenTelemetry means in practice: your telemetry data is no longer hostage to a single vendor&#8217;s SDK. You can instrument once and route data wherever you want. This is eroding one of the biggest switching-cost advantages legacy observability platforms relied on.</p> <p>For engineering teams, the practical implication is significant. You can now evaluate observability platforms on the merits of their analysis, visualization, and alerting capabilities, not on how deeply their agents are embedded in your codebase.</p> <h3><strong>What Modern Teams Are Actually Looking For</strong></h3> <p>Conversations with SRE and platform engineering teams consistently surface the same set of priorities when they&#8217;re evaluating or re-evaluating their observability stack:</p> <p><strong>Unified telemetry correlation.</strong> Not just ingesting logs, metrics, and traces separately, but correlating them automatically so that moving from a failed deployment to its root cause takes minutes, not an hour of cross-tool investigation.</p> <p><strong>Predictable pricing.</strong> Per-host or flat-rate models that don&#8217;t punish growth. Teams that are scaling fast can&#8217;t afford a billing model where every new service or container spikes the observability bill unpredictably.</p> <p><strong>Fast time-to-value.</strong> Setup overhead is a real concern, especially for teams without dedicated platform engineers. 
Platforms that can be fully instrumented in under an hour, with out-of-the-box dashboards for Kubernetes, Docker, and common cloud services, reduce the operational friction of getting observability right from the start.</p> <p><strong>Infrastructure-aware alerting.</strong> Alerts that understand topology, that a pod restart in a specific namespace is related to a database timeout, which connects to a configuration change pushed 20 minutes ago, dramatically reduce alert noise and the cognitive load on whoever is on call.</p> <h3><strong>A Practical Framework for Evaluating Your Stack</strong></h3> <p>If you&#8217;re reviewing your observability setup right now, here are the questions worth asking before renewing a contract or adopting something new:</p> <p><em>Does your current platform consolidate logs, metrics, and traces into a single interface, or are your engineers still switching between tools during incidents?</em></p> <p><em>How does your observability bill scale with infrastructure growth? If doubling your container count doubles your bill, that&#8217;s worth modeling out.</em></p> <p><em>What does your average setup time look like for a new service? If it takes more than a day to get meaningful visibility into a new microservice, the instrumentation burden is too high.</em></p> <p><em>Are you paying for features your team doesn&#8217;t use?</em> Enterprise observability platforms are often bought for capabilities that never get deployed: SIEM integrations, compliance dashboards and ML anomaly detection that require weeks of baseline tuning before they&#8217;re useful.</p> <h3><strong>The Emergence of Developer-First Observability</strong></h3> <p>One of the more interesting market developments of the past 18 months is the rise of platforms built specifically for the engineering teams doing the work, not for the security operations center or the compliance team.</p> <p>Middleware is a good example of this shift. 
It&#8217;s built around the idea that <a href="https://middleware.io/blog/observability/">observability</a> should be fast to deploy, unified across signals by default, and priced in a way that doesn&#8217;t create a finance conversation every time a team decides to instrument something new. For teams running Kubernetes and distributed services that need production-grade visibility without the onboarding overhead or unpredictable billing of legacy platforms, it offers a meaningfully different value proposition.</p> <p>The broader trend is real regardless of which specific platform a team chooses: the &#8220;enterprise observability suite&#8221; model is losing ground to platforms that prioritize developer experience, OpenTelemetry-native pipelines, and operational simplicity.</p> <h3><strong>Getting the Architecture Right</strong></h3> <p>The teams getting observability right in 2026 share a few common traits. They&#8217;ve standardized on OpenTelemetry for instrumentation. They&#8217;ve consolidated signal types onto a single platform rather than running parallel stacks. And they&#8217;ve tied their observability investment directly to SLOs that matter to the business — not just system metrics that look good on a dashboard.</p> <p>Well-done observability isn&#8217;t about having the most comprehensive monitoring. It&#8217;s about having the right signals, in the right place, fast enough to be useful when something breaks.</p> <p>That framing shift, from &#8220;collect everything&#8221; to &#8220;correlate what matters,&#8221; is the difference between a team that spends incident response time fighting their tools and one that spends it fixing the actual problem.</p>
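The scaling question from the evaluation framework above ("if doubling your container count doubles your bill, that's worth modeling out") can be checked with a few lines of arithmetic. The sketch below is only an illustration: the two pricing formulas, the function names and every number passed to them are hypothetical and are not taken from any vendor's price list.

```python
import math


def ingest_bill(containers, gb_per_container, price_per_gb):
    """Monthly bill under ingestion-based pricing: scales with telemetry volume."""
    return containers * gb_per_container * price_per_gb


def host_bill(containers, containers_per_host, price_per_host):
    """Monthly bill under per-host pricing: scales with host count, rounded up."""
    hosts = math.ceil(containers / containers_per_host)
    return hosts * price_per_host


def growth_factor(bill_fn, containers, **params):
    """Ratio of the bill at double the container count to the bill today."""
    return bill_fn(2 * containers, **params) / bill_fn(containers, **params)


# Hypothetical inputs, for illustration only.
per_gb = growth_factor(ingest_bill, 100, gb_per_container=30, price_per_gb=0.5)
per_host = growth_factor(host_bill, 30, containers_per_host=20, price_per_host=25)
print(f"per-GB pricing:   bill grows {per_gb:.2f}x when containers double")
print(f"per-host pricing: bill grows {per_host:.2f}x when containers double")
```

Under these made-up inputs the ingestion-based bill doubles exactly, while the per-host bill grows by 1.5x because the existing hosts had spare capacity. Substituting a team's own volumes and contract prices shows which case applies to them.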
  76. Anthropic Reverses Course on Hidden AI Restrictions Following Developer Backlash

    Thu, 11 Jun 2026 20:29:30 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330.jpg 770w, https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/06/anthropic_fable5_policy_rollback_770x330-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" /><p>Anthropic has abruptly walked back a controversial, unannounced policy that degraded the performance of its latest model, Claude Fable 5.</p> <p>The reversal follows intense backlash from the machine learning community, which criticized the company for a lack of transparency and anti-competitive behavior, according to a Wired report.</p> <p>The controversy began earlier this week with the release of Claude Fable 5, a version of Anthropic’s highly sophisticated Mythos system equipped with specialized national security guardrails. While the company openly said it would reroute hazardous prompts regarding cybersecurity, biology, and chemistry to less advanced models, it did not disclose a separate restriction: silently throttling requests tied to frontier LLM development.</p> <p>AI researchers quickly noticed that when Fable 5 was tasked with training competing LLMs, debugging AI code, or optimizing neural architecture, the model would covertly fail or degrade its output without notifying the user. 
This hidden mechanism drew immediate fire from developers who complained they were burning expensive API tokens on a deliberately crippled system.</p> <p>Many in the tech sector viewed the stealth restrictions as a hostile maneuver designed to prevent rivals from using Anthropic’s proprietary data to build competing systems. The move severely bruised Anthropic’s public image, as the company has long positioned itself as a more ethical, researcher-friendly, and safety-oriented alternative to rivals like OpenAI.</p> <p>&#8220;Degrading performance on ML research <em>without telling the user</em> is shockingly hostile and a terrible look,&#8221; said Dean W. Ball, a prominent research fellow, on social media platform X.</p> <p>In statements issued to Wired and Business Insider, Anthropic acknowledged the misstep and announced immediate changes to make the guardrails entirely visible.</p> <p>&#8220;We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,&#8221; an Anthropic spokesperson said. &#8220;We made the wrong trade-off, and we apologize for not getting the balance right.&#8221;</p> <p>Under the revised policy, the safety restrictions remain in place, but the stealth element has been eliminated. Starting this week, flagged API requests will explicitly return a reason for refusal, and standard user queries will visibly fall back to the older Claude Opus 4.8 model rather than degrading silently.</p> <p>Anthropic defended the core existence of the safeguards, emphasizing that the underlying Mythos framework possesses unprecedented capabilities in advanced reasoning, cyber-operations, and scientific research. The strict barriers are meant to prevent foreign adversaries from accelerating weapons research or gaining a dangerous technological edge.</p> <p>The company reassured developers that the vast majority of standard coding and machine learning applications remain entirely unaffected by these national security parameters. 
While the full Mythos system remains restricted to vetted government entities, the public-facing Fable 5 will now operate with the transparency the developer community demands.</p>
  77. DiffusionGemma: 4x faster text generation

    Wed, 10 Jun 2026 16:24:11 -0000

  78. Investing in multi-agent AI safety research

    Wed, 10 Jun 2026 10:21:19 -0000

    Google DeepMind and partners announce a $10M funding call for multi-agent safety research.
  79. Fluid, natural voice translation with Gemini 3.5 Live Translate

    Tue, 09 Jun 2026 15:16:25 -0000

    Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet.
  80. Introducing Gemma 4 12B: a unified, encoder-free multimodal model

    Tue, 09 Jun 2026 14:10:19 -0000

  81. Powering the future of robotics in Europe

    Tue, 09 Jun 2026 14:02:33 -0000

  82. Measuring the impact of learning with AI in Sierra Leone and beyond

    Mon, 08 Jun 2026 13:04:59 -0000

    Results from a randomized controlled trial show the potential of Gemini’s Guided Learning feature to boost engagement and accelerate learning.
  83. We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

    Thu, 21 May 2026 19:46:42 -0000

  84. Fast-tracking genetic leads to reverse cellular aging

    Mon, 18 May 2026 18:21:39 -0000

    Biologists use Co-Scientist to find novel factors that successfully rejuvenate human cells.
  85. Simulate real-world places with Project Genie and Street View

    Sun, 17 May 2026 19:53:18 -0000

    We’re expanding access to Google AI Ultra subscribers globally and introducing a new capability powered by Street View.
  86. Introducing Gemini Omni

    Sun, 17 May 2026 19:50:57 -0000

  87. Introducing Google Antigravity 2.0

    Sun, 17 May 2026 19:43:45 -0000

  88. Gemini for Science: AI experiments and tools for a new era of discovery

    Sun, 17 May 2026 13:50:34 -0000

    A collection of science tools and experiments to expand the scale and precision of scientific exploration.
  89. Making it easier to understand how content was created and edited

    Sun, 17 May 2026 13:43:50 -0000

    We're expanding our tools to help you understand how content was created and edited across the web.
  90. Strengthening Singapore’s AI Future: A New National Partnership

    Sat, 16 May 2026 09:13:34 -0000

    Google DeepMind and Singapore partner to apply frontier AI to complex challenges across health, education, sustainability and more.
  91. Finding the molecular switches behind new infectious diseases

    Sat, 16 May 2026 08:16:06 -0000

    Clare Bryant uses Co-Scientist to identify genetic triggers in emerging infectious diseases.
  92. Opening new paths in aging research

    Sat, 16 May 2026 08:08:44 -0000

    Calico Life Sciences uses Co-Scientist to connect scattered findings and generate new leads in aging research.
  93. Accelerating discovery of liver disease mechanisms

    Sat, 16 May 2026 08:00:15 -0000

    Filippo Menolascina uses Co-Scientist to identify new liver disease treatments and explain why existing drugs only help certain patients.
  94. Uniting biological toolkits for a new approach to ALS

    Sat, 16 May 2026 07:53:11 -0000

    Co-Scientist unites Boston Children’s Hospital and MIT’s labs to explore new RNA-based treatments for ALS.
  95. Uncovering repurposed medicines to fight liver fibrosis

    Sat, 16 May 2026 07:40:27 -0000

    Stanford geneticist uses Co-Scientist to help find new treatments for chronic liver disease and liver fibrosis.
  96. How WeatherNext helped the National Hurricane Center better predict Hurricane Melissa’s historic landfall in Jamaica

    Sat, 16 May 2026 03:14:17 -0000

    Learn how our WeatherNext AI model helped forecasters give communities unprecedented time to prepare ahead of the historic Hurricane Melissa.
  97. Gemini 3.5: frontier intelligence with action

    Fri, 15 May 2026 22:50:12 -0000

    Gemini 3.5 is built to help you execute complex, agentic workflows.
  98. Co-Scientist: A multi-agent AI partner to accelerate research

    Tue, 12 May 2026 14:40:07 -0000

    Introducing Co-Scientist, a collaborative AI partner built with Gemini to help researchers accelerate scientific breakthroughs.
  99. AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields

    Wed, 06 May 2026 10:43:49 -0000

    Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science.
  100. Enabling a new model for healthcare with AI co-clinician

    Thu, 30 Apr 2026 12:14:15 -0000

    Researching the path to AI-augmented care and development of an AI co-clinician.
  101. Announcing our partnership with the Republic of Korea

    Mon, 27 Apr 2026 07:00:06 -0000

    Google DeepMind and Korea partner to accelerate scientific breakthroughs using frontier AI models
  102. Decoupled DiLoCo: A new frontier for resilient, distributed AI training

    Wed, 22 Apr 2026 10:20:03 -0000

  103. Partnering with industry leaders to accelerate AI transformation

    Tue, 21 Apr 2026 14:54:15 -0000

    Google DeepMind partners with global consultancies to bring the power of frontier AI to organizations around the world.
  104. Gemini 3.1 Flash TTS: the next generation of expressive AI speech

    Wed, 15 Apr 2026 16:03:19 -0000

    Our newest audio model introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation.
  105. Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning

    Mon, 13 Apr 2026 15:52:13 -0000

    Gemini Robotics-ER 1.6: Enhancing spatial reasoning and multi-view understanding for autonomous robotics.
  106. Gemma 4: Byte for byte, the most capable open models

    Thu, 02 Apr 2026 16:00:49 -0000

    Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows.
  107. Reimagining the mouse pointer for the AI era

    Sun, 29 Mar 2026 10:50:49 -0000

    Google DeepMind is transforming the mouse pointer into a context-aware AI partner. Move beyond the friction of traditional prompting with intuitive AI collaboration in Chrome and beyond.
  108. Gemini 3.1 Flash Live: Making audio AI more natural and reliable

    Thu, 26 Mar 2026 15:23:35 -0000

    Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.
  109. Protecting people from harmful manipulation

    Wed, 25 Mar 2026 16:46:20 -0000

    Google DeepMind researches AI's harmful manipulation risks across areas like finance and health, leading to new safety measures.
  110. Lyria 3 Pro: Create longer tracks in more

    Wed, 25 Mar 2026 16:01:39 -0000

    Introducing Lyria 3 Pro, which unlocks longer tracks with structural awareness. We’re also bringing Lyria to more Google products and surfaces.
  111. Measuring progress toward AGI: A cognitive framework

    Tue, 17 Mar 2026 16:03:47 -0000

    We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.
  112. From games to biology and beyond: 10 years of AlphaGo’s impact

    Mon, 09 Mar 2026 13:52:36 -0000

    Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.
  113. Gemini 3.1 Flash-Lite: Built for intelligence at scale

    Tue, 03 Mar 2026 16:35:55 -0000

    Gemini 3.1 Flash-Lite is our fastest and most cost-efficient Gemini 3 series model yet.
  114. Nano Banana 2: Combining Pro capabilities with lightning-fast speed

    Thu, 26 Feb 2026 16:01:50 -0000

    Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.
  115. Gemini 3.1 Pro: A smarter model for your most complex tasks

    Thu, 19 Feb 2026 16:06:14 -0000

    3.1 Pro is designed for tasks where a simple answer isn’t enough.
  116. A new way to express yourself: Gemini can now create music

    Wed, 18 Feb 2026 16:01:38 -0000

    The Gemini app now features our most advanced music generation model Lyria 3, empowering anyone to make 30-second tracks using text or images.
  117. Accelerating discovery in India through AI-powered science and education

    Tue, 17 Feb 2026 13:42:20 -0000

    Google DeepMind brings National Partnerships for AI initiative to India, scaling AI for science and education
  118. Gemini 3 Deep Think: Advancing science, research and engineering

    Thu, 12 Feb 2026 16:15:09 -0000

    Our most specialized reasoning mode is now updated to solve modern science, research and engineering challenges.
  119. Accelerating Mathematical and Scientific Discovery with Gemini Deep Think

    Mon, 09 Feb 2026 16:12:06 -0000

    Research papers point to the growing impact of Deep Think across fields
  120. Project Genie: Experimenting with infinite, interactive worlds

    Thu, 29 Jan 2026 17:01:05 -0000

    Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.
  121. D4RT: Teaching AI to see the world in four dimensions

    Fri, 16 Jan 2026 10:39:00 -0000

    D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
  122. Veo 3.1 Ingredients to Video: More consistency, creativity and control

    Tue, 13 Jan 2026 17:00:18 -0000

    Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.
  123. Google's year in review: 8 areas with research breakthroughs in 2025

    Tue, 23 Dec 2025 17:01:02 -0000

    Google 2025 recap: Research breakthroughs of the year
  124. Gemini 3 Flash: frontier intelligence built for speed

    Wed, 17 Dec 2025 11:58:17 -0000

    Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
  125. Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

    Tue, 16 Dec 2025 10:14:24 -0000

    Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
  126. Improved Gemini audio models for powerful voice experiences

    Fri, 12 Dec 2025 17:50:50 -0000

  127. Deepening our partnership with the UK AI Security Institute

    Thu, 11 Dec 2025 00:06:40 -0000

    Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
  128. Strengthening our partnership with the UK government to support prosperity and security in the AI era

    Wed, 10 Dec 2025 14:59:21 -0000

    Deepening our partnership with the UK government to support prosperity and security in the AI era
  129. FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

    Tue, 09 Dec 2025 11:29:03 -0000

    Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
  130. Engineering more resilient crops for a warming climate

    Thu, 04 Dec 2025 16:23:24 -0000

    Scientists are using AlphaFold to strengthen a photosynthesis enzyme for resilient, heat-tolerant crops.
  131. AlphaFold: Five years of impact

    Tue, 25 Nov 2025 16:00:12 -0000

    Explore how AlphaFold has accelerated science and fueled a global wave of biological discovery.
  132. Revealing a key protein behind heart disease

    Tue, 25 Nov 2025 15:52:51 -0000

    AlphaFold has revealed the structure of a key protein behind heart disease
  133. Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery

    Mon, 24 Nov 2025 14:12:03 -0000

    Google DeepMind and the DOE partner on Genesis, a new effort to accelerate science with AI.
  134. How we’re bringing AI image verification to the Gemini app

    Thu, 20 Nov 2025 15:13:19 -0000

  135. Build with Nano Banana Pro, our Gemini 3 Pro Image model

    Thu, 20 Nov 2025 15:11:14 -0000

  136. Introducing Nano Banana Pro

    Thu, 20 Nov 2025 15:05:02 -0000

  137. Start building with Gemini 3

    Tue, 18 Nov 2025 17:49:13 -0000

  138. We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region

    Tue, 18 Nov 2025 17:00:00 -0000

    Google DeepMind opens a new Singapore research lab, accelerating AI progress in the Asia-Pacific region.
  139. A new era of intelligence with Gemini 3

    Tue, 18 Nov 2025 16:06:41 -0000

  140. Introducing Google Antigravity

    Tue, 18 Nov 2025 16:06:32 -0000

  141. WeatherNext 2: Our most advanced weather forecasting model

    Mon, 17 Nov 2025 15:09:23 -0000

    The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.
  142. SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds

    Thu, 13 Nov 2025 14:52:18 -0000

    Introducing SIMA 2, a Gemini-powered AI agent that can think, understand, and take actions in interactive environments.
  143. Teaching AI to see the world more like we do

    Tue, 11 Nov 2025 11:49:13 -0000

    Our new paper analyzes the important ways AI systems organize the visual world differently from humans.
  144. How AI is giving Northern Ireland teachers time back

    Mon, 10 Nov 2025 16:50:39 -0000

    A six-month-long pilot program with the Northern Ireland Education Authority’s C2k initiative found that integrating Gemini and other generative AI tools saved participating teachers an average of 10 hours per week.
  145. Mapping, modeling, and understanding nature with AI

    Wed, 05 Nov 2025 16:59:46 -0000

    AI models can help map species, protect forests and listen to birds around the world
  146. Accelerating discovery with the AI for Math Initiative

    Wed, 29 Oct 2025 14:31:13 -0000

    The initiative brings together some of the world's most prestigious research institutions to pioneer the use of AI in mathematical research.
  147. T5Gemma: A new collection of encoder-decoder Gemma models

    Sat, 25 Oct 2025 18:14:00 -0000

    Introducing T5Gemma, a new collection of encoder-decoder LLMs.
  148. MedGemma: Our most capable open models for health AI development

    Sat, 25 Oct 2025 18:02:50 -0000

    We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.
  149. Introducing Gemma 3n: The developer guide

    Sat, 25 Oct 2025 17:54:47 -0000

    Gemma 3n is designed for the developer community that helped shape Gemma.
  150. Gemini 2.5 Flash-Lite is now ready for scaled production use

    Sat, 25 Oct 2025 17:34:32 -0000

    Gemini 2.5 Flash-Lite, previously in preview, is now stable and generally available. This cost-efficient model provides high quality in a small size, and includes 2.5 family features like a 1 million-token context window and multimodality.
  151. Behind “ANCESTRA”: combining Veo with live-action filmmaking

    Sat, 25 Oct 2025 17:27:10 -0000

    We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.
  152. AlphaEarth Foundations helps map our planet in unprecedented detail

    Fri, 24 Oct 2025 19:06:32 -0000

    New AI model integrates petabytes of Earth observation data to generate a unified data representation that revolutionizes global mapping and monitoring
  153. Exploring the context of online images with Backstory

    Fri, 24 Oct 2025 03:17:11 -0000

    New experimental AI tool helps people explore the context and origin of images seen online.
  154. Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

    Fri, 24 Oct 2025 03:12:29 -0000

    The International Mathematical Olympiad (“IMO”) is the world’s most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory.
  155. Aeneas transforms how historians connect the past

    Fri, 24 Oct 2025 02:58:37 -0000

    Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts.
  156. Genie 3: A new frontier for world models

    Fri, 24 Oct 2025 02:54:30 -0000

    Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p.
  157. How AI is helping advance the science of bioacoustics to save endangered species

    Fri, 24 Oct 2025 02:30:54 -0000

    Our new Perch model helps conservationists analyze audio faster to protect endangered species, from Hawaiian honeycreepers to coral reefs.
  158. Using AI to perceive the universe in greater depth

    Fri, 24 Oct 2025 02:21:07 -0000

    Using AI to perceive the universe in greater depth
  159. Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals

    Fri, 24 Oct 2025 00:22:10 -0000

    Gemini 2.5 Deep Think achieves breakthrough performance at the world’s most prestigious computer programming competition, demonstrating a profound leap in abstract problem solving.
  160. Discovering new solutions to century-old problems in fluid dynamics

    Fri, 24 Oct 2025 00:02:06 -0000

    Our new method could help mathematicians leverage AI techniques to tackle long-standing challenges in mathematics, physics and engineering.
  161. Strengthening our Frontier Safety Framework

    Thu, 23 Oct 2025 23:44:10 -0000

    We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.
  162. Gemini Robotics 1.5 brings AI agents into the physical world

    Thu, 23 Oct 2025 23:33:58 -0000

    We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.
  163. Introducing CodeMender: an AI agent for code security

    Thu, 23 Oct 2025 23:05:51 -0000

    Using advanced AI to fix critical software vulnerabilities
  164. Bringing AI to the next generation of fusion energy

    Thu, 23 Oct 2025 22:04:14 -0000

    We’re partnering with Commonwealth Fusion Systems (CFS) to bring clean, safe, limitless fusion energy closer to reality.
  165. Try Deep Think in the Gemini app

    Thu, 23 Oct 2025 18:54:19 -0000

    We're rolling out Deep Think in the Gemini app for Google AI Ultra subscribers, and we're giving select mathematicians access to the full version of the Gemini 2.5 Deep Think model entered into the IMO competition.
  166. Rethinking how we measure AI intelligence

    Thu, 23 Oct 2025 18:52:06 -0000

    Game Arena is a new, open-source platform for rigorous evaluation of AI models. It allows for head-to-head comparison of frontier systems in environments with clear winning conditions.
  167. Introducing Gemma 3 270M: The compact model for hyper-efficient AI

    Thu, 23 Oct 2025 18:50:11 -0000

    Today, we're adding a new, highly specialized tool to the Gemma 3 toolkit: Gemma 3 270M, a compact, 270-million parameter model.
  168. Image editing in Gemini just got a major upgrade

    Thu, 23 Oct 2025 18:48:30 -0000

    Transform images in amazing new ways with updated native image editing in the Gemini app.
  169. VaultGemma: The world's most capable differentially private LLM

    Thu, 23 Oct 2025 18:42:54 -0000

    We introduce VaultGemma, the most capable model trained from scratch with differential privacy.
  170. Introducing the Gemini 2.5 Computer Use model

    Thu, 23 Oct 2025 18:40:34 -0000

    Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.
  171. Introducing Veo 3.1 and advanced creative capabilities

    Thu, 23 Oct 2025 18:38:55 -0000

    We’re rolling out significant updates to Veo that give people even more creative control.
  172. How a Gemma model helped discover a new potential cancer therapy pathway

    Thu, 23 Oct 2025 18:22:55 -0000

    We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.
  173. AlphaGenome: AI for better understanding the genome

    Wed, 25 Jun 2025 13:59:00 -0000

    Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API.
  174. Gemini Robotics On-Device brings AI to local robotic devices

    Tue, 24 Jun 2025 14:00:00 -0000

    We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
  175. Gemini 2.5: Updates to our family of thinking models

    Tue, 17 Jun 2025 16:00:00 -0000

    Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview.
  176. We’re expanding our Gemini 2.5 family of models

    Tue, 17 Jun 2025 16:00:00 -0000

    Gemini 2.5 Flash and Pro are now generally available, and we’re introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.