Pipes Feed Preview: Towards Data Science & The New Stack & DevOps & SRE & DevOps.com & Google DeepMind News

  1. The Death of the “Everything Prompt”: Google’s Move Toward Structured AI

    Mon, 09 Feb 2026 13:00:00 -0000

    <p>How the new Interactions API enables deep-reasoning, stateful, agentic workflows.</p> <p>The post <a href="https://towardsdatascience.com/the-death-of-the-everything-prompt-googles-move-toward-structured-ai/">The Death of the “Everything Prompt”: Google’s Move Toward Structured AI</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  2. What I Am Doing to Stay Relevant as a Senior Analytics Consultant in 2026

    Sat, 07 Feb 2026 15:00:00 -0000

    <p>Learn how to work with AI, while strengthening your unique human skills that technology cannot replace</p> <p>The post <a href="https://towardsdatascience.com/what-i-am-doing-to-stay-relevant-as-a-senior-analytics-consultant-in-2026/">What I Am Doing to Stay Relevant as a Senior Analytics Consultant in 2026</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  3. Pydantic Performance: 4 Tips on How to Validate Large Amounts of Data Efficiently

    Fri, 06 Feb 2026 15:00:00 -0000

    <p>The real value lies in writing clearer code and using your tools right</p> <p>The post <a href="https://towardsdatascience.com/pydantic-performance-4-tips-on-how-to-validate-large-amounts-of-data-efficiently/">Pydantic Performance: 4 Tips on How to Validate Large Amounts of Data Efficiently</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  4. Prompt Fidelity: Measuring How Much of Your Intent an AI Agent Actually Executes

    Fri, 06 Feb 2026 12:00:00 -0000

    <p>How much of your AI agent's output is real data versus confident guesswork?</p> <p>The post <a href="https://towardsdatascience.com/prompt-fidelity-measuring-how-much-of-your-intent-an-ai-agent-actually-executes/">Prompt Fidelity: Measuring How Much of Your Intent an AI Agent Actually Executes</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  5. TDS Newsletter: Vibe Coding Is Great. Until It’s Not.

    Thu, 05 Feb 2026 17:09:00 -0000

    <p>Sorting through the good, bad, and ambiguous aspects of vibe coding</p> <p>The post <a href="https://towardsdatascience.com/tds-newsletter-vibe-coding-is-great-until-its-not/">TDS Newsletter: Vibe Coding Is Great. Until It&#8217;s Not.</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  6. Mechanistic Interpretability: Peeking Inside an LLM

    Thu, 05 Feb 2026 15:00:00 -0000

    <p>Are the human-like cognitive abilities of LLMs real or fake? How does information travel through the neural network? Is there hidden knowledge inside an LLM?</p> <p>The post <a href="https://towardsdatascience.com/mechanistic-interpretability-peeking-inside-an-llm/">Mechanistic Interpretability: Peeking Inside an LLM</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  7. Why Is My Code So Slow? A Guide to Py-Spy Python Profiling

    Thu, 05 Feb 2026 13:30:00 -0000

    <p>Stop guessing and start diagnosing performance issues using Py-Spy</p> <p>The post <a href="https://towardsdatascience.com/why-is-my-code-so-slow-a-guide-to-py-spy-python-profiling/">Why Is My Code So Slow? A Guide to Py-Spy Python Profiling</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  8. The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas

    Thu, 05 Feb 2026 12:00:00 -0000

    <p>A simple mental model to remember when each one works (with examples that finally click).</p> <p>The post <a href="https://towardsdatascience.com/stop-confusing-loc-and-iloc-in-pandas-the-rule-everyone-misses/">The Rule Everyone Misses: How to Stop Confusing loc and iloc in Pandas</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  9. AWS vs. Azure: A Deep Dive into Model Training – Part 2

    Wed, 04 Feb 2026 16:30:00 -0000

    <p>This article covers how Azure ML's persistent, workspace-centric compute resources differ from AWS SageMaker's on-demand, job-specific approach. It also explores environment customization options, from Azure's curated and custom environments to SageMaker's three levels of customization.</p> <p>The post <a href="https://towardsdatascience.com/aws-vs-azure-a-deep-dive-into-model-training-part-2/">AWS vs. Azure: A Deep Dive into Model Training – Part 2</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  10. How to Work Effectively with Frontend and Backend Code

    Wed, 04 Feb 2026 15:00:00 -0000

    <p>Learn how to be an effective full-stack engineer with Claude Code</p> <p>The post <a href="https://towardsdatascience.com/how-to-effectively-work-with-frontend-and-backend-code/">How to Work Effectively with Frontend and Backend Code</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  11. How to Build Your Own Custom LLM Memory Layer from Scratch

    Wed, 04 Feb 2026 13:30:00 -0000

    <p>Step-by-step guide to building autonomous memory retrieval systems</p> <p>The post <a href="https://towardsdatascience.com/how-to-build-your-own-custom-llm-memory-layer-from-scratch/">How to Build Your Own Custom LLM Memory Layer from Scratch</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  12. Plan–Code–Execute: Designing Agents That Create Their Own Tools

    Wed, 04 Feb 2026 12:00:00 -0000

    <p>The case against pre-built tools in Agentic Architectures</p> <p>The post <a href="https://towardsdatascience.com/plan-code-execute-designing-agents-that-create-their-own-tools/">Plan–Code–Execute: Designing Agents That Create Their Own Tools</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  13. Routing in a Sparse Graph: a Distributed Q-Learning Approach

    Tue, 03 Feb 2026 16:30:00 -0000

    <p>Distributed agents need only decide one move ahead.</p> <p>The post <a href="https://towardsdatascience.com/routing-in-a-sparse-graph-a-distributed-q-learning-approach/">Routing in a Sparse Graph: a Distributed Q-Learning Approach</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  14. YOLOv2 & YOLO9000 Paper Walkthrough: Better, Faster, Stronger

    Tue, 03 Feb 2026 15:00:00 -0000

    <p>From YOLOv1 to YOLOv2: prior box, k-means, Darknet-19, passthrough layer, and more</p> <p>The post <a href="https://towardsdatascience.com/yolov2-yolo9000-paper-walkthrough-better-faster-stronger/">YOLOv2 &#038; YOLO9000 Paper Walkthrough: Better, Faster, Stronger</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  15. Creating a Data Pipeline to Monitor Local Crime Trends

    Tue, 03 Feb 2026 13:30:00 -0000

    <p>A walkthrough of creating an ETL pipeline to extract local crime data and visualize it in Metabase.</p> <p>The post <a href="https://towardsdatascience.com/creating-a-data-pipeline-to-monitor-local-crime-trends/">Creating a Data Pipeline to Monitor Local Crime Trends</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  16. The Proximity of the Inception Score as an Evaluation Criterion

    Tue, 03 Feb 2026 12:00:00 -0000

    <p>The neighborhood of synthetic data</p> <p>The post <a href="https://towardsdatascience.com/the-proximity-of-the-inception-score-as-an-evaluation-criterion/">The Proximity of the Inception Score as an Evaluation Criterion</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  17. Building Systems That Survive Real Life

    Mon, 02 Feb 2026 17:00:00 -0000

    <p>Sara Nobrega on the transition from data science to AI engineering, using LLMs as a bridge to DevOps, and the one engineering skill junior data scientists need to stay competitive.</p> <p>The post <a href="https://towardsdatascience.com/building-systems-that-survive-real-life/">Building Systems That Survive Real Life</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  18. Silicon Darwinism: Why Scarcity Is the Source of True Intelligence

    Mon, 02 Feb 2026 13:00:00 -0000

    <p>We are confusing “size” with “smart.” The next leap in artificial intelligence will not come from a larger data center, but from a more constrained environment.</p> <p>The post <a href="https://towardsdatascience.com/silicon-darwinism-why-scarcity-is-the-source-of-true-intelligence/">Silicon Darwinism: Why Scarcity Is the Source of True Intelligence</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  19. Distributed Reinforcement Learning for Scalable High-Performance Policy Optimization

    Sun, 01 Feb 2026 15:00:00 -0000

    <p>Leveraging massive parallelism, asynchronous updates, and multi-machine training to match and exceed human-level performance</p> <p>The post <a href="https://towardsdatascience.com/distributed-reinforcement-learning-for-scalable-high-performance-policy-optimization/">Distributed Reinforcement Learning for Scalable High-Performance Policy Optimization</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  20. How to Apply Agentic Coding to Solve Problems

    Sat, 31 Jan 2026 15:00:00 -0000

    <p>Learn how to efficiently solve problems with coding agents</p> <p>The post <a href="https://towardsdatascience.com/how-to-apply-agentic-coding-to-solve-problem/">How to Apply Agentic Coding to Solve Problems</a> appeared first on <a href="https://towardsdatascience.com">Towards Data Science</a>.</p>
  21. Is Open Source in Trouble?

    Mon, 09 Feb 2026 19:00:21 -0000

    <img width="1024" height="575" src="https://cdn.thenewstack.io/media/2026/02/d4d2efd7-img_9488-12-1-1024x575.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" fetchpriority="high" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/d4d2efd7-img_9488-12-1.png" /><p>BRUSSELS &#8212; First, the bad. I would argue that current open source practices and usage are not sustainable, or at</p> <p>The post <a href="https://thenewstack.io/is-open-source-in-trouble/">Is Open Source in Trouble?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Open source sustainability requires corporate action, not charity: Igalia engineer proposes concrete pledges to compensate unpaid maintainers
  22. Chainguard’s AI-powered factory hits 500 million builds

    Mon, 09 Feb 2026 15:19:29 -0000

    <img width="1024" height="581" src="https://cdn.thenewstack.io/media/2026/02/4deae394-abdillah-studio-d_rokuiilxe-unsplash-1024x581.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A stylized vector illustration of a modern, automated factory assembly line rendered in shades of teal and blue. The image features several robotic arms positioned over a conveyor belt carrying boxes and components, set against a backdrop of industrial pipes and structural scaffolding, representing a high-tech &quot;software factory&quot; environment." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/4deae394-abdillah-studio-d_rokuiilxe-unsplash-scaled.jpg" /><p>Just a week after announcing Chainguard Factory 2.0, the company has hit a major milestone that demonstrates the scale of</p> <p>The post <a href="https://thenewstack.io/chainguard-500-million-builds/">Chainguard&#8217;s AI-powered factory hits 500 million builds</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Chainguard's software factory, based on its DriftlessAF open source agentic reconciliation, has surpassed 500 million unique container build manifests.
  23. How AI coding makes developers 56% faster and 19% slower

    Mon, 09 Feb 2026 12:00:51 -0000

    <img width="1024" height="684" src="https://cdn.thenewstack.io/media/2026/02/1f3764f9-christina-wocintechchat-com-m-fvgecvtjlbq-unsplash-1024x684.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/1f3764f9-christina-wocintechchat-com-m-fvgecvtjlbq-unsplash-scaled.jpg" /><p>There&#8217;s a growing body of research around AI coding assistants with a confusing range of conflicting results. This is to</p> <p>The post <a href="https://thenewstack.io/how-ai-coding-makes-developers-56-faster-and-19-slower/">How AI coding makes developers 56% faster and 19% slower</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The flow rate of change means we don't have all the answers.
  24. IDEcline: How the world’s most powerful coding tools became second-class citizens overnight

    Sun, 08 Feb 2026 17:00:20 -0000

    <img width="1024" height="687" src="https://cdn.thenewstack.io/media/2026/02/2deb66ef-gemini_generated_image_d78lg2d78lg2d78l-1024x687.png" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A minimalist, flat-style illustration of a person working on a computer in a dark room. A desk lamp casts a bright, diagonal beam of blue light across the desk and keyboard, while the computer screen displays glowing data charts and lines of code. The person is seen in silhouette with a soft blue outline, focused on the monitor." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/2deb66ef-gemini_generated_image_d78lg2d78lg2d78l.png" /><p>During the early phase of my career, I used to spend eight hours a day inside the Visual Studio IDE.</p> <p>The post <a href="https://thenewstack.io/ide-vs-desktop-agent/">IDEcline: How the world’s most powerful coding tools became second-class citizens overnight</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The IDE used to be where software happened. In an agent-first workflow, it is where the software gets verified and reviewed.
  25. Docker versus Nix: The quest for true reproducibility

    Sat, 07 Feb 2026 18:00:37 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/b1dd1492-isidore-decamon-3jhuvphs50g-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/b1dd1492-isidore-decamon-3jhuvphs50g-unsplash.jpg" /><p>When conducting performance benchmarks, the ultimate goal is an apples-to-apples comparison. Docker, widely recognized as one of the most brilliant</p> <p>The post <a href="https://thenewstack.io/docker-versus-nix-the-quest-for-true-reproducibility/">Docker versus Nix: The quest for true reproducibility</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Flox has simplified Nix enough to position it as a Docker replacement on Kubernetes, offering finer dependency management.
  26. How WebAssembly and Web Workers prevent UI freezes

    Sat, 07 Feb 2026 17:00:55 -0000

    <img width="1024" height="684" src="https://cdn.thenewstack.io/media/2026/02/c076ef00-mika-baumeister-74tw4fxp4hw-unsplash-1-1024x684.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/c076ef00-mika-baumeister-74tw4fxp4hw-unsplash-1.jpg" /><p>We&#8217;ve all experienced a frozen web page followed by endless refreshing, frustrated sighs, and the occasional foot stomp, only to</p> <p>The post <a href="https://thenewstack.io/for-darryl-webassembly-and-web-workers/">How WebAssembly and Web Workers prevent UI freezes</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    By combining WebAssembly with Web Workers, developers can offload heavy computations to background threads to prevent bottlenecks.
  27. How GSD turns Claude into a self-steering developer

    Sat, 07 Feb 2026 17:00:30 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/cc56e694-sara-oliveira-s0y4m1x8a3u-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A vibrant, pop-art style illustration of a person sitting at a computer desk wearing headphones and leaning back. A large, stylized blue hand emerges directly from the computer monitor, pointing a finger toward the person’s forehead. The scene features a bright pink background with a Wi-Fi symbol floating above the person’s head and small yellow digital blocks scattered in the air." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/cc56e694-sara-oliveira-s0y4m1x8a3u-unsplash-scaled.jpg" /><p>The speed at which ClawdBot MoltBot OpenClaw climbed in popularity was quite phenomenal, and for good reason: It has an</p> <p>The post <a href="https://thenewstack.io/openclaw-gsd/">How GSD turns Claude into a self-steering developer</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    OpenClaw may be flawed, but it does show how limited the available and entrenched digital assistants are compared to what agents can do.
  28. Memory-Safe Jule language emerges as C/C++ alternative

    Sat, 07 Feb 2026 16:00:25 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/9088479c-nick-fewings-4pzu15oetxa-unsplash-1-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/9088479c-nick-fewings-4pzu15oetxa-unsplash-1.jpg" /><p>With the U.S. government and other institutions calling for the use of memory-safe programming languages in critical systems, Jule, a</p> <p>The post <a href="https://thenewstack.io/jule-open-source-programming-language/">Memory-Safe Jule language emerges as C/C++ alternative</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Jule is an emerging open source systems language that combines Go's simplicity with C's performance while offering C/C++ interoperability and compile-time safety features.
  29. Operant AI targets ‘shadow’ AI agents with real-time security platform

    Fri, 06 Feb 2026 18:15:31 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/39386853-markus-winkler-3lvhsjcxrkc-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/39386853-markus-winkler-3lvhsjcxrkc-unsplash-scaled.jpg" /><p>As AI agents fan out across enterprise apps, APIs, and data stores, they&#8217;re creating a security blind spot: autonomous systems</p> <p>The post <a href="https://thenewstack.io/operant-ai-targets-shadow-ai-agents-with-real-time-security-platform/">Operant AI targets ‘shadow’ AI agents with real-time security platform</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    AI agents are really popular and already proving to be really insecure. To address this threat, Operant AI is introducing Agent Protector.
  30. Is the SaaSpocalypse nigh? The era of paying for software seats may be ending.

    Fri, 06 Feb 2026 16:29:26 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/c5621645-israel-andrade-yi_9sivvt_s-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A wide-angle, eye-level photograph of a modern, open-plan office. Several employees are seated at long wooden desks, working on large computer monitors and laptops with their backs to the camera. The space features an industrial-style white ceiling with exposed red pipes and dozens of warm, bare light bulbs hanging as pendants." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/c5621645-israel-andrade-yi_9sivvt_s-unsplash-scaled.jpg" /><p>In December 2024, Microsoft CEO Satya Nadella appeared on the BG2 podcast and made a prediction that felt provocative and</p> <p>The post <a href="https://thenewstack.io/dawn-of-a-saaspocalypse/">Is the SaaSpocalypse nigh? The era of paying for software seats may be ending.</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Satya Nadella predicted the death of SaaS, and AI agents are making it real. Explore the SaaSpocalypse and the shift from tools to delivered outcomes.
  31. How Homepage simplifies monitoring your self-hosted services

    Fri, 06 Feb 2026 16:00:04 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/09627ea6-allison-saeng-7icarfsxo2y-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/09627ea6-allison-saeng-7icarfsxo2y-unsplash.jpg" /><p>Slowly but surely, I&#8217;ve been migrating over to self-hosted services so I can finally cut the cord to third parties.</p> <p>The post <a href="https://thenewstack.io/homepage-is-your-one-stop-shop-for-monitoring-and-viewing-all-of-the-services-you-depend-on/">How Homepage simplifies monitoring your self-hosted services</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    This tutorial shows how to use Homepage as a centralized dashboard for monitoring self-hosted services across your local network.
  32. pg_lake comes to Snowflake Postgres: What it means for open standards

    Fri, 06 Feb 2026 09:00:03 -0000

    <img width="1024" height="576" src="https://cdn.thenewstack.io/media/2026/02/d8c681a1-alexander-mils-yrkuerek6i0-unsplash-1024x576.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/d8c681a1-alexander-mils-yrkuerek6i0-unsplash.jpg" /><p>The pg_lake extension, which was initially released to the open source community in November, is now natively available in Postgres,</p> <p>The post <a href="https://thenewstack.io/pg_lake-comes-to-snowflake-postgres-what-it-means-for-open-standards/">pg_lake comes to Snowflake Postgres: What it means for open standards</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Snowflake also announced several other features to broaden its platform’s interoperable, open data accessibility.
  33. Where on Earth is vibe coding taking off the most?

    Thu, 05 Feb 2026 21:03:54 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/f9178bb0-arpit-rastogi-xv7dtjnx2yq-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/f9178bb0-arpit-rastogi-xv7dtjnx2yq-unsplash.jpg" /><p>Despite talk of its impending demise,&#160;the&#160;vibe coding craze appears to be&#160;alive and well, particularly in Europe. A new study by</p> <p>The post <a href="https://thenewstack.io/top-vibe-coding-countries/">Where on Earth is vibe coding taking off the most?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    A new study maps global search data to reveal which countries are most curious about -- and actively adopting -- vibe coding, and the results might surprise you.
  34. The one structural shift CISOs must make before AI outpaces their security strategy

    Thu, 05 Feb 2026 20:40:18 -0000

    <img width="1024" height="569" src="https://cdn.thenewstack.io/media/2026/02/a01def0e-mylene-caneso-jl63np79vxs-unsplash-1024x569.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Minimalist illustration on a yellow background showing two people choosing between paths toward a red flag. On the left, a woman looks confused by a long, winding red road. On the right, a man cheers next to a straight, direct blue &#039;paved road.&#039; The image contrasts complexity and friction with efficiency and success." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/a01def0e-mylene-caneso-jl63np79vxs-unsplash-scaled.jpg" /><p>Enterprise CISOs are stuck at a crossroads. Their budgets aren&#8217;t growing fast enough, AI is sucking up every bit of</p> <p>The post <a href="https://thenewstack.io/federate-security-gitlab/">The one structural shift CISOs must make before AI outpaces their security strategy</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    With a federated approach, CISOs can set enterprise-wide policy and risk strategy, while data owners own security implementation.
  35. Open source USearch library jumpstarts ScyllaDB vector search

    Thu, 05 Feb 2026 20:00:18 -0000

    <img width="1024" height="536" src="https://cdn.thenewstack.io/media/2026/01/6711ae24-1200x628-license-change-2024-1024x536.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/01/6711ae24-1200x628-license-change-2024.jpg" /><p>ScyllaDB recently added vector search capabilities underpinned by USearch, an open source clustering and vector search library. The addition of</p> <p>The post <a href="https://thenewstack.io/open-source-usearch-library-jumpstarts-scylladb-vector-search/">Open source USearch library jumpstarts ScyllaDB vector search</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Built in C++, the library’s Rust extension builds on ScyllaDB’s shard-per-core architecture to optimize performance.
  36. The enterprise is not ready for “the rise of the developer”

    Thu, 05 Feb 2026 19:03:22 -0000

    <img width="1024" height="576" src="https://cdn.thenewstack.io/media/2026/02/5fe475ff-for-thumbnail-15-1024x576.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Sean O’Dell of Dynatrace argues that enterprises are unprepared for a major shift brought on by AI: the rise of the developer at Dynatrace Perform in Las Vegas." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/5fe475ff-for-thumbnail-15.jpg" /><p>Asked what enterprises aren&#8217;t ready for as AI advances, Sean O&#8217;Dell of Dynatrace offers a prediction &#8212; and before doing</p> <p>The post <a href="https://thenewstack.io/dynatrace-perform-rise-of-the-developer/">The enterprise is not ready for &#8220;the rise of the developer&#8221;</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Asked what enterprises aren't ready for as AI advances, Sean O’Dell of Dynatrace offers a prediction: “The rise of the developer.”
  37. OpenAI’s GPT-5.3-Codex helped build itself

    Thu, 05 Feb 2026 18:58:56 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/b0f8adf3-img_2595-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/b0f8adf3-img_2595-scaled.jpg" /><p>OpenAI&#8217;s new GPT-5.3-Codex model is the company&#8217;s most capable agentic coding model yet. However, unlike previous Codex models, it focuses</p> <p>The post <a href="https://thenewstack.io/openais-gpt-5-3-codex-helped-build-itself/">OpenAI&#8217;s GPT-5.3-Codex helped build itself</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    GPT-5.3-Codex helped debug its own training and is OpenAI's first model designated "high-capability" for cybersecurity tasks.
  38. Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss

    Thu, 05 Feb 2026 17:45:21 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/02c7a607-img_2912-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="Anthropic logo on a conference show floor." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/02c7a607-img_2912-scaled.jpg" /><p>Anthropic launched Opus 4.6 on Thursday, an update to its flagship Opus model that delivers major improvements over its predecessor&#8212;and</p> <p>The post <a href="https://thenewstack.io/anthropics-opus-4-6-is-a-step-change-for-the-enterprise/">Anthropic debuts Opus 4.6 with standout scores for solving hard problems that other AIs miss</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Anthropic's new flagship model also powers agent teams in Claude Code and now features a one-million token context window.
  39. It took a researcher fewer than 2 hours to hijack OpenClaw

    Thu, 05 Feb 2026 16:37:42 -0000

    <img width="1024" height="675" src="https://cdn.thenewstack.io/media/2026/02/564ec44f-getty-images-t3grnwa0cdy-unsplash-1024x675.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A vibrant digital illustration of an open combination padlock against a dark navy background. The lock features a colorful purple-to-green gradient and five tumblers displaying bright green asterisks, symbolizing an unlocked or compromised security system." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/564ec44f-getty-images-t3grnwa0cdy-unsplash-scaled.jpg" /><p>All those security fears about the OpenClaw AI agent and its social network, Moltbook, are already proving true, according to</p> <p>The post <a href="https://thenewstack.io/openclaw-moltbot-security-concerns/">It took a researcher fewer than 2 hours to hijack OpenClaw</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Security researchers write that it may be wise to steer clear of Moltbot and OpenClaw. The lack of meaningful security may pinch the unwary.
  40. 10 strategies to reduce MCP token bloat

    Thu, 05 Feb 2026 15:37:25 -0000

    <img width="1024" height="805" src="https://cdn.thenewstack.io/media/2026/02/88f3d009-michael-dziedzic-ir5gc4hlqt0-unsplash-1024x805.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A geometric glass prism sits on a vibrant red surface against a saturated orange background. The prism refracts and reflects bright neon blue and magenta light, creating sharp, clean internal angles." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/88f3d009-michael-dziedzic-ir5gc4hlqt0-unsplash-scaled.jpg" /><p>The Model Context Protocol (MCP) has reached an inflection point. While some MCP deployments are still in the experimentation phase,</p> <p>The post <a href="https://thenewstack.io/how-to-reduce-mcp-token-bloat/">10 strategies to reduce MCP token bloat</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Unrestrained use of MCP can quickly flood context windows. Experts share ten practical techniques to rein it in.
  41. GitHub is letting developers choose between Copilot and its biggest rivals

    Wed, 04 Feb 2026 18:12:45 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2025/10/527549ea-c7f5cae8-11c9-44f3-842a-a5fde3f4d5b7-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="GitHub announces support for third-party coding agents in Agent HQ" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2025/10/527549ea-c7f5cae8-11c9-44f3-842a-a5fde3f4d5b7-scaled.jpg" /><p>GitHub subscribers now have a choice of coding agents to help them create. In addition to GitHub&#8217;s own Copilot, users</p> <p>The post <a href="https://thenewstack.io/github-agent-hq/">GitHub is letting developers choose between Copilot and its biggest rivals</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Pro+ and Enterprise subscribers can now assign tasks to Claude, Codex, or Copilot from one dashboard and let them work asynchronously.
  42. The ‘weird’ things that happened when Clickhouse replaced C++ with Rust

    Wed, 04 Feb 2026 15:26:14 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/08f9f70f-drazen-nesic-or6bnhjwkhi-unsplash-1-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/08f9f70f-drazen-nesic-or6bnhjwkhi-unsplash-1.jpg" /><p>ClickHouse&#8217;s decision to shift parts of its codebase to Rust is a perfect storm: the convergence of a wildly popular</p> <p>The post <a href="https://thenewstack.io/the-weird-things-that-happened-when-clickhouse-replaced-c-with-rust/">The &#8216;weird&#8217; things that happened when Clickhouse replaced C++ with Rust</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    At FOSDEM 2026 in Brussels, Alexey Milovidov, co-founder and CTO of ClickHouse, pushed back on the Rust hype.
  43. Moltbook: Hype or the Singularity?

    Tue, 03 Feb 2026 17:30:03 -0000

    <img width="1024" height="683" src="https://cdn.thenewstack.io/media/2026/02/4dc86518-monika-borys-dstjr8ojurw-unsplash-1024x683.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A photo of two lobsters on ice." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/4dc86518-monika-borys-dstjr8ojurw-unsplash-scaled.jpg" /><p>Hysteria continues to build over Moltbook, the so-called AI Agent social network. If you believe Elon Musk, Moltbook is at</p> <p>The post <a href="https://thenewstack.io/moltbook-the-singularity-or-hype/">Moltbook: Hype or the Singularity?</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    The &quot;AI-only social network&quot; making headlines is less about intelligent agents and more about human prompt-writing and shoddy security.
  44. Why Kubernetes is retiring Ingress NGINX

    Tue, 03 Feb 2026 16:45:19 -0000

    <img width="1024" height="769" src="https://cdn.thenewstack.io/media/2026/02/e5f9ab53-trevor-turner-9xxnc55qvv8-unsplash-1024x769.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="" style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/e5f9ab53-trevor-turner-9xxnc55qvv8-unsplash-scaled.jpg" /><p>We warned you! Today, Ingress NGINX is still being used by 50% of Kubernetes users to manage incoming traffic, but&#160;it&#8217;s</p> <p>The post <a href="https://thenewstack.io/kubernetes-to-retire-ingress-nginx/">Why Kubernetes is retiring Ingress NGINX</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Kubernetes is pulling the plug on Ingress NGINX in March 2026, and there&#039;s no drop-in replacement. Time to start planning—now.
  45. Durable Execution: Build reliable software in an unreliable world

    Mon, 02 Feb 2026 23:23:19 -0000

    <img width="1024" height="768" src="https://cdn.thenewstack.io/media/2026/02/40061e3c-sara-oliveira-6kqalppnokg-unsplash-1024x768.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="An illustration of a teal power strip with three devices plugged into it. The strip features an orange power adapter in the center and a beige adapter on the right, both with cords looping upward. On the far left, a standard grey plug is inserted. The style is a clean, sketch-like drawing with black outlines against a plain off-white background." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/40061e3c-sara-oliveira-6kqalppnokg-unsplash-scaled.jpg" /><p>Software reliability is a persistent problem for developers because IT systems are built on unreliable components: hardware degrades; software has</p> <p>The post <a href="https://thenewstack.io/temporal-durable-execution-platform/">Durable Execution: Build reliable software in an unreliable world</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Durable Execution helps ensure applications continue running despite failures, beyond what defensive coding and hardware redundancy can handle.
  46. Unlocking AI’s full potential: Why context is everything

    Mon, 02 Feb 2026 22:31:00 -0000

    <img width="1024" height="655" src="https://cdn.thenewstack.io/media/2026/02/286d2df7-getty-images-kpoucnr4opc-unsplash-1024x655.jpg" class="webfeedsFeaturedVisual wp-post-image wp-stateless-item" alt="A watercolor and ink-style sketch of a busy urban street scene. In the foreground, a person wearing a red hoodie walks away from the viewer toward a row of parked cars. To the right, a red and yellow tram passes by. The background features tall beige buildings, green trees, and utility wires set against a cloudy blue sky." style="display: block; margin: auto; margin-bottom: 20px;max-width: 100%;" link_thumbnail="" decoding="async" loading="lazy" data-image-size="large" data-stateless-media-bucket="cdn.thenewstack.io" data-stateless-media-name="media/2026/02/286d2df7-getty-images-kpoucnr4opc-unsplash-scaled.jpg" /><p>AI is ubiquitous in both the consumer and enterprise sectors. Yet few organizations are realizing AI&#8217;s full potential. Why? AI</p> <p>The post <a href="https://thenewstack.io/context-is-everything-sales-force-data-360/">Unlocking AI&#8217;s full potential: Why context is everything</a> appeared first on <a href="https://thenewstack.io">The New Stack</a>.</p>
    Data 360 turns data into trusted enterprise context—the new currency for accurate, unified, autonomous agents.
  47. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">For any organization that has invested in an internal developer platform, a question inevitably arises: Is it actually working? </span></p> <p><span style="font-style: italic; vertical-align: baseline;">Simply tracking adoption rates won't tell you if your platform is truly delivering value to your developers. This was the challenge faced by John Lewis, a major UK retailer. In our previous articles (parts </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">1</span></a><span style="font-style: italic; vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">2</span></a><span style="font-style: italic; vertical-align: baseline;">) we introduced the John Lewis Digital Platform (JLDP) and how it enabled dozens of product teams to build high-quality software rapidly to power www.johnlewis.com and other critical applications. But how did they know that the platform was actually successful? Traditional product metrics like revenue and sales don’t translate easily to this world. 
When you focus only on whether your tenants use the platform, you don’t understand whether it’s bringing them value.</span></p> <p><span style="font-style: italic; vertical-align: baseline;">In this article, Alex Moss from the John Lewis platform team discusses how they moved beyond simple usage metrics to develop a sophisticated, multi-stage approach to measuring the real value of their platform — a journey that took them from lead-time metrics, to </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">DORA</span></a><span style="font-style: italic; vertical-align: baseline;">, and finally to a "Technical Health" score. Along the way, they explore how the JLDP’s purpose evolved — and its value along with it. - Darren Evans</span></p> <h3><strong style="vertical-align: baseline;">Initial measurement: A focus on platform value</strong></h3> <p><span style="vertical-align: baseline;">In the early days of the platform, understanding its value was actually much easier. This was because the platform was created with a very clear purpose: to enable speed of change. The John Lewis business wanted to create multiple product teams working on several features of johnlewis.com in parallel, and to put those features in front of customers quickly for feedback.</span></p> <p><span style="vertical-align: baseline;">Its origins in the world of the company’s John Lewis Digital online business resulted in it being treated as a product from a very early stage, and therefore integrated with that area’s reporting mechanisms too. Thus, it became normal to link the platform objectives to the online business’s broader goals each quarter and report on measurable key results. This kept the focus on the reasons the platform is important: do improvements to the platform continue to justify using it over seeking out a different one? 
We cannot afford to rest on our laurels!</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_aSY3nPB.max-1000x1000.png" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">The six annual measures reported against every quarter. The specific measures have varied over the years.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In addition to this, in the first few years of the platform’s existence, there were three simple metrics that best indicated how the platform was living up to the rationale for creating it:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Service Creation Lead Time:</strong><span style="vertical-align: baseline;"> How long it took to create a tenancy (the space in which a product team was creating their software)</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Onboarding Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to deploy something into production</span></p> </li> <li><strong style="vertical-align: baseline;">First Customer Lead Time:</strong><span style="vertical-align: baseline;"> How long it took that product team to designate their service as “live to customers”</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_DVTZRKS.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="nnhmb">Some screenshots from the early version of the platform's self-written service catalogue, tracking the three metrics mentioned</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This was then combined with the number of tenants present on the platform into a report, which was displayed as part of an initial home-grown Service Catalogue shown above (which was later </span><a href="https://medium.com/john-lewis-software-engineering/weve-gone-backstage-this-is-how-we-use-it-on-our-digital-platform-b299cd4acb24" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">replaced with Backstage</span></a><span style="vertical-align: baseline;">). This report served two purposes:</span></p> <ol> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A very clear visualization for stakeholders of how much their platform was being adopted, and how fast they were able to get up and running (in particular, “Service Creation” being measured in single-digit hours, in comparison to the weeks teams would traditionally have had to wait). This is important, because in the early days of your product, you need to justify its continued growth and investment.</span></p> </li> <li aria-level="1" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A useful way for the platform team themselves (and stakeholders) to see which teams were taking their time about getting something into production. Is my product actually helping you? 
And if not, what more could we be doing?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Using this as a conversation-starter with our tenants opened doors to rich sources of feedback that could be turned into platform features: When we asked tenants “What’s stopping you from going live?”, they often answered that the product they were building was simply complex. But we also often saw that our own processes were getting in the way. This was important, as we could then do something about it.</span></p> <p><span style="vertical-align: baseline;">The easiest of these barriers for us to overcome were typically technology-related. In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous articles</span></a><span style="vertical-align: baseline;">, we covered two examples, “My team is spending a lot of time writing Terraform to provision PubSub,” and “we’re having trouble learning how to use Kubernetes.” To help, the platform team created “paved roads” to enable self-service provisioning or simplification of Kubernetes, significantly reducing these burdens for teams.</span></p> <p><span style="vertical-align: baseline;">The more significant opportunities to streamline getting new services live were a result of our processes (e.g., security approvals) — and if your platform is empowered to simplify these sorts of organizational functions, then the gains can be extremely beneficial. One such example was the Information Security risk assurance process. Gaining the necessary security sign-offs and producing the required documentation was a necessary but time-consuming task, and - with the rate of change in the business - this was often something that many teams were going through in parallel. Our platform team successfully negotiated a simplified process for its tenants. 
It was able to do this because, for tenants resident on the platform, it could guarantee that security controls were in place and that policies were being followed. This was a direct result of the platform building features to meet those needs, and being able to provide evidence that they were being used — removing the need for the tenant team to either document or invent this themselves. This is still simplifying the developer experience through platform engineering, even though the solution is a less technically-based one.</span></p> <p><span style="vertical-align: baseline;">Sometimes the conversation resulted in feedback that wasn’t even platform-shaped — for example, helping teams understand concepts like feature flagging and dark launching, or software design options to help break dependencies with legacy systems. John Lewis’ platform teams are staffed with experienced engineers, ideally ones with software development experience, which helps a lot with these sorts of interactions.</span></p> <p><span style="vertical-align: baseline;">A key point here is that by measuring how effectively teams were making it into production, we could identify who to talk to and elicit the feedback we needed on what problems needed to be addressed. Simply relying on your tenants thinking of this themselves when they don’t see the bigger picture (or have other priorities) is not nearly as effective.</span></p> <p><span style="vertical-align: baseline;">We then combined the process with more traditional approaches such as sending out a survey or using Net Promoter Scoring to help build popularity for the product. 
The results of these were usually very positive, and could be used to generate mindshare — especially where a product team was comfortable talking about their positive experiences at internal tech conferences and the like.</span></p> <h3><strong style="vertical-align: baseline;">Helping understand team performance</strong></h3> <p><span style="vertical-align: baseline;">A few years into the life of the platform, our emphasis started to shift. There was less of a need to prove the value of the platform — the business and our engineers were happy — so we shifted from “how can we get you into production as quickly as possible” towards “how can we enable you to continue to be as fast, but also reduce friction, in your day-to-day activities.” This led us towards DORA metrics.</span></p> <p><span style="vertical-align: baseline;">Our initial DORA implementations involved mining information from our systems of record for change and incident, complemented by our already-mature observability stack for availability data, as well as pulling events from things like cloud audit logs. We built software to do this and stored it in BigQuery, which enabled us to visualize the data in our home-grown Service Catalogue tool. Later, we moved this into Grafana dashboards instead, which are still in use today:</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_N8Q4Xha.max-1000x1000.png" alt="3"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Looking for patterns in this data led us to discover additional features that would be useful for us to build. 
Two major examples of this were </span><span style="font-style: italic; vertical-align: baseline;">handling change</span><span style="vertical-align: baseline;">, and </span><span style="font-style: italic; vertical-align: baseline;">operational readiness</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">JLP’s service management processes were geared towards handling complex release processes across multiple large systems and/or teams - but we had fundamentally changed our architecture by adopting microservices. This empowered teams to release independently at will, and therefore manage the consequences of failed changes themselves. We used the data we’d collected about change failure rates and frequency of small releases to justify a different approach: allowing tenants to automatically raise and close changes as part of their CI/CD pipelines. After clearing this approach with our Service Management team, we developed a CLI tool that teams could use within their pipelines. This had the additional benefit of allowing us to capture useful data at point of release, rather than scraping more awkward data sources. The automated change “carrot” was very popular and was widely adopted, shifting the approval point left to the pull request rather than later in the release process. This reduced time wastage, change-set size and risk of collisions.</span></p> <p><span style="vertical-align: baseline;">In a similar vein, with more teams operating their own services, the need for a central site-wide operations team was reduced. We could see from our metrics that teams practicing “You Build It, You Run It” had fewer incidents and were resolving them much more quickly. 
We used this as evidence to bring in tooling to help them respond to incidents faster, and decouple the centralized ops teams from those processes — in some cases allowing them to focus on legacy systems, and in others, removing the need for the service entirely (which resulted in significant cost savings, despite the fact that we had more individual product teams on-call). This, and supporting observability and alerting tooling, was all configured through the platform’s paved-road pipeline described in our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">The DORA metrics helped us architecturally as well. Operational data shined a light on the brittleness of third-party and legacy services, thereby driving greater investment into resilience engineering, alternative solutions, and in some cases, causing us to re-evaluate our build vs. buy decisions. </span></p> <h3><strong style="vertical-align: baseline;">Choosing what to measure</strong></h3> <p><span style="vertical-align: baseline;">It’s very important to choose wisely about what to measure. Experts in the field (such as </span><a href="https://www.youtube.com/watch?v=trO_fiTAZeM" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Laura Tacho</span></a><span style="vertical-align: baseline;">) influenced us to avoid vanity metrics and to be cautious about interpreting the ones we do collect. It’s also important for metrics to be meaningful to the target audience, and presented accordingly.</span></p> <p><span style="vertical-align: baseline;">As an example, we communicate about cost and vulnerability with our teams, but the form this takes depends on the intended audience’s role. 
For example, we send new vulnerabilities or spikes in cost directly to product teams’ collaboration channels, because experience has taught us that having our engineers see these vulnerabilities results in a faster response. On the other hand, for compliance reporting or review by team leads, reports are more effective at summarising the areas that need action. Because if we know one thing, it’s that nobody wants to be a leader of the “vulnerabilities outside of policy” dashboard!</span></p> <p><span style="vertical-align: baseline;">Historically, it was not unusual for us to look at measures such as the number or frequency of incidents. But in a world of highly automated response systems, this is a trap, as alerts can be easily duplicated. Focusing too much on a number can drive the wrong behavior — at worst, deliberately avoiding creating an incident at all! Instead, it’s much better to focus on the impact of the parent incident and how long it took to recover. Another example is reporting on the number of vulnerabilities. Imagine you have a package that is used extensively across many components in a distributed system. Disclosing that the package has a vulnerability can create a false sense of scale, when in fact patching the base image deals with the problem swiftly. Instead, it’s better to look at the speed of response against a pre-agreed policy based on severity. This is a much more effective and more reasonable metric for teams to act on, so we see better engagement.</span></p> <p><span style="vertical-align: baseline;">It’s very important that you put across as much context as possible when presenting the data so that the right conclusions can be drawn — especially where those reports are seen by decision-makers. With that in mind, we combined raw metrics we could visualize with user opinion about them. 
This helped to bring that missing context: Is the team that’s suffering from a high change failure rate also struggling with its release processes and batch size? Is the team that’s not addressing vulnerabilities quickly also reporting that they’re spending too much time on feature development and not enough on operational matters? We reached for a different tool — </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DX</span></a><span style="vertical-align: baseline;"> — to help us bring this sort of information to bear. In our </span><a href="https://cloud.google.com/blog/products/application-development/how-john-lewis-partnership-chose-its-monitoring-metrics"><span style="text-decoration: underline; vertical-align: baseline;">follow-up article</span></a><span style="vertical-align: baseline;">, we’ll elaborate on how we did this and how it prompted us to expand the data we collected about our tenants. Stay tuned!</span></p> <p><span style="font-style: italic; vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, start </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">here</span></a><span style="font-style: italic; vertical-align: baseline;">.</span></p></div>
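    As a rough illustration of the DORA-style measures discussed in the article above, the following minimal sketch computes change failure rate, deployment frequency, and mean time to restore from change and incident records. It is not the John Lewis implementation: the `Change` and `Incident` record shapes, the function names, and the sample data are all hypothetical, and a real pipeline would read these records from systems of record rather than from in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Change:
    """Hypothetical change record, e.g. captured at the point of release."""
    service: str
    deployed_at: datetime
    failed: bool  # True if the change had to be rolled back or caused an incident


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record from a system of record."""
    service: str
    opened_at: datetime
    resolved_at: datetime


def change_failure_rate(changes: Iterable[Change]) -> float:
    """Fraction of changes marked as failed (0.0 when there are no changes)."""
    changes = list(changes)
    if not changes:
        return 0.0
    return sum(1 for c in changes if c.failed) / len(changes)


def deployments_per_day(changes: Iterable[Change], period_days: int) -> float:
    """Average number of deployments per day over the reporting period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(list(changes)) / period_days


def mean_time_to_restore(incidents: Iterable[Incident]) -> timedelta:
    """Average time between an incident opening and being resolved."""
    durations = [i.resolved_at - i.opened_at for i in incidents]
    if not durations:
        return timedelta(0)
    return sum(durations, timedelta(0)) / len(durations)


# Illustrative sample data only; not taken from the article.
changes = [
    Change("checkout", datetime(2026, 1, 5, 10, 0), failed=False),
    Change("checkout", datetime(2026, 1, 6, 11, 0), failed=True),
    Change("checkout", datetime(2026, 1, 7, 9, 30), failed=False),
    Change("checkout", datetime(2026, 1, 8, 14, 0), failed=False),
]
incidents = [
    Incident("checkout", datetime(2026, 1, 6, 11, 5), datetime(2026, 1, 6, 12, 5)),
    Incident("checkout", datetime(2026, 1, 9, 8, 0), datetime(2026, 1, 9, 11, 0)),
]

print(f"change failure rate: {change_failure_rate(changes):.2f}")
print(f"deployments per day: {deployments_per_day(changes, period_days=7):.2f}")
print(f"mean time to restore: {mean_time_to_restore(incidents)}")
```

    Keeping each measure as a small pure function over plain records makes it easy to slice by service or team before aggregating, which fits the article's point that raw numbers need context before conclusions are drawn.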
  48. Principal Platform Engineer, John Lewis Partnership

    Wed, 04 Feb 2026 18:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/at-john-lewis-partnership-measuring-developer-platform-value"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;"> of this article, Alex Moss from the John Lewis Partnership covered the metrics that they use to measure the value of their developer platform. Now, let's talk about a crucial aspect of any measurement strategy: choosing the right things to measure. It's easy to get lost in a sea of data or to focus on metrics that look impressive, but don't actually reflect the health of your platform or the experience of your developers. Here, Alex shares the John Lewis philosophy on how to choose meaningful metrics and present them in a way that drives the right conversations and actions, ensuring that the data is always presented with as much context as possible. - Darren Evans</span></p> <p><span style="vertical-align: baseline;">While the solution we detailed in the first half of this article worked very well, relying solely on objective measures comes with a number of traps. They are very easy to misinterpret: either wasting time (“the team is working on another product at the moment”) or not telling the right story (“the incident wasn’t closed properly”). This leads to a scaling challenge: Chatting with a small number of teams to understand a situation is one thing. 
But when you are only one small team trying to build a product, and you need to talk across several dozen teams, it’s not so easy.</span></p> <h3><strong style="vertical-align: baseline;">Collecting engineers’ subjective feedback</strong></h3> <p><span style="vertical-align: baseline;">We needed a way to collate more subjective feedback, ideally in a form that we could visualize and contrast to the objective DORA and other service metrics we held.</span></p> <p><span style="vertical-align: baseline;">Our initial attempt at this involved creating Service Operability Assessments — questionnaires that tenants fill in every quarter. Service Operability Assessments are intended to hold a series of thought-provoking questions aimed at whether the team is following good practices for running their service. This worked well with an experienced facilitator (usually a senior platform engineer) who could ask further probing questions and pull out the key feedback and actions. But as you might imagine, this suffered from scaling challenges. We eventually let this be handled entirely self-service — an imperfect system, since many teams are quite happy to just copy/paste their answers from the previous quarter, which may or may not reflect reality!</span></p> <p><span style="vertical-align: baseline;">We then learned about a tool called </span><a href="https://getdx.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">the DX platform</span></a><span style="vertical-align: baseline;">, which significantly changed how we approached this, and which is now used across our entire Engineering community. It works by surveying individual engineers (rather than teams) for a few minutes every three months. The questions are curated based on DX’s research, backed by the founders of DORA and other similar frameworks. 
We’ve found it very helpful to be able to slice the results in different ways, including looking at areas across whole platforms or deep-diving on particular teams. The latter, in combination with our DORA data, makes for rich conversations. For example, in the DX tool, a team which recently suffered through some highly impactful incidents might also have registered concerns on “Production Debugging,” while another team that saw a marked drop in release frequency might have flagged worries around “Change Confidence” or “Ease of Release.” The platforms team can at this point step in to offer advice or potentially implement new features to help with the issues the teams are seeing.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_J4WNCsj.max-1000x1000.png" alt="1"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The pre-built drivers and reports in DX are tremendously useful, but we also augment them with our own custom queries to help us understand areas of current focus. For example, we measure Customer Satisfaction (CSAT) for the platform and its portal (Backstage), collect data on how long it takes for a newcomer to begin submitting pull requests, and ask newcomers how they found the onboarding process. 
We also recently started assessing engineers’ opinions on the effectiveness of AI coding assistants to help justify further investment in them (instead of just relying on market insight).</span></p> <p><span style="vertical-align: baseline;">An example of where this helped focus our efforts was with documentation, namely, building capabilities into our Backstage developer portal to make it easier for teams to view each other’s docs through pipelines that automatically publish content and make it discoverable.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_gf9lDAw.max-1000x1000.png" alt="2"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Service health - Feature adoption &amp; beyond</strong></h3> <p><span style="vertical-align: baseline;">Outside of the insights we generate from the likes of DORA and DX, we’ve recently begun questioning not only whether the platform itself is valuable, but whether tenants are </span><span style="font-style: italic; vertical-align: baseline;">getting the value they should</span><span style="vertical-align: baseline;"> from it. In other words, we’ve effectively started to measure platform feature adoption.</span></p> <p><span style="vertical-align: baseline;">To do this, we built out what we refer to internally as our Technical Health feature. It takes the form of a custom plugin that integrates with our Backstage Developer Portal, which then queries an in-house API that surfaces data fed from a large number of small jobs that collect information on the things we want to measure. These jobs are independently releasable themselves, which allowed us to scale this up pretty quickly. 
</span></p> <p><span style="vertical-align: baseline;">We currently capture four categories of health measures:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Technical health: </strong><span style="vertical-align: baseline;">We currently have 17 “technical” measures. Examples here include measuring whether teams are using our paved road pipeline and custom Microservice CRD (see previous articles </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">1</span></a><span style="vertical-align: baseline;"> and </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="text-decoration: underline; vertical-align: baseline;">2</span></a><span style="vertical-align: baseline;">) rather than “terraforming” their own resources, following our recommended Kubernetes practices (such as resource sizing, disruption budgets and lifecycle probes), keeping base images up to date, and the like. We also include some “softer” technical measures such as whether they are running pipelines frequently enough to pick up changes (we don’t run this for teams), reviewing their operability assessments, staying on top of git branches, and so on.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Operational readiness:</strong><span style="vertical-align: baseline;"> Then, there are 18 measures relating to operational health — things like whether a pre-flight configuration is in place, whether runbooks are written, docs have been published, and so on. 
This is an evolution of an Operational Readiness checklist from several years ago (back when we used to have separate Delivery and Operations teams, and therefore these sorts of checks were mandatory for “handover”). We tailored this checklist to the specific features of the platform that help teams achieve good operability, rather than keeping it as a generic list. This also serves to help our Service Management team feel confident that the right practices are being followed, thereby eliminating a point of friction when carrying out manual reviews.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Migrations: </strong><span style="vertical-align: baseline;">From time to time, the Platform requires tenants to carry out work to keep up with changes to the platform itself. A classic example of this is getting teams to deal with deprecated Kubernetes API versions. This also includes adoption of different features that we want to drive more forcefully in order to remove the older way of doing things (say, for example, in favour of something more secure). We found that as the Platform grew, we had a long tail of migration work that we needed teams to perform. Tracking that work as health measures provides an easy way for Product Managers and Delivery Leads to prioritize their teams’ workloads.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Broader engineering practices: </strong><span style="vertical-align: baseline;">We recently opened up the feature to allow other teams (in this case, our Engineering leadership) to build in their own measures, such as whether teams are keeping up to date with versions of our design system or whether they’re following broader engineering practices that extend beyond just the JL Digital Platform. 
</span></p> </li> </ol> <p><span style="vertical-align: baseline;">We present this data through aggregated views (like the example shown below), as well as individual tasks and broader leaderboards — all designed to catch the eye of those with influence over a team’s priorities. We’ve found that the desire for an engineer to turn a traffic-light green can be a powerful motivator — far more effective than relying on documentation or announcements.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_paqGoLi.max-1000x1000.png" alt="3"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This technology works through custom plugins that we’ve built for the Backstage Portal. Each “health check” is itself its own microservice (often running as a job) which interrogates the appropriate system to determine whether the measure is met. For example, one microservice checks that a PodDisruptionBudget has been created by querying Kubernetes directly, while another checks whether distroless base images are in use by inspecting container image layers. There’s a template for creating new metrics, which makes it easy for engineers to create new ones — including those outside the platform team themselves. The results are stored in BigQuery, with an API to make Backstage plugin development simpler.</span></p> <p><span style="vertical-align: baseline;">A reality of introducing measures like this is that it drives more work into the product teams. It is important that your culture be ready for this. 
If we had implemented these measures very early in the platform’s life, this would likely have affected how the product was perceived — perhaps as very strict or inhibiting the pace of change with guardrails. This can negatively impact overall adoption. By introducing these later on, we benefited from many tenants who already saw the platform as very valuable, as well as the confidence that we had selected the right measures and could apply them consistently. That said, we did still see a small drop in CSAT for the platform after we started doing this. We try to be considerate about the pace at which we launch each measure to give product teams the time to absorb the work, and we provide a means for teams to suppress the indicators that aren’t relevant to them. For example, a tenant might deliberately choose not to use pod autoscaling for performance reasons, or have a functional reason why they can’t use our Microservice CRD.</span></p> <p><span style="vertical-align: baseline;">The introduction of these sorts of assurance measures on tenant behaviour is a reflection of the maturity of the platform. In the early days, we relied on highly skilled teams to do the right thing whilst going fast. But as time has passed, we’ve witnessed a variety of skills and capabilities, combined with shifts in ownership of services, that pushed us to introduce techniques to drive the right outcomes. This is also due to the platform itself becoming complex — the cognitive load for a new team is much higher than it was, due to all its new features. We needed to put some lights along the edges of our paved road to help teams stay on it!</span></p> <p><span style="vertical-align: baseline;">Throughout this evolution, we’ve continued to report on our key results for the business themselves: Are we still doing what they want of us? 
This has naturally shifted from “go fast, enable teams” (which we largely see as a solved problem, to be honest) towards “do it safely, and manage your technical debt.”</span></p> <h3><strong style="vertical-align: baseline;">Are you being served? Key takeaways</strong></h3> <p><span style="vertical-align: baseline;">Long story short, the question of whether a developer platform has value is complex, and can be answered in many ways. As you embark on building out — and quantifying — your own developer platform, here are a few concluding thoughts to keep in mind:  </span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Measurement is a journey, not a destination:</strong><span style="vertical-align: baseline;"> Start by measuring something meaningful to your stakeholders, but be prepared to adapt as your platform evolves. In the beginning, it’s okay to prioritize further investment in your product, but it’s better to actually measure how the platform is enabling your teams. The things that mattered when you were initially proving out the platform’s viability are unlikely to be what is important several years later when your features are more mature and your priorities have shifted.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Listen to the humans: </strong><span style="vertical-align: baseline;">Don’t assume that just because your platform is being used, it is providing value. The most powerful metrics are often qualitative; engineers wanting to use your tool and CSAT are strong signals, but asking them questions about how they are using it is a better way to gain insight into how you can improve it. 
It is hard to figure out what’s working (and what isn’t) through measurement alone.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Data is for enabling, not just reporting:</strong><span style="vertical-align: baseline;"> Use your insights to help teams improve, not just to show graphs to leadership. Further, be transparent about what specific data led you to act. For example, when you see a dip in release frequency for a specific team, use that data to start a conversation about potential roadblocks rather than simply flagging it as a problem. By doing this, you build the trust and goodwill with both leadership and your tenants to keep moving the platform forward. </span></p> </li> </ol> <hr/> <p><sub><span style="font-style: italic; vertical-align: baseline;">The evolution of the John Lewis Partnership’s measurement strategy serves as a compelling case study. By transitioning from basic lead-time tracking to a holistic model — blending DORA metrics with qualitative developer feedback — they demonstrated that true platform success is defined by the genuine value it delivers, not merely by adoption rates.</span></sub></p> <p><sub><span style="font-style: italic; vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles: Using Platform Engineering to simplify the developer experience - </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part one</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; 
vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="font-style: italic; vertical-align: baseline;"> and</span><span style="font-style: italic; vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="font-style: italic; vertical-align: baseline;">. We also recommend reading about </span><a href="https://cloud.google.com/blog/products/application-development/introducing-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">, </span><span style="font-style: italic; vertical-align: baseline;">our foundational tool for managing application-centric governance across your organization.</span></sub></p></div>
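<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The Technical Health roll-up described in this article (independent checks, per-team suppression of measures that aren’t relevant, and a traffic-light view) can be sketched roughly as follows. This is a minimal illustration and not the John Lewis implementation: the <code>CheckResult</code> type, the measure names, and the 50%/90% thresholds are all assumptions made for the example.</span></p></div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one health check for one team (hypothetical shape)."""
    team: str
    measure: str
    passed: bool
    suppressed: bool = False  # team has opted out of a measure that is not relevant to it


def traffic_light(results, amber_at=0.5, green_at=0.9):
    """Roll one team's results up into 'green', 'amber' or 'red'.

    The thresholds are illustrative assumptions, not values from the article.
    Suppressed measures are ignored entirely.
    """
    active = [r for r in results if not r.suppressed]
    if not active:
        return "green"  # nothing applicable, so nothing to fix
    ratio = sum(r.passed for r in active) / len(active)
    if ratio >= green_at:
        return "green"
    if ratio >= amber_at:
        return "amber"
    return "red"


def rollup(results):
    """Group results by team and compute a traffic light for each team."""
    by_team = {}
    for r in results:
        by_team.setdefault(r.team, []).append(r)
    return {team: traffic_light(rs) for team, rs in sorted(by_team.items())}


sample = [
    CheckResult("checkout", "pod-disruption-budget", passed=True),
    CheckResult("checkout", "distroless-base-image", passed=False),
    CheckResult("checkout", "pod-autoscaling", passed=False, suppressed=True),
    CheckResult("search", "pod-disruption-budget", passed=True),
    CheckResult("search", "distroless-base-image", passed=True),
]
summary = rollup(sample)
print(summary)
```

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Keeping suppression as a flag on the result, rather than deleting the result, means a view built on this data could still show which measures a team has opted out of.</span></p></div>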
  49. 10X Lead, delta Team, Google Cloud Consulting

    Thu, 08 Jan 2026 17:00:00 -0000

    <div class="block-paragraph_advanced"><p class="p1">FINRA, the Financial Industry Regulatory Authority, consistently seeks to achieve the highest standards in its technology practices. To elevate its software development lifecycle, FINRA — which oversees member broker-dealers — engaged Google consultants to help apply a metrics-driven methodology to its engineering practices.</p> <p class="p1"><a href="https://dora.dev/" rel="noopener" target="_blank">DORA</a> is a popular framework <span style="vertical-align: baseline;">for helping organizations improve software delivery performance through capabilities that can be measured by key metrics. These include </span>deployment frequency, change lead time, change failure rate, failed deployment recovery time, and rework.</p> <p class="p1">While FINRA had begun laying the groundwork to adopt DORA internally, the organization recognized an opportunity to accelerate implementation by tapping Google's firsthand experience.</p> <p class="p1">Google conducted a discovery effort alongside technology leaders to identify opportunities for improvement. The recommendation that followed included increasing the existing focus on continuous improvement, adopting a user-centric approach to developing software and further enabling a generative culture within the department.</p> <p class="p1">The implementation itself was deliberately flexible. Rather than recommending a one-size-fits-all approach, Google helped FINRA tailor its actions to individual team objectives. 
Teams prioritizing product value concentrated on lead time and deployment frequency metrics, while teams focused on stability concentrated on change failure rates and<span style="vertical-align: baseline;"> failed deployment recovery time</span>.</p> <p class="p1">Over the first year of implementation, engineering teams demonstrated continuous improvement across DORA capabilities, achieving a 9% per-developer productivity gain and reporting directionally positive developer experience feedback.</p> <p class="p1">Sprint velocities also improved by 5%, enabling smaller engineering teams to deliver greater incremental product value to the business. Beyond raw metrics, teams also reported heightened transparency around delivery performance and appreciation for a standardized methodology.</p> <p class="p1">Looking ahead, FINRA is maturing its DORA practice by providing more granular metrics tied to high-level DORA measurements, increasing emphasis on developer experience and correlating product metrics with software delivery performance indicators.</p> <p class="p1"><em>Want to discover what AI can do for governments, nonprofits, and other public sector organizations? Register to attend our upcoming <a href="https://cloudonair.withgoogle.com/events/gemini-for-government-your-front-door-for-mission-ai" rel="noopener" target="_blank">Gemini for Government webinar on February 5</a>, where we will dive deeper into the transformative technology powering the next wave of innovation across the public sector.</em></p></div>
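<div class="block-paragraph_advanced"><p class="p1">As a rough illustration of the metrics named above, the sketch below derives deployment frequency, change lead time, change failure rate, and failed deployment recovery time from a list of deployment records. It is an assumption-laden example, not FINRA's or Google's tooling: the <code>Deployment</code> record and its field names are hypothetical, medians are used as one common summary choice, and the fifth metric (rework) is left out because it needs data this record shape does not carry.</p></div>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it need remediation?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments, window_days):
    """Summarize four delivery metrics over a reporting window of window_days."""
    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployment_frequency_per_day": n / window_days,
        "median_change_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / n if n else None,
        "median_recovery_time": median(recoveries) if recoveries else None,
    }


day = datetime(2026, 1, 5, 9, 0)
history = [
    Deployment(day, day + timedelta(hours=1)),
    Deployment(day + timedelta(days=1), day + timedelta(days=1, hours=2)),
    Deployment(
        day + timedelta(days=2),
        day + timedelta(days=2, hours=4),
        failed=True,
        recovered_at=day + timedelta(days=2, hours=4, minutes=30),
    ),
]
report = dora_summary(history, window_days=3)
print(report)
```

<div class="block-paragraph_advanced"><p class="p1">A team focused on product value would watch the first two entries of such a summary, while a team focused on stability would watch the last two, which mirrors the tailoring described above.</p></div>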
  50. Senior Product Marketing Manager

    Tue, 09 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The </span><a href="https://cloud.google.com/blog/products/ai-machine-learning/announcing-the-2025-dora-report"><span style="text-decoration: underline; vertical-align: baseline;">2025 State of AI-assisted Software Development report</span></a><span style="vertical-align: baseline;"> revealed a critical truth: AI is an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.</span></p> <p><span style="vertical-align: baseline;">While AI adoption is now near-universal, with 90% of developers using it in their daily workflows, success is not guaranteed. Our cluster analysis of nearly 5,000 technology professionals reveals significant variation in team performance: Not everyone experiences the same outcomes from adopting AI. </span></p> <p><span style="vertical-align: baseline;">From this disparity, we can conclude that how they are using AI is a critical factor. We wanted to understand the particular capabilities and conditions that enable teams to achieve positive outcomes, leading us to develop the </span><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">DORA AI Capabilities Model report</span></a><span style="vertical-align: baseline;">. </span></p> <p><span style="vertical-align: baseline;">This companion guide to the 2025 DORA Report is designed to help you navigate our new reality. It provides actionable strategies, implementation tactics, and measurement frameworks to help technology leaders build an environment where AI thrives.</span></p> <h3><strong style="vertical-align: baseline;">Seven capabilities that amplify success</strong></h3> <p><span style="vertical-align: baseline;">Successfully using AI requires cultivating your technical and cultural environment. 
From the same set of respondents who participated in the 2025 DORA survey, we identified seven foundational capabilities that are proven to amplify the positive impact of AI on organizational performance:</span></p> <ol> <li role="presentation"><strong style="vertical-align: baseline;">Clear and communicated AI stance</strong><span style="vertical-align: baseline;">: Ambiguity creates risk. A clear policy provides the psychological safety developers need to experiment effectively.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Healthy data ecosystems</strong><span style="vertical-align: baseline;">: AI is only as good as the data it learns from. Investing in high-quality, accessible, and unified internal data significantly amplifies AI's benefits.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">AI-accessible internal data</strong><span style="vertical-align: baseline;">: This involves "context engineering," moving beyond simple prompts to securely connect AI tools to your internal documentation and codebases.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Strong version control practices</strong><span style="vertical-align: baseline;">: As AI increases the volume and velocity of code generation, version control becomes your critical safety net. Frequent commits and robust rollback capabilities are essential for maintaining stability in an AI-assisted world.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Working in small batches</strong><span style="vertical-align: baseline;">: AI can easily generate massive blocks of code, which are hard to review and test. 
Enforcing the discipline of small batches counteracts this risk, ensuring that speed translates to product performance rather than instability.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">User-centric focus</strong><span style="vertical-align: baseline;">: Speed is irrelevant if you are moving in the wrong direction. Adopting AI tools can actually harm teams that lack a user-centric focus. Keeping user needs as your North Star is essential for guiding AI-assisted development.</span></li> <li><strong style="vertical-align: baseline;">Quality internal platforms</strong><span style="vertical-align: baseline;">: A platform provides the automated, secure "paved roads" that allow AI benefits to scale across the organization. It prevents individual productivity gains from being lost to downstream bottlenecks.</span></li> </ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/dora-ai-capabilities-model.max-1000x1000.jpg" alt="dora-ai-capabilities-model"> <figcaption class="article-image__caption "><p data-block-key="y4u85">The DORA AI Capabilities Model shows which capabilities amplify the effect of AI adoption on</p><p data-block-key="7k909">specific outcomes</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where to start: Assessing your team</strong></h3> <p><span style="vertical-align: baseline;">Every organization starts their AI journey differently. To help you prioritize, this report introduces seven distinct team archetypes derived from our cluster analysis. 
These profiles range from "harmonious high-achievers," who excel in both performance and well-being, to teams facing "foundational challenges" or those stuck in a "legacy bottleneck," where unstable systems undermine morale.</span></p> <p><span style="vertical-align: baseline;">Identifying the profile that best matches your team can help pinpoint the most impactful interventions. For example, a "high impact, low cadence" team might prioritize automation to improve stability, while a team "constrained by process" might focus on reducing friction through a better AI stance.</span></p> <h3><strong style="vertical-align: baseline;">Digging deeper with Value Stream Mapping</strong></h3> <p><span style="vertical-align: baseline;">Once you understand your team's profile, how do you direct your efforts? The report includes a step-by-step facilitation guide for running a Value Stream Mapping (VSM) exercise.</span></p> <p><span style="vertical-align: baseline;">VSM acts as an AI force multiplier. By visualizing your flow from idea to customer, you can identify where work waits and where friction exists. This ensures that the efficiency gains from AI aren't just creating local optimizations that pile up work downstream, but are instead channeled into solving system-level constraints.</span></p> <h3><strong style="vertical-align: baseline;">Get better at getting better</strong></h3> <p><span style="vertical-align: baseline;">AI adoption is an organizational transformation. 
The greatest returns come not from the tools themselves, but from investing in the foundational systems that enable them.</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/resources/content/2025-dora-ai-capabilities-model-report"><span style="text-decoration: underline; vertical-align: baseline;">Download the full report</span></a></p> </li> <li><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> </ul></div>
  51. Practice Lead, SRE

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">When was the last time you </span><span style="font-style: italic; vertical-align: baseline;">knew — </span><span style="vertical-align: baseline;">not just </span><span style="font-style: italic; vertical-align: baseline;">hoped</span><span style="vertical-align: baseline;"> — that your disaster recovery plan would work perfectly?</span></p> <p><span style="vertical-align: baseline;">For most of us, the answer is unclear. Sure, you may have a DR plan, a meticulously crafted document stored in a wiki or a shared drive, that gets dusted off for compliance audits or the occasional tabletop drill. You assume its procedures are correct, its contact lists are current, and its dependencies are fully mapped, and you certainly </span><span style="font-style: italic; vertical-align: baseline;">hope</span><span style="vertical-align: baseline;"> it works.</span></p> <p><span style="vertical-align: baseline;">But </span><a href="https://sre.google/prodverbs/?slide=10" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">hope is not a strategy</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">Why wouldn’t it work? One problem is that systems are rarely static anymore. In a world where you deploy new microservices dozens of times per day, make constant configuration changes, and maintain an ever-growing web of third-party API dependencies, the DR plan you wrote last quarter is probably just as useful as one from 10 years ago. </span></p> <p><span style="vertical-align: baseline;">And if the failover does work, will it work well enough to meet the promises you've made to your customers (or board of directors or regulators)? 
When a key component fails, could you still meet your availability and latency targets, a.k.a. your Service Level Objectives (SLOs)?</span></p> <p><span style="vertical-align: baseline;">So, how do you close this gap between your current aspirational DR plan and a DR plan that you actually have confidence in? The answer isn't to write more documents or run more theatrical drills. The answer is to stop </span><span style="font-style: italic; vertical-align: baseline;">assuming</span><span style="vertical-align: baseline;"> and start </span><span style="font-style: italic; vertical-align: baseline;">proving</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">This is where chaos engineering comes in. Unlike what the name might imply, chaos engineering isn’t a tool for recklessly breaking things. Instead, it’s a framework that provides data-driven confidence in your SLOs under stress. By running controlled experiments that simulate real-world disasters like a database failover or a regional outage, you can quantitatively measure the impact of those failures on your systems’ performance. Chaos engineering is how you transform your DR hypotheses into a proven method to ensure resilience. By validating your plan through experimentation, you create tangible evidence, verifying that your plan will safeguard your infrastructure and keep your promises to customers.</span></p> <h3><strong style="vertical-align: baseline;">Demystifying chaos engineering</strong></h3> <p><span style="vertical-align: baseline;">In a nutshell, chaos engineering is the practice of running controlled, scientific experiments to find weaknesses in your system before they cause a real outage. </span></p> <p><span style="vertical-align: baseline;">At its core, it’s about building confidence in your system’s resilience. 
The process starts with understanding your system's </span><strong style="vertical-align: baseline;">steady state</strong><span style="vertical-align: baseline;">, which is its normal, measurable, and healthy output. You can't know the true impact of a failure without first defining what "good" looks like. This understanding allows you to form a clear, testable </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">: a statement of belief that your system's steady state will persist even when a specific, turbulent condition is introduced.</span></p> <p><span style="vertical-align: baseline;">To test this hypothesis, you then execute a controlled </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;">, which is a precise and targeted failure injected into the system. This isn't random mischief; it's a specific simulation of real-world failures, such as consuming all CPU on a host (</span><strong style="vertical-align: baseline;">resource exhaustion</strong><span style="vertical-align: baseline;">), adding network latency (</span><strong style="vertical-align: baseline;">network failure</strong><span style="vertical-align: baseline;">), or terminating a virtual machine (</span><strong style="vertical-align: baseline;">state failure</strong><span style="vertical-align: baseline;">). While this action is running, automated </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> act as your scientific instruments, continuously monitoring the system's state to measure the effect. 
</span></p> <p><span style="vertical-align: baseline;">Together, these components form a complete scientific loop: you use a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;"> to predict resilience, run an experiment by applying an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> to simulate adversity, and use </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> to measure the impact, turning uncertainty into hard data.</span></p> <h3><strong style="vertical-align: baseline;">Using chaos to validate disaster recovery plans</strong></h3> <p><span style="vertical-align: baseline;">Now that you understand the building blocks of a chaos experiment, you can build the bridge to your ultimate goal: transforming your DR plan from a document of hope into an evidence-based procedure. The key is to stop seeing your DR plan as a set of instructions and start seeing it for what it truly is: a collection of unproven hypotheses.</span></p> <p><span style="vertical-align: baseline;">When you think about it, every significant statement in your DR document is a claim waiting to be tested. When your plan states, </span><span style="font-style: italic; vertical-align: baseline;">"The database will failover to the replica in under 5 minutes,"</span><span style="vertical-align: baseline;"> that isn't a fact, it's a </span><strong style="vertical-align: baseline;">hypothesis</strong><span style="vertical-align: baseline;">. When it says, </span><span style="font-style: italic; vertical-align: baseline;">"In the event of a regional outage, traffic will be successfully rerouted to the secondary region,"</span><span style="vertical-align: baseline;"> that's another hypothesis. 
Your DR plan is filled with these critical assumptions about how your system </span><span style="font-style: italic; vertical-align: baseline;">should</span><span style="vertical-align: baseline;"> behave under duress. Until you test them, they remain nothing more than educated guesses.</span></p> <p><span style="vertical-align: baseline;">Chaos experiments are the ultimate validation tools, </span><strong style="vertical-align: baseline;">live-fire drills</strong><span style="vertical-align: baseline;"> that put your DR hypotheses to a real, empirical test. Instead of just talking through a scenario, you use controlled </span><strong style="vertical-align: baseline;">actions</strong><span style="vertical-align: baseline;"> to safely and precisely simulate the disaster. You're no longer asking "what if?"; you're actively measuring "what happens when."</span></p> <p><span style="vertical-align: baseline;">Imagine you have a DR plan for a regional outage. When you adopt chaos engineering, you break that plan down into a hypothesis and an experiment. 
For example:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The hypothesis:</strong><span style="vertical-align: baseline;"> "In case our primary region </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;"> becomes unreachable, the load balancers will failover all traffic to </span><code style="vertical-align: baseline;">us-east1</code><span style="vertical-align: baseline;"> within 3 minutes, with an error rate below 1%."</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The chaos experiment:</strong><span style="vertical-align: baseline;"> Run an </span><strong style="vertical-align: baseline;">action</strong><span style="vertical-align: baseline;"> that simulates a regional outage by injecting a "blackhole" that drops all network traffic to and from </span><code style="vertical-align: baseline;">us-central1</code><span style="vertical-align: baseline;"> for a limited time. Your </span><strong style="vertical-align: baseline;">probes</strong><span style="vertical-align: baseline;"> then measure the actual failover time and error rates to validate the hypothesis.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">In other words, by applying the chaos engineering methodology, you systematically move through your DR plan, turning each assumption into a proven fact. You're not just testing your plan; you're forging it in a controlled fire.</span></p> <h3><strong style="vertical-align: baseline;">Connecting chaos readiness to your SLOs</strong></h3> <p><span style="vertical-align: baseline;">Beyond simply proving system availability, chaos engineering builds trust in your reliability metrics, ensuring that you meet your SLOs even when services become unavailable. 
An SLO is a specific target level for your service's performance, measured over a specified period, that reflects the user's experience. SLOs aren't just internal goals; they are the bedrock of customer trust and the foundation of your contractual service level agreements (SLAs).</span></p> <p><span style="vertical-align: baseline;">A traditional DR drill might get a "pass" because the backup system came online. But what if it took 20 minutes to fail over, during which every user saw errors? What if the backup region was under-provisioned, and performance became so slow that the service was unusable? From a technical perspective, you "recovered." But from a customer's perspective, you were down.</span></p> <p><span style="vertical-align: baseline;">A chaos experiment, however, can help you answer a critical question: </span><strong style="vertical-align: baseline;">"During a failover, did we still meet our SLOs?" </strong><span style="vertical-align: baseline;">Because your probes are constantly measuring performance against your SLOs, you get the full picture. You don't just see that the database failed over; you see that it took 7 minutes, during which your latency SLO was breached and your </span><a href="https://sre.google/sre-book/embracing-risk/#:~:text=Forming%20Your%20Error%20Budget,new%20releases%20can%20be%20pushed." rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">error budget</span></a><span style="vertical-align: baseline;"> was completely burned. This is the crucial, game-changing insight. It shifts the entire goal from simple disaster recovery to </span><strong style="vertical-align: baseline;">SLO preservation</strong><span style="vertical-align: baseline;">, which is what actually determines whether a failure was a minor hiccup or a major business-impacting incident. It also provides the data necessary to set goals for system improvement. 
So the next time you run this experiment, you can measure if and how much your system resilience has improved, and ultimately if you can maintain your SLO during the disaster event.</span></p> <h3><strong style="vertical-align: baseline;">Build a culture of confidence</strong></h3> <p><span style="vertical-align: baseline;">The journey to resilience doesn't start by simulating a full regional failover. It starts with a single, small experiment. The goal is not to boil the ocean; it's to build momentum. Test one timeout, one retry mechanism, or one graceful error message.</span></p> <p><span style="vertical-align: baseline;">The biggest win from your first successful experiment won't be the technical data you gather. It will be the confidence you build. When your team sees that they can safely inject failure, learn from it, and improve the system, their entire relationship with failure changes. Fear is replaced by curiosity. That confidence is the catalyst for building a true, enduring culture of resilience. To learn more and get started with chaos engineering, check out </span><a href="https://cloud.google.com/blog/products/devops-sre/getting-started-with-chaos-engineering?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">this blog</span></a><span style="vertical-align: baseline;"> and </span><a href="https://sre.google/prodcast/#season3-episode12" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">this podcast</span></a><span style="vertical-align: baseline;">. And if you’re ready to get started, but unsure how, reach out to Google Cloud professional services to discuss how we can help.</span></p></div>
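As a rough illustration of the regional-outage example above, the following Python sketch checks probe measurements against the stated hypothesis (failover within 3 minutes, error rate below 1%). The `ProbeSample` type, the `evaluate_hypothesis` function, and the sample data are hypothetical and not part of any Google Cloud API; the sketch also assumes that failover time means the delay until the first successful response from the secondary region, and that error rate is measured across the whole experiment window. A real experiment would pull these measurements from your monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeSample:
    """One probe measurement (hypothetical shape, for illustration only)."""
    seconds_since_action: float  # time elapsed since the fault was injected
    serving_region: str          # region that answered the probe request
    ok: bool                     # True if the request succeeded


def evaluate_hypothesis(samples, secondary_region="us-east1",
                        max_failover_seconds=180.0, max_error_rate=0.01):
    """Compare probe data with the failover hypothesis.

    Assumptions of this sketch: failover time is the delay until the first
    successful response from the secondary region, and error rate is the
    share of failed probes across the whole experiment window.
    """
    if not samples:
        raise ValueError("no probe samples collected")

    # Earliest successful probe that was answered by the secondary region.
    secondary_hits = [s.seconds_since_action for s in samples
                      if s.ok and s.serving_region == secondary_region]
    failover_seconds = min(secondary_hits) if secondary_hits else None

    # Fraction of all probes that failed during the experiment.
    error_rate = sum(1 for s in samples if not s.ok) / len(samples)

    return {
        "failover_seconds": failover_seconds,
        "error_rate": error_rate,
        "failover_ok": failover_seconds is not None
                       and failover_seconds <= max_failover_seconds,
        "error_rate_ok": error_rate < max_error_rate,
    }


# Made-up data: one probe per second for 400 seconds. The first 5 probes
# fail against us-central1; from second 5 onward us-east1 answers.
samples = [
    ProbeSample(float(t), "us-central1" if t < 5 else "us-east1", t >= 5)
    for t in range(400)
]
result = evaluate_hypothesis(samples)
print(result)
```

With this made-up data, the failover check passes but the error-rate check does not (5 failed probes out of 400 is 1.25%), which mirrors the point above: a drill can "recover" and still miss the SLO.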
  52. Group Product Manager, Google Cloud

    Mon, 08 Dec 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Earlier this year, we unveiled a big investment in platform and developer team productivity, with the launch of </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">, </span><span style="vertical-align: baseline;">helping them streamline </span><span style="vertical-align: baseline;">the design and deployment of cloud application infrastructure, while ensuring applications are secure, reliable, and aligned with best practices</span><span style="vertical-align: baseline;">. And today, Application Design Center is generally available.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">We built Application Design Center to put applications at the center of your cloud experience, with a visual, canvas-style and AI-powered approach to design and modify Terraform-backed application templates. It also offers full lifecycle management that’s aligned with DevOps best practices across application design and deployment.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Design Center is a core component of our </span><a href="https://docs.cloud.google.com/hub/docs/application-centric-google-cloud"><span style="text-decoration: underline; vertical-align: baseline;">application-centric cloud experience</span></a><span style="vertical-align: baseline;">. When you use Application Design Center to design and deploy your application infrastructure, your applications are easily discoverable, observable, and manageable. 
Application Design Center works in concert with </span><a href="https://cloud.google.com/app-hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;"> to automatically register application deployments, enabling a unified view and control plane for your application portfolio, and </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;">, to provide operational insights for your applications.</span></p> <p style="text-align: justify; padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“Google Application Design Center is a valuable enabler for Platform Engineering, providing a structured approach to harmonizing resource creation in Google Cloud Platform. By aligning tools, processes, and technologies, it streamlines workflows, reducing friction between development, operations, and other teams. This harmonization enhances collaboration, accelerates delivery, and ensures consistency across Google Cloud environments.”</span><span style="vertical-align: baseline;"> - </span><strong style="vertical-align: baseline;">Ervis Duraj, Principal Engineer,</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">MediaMarktSaturn Technology</strong></p> <h3><span style="vertical-align: baseline;">The gateway to an app-centric cloud</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">Our goal with Application Design Center is for you to innovate more, and administer less. It consists of </span><span style="vertical-align: baseline;">four key elements to help you minimize administrative overhead and maximize efficiency, so you can design and deploy applications with integrated best practices and essential guardrails. 
Let’s take a closer look.</span></p> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">1. </span><strong style="vertical-align: baseline;">Terraform </strong><a href="https://docs.cloud.google.com/application-design-center/docs/supported-resources"><strong style="text-decoration: underline; vertical-align: baseline;">components</strong></a><strong style="vertical-align: baseline;"> and </strong><a href="https://docs.cloud.google.com/application-design-center/docs/design-application-templates"><strong style="text-decoration: underline; vertical-align: baseline;">application templates</strong></a><strong style="vertical-align: baseline;"> <br/></strong><span style="vertical-align: baseline;">Develop applications faster with our growing library of opinionated application templates. These provide well-architected patterns and pre-built components, including innovative "AI inference templates" to help you leverage AI to create dynamic and intelligent application foundations. 
As an example, at launch, Application Design Center provides opinionated templates for Google Kubernetes Engine (GKE) clusters (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">) to run AI inference workloads using a variety of LLM models, as well as for enterprise-grade production clusters or single-region web app clusters. </span></p> <p><span style="vertical-align: baseline;">You can also </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">ingest and manage your existing Terraform configurations</span></a><span style="vertical-align: baseline;"> (“Bring your own Terraform”) directly from Git repositories. 
Once imported, you can use Application Design Center to design with your own Terraform, or in combination with Google-provided Terraform, to create standardized, opinionated infrastructure patterns for sharing and reuse across your application teams.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3-_Catalog_Share.gif" alt="3- Catalog Share"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. </span><strong style="vertical-align: baseline;">AI-powered design for rapid application design and prototyping <br/></strong><span style="vertical-align: baseline;">Application Design Center integrates with Google's </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/design-application"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Design Agent</span></a><span style="vertical-align: baseline;">, empowering you to design actual, deployable application infrastructure templates on Google Cloud that you can export as Terraform infrastructure-as-code. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">With Gemini Cloud Assist, you can describe your application design intent using natural language. In return, Gemini interactively generates multi-product application template suggestions, complete with visual architecture diagrams and summarized benefits. You can then refine these proposals through multi-turn reasoning or by directly manipulating the architecture within the Application Design Center canvas. 
</span></p> <p><span style="vertical-align: baseline;">Additionally, all designs that you create with Gemini are automatically observable, optimizable, and enabled for troubleshooting assistance during runtime, thanks to their tight integration with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?hl=en"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1-Components_and_templates.gif" alt="1-Components and templates"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">3. </span><strong style="vertical-align: baseline;">A secure, sharable catalog of application templates with full lifecycle management<br/></strong><span style="vertical-align: baseline;">Platform admins can curate a collection of application templates built from Google's best-practice components. This provides developers a trusted, self-service experience from which they can quickly discover and deploy compliant applications. Tight integration with </span><a href="https://docs.cloud.google.com/hub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Hub</span></a><span style="vertical-align: baseline;"> transforms these governed templates into a live operational command center, complete with unified visibility into the health and deployment status of the resulting applications. 
This closes the critical loop between design and runtime, so that your production environments reflect your organization’s approved architectural standards.</span></p> <p><span style="vertical-align: baseline;">Also, Application Design Center’s robust </span><a href="https://docs.cloud.google.com/application-design-center/docs/manage-application-instances#create-application-revision"><span style="text-decoration: underline; vertical-align: baseline;">application template revisions</span></a><span style="vertical-align: baseline;"> serve as an immutable audit trail. Application Design Center automatically detects and flags configuration drift between your intended designs and deployed applications, so that developers can remediate unauthorized changes or safely push approved configuration updates. This helps ensure continuous state consistency and compliance from Day 1 and through the subsequent evolution of your application.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2-Design_Agent.gif" alt="2-Design Agent"> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">4. </span><strong style="vertical-align: baseline;">GitOps integration that automates developers’ day-to-day software design lifecycle tasks <br/></strong><span style="vertical-align: baseline;">By integrating Application Design Center into existing CI/CD workflows, platform teams empower developers to own the complete software delivery lifecycle right from their IDE. 
Developers can leverage compliant application </span><span style="font-style: italic; vertical-align: baseline;">and</span><span style="vertical-align: baseline;"> infrastructure (IaC) code using Application Design Center application templates. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Further, every infrastructure decision made through Application Design Center is committed to code, versioned, and auditable. Specifically, developers can download the application IaC template from Application Design Center and import it into their app repos (the single source of truth), clone their repo, and edit the Terraform directly in their local IDEs. Any modifications go through a Git pull request for review. Once approved, this automatically triggers the existing CI/CD setup to build, test, and deploy both app and infra changes in lockstep. This unified approach minimizes friction, enforcing "golden paths" and providing an end-to-end automated pathway from a line of code in the IDE to a fully deployed change in production. </span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">What's new since preview</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">This GA launch is packed with features that users have been asking for. 
We’re excited to share powerful new capabilities: enterprise-grade governance and security with </span><a href="https://cloud.google.com/sdk/gcloud/reference/design-center"><span style="text-decoration: underline; vertical-align: baseline;">public APIs and gcloud CLI support</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/set-up-secure-perimeter"><span style="text-decoration: underline; vertical-align: baseline;">full compatibility with VPC service controls</span></a><span style="vertical-align: baseline;">; </span><a href="https://docs.cloud.google.com/application-design-center/docs/import-components"><span style="text-decoration: underline; vertical-align: baseline;">bring your own Terraform</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/download-and-deploy#export_terraform_code"><span style="text-decoration: underline; vertical-align: baseline;">GitOps support</span></a><span style="vertical-align: baseline;"> for integration with your existing application patterns and automation pipelines; agentic application patterns using GKE templates (</span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-standard-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Standard</span></a><span style="vertical-align: baseline;">, </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-autopilot-cluster"><span style="text-decoration: underline; vertical-align: baseline;">Autopilot</span></a><span style="vertical-align: baseline;"> and </span><a href="https://docs.cloud.google.com/application-design-center/docs/configure-gke-node-pool"><span style="text-decoration: underline; vertical-align: baseline;">NodePool</span></a><span style="vertical-align: baseline;">); and finally, a simplified onboarding experience with 
</span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">app-managed project support</span></a><span style="vertical-align: baseline;">, making Application Design Center an AI-powered engine for your applications on Google Cloud.</span></p> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Get started today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">To help you get started, Google provides a growing library of curated Google application templates built by experts. These templates combine multiple Google Cloud products and best practices to serve common use cases, which you can configure for deployment, and view as infrastructure as code in-line. Platform teams can then create and securely share the catalogs and collaborate with teammates on designs and self-service deployment for developers. For enterprises with existing Terraform patterns and assets, Application Design Center interoperates by enabling their import and reuse within its native design and configuration experience.</span></p> <p><span style="vertical-align: baseline;">Ready to experience the power of </span><a href="https://docs.cloud.google.com/application-design-center/docs/setup"><span style="text-decoration: underline; vertical-align: baseline;">Application Design Center</span></a><span style="vertical-align: baseline;">? </span><span style="vertical-align: baseline;">You can learn more about ADC and get started building in minutes using the </span><a href="https://docs.cloud.google.com/application-design-center/docs/quickstart-create-template"><span style="text-decoration: underline; vertical-align: baseline;">quickstart</span></a><span style="vertical-align: baseline;">. 
</span><span style="vertical-align: baseline;">You can start building your first AI-powered application template in minutes, </span><a href="https://cloud.google.com/products/application-design-center/pricing"><span style="text-decoration: underline; vertical-align: baseline;">free of cost</span></a><span style="vertical-align: baseline;">, and quickly deploy applications with working code. For deeper insights, explore the comprehensive public documentation </span><a href="https://docs.cloud.google.com/application-design-center/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">. We can't wait to see how you innovate with the Application Design Center!</span></p></div>
  53. Senior Product Manager

    Wed, 03 Dec 2025 23:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note</strong><span style="font-style: italic; vertical-align: baseline;">: This blog was updated on Dec. 4, 5, 7, and 12, 2025, with additional guidance on Cloud Armor WAF rule syntax, and WAF enforcement across App Engine Standard, Cloud Functions, and Cloud Run.</span></p> <p><span style="vertical-align: baseline;">Earlier today, Meta and Vercel publicly disclosed two vulnerabilities that expose services built using the popular open-source frameworks </span><strong style="vertical-align: baseline;">React</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">Server Components</strong><span style="vertical-align: baseline;"> (</span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</strong></a><span style="vertical-align: baseline;">) and </span><strong style="vertical-align: baseline;">Next.js </strong><span style="vertical-align: baseline;">to remote code execution risks when used for some server-side use cases. At Google Cloud, we understand the severity of these vulnerabilities, also known as </span><a href="https://react2shell.com/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">React2Shell</span></a><span style="vertical-align: baseline;">, and our security teams have shared their recommendations to help our customers take immediate, decisive action to secure their applications.</span></p> <h3><span style="vertical-align: baseline;">Vulnerability background</span></h3> <p><span style="vertical-align: baseline;">The </span><strong style="vertical-align: baseline;">React Server Components framework</strong><span style="vertical-align: baseline;"> is commonly used for building user interfaces. On Dec. 
3, 2025, </span><a href="http://cve.org" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE.org</span></a><span style="vertical-align: baseline;"> assigned this vulnerability the identifier </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-55182" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-55182</span></a><span style="vertical-align: baseline;">. The official Common Vulnerability Scoring System (CVSS) base score is 10.0, which is rated Critical. </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: React 19.0, 19.1.0, 19.1.1, and 19.2.0</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> in React 19.2.1</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/facebook/react/commit/7dc903cd29dac55efb4424853fd0442fef3a8700</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components" rel="noopener" target="_blank"><span style="text-decoration: underline; 
vertical-align: baseline;">https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Next.js is a web development framework that depends on React, and is also commonly used for building user interfaces. (The Next.js vulnerability was referenced as </span><a href="https://www.cve.org/CVERecord?id=CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">CVE-2025-66478</span></a><span style="vertical-align: baseline;"> before being marked as a duplicate.)</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Vulnerable versions</strong><span style="vertical-align: baseline;">: Next.js 15.x, Next.js 16.x, Next.js 14.3.0-canary.77 and later canary releases</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patched</strong><span style="vertical-align: baseline;"> versions are listed </span><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Fix</strong><span style="vertical-align: baseline;">: </span><a href="https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://github.com/vercel/next.js/commit/6ef90ef49fd32171150b6f81d14708aa54cd07b2</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p 
role="presentation"><strong style="vertical-align: baseline;">Announcement</strong><span style="vertical-align: baseline;">: </span><a href="https://nextjs.org/blog/CVE-2025-66478" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://nextjs.org/blog/CVE-2025-66478</span></a></p> </li> </ul> <p><span style="vertical-align: baseline;">Google Threat Intelligence Group (GTIG) has also published a new report to help understand the </span><a href="https://cloud.google.com/blog/topics/threat-intelligence/threat-actors-exploit-react2shell-cve-2025-55182"><span style="text-decoration: underline; vertical-align: baseline;">specific threats exploiting React2Shell</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">We strongly encourage organizations who manage environments relying on the React and Next.js frameworks to update to the latest version, and take the mitigation actions outlined below.</span></p> <h3><span style="vertical-align: baseline;">Mitigating CVE-2025-55182</span></h3> <p><span style="vertical-align: baseline;">We have created and rolled out a new </span><strong style="vertical-align: baseline;">Cloud Armor web application firewall (WAF) rule</strong><span style="vertical-align: baseline;"> designed to detect and block exploitation attempts related to CVE-2025-55182. This new rule is </span><strong style="vertical-align: baseline;">available now</strong><span style="vertical-align: baseline;"> and is intended to help protect your internet-facing applications and services that use global or regional Application Load Balancers. 
We recommend deploying this rule as a temporary mitigation while your vulnerability management program patches and verifies all vulnerable instances in your environment.</span></p> <p><span style="vertical-align: baseline;">For customers using </span><a href="https://cloud.google.com/appengine/"><strong style="text-decoration: underline; vertical-align: baseline;">App Engine Standard</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Functions</strong></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/run/"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud Run</strong></a><span style="vertical-align: baseline;">, </span><a href="https://firebase.google.com/products/hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase Hosting</strong></a><span style="vertical-align: baseline;"> or </span><a href="https://firebase.google.com/products/app-hosting" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Firebase App Hosting</strong></a><span style="vertical-align: baseline;">, we provide an additional layer of defense for serverless workloads by automatically enforcing platform-level WAF rules that can detect and block the most common exploitation attempts related to CVE-2025-55182.</span></p> <p><span style="vertical-align: baseline;">For </span><a href="https://support.projectshield.google/s/article/Protecting-Your-Website-From-Known-Vulnerabilities" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">Project Shield</strong></a><span style="vertical-align: baseline;"> users, we have deployed WAF protections for all sites and no action is necessary to enable these WAF rules. 
For long-term mitigation, you will need to patch your origin servers as an essential step to eliminate the vulnerability (see additional guidance below).</span></p> <p><span style="vertical-align: baseline;">Cloud Armor and the Application Load Balancer can be used to deliver and protect your applications and services regardless of whether they are deployed on Google Cloud, on-premises, or on another infrastructure provider. If you are not yet using Cloud Armor and the Application Load Balancer, please follow the guidance further down to get started.</span></p> <p><span style="vertical-align: baseline;">While these platform-level rules and the optional Cloud Armor WAF rules (for services behind an Application Load Balancer) help mitigate the risk from exploits of the CVE, we continue to strongly recommend updating your application dependencies as the primary long-term mitigation.</span></p> <h3><span style="vertical-align: baseline;">Deploying the cve-canary WAF rule for Cloud Armor</span></h3> <p><span style="vertical-align: baseline;">To configure Cloud Armor to detect and protect from CVE-2025-55182, you can use the </span><a href="https://docs.cloud.google.com/armor/docs/waf-rules#cves_and_other_vulnerabilities"><code style="text-decoration: underline; vertical-align: baseline;">cve-canary</code><span style="text-decoration: underline; vertical-align: baseline;"> preconfigured WAF rule</span></a><span style="vertical-align: baseline;"> leveraging the new ruleID that we have added for this vulnerability. 
This rule is opt-in only, and must be added to your policy even if you are already using the cve-canary rules.</span></p> <p><span style="vertical-align: baseline;">In your Cloud Armor backend security policy, create a new rule and configure the following match condition:</span></p></div> <div class="block-code"><dl> <dt>code_block</dt> <dd><pre><code>(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This can be accomplished from the Google Cloud console by navigating to Cloud Armor and modifying an existing or creating a new policy.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/20251205_11am_rule_1.max-1000x1000.png" alt="20251205_11am_rule (1)"> <figcaption class="article-image__caption "><p data-block-key="5admg">Cloud Armor rule creation in the Google Cloud console.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p>Alternatively, the gcloud CLI can be used to create or modify a policy with the requisite rule:</p></div> <div class="block-code"><dl> <dt>code_block</dt> <dd><pre><code>gcloud compute security-policies rules create PRIORITY_NUMBER \
    --security-policy SECURITY_POLICY_NAME \
    --expression "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})" \
    --action=deny-403</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Additionally, if you are managing your rules with Terraform, you may implement the rule via the following syntax:</span></p></div> <div class="block-code"><dl> <dt>code_block</dt> <dd><pre><code>rule {
  action   = "deny(403)"
  priority = "PRIORITY_NUMBER"
  match {
    expr {
      expression = "(has(request.headers['next-action']) || has(request.headers['rsc-action-id']) || request.headers['content-type'].contains('multipart/form-data') || request.headers['content-type'].contains('application/x-www-form-urlencoded')) &amp;&amp; evaluatePreconfiguredWaf('cve-canary',{'sensitivity': 0, 'opt_in_rule_ids': ['google-mrs-v202512-id000001-rce','google-mrs-v202512-id000002-rce']})"
    }
  }
  description = "Applies protection for CVE-2025-55182 (React/Next.JS)"
}</code></pre></dd> </dl></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">Verifying WAF rule safety for your application and consuming telemetry</span></h3> <p><span style="vertical-align: baseline;">Cloud Armor rules can be </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#preview_mode"><span style="text-decoration: underline; vertical-align: baseline;">configured in preview mode</span></a><span style="vertical-align: baseline;">, a logging-only mode to test or monitor the expected impact of the rule without Cloud Armor enforcing the configured action. We recommend that the new rule described above first be deployed in preview mode in your production environments so that you can see what traffic it would block. </span></p> <p><span style="vertical-align: baseline;">Once you verify that the new rule is behaving as desired in your environment, then you can disable preview mode to allow Cloud Armor to actively enforce it.</span></p> <p><span style="vertical-align: baseline;">Cloud Armor per-request WAF logs are emitted as part of the Application Load Balancer logs to Cloud Logging. To see what Cloud Armor’s decision was on every request, load balancer logging first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/https-logging-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">needs to be enabled on a per backend service basis</span></a><span style="vertical-align: baseline;">. 
Once it is enabled, all subsequent Cloud Armor decisions will be logged and can be found in Cloud Logging by </span><a href="https://docs.cloud.google.com/armor/docs/request-logging"><span style="text-decoration: underline; vertical-align: baseline;">following these instructions</span></a><span style="vertical-align: baseline;">.</span></p> <h3><span style="vertical-align: baseline;">Interaction of Cloud Armor rules with </span><span style="vertical-align: baseline;">vulnerability</span><span style="vertical-align: baseline;"> scanning tools</span></h3> <p><span style="vertical-align: baseline;">There has been a proliferation of scanning tools designed to help identify vulnerable instances of React and Next.js in your environments. Many of those scanners are designed to identify the version number of relevant frameworks in your servers and do so by crafting a </span><span style="vertical-align: baseline;">legitimate</span><span style="vertical-align: baseline;"> query and inspecting the response from the server to detect the version of React and </span><span style="vertical-align: baseline;">Next.js</span><span style="vertical-align: baseline;"> that is running. </span></p> <p><span style="vertical-align: baseline;">Our WAF rule is designed to detect and prevent exploit attempts of </span><span style="vertical-align: baseline;">CVE-2025-55182</span><span style="vertical-align: baseline;">. As the scanners discussed above are not attempting an exploit, but sending a safe query to </span><span style="vertical-align: baseline;">elicit</span><span style="vertical-align: baseline;"> a response revealing indications of the version of the software, </span><strong style="vertical-align: baseline;">the above Cloud Armor rule will not detect or block such scanners. 
</strong></p> <p><span style="vertical-align: baseline;">If the findings of these scanners indicate a vulnerable instance of software protected by Cloud Armor, that does not mean that an actual exploit attempt of the vulnerability will successfully get through your Cloud Armor security policy. Instead, such findings mean that the version of React or Next.js detected is known to be vulnerable and should be patched.</span></p> <h3><span style="vertical-align: baseline;">How to get started with Cloud Armor for new users</span></h3> <p><span style="vertical-align: baseline;">If your workload is already using an Application Load Balancer to receive traffic from the internet, you can configure Cloud Armor to protect your workload from this and other application-level vulnerabilities (as well as DDoS attacks) by following </span><a href="https://docs.cloud.google.com/armor/docs/configure-security-policies"><span style="text-decoration: underline; vertical-align: baseline;">these instructions</span></a><span style="vertical-align: baseline;">. 
</span></p> <p><span style="vertical-align: baseline;">If you are not yet using an Application Load Balancer and Cloud Armor, you can get started with the </span><a href="https://docs.cloud.google.com/load-balancing/docs/https"><span style="text-decoration: underline; vertical-align: baseline;">external Application Load Balancer overview</span></a><span style="vertical-align: baseline;">, the </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor overview</span></a><span style="vertical-align: baseline;">, and the </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Armor best practices</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">If your workload is using </span><a href="http://docs.cloud.google.com/run/"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/functions"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run functions</span></a><span style="vertical-align: baseline;">, or </span><a href="https://cloud.google.com/appengine"><span style="text-decoration: underline; vertical-align: baseline;">App Engine</span></a><span style="vertical-align: baseline;"> and receives traffic from the internet, you must first </span><a href="https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless"><span style="text-decoration: underline; vertical-align: baseline;">set up an Application Load Balancer in front of your endpoint</span></a><span style="vertical-align: baseline;"> to leverage Cloud Armor security policies to protect your workload. 
You will then need to </span><a href="https://docs.cloud.google.com/armor/docs/integrating-cloud-armor#serverless"><span style="text-decoration: underline; vertical-align: baseline;">configure the appropriate controls</span></a><span style="vertical-align: baseline;"> to ensure that Cloud Armor and the Application Load Balancer can’t be bypassed.</span></p> <h3><span style="vertical-align: baseline;">Best practices and additional risk mitigations</span></h3> <p><span style="vertical-align: baseline;">Once you configure Cloud Armor, we recommend consulting our </span><a href="https://docs.cloud.google.com/armor/docs/best-practices"><span style="text-decoration: underline; vertical-align: baseline;">best practices guide</span></a><span style="vertical-align: baseline;">. Be sure to account for </span><a href="https://docs.cloud.google.com/armor/docs/security-policy-overview#limitations"><span style="text-decoration: underline; vertical-align: baseline;">limitations</span></a><span style="vertical-align: baseline;"> </span><span style="vertical-align: baseline;">discussed in the documentation to minimize risk and optimize performance while ensuring the safety and availability of your workloads. </span></p> <h3><span style="vertical-align: baseline;">Serverless platform protections</span></h3> <p><span style="vertical-align: baseline;">Google Cloud is enforcing platform-level protections across App Engine Standard, Cloud Functions, and Cloud Run to automatically help protect against common exploit attempts of CVE-2025-55182. 
This protection supplements the protections already in place for Firebase Hosting and Firebase App Hosting.</span></p> <p><strong style="vertical-align: baseline;">What this means for you:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Applications deployed to those serverless services benefit from these WAF rules that are enabled by default to help provide a base level of protection without requiring manual configuration.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">These rules are designed to block known malicious payloads targeting this vulnerability.</span></p> </li> </ul> <p><strong style="vertical-align: baseline;">Important considerations:</strong></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Patching is still critical:</strong><span style="vertical-align: baseline;"> These platform-level defenses are intended to be a temporary mitigation. 
The most effective long-term solution is to update your application's dependencies to non-vulnerable versions of React and Next.js, and redeploy them.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Potential impacts:</strong><span style="vertical-align: baseline;"> While unlikely, if you believe this platform-level filtering is incorrectly impacting your application's traffic, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p> </li> </ul> <h3><span style="vertical-align: baseline;">Long-term mitigation: Mandatory framework update and redeployment</span></h3> <p><span style="vertical-align: baseline;">While WAF rules provide critical frontline defense, the most comprehensive long-term solution is to patch the underlying frameworks.</span></p> <p><strong style="vertical-align: baseline;">While Google Cloud is providing platform-level protections and Cloud Armor options, we urge all customers running React and Next.js applications on Google Cloud to immediately update their dependencies to the latest stable versions (React 19.2.1 or the relevant version of Next.js listed </strong><a href="https://nextjs.org/blog/CVE-2025-66478#required-action" rel="noopener" target="_blank"><strong style="text-decoration: underline; vertical-align: baseline;">here</strong></a><strong style="vertical-align: baseline;">), and redeploy their services.</strong></p> <p><span style="vertical-align: baseline;">This applies specifically to applications deployed on:</span></p> <ul> <li role="presentation"><strong style="vertical-align: baseline;">Cloud Run, Cloud Run functions, or App Engine</strong><span style="vertical-align: 
baseline;">: Update your application dependencies with the updated framework versions and redeploy.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Google Kubernetes Engine (GKE)</strong><span style="vertical-align: baseline;">: Update your container images with the latest framework versions and redeploy your pods.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Compute Engine</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">The public OS images provided by Google Cloud do not have React or Next.js packages installed by default. If you have installed a custom OS with the affected packages, update your workloads to include the latest framework versions and enable WAF rules in front of all workloads.</span></li> <li role="presentation"><strong style="vertical-align: baseline;">Firebase</strong><span style="vertical-align: baseline;">:</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">If you’re using Cloud Functions for Firebase, Firebase Hosting, or Firebase App Hosting, update your application dependencies with the updated framework versions and redeploy. Firebase Hosting and App Hosting are also automatically enforcing a rule to limit exploitation of CVE-2025-55182 through requests to custom and default domains.</span></li> </ul> <p><span style="vertical-align: baseline;">Patching your applications is an essential step to eliminate the vulnerability at its source and ensure the continued integrity and security of your services.</span></p> <p><span style="vertical-align: baseline;">We will continue to monitor the situation closely and provide further updates and guidance as necessary. 
Please refer to our official </span><a href="https://docs.cloud.google.com/support/bulletins#gcp-2025-072"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Security advisories</span></a><span style="vertical-align: baseline;"> for the most current information and detailed steps.</span></p> <p><span style="vertical-align: baseline;">If you have any questions or require assistance, please contact </span><a href="https://support.google.com/cloud/answer/6282346" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Support</span></a><span style="vertical-align: baseline;"> and reference issue number 465748820.</span></p></div>
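<div class="block-paragraph_advanced"><p>As a triage aid, the React version list given earlier in this advisory (19.0, 19.1.0, 19.1.1, and 19.2.0 vulnerable; 19.2.1 patched) can be turned into a small check. The Python sketch below is illustrative only: the function names are hypothetical, "19.0" is assumed to mean 19.0.0, and only plain numeric versions are handled (pre-release suffixes are out of scope). It covers React only, since the patched Next.js versions are listed on the Next.js announcement page rather than here.</p></div>

```python
# Versions the advisory lists as vulnerable for React.
# Assumption: "19.0" in the advisory is read as 19.0.0.
VULNERABLE_REACT = {(19, 0, 0), (19, 1, 0), (19, 1, 1), (19, 2, 0)}


def parse_version(text):
    """Turn '19.1' or 'v19.1.0' into a 3-tuple such as (19, 1, 0)."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad missing minor/patch components with zero
    return tuple(parts[:3])


def is_vulnerable_react(version):
    """Return True if the version is on the advisory's vulnerable list."""
    return parse_version(version) in VULNERABLE_REACT


if __name__ == "__main__":
    for candidate in ["19.0", "19.1.1", "19.2.0", "19.2.1", "18.3.1"]:
        status = "vulnerable" if is_vulnerable_react(candidate) else "not listed"
        print(candidate, status)
```

<div class="block-paragraph_advanced"><p>A check like this only reports whether a version appears on the list; it does not replace patching, and it says nothing about whether a WAF rule is in front of the workload.</p></div>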
  54. Key Enterprise Architect

    Mon, 13 Oct 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="6kd7s">As engineers, we all dream of perfectly resilient systems — ones that scale perfectly, provide a great user experience, and never ever go down. What if we told you the key to building these kinds of resilient systems isn't avoiding failures, but deliberately causing them? Welcome to the world of chaos engineering, where you stress test your systems by <i>introducing</i> chaos, i.e., failures, into a system under a controlled environment. In an era where downtime can cost millions and destroy reputations in minutes, the most innovative companies aren't just waiting for disasters to happen — they're causing them and learning from the resulting failures, so they can build immunity to chaos before it strikes in production.</p><p data-block-key="396qd">Chaos engineering is useful for all kinds of systems, but particularly for cloud-based distributed ones. Modern architectures have evolved from monolithic to microservices-based systems, often comprising hundreds or thousands of services. These complex service dependencies introduce multiple points of failure, and it’s difficult if not impossible to predict all the possible failure modes through traditional testing methods. When these applications are deployed on the cloud, they are deployed across multiple availability zones and regions. This increases the likelihood of failure due to the highly distributed nature of cloud environments and the large number of services that coexist within them.</p><p data-block-key="93kcq">A common misconception is that cloud environments automatically provide application resiliency, eliminating the need for testing. Although cloud providers do offer various levels of resiliency and SLAs for their cloud products, these alone do not guarantee that your business applications are protected. 
If applications are not designed to be fault-tolerant or if they assume constant availability of cloud services, they will fail when a particular cloud service they depend on is not available.</p><p data-block-key="62d5j">In short, chaos engineering can take a team's worst "what if?" scenarios and transform them into well-rehearsed responses. Chaos engineering isn’t about breaking systems (engineering chaotically, as it were); it's about building teams that face production incidents with the calm confidence that only comes from having weathered that chaos before, albeit in controlled conditions.</p><p data-block-key="aipko">Google Cloud’s Professional Services Organization (PSO) Enterprise Architecture team consults on and provides hands-on expertise on customers’ cloud transformation journeys, including application development, cloud migrations, and enterprise architecture. When advising on designing resilient architecture for cloud environments, we routinely introduce the principles and practices of chaos engineering and Site Reliability Engineering (SRE).</p><p data-block-key="6ro3d">In this first blog post in a series, we explain the basics of chaos engineering: what it is and its core principles and elements. We then explore how chaos engineering is particularly helpful and important for teams running distributed applications in the cloud. Finally, we’ll talk about how to get started, and point you to further resources.</p><h2 data-block-key="pqp"><b>Understanding chaos engineering</b></h2><p data-block-key="fun25">Chaos engineering is a methodology invented by Netflix in 2010 when it created and popularized ‘Chaos Monkey’ to address the need to build more resilient and reliable systems in the face of increasing complexity in its AWS environment. Around the same time, Google introduced Disaster Resilience Testing, or DiRT, which enabled continuous and automated disaster readiness, response, and recovery of Google’s business, systems, and data. 
Here on Google Cloud’s PSO team, we offer various services to help customers implement DiRT as part of SRE practices. These offerings also include training on how to perform DiRT on applications and systems operating on Google Cloud. The central concept is straightforward: deliberately introduce controlled disruptions into a system to identify vulnerabilities, evaluate its resilience, and enhance its overall reliability.</p><p data-block-key="6t531">As a proactive discipline, chaos engineering enables organizations to identify weaknesses in their systems before they lead to significant outages or failures, where a system includes not only the technology components but also the people and processes of an organization. By introducing controlled, real-world disruptions, chaos engineering helps test a system's robustness, recoverability, and fault tolerance. This approach allows teams to uncover potential vulnerabilities, so that systems are better equipped to handle unexpected events and continue functioning smoothly under stress.</p><h3 data-block-key="59nsr"><b>Principles and practices of chaos engineering</b></h3><p data-block-key="df1o7">Chaos engineering is guided by a set of core principles about why it should be done, while practices define what needs to be done.</p><p data-block-key="8ao4o">Below are the principles of chaos engineering:</p><ol><li data-block-key="ftol1"><b>Build a hypothesis around steady state</b>: Prior to initiating any disruptive actions, you need to define what "normal" looks like for your system, commonly referred to as the "steady state hypothesis."</li><li data-block-key="6vvb8"><b>Replicate real-world conditions</b>: Chaos experiments should emulate realistic failure scenarios that the system might encounter in a production environment.</li><li data-block-key="decbe"><b>Run experiments in production</b>: Chaos engineering is firmly rooted in the belief that only a production environment with real traffic and dependencies can provide 
an accurate picture of resiliency. This is what separates chaos engineering from traditional testing.</li><li data-block-key="3de29"><b>Automate experiments:</b> Make resiliency testing part of a continuous process rather than a one-off test.</li><li data-block-key="am2bk"><b>Determine the blast radius</b>: Experiments should be meticulously designed to minimize adverse impacts on production systems. This requires categorizing applications and services in different tiers based on the impact the experiments can have on customers and other applications and services.</li></ol><p data-block-key="hldj">With these principles established, follow these practices when conducting a chaos engineering experiment:</p><ol><li data-block-key="1bkn"><b>Define steady state:</b> Identify the specific metrics (e.g., latency, throughput) that you will track, and establish a baseline for them.</li><li data-block-key="c86r7"><b>Formulate a hypothesis</b>: This is the practice of creating a single testable statement, for example, ‘By deleting this container pod, user login will not be affected’. Hypotheses are generally created by identifying customer user journeys and deriving test scenarios from them.</li><li data-block-key="39bql"><b>Use a controlled environment:</b> While one chaos engineering principle states that experiments need to run in production, you should still start small and run your experiment in a non-production environment first, learn and adjust, and then gradually expand the scope to the production environment.</li><li data-block-key="gtlb"><b>Inject failures</b>: This is the practice of causing disruption by injecting failures either directly into the system (e.g., deleting a VM, stopping a database instance) or indirectly by injecting failures in the environment (e.g., 
deleting a network route, adding a firewall rule).</li><li data-block-key="1410c"><b>Automate experimental execution</b>: Automation is crucial for establishing chaos engineering as a repeatable and scalable practice. This includes using automated tools for fault injection (e.g., making it part of a CI/CD pipeline) and automated rollback mechanisms.</li><li data-block-key="58mg2"><b>Derive actionable insights</b>: The primary objective of using chaos engineering is to gain insights into system vulnerabilities, thereby enhancing resilience. This involves rigorous analysis of experimental results; identifying weaknesses and areas for improvement; and disseminating findings to relevant teams to inform subsequent experimental design and system enhancements.</li></ol><p data-block-key="fh7in">In other words, chaos engineering isn't about breaking things for the sake of it, but about building more resilient systems by understanding their limitations and addressing them proactively.</p><h3 data-block-key="ftslk"><b>Elements of chaos engineering</b></h3><p data-block-key="evq8f">Here are the core elements you'll use in a chaos engineering experiment, derived from these five principles:</p><ul><li data-block-key="2isvq"><b>Experiments</b>: A chaos experiment constitutes a deliberate, pre-planned procedure wherein faults are introduced into a system to ascertain its response.</li><li data-block-key="d6djm"><b>Steady-state hypotheses</b>: A steady-state hypothesis defines the baseline operational state, or "normal" behavior, of the system under evaluation.</li><li data-block-key="3d8o5"><b>Actions</b>: An action represents a specific operation executed upon the system being experimented on.</li><li data-block-key="bpbv8"><b>Probes</b>: A probe provides a mechanism for observing defined conditions within the system during experimentation.</li><li data-block-key="f50fb"><b>Rollbacks</b>: An experiment may incorporate a sequence of actions designed to reverse any modifications 
implemented during the experiment.</li></ul><h2 data-block-key="327mk"><b>Getting started with chaos engineering</b></h2><p data-block-key="123gj">Now that you have a good understanding of chaos engineering and why to use it in your cloud environment, the next step is to try it out for yourself in your own development environment.</p><p data-block-key="6i4s2">There are multiple chaos engineering solutions in the market; some are paid products and some are open-source frameworks. To get started quickly, we recommend that you use <a href="https://chaostoolkit.org/" target="_blank">Chaos Toolkit</a> as your chaos engineering framework.</p><p data-block-key="atl4d">Chaos Toolkit is an open-source framework written in Python that provides a modular architecture where you can plug in other libraries (also known as ‘drivers’) to extend your chaos engineering experiments. For example, there are extension libraries for <a href="https://chaostoolkit.org/drivers/gcp/" target="_blank">Google Cloud</a>, <a href="https://chaostoolkit.org/drivers/kubernetes/" target="_blank">Kubernetes</a>, and many other technologies. Since Chaos Toolkit is a Python-based developer tool, you can begin by configuring your Python environment. You can find a good example of a Chaos Toolkit experiment and step-by-step explanation <a href="https://chaostoolkit.org/reference/tutorial/#getting-started-with-the-chaos-toolkit" target="_blank">here</a>.</p><p data-block-key="r2pl">Finally, to enable Google Cloud customers and engineers to introduce chaos testing in their applications, we’ve created a series of Google Cloud-specific chaos engineering recipes. Each recipe covers a specific scenario to introduce chaos in a particular Google Cloud service. 
For example, one recipe covers introducing chaos in an application/service running behind a Google Cloud internal or external application load balancer; another recipe covers simulating a network outage between an application running on Cloud Run and connecting to a Cloud SQL database by leveraging another Chaos Toolkit extension named <a href="https://chaostoolkit.org/drivers/toxiproxy/" target="_blank">ToxiProxy</a>.</p><p data-block-key="7bkoj">You can find a complete collection of recipes, including step-by-step instructions, scripts, and sample code, to learn how to introduce chaos engineering in your Google Cloud environment on <a href="https://github.com/GoogleCloudPlatform/chaos-engineering/blob/main/Chaos-Engineering-Recipes-Book.md" target="_blank">GitHub</a>. Then, stay tuned for subsequent posts, where we’ll talk about chaos engineering techniques, such as how to introduce faults into your Google Cloud environment.</p></div>
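The elements listed above (steady-state hypothesis, actions, probes, rollbacks) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Chaos Toolkit API: the `Probe` and `Experiment` classes, the toy `system` dictionary, and every function name here are hypothetical and exist only to show how the pieces fit together. A real experiment would use probes and actions supplied by a Chaos Toolkit driver instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Probe:
    """Observes one condition and reports whether it is within tolerance."""
    name: str
    measure: Callable[[], float]
    tolerance: Callable[[float], bool]

    def ok(self) -> bool:
        return self.tolerance(self.measure())


@dataclass
class Experiment:
    """Hypothetical container for the elements of a chaos experiment."""
    title: str
    steady_state: List[Probe]
    actions: List[Callable[[], None]]
    rollbacks: List[Callable[[], None]] = field(default_factory=list)

    def _steady(self) -> bool:
        return all(probe.ok() for probe in self.steady_state)

    def run(self) -> Dict[str, bool]:
        result = {"steady_before": self._steady(), "steady_after": False}
        if not result["steady_before"]:
            # Do not inject faults into a system that is already unhealthy.
            return result
        try:
            for action in self.actions:
                action()
            # Re-check the hypothesis while the fault is in place.
            result["steady_after"] = self._steady()
        finally:
            # Rollbacks run even if an action or probe raises.
            for rollback in self.rollbacks:
                rollback()
        return result


# Toy stand-in for a service with three replicas (an assumption for the demo).
system = {"healthy_replicas": 3}


def login_success_rate() -> float:
    return 1.0 if system["healthy_replicas"] >= 1 else 0.0


def delete_one_replica() -> None:
    system["healthy_replicas"] -= 1


def restore_replicas() -> None:
    system["healthy_replicas"] = 3


experiment = Experiment(
    title="Deleting one replica does not affect user login",
    steady_state=[
        Probe("login-success-rate", login_success_rate, lambda rate: rate >= 0.99)
    ],
    actions=[delete_one_replica],
    rollbacks=[restore_replicas],
)
outcome = experiment.run()
print(outcome)
```

Checking the steady state before injecting the fault keeps the blast radius small: if the baseline is already violated, the sketch stops without touching the system. Placing the rollbacks in a `finally` block mirrors the idea that an experiment should reverse its modifications regardless of how it ends.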
  55. Researcher

    Tue, 23 Sep 2025 14:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Today, we are excited to announce the </span><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">2025 DORA Report: State of AI-assisted Software Development</span></a><span style="vertical-align: baseline;">, which draws on more than 100 hours of qualitative data and survey responses from nearly 5,000 technology professionals around the world. </span></p> <p><span style="vertical-align: baseline;">The report reveals a key insight: AI doesn't fix a team; it amplifies what's already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems. The greatest return comes not from the AI tools themselves, but from a strategic focus on the quality of internal platforms, the clarity of workflows, and the alignment of teams.</span></p> <h3><strong style="vertical-align: baseline;">AI, the great amplifier</strong></h3> <p><span style="vertical-align: baseline;">As we established in the </span><a href="https://dora.dev/research/2024/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">2024 report</span></a><span style="vertical-align: baseline;"> as well as the special report published this year called </span><a href="https://dora.dev/research/ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">“Impact of Generative AI in Software Development”</span></a><span style="vertical-align: baseline;">, organizations are continuing to heavily adopt AI and receive substantial benefits across important outcomes. There is also evidence that teams are learning to better integrate these tools into their workflows. Unlike last year, we observe a positive relationship between AI adoption and both software delivery throughput and product performance. 
It appears that people, teams, and tools are learning where, when, and how AI is most useful. However, AI adoption does continue to have a negative relationship with software delivery stability.</span></p> <p><span style="vertical-align: baseline;">This confirms our central theory - AI accelerates software development, but that acceleration can expose weaknesses downstream. Without robust control systems, like strong automated testing, mature version control practices, and fast feedback loops, an increase in change volume leads to instability. Teams working in loosely coupled architectures with fast feedback loops see gains, while those constrained by tightly coupled systems and slow processes see little or no benefit.</span></p> <p><strong style="vertical-align: baseline;">Key findings from the 2025 report</strong></p> <p><span style="vertical-align: baseline;">Beyond this central theme, this year’s research highlighted the following about modern software development:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">AI adoption is near-universal</strong><span style="vertical-align: baseline;">: 90% of survey respondents report using AI at work. More than 80% believe it has increased their productivity. However, skepticism remains as 30% report little or no trust in the code generated by AI, a slightly lower percentage than last year but a key trend to note.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">User-centricity is a prerequisite for AI success</strong><span style="vertical-align: baseline;">: AI becomes most useful when it's pointed at a clear problem, and a user-centric focus provides that essential direction. 
Our data shows this focus amplifies AI’s positive influence on team performance.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Platform engineering is the foundation</strong><span style="vertical-align: baseline;">: Our data shows that 90% of organizations have adopted at least one platform and there is a direct correlation between a high quality internal platform and an organization’s ability to unlock the value of AI, making it an essential foundation for success.</span></p> </li> </ul> <h3><strong style="vertical-align: baseline;">The seven team archetypes</strong></h3> <p><span style="vertical-align: baseline;">Simple software delivery metrics alone aren’t sufficient. They tell you what is happening but not why it’s happening. To connect performance data to experience, we conducted a cluster analysis that reveals seven common team profiles or archetypes, each with a unique interplay of performance, stability, and well-being. This model provides leaders with a way to diagnose team health and apply the right interventions. </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_YtpOb3P.max-1000x1000.jpg" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Teams in the ‘Foundational challenges’ group are trapped in survival mode and face significant gaps in their processes and environment, leading to low performance, high system instability, and high levels of burnout and friction. By contrast, the ‘Harmonious high achievers’ excel across multiple areas, showing positive metrics for team well-being, product outcomes, and software delivery. 
</span></p> <p><span style="vertical-align: baseline;">Read more details of each archetype in the "Understanding your software delivery performance: A look at seven team profiles" chapter of the report.</span></p> <h3><strong style="vertical-align: baseline;">Unlocking the value of AI with the ‘DORA AI Capabilities Model’</strong></h3> <p><span style="vertical-align: baseline;">This year, we went beyond identifying AI’s impact to investigating the conditions in which AI-assisted technology professionals realize the best outcomes. The value of AI is unlocked not by the tools themselves, but by the surrounding technical practices and cultural environment.</span></p> <p><span style="vertical-align: baseline;">Our research identified seven capabilities that are shown to magnify the positive impact of AI in organizations.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/DORA_inline_2.max-1000x1000.png" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Where leaders should get started</strong></h3> <p><span style="vertical-align: baseline;">One of the key insights derived from the research this year is that the value of AI will be unlocked by reimagining the system of work it inhabits. 
Technology leaders should treat AI adoption as an organizational transformation.</span></p> <p><span style="vertical-align: baseline;">Here’s where we suggest you begin:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Clarify and socialize your AI policies</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Connect AI to your internal context</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Prioritize foundational practices</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Fortify your safety nets</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Invest in your internal platform</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Focus on your end-users</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The </span><a href="https://dora.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA research program</span></a><span style="vertical-align: baseline;"> is committed to serving as a compass to teams and organizations as we navigate the important and transformative period with AI. We hope the new team profiles and the DORA AI capabilities model provide a clear roadmap for you to move beyond simply adopting AI to unlocking its value by investing in teams and people. We look forward to learning how you put these insights into practice. 
To learn more:</span></p> <ul> <li role="presentation"><a href="http://cloud.google.com/dora"><span style="text-decoration: underline; vertical-align: baseline;">Download</span></a><span style="vertical-align: baseline;"> the full report</span></li> <li role="presentation"><span style="vertical-align: baseline;">Join the </span><a href="https://dora.community/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">DORA community</span></a></li> <li><span style="vertical-align: baseline;">Share this </span><a href="https://dora.dev/research/2025/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">overview</span></a><span style="vertical-align: baseline;"> with your colleagues</span></li> </ul></div>
  56. Cloud Solutions Architect Manager, Google Cloud

    Wed, 13 Aug 2025 16:00:00 -0000

    <div class="block-paragraph"><p data-block-key="bgr19">What guides your approach to software development? In our roles at Google, we’re constantly working to build better software, faster. Within Google, our Developer Platform team and Google Cloud have a strategic partnership and a shared strategy: together, we take our internal capabilities and engineering tools and package them up for Google Cloud customers.</p><p data-block-key="e2l3s">At the heart of this is understanding the many ways that software teams, big and small, need to balance efficiency, quality, and cost, all while delivering value. In our recent <a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">talk at PlatformCon 2025</a>, we shared key parts of our platform strategy, which we call “shift down.”</p><p data-block-key="d6oe8"><b>Shift down is an approach that advocates for embedding decisions and responsibilities into underlying internal developer platforms (IDPs)</b>, thereby reducing the operational burden on developers. This contrasts with the <a href="https://cloud.google.com/devops">DevOps</a> trend of "shift left," which pushes more effort earlier into the development cycle, a method that is proving difficult at scale due to the sheer volume and rate of change in requirements. Our shift down strategy helps us maximize value with existing resources so businesses can achieve high innovation velocity with acceptable quality, acceptable risk, and sustainable costs across a diverse range of business models. 
In the talk, we share learnings that have been really helpful to us in our software and <a href="https://cloud.google.com/solutions/platform-engineering">platform engineering</a> journey:</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_98vVMdt.max-1000x1000.jpg" alt="image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><ol><li data-block-key="bgr19"><b>Work backwards from the business model:</b> By starting with the business model, organizations can intentionally guide platform evolution and investment to align with desired margins, risk tolerance, and quality requirements. At Google, our central platform must support diverse business models, necessitating continuous strategic refinement and adaptation.</li><li data-block-key="fs6ra"><b>Focus on quality attributes for central software control:</b> Quality attributes, such as reliability, security, efficiency, and performance, are <a href="https://en.wikipedia.org/wiki/Emergence" target="_blank">emergent</a> properties of software systems and are important for creating business value and managing risk. These are often referred to as “non-functional requirements” because they define how our software behaves, not what it functionally does. 
With a shift down strategy, we can embed the responsibility for assuring quality attributes directly into the underlying platform systems and infrastructure, thereby significantly reducing the operational burden on individual developers.</li><li data-block-key="5a5sh"><b>Abstractions and coupling are key technical tools to gain control of quality attributes:</b> We define two key technical components in the way we build platforms: <i>abstractions</i> and <i>coupling</i>. In a shift down strategy, abstractions provide understandability, risk management levers, accountability, and cost control by encapsulating complexity. Coupling refers to the interconnectedness and interdependence of components within a system or development ecosystem. For a successful shift down strategy, the right degree of coupling is crucial because it allows the development platform and ecosystem design to directly implement and influence quality attributes. In fact, coupling is how we offer entire infrastructure and platform solutions as coherent services like <a href="https://cloud.google.com/kubernetes-engine">Google Kubernetes Engine</a> (GKE).</li><li data-block-key="2pktp"><b>Shared responsibility, education, and policy are equally important social tools:</b> Shared responsibility is a crucial social tool within software at scale. This is actively cultivated through education, such as training engineers on platform and AI usage, and fostering a "one team" culture that encourages a shift from artifact-bound identities to overarching mission goals and client-focused engagement. 
Furthermore, explicit policies like centrally enforced style guides and secure-by-design APIs are fundamental for embedding quality attribute assurance directly into the platform and infrastructure, significantly reducing the operational burden on individual developers by ensuring consistency and automated controls at scale.</li><li data-block-key="bh7kd"><b>Use a map.</b> Supporting many business units with one platform is a vast and complex problem; we need a map. The ecosystem model is a framework that categorizes different types of software development environments, ranging from highly flexible, developer-controlled systems to highly opinionated, vertically integrated ones where the ecosystem itself assures quality attributes. Its critical purpose is to provide a visual and conceptual tool for evaluating how well our ecosystem controls match our business risk. This helps us ensure that the level of oversight and assurance of quality attributes aligns with the potential cost of mistakes. The goal is to be in the "ecosystem effectiveness zone," where controls are balanced to mitigate significant risks from human error without imposing overly restrictive systems that negatively impact velocity and developer satisfaction.</li></ol></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_xiA9TUH.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><p data-block-key="bgr19">6. <b>Divide up the problem space by identifying different platform and ecosystem types.</b></p><p data-block-key="dk549">Because the developer experience and platform infrastructure change with scale and degree of shifting down, it’s not enough to just know where the ecosystem effectiveness zone is — you have to identify the ecosystem by type. 
We differentiate ecosystem types by the degree of oversight and assurance for quality attributes. As an ecosystem becomes more vertically integrated, such as Google's highly optimized "Assured" (Type 4) ecosystem, the platform itself assumes increasing responsibility for vital quality attributes, allowing specialists like site reliability engineers (SRE) and security teams to have full ownership in taking action through large-scale observability and embedded capabilities. Conversely, in less uniform "YOLO," "AdHoc," or "Guided" (Type 0-2) ecosystems, developers have more responsibility for assuring these attributes, while central specialist teams have less direct control and enforcement mechanisms are less pervasive. It’s really important to note here that this is <b>not</b> a maturity model — the best ecosystem and platform type is the one that best fits your business need (see point #1 above!).</p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_SQqhW9d.max-1000x1000.png" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph"><h3 data-block-key="bgr19"><b>Intentional choices in platform engineering</b></h3><p data-block-key="2cujr">The most important takeaway is to make active choices. Tailor platform engineering for each business unit and application to achieve the best outcomes. Place critical emphasis on identifying and solving stable sub-problems in reliable, reusable ways across various business problems. 
This approach directly underpins our "shift down" strategy, moving toward composable platforms that embed decisions and responsibilities for software quality directly into the underlying platform infrastructure, thereby improving our ability to maximize business value with the right resources, at the right quality level, and with sustainable costs.</p><p data-block-key="8q0du"><a href="https://www.youtube.com/watch?v=T6a9gPSoqxo" target="_blank">Watch our full discussion</a> for more insights on effective platform engineering.</p></div>
  57. Product Manager

    Mon, 04 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Application owners are looking for three things when they think about optimizing cloud costs:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the most expensive resources?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are costing me more this week or month?</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Which resources are poorly utilized?</span></p> </li> </ol> <p><span style="vertical-align: baseline;">To help you answer these questions quickly and easily, we </span><a href="https://cloud.google.com/blog/products/application-development/an-application-centric-ai-powered-cloud?e=13802955"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> Cloud Hub Optimization and Cost Explorer, in private preview, at Google Cloud Next 2025. And today, we are excited to announce that both Cloud Hub Optimization and Cost Explorer are now in public preview.</span></p> <h2><span style="vertical-align: baseline;">Application cost and utilization</span></h2> <p><span style="vertical-align: baseline;">As an app owner, your primary objective is keeping your application healthy at all times. Yet, monitoring all the individual components of your application, which may straddle dozens of Projects, can be quite overwhelming. 
</span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub applications</span></a><span style="vertical-align: baseline;"> allow you to reorganize your cloud around your application, giving you the information and controls you need at your fingertips.</span></p> <p><span style="vertical-align: baseline;">In addition to supporting Google Cloud Projects, Cloud Hub Optimization and Cost Explorer leverage </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;"> applications to show you the cost-efficiency of your application’s workloads and services instantly. This is useful, for instance, when you are trying to pinpoint deployments running on GKE clusters that might be wasting valuable resources, such as GPUs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_CHO_utilization_summary_app.max-1000x1000.jpg" alt="1_CHO_utilization summary app"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><span style="vertical-align: baseline;">Not just another cost dashboard</span></h2> <p><span style="vertical-align: baseline;">When you bring up Cloud Hub Optimization, you can immediately see the resources that are costing you the most, along with the percentage change in their cost. 
With this highly granular cost information, you can now attribute your costs to specific resources and resource owners to reason about any changes in costs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_CHO_cost_summary.max-1000x1000.jpg" alt="2_CHO_cost summary"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We have additionally integrated granular cost data from Cloud Billing and resource utilization data from Cloud Monitoring to give you a comprehensive picture of your cost efficiency. This includes average vCPU utilization for your Project, which helps you find the most promising optimization candidates across hundreds of Google Cloud Projects.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_CHO_utilization_summary_project.max-1000x1000.jpg" alt="3_CHO_utilization summary project"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The Cost Explorer dashboard also shows you your costs logically organized at the product level, for even more cost explainability. 
Instead of seeing a lump sum cost for Compute Engine, you can now see your exact spend on individual products including Google Kubernetes Engine (GKE) clusters, Persistent Disks, Cloud Load Balancing, and more.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_CHO_cost_explorer.max-1000x1000.jpg" alt="4_CHO_cost explorer"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h2><strong style="vertical-align: baseline;">Simple is powerful</strong></h2> <p><span style="vertical-align: baseline;">Customers who have tried these new tools love the information that is surfaced as well as the simplicity of the interfaces.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“My team has to keep an eye on cloud costs across tens of business units and hundreds of developers. 
The Cloud Hub Optimization and Cost Explorer dashboards are a force multiplier for my team as they tell us where to look for cost savings and potential optimization opportunities.”</span><span style="vertical-align: baseline;"> - Frank Dice, Principal Cloud Architect, Major League Baseball</span></p> <p><span style="vertical-align: baseline;">Customers especially appreciate the </span><a href="https://cloud.google.com/stackdriver/docs/costs/optimize-costs#supported_products"><span style="text-decoration: underline; vertical-align: baseline;">breadth of product coverage</span></a><span style="vertical-align: baseline;"> available out of the box without any additional setup, and the fact that there is no additional charge for using these features.</span></p> <h2><strong style="vertical-align: baseline;">What’s next</strong></h2> <p><span style="vertical-align: baseline;">As your organization “shifts left” on cloud cost management, we are working to help application owners and developers understand and optimize their cloud costs. 
You can try Cloud Hub Optimization and Cost Explorer </span><a href="https://console.cloud.google.com/cloud-hub/optimization"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">You can also see a live demo of how Cloud Hub Optimization and Cost Explorer can be used to identify underutilized GKE clusters within seconds in the Google Cloud Next 2025 talk Maximize Your Cloud ROI.</span></p></div> <div class="block-video"> <div class="article-module article-video "> <figure> <a class="h-c-video h-c-video--marquee" href="https://youtube.com/watch?v=7csgD3iIc2Q" data-glue-modal-trigger="uni-modal-7csgD3iIc2Q-" data-glue-modal-disabled-on-mobile="true"> <div class="article-video__aspect-image" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_LGJSUja.max-1000x1000.jpg);"> <span class="h-u-visually-hidden">Maximize your cloud ROI: A practical approach to efficiency and optimization</span> </div> <svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"> <use xlink:href="#mi-youtube-icon"></use> </svg> </a> </figure> </div> <div class="h-c-modal--video" data-glue-modal="uni-modal-7csgD3iIc2Q-" data-glue-modal-close-label="Close Dialog"> <a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="7csgD3iIc2Q" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=7csgD3iIc2Q" ng-cloak> </a> </div> </div> <div class="block-paragraph_advanced"><hr/> <p><sup><span style="font-style: italic; vertical-align: baseline;">Major League Baseball trademarks and copyrights are used with permission of Major League Baseball. Visit MLB.com.</span></sup></p></div>
  58. Senior Product Manager

    Fri, 01 Aug 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Are you ready to unlock the power of Google Cloud and want guidance on how to set up your environment effectively? Whether you're a cloud novice or part of an experienced team looking to migrate critical workloads, getting your foundational infrastructure right is the key to success. That's where </span><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud Setup</strong></a><span style="vertical-align: baseline;"> comes in — your guided pathway to a secure cloud foundation and quick start on Google Cloud.</span></p> <p><span style="vertical-align: baseline;">Google Cloud Setup helps you quickly implement Google Cloud's recommended best practices. Our goal is to provide a fast and easy path to deploying your workloads without unnecessary configuration effort. Think of it as your expert guide, walking you through the essential first steps so you can focus on what truly matters: rapidly deploying your innovative applications and services. 
To help you get started without financial barriers, all components and service integrations enabled during the setup process are free or include some level of no-cost access.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Choose the foundation that fits your needs</strong></h3> <p><span style="vertical-align: baseline;">We understand that every organization and project has unique requirements. That's why Cloud Setup offers three distinct guided flows to choose from:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Proof-of-concept:</strong><span style="vertical-align: baseline;"> Designed for users who want to set up a lightweight environment to explore Google Cloud and run initial tests or sandbox workloads. This flow focuses on the minimum configuration to get you started quickly.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Production:</strong><span style="vertical-align: baseline;"> This flow is recommended for supporting production-ready workloads with security and scalability in mind. 
It aligns with Google Cloud’s best practices and is tailored for administrators setting up basic foundational infrastructure for production workloads.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Enhanced security:</strong><span style="vertical-align: baseline;"> Designed for organizations, regions or workloads with advanced security and compliance requirements, this flow defaults to more advanced security controls and is designed to help you meet rigorous requirements. Even this advanced foundation sets you up with a perpetual free tier up to certain usage limits.</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/image1_LQ4uQKn.max-1000x1000.png" alt="1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building blocks for a solid foundation</strong></h3> <p><span style="vertical-align: baseline;">Cloud Setup guides you through a series of onboarding steps, presenting defaults backed by</span><strong style="vertical-align: baseline;"> </strong><a href="https://cloud.google.com/security/best-practices"><strong style="text-decoration: underline; vertical-align: baseline;">Google Cloud best practices</strong></a><span style="vertical-align: baseline;">. 
Throughout the process, you'll also encounter key features designed to help protect your organization and prepare it for growth, including:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/kms/docs/kms-autokey"><strong style="text-decoration: underline; vertical-align: baseline;">Cloud KMS AutoKey</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Automates the provisioning and assignment of customer-managed encryption keys (CMEK).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/security/products/security-command-center"><strong style="text-decoration: underline; vertical-align: baseline;">Security Command Center</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Provides security posture management for Google Cloud deployments including automatic project scanning for security issues such as open ports and misconfigured access controls.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/observability"><strong style="text-decoration: underline; vertical-align: baseline;">Centralized Logging and Monitoring</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to easily set up infrastructure to monitor your system's health and performance from a central location — critical for audit logging compliance and visualizing metrics across projects.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/vpc/docs/shared-vpc"><strong style="text-decoration: underline; vertical-align: baseline;">Shared VPC 
Networks</strong></a><strong style="vertical-align: baseline;">: </strong><span style="vertical-align: baseline;">Allows you to establish a centralized network across multiple projects, enabling secure and efficient communication between your Google Cloud resources.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/hybrid-connectivity"><strong style="text-decoration: underline; vertical-align: baseline;">Hybrid Connectivity</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Facilitates connecting your Google Cloud environment to your on-premises infrastructure or other cloud providers. This is often a critical step for workload migrations.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/support"><strong style="text-decoration: underline; vertical-align: baseline;">Support plan</strong></a><strong style="vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> Enables you to quickly resolve any issues with help from experts at Google Cloud.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">At the end of the guided flow, you can deploy your configuration directly via the Google Cloud console or download a </span><a href="https://cloud.google.com/docs/enterprise/deploy-foundation-using-terraform-from-console"><span style="text-decoration: underline; vertical-align: baseline;">Terraform configuration file</span></a><span style="vertical-align: baseline;"> for later deployment using other Infrastructure as Code (IaC) methods.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img 
src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_RwqPvpA.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Experience the cloud faster and smarter</strong></h3> <p><span style="vertical-align: baseline;">Organizations using Cloud Setup enjoy:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Faster application deployment: </strong><span style="vertical-align: baseline;">By simplifying the initial setup, you can get your applications up and running more quickly, accelerating your cloud journey.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Reduced setup effort:</strong><span style="vertical-align: baseline;"> Our streamlined flow significantly reduces the number of manual steps, allowing you to establish a basic foundation with less effort.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Greater access to Google Cloud's full potential: </strong><span style="vertical-align: baseline;">By establishing a solid foundation quickly, you can more easily explore and leverage a wider range of Google Cloud services to meet your evolving needs and unlock greater value.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">Ready to start your Google Cloud journey? Visit Google Cloud Setup today for a streamlined path to a secure cloud foundation. 
Let us guide you through the initial steps so you can focus on innovation and growth.</span></p> <p><span style="vertical-align: baseline;">To learn more, visit:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://cloud.google.com/docs/enterprise/setup-checklist"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup documentation</span></a></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><a href="https://console.cloud.google.com/cloud-setup/overview" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Setup overview</span></a><span style="vertical-align: baseline;"> (requires login)</span></p> </li> </ul></div>
  59. Product Manager

    Fri, 18 Jul 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">As developers and operators, you know that having access to the right information in the proper context is crucial for effective troubleshooting. This is why organizations invest a lot upfront curating monitoring resources across different business units: so information is easy to find and contextualize when needed.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Today we are reducing the need for this upfront investment with an out-of-the-box </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> experience for your organization on Google Cloud within </span><a href="https://cloud.google.com/stackdriver/docs"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Observability</span></a><span style="vertical-align: baseline;">. </span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring consists of a set of pre-curated dashboards with relevant metrics and logs mapped to a user-defined application in </span><a href="https://cloud.google.com/products/app-hub"><span style="text-decoration: underline; vertical-align: baseline;">App Hub</span></a><span style="vertical-align: baseline;">. It incorporates best practices pioneered by Google Site Reliability Engineers (SRE) to optimize manual troubleshooting and unlock AI-assisted troubleshooting.</span></p> <p><span style="vertical-align: baseline;">Application Monitoring automatically labels and brings together key telemetry for your application into a centralized experience, making it easy to discover, filter and correlate trends. 
It also feeds application context into </span><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist Investigations</span></a><span style="vertical-align: baseline;">, for AI-assisted troubleshooting. </span></p></div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">1. Application, service and workload dashboards </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">No more spending hours configuring application dashboards. </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">From the moment you </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">describe your application in App Hub</span></a><span style="vertical-align: baseline;">, Application Monitoring starts to automatically build dashboards tailored to your environment. Each dashboard comprises relevant telemetry for your application and is searchable, filterable and ready for deep dives — no configuration required. 
</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">The dashboards offer an overview of charts detailing the </span><a href="https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">SRE Four Golden Signals</span></a><span style="vertical-align: baseline;">: traffic, latency, error rate, and saturation. This provides a high-level view of application performance, integrating automatically collected system metrics across various services and workloads such as load balancers, Cloud Run, GKE workloads, MIGs, and databases. From this overview, you can then drill down into services or workloads with performance issues or active alerts to access detailed metrics and logs.</span></p> <p><span style="vertical-align: baseline;">For example in the image below, a user defined an App Hub application called </span><span style="font-style: italic; vertical-align: baseline;">Cymbal BnB app</span><span style="vertical-align: baseline;">, with multiple services and workloads. The flow below shows the automatically generated experience with golden signals, alerts and relevant logs.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_zgV6J6C.gif" alt="1"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 1 - A user’s flow from an App Hub defined application (i.e. Cymbal BnB) to the automatic prebuilt Application Monitoring experience in Cloud Observability</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3 role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">2. 
Labels and context propagation </span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">See application labels propagated seamlessly across Google Cloud </strong></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Once Application Monitoring is enabled, your application labels are propagated across Google Cloud, so you can see and use them to filter and focus on the most essential signals across the logs, metrics and trace explorers.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_yj24vCu.max-1000x1000.png" alt="2"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 2 - Logs Explorer showing application automatically tagged with application labels</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_kukVdIB.max-1000x1000.png" alt="3"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 3 - Metrics Explorer showing application labels automatically associated with metrics</p></figcaption> </figure> </div> </div> </div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_BGEDIwf.max-1000x1000.png" alt="4"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 4 - Trace Explorer showing AppHub label 
Integration</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><span style="vertical-align: baseline;">3. Gemini Cloud Assist Investigations</span></h3> <p style="text-align: justify;"><strong style="font-style: italic; vertical-align: baseline;">Troubleshoot issues faster with AI powered Investigations. </strong></p> <p><a href="https://cloud.google.com/gemini/docs/cloud-assist/investigations"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist’s investigation feature</span></a><span style="vertical-align: baseline;"> makes it easier to troubleshoot issues because application boundaries and relationships have been propagated into the AI model, grounding it in context about your environment.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/5_O7Wiid5.gif" alt="5"> </a> <figcaption class="article-image__caption "><p data-block-key="g1e0b">Figure 5 - Seamless entry point into Gemini Cloud Assist powered Investigations from application logs</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p style="text-align: justify;"><span style="vertical-align: baseline;">Note - Gemini Cloud Assist Investigations is currently in private preview</span></p> <h3><span style="vertical-align: baseline;">Try Application Monitoring today</span></h3> <p style="text-align: justify;"><span style="vertical-align: baseline;">The new</span><span style="vertical-align: baseline;"> Application Monitoring experience provides a low-effort unified view of application and infrastructure performance for your troubleshooting needs.</span></p> <p style="text-align: justify;"><span style="vertical-align: baseline;">Take advantage of the new Google Cloud 
Application Monitoring experience by:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Visiting your Cloud console</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub-folder"><span style="text-decoration: underline; vertical-align: baseline;">Setting up </span><strong style="text-decoration: underline; vertical-align: baseline;">Applications</strong><span style="text-decoration: underline; vertical-align: baseline;"> in App Hub</span></a></p> </li> <ol> <li aria-level="2" style="list-style-type: lower-alpha; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Adding </span><strong style="vertical-align: baseline;">Services</strong><span style="vertical-align: baseline;"> and </span><strong style="vertical-align: baseline;">Workloads</strong><span style="vertical-align: baseline;"> to your Application</span></p> </li> </ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Navigating to </span><strong style="vertical-align: baseline;">Application Monitoring</strong><span style="vertical-align: baseline;"> in Cloud Observability to see your automatically built experience</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Enabling your Gemini Cloud Assist SKU and </span><a href="https://cloud.google.com/earlyaccess/gemini-cloud-assist?e=48754805&amp;hl=en"><span style="text-decoration: underline; vertical-align: baseline;">signing up for the trusted tester 
program</span></a><span style="vertical-align: baseline;"> to get access to the</span><strong style="vertical-align: baseline;"> Investigations experience</strong></p> </li> </ol> <h3 style="text-align: justify;"><span style="vertical-align: baseline;">Related docs</span></h3> <ol style="list-style-type: lower-alpha;"> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">Application Monitoring </span><a href="https://cloud.google.com/stackdriver/docs/observability/about-application-monitoring"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/set-up-app-hub"><span style="text-decoration: underline; vertical-align: baseline;">docs</span></a></p> <ol style="list-style-type: lower-alpha;"> <li role="presentation" style="text-align: justify;"><span style="vertical-align: baseline;">App Hub </span><a href="https://cloud.google.com/app-hub/docs/supported-resources" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">coverage docs</span></a></li> </ol> </li> </ol></div>
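<div class="block-paragraph_advanced"><p>To make the four golden signals from the dashboards section concrete, here is a minimal Python sketch that derives traffic, latency, error rate, and saturation from raw request records for one time window. It is an illustration only, not how Application Monitoring computes its charts: the <code>Request</code> record, the <code>golden_signals</code> function, the choice of a nearest-rank 95th percentile, and the use of CPU as the saturation measure are all assumptions made for this example.</p></div>

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape, not a Cloud Monitoring type)."""
    latency_ms: float
    status: int


def golden_signals(requests, window_s, cpu_used, cpu_limit):
    """Summarize the four golden signals for one time window.

    Assumes window_s > 0 and cpu_limit > 0. Saturation is approximated as
    CPU used divided by CPU limit, which is only one possible choice.
    """
    total = len(requests)
    # Count server-side failures (HTTP 5xx) as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile; integer math avoids float rounding issues.
    rank = (95 * total + 99) // 100
    p95 = latencies[rank - 1] if total else 0.0
    return {
        "traffic_rps": total / window_s,
        "latency_p95_ms": p95,
        "error_rate": errors / total if total else 0.0,
        "saturation": cpu_used / cpu_limit,
    }
```

<div class="block-paragraph_advanced"><p>Calling <code>golden_signals(records, window_s=60, cpu_used=1.5, cpu_limit=2.0)</code> on one minute of records returns a small dictionary that could back an overview chart. A drill-down view would apply the same summary per service or workload.</p></div>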
  60. Director of Engineering, Google Cloud

    Thu, 10 Jul 2025 09:30:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">At Google Cloud, we are committed to making it as seamless as possible for you to build and deploy the next generation of AI and agentic applications. Today, we’re thrilled to announce that we are </span><a href="https://docker.com/blog/build-ai-agents-with-docker-compose/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">collaborating with Docker</span></a><span style="vertical-align: baseline;"> to drastically simplify your deployment workflows, enabling you to bring your sophisticated AI applications from local development to </span><a href="https://cloud.google.com/run"><span style="text-decoration: underline; vertical-align: baseline;">Cloud Run</span></a><span style="vertical-align: baseline;"> with ease. </span></p> <h3><strong style="vertical-align: baseline;">Deploy your compose.yaml directly to Cloud Run</strong></h3> <p><span style="vertical-align: baseline;">Previously, bridging the gap between your development environment and managed platforms like Cloud Run required you to manually translate and configure your infrastructure. Agentic applications that use MCP servers and self-hosted models added additional complexity. </span></p> <p><span style="vertical-align: baseline;">The open-source </span><a href="http://compose-spec.io" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Compose Specification</span></a><span style="vertical-align: baseline;"> is one of the most popular ways for developers to iterate on complex applications in their local environment, and is the basis of Docker Compose. And now, </span><strong style="vertical-align: baseline;">gcloud run compose up</strong><span style="vertical-align: baseline;"> brings the simplicity of Docker Compose to Cloud Run, automating this entire process. 
Now in </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">private preview</span></a><span style="vertical-align: baseline;">, you can deploy your existing</span><code style="vertical-align: baseline;"> compose.yaml</code><span style="vertical-align: baseline;"> file to Cloud Run with a single command, including building containers from source and leveraging Cloud Run’s volume mounts for data persistence.  </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/compose.gif" alt="compose"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Supporting the Compose Specification with Cloud Run makes for easy transitions across your local and cloud deployments, where you can keep the same configuration format, ensuring consistency and accelerating your dev cycle.</span></p> <p style="padding-left: 40px;"><span style="font-style: italic; vertical-align: baseline;">“We’ve recently evolved Docker Compose to support agentic applications, and we’re excited to see that innovation extend to Google Cloud Run with support for GPU-backed execution. Using Docker and Cloud Run, developers can now iterate locally and deploy intelligent agents to production at scale with a single command. It’s a major step forward in making AI-native development accessible and composable. 
We’re looking forward to continuing our close collaboration with Google Cloud to simplify how developers build and run the next generation of intelligent applications.” - </span><span style="vertical-align: baseline;">Tushar Jain, EVP Engineering and Product, Docker</span></p> <h3><strong style="vertical-align: baseline;">Cloud Run, your home for AI applications</strong></h3> <p><span style="vertical-align: baseline;">Support for the compose spec isn’t the only AI-friendly innovation you’ll find in Cloud Run. We recently announced </span><a href="https://cloud.google.com/blog/products/serverless/cloud-run-gpus-are-now-generally-available"><span style="text-decoration: underline; vertical-align: baseline;">general availability of Cloud Run GPUs</span></a><span style="vertical-align: baseline;">, removing a significant barrier to entry for developers who want access to GPUs for AI workloads. With its pay-per-second billing, scale to zero, and rapid scaling (a time-to-first-token of approximately 19 seconds for a gemma3:4b model), Cloud Run is a great hosting solution for deploying and serving LLMs. </span></p> <p><span style="vertical-align: baseline;">This also makes Cloud Run a strong solution for Docker’s recently </span><a href="https://www.docker.com/blog/docker-mcp-gateway-secure-infrastructure-for-agentic-ai/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">announced</span></a><span style="vertical-align: baseline;"> OSS MCP Gateway and Model Runner, making it easy for developers to take AI applications from local development to production in the cloud. 
By supporting Docker’s recent addition of </span><a href="https://github.com/compose-spec/compose-spec/blob/main/spec.md#models" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">‘models’ to the open Compose Spec</span></a><span style="vertical-align: baseline;">, you can deploy these complex solutions to the cloud with a single command.  </span></p> <h3><strong style="vertical-align: baseline;">Bringing it all together</strong></h3> <p><span style="vertical-align: baseline;">Let's review the compose file for the above demo. It consists of a multi-container application (defined in </span><code style="vertical-align: baseline;">services</code><span style="vertical-align: baseline;">) built from sources and leveraging a storage volume (defined in </span><code style="vertical-align: baseline;">volumes</code><span style="vertical-align: baseline;">). It also uses the new </span><code style="vertical-align: baseline;">models</code><span style="vertical-align: baseline;"> attribute to define AI models and a Cloud Run-extension defining the runtime image to use:</span></p></div> <div class="block-code"><pre><code>name: agent
services:
  webapp:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - web_images:/assets/images
    depends_on:
      - adk

  adk:
    image: us-central1-docker.pkg.dev/jmahood-demo/adk:latest
    ports:
      - "3000:3000"
    models:
      - ai-model

models:
  ai-model:
    model: ai/gemma3-qat:4B-Q4_K_M
    x-google-cloudrun:
      inference-endpoint: docker/model-runner:latest-cuda12.2.2

volumes:
  web_images:</code></pre></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Building the future of 
AI</strong></h3> <p><span style="vertical-align: baseline;">We’re committed to offering developers maximum flexibility and choice by adopting open standards and supporting various agent frameworks.</span><strong style="vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">This collaboration on Cloud Run and Docker is another example of how we aim to simplify the process for developers to build and deploy intelligent applications. </span></p> <p><span style="vertical-align: baseline;">Compose Specification support is available for our trusted users — </span><a href="https://forms.gle/XDHCkbGPWWcjx9mk9" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">sign up here for the private preview</span></a><span style="vertical-align: baseline;">. </span></p></div>
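<div class="block-paragraph_advanced"><p>The compose file in this post ties its sections together by name: <code>webapp</code> depends on <code>adk</code>, <code>adk</code> uses the model <code>ai-model</code>, and <code>webapp</code> mounts the named volume <code>web_images</code>. The Python sketch below shows one way to check those cross-references before deploying. It is a hedged illustration, not part of <code>gcloud run compose up</code>: the helper <code>undefined_references</code> is hypothetical, the file is written out as a plain dictionary to avoid a YAML dependency, and only short-form string volume mounts are handled.</p></div>

```python
# The demo compose file from the post, expressed as the dictionary a YAML
# parser would produce (written by hand here to keep the sketch dependency-free).
compose = {
    "name": "agent",
    "services": {
        "webapp": {
            "build": ".",
            "ports": ["8080:8080"],
            "volumes": ["web_images:/assets/images"],
            "depends_on": ["adk"],
        },
        "adk": {
            "image": "us-central1-docker.pkg.dev/jmahood-demo/adk:latest",
            "ports": ["3000:3000"],
            "models": ["ai-model"],
        },
    },
    "models": {
        "ai-model": {
            "model": "ai/gemma3-qat:4B-Q4_K_M",
            "x-google-cloudrun": {
                "inference-endpoint": "docker/model-runner:latest-cuda12.2.2"
            },
        }
    },
    "volumes": {"web_images": None},
}


def undefined_references(compose):
    """Return (service, kind, name) tuples for names used but never defined."""
    services = compose.get("services") or {}
    models = compose.get("models") or {}
    volumes = compose.get("volumes") or {}
    problems = []
    for svc_name, svc in services.items():
        # Iterating works for both the list form and the mapping form.
        for dep in svc.get("depends_on") or []:
            if dep not in services:
                problems.append((svc_name, "service", dep))
        for model in svc.get("models") or []:
            if model not in models:
                problems.append((svc_name, "model", model))
        for mount in svc.get("volumes") or []:
            if not isinstance(mount, str):
                continue  # long-form mounts are out of scope for this sketch
            source = mount.split(":", 1)[0]
            # Sources starting with ".", "/" or "~" are bind mounts, not named volumes.
            if not source.startswith((".", "/", "~")) and source not in volumes:
                problems.append((svc_name, "volume", source))
    return problems
```

<div class="block-paragraph_advanced"><p>For the demo file, <code>undefined_references(compose)</code> returns an empty list. Removing the top-level <code>models</code> entry would surface the dangling <code>ai-model</code> reference from the <code>adk</code> service.</p></div>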
  61. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><strong style="font-style: italic; vertical-align: baseline;">Editor's note:</strong><span style="font-style: italic; vertical-align: baseline;"> This is part one of the story. After you’re finished reading, head over to </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">part two</span></a><span style="font-style: italic; vertical-align: baseline;">. </span></p> <hr/> <p><span style="vertical-align: baseline;">In 2017, John Lewis, a major UK retailer with a £2.5bn annual online turnover, was hampered by its monolithic e-commerce platform. This outdated approach led to significant cross-team dependencies, cumbersome and infrequent releases (monthly at best), and excessive manual testing, all further hindered by complex on-premises infrastructure. What was needed were some bold decisions to drive a quick and significant transformation.</span></p> <p><span style="vertical-align: baseline;">The John Lewis engineers knew there was a better way. Working with Google Cloud, they modernized their e-commerce operations with </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">Google Kubernetes Engine</span></a><span style="vertical-align: baseline;">. 
They started with the frontend and saw results fast: the frontend was moved onto Google Cloud in mere months, releases to the frontend browser journey began to happen weekly, and the business gladly backed expansion into other areas.</span></p> <p><span style="vertical-align: baseline;">At the same time, the team had a broader strategy in mind: to take </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">a platform engineering approach</span></a><span style="vertical-align: baseline;">, creating many product teams that built their own microservices to replace the functionality of the legacy commerce engine, as well as creating brand new experiences for customers. </span></p> <p><span style="vertical-align: baseline;">And so The John Lewis Digital Platform was born. The vision was to empower development teams and arm them with the tools and processes they needed to go to market fast, with full ownership of their own business services. The team’s motto? "You Build It. You Run It. You Own It." This decentralization of development and operational responsibilities would also enable the team to scale. 
</span></p> <p><span style="vertical-align: baseline;">This article features insights from Principal Platform Engineer Alex Moss, who delves into the strategy, platform build, and key learnings from John Lewis’ journey to modernize and streamline its operations with platform engineering — so you can begin to think about how you might apply platform engineering to your own organization.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Step 1: From monolithic to multi-tenant</strong></h3> <p><span style="vertical-align: baseline;">In order to make this happen, John Lewis needed to adopt a multi-tenant architecture (one tenant for each business service, allowing each owning team to work independently without risk to others), thereby permitting the Platform team to give each team a greater degree of freedom.</span></p> <p><span style="vertical-align: baseline;">Knowing that the business' primary objective was to greatly increase the number of product teams helped inform our initial design thinking, positioning us to enable many independent teams even though we only had a handful of tenants. </span></p> <p><span style="vertical-align: baseline;">This foundational design has served us very well and is largely unchanged now, seven years later. 
Central to the multi-tenant concept is what we chose to term a "Service" — a logical business application, usually composed of several microservices plus components for storing data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image1.max-1000x1000.png" alt="article1-image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">We largely position our platform as a “bring your own container” experience, but encourage teams to make use of other Google Cloud services — particularly for handling state. Adopting services like Firestore and Pub/Sub reduces the complexity that our platform team has to work with, particularly for areas like resilience and disaster recovery. We also favor Kubernetes over compute products like Cloud Run because it strikes the right balance for us between giving development teams freedom and allowing our platform to drive certain behaviours, e.g., the right level of guardrails, without introducing too much friction.</span></p> <p><span style="vertical-align: baseline;">On our platform, Product Teams (i.e., tenants) have a large amount of control over their own Namespaces and Projects. This allows them to prototype, build, and ultimately operate their workloads without dependency on others — a crucial element of enabling scale. 
</span></p> <p><span style="vertical-align: baseline;">Our early-adopter teams were extremely helpful in evolving the platform; they were accepting of the lack of features and willing to develop their own solutions, and provided very rich feedback on whether we were building something that met their needs.</span></p> <p><span style="vertical-align: baseline;">The first tenant to adopt the platform was rebuilding the </span><a href="http://johnlewis.com" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">johnlewis.com</span></a><span style="vertical-align: baseline;"> search capability, replacing a commercial-off-the-shelf solution. This team was staffed with experienced engineers familiar with modern software development and the advantages of a microservice-based architecture. They quickly identified the need for supporting services for their application to store data and asynchronously communicate between their components. They worked with the Platform Team to identify options, and were on board with our desire to lean into Google Cloud native services to avoid running our own databases or messaging. This led to us adopting Cloud Datastore and Pub/Sub for our first features that extended beyond Google Kubernetes Engine.</span></p> <h3><strong style="vertical-align: baseline;">All roads lead to success</strong></h3> <p><span style="vertical-align: baseline;">A risk with a platform that allows very high team autonomy is that it can turn into a bit of a wild-west of technology choices and implementation patterns. 
To handle this in a way that remained developer-centric, we adopted the concept of a </span><strong style="vertical-align: baseline;">paved road</strong><span style="vertical-align: baseline;">, analogous to a “golden path.” </span></p> <p><span style="vertical-align: baseline;">We found that the paved road approach made it easier to:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">build useful platform features to help developers do things rapidly and safely</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">share approaches and techniques, and for engineers to move between teams</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">demonstrate to the wider organisation that teams are following required practices (which we do by building assurance capabilities, </span><strong style="vertical-align: baseline;">not </strong><span style="vertical-align: baseline;">by gating release)</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The concept of the paved road permeates most of what the platform builds, and has inspired other areas of the John Lewis Partnership beyond the John Lewis Digital space.</span></p> <p><span style="vertical-align: baseline;">Our paved road is powered by two key features to enable simplification for teams:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Paved Road Pipeline</strong><span style="vertical-align: baseline;">. 
This operates on the whole Service and drives capabilities such as Google Cloud resource provisioning and observability tools.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">The Microservice CRD</strong><span style="vertical-align: baseline;">. As the name implies, this is an abstraction at the microservice level. The majority of the benefit here is in making it easier for teams to work with Kubernetes.</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Whilst both features were created with the developer experience in mind, we discovered that they also hold a number of benefits for the platform team too.</span></p> <p><span style="vertical-align: baseline;">The Paved Road Pipeline is driven by a configuration file — in yaml (of course!) — which we call the Service Definition. This allows </span><strong style="vertical-align: baseline;">the team that owns the tenancy</strong><span style="vertical-align: baseline;"> to describe, through easy-to-reason-about configuration, what they would like the platform to provide for them. Supporting documentation and examples help them understand what can be achieved. Pushes to this file then drive a CI/CD pipeline for a number of platform-owned jobs, which we refer to as provisioners. These provisioners are microservices-like themselves in that they are independently releasable and generally focus on performing one task well. Here are some examples of our provisioners and what they can do:</span></p> <ul> <li role="presentation"><span style="vertical-align: baseline;">Create Google Cloud resources in a tenant’s Project. 
For example, </span><a href="https://cloud.google.com/storage/docs/creating-buckets"><span style="text-decoration: underline; vertical-align: baseline;">Buckets</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/pubsub/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">PubSub</span></a><span style="vertical-align: baseline;">, and </span><a href="https://firebase.google.com/docs/firestore" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Firestore</span></a><span style="vertical-align: baseline;"> — amongst many others</span></li> <li role="presentation"><span style="vertical-align: baseline;">Configure platform-provided dashboards and custom dashboards based on golden-signal and self-instrumented metrics</span></li> <li role="presentation"><span style="vertical-align: baseline;">Tune alert configurations for a given microservice’s SLOs, and the incident response behaviour for those alerts</span></li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article1-image2.max-1000x1000.png" alt="article1-image2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Our product teams are therefore freed from the need to familiarize themselves deeply with how Google Cloud resource provisioning works, or Infrastructure-as-Code (IaC) tooling for that matter. 
Our preferred technologies and good practices can be curated by our experts, and developers can focus on building differentiating software for the business, while remaining fully in control of what is provisioned and when.</span></p> <p><span style="vertical-align: baseline;">Earlier, we mentioned that this approach has the added benefit of being something that the platform team can rely upon to build their own features. The configuration updated by teams for their Service can be combined with metadata about their team and surfaced via an API and events published to Pub/Sub. This can then drive updates to other features like incident response and security tooling, pre-provision documentation repositories, and more. This is an example of how something that was originally intended as a means to help teams avoid writing their own IaC can also be used to make it easier for us to build platform features, further improving the value-add — without the developer even needing to be aware of it!</span></p> <p><span style="vertical-align: baseline;">We think this approach is also more scalable than providing pre-built Terraform modules for teams to use. That approach still burdens teams with being familiar with Terraform, and versioning and dependency complexities can create maintenance headaches for platform engineers. Instead, we provide an easy-to-reason-about API and </span><strong style="vertical-align: baseline;">deliberately burden the platform team,</strong><span style="vertical-align: baseline;"> ensuring that the Service provides all the functionality our tenants require. This abstraction also means we can make significant refactoring choices if we need to.</span></p> <p><span style="vertical-align: baseline;">Adopting this approach also results in a broad consistency in technologies across our platform. For example, why would a team implement Kafka when the platform makes creating resources in Pub/Sub so easy? 
When you consider that this spans not just the runtime components that assemble into a working business service, but also all the ancillary needs for operating that software — resilience engineering, monitoring &amp; alerting, incident response, security tooling, service management, and so on — this has a massive amplifying effect on our engineers’ productivity. All of these areas have full paved road capabilities on the John Lewis Digital Platform, reducing the cognitive load on teams of recognizing a need, identifying appropriate options, and then implementing the technology or processes to meet it.</span></p> <p><span style="vertical-align: baseline;">That being said, one of the reasons we particularly like the paved road concept is that it doesn't preclude teams choosing to "go off-road." A paved road shouldn’t be mandatory, but it should be compelling to use, so that engineers aren’t tempted to do something else. Preventing use of other approaches risks stifling innovation and invites the temptation to think the features you've built are "good enough." The paved road challenges our Platform Engineers to keep improving their product so that it continues to meet our Developers' changing needs. Likewise, development teams tempted to go off-road are put off by the increasing burden of replicating powerful platform features. </span></p> <p><span style="vertical-align: baseline;">The needs of our Engineers don’t remain fixed, and Google Cloud are of course releasing new capabilities all the time, so we have extended the analogy to include a “dusty path” representing brand new platform features that aren’t as feature-rich as we’d like (perhaps they lack self-service provisioning or out-the-box observability). Teams are trusted to try different options and make use of Google Cloud products that we haven't yet paved. The Paved Road Pipeline allows for this experimentation, which we term "snowflaking". 
We then have an unofficial "rule of three", whereby if we notice at least three teams requesting the same feature, we move to make the use of it self-service.</span></p> <p><span style="vertical-align: baseline;">At the other end of the scale, teams can go completely solo, which we refer to as “crazy paving.” This might be needed to support wild experimentation or to accommodate a workload which cannot comply with the platform’s expectations for safe operation. Solutions in this space are generally not long-lived.</span></p> <p><span style="vertical-align: baseline;">In this article, we've covered how John Lewis revolutionized its e-commerce operations by adopting a multi-tenant, "paved road" approach to platform engineering. We explored how this strategy empowered development teams and streamlined their ability to provision Google Cloud resources and deploy operational and security features.</span></p> <p><span><span style="vertical-align: baseline;">In </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-two?e=48754805"><span style="text-decoration: underline; vertical-align: baseline;">part 2</span></a><span style="vertical-align: baseline;"> of this series, we'll dive deeper into how John Lewis further simplified the developer experience by introducing the Microservice CRD. You'll discover how this custom Kubernetes abstraction significantly reduced the complexity of working with Kubernetes at the component level, leading to faster development cycles and enhanced operational efficiency.</span></span></p> <p><span style="vertical-align: baseline;">To learn more about shifting down with platform engineering on Google Cloud, you can find more information </span><a href="https://cloud.google.com/solutions/platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">. 
To learn more about how Google Kubernetes Engine (GKE) empowers developers to effortlessly deploy, scale, and manage containerized applications with its fully managed, robust, and intelligent Kubernetes service, you can find more information </span><a href="https://cloud.google.com/kubernetes-engine"><span style="text-decoration: underline; vertical-align: baseline;">here</span></a><span style="vertical-align: baseline;">.</span></p></div>
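The Service Definition and provisioner flow described in the article can be illustrated with a minimal sketch. This is not the John Lewis implementation: the field names (`resources`, `buckets`, `pubsub_topics`, `alerts`, `slo_availability`), the `Plan` class, and the provisioner functions are all hypothetical, and the Service Definition is shown as the dictionary a YAML parser would produce. Each provisioner reads only the part of the definition it cares about and records what it would do, which mirrors the idea of small, independently releasable jobs driven by one tenant-owned file.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical Service Definition, as a YAML parser might load it.
# None of these field names come from the real platform.
SERVICE_DEFINITION = {
    "service": "search",
    "team": "search-team",
    "resources": {
        "buckets": [{"name": "search-exports"}],
        "pubsub_topics": [{"name": "search-events"}],
    },
    "alerts": [{"microservice": "search-api", "slo_availability": 0.999}],
}


@dataclass
class Plan:
    """Collects the actions each provisioner would take (a dry run)."""
    actions: List[str] = field(default_factory=list)


def provision_buckets(definition: dict, plan: Plan) -> None:
    # Only looks at the bucket section of the definition.
    for bucket in definition.get("resources", {}).get("buckets", []):
        plan.actions.append(f"create bucket {bucket['name']}")


def provision_topics(definition: dict, plan: Plan) -> None:
    # Only looks at the Pub/Sub section of the definition.
    for topic in definition.get("resources", {}).get("pubsub_topics", []):
        plan.actions.append(f"create Pub/Sub topic {topic['name']}")


def provision_alerts(definition: dict, plan: Plan) -> None:
    # Only looks at the alerting section of the definition.
    for alert in definition.get("alerts", []):
        plan.actions.append(
            f"configure alert for {alert['microservice']} "
            f"at {alert['slo_availability']:.1%} availability SLO"
        )


# Each provisioner does one task and can be changed without touching the others.
PROVISIONERS: Dict[str, Callable[[dict, Plan], None]] = {
    "buckets": provision_buckets,
    "pubsub": provision_topics,
    "alerts": provision_alerts,
}


def run_pipeline(definition: dict) -> Plan:
    """Run every provisioner against one Service Definition."""
    plan = Plan()
    for provisioner in PROVISIONERS.values():
        provisioner(definition, plan)
    return plan


if __name__ == "__main__":
    for action in run_pipeline(SERVICE_DEFINITION).actions:
        print(action)
```

Because tenants only edit the definition, the platform team remains free to change how any single provisioner does its work, which is the refactoring freedom the article describes.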
  62. Principal Platform Engineer, John Lewis Partnership

    Thu, 26 Jun 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In our </span><a href="https://cloud.google.com/blog/products/application-development/simplifying-platform-engineering-at-john-lewis-part-one"><span style="text-decoration: underline; vertical-align: baseline;">previous article</span></a><span style="vertical-align: baseline;"> we introduced the John Lewis Digital Platform and its approach to simplifying the developer experience through platform engineering and so-called paved road features. We focused on the ways that platform engineering enables teams to create resources in Google Cloud and deploy the platform's operational and security features within dedicated tenant environments. In this article, we will build upon that concept for the next level of detail — how the platform simplifies build and run at a component (typically for us, a microservice) level too.</span></p> <p><span style="vertical-align: baseline;">Within just over a year, the John Lewis Digital Platform had fully evolved into a product. We had approximately 25 teams using our platform, with several key parts of the johnlewis.com retail website running in production. We had built a self-service capability to help teams provision resources in Google Cloud, and firmly established that the foundation of our platform was on Google Kubernetes Engine (GKE). But we were hearing signals from some of the recent teams that there was a learning curve to Kubernetes. This was expected — we were driving a cultural change for teams to build and run their own services, and so we anticipated that our application developers would need some Kubernetes skills to support their own software. But our vision was that we wanted to make developers' lives easier — and their feedback was clear. In some cases, we observed that teams weren't following "good practice"  (despite the existence of good documentation!) 
such as not using anti-affinity rules or </span><code style="vertical-align: baseline;">PodDisruptionBudgets</code><span style="vertical-align: baseline;"> to help their workloads tolerate failure.</span></p></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">All the way back in 2017, Kelsey Hightower wrote: “</span><span style="font-style: italic; vertical-align: baseline;">Kubernetes is a platform for building platforms. It's a better place to start, not the endgame.”</span></p> <p><span style="vertical-align: baseline;">Kelsey's quote inspired us to act. We had the idea to write our own custom controller to simplify the point of interaction for a developer with Kubernetes — a John Lewis-specific abstraction that aligned to our preferred approaches. And thus the JL </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> was born.</span></p> <p><span style="vertical-align: baseline;">To do this, we declared a Kubernetes  </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> with a simplified specification containing just the fields we felt our developers needed to set. For example, as we expect our tenants to build and operate their applications themselves, attributes such as the number of replicas and the amount of resources needed are best left up to the developers themselves. But do they really need to be able to customize the rules defining how to distribute pods across nodes? 
How often do they need to change the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;"> pointing towards their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">? When we looked closer, we realized just how much duplication there was — our analysis at the time suggested that only around 33% of the lines in the yaml files developers were producing were relevant to their application. This was a target-rich scenario for simplification.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/article2-image1.max-1000x1000.png" alt="article2-image1"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To help us build this feature, we selected </span><a href="https://github.com/kubernetes-sigs/kubebuilder" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Kubebuilder,</span></a><span style="vertical-align: baseline;">  using it to declare our </span><code style="vertical-align: baseline;">CustomResourceDefinition</code><span style="vertical-align: baseline;"> and then build the Controller (what we call </span><code style="vertical-align: baseline;">MicroserviceManager</code><span style="vertical-align: baseline;">). This turned out to be a beneficial decision — initial prototyping was quick, and the feature was launched a few months later, and very well-received. 
Our team had to skill up in the </span><a href="https://go.dev/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Go programming language</span></a><span style="vertical-align: baseline;">, but this trade-off felt worthwhile due to the advantages Kubebuilder was bringing to the table, and it has continued to be helpful for other software engineering since.</span></p> <p><span style="vertical-align: baseline;">The initial implementation replaced an engineer's need to understand and fully configure a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">, instead applying a much briefer yaml file containing only the fields they need to change. As well as direct translation of identical fields (</span><code style="vertical-align: baseline;">image</code><span style="vertical-align: baseline;"> and </span><code style="vertical-align: baseline;">replicas </code><span style="vertical-align: baseline;">are equivalent to what you would see in a </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;">, for example), it also allowed us to simplify the choices made by the Kubernetes APIs, because in John Lewis we didn't need some of that functionality. For example, </span><code style="vertical-align: baseline;">writablePaths: []</code><span style="vertical-align: baseline;"> is an easy concept for our engineers to understand, and behind the scenes, our controller is converting those into the more complex combination of </span><code style="vertical-align: baseline;">Volumes </code><span style="vertical-align: baseline;">and </span><code style="vertical-align: baseline;">VolumeMounts</code><span style="vertical-align: baseline;">. 
Likewise, </span><code style="vertical-align: baseline;">visibleToOtherServices: true</code><span style="vertical-align: baseline;"> is an example of us simplifying the interaction with Kubernetes </span><code style="vertical-align: baseline;">NetworkPolicy</code><span style="vertical-align: baseline;"> — rather than requiring teams to read our documentation to understand the necessary incantations to label their resources correctly, the controller understands those conventions and applies them for teams.</span></p> <p><span style="vertical-align: baseline;">With the core concept of the </span><code style="vertical-align: baseline;">Microservice </code><span style="vertical-align: baseline;">resource established, we were able to improve the value-add by augmenting it with further features. We rapidly extended it out to define our Prometheus scrape configuration, then added more complex features such as allowing teams to declare that they use Google Cloud Endpoints and having the controller inject the necessary sidecar container into their </span><code style="vertical-align: baseline;">Deployment</code><span style="vertical-align: baseline;"> and wire it up to the </span><code style="vertical-align: baseline;">Service</code><span style="vertical-align: baseline;">. As we added more features, existing tenants converted to use this specification, and it now makes up the majority of workloads declared on the platform.</span></p> <h3><strong style="vertical-align: baseline;">Moving the platform boundary</strong></h3> <p><span style="vertical-align: baseline;">Our motivation to build MicroserviceManager was focused on making developers' lives easier. But we discovered an additional benefit that we had not initially expected: it was something we could greatly benefit from </span><span style="font-style: italic; vertical-align: baseline;">within</span><span style="vertical-align: baseline;"> the platform as well. 
It enabled us to make changes behind the scenes without needing to involve our tenants — reducing toil for them and making it easier for us to improve our product. This was a slightly unexpected but an exceptionally powerful benefit. It is generally difficult to change the agreement that you’ve established between your tenants and the platform, and creating an abstraction like this has allowed us to bring more under our control, for everyone’s benefit.</span></p> <p><span style="vertical-align: baseline;">An example of this was something we observed through our live load testing of johnlewis.com when certain workloads burst up to several hundred </span><code style="vertical-align: baseline;">Pods</code><span style="vertical-align: baseline;"> — numbers that exceeded the typical number of </span><code style="vertical-align: baseline;">Nodes</code><span style="vertical-align: baseline;"> we had running in the cluster. This led to new </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;"> creation — therefore slower </span><code style="vertical-align: baseline;">Pod</code><span style="vertical-align: baseline;"> autoscaling and poor bin-packing. Experienced Kubernetes operators can probably guess what was happening here: our default antiAffinity rules were set to optimize for resilience such that no more than one replica was allowed on any given </span><code style="vertical-align: baseline;">Node</code><span style="vertical-align: baseline;">. 
The good news, though, was that the workloads were under the control of our Microservice Manager. Rather than having to instruct our tenants to copy the relevant yaml into their Deployments, it was a straightforward change for us to replace the antiAffinity rules with the more modern </span><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/" rel="noopener" target="_blank"><code style="text-decoration: underline; vertical-align: baseline;">topologySpreadConstraints</code></a><span style="vertical-align: baseline;">, allowing us to customize the number of replicas that could be stacked on a Node for workloads exceeding a certain replica count. And this happened with no intervention from our tenants.</span></p> <p><span style="vertical-align: baseline;">A more complex example of this was when we rolled out our service mesh. In keeping with our general desire to let Google Cloud handle the complexity of running control plane components, we opted to use </span><a href="https://cloud.google.com/products/service-mesh"><span style="text-decoration: underline; vertical-align: baseline;">Google's Cloud Service Mesh</span></a><span style="vertical-align: baseline;"> product. But even then, rolling out a mesh to a business-critical platform in constant use is not without its risks. Microservice Manager allowed us to control the rate at which we enrolled workloads into the mesh through the use of a feature flag on the </span><code style="vertical-align: baseline;">Microservice</code><span style="vertical-align: baseline;"> resource. We could start rollout with platform-owned workloads first to test our approach, then make tenants aware of the flag for early adopters to validate and take advantage of some of Cloud Service Mesh’s features. To scale the rollout, we could then manipulate the flag to release in waves based on business importance, providing an opt-out mechanism if needed. 
This again greatly simplified the implementation — product teams had very little to do, and we avoided having to chase approximately 40 teams running hundreds of Microservices to make the appropriate changes in their configuration. This feature flagging technique is something we make extensive use of to support our own experimentation.</span></p> <h3><strong style="vertical-align: baseline;">Beyond the microservice</strong></h3> <p><span style="vertical-align: baseline;">Building the Microservice Manager has led to further thinking in Kubernetes-native ways: the </span><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Custom Resource + Controller concept</span></a><span style="vertical-align: baseline;"> is a powerful technique, and we have built other features since using it. One example is a controller that converts the need for external connectivity into Istio resources to route via our egress gateway. Istio in particular is an example of a very powerful platform capability that comes with a high cognitive load for its users, and so is a perfect example of where platform engineering can help manage that for teams whilst still allowing them to take advantage of it. We have a number of ideas in this area now that our confidence in the technology has grown.</span></p> <p><span style="vertical-align: baseline;">In summary, the John Lewis Partnership leveraged Google Cloud and platform engineering to modernize their e-commerce operations and developer experience. By implementing a "paved road" approach with a multi-tenant architecture, they empowered development teams, accelerated deployment cycles, and simplified Kubernetes interactions using a custom Microservice CRD. 
This strategy allowed them to scale their engineering teams effectively and improve the developer experience by reducing complexity while maintaining operational efficiency.</span></p> <p><span style="vertical-align: baseline;">To learn more about platform engineering on Google Cloud, check out some of our other articles:</span><span style="vertical-align: baseline;"> </span><a href="https://cloud.google.com/blog/products/application-development/common-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">5 myths about platform engineering: what it is and what it isn’t</span></a><span style="vertical-align: baseline;">, </span><a href="https://cloud.google.com/blog/products/application-development/another-five-myths-about-platform-engineering"><span style="text-decoration: underline; vertical-align: baseline;">Another five myths about platform engineering</span></a><span style="vertical-align: baseline;">, and </span><a href="https://cloud.google.com/blog/products/application-development/golden-paths-for-engineering-execution-consistency"><span style="text-decoration: underline; vertical-align: baseline;">Light the way ahead: Platform Engineering, Golden Paths, and the power of self-service</span></a><span style="vertical-align: baseline;">.</span></p></div>
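<p>As a rough illustration of the scheduling change described in this article, the sketch below shows how a controller might derive a Kubernetes <code>topologySpreadConstraints</code> stanza from a workload's replica count. The field names (<code>maxSkew</code>, <code>topologyKey</code>, <code>whenUnsatisfiable</code>, <code>labelSelector</code>) are the standard Kubernetes ones; the function name, the replica threshold, and the skew values are assumptions made for this example and are not taken from the Microservice Manager.</p>

```python
from __future__ import annotations

from typing import Any

# Hypothetical values, chosen only for illustration.
LARGE_WORKLOAD_REPLICAS = 10  # replica count above which stacking is relaxed
DEFAULT_MAX_SKEW = 1          # small workloads: spread as evenly as possible
RELAXED_MAX_SKEW = 3          # large workloads: tolerate more pods per Node


def spread_constraints(app_label: str, replicas: int) -> list[dict[str, Any]]:
    """Build a topologySpreadConstraints list for a pod template.

    maxSkew bounds the difference in matching pod counts between any two
    Nodes, so a larger value lets more replicas share a Node.
    """
    if replicas < LARGE_WORKLOAD_REPLICAS:
        max_skew = DEFAULT_MAX_SKEW
    else:
        max_skew = RELAXED_MAX_SKEW
    return [
        {
            "maxSkew": max_skew,
            # Well-known Node label: each Node is its own topology domain.
            "topologyKey": "kubernetes.io/hostname",
            "whenUnsatisfiable": "DoNotSchedule",
            "labelSelector": {"matchLabels": {"app": app_label}},
        }
    ]


small = spread_constraints("checkout", replicas=3)
large = spread_constraints("checkout", replicas=20)
```

<p>Because the constraint is generated in one place, changing the threshold or the skew would apply to every managed workload without tenants editing their own manifests, which is the property the article highlights.</p>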
  63. Sr. Staff UX Designer

    Wed, 28 May 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In the event of a cloud incident, everyone wants swift and clear communication from the cloud provider, and to be able to leverage that information effectively. </span><a href="https://cloud.google.com/blog/products/devops-sre/personalized-service-health-is-now-generally-available?e=48754805?utm_source%3Dmarketingweb"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;"> in the Google Cloud console addresses this need with fast, transparent, relevant, and actionable communications about Google Cloud service disruptions, customized to your specific footprint. This helps you to quickly identify the source of the problem, helping you answer the question, “Is it Google or is it me?” You can then integrate this information into your incident response workflows to resolve the incident more efficiently.</span></p> <p><span style="vertical-align: baseline;">We're excited to announce that you can prompt </span><a href="https://g.co/kgs/j2BVWVE" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;"> to pull real-time information about active incidents, powered by Personalized Service Health, providing you with streamlined incident management, including discovery, impact assessment, and recovery. By combining Gemini's guidance with Personalized Service Health insights and up-to-the-minute information, you can assess the scope of impact and begin troubleshooting – all within a single, AI-driven Gemini Cloud Assist chat. Further, you  can initiate this sort of incident discovery from anywhere within the console, offering immediate access to relevant incidents without interrupting your workflow. 
You can also check for active incidents impacting your projects, gathering details on their scope and the latest updates directly sourced from Personalized Service Health</span><span style="vertical-align: baseline;">.</span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Using Gemini Cloud Assist with Personalized Service Health</strong></h3> <p><span style="vertical-align: baseline;">We designed Gemini Cloud Assist with a user-friendly layout and a well-organized information structure. Crucial details, including dynamic timelines, latest updates, symptoms, and workarounds sourced directly from Personalized Service Health, are now presented in the console, enabling conversational follow-ups. Gemini Cloud Assist highlights critical insights from Personalized Service Health, helping you refine your investigations and understand the impact of incidents.</span></p> <p><span style="vertical-align: baseline;">To illustrate the power of this integration, the following demo showcases a typical incident response workflow leveraging the combined capabilities of Gemini and Personalized Service Health.</span></p> <p><strong style="vertical-align: baseline;">Incident discovery and triage<br/></strong><span style="vertical-align: baseline;">In the crucial first moments of an incident, Gemini Cloud Assist helps you answer "Is it Google or is it me?" 
Gemini Cloud Assist accesses data directly from Personalized Service Health, and provides feedback on which projects and at what locations are affected by a Google Cloud incident, speeding up the triage process.</span></p> <p><span style="vertical-align: baseline;">To illustrate how you can start this process, try asking Gemini Cloud Assist questions like:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is my project impacted by a Google Cloud incident?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Are there any incidents impacting Google Cloud at the moment?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/1_UpdatedNew.gif" alt="1 UpdatedNew"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Investigating and evaluating impact<br/></strong><span style="vertical-align: baseline;">Once you’ve identified a relevant Google Cloud incident, you can use Gemini Cloud Assist to delve deeper into the specifics and evaluate its impact on your environment. Furthermore, by asking follow-up questions, Gemini Cloud Assist can retrieve updates from Personalized Service Health about the incident as it evolves. 
You can then further investigate by asking Gemini to pinpoint exactly which of your apps or projects, and at what locations, might be affected by the reported incident.</span></p> <p><span style="vertical-align: baseline;">Here are examples of prompts you might pose to Gemini Cloud Assist:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Tell me more about the ongoing Incident ID [X] (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Is [X] impacted? (Replace [X] with your specific location or Google Cloud product)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What is the latest update on Incident ID [X]?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Show me the details of Incident ID [X].</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you guide me through some troubleshooting steps for [impacted Google Cloud product]?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/2_Updated.gif" alt="2"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Mitigation and recovery<br/></strong><span style="vertical-align: baseline;">Finally, Gemini Cloud Assist can also act as an intelligent 
assistant during the recovery phase, providing you with actionable guidance. You can gain access to relevant logs and monitoring data for more efficient resolution. Additionally, Gemini Cloud Assist can help surface potential workarounds from Personalized Service Health and direct you to the tools and information you need to restore your projects or applications. Here are some sample prompts:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">What are the workarounds for the incident ID [X]? (Replace [X] with the Incident ID)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Can you suggest a temporary solution to keep my application running?</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">How can I find logs for this impacted project?</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--medium h-c-grid__col h-c-grid__col--4 h-c-grid__col--offset-4 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/original_images/3_Updated_tpPYqpq.gif" alt="3 Updated"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">From these prompts, Gemini retrieves relevant information from Personalized Service Health to provide you with personalized insights into your Google Cloud environment's health — both for ongoing events and incidents from up to one year in the past. This helps when investigating an incident to narrow down its impact, as well as assisting in recovery. 
</span></p> <h3><strong style="vertical-align: baseline;">Next steps</strong></h3> <p><span style="vertical-align: baseline;">Looking ahead, we are excited to provide even deeper insights and more comprehensive incident management with Gemini Cloud Assist and Personalized Service Health, extending these AI-driven capabilities beyond a single project view. Ready to get started? </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Learn more about </span><a href="https://cloud.google.com/service-health/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Personalized Service Health</span></a><span style="vertical-align: baseline;">, or reach out to your account team to enable it.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Get started with </span><a href="https://cloud.google.com/products/gemini/cloud-assist?e=48754805?utm_source%3Dmarketingweb" style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Open Sans', 'Helvetica Neue', sans-serif;"><span style="text-decoration: underline; vertical-align: baseline;">Gemini Cloud Assist</span></a><span style="vertical-align: baseline;">. 
Refine your prompts to ask about your specific regions or Google Cloud products, and experiment to discover how it can help you proactively manage incidents.</span></p> </li> </ul></div> <div class="block-related_article_tout"> <div class="uni-related-article-tout h-c-page"> <section class="h-c-grid"> <a href="https://cloud.google.com/blog/products/devops-sre/personalized-service-health-is-now-generally-available/" data-analytics='{ "event": "page interaction", "category": "article lead", "action": "related article - inline", "label": "article: {slug}" }' class="uni-related-article-tout__wrapper h-c-grid__col h-c-grid__col--8 h-c-grid__col-m--6 h-c-grid__col-l--6 h-c-grid__col--offset-2 h-c-grid__col-m--offset-3 h-c-grid__col-l--offset-3 uni-click-tracker"> <div class="uni-related-article-tout__inner-wrapper"> <p class="uni-related-article-tout__eyebrow h-c-eyebrow">Related Article</p> <div class="uni-related-article-tout__content-wrapper"> <div class="uni-related-article-tout__image-wrapper"> <div class="uni-related-article-tout__image" style="background-image: url('https://storage.googleapis.com/gweb-cloudblog-publish/images/psh-hero_Ty1sB8V.max-500x500.jpg')"></div> </div> <div class="uni-related-article-tout__content"> <h4 class="uni-related-article-tout__header h-has-bottom-margin">Personalized Service Health is now generally available: Get started today</h4> <p class="uni-related-article-tout__body">Personalized Service Health provides visibility into incidents relevant to your environment, allowing you to evaluate their impact and tr...</p> <div class="cta module-cta h-c-copy uni-related-article-tout__cta muted"> <span class="nowrap">Read Article <svg class="icon h-c-icon" role="presentation"> <use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#mi-arrow-forward"></use> </svg> </span> </div> </div> </div> </div> </a> </section> </div> </div>
  64. Staff Site Reliability Engineer, Waze

    Mon, 28 Apr 2025 16:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In 2023, the Waze platform engineering team transitioned to Infrastructure as Code (IaC) using Google Cloud's </span><a href="https://cloud.google.com/config-connector/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Config Connector</span></a><span style="vertical-align: baseline;"> (KCC), and we haven’t looked back since. We embraced Config Connector, an open-source Kubernetes add-on, to manage Google Cloud resources through Kubernetes. To streamline management, we also leverage Config Controller, a hosted version of Config Connector on Google Kubernetes Engine (GKE), incorporating Policy Controller and Config Sync. This shift has significantly improved our infrastructure management and is shaping our future infrastructure.</span></p> <h3><strong style="vertical-align: baseline;">The shift to Config Connector</strong></h3> <p><span style="vertical-align: baseline;">Previously, Waze relied on Terraform to manage resources, particularly during our dual-cloud, VM-based phase. However, maintaining state and ensuring reconciliation proved challenging, leading to inconsistent configurations and increased management overhead.</span></p> <p><span style="vertical-align: baseline;">In 2023, we adopted Config Connector, transforming our Google Cloud infrastructure into </span><a href="https://github.com/kubernetes/design-proposals-archive/blob/main/architecture/resource-management.md" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Kubernetes Resource Model</span></a><span style="vertical-align: baseline;"> (KRM) objects within a GKE cluster. This approach addresses the reconciliation issues encountered with Terraform. Config Sync, paired with Config Connector, automates KRM synchronization from source repositories to our live GKE cluster. 
This managed solution eliminates the need for us to build and maintain custom reconciliation systems.</span></p> <p><span style="vertical-align: baseline;">The shift helped us meet the needs of three key roles within Waze’s infrastructure team: </span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Infrastructure consumers:</strong><span style="vertical-align: baseline;"> Application developers who want to easily deploy infrastructure without worrying about the maintenance and complexity of underlying resources.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Infrastructure owners:</strong><span style="vertical-align: baseline;"> Experts in specific resource types (e.g., Spanner, Google Cloud Storage, Load Balancers), who want to define and standardize best practices in how resources are created across Waze on Google Cloud.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Platform engineers: </strong><span style="vertical-align: baseline;">Engineers who build the system that enables infrastructure owners to codify and define best practices, while also providing a seamless API for infrastructure consumers.</span></p> </li> </ol></div> 
<div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">First stop: Config Connector</strong></h3> <p><span style="vertical-align: baseline;">It may seem circular to define all of our Google Cloud infrastructure as KRMs within a Google Cloud service. However, KRM is actually a great representation for our infrastructure compared with existing IaC tooling.</span></p> <p><span style="vertical-align: baseline;">Terraform's reconciliation issues – state drift, version management, out-of-band changes – are a significant pain. Config Connector, through Config Sync, offers out-of-the-box reconciliation, a managed solution we prefer. Both KRM and Terraform offer templating, but KCC's managed nature aligns with our shift to Google Cloud-native solutions and reduces our maintenance burden. </span></p> <p><span style="vertical-align: baseline;">Infrastructure complexity requires generalization regardless of the tool. We can see this when we look at the Spanner requirements at Waze:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Consistent backups for all Spanner databases</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Each Spanner database utilizes a dedicated Cloud Storage bucket and Service Account to automate the execution of DDL jobs.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">All IAM policies for Spanner instances, databases, and Cloud Storage buckets are defined in code to ensure consistent and auditable access control. 
</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_Spanner_at_Waze.max-1000x1000.jpg" alt="1 - Spanner at Waze"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">To define these resources, we evaluated various templating and rendering tools and selected Helm, a robust CNCF package manager for Kubernetes. Its strong open-source community, rich templating capabilities, and native rendering features made it a natural fit. We can now refer to our bundled infrastructure configurations as 'Charts.' While </span><a href="https://cloud.google.com/blog/products/containers-kubernetes/introducing-kube-resource-orchestrator"><span style="text-decoration: underline; vertical-align: baseline;">KRO</span></a><span style="vertical-align: baseline;"> has since emerged that achieves a similar purpose, our selection process predated its availability.</span></p> <h3><strong style="vertical-align: baseline;">Under the hood</strong></h3> <p><span style="vertical-align: baseline;">Let's open the hood and dive into how the system works and is driving value for Waze.</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Waze infrastructure owners generically define Waze-flavored infrastructure in Helm Charts. 
</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure consumers use these Charts with simplified inputs to generate infrastructure (</span><a href="https://www.youtube.com/watch?v=B4RI4MwXOgg" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">demo</span></a><span style="vertical-align: baseline;">).</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure code is stored in repositories, enabling validation and presubmit checks.</span></p> </li> </ol> <p><span style="vertical-align: baseline;">Code is uploaded to a </span><a href="https://cloud.google.com/artifact-registry/docs"><span style="text-decoration: underline; vertical-align: baseline;">Artifact Registry</span></a><span style="vertical-align: baseline;"> where Config Sync and Config Connector align Google Cloud infrastructure with the code definitions. </span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_Provisioning_Cloud_Resources_at_Waze.max-1000x1000.jpg" alt="2 - Provisioning Cloud Resources at Waze"> </a> <figcaption class="article-image__caption "><p data-block-key="98gzx">This diagram represents a single "data domain," a collection of bounded services, databases, networks, and data. 
Many tech orgs today consist of Prod, QA, Staging, Development, etc.</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Approaching our destination</strong></h3> <p><span style="vertical-align: baseline;">So why does all of this matter? Adopting this approach allowed us to move from Infrastructure as Code to Infrastructure as Software. By treating each Chart as a software component, our infrastructure management goes beyond simple code declaration. Now, versioned Charts and configurations enable us to leverage a rich ecosystem of software practices, including sophisticated release management, automated rollbacks, and granular change tracking.</span></p> <p><span style="vertical-align: baseline;">Here's where we apply this in practice: our configuration inheritance model minimizes redundancy. Resource Charts inherit settings from Projects, which inherit from Bootstraps. All three are defined as Charts. Consequently, Bootstrap configurations apply to all Projects, and Project configurations apply to all Resources.</span></p> <p><span style="vertical-align: baseline;">Every change to our infrastructure – from changes on existing infrastructure to rolling out new resource types – can be treated like a software rollout. 
</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_-_Resource_Inheritance.max-1000x1000.jpg" alt="3 - Resource Inheritance"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Now that all of our infrastructure is treated like software, we can see what this does for us system-wide:</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_-_Data_Domain_Flow.max-1000x1000.jpg" alt="4 - Data Domain Flow"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">Reaching our destination</strong></h3> <p><span style="vertical-align: baseline;">In summary, Config Connector and Config Controller have enabled Waze to achieve true Infrastructure as Software, providing a robust and scalable platform for our infrastructure needs, along with many other benefits including: </span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure consumers receive the latest best practices through versioned updates.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Infrastructure owners can iterate and improve infrastructure safely.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Platform 
Engineers and Security teams are confident our resources are auditable and compliant</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Config Connector leverages </span><a href="https://cloud.google.com/kubernetes-engine/enterprise/config-controller/docs/overview"><span style="text-decoration: underline; vertical-align: baseline;">Google's managed services</span></a><span style="vertical-align: baseline;">, reducing operational overhead.</span></p> </li> </ul></div>
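<p>The configuration inheritance model described in this article (Resources inherit from Projects, which inherit from Bootstraps) can be pictured as a layered merge of settings. The sketch below is only an illustration of that idea in Python. It is not Waze's implementation and does not call Helm; every key and value in the sample data is hypothetical.</p>

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def merge_values(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Return parent overlaid with child; nested dicts are merged recursively."""
    merged = deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both layers define a mapping: merge instead of replacing it.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and new keys: the more specific layer wins.
            merged[key] = deepcopy(value)
    return merged


def effective_values(
    bootstrap: dict[str, Any], project: dict[str, Any], resource: dict[str, Any]
) -> dict[str, Any]:
    """Apply layers from least specific (Bootstrap) to most specific (Resource)."""
    return merge_values(merge_values(bootstrap, project), resource)


# Hypothetical sample layers, for illustration only.
bootstrap = {"labels": {"org": "example"}, "backups": {"enabled": True}}
project = {"labels": {"team": "maps"}, "location": "us-central1"}
resource = {"name": "routes-db", "backups": {"retentionDays": 14}}

values = effective_values(bootstrap, project, resource)
```

<p>In this sketch a setting defined once at the Bootstrap layer (here, <code>backups.enabled</code>) reaches every Resource unless a more specific layer overrides it, which is how the model keeps redundancy low.</p>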
  65. Engineering Manager

    Mon, 24 Feb 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Distributed tracing is a critical part of an observability stack, letting you troubleshoot latency and errors in your applications. Cloud Trace, part of </span><a href="https://cloud.google.com/stackdriver/docs"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud Observability</span></a><span style="vertical-align: baseline;">, is Google Cloud’s native tracing product, and we’ve made numerous improvements to the Trace explorer UI on top of a new analytics backend.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/1_Components_of_the_new_trace_explorer.max-1000x1000.jpg" alt="1_Components of the new trace explorer"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The new Trace explorer page contains:</span></p> <ol> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A filter bar with options for users to choose a Google Cloud project-based trace scope, all/root spans and a custom attribute filter.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A faceted span filter pane that displays commonly used filters based on </span><a href="https://opentelemetry.io/docs/specs/semconv/general/trace/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">OpenTelemetry conventions</span></a><span style="vertical-align: baseline;">.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> 
<p role="presentation"><span style="vertical-align: baseline;">A visualization of matching spans including an interactive span duration heatmap (default), a span rate line chart, and a span duration percentile chart.</span></p> </li> <li aria-level="1" style="list-style-type: decimal; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">A table of matching spans that can be narrowed down further by selecting a cell of interest on the heatmap.</span></p> </li> </ol> <h3><strong style="vertical-align: baseline;">A tour of the new Trace explorer</strong></h3> <p><span style="vertical-align: baseline;">Let’s take a closer look at these new features and how you can use them to troubleshoot your applications. Imagine you’re a developer working on the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> of a retail webstore application and you’ve been paged because there’s an ongoing incident.</span></p></div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This application is instrumented using OpenTelemetry and sends trace data to Google Cloud Trace, so you navigate to the Trace explorer page on the Google Cloud console with the context set to the Google Cloud project that hosts the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;">.</span></p> <p><span style="vertical-align: baseline;">Before starting your investigation, 
you remember that your admin recommended using the </span><span style="font-style: italic; vertical-align: baseline;">webstore-prod</span><span style="vertical-align: baseline;"> trace scope when investigating webstore app-wide prod issues. By using this Trace scope, you'll be able to see spans stored in other Google Cloud projects that are relevant to your investigation.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/2_Scope_selection.max-1000x1000.jpg" alt="2_Scope selection"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You set the trace scope to </span><span style="font-style: italic; vertical-align: baseline;">webstore-prod</span><span style="vertical-align: baseline;"> and your queries will now include spans from all the projects included in this trace scope.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/3_User_Journey.max-1000x1000.jpg" alt="3_User Journey"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You select </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> in </span><strong style="vertical-align: baseline;">Span filters</strong><span style="vertical-align: baseline;"> (1) and the following updates load on the page:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: 
baseline;">Other sections such as </span><strong style="vertical-align: baseline;">Span name</strong><span style="vertical-align: baseline;"> in the span filter pane (2) are updated with counts and percentages that take into account the selection made under service name. This can help you narrow down your search criteria to be more specific.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The span </span><strong style="vertical-align: baseline;">Filter</strong><span style="vertical-align: baseline;"> bar (3) is updated to display the active filter.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The heatmap visualization (4)  is updated to only display spans from the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> in the last 1 hour (default). You can change the time-range using the time-picker (5). The heatmap’s x-axis is time and the y-axis is span duration. 
It uses color shades to denote the number of spans in each cell with a legend that indicates the corresponding range.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">The </span><strong style="vertical-align: baseline;">Spans</strong><span style="vertical-align: baseline;"> table (6) is updated with matching spans sorted by duration (default).</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Other </span><strong style="vertical-align: baseline;">Chart view</strong><span style="vertical-align: baseline;">s (7) that you can switch to are also updated with the applied filter.</span></p> </li> </ul> <p><span style="vertical-align: baseline;">From looking at the heatmap, you can see that there are some spans in the &gt;100s range which is abnormal and concerning. But first, you’re curious about the traffic and corresponding latency of calls handled by the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;">.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/4_Span_rate_line_chart.max-1000x1000.jpg" alt="4_Span rate line chart"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Switching to the Span rate line chart gives you an idea of the traffic handled by your service. The x-axis is time and the y-axis is spans/second. 
The traffic handled by your service looks normal as you know from past experience that 1.5-2 spans/second is quite typical.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Span_duration_percentile_chart.max-1000x1000.jpg" alt="5_Span duration percentile chart"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Switching to the Span duration percentile chart gives you p50/p90/p95/p99 span duration trends. While p50 looks fine, the p9x durations are greater than you expect for your service.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/6_Span_selection.max-1000x1000.jpg" alt="6_Span selection"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You switch back to the heatmap chart and select one of the outlier cells to investigate further. 
This particular cell has two matching spans with a duration of over 2 minutes, which is concerning.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/7_Trace_details__span_attributes.max-1000x1000.jpg" alt="7_Trace details &amp; span attributes"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You investigate one of those spans by viewing the full trace and notice that the </span><span style="font-style: italic; vertical-align: baseline;">orders publish</span><span style="vertical-align: baseline;"> span is the one taking up the majority of the time when servicing this request. Given this, you form a hypothesis that the </span><span style="font-style: italic; vertical-align: baseline;">checkoutservice</span><span style="vertical-align: baseline;"> is having issues handling these types of calls. 
To validate your hypothesis, you note the </span><span style="font-style: italic; vertical-align: baseline;">rpc.method</span><span style="vertical-align: baseline;"> attribute being </span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;"> and exit this trace using the X button.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/8_Custom_attribute_search.max-1000x1000.jpg" alt="8_Custom attribute search"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">You add an attribute filter for key: </span><span style="font-style: italic; vertical-align: baseline;">rpc.method</span><span style="vertical-align: baseline;"> value:</span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;"> using the Filter bar, which shows you that there is a clear latency issue with </span><span style="font-style: italic; vertical-align: baseline;">PlaceOrder</span><span style="vertical-align: baseline;"> calls handled by your service. 
You’ve seen this issue before and know that there is a runbook that addresses it, so you alert the SRE team with the appropriate action that needs to be taken to mitigate the incident.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/9_Send_feedback.max-1000x1000.jpg" alt="9_Send feedback"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Share your feedback with us via the </span><strong style="vertical-align: baseline;">Send feedback</strong><span style="vertical-align: baseline;"> button.</span></p> <h3><strong style="vertical-align: baseline;">Behind the scenes</strong></h3></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/10_Cloud_Trace_architecture.max-1000x1000.jpg" alt="10_Cloud Trace architecture"> </a> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This new experience is powered by BigQuery, using the same platform that backs </span><a href="https://cloud.google.com/blog/products/devops-sre/introducing-cloud-loggings-log-analytics-powered-by-big-query"><span style="text-decoration: underline; vertical-align: baseline;">Log Analytics</span></a><span style="vertical-align: baseline;">. 
We plan to launch new features that take full advantage of this platform: SQL queries, flexible sampling, export, and regional storage.</span></p> <p><span style="vertical-align: baseline;">In summary, you can use the new Cloud Trace explorer to perform service-oriented investigations with advanced querying and visualization of trace data. This allows developers and SREs to effectively troubleshoot production incidents and identify mitigating measures to restore normal operations.</span></p> <p><span style="vertical-align: baseline;">The new Cloud Trace explorer is generally available to all users — try it out and share your feedback with us via the </span><strong style="vertical-align: baseline;">Send feedback</strong><span style="vertical-align: baseline;"> button. </span></p></div>
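The walkthrough above filters spans by service and by the rpc.method attribute, then reads the p50/p90/p95/p99 durations. The sketch below mirrors that reasoning offline in plain Python. It is only an illustration under stated assumptions: the `Span` class, `filter_spans`, `percentile`, and the sample data are hypothetical names invented here, not part of Cloud Trace or OpenTelemetry, and the nearest-rank percentile method is one common choice rather than the method the Trace explorer is known to use.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical, simplified span record (not the Cloud Trace schema)."""
    service: str
    name: str
    duration_s: float
    attributes: dict = field(default_factory=dict)


def filter_spans(spans, service, attrs=None):
    """Keep spans from one service whose attributes match every key/value in attrs."""
    attrs = attrs or {}
    return [
        s for s in spans
        if s.service == service
        and all(s.attributes.get(k) == v for k, v in attrs.items())
    ]


def percentile(durations, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not durations:
        raise ValueError("no durations to summarize")
    ordered = sorted(durations)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up sample data: mostly fast calls plus one outlier over 2 minutes.
SAMPLE_SPANS = [
    Span("checkoutservice", "PlaceOrder", 0.2, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 0.3, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 0.4, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 0.5, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "PlaceOrder", 125.0, {"rpc.method": "PlaceOrder"}),
    Span("checkoutservice", "GetCart", 0.1, {"rpc.method": "GetCart"}),
    Span("cartservice", "GetCart", 0.05, {"rpc.method": "GetCart"}),
]

if __name__ == "__main__":
    matching = filter_spans(
        SAMPLE_SPANS, "checkoutservice", {"rpc.method": "PlaceOrder"}
    )
    durations = [s.duration_s for s in matching]
    for p in (50, 90, 95, 99):
        print(f"p{p}: {percentile(durations, p)}s")
```

With this sample, the median stays low while the high percentiles jump to the outlier, which is the same pattern the walkthrough describes: p50 looks fine, p9x does not.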
  66. Technical Program Manager, Google

    Thu, 20 Feb 2025 17:00:00 -0000

    <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Picture this: you’re a Site Reliability Engineer (SRE) responsible for the systems that power your company’s machine learning (ML) services. What do you do to ensure you have a reliable ML service, how do you know you’re doing it well, and how can you build strong systems to support these services? </span></p> <p><span style="vertical-align: baseline;">As artificial intelligence (AI) becomes more widely available, its features, including ML, will matter more to SREs. That’s because ML becomes both a part of the infrastructure used in production software systems and an important feature of the software itself. </span></p> <p><span style="vertical-align: baseline;">Abstractly, machine learning relies on its </span><a href="https://sre.google/workbook/data-processing/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">pipelines</span></a><span style="vertical-align: baseline;"> … and you know how to manage those! So you can begin with pipeline management, then look to other factors that will strengthen your ML services: training, model freshness, and efficiency. In the resources below, </span><span style="vertical-align: baseline;">we'll look at some of the ML-specific characteristics of these pipelines that you’ll want to consider in your operations. Then, we draw on the experience of Google SREs</span><span style="vertical-align: baseline;"> to show you how to apply your core SRE skills to operating and managing your organization’s machine-learning pipelines. </span></p> <h3><strong style="vertical-align: baseline;">Training ML models</strong></h3> <p><span style="vertical-align: baseline;">Training ML models applies the notion of pipelines to specific types of data, often running on specialized hardware. 
Critical aspects to consider about the pipeline:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">how much data you’re ingesting</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">how fresh this data needs to be</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">how the system trains and deploys the models </span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">how efficiently the system handles these first three things</span></p> </li> </ul></div> <div class="block-video"> <div class="article-module article-video "> <figure> <a class="h-c-video h-c-video--marquee" href="https://youtube.com/watch?v=8lxUmXFpovg" data-glue-modal-trigger="uni-modal-8lxUmXFpovg-" data-glue-modal-disabled-on-mobile="true"> <div class="article-video__aspect-image" style="background-image: url(https://storage.googleapis.com/gweb-cloudblog-publish/images/maxresdefault_Fw6QEDC.max-1000x1000.jpg);"> <span class="h-u-visually-hidden">SREcon22 Europe/Middle East/Africa - SRE and ML: Why It Matters</span> </div> <svg role="img" class="h-c-video__play h-c-icon h-c-icon--color-white"> <use xlink:href="#mi-youtube-icon"></use> </svg> </a> </figure> </div> <div class="h-c-modal--video" data-glue-modal="uni-modal-8lxUmXFpovg-" data-glue-modal-close-label="Close Dialog"> <a class="glue-yt-video" data-glue-yt-video-autoplay="true" data-glue-yt-video-height="99%" data-glue-yt-video-vid="8lxUmXFpovg" data-glue-yt-video-width="100%" href="https://youtube.com/watch?v=8lxUmXFpovg" ng-cloak> </a> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">This 
keynote presents an SRE perspective on the value of applying reliability principles to the components of machine learning systems. It provides insight into why ML systems matter for products, and how SREs should think about them. The challenges that ML systems present include capacity planning, resource management, and monitoring; other challenges include understanding the cost of ML systems as part of your overall operations environment.  </span></p></div> <div class="block-paragraph_advanced"><h3><strong style="vertical-align: baseline;">ML freshness and data volume</strong></h3> <p><span style="vertical-align: baseline;">As with any pipeline-based system, a big part of understanding the system is describing how much data it typically ingests and processes. The </span><a href="https://sre.google/workbook/data-processing/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Data Processing Pipelines</span></a><span style="vertical-align: baseline;"> chapter in the SRE Workbook lays out the fundamentals: automate the pipeline’s operation so that it is resilient, and can operate unattended. 
</span></p> <p><span style="vertical-align: baseline;">You’ll want to develop </span><a href="https://sre.google/workbook/implementing-slos/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Service Level Objectives</span></a><span style="vertical-align: baseline;"> (SLOs) in order to measure the pipeline’s health, especially for data freshness, i.e., how recently the model got the data it’s using to produce an inference for a customer. Understanding freshness provides an important measure of an ML system’s health, as data that becomes stale may lead to lower-quality inferences and sub-optimal outcomes for the user. For some systems, such as weather forecasting, data may need to be very fresh (just minutes or seconds old); for other systems, such as spell-checkers, data freshness can lag on the order of days — or longer! Freshness requirements will vary by product, so it’s important that you know what you’re building and how the audience expects to use it. </span></p> <p><span style="vertical-align: baseline;">In this way, freshness is a part of the </span><a href="https://sre.google/workbook/implementing-slos/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">critical user journey described in the SRE Workbook</span></a><span style="vertical-align: baseline;">, describing one aspect of the customer experience. You can read more about data freshness as a component of pipeline systems in the Google SRE article </span><a href="https://sre.google/resources/practices-and-processes/reliable-data-processing-with-minimal-toil/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Reliable Data Processing with Minimal Toil</span></a><span style="vertical-align: baseline;">.  
</span></p> <p><span style="vertical-align: baseline;">There’s more than freshness to ensuring high-quality data — there’s also how you define the model-training pipeline. </span><a href="https://googlesre.page.link/reliable-ml" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">A Brief Guide To Running ML Systems in Production</span></a><span style="vertical-align: baseline;"> gives you the nuts and bolts of this discipline, from using contextual metrics to understand freshness and throughput, to methods for understanding the quality of your input data. </span></p> <h3><strong style="vertical-align: baseline;">Serving efficiency</strong></h3> <p><span style="vertical-align: baseline;">The 2021 SRE blog post </span><a href="https://www.oreilly.com/content/efficient-machine-learning-inference/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Efficient Machine Learning Inference</span></a><span style="vertical-align: baseline;"> provides a valuable resource to learn about improving your model’s performance in a production environment. (And remember, </span><span style="font-style: italic; vertical-align: baseline;">training</span><span style="vertical-align: baseline;"> is never the same as </span><span style="font-style: italic; vertical-align: baseline;">production</span><span style="vertical-align: baseline;"> for ML services!) </span></p> <p><span style="vertical-align: baseline;">Optimizing machine learning inference serving is crucial for real-world deployment. In this article, the authors explore multi-model serving off of a shared VM. They cover realistic use cases and how to manage trade-offs between cost, utilization, and latency of model responses. By changing the allocation of models to VMs, and varying the size and shape of those VMs in terms of processing, GPU, and RAM attached, you can improve the cost effectiveness of model serving. 
</span></p> <h3><strong style="vertical-align: baseline;">Cost efficiency</strong></h3> <p><span style="vertical-align: baseline;">We mentioned that these AI pipelines often rely on specialized hardware. How do you know you’re using this hardware efficiently? Todd Underwood’s talk from SREcon EMEA 2023 on </span><a href="https://www.usenix.org/conference/srecon23emea/presentation/underwood" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">Artificial Intelligence: What Will It Cost You?</span></a><span style="vertical-align: baseline;"> gives you a sense of how much this specialized hardware costs to run, and how you can provide incentives for using it efficiently.</span><span style="font-style: italic; vertical-align: baseline;"> </span></p> <h3><strong style="vertical-align: baseline;">Automation for scale</strong></h3> <p><span style="vertical-align: baseline;">This </span><a href="https://sre.google/resources/practices-and-processes/reliable-data-processing-with-minimal-toil/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">article from Google's SRE team</span></a><span style="vertical-align: baseline;"> outlines strategies for ensuring reliable data processing while minimizing manual effort, or toil. One of the key takeaways: use an existing, standard platform for as much of the pipeline as possible. After all, your business goals should focus on innovations in presenting the data and the ML model, not in the pipeline itself. The article covers automation, monitoring, and incident response, with a focus on using these concepts to build resilient data pipelines. You’ll read best practices for designing data systems that can handle failures gracefully and reduce a team’s operational burden. This article is essential reading for anyone involved in data engineering or operations. 
Read more about toil in the SRE Workbook: </span><a href="https://sre.google/workbook/eliminating-toil/" rel="noopener" target="_blank"><span style="text-decoration: underline; vertical-align: baseline;">https://sre.google/workbook/eliminating-toil/</span></a><span style="vertical-align: baseline;">. </span></p> <h3><strong style="vertical-align: baseline;">Next steps</strong></h3> <p><span style="vertical-align: baseline;">Successful ML deployments require careful management and monitoring for systems to be reliable and sustainable. That means taking a holistic approach, including implementing data pipelines, training pathways, model management, and validation, alongside monitoring and accuracy metrics. To go deeper, check out this guide on how to use </span><a href="https://cloud.google.com/kubernetes-engine/docs/integrations/ai-infra"><span style="text-decoration: underline; vertical-align: baseline;">GKE for your AI orchestration</span></a><span style="vertical-align: baseline;">.</span></p></div>
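The data-freshness SLO discussed above can be made concrete with a small calculation. The sketch below is a minimal illustration, not a method taken from the SRE Workbook: it assumes each inference is logged as a hypothetical pair of timestamps (when it was served, and when its input data was last updated), and it defines the freshness SLI as the fraction of inferences whose data was no older than a chosen threshold. The function names, the 10-minute threshold, and the sample events are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def freshness_sli(events, max_age):
    """Fraction of inferences served with data no older than max_age.

    events: iterable of (served_at, data_updated_at) datetime pairs.
    """
    events = list(events)
    if not events:
        raise ValueError("no inference events to evaluate")
    fresh = sum(
        1 for served_at, data_at in events if served_at - data_at <= max_age
    )
    return fresh / len(events)


def meets_slo(sli, target):
    """True when the measured SLI reaches the SLO target (for example 0.99)."""
    return sli >= target


# Made-up sample: four inferences served at the same time with data of varying age.
NOW = datetime(2025, 2, 20, 17, 0, tzinfo=timezone.utc)
SAMPLE_EVENTS = [
    (NOW, NOW - timedelta(minutes=1)),
    (NOW, NOW - timedelta(minutes=5)),
    (NOW, NOW - timedelta(minutes=10)),
    (NOW, NOW - timedelta(minutes=30)),  # stale under a 10-minute threshold
]

if __name__ == "__main__":
    sli = freshness_sli(SAMPLE_EVENTS, max_age=timedelta(minutes=10))
    print(f"freshness SLI: {sli:.2f}, meets 90% SLO: {meets_slo(sli, 0.90)}")
```

The threshold is the product-specific part: a forecasting service might set `max_age` to minutes, while a spell-checker could tolerate days, as the article notes.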
  67. Will AI Kill the OSS Star?

    Mon, 09 Feb 2026 11:54:47 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1.png" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1.png 770w, https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1-290x124.png 290w, https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1-360x154.png 360w, https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1-400x171.png 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/02/Screenshot-2026-02-09-12.46.32-1-150x150.png" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />As AI-driven development accelerates, open source software faces an uncomfortable paradox: Usage is rising while engagement, sustainability and community economics quietly erode. AI isn’t eliminating OSS, but it is reshaping how code is written, discovered and maintained. The result may not be the death of open source, but the end of its long reign as the default foundation of modern software.
  68. Google Launches Developer Knowledge API to Give AI Tools Access to Official Documentation

    Mon, 09 Feb 2026 08:53:38 -0000

    <div><img width="770" height="329" src="https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205.jpg" class="attachment-large size-large wp-post-image" alt="AI coding, teams, vibecoding, shadow, vibecoding vibe, coding, GitHub, agents, Gemini, Canvas, Gemini, code, Augment Code, code, kernel compliance-as-code software secure software Terraform infrastructure" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205.jpg 770w, https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205-290x124.jpg 290w, https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205-360x154.jpg 360w, https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2022/02/coding-gb646cb77a_1280-e1644931732205-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="AI coding, teams, vibecoding, shadow, vibecoding vibe, coding, GitHub, agents, Gemini, Canvas, Gemini, code, Augment Code, code, kernel compliance-as-code software secure software Terraform infrastructure" decoding="async" />Google's new Developer Knowledge API and MCP server provide AI assistants with direct access to up-to-date Google developer documentation.
  69. Five Great DevOps Job Opportunities

    Mon, 09 Feb 2026 08:36:08 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1.jpg" class="attachment-large size-large wp-post-image" alt="job, opportunities, DevOps, hire, skills, careers," style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1.jpg 770w, https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1-290x124.jpg 290w, https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1-360x154.jpg 360w, https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2024/11/dream-job-4453054_1280-1-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="job, opportunities, DevOps, hire, skills, careers," decoding="async" />This week's report features top employers including Capital One, Google, CLS US Services, Thrive Market, and Cisco Systems, providing insights into the job market and salaries for crucial roles in DevOps.
  70. Veracode Extends Package Firewall Reach to Microsoft Artifacts

    Fri, 06 Feb 2026 19:39:27 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2026/02/devsecops1.jpg 770w, https://devops.com/wp-content/uploads/2026/02/devsecops1-290x124.jpg 290w, https://devops.com/wp-content/uploads/2026/02/devsecops1-360x154.jpg 360w, https://devops.com/wp-content/uploads/2026/02/devsecops1-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2026/02/devsecops1-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" />Veracode has extended the reach of a Package Firewall that applies policies that limit what types of code can be downloaded from a repository to Azure Artifacts from Microsoft. Additionally, DevSecOps teams can now define custom policies based on package risk profiles, vulnerability thresholds, or a specific security requirement their organization has adopted. Tim Jarrett, [&#8230;]
  71. How AI Is Expanding Who Gets to Build Infrastructure

    Fri, 06 Feb 2026 16:04:07 -0000

    <div><img width="769" height="330" src="https://devops.com/wp-content/uploads/2020/09/infrastructureascode.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/09/infrastructureascode.jpg 769w, https://devops.com/wp-content/uploads/2020/09/infrastructureascode-290x124.jpg 290w, https://devops.com/wp-content/uploads/2020/09/infrastructureascode-150x64.jpg 150w, https://devops.com/wp-content/uploads/2020/09/infrastructureascode-500x215.jpg 500w" sizes="(max-width: 769px) 100vw, 769px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2020/09/infrastructureascode-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/09/infrastructureascode-150x150.jpg 150w, https://devops.com/wp-content/uploads/2020/09/infrastructureascode-50x50.jpg 50w, https://devops.com/wp-content/uploads/2020/09/infrastructureascode-266x266.jpg 266w" sizes="(max-width: 150px) 100vw, 150px" />Pavlo Baron, co-founder and CEO of Platform Engineering Labs, unpacks what’s changing in platform engineering as AI reshapes who gets to build, and how infrastructure actually gets managed. Baron traces the origin story back to his time building high-scale systems at Instana (which exited to IBM in 2020), where the reality of “always-on” platforms made [&#8230;]
  72. Qodo Adds Multiple AI Agents to Code Review Platform

    Fri, 06 Feb 2026 12:15:05 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2023/06/news2.jpg" class="attachment-large size-large wp-post-image" alt="PagerDuty, Harness, Qwiet, HashiCorp, Harness, Kong, API, sentry, Wiz, Veracode, ASPM," style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2023/06/news2.jpg 770w, https://devops.com/wp-content/uploads/2023/06/news2-290x124.jpg 290w, https://devops.com/wp-content/uploads/2023/06/news2-360x154.jpg 360w, https://devops.com/wp-content/uploads/2023/06/news2-400x171.jpg 400w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2023/06/news2-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="PagerDuty, Harness, Qwiet, HashiCorp, Harness, Kong, API, sentry, Wiz, Veracode, ASPM," decoding="async" />Qodo 2.0 adds memory-enabled, task-specific AI agents to its LLM-based code-review platform, improving defect recall and F1 performance to help DevOps scale code quality as AI-generated code rises.
  73. Beyond Test Case Generation: How to Create Intelligent Quality Ecosystems 

    Fri, 06 Feb 2026 11:35:35 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis.jpg" class="attachment-large size-large wp-post-image" alt="" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis.jpg 770w, https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-290x124.jpg 290w, https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-150x64.jpg 150w, https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-500x214.jpg 500w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-150x150.jpg 150w, https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-50x50.jpg 50w, https://devops.com/wp-content/uploads/2020/11/How-to-Improve-Code-Quality-Efficiently-with-Static-Analysis-266x266.jpg 266w" sizes="(max-width: 150px) 100vw, 150px" />Move GenAI in QA from test‑factory to life‑cycle intelligence: AI proposes coverage and data, humans review, deterministic automation executes—focus on risk‑aligned coverage, drift detection, and governance.
  74. The 10-Layer Monitoring Framework That Saved Our Clients From 3 a.m. Pages 

    Fri, 06 Feb 2026 08:33:08 -0000

    <div><img width="770" height="330" src="https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring.jpg" class="attachment-large size-large wp-post-image" alt="framework, alerts, monitoring" style="margin-bottom: 0px;" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring.jpg 770w, https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-290x124.jpg 290w, https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-150x64.jpg 150w, https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-500x214.jpg 500w" sizes="(max-width: 770px) 100vw, 770px" /></div><img width="150" height="150" src="https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-150x150.jpg" class="attachment-thumbnail size-thumbnail wp-post-image" alt="framework, alerts, monitoring" decoding="async" srcset="https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-150x150.jpg 150w, https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-50x50.jpg 50w, https://devops.com/wp-content/uploads/2020/06/Application-performance-monitoring-266x266.jpg 266w" sizes="(max-width: 150px) 100vw, 150px" />A practical 10-layer monitoring framework for Kubernetes and VM environments that prioritizes what to watch—system, application, HTTP/RUM, databases, caches, queues, tracing, SSL, external deps, and log patterns—to prevent outages and reduce noisy alerts.
  75. Is Claude Opus 4.6 the Best Security Researcher Ever?

    Fri, 06 Feb 2026 06:45:10 -0000

    Anthropic’s Claude Opus 4.6 uncovered more than 600 previously unknown vulnerabilities in widely used open source software, raising new questions about AI-driven security research, vulnerability management, and defensive readiness.
  76. Survey Surfaces More Focus on Software Security Testing and API Security

    Wed, 04 Feb 2026 19:29:12 -0000

    A global survey of 828 enterprise IT professionals conducted by the Futurum Group finds well over a third of respondents expect their organization to increase spending on software security testing (39%) and application programming interface (API) security (36%) over the next 12 to 18 months. Overall, about 35% said they also plan to make some […]
  77. Project Genie: Experimenting with infinite, interactive worlds

    Thu, 29 Jan 2026 17:01:05 -0000

    Google AI Ultra subscribers in the U.S. can try out Project Genie, an experimental research prototype that lets you create and explore worlds.
  78. D4RT: Teaching AI to see the world in four dimensions

    Fri, 16 Jan 2026 10:39:00 -0000

    D4RT: Unified, efficient 4D reconstruction and tracking up to 300x faster than prior methods.
  79. Veo 3.1 Ingredients to Video: More consistency, creativity and control

    Tue, 13 Jan 2026 17:00:18 -0000

    Our latest Veo update generates lively, dynamic clips that feel natural and engaging — and supports vertical video generation.
  80. Google's year in review: 8 areas with research breakthroughs in 2025

    Tue, 23 Dec 2025 17:01:02 -0000

    Google 2025 recap: Research breakthroughs of the year
  81. Gemini 3 Flash: frontier intelligence built for speed

    Wed, 17 Dec 2025 11:58:17 -0000

    Gemini 3 Flash offers frontier intelligence built for speed at a fraction of the cost.
  82. Gemma Scope 2: helping the AI safety community deepen understanding of complex language model behavior

    Tue, 16 Dec 2025 10:14:24 -0000

    Open interpretability tools for language models are now available across the entire Gemma 3 family with the release of Gemma Scope 2.
  83. Improved Gemini audio models for powerful voice experiences

    Fri, 12 Dec 2025 17:50:50 -0000

  84. Deepening our partnership with the UK AI Security Institute

    Thu, 11 Dec 2025 00:06:40 -0000

    Google DeepMind and UK AI Security Institute (AISI) strengthen collaboration on critical AI safety and security research
  85. Strengthening our partnership with the UK government to support prosperity and security in the AI era

    Wed, 10 Dec 2025 14:59:21 -0000

    Deepening our partnership with the UK government to support prosperity and security in the AI era
  86. FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

    Tue, 09 Dec 2025 11:29:03 -0000

    Systematically evaluating the factuality of large language models with the FACTS Benchmark Suite.
  87. Engineering more resilient crops for a warming climate

    Thu, 04 Dec 2025 16:23:24 -0000

    Scientists are using AlphaFold to strengthen a photosynthesis enzyme for resilient, heat-tolerant crops.
  88. AlphaFold: Five years of impact

    Tue, 25 Nov 2025 16:00:12 -0000

    Explore how AlphaFold has accelerated science and fueled a global wave of biological discovery.
  89. Revealing a key protein behind heart disease

    Tue, 25 Nov 2025 15:52:51 -0000

    AlphaFold has revealed the structure of a key protein behind heart disease
  90. Google DeepMind supports U.S. Department of Energy on Genesis: a national mission to accelerate innovation and scientific discovery

    Mon, 24 Nov 2025 14:12:03 -0000

    Google DeepMind and the DOE partner on Genesis, a new effort to accelerate science with AI.
  91. How we’re bringing AI image verification to the Gemini app

    Thu, 20 Nov 2025 15:13:19 -0000

  92. Build with Nano Banana Pro, our Gemini 3 Pro Image model

    Thu, 20 Nov 2025 15:11:14 -0000

  93. Introducing Nano Banana Pro

    Thu, 20 Nov 2025 15:05:02 -0000

  94. Start building with Gemini 3

    Tue, 18 Nov 2025 17:49:13 -0000

  95. We’re expanding our presence in Singapore to advance AI in the Asia-Pacific region

    Tue, 18 Nov 2025 17:00:00 -0000

    Google DeepMind opens a new Singapore research lab, accelerating AI progress in the Asia-Pacific region.
  96. A new era of intelligence with Gemini 3

    Tue, 18 Nov 2025 16:06:41 -0000

  97. Introducing Google Antigravity

    Tue, 18 Nov 2025 16:06:32 -0000

  98. WeatherNext 2: Our most advanced weather forecasting model

    Mon, 17 Nov 2025 15:09:23 -0000

    The new AI model delivers more efficient, more accurate and higher-resolution global weather predictions.
  99. SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds

    Thu, 13 Nov 2025 14:52:18 -0000

    Introducing SIMA 2, a Gemini-powered AI agent that can think, understand, and take actions in interactive environments.
  100. Teaching AI to see the world more like we do

    Tue, 11 Nov 2025 11:49:13 -0000

    Our new paper analyzes the important ways AI systems organize the visual world differently from humans.
  101. How AI is giving Northern Ireland teachers time back

    Mon, 10 Nov 2025 16:50:39 -0000

    A six-month-long pilot program with the Northern Ireland Education Authority’s C2k initiative found that integrating Gemini and other generative AI tools saved participating teachers an average of 10 hours per week.
  102. Mapping, modeling, and understanding nature with AI

    Wed, 05 Nov 2025 16:59:46 -0000

    AI models can help map species, protect forests and listen to birds around the world
  103. Accelerating discovery with the AI for Math Initiative

    Wed, 29 Oct 2025 14:31:13 -0000

    The initiative brings together some of the world's most prestigious research institutions to pioneer the use of AI in mathematical research.
  104. T5Gemma: A new collection of encoder-decoder Gemma models

    Sat, 25 Oct 2025 18:14:00 -0000

    Introducing T5Gemma, a new collection of encoder-decoder LLMs.
  105. MedGemma: Our most capable open models for health AI development

    Sat, 25 Oct 2025 18:02:50 -0000

    We’re announcing new multimodal models in the MedGemma collection, our most capable open models for health AI development.
  106. Introducing Gemma 3n: The developer guide

    Sat, 25 Oct 2025 17:54:47 -0000

    Gemma 3n is designed for the developer community that helped shape Gemma.
  107. Gemini 2.5 Flash-Lite is now ready for scaled production use

    Sat, 25 Oct 2025 17:34:32 -0000

    Gemini 2.5 Flash-Lite, previously in preview, is now stable and generally available. This cost-efficient model provides high quality in a small size, and includes 2.5 family features like a 1 million-token context window and multimodality.
  108. Behind “ANCESTRA”: combining Veo with live-action filmmaking

    Sat, 25 Oct 2025 17:27:10 -0000

    We partnered with Darren Aronofsky, Eliza McNitt and a team of more than 200 people to make a film using Veo and live-action filmmaking.
  109. AlphaEarth Foundations helps map our planet in unprecedented detail

    Fri, 24 Oct 2025 19:06:32 -0000

    New AI model integrates petabytes of Earth observation data to generate a unified data representation that revolutionizes global mapping and monitoring
  110. Exploring the context of online images with Backstory

    Fri, 24 Oct 2025 03:17:11 -0000

    New experimental AI tool helps people explore the context and origin of images seen online.
  111. Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

    Fri, 24 Oct 2025 03:12:29 -0000

    The International Mathematical Olympiad (“IMO”) is the world’s most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by six elite, pre-university mathematicians who compete to solve six exceptionally difficult problems in algebra, combinatorics, geometry, and number theory.
  112. Aeneas transforms how historians connect the past

    Fri, 24 Oct 2025 02:58:37 -0000

    Introducing the first model for contextualizing ancient inscriptions, designed to help historians better interpret, attribute and restore fragmentary texts.
  113. Genie 3: A new frontier for world models

    Fri, 24 Oct 2025 02:54:30 -0000

    Genie 3 can generate dynamic worlds that you can navigate in real time at 24 frames per second, retaining consistency for a few minutes at a resolution of 720p.
  114. How AI is helping advance the science of bioacoustics to save endangered species

    Fri, 24 Oct 2025 02:30:54 -0000

    Our new Perch model helps conservationists analyze audio faster to protect endangered species, from Hawaiian honeycreepers to coral reefs.
  115. Using AI to perceive the universe in greater depth

    Fri, 24 Oct 2025 02:21:07 -0000

    Using AI to perceive the universe in greater depth
  116. Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals

    Fri, 24 Oct 2025 00:22:10 -0000

    Gemini 2.5 Deep Think achieves breakthrough performance at the world’s most prestigious computer programming competition, demonstrating a profound leap in abstract problem solving.
  117. Discovering new solutions to century-old problems in fluid dynamics

    Fri, 24 Oct 2025 00:02:06 -0000

    Our new method could help mathematicians leverage AI techniques to tackle long-standing challenges in mathematics, physics and engineering.
  118. Strengthening our Frontier Safety Framework

    Thu, 23 Oct 2025 23:44:10 -0000

    We’re strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.
  119. Gemini Robotics 1.5 brings AI agents into the physical world

    Thu, 23 Oct 2025 23:33:58 -0000

    We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks.
  120. Introducing CodeMender: an AI agent for code security

    Thu, 23 Oct 2025 23:05:51 -0000

    Using advanced AI to fix critical software vulnerabilities
  121. Bringing AI to the next generation of fusion energy

    Thu, 23 Oct 2025 22:04:14 -0000

    We’re partnering with Commonwealth Fusion Systems (CFS) to bring clean, safe, limitless fusion energy closer to reality.
  122. Try Deep Think in the Gemini app

    Thu, 23 Oct 2025 18:54:19 -0000

    We're rolling out Deep Think in the Gemini app for Google AI Ultra subscribers, and we're giving select mathematicians access to the full version of the Gemini 2.5 Deep Think model entered into the IMO competition.
  123. Rethinking how we measure AI intelligence

    Thu, 23 Oct 2025 18:52:06 -0000

    Game Arena is a new, open-source platform for rigorous evaluation of AI models. It allows for head-to-head comparison of frontier systems in environments with clear winning conditions.
  124. Introducing Gemma 3 270M: The compact model for hyper-efficient AI

    Thu, 23 Oct 2025 18:50:11 -0000

    Today, we're adding a new, highly specialized tool to the Gemma 3 toolkit: Gemma 3 270M, a compact, 270-million parameter model.
  125. Image editing in Gemini just got a major upgrade

    Thu, 23 Oct 2025 18:48:30 -0000

    Transform images in amazing new ways with updated native image editing in the Gemini app.
  126. VaultGemma: The world's most capable differentially private LLM

    Thu, 23 Oct 2025 18:42:54 -0000

    We introduce VaultGemma, the most capable model trained from scratch with differential privacy.
  127. Introducing the Gemini 2.5 Computer Use model

    Thu, 23 Oct 2025 18:40:34 -0000

    Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces.
  128. Introducing Veo 3.1 and advanced creative capabilities

    Thu, 23 Oct 2025 18:38:55 -0000

    We’re rolling out significant updates to Veo that give people even more creative control.
  129. How a Gemma model helped discover a new potential cancer therapy pathway

    Thu, 23 Oct 2025 18:22:55 -0000

    We’re launching a new 27 billion parameter foundation model for single-cell analysis built on the Gemma family of open models.
  130. AlphaGenome: AI for better understanding the genome

    Wed, 25 Jun 2025 13:59:00 -0000

    Introducing a new, unifying DNA sequence model that advances regulatory variant-effect prediction and promises to shed new light on genome function — now available via API.
  131. Gemini Robotics On-Device brings AI to local robotic devices

    Tue, 24 Jun 2025 14:00:00 -0000

    We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
  132. Gemini 2.5: Updates to our family of thinking models

    Tue, 17 Jun 2025 16:00:00 -0000

    Explore the latest Gemini 2.5 model updates with enhanced performance and accuracy: Gemini 2.5 Pro now stable, Flash generally available, and the new Flash-Lite in preview.
  133. We’re expanding our Gemini 2.5 family of models

    Tue, 17 Jun 2025 16:00:00 -0000

    Gemini 2.5 Flash and Pro are now generally available, and we’re introducing 2.5 Flash-Lite, our most cost-efficient and fastest 2.5 model yet.
  134. How we're supporting better tropical cyclone prediction with AI

    Thu, 12 Jun 2025 15:00:00 -0000

    We’re launching Weather Lab, featuring our experimental cyclone predictions, and we’re partnering with the U.S. National Hurricane Center to support their forecasts and warnings this cyclone season.
  135. Advanced audio dialog and generation with Gemini 2.5

    Tue, 03 Jun 2025 17:15:47 -0000

    Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.
  136. Gemini 2.5: Our most intelligent models are getting even better

    Tue, 20 May 2025 09:45:00 -0000

    Gemini 2.5 Pro continues to be loved by developers as the best model for coding, and 2.5 Flash is getting even better with a new update. We’re bringing new capabilities to our models, including Deep Think, an experimental enhanced reasoning mode for 2.5 Pro.
  137. SynthID Detector — a new portal to help identify AI-generated content

    Tue, 20 May 2025 09:45:00 -0000

    Learn about the new SynthID Detector portal we announced at I/O to help people understand how the content they see online was generated.
  138. Advancing Gemini's security safeguards

    Tue, 20 May 2025 09:45:00 -0000

    We’ve made Gemini 2.5 our most secure model family to date.
  139. Announcing Gemma 3n preview: Powerful, efficient, mobile-first AI

    Tue, 20 May 2025 09:45:00 -0000

    Gemma 3n is a cutting-edge open model designed for fast, multimodal AI on devices, featuring optimized performance, unique flexibility with a 2-in-1 model, and expanded multimodal understanding with audio, empowering developers to build live, interactive applications and sophisticated audio-centric experiences.
  140. Fuel your creativity with new generative media models and tools

    Tue, 20 May 2025 09:45:00 -0000

    Introducing Veo 3 and Imagen 4, and a new tool for filmmaking called Flow.
  141. Our vision for building a universal AI assistant

    Tue, 20 May 2025 09:45:00 -0000

    We’re extending Gemini to become a world model that can make plans and imagine new experiences by simulating aspects of the world.
  142. AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms

    Wed, 14 May 2025 14:59:00 -0000

    New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators
  143. Gemini 2.5 Pro Preview: even better coding performance

    Tue, 06 May 2025 15:06:55 -0000

    We’ve seen developers doing amazing things with Gemini 2.5 Pro, so we decided to release an updated version a couple of weeks early to get it into developers’ hands sooner.
  144. Build rich, interactive web apps with an updated Gemini 2.5 Pro

    Tue, 06 May 2025 15:00:00 -0000

    Our updated version of Gemini 2.5 Pro Preview has improved capabilities for coding.
  145. Music AI Sandbox, now with new features and broader access

    Thu, 24 Apr 2025 15:01:00 -0000

    Helping music professionals explore the potential of generative AI
  146. Introducing Gemini 2.5 Flash

    Thu, 17 Apr 2025 19:02:00 -0000

    Gemini 2.5 Flash is our first fully hybrid reasoning model, giving developers the ability to turn thinking on or off.
  147. Generate videos in Gemini and Whisk with Veo 2

    Tue, 15 Apr 2025 17:00:00 -0000

    Transform text-based prompts into high-resolution eight-second videos in Gemini Advanced and use Whisk Animate to turn images into eight-second animated clips.
  148. DolphinGemma: How Google AI is helping decode dolphin communication

    Mon, 14 Apr 2025 17:00:00 -0000

    DolphinGemma, a large language model developed by Google, is helping scientists study how dolphins communicate — and hopefully find out what they're saying, too.
  149. Taking a responsible path to AGI

    Wed, 02 Apr 2025 13:31:00 -0000

    We’re exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
  150. Evaluating potential cybersecurity threats of advanced AI

    Wed, 02 Apr 2025 13:30:00 -0000

    Our framework enables cybersecurity experts to identify which defenses are necessary—and how to prioritize them
  151. Gemini 2.5: Our most intelligent AI model

    Tue, 25 Mar 2025 17:00:36 -0000

    Gemini 2.5 is our most intelligent AI model, now with thinking built in.
  152. Gemini Robotics brings AI into the physical world

    Wed, 12 Mar 2025 15:00:00 -0000

    Introducing Gemini Robotics and Gemini Robotics-ER, AI models designed for robots to understand, act and react to the physical world.
  153. Experiment with Gemini 2.0 Flash native image generation

    Wed, 12 Mar 2025 14:58:00 -0000

    Native image output is available in Gemini 2.0 Flash for developers to experiment with in Google AI Studio and the Gemini API.
  154. Introducing Gemma 3

    Wed, 12 Mar 2025 08:00:00 -0000

    The most capable model you can run on a single GPU or TPU.
  155. Start building with Gemini 2.0 Flash and Flash-Lite

    Tue, 25 Feb 2025 18:02:12 -0000

    Gemini 2.0 Flash-Lite is now generally available in the Gemini API for production use in Google AI Studio and for enterprise customers on Vertex AI
  156. Gemini 2.0 is now available to everyone

    Wed, 05 Feb 2025 16:00:00 -0000

    We’re announcing new updates to Gemini 2.0 Flash, plus introducing Gemini 2.0 Flash-Lite and Gemini 2.0 Pro Experimental.
  157. Updating the Frontier Safety Framework

    Tue, 04 Feb 2025 16:41:00 -0000

    Our next iteration of the FSF sets out stronger security protocols on the path to AGI
  158. FACTS Grounding: A new benchmark for evaluating the factuality of large language models

    Tue, 17 Dec 2024 15:29:00 -0000

    Our comprehensive benchmark and online leaderboard offer a much-needed measure of how accurately LLMs ground their responses in provided source material and avoid hallucinations
  159. State-of-the-art video and image generation with Veo 2 and Imagen 3

    Mon, 16 Dec 2024 17:01:16 -0000

    We’re rolling out a new, state-of-the-art video model, Veo 2, and updates to Imagen 3. Plus, check out our new experiment, Whisk.
  160. Introducing Gemini 2.0: our new AI model for the agentic era

    Wed, 11 Dec 2024 15:30:40 -0000

    Today, we’re announcing Gemini 2.0, our most capable multimodal AI model yet.
  161. Google DeepMind at NeurIPS 2024

    Thu, 05 Dec 2024 17:45:00 -0000

    Advancing adaptive AI agents, empowering 3D scene creation, and innovating LLM training for a smarter, safer future
  162. GenCast predicts weather and the risks of extreme conditions with state-of-the-art accuracy

    Wed, 04 Dec 2024 15:59:00 -0000

    New AI model advances the prediction of weather uncertainties and risks, delivering faster, more accurate forecasts up to 15 days ahead
  163. Genie 2: A large-scale foundation world model

    Wed, 04 Dec 2024 14:23:00 -0000

    Generating unlimited diverse training environments for future general agents
  164. AlphaQubit tackles one of quantum computing’s biggest challenges

    Wed, 20 Nov 2024 18:00:00 -0000

    Our new AI system accurately identifies errors inside quantum computers, helping to make this new technology more reliable.
  165. The AI for Science Forum: A new era of discovery

    Mon, 18 Nov 2024 19:57:00 -0000

    The AI for Science Forum highlights AI's present and potential role in revolutionizing scientific discovery and solving global challenges, emphasizing collaboration between the scientific community, policymakers, and industry leaders.
  166. Pushing the frontiers of audio generation

    Wed, 30 Oct 2024 15:00:00 -0000

    Our pioneering speech generation technologies are helping people around the world interact with more natural, conversational and intuitive digital assistants and AI tools.
  167. New generative AI tools open the doors of music creation

    Wed, 23 Oct 2024 16:53:00 -0000

    Our latest AI music technologies are now available in MusicFX DJ, Music AI Sandbox and YouTube Shorts
  168. Demis Hassabis & John Jumper awarded Nobel Prize in Chemistry

    Wed, 09 Oct 2024 11:45:00 -0000

    The award recognizes their work developing AlphaFold, a groundbreaking AI system that predicts the 3D structure of proteins from their amino acid sequences.
  169. How AlphaChip transformed computer chip design

    Thu, 26 Sep 2024 14:08:00 -0000

    Our AI method has accelerated and optimized chip design, and its superhuman chip layouts are used in hardware around the world.
  170. Updated production-ready Gemini models, reduced 1.5 Pro pricing, increased rate limits, and more

    Tue, 24 Sep 2024 16:03:03 -0000

    We’re releasing two updated production-ready Gemini models
  171. Empowering YouTube creators with generative AI

    Wed, 18 Sep 2024 14:30:06 -0000

    New video generation technology in YouTube Shorts will help millions of people realize their creative vision
  172. Our latest advances in robot dexterity

    Thu, 12 Sep 2024 14:00:00 -0000

    Two new AI systems, ALOHA Unleashed and DemoStart, help robots learn to perform complex tasks that require dexterous movement
  173. AlphaProteo generates novel proteins for biology and health research

    Thu, 05 Sep 2024 15:00:00 -0000

    New AI system designs proteins that successfully bind to target molecules, with potential for advancing drug design, disease understanding and more.
  174. FermiNet: Quantum physics and chemistry from first principles

    Thu, 22 Aug 2024 19:00:00 -0000

    Using deep learning to solve fundamental problems in computational quantum chemistry and explore how matter interacts with light
  175. Mapping the misuse of generative AI

    Fri, 02 Aug 2024 10:50:58 -0000

    New research analyzes the misuse of multimodal generative AI today, in order to help build safer and more responsible technologies.
  176. Gemma Scope: helping the safety community shed light on the inner workings of language models

    Wed, 31 Jul 2024 15:59:19 -0000

    Announcing a comprehensive, open suite of sparse autoencoders for language model interpretability.