The Path to Autopilot Runs Through Automation and Augmentation for SREs

Insights | July 24, 2025 | Ananda Rajagopal | Read time: 5 Mins

Ananda Rajagopal, Co-founder and Chief Product Officer, Ciroos

In every industry and on every team today, the debate is how far AI can take human tasks toward full autonomy. Site reliability engineers (SREs) are asking the same question. At SRECon25 this year, the closing plenary, “AIOps: Prove It!”, called out vendor hype and warned of fatigue inside SRE teams. Such skepticism is healthy; when claims outrun reality, engineers pay the price. It stems largely from the reality SREs face: lots of tribal knowledge, CMDB systems and runbooks that are rarely (never?) up to date, and specialized context that often comes only with experience. Turns out there is no gzip for experience!

Having spent more than 15 years in observability and monitoring, I see autopilot, the fully autonomous operations state, as an inspiring destination. However, focusing solely on it misses important nuances. We first need to examine the day-to-day workflow: which steps are pure toil and which are rewarding problem-solving? That lens reveals three distinct yet connected modes in which AI can assist an SRE:

  1. Automate – AI executes well-defined tasks at scale.
  2. Augment – AI expands human decision-making with large-scale analysis and richer workflows.
  3. Autopilot – AI autonomously completes the job end‑to‑end once the user is ready to delegate control.

AI tools for SREs must enable a smooth flow from automation through augmentation to full autopilot, matching how teams naturally work. Basic regex rules and systems that create user overload are inadequate in the AI era; instead, the system needs to spot patterns, propose automations, and turn intent into action—correlating issues across domains and suggesting concrete fixes. To understand this better, let’s break down the definitions of automate, augment, and autopilot and what each of them entails.


| | Automate | Augment | Autopilot |
|---|---|---|---|
| Definition | A deterministic task fires on a trigger, takes known inputs, and performs specific actions. | AI performs broad analysis so the user can decide with confidence. | The desired outcome is clear, but AI is entrusted with the exact steps. |
| Best-fit use cases | The user knows when, what, and how. | The user knows when and how but needs help with what. | The user knows when but needs help with what and how. |
| Role of user | Validate | Collaborate | Govern |
| Key capabilities of AI system | System with a highly functional UX that surfaces and executes new automation opportunities. | System with a highly functional UX and a human-in-the-loop with rich contextual engagement. | System makes accurate decisions, has high coverage of user workflows, and earns user trust. |
| Primary benefit | Efficiency at scale | Decision-making at scale | Efficiency and decisions at scale |
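
One way to read the best-fit use cases row is as a small decision rule over what the user already knows. The Python sketch below is illustrative only: the `Mode` enum and the `best_fit_mode` helper are hypothetical names, not part of any product, and returning `None` for combinations the table does not list is an assumption.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    AUTOMATE = "automate"
    AUGMENT = "augment"
    AUTOPILOT = "autopilot"


def best_fit_mode(knows_when: bool, knows_what: bool, knows_how: bool) -> Optional[Mode]:
    """Map what the user already knows to the best-fit mode from the table.

    Hypothetical helper for illustration. Returns None for combinations
    the table does not cover.
    """
    if not knows_when:
        return None  # every column in the table assumes the user knows *when*
    if knows_what and knows_how:
        return Mode.AUTOMATE  # user knows when, what, and how
    if knows_how and not knows_what:
        return Mode.AUGMENT  # user needs help with *what*
    if not knows_what and not knows_how:
        return Mode.AUTOPILOT  # user needs help with *what* and *how*
    return None  # knows what but not how: not listed in the table
```

For example, `best_fit_mode(True, False, True)` returns `Mode.AUGMENT`: the user knows when and how to act but wants help deciding what to do.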


Smart automation is already here

Automation is anything but “dumb.” Modern AI-native systems spot opportunities a rule-based engine never could. A large language model (LLM) with memory, for example, can infer a user’s preferred responses and suggest new automations without explicit prompts. Reasoning LLMs that crawl hundreds of sources and distill the findings turn hours of toil into key insights in minutes. These improvements, while incremental, stack up quickly.


A detour behind the wheel

Let me borrow an analogy from the automotive world, since this industry has seen massive leaps in the past two decades. The table below groups examples of vehicle automation through the lens of Automate-Augment-Autopilot rather than SAE’s six levels of driving automation (Level 0 through Level 5).


Automate Augment Autopilot
• Cruise control • Parallel-parking assist • Full self-driving
• Power doors/power windows • Lane-keeping assist • Driverless robotaxis
• Over-the-air software updates • Adaptive cruise control
• Distance to destination
• Range-to-empty (charge/gas)
• Blind-spot detectors
• Rear-view cameras
• Per-driver preference recall


Notice how many more features live in the Automate and Augment columns than in Autopilot. Each reduces driver toil and improves safety long before society embraces widespread driverless cars. Indeed, even in the tech hotbed of the San Francisco Bay Area, full self‑driving is still far from mainstream.

What counts as “toil” can also depend on the situation. For example, I enjoy driving under normal circumstances, but after a long day the same drive can feel like toil. Engaging full self-driving alleviates stress and makes the drive safer for both me and fellow drivers.


Mapping back to day jobs for SREs and IT Operations

Firefighting at 3 a.m., or being paged for an incident while your family is opening holiday gifts, feels different from troubleshooting during business hours. Yet the workflow is the same: acknowledge an alert, review dashboards, pivot on dimensions, widen time windows, ask “what changed,” and so on. On-call is stressful for most people because the rapid context switch breaks their “flow” from whatever they were doing. Imagine if answers to the standard questions SREs ask were at their fingertips the moment they were paged, with the analysis summarized as plausible root causes. Even triage helps: once the AI has ruled out some possible causes, the SRE can focus on the remaining candidates for an anomaly. Better yet, imagine if AI handled the entire incident for issues that meet strict guardrails, paging the human only when governance demands it.

Across interviews with hundreds of SREs and in-depth design-partner sessions, we have learned that there are plenty of opportunities across all three modes. These customers have also underscored the need for a seamless user experience that lets teams glide from automation to augmentation to autopilot without context switches.

To earn the trust that delegation requires, the AI system must clear a high bar. Simple string-matching rules and regexes no longer suffice. The system should detect patterns, generate candidate automations, and translate user intent into action. Identifying the root cause of a Kubernetes issue is a helpful capability, but by itself, it rarely moves the needle for enterprise SRE teams. The real value emerges when the system correlates that finding with other domains to suggest a concrete, automated fix or a clear, augmented workflow.

SREs experimenting with AI systems should give them the same fair shot they’d give a new teammate. Does the AI system have the full context that a human has? It is reasonable to provide access to data progressively. Start with read-only access, layer in broader data sources, and grant access to sensitive systems (e.g., code repositories) or edit privileges only after the system has earned your trust.
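
The progressive approach above could be encoded as ordered access tiers. This is a minimal sketch, assuming a strict ordering in which sensitive systems come before edit privileges (the text groups the two, so that order is an assumption) and assuming a human explicitly approves each promotion. `AccessTier`, `promote`, and `allows` are hypothetical names.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    READ_ONLY = 1          # starting point: the system can observe but not change anything
    BROADER_DATA = 2       # additional data sources layered in
    SENSITIVE_SYSTEMS = 3  # e.g. code repositories (ordering vs. EDIT is an assumption)
    EDIT = 4               # edit privileges, granted last


def promote(current: AccessTier, human_approved: bool) -> AccessTier:
    """Advance one tier at a time, and only with explicit human approval."""
    if not human_approved or current is AccessTier.EDIT:
        return current
    return AccessTier(current + 1)


def allows(current: AccessTier, required: AccessTier) -> bool:
    """Check whether the current tier covers the access an action requires."""
    return current >= required
```

Stepping one tier per approval keeps each expansion of access a deliberate human decision rather than a side effect of usage.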


Conclusion

Autopilot promises an order-of-magnitude leap, and that should always be the goal. However, one shouldn’t ignore the steady progress that automation and augmentation make possible in reducing SRE toil, which paves the road to that destination. Much as a 1% daily improvement compounds to a roughly 38x gain in a year, these steady enhancements are what make the revolutionary leap possible.
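
The compounding figure is easy to check. A short sketch, assuming the improvement compounds once per day for 365 days:

```python
daily_gain = 0.01  # 1% improvement per day
days = 365

# Compound growth: multiply by (1 + gain) once per day.
growth = (1 + daily_gain) ** days

print(f"{growth:.1f}x")  # about 37.8x, which rounds to 38x
```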

Enterprise SRE teams should view the three modes as a continuum that is always under human control:

  1. Automate wherever possible to remove repetitive toil.
  2. Augment the SRE when new insight can improve the quality and speed of decisions.
  3. Autopilot when it is feasible and policy allows full delegation.

That is the ethos behind an AI SRE Teammate, a collaborative partner that scales efficiency and insight while keeping human SREs firmly in charge.
