Plugging the “ops” gap left by DevOps

DevOps is designed to automate the build, deployment and integration process using CI/CD pipelines, containers, and automation and orchestration tools such as Jenkins, Chef, Ansible or Terraform. However, DevOps falls short in the areas of system operations, security and compliance. Organizations are coming up with stop-gap options like Site Reliability Engineers or Platform Engineering groups. However, these efforts are not well defined and lack a strong framework driven by best practices. The infographic below provides an overview of the current state.

Hard-core systems operations and management activities, which typically include security event and incident management, vulnerability management, continuous monitoring and incident response, systems operations and financial operations, are largely left out, or are defined ambiguously and with duplicated responsibilities. This is a serious gap in the current “state-of-the-art” given that typically 60–70% of a total system cost is associated with Operations & Maintenance activity. Just as DevOps has streamlined the build and deploy process using CI/CD pipelines, AIOps provides a scalable operations, automation and management framework. AIOps starts where DevOps stops.

stackArmor’s founders and engineers have been migrating and managing systems in the cloud since 2009 and have first-hand experience in developing cloud operations and management best practices. We have developed stackArmor StackOps as a holistic cloud operations and security management framework that covers the full stack and incorporates AI to deliver automated response and incident management.

The stackArmor StackOps framework covers four key dimensions that are critical for a strong operations and management framework. Key areas for monitoring and management include:
  1. System Operations – Robust monitoring and automated management of compute, storage and network components and services, including automated response. Typical metrics to report include CPU utilization, unused EBS volumes, network traffic and I/O operations, amongst others.
  2. Security Operations – Security operations on cloud computing platforms require a strong focus on policies, procedures and rules to ensure that infrastructure components are configured to meet compliance and security requirements. Continuous monitoring of AWS configurations, vulnerability management and penetration scanning must be tracked with associated metrics.
  3. Financial Operations – Cloud platforms offer elastic pricing models that require strong operational oversight to keep consumption and utilization optimal. Typical metrics that must be tracked include underutilized instances, unoptimized instance families and orphaned storage volumes.
  4. User Experience – End-user experience parameters such as response times, uptime and throughput are common metrics that must be tracked to deliver a superior user experience.
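
As a rough illustration of how a few of the metrics above could be computed, here is a minimal Python sketch. It is not part of StackOps. The function names, the `InstanceSample` type and the 10% CPU threshold are assumptions made for this example. The volume records are modelled on the general shape of an EC2 volume listing, where an unattached volume reports the state `available`. A real deployment would pull this data from the cloud provider's APIs instead of hard-coded lists.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class InstanceSample:
    """Hypothetical container for one instance's CPU readings (percent, 0-100)."""
    instance_id: str
    cpu_percent: list[float]


def unattached_volumes(volumes: list[dict]) -> list[str]:
    """Return IDs of volumes that are not attached to any instance."""
    return [
        v["VolumeId"]
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def underutilized_instances(
    samples: list[InstanceSample], threshold: float = 10.0
) -> list[str]:
    """Return IDs of instances whose average CPU is below the threshold.

    The 10% default is an illustrative assumption, not a recommended value.
    """
    return [
        s.instance_id
        for s in samples
        if s.cpu_percent and mean(s.cpu_percent) < threshold
    ]


def uptime_percent(checks: list[bool]) -> float:
    """Percentage of health checks that succeeded."""
    if not checks:
        raise ValueError("at least one health check is required")
    return 100.0 * sum(checks) / len(checks)


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    volumes = [
        {"VolumeId": "vol-a", "State": "available", "Attachments": []},
        {"VolumeId": "vol-b", "State": "in-use",
         "Attachments": [{"InstanceId": "i-1"}]},
    ]
    samples = [
        InstanceSample("i-1", [2.0, 4.0]),
        InstanceSample("i-2", [50.0, 70.0]),
    ]
    print("Orphaned volumes:", unattached_volumes(volumes))
    print("Underutilized instances:", underutilized_instances(samples))
    print("Uptime %:", uptime_percent([True, True, True, False]))
```

Keeping each check as a small pure function over plain records makes it easy to test and to feed into whatever reporting or automated response step sits downstream.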

There is an increasing focus on ensuring that cloud operations are managed holistically, integrating platform operations (also known as SRE) with traditional DevOps functions. Terms like DevSecOps, AIOps, SOAR/SOAPA and many others are attempts to help organizations manage cloud operations and security at scale. stackArmor StackOps is a framework designed and implemented by experts in cloud operations and security, with successful results for security-focused customers.