stackArmor has developed the Well-Managed Cloud Framework based on over 10 years of cloud operations experience. Current cloud frameworks focus on migrations, establishing landing zones, and providing optimized hosting models; they do not address post-migration operations. stackArmor’s Well-Managed Cloud framework draws upon best practices from Amazon’s Well-Architected Framework, Microsoft’s Cloud Operating Model, Google’s Site Reliability Engineering (SRE), ITIL, and NIST publications. The framework provides a cloud operator’s point of view and incorporates best practices based on real operational experience.
A Well-Managed Cloud is one that operates efficiently and securely, in alignment with the business and security objectives of the organization. Operational excellence is the only way to a well-managed cloud, and it is achieved through consistent management oversight using simple metrics that are easily understood. To ensure that cloud computing delivers the desired business results, it is critical to establish a vision with clearly defined business objectives.
The stackArmor Well-Managed Cloud framework is focused on the platform operations and governance team and has the following components:
The infographic below provides an overview of the stackArmor Well-Managed Cloud framework.
[Figure: stackArmor Well-Managed Cloud framework for cloud operations management and excellence, covering security and optimized cloud performance with an integrated management platform]
The Well-Managed Cloud framework provides an actionable blueprint that is described in greater detail below.
It is critical to establish a cross-functional organization with well-defined roles & responsibilities. This organization must have a clear mission statement that outlines desired business outcomes in concrete and measurable terms. The goals and objectives must cover customer/user, financial, technical, and security metrics.
Cloud computing technology is evolving rapidly. Serverless, containers, microservices, and pay-as-you-go AI-enabled services are emerging quickly. While technologies may change, the principles that govern the consumption of cloud services and drive operational excellence stay constant. These are categorized into five areas.
Security Operations (SecOps): Digital organizations are operating in an increasingly hostile environment. Cybersecurity threats and incidents are increasing rapidly and will continue to grow in ferocity and velocity. A strong Cloud Security Operations (SecOps) program is critical. Security operations (SecOps) covers all operational and tactical activities to ensure confidentiality, integrity and availability of data. Tracking security activities, time to resolution and continuous monitoring by criticality are key metrics that must be reviewed as part of the cloud operations scrum.
System Operations (SysOps): System operations includes tracking the consumption and performance of cloud computing services, including compute, storage, and network, and developing meaningful metrics for them. Cloud computing is not like traditional infrastructure, and usage must be actively reviewed.
Financial Operations (FinOps): Cloud computing offers a pay-as-you-go consumption model. Most organizations perform tactical cost optimization using reserved instances, spot instances, and similar mechanisms, but do not adequately focus on consumption and utilization. Cloud financial operations focuses on managing overall cloud product margins and profitability by creating meaningful metrics that empower executive oversight of cloud utilization and costs.
UX Operations (UXOps): Monitoring and managing the user experience is critical to identifying surges and capacity issues that may impact system usage. This involves monitoring business services and endpoints using easy-to-interpret metrics that can be used to detect and remediate user experience issues.
Compliance Operations (ComplianceOps): Monitoring and reporting compliance with security controls, conducting reviews, and submitting critical data are essential for organizations in regulated markets. Compliance operations includes performing time-bound activities based on FedRAMP, FISMA, PCI-DSS, HIPAA, CJIS, or SOC 2 requirements.
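To make one of these metrics concrete, the sketch below shows how the SecOps "time to resolution by criticality" figure might be computed for review in the cloud operations scrum. The record layout (criticality, opened time, resolved time) and the sample data are assumptions made for illustration; they are not taken from any stackArmor product.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_hours_to_resolution(findings):
    """Average open-to-resolved time in hours, grouped by criticality.

    `findings` is an iterable of (criticality, opened_at, resolved_at) tuples;
    this record layout is an assumption made for the sketch.
    """
    durations = defaultdict(list)
    for criticality, opened_at, resolved_at in findings:
        hours = (resolved_at - opened_at).total_seconds() / 3600.0
        durations[criticality].append(hours)
    return {level: mean(hours) for level, hours in durations.items()}


# Illustrative data only.
findings = [
    ("critical", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 13, 0)),
    ("critical", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 17, 0)),
    ("low", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 4, 9, 0)),
]
print(mean_hours_to_resolution(findings))
```

Grouping by criticality keeps a slow fix for a low-severity finding from masking, or being masked by, the response time on critical ones.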
Most organizations consider cloud computing an “IT thing”. That is often a mistake, as cloud computing is a larger strategic enabler that requires the attention of the broader C-suite, especially the CFO. Defining and establishing simple-to-use metrics that are understood by business managers is critical to ensuring accountability and hence improving performance. Management by Objectives (MBO), which provides a structured framework with a clear vision, roles & responsibilities, and outcomes, is essential to delivering ROI on enterprise cloud investments.
To ensure adequate oversight, it is essential to create separation of duties between the financial and technical organizations. The financial organization must establish clear financial, operational, and efficiency metrics. These metrics are then “flowed down” into the business and product owner organization for accountability and tracking. The product owners are now armed with the right management objectives and can orchestrate activities to ensure that corporate objectives are met by the technical teams. For example, the organization may set a stackArmor Cloud Idle Score® objective of no more than 20%. Product managers who attain the goal are suitably rewarded, while others are encouraged to meet the corporate goals. Organizations should encourage benchmarking, gamification, and incentives for team members who demonstrate a willingness to align with corporate management objectives.
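As an illustration of how such an objective could be tracked per product, the sketch below checks idle scores against the 20% target. The source does not define how the Cloud Idle Score is calculated, so the formula used here (idle hours as a percentage of paid hours), the product names, and the figures are all hypothetical.

```python
from dataclasses import dataclass

IDLE_OBJECTIVE_PCT = 20.0  # corporate objective from the example in the text


@dataclass
class ProductUsage:
    product: str
    paid_hours: float  # instance-hours billed (hypothetical data)
    idle_hours: float  # instance-hours billed but unused (hypothetical data)


def idle_score(usage: ProductUsage) -> float:
    """Assumed definition: idle hours as a percentage of paid hours."""
    if usage.paid_hours <= 0:
        return 0.0
    return 100.0 * usage.idle_hours / usage.paid_hours


def meets_objective(usage: ProductUsage,
                    objective_pct: float = IDLE_OBJECTIVE_PCT) -> bool:
    """True when the product's idle score is at or below the objective."""
    return idle_score(usage) <= objective_pct


portfolio = [
    ProductUsage("billing-portal", paid_hours=1000, idle_hours=150),
    ProductUsage("analytics", paid_hours=2000, idle_hours=900),
]

for usage in portfolio:
    status = "meets objective" if meets_objective(usage) else "needs attention"
    print(f"{usage.product}: idle score {idle_score(usage):.1f}% ({status})")
```

A per-product report of this kind is one way the financial organization's objective could be flowed down to product owners for accountability.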
stackArmor’s Well-Managed Cloud platform consists of three key solution accelerators –
Organizations can integrate and use each of these services independently of each other, allowing a highly tailored deployment that best fits their concept of operations. Given the focus on business operations and management, we describe the stackArmor OpsAlert accelerator in greater detail.
stackArmor OpsAlert is a cloud operations management solution for busy product managers, product owners, and business managers with a stake in financially optimized cloud hosting. stackArmor OpsAlert helps business managers interpret complex data and make data-driven decisions. The infographic shown below provides a high-level view of the concept of operations using the Well-Managed Cloud framework enabled by stackArmor OpsAlert.
[Figure: stackArmor OpsAlert enables operational excellence with simple business metrics that drive greater efficiency and utilization of cloud services]
stackArmor OpsAlert provides easy-to-use dashboards with actionable insights. For example, the cloud utilization scorecard provides a snapshot view of how hard the cloud is working for your business. The stackArmor Cloud Idle Score provides an instant and easy-to-understand metric for detecting idle capacity that is being paid for but not utilized. Simple categorization of cloud instances into highly idle, moderately idle, or low idle allows managers to quickly ask the right questions and drill down to eliminate wastage.
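The three-way categorization can be sketched as a simple bucketing step. The thresholds below are hypothetical, since the source does not state where the cut-offs between highly, moderately, and low idle lie, and the instance IDs and idle percentages are illustrative data only.

```python
from collections import defaultdict

# Hypothetical thresholds; the source does not state the actual cut-offs.
HIGHLY_IDLE_MIN_PCT = 70.0
MODERATELY_IDLE_MIN_PCT = 30.0


def idle_category(idle_pct: float) -> str:
    """Map an instance's idle percentage to one of the three buckets."""
    if idle_pct >= HIGHLY_IDLE_MIN_PCT:
        return "highly idle"
    if idle_pct >= MODERATELY_IDLE_MIN_PCT:
        return "moderately idle"
    return "low idle"


def scorecard(idle_by_instance: dict[str, float]) -> dict[str, list[str]]:
    """Group instance IDs by idle category for a dashboard-style summary."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for instance_id, idle_pct in idle_by_instance.items():
        buckets[idle_category(idle_pct)].append(instance_id)
    return dict(buckets)


# Illustrative data: instance ID -> percentage of paid time spent idle.
sample = {"i-web-1": 12.0, "i-batch-1": 85.0, "i-db-1": 40.0}
print(scorecard(sample))
```

Coarse buckets trade precision for readability, which suits a scorecard aimed at business managers rather than engineers.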
Easy access to information enables effective management oversight which is critical for operational excellence. Also, the native integration with a ticketing system provides the ability to assign and track activities for accountability.
The ability to quickly interpret complex cost, operations, and utilization data is essential for effective management. Using advanced data aggregation, integration, and data science methods, stackArmor OpsAlert shows utilization, idle, and cost data side by side to enable decision making. The example below makes this clear: the highlighted instances are costing money but are not being fully utilized. Business and technical managers must perform utilization and cost trade-off analysis and devise appropriate optimization strategies. These may vary depending on the application and usage scenario; tactics might include scheduling instances, containerization, downsizing, or simply shutting instances down.
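A minimal sketch of the trade-off analysis described above places cost, utilization, and idle figures side by side and ranks instances by estimated idle spend. The estimate used here (monthly cost multiplied by the idle fraction) is an assumption for illustration, not OpsAlert's method, and the instance data is invented.

```python
from typing import NamedTuple


class InstanceRow(NamedTuple):
    instance_id: str
    monthly_cost_usd: float     # hypothetical billing figure
    avg_utilization_pct: float  # hypothetical monitoring figure


def idle_spend(row: InstanceRow) -> float:
    """Assumed estimate: the share of cost that corresponds to unused capacity."""
    idle_fraction = (100.0 - row.avg_utilization_pct) / 100.0
    return row.monthly_cost_usd * idle_fraction


def rank_by_idle_spend(rows: list[InstanceRow]) -> list[InstanceRow]:
    """Order instances so the largest estimated waste comes first."""
    return sorted(rows, key=idle_spend, reverse=True)


rows = [
    InstanceRow("i-web-1", 400.0, 75.0),
    InstanceRow("i-batch-1", 900.0, 10.0),
    InstanceRow("i-dev-1", 120.0, 5.0),
]

print(f"{'instance':<12}{'cost':>8}{'util %':>8}{'idle %':>8}{'idle $':>10}")
for row in rank_by_idle_spend(rows):
    idle_pct = 100.0 - row.avg_utilization_pct
    print(f"{row.instance_id:<12}{row.monthly_cost_usd:>8.2f}"
          f"{row.avg_utilization_pct:>8.1f}{idle_pct:>8.1f}{idle_spend(row):>10.2f}")
```

Ranking by idle spend rather than idle percentage puts the largest dollar waste first: in this sample the development instance is the most idle, but the batch instance accounts for far more unused spend.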