## Cloud Service Delivery

Cloud Service Delivery encompasses **the entire lifecycle of making cloud services operational, available, secure, performant, and valuable to end-users and customers.**

**In essence, Cloud Service Delivery is the bridge between the raw capabilities of cloud technology (IaaS, PaaS, SaaS) and the reliable, secure, performant, and cost-effective services that businesses and users actually consume.**

Cloud Service Delivery Team:

- Cloud Infrastructure Engineer
- Cloud Operations Engineer (DevOps/SRE)
- Cloud Security Specialist
- Cloud Support Engineer
- Cloud FinOps Engineer

1. **Service Provisioning & Deployment:**
    - Setting up cloud infrastructure (servers, storage, networking).
    - Automating deployment of applications and platforms.
    - Configuring services according to customer requirements.
    - Managing resource allocation and scaling.
    - Best Practice:
2. **Infrastructure Management:**
    - Monitoring health, performance, and capacity of compute, storage, and network resources.
    - Patching and updating underlying infrastructure (hypervisors, hosts).
    - Managing physical data center aspects (power, cooling, hardware lifecycle) _if using private/hybrid cloud_.
    - Ensuring high availability and disaster recovery setups.
    - Best Practice:
        - Use Amazon CloudWatch as a data source in Grafana for monitoring
3. **Platform Management (for PaaS):**
    - Managing middleware, databases, development tools, and runtime environments.
    - Ensuring platform scalability, security, and performance.
    - Applying patches and updates to platform components.
4. **Application Operations & Management (for SaaS/IaaS-hosted apps):**
    - Monitoring application performance, uptime, and user experience.
    - Deploying application updates and bug fixes.
    - Managing application configuration and secrets.
    - Ensuring application scalability and resilience.
5. **Security & Compliance Management:**
    - Implementing and managing security controls (firewalls, IDS/IPS, encryption, IAM).
    - Vulnerability scanning and patch management.
    - Security incident monitoring and response.
    - Ensuring compliance with regulations (GDPR, HIPAA, PCI DSS, etc.).
    - Auditing and logging management.
    - Best Practice:
        - Cloud application WAF management
        - IP allowlist support at the tenant level
        - Security scanning
        - Security guidance

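
Restricting access by source IP per tenant can be illustrated with a small check. This is a minimal sketch, not a WAF configuration: the `TENANT_ALLOWLISTS` mapping, the tenant IDs, and the CIDR ranges (taken from documentation-reserved address blocks) are all hypothetical, and unknown tenants are assumed to be denied by default.

```python
import ipaddress

# Hypothetical per-tenant allowlists; tenant IDs and CIDR ranges are
# illustrative placeholders, not values from any real deployment.
TENANT_ALLOWLISTS = {
    "tenant-a": ["203.0.113.0/24", "198.51.100.7/32"],
    "tenant-b": ["192.0.2.0/25"],
}


def is_allowed(tenant_id: str, client_ip: str) -> bool:
    """Return True if client_ip is inside any CIDR range allowed for the tenant.

    Tenants without an allowlist entry are denied (an assumption of this sketch).
    """
    address = ipaddress.ip_address(client_ip)
    for cidr in TENANT_ALLOWLISTS.get(tenant_id, []):
        if address in ipaddress.ip_network(cidr):
            return True
    return False


print(is_allowed("tenant-a", "203.0.113.42"))  # True: inside 203.0.113.0/24
print(is_allowed("tenant-b", "192.0.2.200"))   # False: outside 192.0.2.0/25
```
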
6. **Performance & Availability Monitoring:**
    - 24/7 monitoring of all service components (infrastructure, platform, application).
    - Setting and tracking SLAs (Service Level Agreements) and SLOs (Service Level Objectives).
    - Proactive detection and resolution of performance bottlenecks and potential failures.
    - Managing incident response to outages or degradation.
    - Best Practice:
        - Service availability checks (APM/BPM, New Relic, Amazon CloudWatch Synthetics, health page)
        - SLA (Service Level Agreement): 99.9% vs. 99.99% [uptime](https://uptime.is/)
        - SLO (Service Level Objective)
        - Proactive detection (Grafana Alerting with different severity levels)

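
The practical difference between a 99.9% and a 99.99% availability target becomes clearer when expressed as allowed downtime. The sketch below assumes a 30-day measurement period; the function name and the period length are illustrative choices, and a real SLA may define the period and exclusions differently.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30) -> float:
    """Maximum downtime, in minutes, that still meets the availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days allows {minutes:.2f} minutes of downtime")
# 99.9% over 30 days allows 43.20 minutes of downtime
# 99.99% over 30 days allows 4.32 minutes of downtime
```
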
7. **Incident & Problem Management:**
    - Responding to alerts and service disruptions.
    - Troubleshooting issues across the stack.
    - Restoring service quickly (incident management).
    - Identifying root causes and implementing permanent fixes (problem management).
    - Best Practice:
8. **Change & Configuration Management:**
    - Controlling and documenting changes to the cloud environment.
    - Managing configurations consistently and securely (Infrastructure as Code, or IaC).
    - Minimizing risk associated with changes through testing and rollback plans.

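
One idea behind managing configuration as code is that the declared state can be compared with what is actually running, so undocumented changes surface as drift. The following is a minimal sketch of that comparison on flat dictionaries; `find_drift`, the `ABSENT` marker, and the example settings are hypothetical, and real IaC tools perform a much richer comparison.

```python
# Marker for a setting that exists on one side only (hypothetical convention).
ABSENT = "<absent>"


def find_drift(desired: dict, actual: dict) -> dict:
    """Map each differing key to a (desired_value, actual_value) pair."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Illustrative settings only; the keys and values are assumptions.
desired_state = {"instance_type": "t3.medium", "min_nodes": 2, "encryption": True}
actual_state = {"instance_type": "t3.large", "min_nodes": 2, "public_access": True}

for setting, (want, have) in find_drift(desired_state, actual_state).items():
    print(f"{setting}: desired={want!r}, actual={have!r}")
```
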
9. **Cost Management & Optimization:**
    - Monitoring cloud resource consumption and spending.
    - Identifying and eliminating waste (idle resources, over-provisioning).
    - Right-sizing resources.
    - Utilizing reserved instances or savings plans effectively.
    - Providing cost visibility and reporting.

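
Spotting idle and over-provisioned resources usually starts from utilization data. The sketch below classifies resources by average CPU utilization and estimates the savings. Everything in it is an assumption for illustration: the 5% and 40% thresholds, the rule that downsizing saves half the cost, the `ResourceUsage` type, and the sample fleet. Real right-sizing would also weigh memory, I/O, and peak load.

```python
from dataclasses import dataclass

# Illustrative thresholds and savings ratio; tune these to your own workloads.
IDLE_CPU_PERCENT = 5.0
OVERSIZED_CPU_PERCENT = 40.0
DOWNSIZE_SAVINGS_RATIO = 0.5


@dataclass(frozen=True)
class ResourceUsage:
    name: str
    avg_cpu_percent: float
    monthly_cost: float


def classify(resource: ResourceUsage) -> str:
    """Label a resource as idle, over-provisioned, or right-sized by average CPU."""
    if resource.avg_cpu_percent < IDLE_CPU_PERCENT:
        return "idle"
    if resource.avg_cpu_percent < OVERSIZED_CPU_PERCENT:
        return "over-provisioned"
    return "right-sized"


def estimated_monthly_savings(resources: list[ResourceUsage]) -> float:
    """Full cost of idle resources plus a share of over-provisioned ones."""
    savings = 0.0
    for resource in resources:
        status = classify(resource)
        if status == "idle":
            savings += resource.monthly_cost
        elif status == "over-provisioned":
            savings += resource.monthly_cost * DOWNSIZE_SAVINGS_RATIO
    return savings


# Hypothetical fleet used only to exercise the functions.
fleet = [
    ResourceUsage("batch-worker", 2.0, 100.0),
    ResourceUsage("web-frontend", 25.0, 200.0),
    ResourceUsage("database", 65.0, 400.0),
]

for item in fleet:
    print(f"{item.name}: {classify(item)}")
print(f"Estimated monthly savings: {estimated_monthly_savings(fleet):.2f}")
```
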
10. **Customer Onboarding & Support:**
    - Guiding new customers/users through setup and access.
    - Providing user documentation and training resources.
    - Operating a service desk/helpdesk for user issues and requests (ticketing system).
    - Handling billing inquiries and account management.
11. **Service Governance & Lifecycle Management:**
    - Defining service catalogs and service levels (SLAs).
    - Managing the lifecycle of services (introduction, operation, retirement).
    - Continuous service improvement based on metrics and feedback.
    - Vendor management (for public cloud providers or third-party tools).
    - Best Practice:
12. **Backup, Recovery & Disaster Management:**
    - Implementing and managing data backup strategies.
    - Testing restore procedures.
    - Maintaining and testing disaster recovery (DR) plans and infrastructure.
    - Executing failover and failback procedures during disasters.

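
A restore test is only meaningful if the restored data is verified against the source. One simple way to do that, sketched below, is to compare SHA-256 checksums of the original and the restored file. The helper names and the temporary files are hypothetical, and the file copy merely stands in for whatever restore procedure is actually in use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """Return True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "data.bin"
    restored = Path(workdir) / "restored.bin"
    original.write_bytes(b"customer records")
    shutil.copyfile(original, restored)  # stand-in for a real restore step
    print(restore_matches(original, restored))  # True
```
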
## Cloud DevOps Maturity Model

## AIOps