2026-03-23 20:57:45 +08:00

Cloud Service Delivery

Cloud Service Delivery encompasses the entire lifecycle of making cloud services operational, available, secure, performant, and valuable to end-users and customers. In essence, it is the bridge between the raw capabilities of cloud technology (IaaS, PaaS, SaaS) and the reliable, cost-effective services that businesses and users actually consume.

Cloud Service Delivery Team:

  • Cloud Infrastructure Engineer
  • Cloud Operations Engineer (DevOps/SRE)
  • Cloud Security Specialist
  • Cloud Support Engineer
  • Cloud FinOps Engineer

Key Areas of Cloud Service Delivery:

  1. Service Provisioning & Deployment:

    • Setting up cloud infrastructure (servers, storage, networking).
    • Automating deployment of applications and platforms.
    • Configuring services according to customer requirements.
    • Managing resource allocation and scaling.
    • Best Practice:
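The scaling bullet above can be made concrete with a proportional rule. The sketch below assumes the formula the Kubernetes Horizontal Pod Autoscaler documents (scale replica count by the ratio of current to target metric); the `min_replicas`/`max_replicas` defaults and the CPU figures are illustrative assumptions, and the HPA's tolerance band and stabilization window are left out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule in the style of the Kubernetes HPA.

    desired = ceil(current_replicas * current_metric / target_metric),
    then clamped to [min_replicas, max_replicas]. Tolerance and
    stabilization behaviour are deliberately omitted in this sketch.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas averaging 90% CPU against a 60% target: scale out to 6
print(desired_replicas(4, 90, 60))
```

The clamp matters as much as the ratio: the upper bound caps spend during a traffic spike, and the lower bound keeps some capacity when the metric drops to zero.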

  2. Infrastructure Management:

    • Monitoring health, performance, and capacity of compute, storage, and network resources.
    • Patching and updating underlying infrastructure (hypervisors, hosts).
    • Managing physical data center aspects (power, cooling, hardware lifecycle) if using private/hybrid cloud.
    • Ensuring high availability and disaster recovery setups.
    • Best Practice:
      • Use AWS CloudWatch as a data source in Grafana for monitoring dashboards
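High availability is often reasoned about by composing component availabilities: components in series multiply, while independent redundant replicas take the service down only if all of them fail. A minimal sketch, assuming independent failures (correlated failures make redundancy less effective than this estimate suggests); the figures are hypothetical.

```python
from functools import reduce


def serial_availability(*availabilities: float) -> float:
    """Every component in the chain must be up, so availabilities multiply."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)


def redundant_availability(availability: float, replicas: int) -> float:
    """Service is up while at least one of n independent replicas is up."""
    return 1 - (1 - availability) ** replicas


# Hypothetical stack: a 99.99% load balancer in front of two 99.9% app replicas
app_tier = redundant_availability(0.999, 2)
overall = serial_availability(0.9999, app_tier)
print(f"app tier: {app_tier:.6f}, overall: {overall:.6f}")
```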
  3. Platform Management (for PaaS):

    • Managing middleware, databases, development tools, and runtime environments.
    • Ensuring platform scalability, security, and performance.
    • Applying patches and updates to platform components.
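Patching platform components usually starts with comparing an inventory against a minimum patched baseline. A sketch assuming plain numeric dotted versions (no pre-release or build suffixes); the component names and versions are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.24.0' into (1, 24); trailing zeros are dropped so '1.24' == '1.24.0'."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def components_needing_patch(installed: dict[str, str],
                             minimum: dict[str, str]) -> list[str]:
    """Return installed components whose version is below the required minimum."""
    return sorted(
        name
        for name, required in minimum.items()
        if name in installed
        and parse_version(installed[name]) < parse_version(required)
    )


# Hypothetical inventory and patch baseline (names and versions are illustrative)
installed = {"db-engine": "15.4", "cache": "7.2.4", "runtime": "1.24.0"}
baseline = {"db-engine": "15.6", "cache": "7.2.4", "runtime": "1.25.3"}
print(components_needing_patch(installed, baseline))
```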
  4. Application Operations & Management (for SaaS/IaaS-hosted apps):

    • Monitoring application performance, uptime, and user experience.
    • Deploying application updates and bug fixes.
    • Managing application configuration and secrets.
    • Ensuring application scalability and resilience.
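Resilience against transient faults is commonly handled by retrying with exponential backoff and jitter. The sketch below uses the "full jitter" variant; which exceptions count as transient (`ConnectionError`, `TimeoutError` here) and the delay limits are assumptions to tune per service, and only idempotent operations should be retried this way.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient failures with exponential backoff.

    "Full jitter": before retry k (k = 1, 2, ...) sleep a random delay in
    [0, min(max_delay, base_delay * 2**(k - 1))].
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


# Usage: a call that fails twice with a transient error, then succeeds
calls = {"n": 0}


def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"


print(retry_with_backoff(flaky_call))
```

Randomizing the delay spreads retries from many clients over time, so a recovering dependency is not hit by synchronized waves of requests.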
  5. Security & Compliance Management:

    • Implementing and managing security controls (firewalls, IDS/IPS, encryption, IAM).
    • Vulnerability scanning and patch management.
    • Security incident monitoring and response.
    • Ensuring compliance with regulations (GDPR, HIPAA, PCI-DSS, etc.).
    • Auditing and logging management.
    • Best Practice:
      • Web Application Firewall (WAF) management for cloud applications
      • IP allowlisting at the tenant level
      • Security Scanning
      • Security Guidance
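Tenant-level IP allowlisting reduces to checking whether a client address falls inside one of the tenant's CIDR ranges. A minimal default-deny sketch using Python's standard `ipaddress` module; the tenant IDs and ranges (drawn from the documentation-only blocks 192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24) are hypothetical. In practice the same rule is often enforced at the WAF, load balancer, or security group rather than in application code.

```python
import ipaddress

# Hypothetical per-tenant allowlists, using documentation-only IP ranges
TENANT_ALLOWLISTS: dict[str, list[str]] = {
    "tenant-a": ["203.0.113.0/24", "198.51.100.17/32"],
    "tenant-b": ["192.0.2.0/25"],
}


def is_allowed(tenant_id: str, client_ip: str,
               allowlists: dict[str, list[str]] = TENANT_ALLOWLISTS) -> bool:
    """Default deny: the client IP must fall inside one of the tenant's CIDRs."""
    networks = allowlists.get(tenant_id, [])  # unknown tenant -> empty list -> deny
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny rather than crash
    return any(address in ipaddress.ip_network(cidr) for cidr in networks)


print(is_allowed("tenant-a", "203.0.113.45"))  # inside 203.0.113.0/24
print(is_allowed("tenant-b", "192.0.2.200"))   # outside 192.0.2.0/25
```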
  6. Performance & Availability Monitoring:

    • 24/7 monitoring of all service components (infrastructure, platform, application).
    • Setting and tracking SLAs (Service Level Agreements) and SLOs (Service Level Objectives).
    • Proactive detection and resolution of performance bottlenecks and potential failures.
    • Managing incident response to outages or degradation.
    • Best Practice:
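      • Service availability checks (APM/BPM, New Relic, AWS CloudWatch Synthetics, health-check pages)
      • SLA (Service Level Agreement): 99.9% vs 99.99% uptime allows roughly 43.2 vs 4.3 minutes of downtime per 30-day month
      • SLO (Service Level Objective): the internal reliability target, usually set stricter than the SLA
      • Proactive detection via Grafana Alerting with different severity levels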
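An availability percentage translates directly into a downtime allowance, and an SLO also defines an error budget (the failures you can afford before breaching it). A small sketch, assuming a 30-day period and a request-based SLO; the request counts are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Downtime permitted by an availability target over the period."""
    return period_days * MINUTES_PER_DAY * (1 - availability_pct / 100)


def error_budget_remaining(slo_pct: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of a request-based error budget left (negative = SLO breached)."""
    allowed_failures = total_requests * (1 - slo_pct / 100)
    if allowed_failures <= 0:
        raise ValueError("need slo_pct < 100 and total_requests > 0")
    return 1 - failed_requests / allowed_failures


for target in (99.9, 99.99):
    print(f"{target}%: {allowed_downtime_minutes(target):.1f} min per 30 days")
print(f"budget left: {error_budget_remaining(99.9, 1_000_000, 400):.0%}")
```

Alert severities can then key off how fast the budget is being consumed rather than off raw error counts.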
  7. Incident & Problem Management:

    • Responding to alerts and service disruptions.
    • Troubleshooting issues across the stack.
    • Restoring service quickly (incident management).
    • Identifying root causes and implementing permanent fixes (problem management).
    • Best Practice:
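How quickly service is restored is commonly tracked as mean time to restore (MTTR). A sketch assuming MTTR is measured from detection to restoration (some teams measure from incident start instead); the incident timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored_at - detected_at) across incidents."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((restored - detected for detected, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: (detected_at, restored_at)
log = [
    (datetime(2025, 1, 5, 10, 0), datetime(2025, 1, 5, 10, 30)),
    (datetime(2025, 1, 19, 2, 15), datetime(2025, 1, 19, 3, 45)),
]
print(mean_time_to_restore(log))
```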
  8. Change & Configuration Management:

    • Controlling and documenting changes to the cloud environment.
    • Managing configurations consistently and securely (Infrastructure as Code - IaC).
    • Minimizing risk associated with changes through testing and rollback plans.
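With Infrastructure as Code, configuration drift becomes detectable by diffing the declared (desired) state against what is actually deployed, which is what tools such as `terraform plan` report. The toy sketch below compares flat dictionaries only; the setting names and values are hypothetical.

```python
from typing import Any

MISSING = "<missing>"  # marker for a setting present on only one side


def detect_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired, actual)} for every setting that differs."""
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, MISSING)
        have = actual.get(key, MISSING)
        if want != have:
            drift[key] = (want, have)
    return drift


desired = {"instance_type": "t3.medium", "encrypted": True, "tags": {"env": "prod"}}
actual = {"instance_type": "t3.large", "encrypted": True, "tags": {"env": "prod"},
          "public_ip": True}
for key, (want, have) in detect_drift(desired, actual).items():
    print(f"{key}: desired={want!r} actual={have!r}")
```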
  9. Cost Management & Optimization:

    • Monitoring cloud resource consumption and spending.
    • Identifying and eliminating waste (idle resources, over-provisioning).
    • Right-sizing resources.
    • Utilizing reserved instances or savings plans effectively.
    • Providing cost visibility and reporting.
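Finding waste and right-sizing candidates often begins with utilization thresholds. A sketch that classifies instances from CPU metrics alone; the 5% idle and 20% downsize thresholds, the instance IDs, and the costs are hypothetical, and a real review would also weigh memory, network, and burst patterns.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    instance_id: str
    avg_cpu_pct: float   # average CPU over the lookback window
    max_cpu_pct: float   # peak CPU over the same window
    monthly_cost: float


def classify(usage: InstanceUsage, idle_peak_cpu: float = 5.0,
             downsize_avg_cpu: float = 20.0) -> str:
    """Label an instance 'idle', 'downsize', or 'ok' from CPU utilization only."""
    if usage.max_cpu_pct < idle_peak_cpu:
        return "idle"
    if usage.avg_cpu_pct < downsize_avg_cpu:
        return "downsize"
    return "ok"


fleet = [
    InstanceUsage("i-batch-01", 1.2, 3.0, 140.0),
    InstanceUsage("i-web-01", 12.0, 45.0, 280.0),
    InstanceUsage("i-api-01", 55.0, 92.0, 280.0),
]
idle_cost = sum(u.monthly_cost for u in fleet if classify(u) == "idle")
for u in fleet:
    print(u.instance_id, classify(u))
print(f"monthly spend on idle instances: {idle_cost:.2f}")
```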
  10. Customer Onboarding & Support:

    • Guiding new customers/users through setup and access.
    • Providing user documentation and training resources.
    • Operating a service desk/helpdesk for user issues and requests (ticketing system).
    • Handling billing inquiries and account management.
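A ticketing system typically attaches a first-response target to each priority and flags breaches. A sketch with hypothetical priorities and targets on a 24/7 clock (business-hours calendars are ignored).

```python
from datetime import datetime, timedelta

# Hypothetical first-response targets per ticket priority (24/7 clock)
FIRST_RESPONSE_TARGETS = {
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=1),
    "P3": timedelta(hours=8),
    "P4": timedelta(days=2),
}


def first_response_due(opened_at: datetime, priority: str) -> datetime:
    """Deadline for the first response to a ticket of the given priority."""
    return opened_at + FIRST_RESPONSE_TARGETS[priority]


def response_breached(opened_at: datetime, priority: str, responded_at: datetime) -> bool:
    """True if the first response came after the deadline.

    For a ticket still waiting, pass the current time as responded_at.
    """
    return responded_at > first_response_due(opened_at, priority)


opened = datetime(2025, 3, 3, 9, 0)
replied = datetime(2025, 3, 3, 9, 20)
print(response_breached(opened, "P1", replied))
print(response_breached(opened, "P2", replied))
```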
  11. Service Governance & Lifecycle Management:

    • Defining service catalogs and service levels (SLAs).
    • Managing the lifecycle of services (introduction, operation, retirement).
    • Continuous service improvement based on metrics and feedback.
    • Vendor management (for public cloud providers or third-party tools).
    • Best Practice:
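The service lifecycle (introduction, operation, retirement) can be enforced as a small state machine on each service catalog entry. A sketch; the entry fields, the service name, and the set of allowed transitions are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    INTRODUCTION = "introduction"
    OPERATION = "operation"
    RETIREMENT = "retirement"


# Allowed moves; going from introduction straight to retirement
# (e.g. a cancelled pilot) is an assumption of this sketch.
ALLOWED_TRANSITIONS = {
    Lifecycle.INTRODUCTION: {Lifecycle.OPERATION, Lifecycle.RETIREMENT},
    Lifecycle.OPERATION: {Lifecycle.RETIREMENT},
    Lifecycle.RETIREMENT: set(),
}


def can_transition(current: Lifecycle, target: Lifecycle) -> bool:
    return target in ALLOWED_TRANSITIONS[current]


@dataclass
class CatalogEntry:
    name: str
    sla_pct: float
    state: Lifecycle = Lifecycle.INTRODUCTION

    def move_to(self, target: Lifecycle) -> None:
        """Change lifecycle state, rejecting moves the table does not allow."""
        if not can_transition(self.state, target):
            raise ValueError(f"{self.name}: {self.state.value} -> {target.value} not allowed")
        self.state = target


entry = CatalogEntry("managed-postgres", 99.9)
entry.move_to(Lifecycle.OPERATION)
print(entry)
```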

  12. Backup, Recovery & Disaster Management:

    • Implementing and managing data backup strategies.
    • Testing restore procedures.
    • Maintaining and testing disaster recovery (DR) plans and infrastructure.
    • Executing failover and failback procedures during disasters.
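One automatable check on a backup strategy is whether each resource's latest backup is recent enough for its recovery point objective (RPO), the maximum acceptable data loss measured in time. A sketch with a single RPO for all resources; the resource names and timestamps are hypothetical. It checks backup recency only, so restore testing is still required.

```python
from datetime import datetime, timedelta, timezone


def rpo_violations(last_backup_at: dict[str, datetime], rpo: timedelta,
                   now: datetime) -> list[str]:
    """Resources whose newest backup is older than the RPO allows."""
    return sorted(name for name, taken_at in last_backup_at.items()
                  if now - taken_at > rpo)


now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
last_backup_at = {
    "orders-db": datetime(2025, 6, 1, 11, 30, tzinfo=timezone.utc),
    "file-share": datetime(2025, 5, 31, 23, 0, tzinfo=timezone.utc),
}
print(rpo_violations(last_backup_at, rpo=timedelta(hours=1), now=now))
```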

Cloud DevOps Maturity Model

AIOps