Cloud Load Balancing: Mermaid Flowchart Diagram

About

Cloud load balancing is the distribution of incoming network traffic across multiple backend targets — such as virtual machines, containers, or serverless functions — to ensure high availability, horizontal scalability, and fault tolerance.

Modern cloud providers offer multiple load balancer tiers that operate at different layers of the network stack:

Global load balancer (Layer 7): Routes traffic across regions based on latency, geo-proximity, or weighted routing policies. Terminates TLS at the edge and provides a single global anycast IP. Example: GCP Global HTTP(S) LB. AWS Global Accelerator provides a comparable anycast entry point, but it operates at Layer 4 and does not terminate TLS itself.

Application Load Balancer / ALB (Layer 7): Operates within a region, routing HTTP/HTTPS requests based on URL path, hostname, headers, or query parameters. Integrates directly with auto scaling groups, ECS tasks, Lambda functions, and Kubernetes ingress. Supports sticky sessions via cookies.

Network Load Balancer / NLB (Layer 4): Handles TCP/UDP at ultra-low latency with millions of requests per second. Used for non-HTTP workloads, gaming servers, or when preserving source IP is required.
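The two routing decisions described above (region selection at the global tier, then path-based selection at the regional Layer 7 tier) can be sketched in a few lines of Python. This is a minimal illustration, not any provider's implementation: the `Region` type, the latency figures, the rule list, and the "first matching rule wins" behavior are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    latency_ms: float  # hypothetical measured client-to-region latency
    healthy: bool = True


def pick_region(regions: list[Region]) -> Region:
    """Global tier: choose the healthy region with the lowest latency."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


# Regional Layer 7 tier: ordered (path prefix, target group) rules.
# The prefixes and group names are illustrative assumptions.
PATH_RULES: list[tuple[str, str]] = [
    ("/api/", "api"),
    ("/static/", "static"),
    ("/ws/", "websocket"),
]


def pick_target_group(path: str, default: str = "default") -> str:
    """Return the target group of the first rule whose prefix matches."""
    for prefix, group in PATH_RULES:
        if path.startswith(prefix):
            return group
    return default


def route(path: str, regions: list[Region]) -> tuple[str, str]:
    """Combine both tiers: (region name, target group) for one request."""
    return pick_region(regions).name, pick_target_group(path)
```

Calling `route("/api/users", regions)` with a list of `Region` values returns the name of the lowest-latency healthy region together with the `api` target group; marking that region unhealthy shifts the same request to the next-best region. Weighted and geo-proximity policies would replace the `min` selection with a different scoring function.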

Health checks are fundamental to all load balancers. Each backend target is probed at a configured interval (e.g., every 10 seconds). A target that fails a configured number of consecutive checks (e.g., 2 in a row) is removed from rotation; once it recovers, it is re-added only after passing a configured number of consecutive checks.
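The removal and re-admission behavior can be modeled as a small counter-based state machine. The sketch below is an assumption-laden illustration rather than a provider API: the class name `TargetHealth`, its threshold defaults, and the choice to reset the counter on any result that agrees with the current state are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Tracks one backend target's health from a stream of check results."""

    unhealthy_threshold: int = 2  # consecutive failures before removal (assumed)
    healthy_threshold: int = 2    # consecutive passes before re-adding (assumed)
    healthy: bool = True
    # Consecutive results that contradict the current state.
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, check_passed: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if check_passed == self.healthy:
            # Result agrees with the current state: the streak is broken.
            self._streak = 0
            return self.healthy

        self._streak += 1
        limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= limit:
            # Enough consecutive contradicting results: flip the state.
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy
```

A load balancer would keep one such tracker per target and call `record` after each probe, sending traffic only to targets whose `healthy` flag is set. Requiring several consecutive results in both directions keeps a single dropped probe from removing a target and a single lucky probe from re-adding a failing one.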

See Auto Scaling Workflow for how instances join and leave the load balancer's target group, Kubernetes Ingress Routing for in-cluster traffic distribution, and Cloud Monitoring Pipeline for observing load balancer metrics.

Frequently asked questions

What is cloud load balancing?

Cloud load balancing is the distribution of incoming network traffic across multiple backend targets (virtual machines, containers, or serverless functions) to ensure high availability, horizontal scalability, and fault tolerance. It removes single points of failure and routes around unhealthy backends automatically.

What is the difference between an ALB and an NLB?

An Application Load Balancer (ALB) operates at Layer 7 and routes based on HTTP attributes such as URL path, hostname, and headers, which makes it a good fit for web applications. A Network Load Balancer (NLB) operates at Layer 4, handling raw TCP/UDP at ultra-low latency for non-HTTP workloads, gaming servers, or cases where preserving the source IP is required.

When should you use a global load balancer?

Use a global load balancer when you serve users across multiple geographic regions and want traffic directed automatically to the lowest-latency or healthiest region. Global load balancers also provide a single anycast IP and terminate TLS at the edge, which shortens connection setup for internationally distributed user bases.

```mermaid
flowchart TD
    Client([Client Request]) --> GLB[Global Load Balancer\nAnycast IP, geo-routing]
    GLB -->|Region: us-east| ALB1[Application Load Balancer\nus-east-1]
    GLB -->|Region: eu-west| ALB2[Application Load Balancer\neu-west-1]
    ALB1 --> PathRouter{URL Path\nRouting Rules}
    PathRouter -->|/api/*| TG_API[Target Group: API\n3 instances]
    PathRouter -->|/static/*| TG_Static[Target Group: Static\n2 instances]
    PathRouter -->|/ws/*| TG_WS[Target Group: WebSocket\n2 instances]
    TG_API --> HC1{Health Check\nPassing?}
    HC1 -->|Healthy| Inst1[Instance A]
    HC1 -->|Healthy| Inst2[Instance B]
    HC1 -->|Unhealthy| Remove1[Removed from rotation]
    TG_Static --> Inst3[Instance C]
    TG_Static --> Inst4[Instance D]
    TG_WS --> NLB[Network Load Balancer\nLayer 4, TCP]
    NLB --> WSInst[WebSocket Instance]
```