Karpenter vs Cluster Autoscaler: The Complete Decision Matrix

Both Karpenter and Cluster Autoscaler scale EKS nodes automatically, but they use fundamentally different approaches. Cluster Autoscaler adjusts existing node groups based on pending pods, while Karpenter provisions right-sized nodes on-demand from a fleet of instance types. Choosing the wrong autoscaler can cost you thousands in wasted capacity or cause availability issues during scale events.

Key Takeaways

  • Cluster Autoscaler scales pre-configured node groups; Karpenter provisions diverse instance types dynamically
  • Karpenter typically scales faster (30-90 seconds vs 2-5 minutes) and bin-packs pods more efficiently
  • Cluster Autoscaler offers mature, predictable behavior; Karpenter provides aggressive consolidation and Spot optimization
  • Migration from Cluster Autoscaler to Karpenter requires careful planning to avoid disruption
  • Most teams benefit from Karpenter’s cost savings, but Cluster Autoscaler remains valid for stability-first environments

How Each Autoscaler Works

Cluster Autoscaler

Cluster Autoscaler monitors for pending pods that cannot be scheduled due to insufficient resources. When it detects unschedulable pods, it increases the desired capacity of matching Auto Scaling Groups. When nodes become underutilized (typically below 50% for 10+ minutes), it cordons, drains, and terminates them.

The workflow:

  1. Pod enters Pending state (no node has capacity)
  2. Cluster Autoscaler checks which node group could satisfy the pod’s requirements
  3. Increases desired capacity on the matching Auto Scaling Group
  4. AWS launches a new EC2 instance (2-4 minutes)
  5. Node joins cluster and pod is scheduled

Key constraint: You must pre-define node groups with specific instance types. If your node groups only contain m5.large and a pod requests 16GB of RAM, no group can satisfy it: Cluster Autoscaler's scheduling simulation finds no viable node group, scales nothing, and the pod stays Pending.
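
For illustration, a pod like this stays Pending in a cluster whose only node group is m5.large (the image and resource values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: memory-hungry
spec:
  containers:
    - name: app
      image: nginx              # placeholder image
      resources:
        requests:
          memory: 16Gi          # more than an m5.large's 8GB can ever satisfy
          cpu: "1"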

Karpenter

Karpenter watches for unschedulable pods and directly provisions EC2 instances without Auto Scaling Groups. It evaluates pod requirements (CPU, memory, architecture, zones) and selects the best-fit instance type from a configurable fleet.

The workflow:

  1. Pod enters Pending state
  2. Karpenter calculates exact resource needs
  3. Selects optimal instance type from Provisioner configuration (can choose from dozens of types)
  4. Calls the EC2 Fleet API directly (30-90 seconds)
  5. Node joins and pod schedules immediately

Key advantage: Karpenter can provision an m5.xlarge for one pod and an r6i.2xlarge for another—whatever fits best. It’s not limited to pre-configured groups.
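
After a scale-up, well-known node labels show exactly what Karpenter picked:

# Instance type, capacity type, and zone for each node
kubectl get nodes \
  -L node.kubernetes.io/instance-type \
  -L karpenter.sh/capacity-type \
  -L topology.kubernetes.io/zone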

Feature Comparison Matrix

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scaling speed | 2-5 minutes (ASG launch time) | 30-90 seconds (direct EC2 API) |
| Instance selection | Pre-configured node groups only | Dynamic from configured fleet |
| Spot handling | Separate Spot node groups required | Mixed Spot/On-Demand in single Provisioner |
| Consolidation | Limited (terminates idle nodes) | Aggressive bin-packing and node replacement |
| Multi-AZ support | Requires per-AZ ASGs for PV locality | Built-in topology awareness |
| Configuration complexity | Medium (ASG + flags) | Medium-high (Provisioner CRDs) |
| Maturity | Stable (7+ years) | Rapidly evolving (2021+, GA 2023) |
| Community support | Large, established | Growing, AWS-backed |
| Interruption handling | Requires separate termination handler | Built-in Spot interruption handling |
| Scale-down safety | Configurable thresholds + PDB respect | Consolidation can be aggressive; requires tuning |

When to Choose Cluster Autoscaler

Best for:

  • Predictable workload patterns where you can define 2-3 node group types that cover all use cases
  • Stability over optimization — you prefer well-tested behavior and don’t want cutting-edge features
  • Simpler operational model — your team is already familiar with Auto Scaling Groups and prefers that abstraction
  • Regulated environments where change control favors mature, widely-adopted tools
  • Low Spot usage — if you run mostly On-Demand or Reserved capacity, Cluster Autoscaler’s simpler Spot handling is sufficient

Example configuration for Cluster Autoscaler:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/cluster-autoscaler

# Deploy via Helm with key flags
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-system-pods=false \
  --set extraArgs.scale-down-delay-after-add=10m

Real-world scenario: A financial services company runs stateful workloads with strict compliance requirements. They use three node groups (general-purpose m5.large, memory-optimized r5.xlarge, compute-optimized c5.2xlarge) and scale predictably during business hours. Cluster Autoscaler provides the stability and auditability they need without introducing rapid node churn.

When to Choose Karpenter

Best for:

  • Cost optimization priority — you want maximum Spot usage and efficient bin-packing
  • Diverse workloads — batch jobs, APIs, ML training, analytics all running in one cluster with different resource profiles
  • Fast scaling requirements — sub-minute provisioning matters for your SLAs
  • Spot-heavy strategies — you want to maximize Spot coverage with automatic fallback to On-Demand
  • Dynamic instance type selection — you don’t want to maintain separate node groups for every workload type

Example Karpenter Provisioner configuration:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
    - key: karpenter.k8s.aws/instance-generation
      operator: Gt
      values: ["4"]
  limits:
    resources:
      cpu: 1000
      memory: 1000Gi
  providerRef:
    name: default
  # In the v1alpha5 API, consolidation is mutually exclusive with
  # ttlSecondsAfterEmpty; enable one or the other, not both
  consolidation:
    enabled: true
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: my-cluster
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster
  instanceProfile: KarpenterNodeInstanceProfile

This Provisioner allows Karpenter to choose from c5, c6i, m5, m6i, r5, r6i families (generation 5+) across Spot and On-Demand, automatically selecting the best price-performance option.

Real-world scenario: A SaaS company runs microservices with highly variable traffic. Some services need 2GB RAM, others need 32GB. They use Karpenter to provision exactly the right instance types on-demand, achieving 70% Spot coverage with automatic On-Demand fallback. Karpenter’s consolidation feature replaces three underutilized m5.xlarge nodes with one m5.2xlarge, saving 30% on compute costs.

Cost Impact Comparison

Cluster Autoscaler cost characteristics:

  • Predictable spending patterns (fixed instance types)
  • Potential over-provisioning due to instance type constraints
  • Scale-down conservatism reduces waste but may leave idle nodes longer
  • Typical savings: 10-30% versus no autoscaling

Karpenter cost characteristics:

  • Dynamic right-sizing reduces over-provisioning waste
  • Aggressive consolidation increases utilization (often 60-80% vs 40-50%)
  • Better Spot diversification improves interruption resilience and savings
  • Typical savings: 30-50% versus no autoscaling; 15-25% versus Cluster Autoscaler

Example calculation: A cluster spending $10,000/month on EC2:

  • No autoscaling: $10,000/month baseline
  • Cluster Autoscaler (20% reduction): $8,000/month
  • Karpenter (40% reduction): $6,000/month

Karpenter’s additional savings come from better bin-packing, more aggressive Spot usage, and consolidation replacing multiple small nodes with fewer large ones.

Spot Instance Handling Comparison

Cluster Autoscaler Approach

Create separate Spot and On-Demand node groups, and use node affinity to steer tolerant workloads onto Spot. This approach requires the aws-node-termination-handler DaemonSet for graceful Spot interruption handling.

eksctl create nodegroup \
  --cluster=my-cluster \
  --name=spot-workers \
  --spot \
  --instance-types=m5.large,m5a.large,m5n.large \
  --nodes-min=0 \
  --nodes-max=20 \
  --node-labels="workload-type=batch"

# Must also install the termination handler separately
kubectl apply -f https://github.com/aws/aws-node-termination-handler/releases/download/v1.19.0/all-resources.yaml

Karpenter Approach

Single Provisioner handles both Spot and On-Demand. Karpenter automatically diversifies across instance types and handles interruptions without separate tooling.

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-optimized
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["m5", "m5a", "m5n", "m6i", "m6a"]
  # Karpenter handles interruptions automatically;
  # no separate termination handler needed

Interruption handling: Karpenter monitors EC2 Spot interruption notices and EventBridge events, automatically cordoning nodes and triggering replacement. Cluster Autoscaler requires aws-node-termination-handler as a separate component.
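
Wiring up native interruption handling means pointing Karpenter at an SQS queue that EventBridge feeds. On chart versions contemporary with the v1alpha5 API this was a Helm value; the value path may differ on newer charts, so treat this as a hedged sketch:

helm upgrade karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --reuse-values \
  --set settings.aws.interruptionQueueName=my-cluster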

Consolidation Deep Dive

Consolidation is where Karpenter truly differentiates itself. It actively replaces underutilized nodes to improve packing efficiency.

How Karpenter Consolidation Works

  1. Karpenter identifies nodes with low utilization
  2. Simulates whether pods could fit on fewer, larger instances
  3. If consolidation saves money, cordons the target nodes
  4. Provisions replacement node(s)
  5. Drains old nodes once replacements are ready

Example scenario:

  • Three m5.large nodes (2 vCPU, 8GB each) running at 40% utilization, i.e. roughly 2.4 vCPU and 9.6GB actually in use
  • Karpenter calculates that all pods fit on one m5.xlarge (4 vCPU, 16GB)
  • Provisions the m5.xlarge, migrates the pods, terminates the three m5.large
  • Cost reduction: 3 × $0.096/hour = $0.288/hour → $0.192/hour, a 33% saving plus better utilization

Consolidation Risks and Safeguards

Risk: Aggressive consolidation can cause pod churn and temporary unavailability.

Safeguards:

  • Set the karpenter.sh/do-not-evict annotation on critical pods
  • Use PodDisruptionBudgets to control the eviction rate (sketch after the code below)
  • Use ttlSecondsAfterEmpty (only valid when consolidation is disabled) to delay removal of empty nodes
  • Monitor consolidation events and roll back if issues arise
# Prevent specific pods from being consolidated
apiVersion: v1
kind: Pod
metadata:
  name: critical-database
  annotations:
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: db
      image: postgres:15           # illustrative image
---
# Control node lifecycle; in v1alpha5, ttlSecondsAfterEmpty is mutually
# exclusive with consolidation, so only node expiry is set alongside it
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  ttlSecondsUntilExpired: 604800   # Replace nodes weekly for security patches
  consolidation:
    enabled: true
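
Complementing the annotation above, a PodDisruptionBudget caps concurrent evictions during consolidation. A minimal sketch (the app label is illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2                  # never evict below two ready replicas
  selector:
    matchLabels:
      app: api                     # hypothetical label; match your workload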

Cluster Autoscaler alternative: No built-in consolidation. Scale-down happens when nodes fall below utilization threshold (default 50%) for 10+ minutes, but it doesn’t actively re-pack pods.
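
Those thresholds are tunable through the same Helm values used earlier; the flag names below come from upstream Cluster Autoscaler:

helm upgrade cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --reuse-values \
  --set extraArgs.scale-down-utilization-threshold=0.5 \
  --set extraArgs.scale-down-unneeded-time=10m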

Migration Playbook: Cluster Autoscaler to Karpenter

Migrating requires careful planning to avoid disrupting production workloads.

Phase 1: Preparation (Week 1)

  1. Install Karpenter in non-production cluster and test Provisioner configurations
  2. Document current node groups — instance types, labels, taints, AZ distribution (see the commands after this list)
  3. Identify workload dependencies — which pods require specific node types or zones
  4. Set up monitoring — create dashboards for node count, pod scheduling latency, costs
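
For step 2, a few commands capture the current state (the cluster name is illustrative; jq is assumed to be installed):

# Instance types, sizes, and AZs per node group
eksctl get nodegroup --cluster my-cluster

# Labels and zones on live nodes
kubectl get nodes -L node.kubernetes.io/instance-type -L topology.kubernetes.io/zone

# Taints per node
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, taints: .spec.taints}'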

Phase 2: Parallel Operation (Week 2-3)

  1. Deploy Karpenter to production but don’t create Provisioners yet
  2. Create initial Provisioner that matches your existing node group characteristics
  3. Label a subset of workloads (e.g., dev namespace) to prefer Karpenter nodes
  4. Run both autoscalers — Cluster Autoscaler manages existing groups, Karpenter handles new workloads
# Gradually migrate workloads with a node selector
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-app
  template:
    metadata:
      labels:
        app: test-app
    spec:
      nodeSelector:
        karpenter.sh/provisioner-name: default   # pin to Karpenter-provisioned nodes
      containers:
        - name: app
          image: nginx                           # placeholder image

Phase 3: Full Migration (Week 4)

  1. Expand Karpenter Provisioner capacity limits to handle full cluster load
  2. Cordon Cluster Autoscaler managed nodes to prevent new pod scheduling
  3. Drain nodes gradually, one AZ at a time to maintain availability (see the sketch after this list)
  4. Monitor pod rescheduling — ensure Karpenter provisions nodes successfully
  5. Delete old Auto Scaling Groups once all workloads have migrated
  6. Uninstall Cluster Autoscaler
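
A sketch of steps 2 and 3, assuming your existing nodes carry the standard EKS managed node group label; the group and node names are illustrative:

# Cordon every node in a managed node group (kubectl cordon accepts label selectors)
kubectl cordon -l eks.amazonaws.com/nodegroup=general-purpose

# Drain one node at a time, respecting PodDisruptionBudgets
kubectl drain ip-10-0-1-23.ec2.internal \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=120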

Phase 4: Optimization (Week 5+)

  1. Enable consolidation in Provisioner (start conservatively)
  2. Expand instance type diversity to maximize Spot options
  3. Tune ttlSecondsAfterEmpty based on workload patterns
  4. Implement do-not-evict annotations for critical workloads
  5. Measure cost savings and adjust Provisioner requirements

Rollback Plan

If issues arise during migration:

  1. Increase Auto Scaling Group desired capacity back to original levels (command below)
  2. Remove node selectors that prefer Karpenter
  3. Delete Karpenter Provisioners to stop new node creation
  4. Re-enable Cluster Autoscaler
  5. Drain Karpenter nodes and migrate pods back
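
For step 1, the AWS CLI call looks like this (the group name and capacity are illustrative):

aws autoscaling set-desired-capacity \
  --auto-scaling-group-name eks-general-purpose \
  --desired-capacity 6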

Decision Tree

Start here: What is your primary optimization goal?

  • Stability and predictability → Choose Cluster Autoscaler
  • Maximum cost savings → Choose Karpenter

Do you run diverse workloads with different resource profiles?

  • No, 2-3 node types cover everything → Cluster Autoscaler is sufficient
  • Yes, wide variation in CPU/memory needs → Karpenter provides better bin-packing

What is your Spot usage target?

  • Low (< 30%) → Either works; Cluster Autoscaler is simpler
  • High (> 50%) → Karpenter’s diversification and interruption handling shine

How important is sub-minute scaling?

  • Not critical → Cluster Autoscaler’s 2-5 minute scale is acceptable
  • Essential for SLAs → Karpenter’s 30-90 second provisioning helps

Do you have stateful workloads with AZ-bound PersistentVolumes?

  • Yes, many StatefulSets → Both work, but Karpenter’s topology awareness is easier to configure than per-AZ ASGs
  • No, mostly stateless → Either works

What is your team’s operational maturity with Kubernetes?

  • Early in Kubernetes journey → Start with Cluster Autoscaler, migrate to Karpenter later
  • Experienced, comfortable with CRDs and rapid iteration → Karpenter is a good fit

Common Pitfalls

Cluster Autoscaler pitfalls:

  • Forgetting --balance-similar-node-groups: Scale-outs concentrate in one AZ, causing imbalance
  • Too few instance types in Spot groups: High interruption correlation when single type is reclaimed
  • Missing per-AZ ASGs for StatefulSets: Pods stuck Pending after cross-AZ interruptions
  • Insufficient IAM permissions: Autoscaler can’t modify ASGs, so scaling fails silently (a minimal policy sketch follows)
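
A minimal IAM policy covering the actions Cluster Autoscaler needs, drawn from the upstream documentation; scope Resource more tightly in production:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*"
    }
  ]
}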

Karpenter pitfalls:

  • Overly aggressive consolidation: Pod churn impacts availability; start with consolidation disabled, enable gradually
  • No do-not-evict annotations on critical pods: Databases get evicted during consolidation
  • Insufficient Provisioner capacity limits: Karpenter provisions unlimited nodes during runaway scaling
  • Missing PodDisruptionBudgets: Consolidation violates availability requirements
  • Wrong instance families in requirements: Over-provisioning if you include only large instance types

Hybrid Approach: Running Both

Some teams run both autoscalers temporarily during migration or permanently for different workload classes:

  • Cluster Autoscaler manages stable, production node groups
  • Karpenter handles batch jobs, dev/test, and experimental workloads

Use node selectors and taints to separate workloads clearly. This reduces risk while gaining Karpenter’s benefits for appropriate workloads.

# Production pods stay on Cluster Autoscaler nodes
nodeSelector:
  node-group: production

# Batch jobs use Karpenter
nodeSelector:
  karpenter.sh/provisioner-name: batch
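
Node selectors alone only steer pods that opt in; a taint on the Karpenter Provisioner makes the separation enforceable. A sketch, with an illustrative taint key:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: batch
spec:
  taints:
    - key: workload-type          # illustrative taint key
      value: batch
      effect: NoSchedule
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]

# Batch pods must tolerate the taint (pod spec fragment)
tolerations:
  - key: workload-type
    value: batch
    effect: NoSchedule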

Measuring Success

Track these metrics after deploying either autoscaler:

  • Node utilization: Target 60-80% average CPU/memory (Karpenter typically achieves higher)
  • Pod scheduling latency: Time from Pending to Running (Karpenter faster)
  • Scale-up time: Time to provision new capacity when needed
  • Spot instance percentage: Karpenter usually achieves higher safe Spot coverage
  • Cost per pod: Use Kubecost to measure before/after
  • Node churn rate: Consolidation increases churn; monitor for acceptable levels
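
A quick spot-check of the first two metrics, assuming metrics-server is installed and the pod name is illustrative:

# Average CPU/memory utilization per node
kubectl top nodes

# When was the pod actually scheduled? Compare with .metadata.creationTimestamp
kubectl get pod my-pod -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].lastTransitionTime}'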

Expected outcomes:

  • Cluster Autoscaler: 10-30% cost reduction, predictable behavior, 2-5 minute scale-up
  • Karpenter: 30-50% cost reduction, 30-90 second scale-up, requires tuning for stability

Conclusion

Cluster Autoscaler and Karpenter solve the same problem with different philosophies. Cluster Autoscaler prioritizes stability through pre-configured node groups and conservative scaling, making it ideal for teams wanting predictable, well-tested behavior. Karpenter optimizes for cost and efficiency through dynamic provisioning, aggressive consolidation, and superior Spot handling—delivering 15-25% additional savings but requiring more operational maturity. Most teams benefit from starting with Cluster Autoscaler and migrating to Karpenter once they’re comfortable with autoscaling fundamentals. Use the decision tree to evaluate your priorities: choose Cluster Autoscaler if stability matters most, choose Karpenter if you want maximum optimization and can handle its complexity. Both are valid choices—the wrong decision is running neither and leaving nodes over-provisioned.