Companies flush money down the drain with overfed Kubernetes cloud clusters

Just 13% of provisioned CPUs, 20% of memory utilized, study finds


Cloud optimization biz CAST AI says that companies are still overprovisioning resources and paying too much as a consequence. It claims that in Kubernetes clusters of 50 or more CPUs, only 13 percent of provisioned CPUs and 20 percent of memory are typically utilized.

CAST develops a platform that monitors the use of Kubernetes resources and compares it with what the platform calculates the workload actually requires. The figures in its latest study are based on an analysis of more than 4,000 clusters operated by customers prior to optimization.
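To put the headline figures in concrete terms, here is a minimal sketch of the underlying arithmetic. It is not CAST's method, which the study summary does not describe; the function names are hypothetical, and the 50-CPU cluster is just the smallest size covered by the figures above.

```python
def utilization(used: float, provisioned: float) -> float:
    """Fraction of provisioned capacity that is actually in use."""
    if provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    return used / provisioned


def idle_capacity(provisioned: float, utilization_rate: float) -> float:
    """Capacity that is paid for but sits unused."""
    return provisioned * (1 - utilization_rate)


# Illustrative only: a 50-CPU cluster at the reported 13 percent CPU utilization.
idle_cpus = idle_capacity(50, 0.13)
print(f"Idle CPUs in a 50-CPU cluster: {idle_cpus:.1f}")
```

On those assumptions, roughly 43 of the 50 provisioned CPUs would be doing nothing.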

This overprovisioning is largely down to an excess of caution among users, with DevOps teams anxious to avoid running out of memory, and to the difficulty of knowing at the outset exactly how many resources will be needed. Users can also be confused by the sheer choice available, with AWS offering 600 different EC2 instance types, the study says.

CAST reckons this is evident in the percentage of unutilized memory, which is virtually identical across the three major cloud platforms (AWS, Azure, and Google Cloud), suggesting that it is not due to the peculiarities of a particular cloud.

CPU utilization is likewise little different, with clusters on AWS and Azure at an average 11 percent utilization, while those on Google Cloud tended to be higher, at 17 percent.

Google Kubernetes Engine (GKE) users have access to custom instances, which allow a more precise CPU/memory ratio than the other two clouds, CAST's study states.

In large deployments - mega clusters of 30,000 CPUs or more - utilization tended to be higher at 44 percent, which CAST attributes to these being run by larger teams that can pay more attention to management.

As well as overprovisioning, CAST points to a reluctance to use spot instances as another driver of overspending. Many users are hesitant to adopt this type of virtual machine, which is essentially spare capacity made available at a lower cost than the standard on-demand price for the same instance.

The average on-demand cost per CPU is $6.70 per hour, according to CAST, while for spot instances it is $1.80 per hour.
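As a rough illustration of what that gap means, the sketch below works out the relative saving from the two per-CPU prices quoted above. The function name is hypothetical, and the calculation ignores interruptions and any other costs.

```python
# Per-CPU hourly prices as quoted from CAST above.
ON_DEMAND_PER_CPU_HOUR = 6.70
SPOT_PER_CPU_HOUR = 1.80


def spot_discount(on_demand: float, spot: float) -> float:
    """Fraction saved by paying the spot price instead of the on-demand price."""
    if on_demand <= 0:
        raise ValueError("on-demand price must be positive")
    return (on_demand - spot) / on_demand


discount = spot_discount(ON_DEMAND_PER_CPU_HOUR, SPOT_PER_CPU_HOUR)
print(f"Spot discount: {discount:.0%}")
```

On the quoted figures, spot capacity comes out at a little under three quarters cheaper than on-demand.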

The reluctance stems from the fact that the cloud provider can reclaim a spot instance at any time, with minimal warning. However, CAST says its platform can move a customer's workload to another instance automatically.

CAST provides a free analysis that shows organizations by how much their cloud resources are overprovisioned, while subscribers can have the platform take action to optimize things.

"This year's report makes it clear that companies running applications on Kubernetes are still in the early stages of their optimization journeys, and they're grappling with the complexity of manually managing cloud-native infrastructure," CAST AI co-founder and chief product officer Laurent Gil said in a statement. ®
