Alibaba Cloud details storage tech that's doubled its VMs per host

Using one disk as a write cache eases stresses created by manycore CPUs


Exclusive Alibaba Cloud has detailed the tech it developed to run local storage in its servers and bust bottlenecks created by new-generation manycore processors.

The tech was detailed last week in a paper titled "CSAL: the Next-Gen Local Disks for the Cloud" published in the April edition of Proceedings of the Nineteenth European Conference on Computer Systems. Eleven authors work at Alibaba Cloud, and another six work at Solidigm – Intel's old SSD business now mostly owned by SK hynix.

The paper sets the scene by reminding readers that cloud servers typically use local storage, and that local capacity determines how many VMs each cloudy host can handle. It then notes that modern manycore CPUs encourage clouds and users to run more VMs on each host.

The obvious way to run more VMs, the paper notes, is to pack cloud servers full of colossal hard disks, storage-class memory, or fast solid state disks. But hard disks have bandwidth limits, storage-class memory mostly failed (the paper mentions the Optane tech Intel snuffed), and fast SSDs have capacity problems and big price tags.

What's a cloud to do? Quad-level cell (QLC) SSDs are an obvious answer, the paper suggests, because they offer high capacity and decent prices.

Alibaba Cloud therefore tried QLC disks in three scenarios: as a drop-in replacement for other disks, as part of a layered system alongside high-speed SSDs, and with dm-zoned, a kernel device mapper target.

The paper explains that QLC failed as a drop-in replacement because of "the two levels of write amplification caused by device-level address mapping with Indirection Unit and NAND-level garbage collection."
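The first of those two levels is easy to put numbers on. The sketch below is a rough illustration of how an indirection unit (IU) larger than the host's write size inflates the work a drive must do. The 16 KiB IU and the write sizes are illustrative assumptions rather than figures from the paper, the function name is hypothetical, and writes are assumed to be aligned to IU boundaries.

```python
import math


def iu_write_amplification(write_bytes: int, iu_bytes: int) -> float:
    """Bytes the drive rewrites per byte the host wrote, for an IU-aligned write."""
    # The drive maps addresses in whole indirection units, so a partial
    # update forces it to rewrite every unit the write touches.
    units_touched = math.ceil(write_bytes / iu_bytes)
    return units_touched * iu_bytes / write_bytes


# Assumed sizes: a 4 KiB write against a 16 KiB IU, then a 64 KiB write.
print(iu_write_amplification(4096, 16384))   # 4.0
print(iu_write_amplification(65536, 16384))  # 1.0
```

Small random writes pay the penalty and large aligned ones do not. NAND-level garbage collection then multiplies whatever this first level produces.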

A layered system that used a write-back cache to handle small writes in one SSD helped, but didn't match hard disk performance.

dm-zoned didn't help either, because under load it constantly needed to move data – which smashed performance.

Alibaba therefore devised the Cloud Storage Acceleration Layer (CSAL). As the paper explains it, CSAL keeps the most recently used data in DRAM and swaps it to a fast SSD, which also handles all incoming writes. When possible and sensible, data from that SSD is shunted into the QLC disk.
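For readers who think in code, the general idea of a fast disk absorbing writes ahead of a big, slow one can be sketched as a toy model. This is not CSAL's actual algorithm: the class name, the block-count capacity, and the flush-when-full policy are all assumptions made for illustration, and the DRAM layer is left out entirely.

```python
class TieredDisk:
    """Toy model: a small fast tier takes every write, a large tier holds the rest."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks  # assumed size of the fast SSD, in blocks
        self.cache = {}      # block address -> data held on the fast SSD
        self.capacity = {}   # block address -> data held on the QLC SSD

    def write(self, lba: int, data: bytes) -> None:
        # Every incoming write lands on the fast tier first.
        self.cache[lba] = data
        if len(self.cache) > self.cache_blocks:
            self.flush()

    def flush(self) -> None:
        # Assumed policy: drain the whole cache in address order, so the
        # capacity tier sees one sorted batch instead of scattered small writes.
        for lba in sorted(self.cache):
            self.capacity[lba] = self.cache[lba]
        self.cache.clear()

    def read(self, lba: int):
        # The newest copy wins: check the fast tier before the capacity tier.
        if lba in self.cache:
            return self.cache[lba]
        return self.capacity.get(lba)


disk = TieredDisk(cache_blocks=2)
disk.write(7, b"a")
disk.write(3, b"b")
disk.write(7, b"c")   # overwrite is absorbed by the fast tier
disk.write(1, b"d")   # third distinct block triggers a flush
print(list(disk.capacity))  # [1, 3, 7]
print(disk.read(7))         # b'c'
```

The point of the toy is that the overwrite of block 7 never reaches the capacity tier as a separate small write. Only the final version does, as part of a sorted batch.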

The paper explains CSAL's workings in sufficient detail that even our storage-centric sibling site Blocks and Files might find its attention wavering.

The impact of CSAL on Alibaba Cloud ops is easier to understand and is outlined as follows:

Compared to last-gen HDD-based local disks (24× 2TB HDDs with a 48-core Xeon Cascade CPU), CSAL-ready servers (an 800GB HP-SSD and a 15.36TB QLC SSD with a 64-core Xeon Ice Lake CPU) can host twice more instances while achieving the same Service Level Objects.

That's a doubling of VM density from second-gen to third-gen Xeons, despite the extra 16 cores in the newer processor stressing storage more than the older silicon. Also, 64 is not double 48.

CSAL is in production across "thousands of Elastic Compute Service (ECS) nodes in Alibaba Cloud." Maybe you could run it too: Alibaba Cloud has open-sourced CSAL into the Storage Performance Development Kit.

Alibaba Cloud has racked up a few wins lately. It has cut prices, and its homebrew Yitian 710 was recently rated the fastest Arm CPU in the cloud. We've also covered an in-house networking tool that slashed the number of personnel the Chinese concern needed to dedicate to troubleshooting, and research suggesting Alibaba Cloud's operations could be more efficient than Google's.

Which is great news for Chinese cloud users, who have no qualms about working with Alibaba Cloud. For the rest of us, the decision to consider Alibaba Cloud is doubtless more complicated. ®
