
Superclusters too big, but single servers too small? Oracle offers AI Goldilocks zone

Adds L40 bare metal option to the O-Cloud, plus A100 and H100 VMs. And teases a GH200 beast


Oracle has created a pair of for-rent AI infrastructure options aimed at medium-scale AI training and inference workloads – and teased the arrival of Nvidia's GH200 superchip in its cloud.

On Wednesday, Big Red's product marketing director Akshai Parthasarathy and principal product manager Sagar Zanwar detailed the two new "shapes" – Oracle-speak for cloud instance types – for mid-range AI workloads.

One bears the snappy moniker of BM.GPU.L40S.4. The BM stands for bare metal, and in this shape the boxes come equipped with four Nvidia L40S GPUs – each with 48GB of GDDR6 memory – plus 7.38TB of local NVMe capacity, 112 cores of 4th Generation Intel Xeon CPU, and a terabyte of system memory.

The BM.GPU.L40S.4 shape is "orderable now."

If you prefer virtual machines, Oracle has defined two more shapes, but isn't ready to rent them just yet – instead billing them as coming "soon."

The VM.GPU.A100.1 and VM.GPU.H100.1 shapes each pack a single Nvidia A100 or H100 accelerator, respectively. The H100 shape will include up to 80GB of HBM3 memory, two 3.84TB NVMe drives, 13 cores from 4th Gen Intel Xeon processors, and 246GB of system memory.

The A100 offering will pack either 40GB or 80GB of HBM2e memory.

Parthasarathy and Zanwar pitched the offerings as suitable for users who feel Oracle's AI superclusters are too big, but single-node offerings packing one to four GPUs are too small.

FLOP for FLOP, the L40S looks to outperform Nvidia's older, Ampere-based A100, which is also offered in a higher-end Oracle cluster. The L40S boasts 183 TFLOPs in TF32 to the A100's 156 TFLOPs, but the L40S has a major disadvantage in its relatively paltry 864GB/sec memory bandwidth compared to the A100's 1,555GB/sec for the 40GB variant and 2,039GB/sec for the 80GB version.

Memory bandwidth is crucial for AI inferencing, especially when it comes to token-per-second performance – which is presumably why Oracle considers the A100 more powerful than the L40S.
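That relationship can be sketched with back-of-the-envelope arithmetic: for a memory-bandwidth-bound decoder, every generated token requires streaming the full set of model weights from GPU memory, so tokens per second is capped at bandwidth divided by model size. The bandwidth figures below are the ones quoted above; the 14GB model size is an illustrative assumption (roughly a 7-billion-parameter model at 16-bit precision), not anything Oracle has published.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on single-stream decode throughput.

    Ignores compute time, KV-cache traffic, and batching, so real-world
    numbers will be lower -- this only ranks the cards.
    """
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 14.0  # illustrative: ~7B parameters at FP16 (2 bytes/parameter)

for name, bw in [("L40S", 864), ("A100 40GB", 1555), ("A100 80GB", 2039)]:
    print(f"{name}: ~{max_tokens_per_sec(bw, MODEL_GB):.0f} tokens/sec ceiling")
```

By this crude measure the A100 80GB has more than twice the decode headroom of the L40S, which is consistent with Oracle's ranking.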

Given the 48GB memory buffer per GPU, L40S superclusters will probably be best suited to large language models of up to 14 billion parameters – allowing 2GB per billion parameters, with 20GB reserved for the context window and batching.

Technically, the combined memory size of multiple L40S GPUs would permit larger models, but since the L40S lacks NVLink and instead uses slower PCIe 4.0, the performance is likely to be less than optimal. Quantizing could, however, increase the number of parameters for L40S clusters without running into memory constraints.
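The sizing rule above can be written out explicitly. This is a minimal sketch of the article's arithmetic, assuming 2 bytes per parameter (FP16) and the 20GB overhead figure it cites; the 8-bit case illustrates why quantization roughly doubles the parameter budget.

```python
def max_params_billions(vram_gb: float,
                        bytes_per_param: float = 2.0,
                        overhead_gb: float = 20.0) -> float:
    """Largest model (in billions of parameters) that fits in vram_gb,
    reserving overhead_gb for the context window and batching.

    bytes_per_param of 2.0 corresponds to FP16; 1.0 to 8-bit quantization.
    One byte per parameter means one GB per billion parameters.
    """
    return (vram_gb - overhead_gb) / bytes_per_param

# Single L40S (48GB) at FP16 -- matches the article's ~14B figure
print(max_params_billions(48))                        # 14.0
# Same card with 8-bit quantization
print(max_params_billions(48, bytes_per_param=1.0))   # 28.0
```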

It's not clear if Oracle is using individual PCIe versions of the A100 and H100 or their SXM variants, which allow multiple GPUs on the same board. We suspect it's using the SXM model, which is intended for sharing among VMs.

Whatever powers these VMs, they are a substantial improvement over the A10-powered VM Oracle has offered previously – and now casts as a workstation-grade offering. With 80GB of memory, these new VMs ought to be able to run LLMs with 30 billion parameters without any quantization, and the high memory bandwidth should allow for relatively high token-per-second rates.

Oracle also teased a BM.GPU.GH200 compute shape, currently in customer testing.

It features the Nvidia Grace Hopper Superchip and NVLink C2C – a high-bandwidth cache-coherent 900GB/sec connection between Nvidia's Grace CPU and Hopper GPU that provides over 600GB of accessible memory, enabling up to 10X higher performance for AI and HPC workloads. Customers interested in the Grace architecture and upcoming Grace Blackwell Superchip can ask Big Red for access. ®
