What AI bubble? Groq rakes in $640M to grow inference cloud

In the gold rush, be the one handing out the shovels


Even as at least some investors begin to question the return on investment of AI infrastructure and services, venture capitalists appear to be doubling down. On Monday, AI chip startup Groq — not to be confused with xAI's Grok chatbot — announced it had scored $640 million in series-D funding to bolster its inference cloud.

Founded in 2016, the Mountain View, California-based startup began its life as an AI chip slinger targeting high-throughput, low-cost inferencing as opposed to training. Since then, the company has transitioned into an AI infrastructure-as-a-service provider and walked away from selling hardware.

In total, Groq has raised more than $1 billion and now boasts a valuation of $2.8 billion, with its latest funding round led by the likes of BlackRock, Neuberger Berman, Type One Ventures, Cisco Investments, Global Brain, and Samsung Catalyst.

The firm's main claim to fame is that its chips can churn out tokens faster, while using less energy, than GPU-based systems. At the heart of it all is Groq's Language Processing Unit (LPU), which approaches the problem of running LLMs a little differently.

As our sibling site The Next Platform previously explored, Groq's LPUs don't require gobs of pricy high-bandwidth memory or advanced packaging — both factors that have contributed to bottlenecks in the supply of AI infrastructure.

Instead, Groq's strategy is to stitch together hundreds of LPUs, each packed with on-die SRAM, using a fiber optic interconnect. Using a cluster of 576 LPUs, Groq claims it was able to achieve generation rates of more than 300 tokens per second on Meta's Llama 2 70B model, 10x that of an HGX H100 system with eight GPUs, while consuming a tenth of the power.
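
Taken together, those two claims compound: 10x the throughput at a tenth of the power works out to roughly a 100x advantage in tokens per joule. Here's a quick back-of-envelope sketch using only the relative figures quoted above, claimed ratios rather than measured data.

```python
# Back-of-envelope arithmetic on Groq's claims above: 10x the token rate
# of an eight-GPU HGX H100 system, at a tenth of the power. Relative
# units only, these are claimed ratios, not measurements.
throughput_ratio = 10.0  # Groq 576-LPU cluster vs HGX H100, claimed
power_ratio = 0.1        # cluster power draw vs HGX H100, claimed

# Tokens per joule scales with throughput divided by power draw.
tokens_per_joule_ratio = throughput_ratio / power_ratio
print(f"Implied tokens-per-joule advantage: {tokens_per_joule_ratio:.0f}x")  # 100x
```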

Groq now intends to use its millions to expand headcount and bolster its inference cloud to support more customers. As it stands, Groq purports to have more than 360,000 developers building on GroqCloud, creating applications using openly available models.
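
For a sense of what building on GroqCloud involves, here's a minimal sketch of a chat completion request. The endpoint path, model name, and response shape are assumptions drawn from GroqCloud's publicly documented OpenAI-compatible API, not details from this story.

```python
import os
import requests

# Hypothetical GroqCloud chat completion call. Endpoint and model name
# are assumptions based on Groq's public docs, not taken from this article.
API_URL = "https://api.groq.com/openai/v1/chat/completions"

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama3-70b-8192",  # an openly available model hosted on GroqCloud
        "messages": [
            {"role": "user", "content": "Explain what an LPU is in one sentence."}
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Swap in any of the openly available models Groq hosts; the request shape stays the same.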

"This funding will enable us to deploy more than 100,000 additional LPUs into GroqCloud," CEO Jonathan Ross said Monday.

"Training AI models is solved, now it's time to deploy these models so the world can use them. Having secured twice the funding sought, we now plan to significantly expand our talent density.

These won't, however, be Groq's next-gen LPUs. Instead, they'll be built using GlobalFoundries' 14nm process node and delivered by the end of Q1 2025. Nvidia's next-gen Blackwell GPUs are expected to arrive within the next 12 or so months, depending on how delayed they turn out to be.

Groq is said to be working on two new generations of LPUs, which, last we heard, would utilize Samsung's 4nm process tech and deliver somewhere between 15x and 20x higher power efficiency.

You can find a deeper dive on Groq's LPU strategy and performance claims on The Next Platform.

VC money continues to flow into AI startups

Groq isn't the only infrastructure vendor that's managed to capitalize on all the AI hype. In fact, $640 million is far from the largest chunk of change we've seen startups walk away with in recent memory.

As you may recall, back in May, GPU bit barn CoreWeave scored $1.1 billion in series-C funding weeks before it managed to talk Blackstone, BlackRock, and others into a $7.5 billion loan using its GPUs as collateral.

Meanwhile, Lambda Labs, another GPU cloud operator, has used its cache of GPUs to secure a combined $820 million in fresh funding and debt financing since February, and it doesn't look like it's satisfied yet. Last month we learned Lambda was reportedly in talks with VCs for another $800 million in funding to support the deployment of yet more Nvidia GPUs.

While VC funding continues to flow into AI startups, it seems some on Wall Street are increasingly nervous about whether these multi-billion-dollar investments in AI infrastructure will ever pay off.

Still, that hasn't stopped ML upstarts, such as Cerebras, from pursuing an initial public offering (IPO). Last week the outfit, best known for its dinner plate-sized accelerators aimed at model training, revealed it had confidentially filed for a public listing.

The size and price range of the IPO have yet to be determined. Cerebras' rather unusual approach to the problem of AI training has helped it win north of $900 million in commitments from the likes of G42.

Meanwhile, with the rather notable exception of Intel, which saw its profits plunge $1.6 billion year-over-year in Q2 amid plans to lay off at least 15 percent of its workforce, chip vendors and the cloud providers reselling access to their accelerators have been among the biggest beneficiaries of the AI boom. Last week, AMD revealed its MI300X GPUs accounted for more than $1 billion of its datacenter sales.

However, it appears that the real litmus test for whether the AI hype train is about to derail won't come until the market leader Nvidia announces its earnings and outlook later this month. ®
