Handling inference at the edge

One way to minimise AI latency is to run ML models closer to end users and the data being ingested


Sponsored Post Any organisation running AI models hosted in the cloud knows how challenging it can be to ensure that the large volumes of data needed to build and train those types of workloads are accessed and ingested quickly enough to avoid performance lags.

Chatbots and virtual assistants, map generation, AI tools for software engineers, analytics, defect detection and generative AI applications – these are just some of the use cases that benefit from real-time performance, which helps eliminate those delays. And Gcore's Inference at the Edge service is designed to give businesses across diverse industries – including IT, retail, gaming and manufacturing – just that.

Latency tends to be exacerbated when datasets distributed across multiple geographical sources must be collected and processed over the network. It can be particularly problematic when deploying and scaling real-time AI applications such as smart city services, live TV translation and autonomous vehicles. Taking those workloads out of a centralised data centre and hosting them at the network edge, closer to where the data actually resides, is one way around the problem.

That's what the Gcore Inference at the Edge solution is specifically designed to do. It distributes customers' pre-trained or custom machine learning models (including open source models such as Mistral 7B, Stable Diffusion XL and LLaMA Pro 8B) to 'edge inference nodes' located in over 180 locations on the company's content delivery network (CDN).

These nodes are built on servers running NVIDIA L40S GPUs, designed for AI inference workloads, and interconnected by Gcore's low-latency smart-routing mechanism to minimise packet delay and better support real-time applications. Options for edge node servers built on Ampere® Altra® Max CPUs are planned for a later date.

The ML endpoints also feature built-in distributed denial of service (DDoS) protection to help thwart cyber attacks and keep applications up and running in the event of an incident. That's a crucial layer of cyber defence which aids compliance with various data protection rules and regulations, including the GDPR, PCI DSS and ISO/IEC 27001, says the company.

The service works by providing customers with an endpoint they can integrate into their applications. Subsequent access requests and queries are diverted to the nearest edge node using anycast balancing, whereby clients trying to reach a single IP address are routed to the nearest host advertising it.
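By way of illustration, client integration with such an endpoint might look something like the Python sketch below. The endpoint URL, authentication header and payload schema here are placeholders invented for this example rather than Gcore's documented API – the real endpoint address and request format are provided when a model is deployed.

import requests

# Placeholder values -- the actual endpoint URL, auth scheme and request
# schema are defined when the model is deployed; this is an illustrative
# sketch, not Gcore's documented API.
ENDPOINT = "https://example-model.example-inference.net/v1/predict"
API_KEY = "YOUR_API_KEY"

def infer(prompt: str) -> dict:
    # A single HTTPS request to one stable address; anycast routing
    # delivers it to the nearest edge inference node transparently,
    # so the client needs no knowledge of individual node locations.
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

print(infer("Summarise the day's support tickets in three bullet points."))

Because the routing happens at the network layer, the same client code works wherever it runs – the application simply sees lower round-trip times when a nearby node answers.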

That helps keep latency down to as little as 30 milliseconds on average, says Gcore, with application performance boosted further by the NVIDIA GPU server infrastructure. Customers pay only for the resources their AI models consume, saving the cost of building their own AI-ready infrastructure, while additional compute resources can be quickly scaled up to handle any spikes in demand.

You can find out more about the Gcore Inference at the Edge solution by clicking here.

Sponsored by Gcore.
