Handling inference at the edge

One way to minimise AI latency is to run ML models closer to end users and the data being ingested


Sponsored Post Any organisation running AI models hosted in the cloud knows how challenging it can be to ensure that the large volumes of data needed to build and train those types of workloads are accessed and ingested quickly enough to avoid performance lags.

Chatbots and virtual assistants, map generation, AI tools for software engineers, analytics, defect detection and generative AI applications – these are just some of the use cases that benefit from real-time performance, which helps eliminate those delays. And Gcore's Inference at the Edge service is designed to give businesses across diverse industries – including IT, retail, gaming and manufacturing – just that.

Latency tends to be exacerbated when datasets distributed across multiple geographical sources must be collected and processed over the network. It can be particularly problematic when deploying and scaling real-time AI applications such as smart city services, live TV translation and autonomous vehicles. Taking those workloads out of a centralised data centre and hosting them at the network edge, closer to where the data actually resides, is one way around the problem.

That's what the Gcore Inference at the Edge solution is specifically designed to do. It distributes customers' pre-trained or custom machine learning models (including open source models such as Mistral 7B, Stable Diffusion XL and LLaMA Pro 8B) to 'edge inference nodes' located in over 180 locations on the company's content delivery network (CDN).

These nodes are built on servers running NVIDIA L40S GPUs, designed for AI inference workloads, and interconnected by Gcore's low-latency smart-routing mechanism to minimise packet delay and better support real-time applications. Options for edge node servers built on Ampere® Altra® Max CPUs are planned for a later date.

The ML endpoints also feature built-in distributed denial of service (DDoS) protection to help thwart cyber attacks and keep applications up and running in the event of an incident. That's a crucial layer of cyber defence which aids compliance with various data protection rules and regulations, including the GDPR, PCI DSS and ISO/IEC 27001, says the company.

The service works by providing customers with an endpoint they can integrate into their applications. Subsequent access requests and queries are diverted to the nearest edge node using anycast balancing, whereby clients trying to reach a single IP address are routed to the nearest host advertising it.
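By way of illustration, client integration with such an endpoint might look something like the Python sketch below. The endpoint URL, authentication header and payload schema here are placeholders invented for this example rather than Gcore's documented API – the real endpoint address and request format are provided when a model is deployed.

import requests

# Placeholder values -- the actual endpoint URL, auth scheme and request
# schema are defined when the model is deployed; this is an illustrative
# sketch, not Gcore's documented API.
ENDPOINT = "https://example-model.example-inference.net/v1/predict"
API_KEY = "YOUR_API_KEY"

def infer(prompt: str) -> dict:
    # A single HTTPS request to one stable address; anycast routing
    # delivers it to the nearest edge inference node transparently,
    # so the client needs no knowledge of individual node locations.
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

print(infer("Summarise the day's support tickets in three bullet points."))

Because the routing happens at the network layer, the same client code works wherever it runs – the application simply sees lower round-trip times when a nearby node answers.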

That helps keep latency down to as little as 30 milliseconds on average, says Gcore, with application performance boosted further by the NVIDIA GPU server infrastructure. Customers pay only for the resources their AI models consume, saving the cost of building their own AI-ready infrastructure, while additional compute resources can be quickly scaled up to handle any spikes in demand.

You can find out more about the Gcore Inference at the Edge solution by clicking here.

Sponsored by Gcore.
