MLCommons Unveils Latest MLPerf Inference Results: Edge, Data Center, Mobile, and Notebook
MLCommons, the independent organization that curates the MLPerf benchmark suite, has announced the latest round of MLPerf Inference scores. The new results are neatly split into four device classes—data center, edge, mobile, and notebook—making it easier for buyers, vendors, and researchers to compare performance across real‑world workloads.
Device‑Class Segmentation
Separating the scores by device class reflects the distinct form factors and performance profiles that characterize each market. As David Kanter, executive director of MLCommons, told EE Times, “If you’re talking about a notebook, it’s probably running Windows; if you’re talking about a smartphone, you’re probably running iOS or Android.” By isolating mobile and notebook results, the benchmark provides clearer insight into the capabilities of the accelerators most relevant to each use case.
State‑of‑the‑Art Models
Unlike the previous round, which focused primarily on vision tasks, the current benchmark adds four modern, production-grade models, plus a dedicated mobile suite:
- DLRM – a large‑scale recommendation engine
- 3D‑UNet – a medical imaging model for tumor detection in MRI scans
- RNN‑T – a speech‑to‑text engine
- BERT – a natural‑language‑processing transformer
- MobileNetEdge, SSD-MobileNetv2, Deeplabv3, and Mobile BERT – mobile-optimized models covering image classification, object detection, image segmentation, and NLP, respectively
These choices were driven by an advisory board that includes industry experts from recommendation, medical imaging, and speech domains, ensuring the benchmarks reflect the most demanding workloads in production today.
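For a concrete feel for one of these workloads, the sketch below runs a SQuAD-style question-answering query of the kind the BERT benchmark exercises. It uses the Hugging Face transformers pipeline with a lightweight distilled checkpoint as a stand-in; the official benchmark runs BERT-large fine-tuned on SQuAD v1.1 through MLPerf’s own harness, so treat this as illustrative only.

```python
# Illustrative stand-in for the MLPerf BERT task: SQuAD-style question
# answering via the Hugging Face `transformers` pipeline. The official
# benchmark runs BERT-large (SQuAD v1.1) under the LoadGen harness; the
# distilled checkpoint here just keeps the example small and fast.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

result = qa(
    question="Which models were added to the benchmark?",
    context=("The latest MLPerf Inference round adds DLRM, 3D-UNet, "
             "RNN-T, and BERT to the suite."),
)
print(result["answer"], f"(score: {result['score']:.2f})")
```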
Data Center Performance
As anticipated, Nvidia’s GPUs dominate the data-center class. In fact, 85% of total submissions ran on Nvidia hardware, and the company swept every category it entered. The benchmark shows a widening performance gap: the Nvidia A100 delivers roughly 30× the throughput of an Intel Cooper Lake CPU on a basic computer-vision model, and up to 237× on recommendation-system workloads. Paresh Kharya, Nvidia’s senior director of product management, notes that a single DGX-A100 can match the performance of a roughly 1,000-node CPU cluster on recommendation tasks, offering tremendous value at scale.
The only non‑CPU, non‑GPU entrant in this class was Mipsology’s Zebra accelerator, running on a Xilinx Alveo U250 FPGA. Zebra achieved 4,096 ResNet queries per second in server mode versus 5,563 for an Nvidia T4, and 5,011 samples per second offline compared to 6,112 for the T4.
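Those two figures correspond to different LoadGen scenarios: in server mode, queries arrive at a Poisson-distributed rate and the score is the highest sustainable queries per second within a latency bound, while offline mode hands the system the entire dataset at once and measures raw samples per second. The sketch below shows the shape of an offline run using the open-source mlperf_loadgen Python bindings; constructor signatures have shifted between LoadGen releases, so treat it as a minimal illustration rather than a compliant submission harness.

```python
# Minimal shape of an Offline-scenario run with MLPerf's LoadGen Python
# bindings (signatures may differ slightly across LoadGen releases).
import mlperf_loadgen as lg

def issue_queries(query_samples):
    # A real harness would batch these samples through the model;
    # here we immediately complete each query with an empty response.
    responses = [lg.QuerySampleResponse(q.id, 0, 0) for q in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # Called when LoadGen wants queued work drained.

def load_samples(indices):
    pass  # Load these dataset samples into memory.

def unload_samples(indices):
    pass  # Release them again.

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # or lg.TestScenario.Server
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)  # Logs samples/sec for Offline runs
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```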
In the research‑development category, Taiwanese company Neuchips’ RecAccel—an FPGA‑based accelerator for DLRM—performed on par with or below Intel’s Cooper Lake CPUs and fell short of Nvidia’s results. These early‑stage prototypes demonstrate the breadth of emerging AI hardware, even if they are not yet commercially available.
Edge‑Tier Outcomes
Edge systems were largely powered by Nvidia’s A100, T4, AGX Xavier, and Xavier NX chips. Centaur Technology’s reference design, featuring its proprietary x86 processor and a separate AI coprocessor, beat Nvidia’s Tesla T4 in single‑stream ResNet latency but lagged behind on throughput. British consultancy dividiti presented a wide range of scores—from Raspberry Pi to Nvidia AGX Xavier—highlighting how operating system choices and the use of on‑chip accelerators can influence results.
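The single-stream scenario explains how a part can win on latency yet trail on throughput: it issues one query at a time, with no batching, and scores the 90th-percentile latency, so wide, batch-friendly hardware gets no credit for parallelism. Here is a rough sketch of that measurement, with a dummy 5 ms workload standing in for a real model call:

```python
# A minimal sketch (not MLPerf's harness) of single-stream measurement:
# issue one query at a time and report a high-percentile latency.
import statistics
import time

def measure_single_stream(run_inference, num_queries=1000):
    latencies = []
    for _ in range(num_queries):
        start = time.perf_counter()
        run_inference()  # one query at a time, no batching
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p90 = latencies[int(0.9 * len(latencies)) - 1]
    return p90, statistics.mean(latencies)

# Dummy 5 ms workload standing in for a real model call:
p90, mean = measure_single_stream(lambda: time.sleep(0.005))
print(f"p90 latency: {p90 * 1000:.2f} ms, mean: {mean * 1000:.2f} ms")
```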
New entrants from the research‑development pool included Russian company IVA Technologies and Korean startup Mobilint. While neither achieved breakthrough performance, their inclusion shows that prototype accelerators are reaching a level of software maturity that warrants benchmarking against commercial solutions.
Mobile and Notebook Benchmarks
The mobile class saw three comparable submissions: MediaTek’s Dimensity 820 (in the Xiaomi Redmi 10X 5G), Qualcomm’s Snapdragon 865+ (tested on an Asus ROG Phone 3), and Samsung’s Exynos 990 (used in the Galaxy Note 20 Ultra). Samsung led in image classification, NLP, and image segmentation, while MediaTek took the object-detection crown. The results illustrate that no single SoC dominates across all tasks, underscoring the importance of workload-specific optimization.
The notebook category contained a single Intel reference design featuring the forthcoming Xe‑LP GPU. Although the limited data set makes detailed comparison difficult, the Xe‑LP achieved 2.5× higher throughput than the best mobile SoC on image‑segmentation (DeeplabV3) and 1.15× better on object detection (SSD‑MobileNetv2).
Looking Ahead
Kanter emphasizes MLCommons’ commitment to diversity in the benchmark ecosystem. The organization encourages participation from non-Nvidia, non-Intel, and smaller companies, with an open division that allows any network or model to be tested. “We’re also working on adding a power-measurement dimension for the next round,” Kanter says, inviting the community to help develop the necessary tooling.
For a complete, detailed list of MLPerf Inference results, visit the MLCommons website.
Source: EE Times
