
Groq’s TSP AI Chip Now Powers Cloud Workloads via Nimbix

Groq’s tensor streaming processor (TSP) silicon has been added to the cloud, enabling customers to accelerate AI workloads on demand. Nimbix now offers TSP‑based machine‑learning acceleration exclusively to selected customers.

In the competitive arena of data‑center AI silicon, Groq and Graphcore are the only startups with commercially available accelerators integrated into cloud services; Graphcore’s chips are already offered through Microsoft Azure.

"Groq’s simplified processing architecture is unique, providing unprecedented, deterministic performance for compute intensive workloads, and is an exciting addition to our cloud-based AI and Deep Learning platform," said Steve Hebert, Nimbix CEO.

Groq is only the second AI accelerator startup to make its hardware available in the cloud (Image: Groq)

Launched last fall, Groq’s TSP chip delivers a staggering 1,000 TOPS (1 peta operations per second). In recent benchmarks, the chip achieved 21,700 ResNet‑50 v2 inferences per second—more than twice the speed of contemporary GPU‑based systems—underscoring its status as one of the fastest commercial neural‑network processors.

"These ResNet‑50 results are a validation that Groq’s unique architecture and approach to machine learning acceleration delivers substantially faster inference performance than our competitors," said Jonathan Ross, Groq co‑founder and CEO. "These real‑world proof points, based on industry‑standard benchmarks and not simulations or hardware emulation, confirm the measurable performance gains for machine learning and artificial intelligence applications made possible by Groq’s technologies."

Groq’s design achieves the massive parallelism needed for deep‑learning acceleration without the synchronization overhead typical of CPUs and GPUs. By offloading control logic to the compiler—a hallmark of Groq’s software‑driven strategy—the company delivers fully deterministic, predictable performance that can be quantified at compile time.
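The compiler‑driven idea can be illustrated with a toy sketch (hypothetical, not Groq’s actual toolchain): if every operation has a fixed, known cycle cost and the compiler assigns each one a static start cycle, total runtime is fully determined before execution, with no runtime arbitration or dynamic dispatch.

```python
# Toy static scheduler (illustrative only; the op names and cycle costs
# are assumptions, not Groq's real instruction set). Because each op has
# a fixed cost, the "compiler" knows the exact runtime in advance.

OP_CYCLES = {"matmul": 4, "add": 1, "relu": 1}  # assumed per-op latencies

def compile_schedule(program):
    """Assign each op a fixed start cycle; return (schedule, total cycles)."""
    schedule, cycle = [], 0
    for op in program:
        schedule.append((cycle, op))
        cycle += OP_CYCLES[op]
    return schedule, cycle

program = ["matmul", "add", "relu", "matmul", "relu"]
schedule, total = compile_schedule(program)
print(f"Deterministic runtime: {total} cycles")  # known at compile time
```

The point of the sketch is that performance is a property of the compiled schedule itself, which is what makes it quantifiable before the chip ever runs.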

A standout advantage is that Groq’s performance gains do not depend on batching, a common data‑center tactic that processes multiple samples simultaneously to boost throughput. The TSP chip reaches peak performance even at batch = 1, ideal for real‑time inference streams. While the chip offers a moderate 2.5× latency advantage over GPUs at large batch sizes, the advantage climbs to 17× at batch = 1.
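A rough arithmetic sketch (with made‑up numbers, purely to illustrate the trade‑off described above) shows why batching boosts throughput on a GPU while inflating per‑request latency, and why a design that peaks at batch = 1 sidesteps the problem:

```python
# Hypothetical latency models -- the constants are illustrative, not
# measured figures for any real GPU or for Groq's TSP.

def gpu_batch_latency_ms(batch):
    # Assumed model: fixed kernel-launch overhead amortized across the
    # batch, so per-sample throughput improves, but every request in the
    # batch waits for the whole batch to finish.
    overhead_ms, per_sample_ms = 5.0, 0.5
    return overhead_ms + per_sample_ms * batch

def batch1_latency_ms(batch):
    # Assumed model: no batching needed; each sample is processed as it
    # arrives at a fixed per-sample cost.
    return 0.3 * batch

for b in (1, 8, 64):
    print(f"batch={b}: gpu={gpu_batch_latency_ms(b)} ms, "
          f"batch1-design={batch1_latency_ms(b)} ms")
```

Under these assumed numbers, the GPU's per‑sample cost falls as the batch grows, but a single real‑time request still pays the full batch latency, which is the gap the article's 17× batch = 1 figure reflects.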
