Quadric’s Hybrid Data‑Flow & Von Neumann Chip Accelerates AI and Vision Workloads

A novel hybrid data‑flow and Von Neumann architecture can accelerate workloads including neural networks, machine learning, computer vision, DSP and basic linear algebra subprograms.

Quadric, a Silicon Valley startup, has engineered an accelerator that speeds up both AI inference and traditional computer‑vision algorithms in edge applications such as robotics, factory automation and medical imaging. Its hybrid design combines data‑flow and Von Neumann principles to run neural networks, machine learning, DSP and linear‑algebra routines efficiently.

“From the outset, we recognized that edge devices demand more than AI alone,” CEO Veerbhan Kheterpal told EE Times. “Developers need a single system that can run classical high‑performance computing algorithms alongside AI.”

Kheterpal emphasized that the architecture is not a collection of discrete accelerators but a single, data‑parallel instruction set that can accelerate diverse workloads, including AI inference.

“Recent work shows that replacing entire layers with fast Fourier transforms (FFT) can dramatically speed up transformers,” said Chief Product Officer Daniel Firu. Quadric is poised to accelerate such workloads, he added, citing a Google paper in which a transformer encoder’s self‑attention sub‑layer was swapped for a Fourier transform. The resulting model reached 92% of BERT’s accuracy on the GLUE benchmark while training up to seven times faster on GPUs and twice as fast on TPUs.
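The Fourier substitution Firu describes can be sketched in a few lines. The example below is a minimal illustration, not Quadric's or Google's code: it assumes the replacement sub-layer takes the real part of a two-dimensional discrete Fourier transform, applied first along the hidden axis and then along the sequence axis, and it uses a naive DFT rather than a fast FFT to stay dependency-free. The function names `dft` and `fourier_mixing` are hypothetical.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def fourier_mixing(x):
    """Mix tokens with a 2-D DFT and keep only the real part.

    x is a list of token vectors, indexed as x[token][hidden].
    Assumption: this stands in for the self-attention sub-layer and
    has no learned parameters.
    """
    # Transform each token's hidden vector.
    hidden_mixed = [dft(token) for token in x]
    # Transform each hidden column across the sequence axis.
    columns = [dft(list(col)) for col in zip(*hidden_mixed)]
    # Transpose back to [token][hidden] and drop the imaginary part.
    return [[value.real for value in row] for row in zip(*columns)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = fourier_mixing(tokens)
```

Because the transform is fixed, the sub-layer needs no attention weights, which is where the training speed-up reported in the paper would come from.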

Quadric’s developer kit, an M.2 card with the Q16 processor and 4 GB of external memory (Source: Quadric)

Vineyard Robots

The founders, Veerbhan Kheterpal, Daniel Firu and Nigel Drego, previously founded 21, a bitcoin‑mining company acquired by Coinbase. Quadric’s early focus was on agricultural robots that traversed Napa Valley vineyards, detecting irrigation leaks and pests. However, the cost of the GPU‑based compute hardware those robots required pushed the team to develop a custom accelerator.

Veerbhan Kheterpal (Source: Quadric)

“We realized that a drone‑based system would cost $5–10 k, but a tractor‑based platform would exceed $50 k with large GPUs and numerous cameras. That insight led us to design the accelerator chip we needed,” Kheterpal explained.

Following a seed round in 2017 and a $13 M Series A led by automotive Tier‑One Denso, Quadric has raised $18 M to date.

Turing‑Complete Design

Quadric’s instruction‑driven architecture blends data‑flow and Von Neumann concepts to replace heterogeneous edge systems with a single, flexible solution. The Turing‑complete Vortex cores deliver acceleration while retaining programmability. Scalable arrays and compatibility with 7‑ or 5‑nm processes make the design suitable for edge devices with power budgets from hundreds of milliwatts to 20 W.

The Q16 chip hosts a 16 × 16 array of Vortex cores. Each core can perform matrix multiplication and other AI operations, and includes a multifunctional ALU for logic, reduction, shift and similar operations. Software can express a broad range of algorithms, from LSTM activation functions to custom control flow, and can use if‑then‑else statements across the array to exploit fine‑grained sparsity.

Every core has single‑cycle access to its neighbors and to 4 Kb of in‑core memory, while the on‑chip memory provides low‑latency, deterministic data access.

The cores execute in a “single instruction, multiple decode” mode: the same instruction is broadcast each cycle, but dynamic runtime data lets individual cores interpret it differently, enabling heterogeneous function within a homogeneous array.
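A rough software analogy may help here. The sketch below is not Quadric's instruction set: `broadcast` and `mac_or_skip` are hypothetical names, and the per-core tuple of (activation, weight) is an assumed data layout. It only shows how one broadcast instruction containing an if-then-else can yield different behavior on each core, for example skipping the multiply when the local input is zero.

```python
def broadcast(cores, instruction):
    """Issue the same instruction to every core in the array.

    Each core's local data decides which branch it takes.
    """
    return [instruction(core) for core in cores]


def mac_or_skip(core):
    """Hypothetical instruction with a per-core if-then-else."""
    activation, weight = core
    if activation == 0:
        # Sparse input: this core skips the multiply.
        return 0
    return activation * weight


# Four cores, each holding its own (activation, weight) pair.
cores = [(0, 5), (2, 5), (0, 5), (3, 5)]
results = broadcast(cores, mac_or_skip)  # [0, 10, 0, 15]
```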

A dedicated broadcast bus efficiently streams constants—such as neural‑network weights—into all cores simultaneously. Static, software‑controlled load‑store units allow deterministic kernel runtimes, with simultaneous dual‑edge loads and triple‑edge stores reducing execution time.
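To make the combination of weight broadcast and neighbor access concrete, here is a minimal sketch of a 3 × 3 convolution mapped onto a grid with one pixel and one accumulator per core. It is an assumption about how these features might be composed, not Quadric's kernel: the function name `conv3x3_on_grid` is hypothetical, diagonal neighbor reads are treated as a single step for brevity, and borders are zero-padded.

```python
def conv3x3_on_grid(image, kernel):
    """3x3 convolution (cross-correlation) with one pixel per core.

    Each outer step models one weight being broadcast to all cores;
    every core then multiplies it by a value read from a neighbor.
    """
    rows, cols = len(image), len(image[0])
    acc = [[0] * cols for _ in range(rows)]  # one accumulator per core
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            weight = kernel[dr + 1][dc + 1]  # broadcast to the whole array
            for r in range(rows):
                for c in range(cols):
                    nr, nc = r + dr, c + dc
                    # Cores on the border see zeros outside the array.
                    if 0 <= nr < rows and 0 <= nc < cols:
                        acc[r][c] += weight * image[nr][nc]
    return acc


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = conv3x3_on_grid(ones, ones)  # [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```

In this model the weights are never stored per core; only nine broadcasts are needed regardless of the array size.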

Daniel Firu (Source: Quadric)

“Loading from one side and storing from another lets us perform data re‑mappings and image rotations at the hardware level,” Firu said.
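As a software analogy for the re-mapping Firu mentions, the sketch below fills a grid row by row and then drains it column by column, which produces a 90-degree clockwise rotation without any separate shuffle step. The edge names and the functions `load_from_west` and `store_from_north` are hypothetical and say nothing about how the Q16's load-store units are actually wired.

```python
def load_from_west(image):
    """Fill the grid row by row, as if rows streamed in from one edge."""
    return [list(row) for row in image]


def store_from_north(grid):
    """Drain the grid column by column, reading each column bottom to top.

    Each drained column becomes one row of the output.
    """
    rows, cols = len(grid), len(grid[0])
    return [[grid[r][c] for r in reversed(range(rows))] for c in range(cols)]


image = [[1, 2, 3],
         [4, 5, 6]]
rotated = store_from_north(load_from_west(image))  # [[4, 1], [5, 2], [6, 3]]
```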

On‑chip static memories, not caches, provide ample space for large data structures. The Q16’s 8 MB of on‑chip memory can hold multiple HD frame buffers or an entire neural‑network weight set.

Software Stack

Quadric built its software stack before silicon, allowing developers to prototype on a simulator or FPGA. The stack abstracts the architecture via an LLVM‑based compiler and a C++ API. Source Mode supports data‑parallel algorithms with C++ control over architectural features, and can express custom operations as AI models grow more complex.

Quadric’s software stack (Source: Quadric)

A forthcoming Graph Mode will provide a no‑code interface for TensorFlow or ONNX models, backed by a TVM‑based DNN compiler that auto‑generates code.

“Most platforms limit you to their own AI compiler, but we offer full Turing‑complete cores that can execute any operation,” Kheterpal explained. “This flexibility lets developers deploy custom algorithms or unsupported operators without restriction.”

Chip Roadmap

The Q16 delivers 256 Vortex cores in a 16 × 16 array fabricated in 16 nm silicon, achieving 4 INT8 DNN TOPS. It runs ResNet‑50 at 200 inferences per second on 224 × 224 images, drawing an average of 2 W.
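The figures quoted for the Q16 imply a few derived numbers. The short calculation below uses only the values stated above; the variable names are illustrative, and the per-core figure simply divides peak throughput evenly across the array.

```python
CORES = 16 * 16                    # 16 x 16 array of Vortex cores
PEAK_INT8_TOPS = 4.0               # quoted peak INT8 DNN throughput
RESNET50_INFERENCES_PER_S = 200    # quoted ResNet-50 rate, 224 x 224 images
AVERAGE_POWER_W = 2.0              # quoted average power for that workload

# Inferences completed per joule of energy.
inferences_per_joule = RESNET50_INFERENCES_PER_S / AVERAGE_POWER_W

# Energy spent on one ResNet-50 inference, in millijoules.
energy_per_inference_mj = 1000 * AVERAGE_POWER_W / RESNET50_INFERENCES_PER_S

# Peak throughput per core, in giga-operations per second.
peak_gops_per_core = 1000 * PEAK_INT8_TOPS / CORES

print(inferences_per_joule, energy_per_inference_mj, peak_gops_per_core)
```

That works out to 100 inferences per joule, 10 mJ per inference and roughly 15.6 GOPS of peak throughput per core.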

Quadric’s roadmap includes a second‑generation architecture and a Q32 chip—an array of 1,000 cores—targeted for 7 nm manufacturing. The Q32 may also integrate ARM or RISC‑V cores to serve as a host processor.

An M.2 developer kit featuring the Q16 processor and 4 GB of external memory is available today.

>> This article was originally published on our sister site, EE Times.
