Low‑Power Silicon Cochlea Enables Ultra‑Efficient Voice‑Activated Sensing
As voice‑activated assistants become ever more common, the demand for energy‑efficient, always‑on audio sensing has surged. Cutting‑edge neuromorphic engineering now offers a compelling answer: a silicon cochlea that mirrors the human ear’s bio‑mechanics to deliver low‑latency, low‑power speech detection.
The Silicon Cochlea: A Bio‑Inspired Audio Front‑End
Professor Shih‑Chii Liu and her team at the Institute of Neuroinformatics (INI) secured the 2020 Misha Mahowald Prize for Neuromorphic Engineering for their dynamic audio sensor (DAS). The core of the DAS is a silicon cochlea that emulates the ear’s hair cells: incoming sound is first filtered into a bank of analog band‑pass filters, then half‑wave rectified to replicate the mechanical transduction of cochlear hair cells.
Conventional audio systems first digitize sound, then extract features with FFT and BPF before feeding them to a DSP. In contrast, the INI‑Zurich DAS transmits analog audio bands as asynchronous spike trains that are directly processed by downstream neural networks.
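The front end described above can be approximated in software. The sketch below is a minimal illustration, not the chip's actual circuit: the channel count, filter order, and band edges are assumptions chosen for demonstration.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def cochlea_front_end(audio, fs, n_channels=16, f_lo=100.0, f_hi=6000.0):
    """Emulate the DAS front end: a bank of band-pass filters followed by
    half-wave rectification (mimicking hair-cell transduction).
    Channel count and band edges here are illustrative only."""
    # Log-spaced band edges, mimicking the cochlea's tonotopic frequency map
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        filtered = sosfilt(sos, audio)
        channels.append(np.maximum(filtered, 0.0))  # half-wave rectify
    return np.stack(channels)  # shape: (n_channels, n_samples)
```

Each row of the output is one rectified frequency band, ready to be converted into spikes by the event-encoding stage.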
Following the initial filtering, the system converts the rectified signals into electrical spikes—either via a classic integrate‑and‑fire mechanism or an asynchronous delta modulator (ADM). The ADM compares the signal against two thresholds, emitting events only when changes occur. This event‑driven encoding suppresses static information, dramatically reducing data traffic and, consequently, power consumption.
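The ADM's change-driven behavior can be sketched as follows. This is a software caricature of the analog circuit, with illustrative threshold values; the event format (timestamp, polarity) follows the address-event convention but is an assumption here.

```python
import numpy as np

def adm_encode(signal, times, delta_up=0.05, delta_dn=0.05):
    """Asynchronous delta modulator sketch: emit an ON (+1) event when the
    signal rises by delta_up above the last event level, an OFF (-1) event
    when it falls by delta_dn below it. A static signal produces no events,
    which is what suppresses data traffic during silence."""
    events = []
    ref = signal[0]  # reference level, updated at every emitted event
    for t, x in zip(times, signal):
        while x - ref >= delta_up:   # rising signal -> ON events
            ref += delta_up
            events.append((t, +1))
        while ref - x >= delta_dn:   # falling signal -> OFF events
            ref -= delta_dn
            events.append((t, -1))
    return events
```

Feeding in a constant signal yields an empty event list, while a ramp yields a sparse train of ON events; the output rate tracks signal activity rather than a fixed sample clock.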
From an energy standpoint, the silicon cochlea remains virtually idle when no sound is present, but its power draw scales with activity. For applications that involve long periods of silence punctuated by brief speech bursts—typical of many voice‑activated scenarios—this dynamic power profile offers a distinct advantage. Moreover, the continuous‑time spike representation supports an exceptionally wide dynamic range, as spikes can be spaced arbitrarily far apart or clustered tightly.
From Spikes to Speech Recognition
The DAS’s event streams can be converted into 2D cochleagram frames—histograms of spikes binned by frequency over 5‑ms intervals. These cochleagrams are then fed to a lightweight deep neural network that decodes spoken words. Liu notes that “deep networks on a sensor are of great interest to the IEEE ISSCC community and are timely given the current boom in audio edge computing.” While many low‑power ASICs for keyword spotting rely on conventional spectrogram features, Liu’s hybrid analog‑digital approach promises even lower power consumption and lower latency.
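The binning step itself is straightforward. A minimal sketch, assuming events arrive as (timestamp-in-ms, channel) pairs—a format chosen here for illustration:

```python
import numpy as np

def cochleagram(events, n_channels, frame_ms=5.0, duration_ms=None):
    """Bin spike events into a 2D cochleagram: spike counts per frequency
    channel per 5 ms frame. `events` is an iterable of (timestamp_ms,
    channel) pairs; that format is an assumption for this sketch."""
    if duration_ms is None:
        duration_ms = max(t for t, _ in events)
    n_frames = int(np.ceil(duration_ms / frame_ms))
    frames = np.zeros((n_channels, n_frames), dtype=np.int32)
    for t, ch in events:
        frame = min(int(t // frame_ms), n_frames - 1)  # clamp the last edge
        frames[ch, frame] += 1
    return frames  # shape: (n_channels, n_frames)
```

The resulting count matrix plays the role a spectrogram plays in conventional pipelines, but it is built directly from the sparse event stream rather than from frame-by-frame FFTs.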
Last year INI released a demonstration video in which the system recognizes spoken digits. Although still in early stages, the prototype showcases the potential of silicon cochleas for real‑world keyword spotting. Liu’s team has also explored sensor fusion, combining audio with visual cues to enhance classification accuracy, and has published design guidelines that help engineers decide when analog sensing is most advantageous.

Misha Mahowald, inventor of the address‑event representation and namesake of the Neuromorphic Engineering Prize.
Ongoing research focuses on further reducing power and improving performance. Innovations include optimizing source‑follower band‑pass filters and refining analog feature extractors. To mitigate variability inherent in analog circuits, the team built a hardware emulator that accelerates testing and allows them to train binary neural nets in software, accurately predicting real‑chip performance. They are also experimenting with injecting controlled noise to make designs more robust.
Recognition and Impact
Professor Liu, a pioneer in neuromorphic engineering who trained in Carver Mead’s Caltech lab, highlighted the significance of the Mahowald Prize: “This honor underscores decades of foundational work—from Dick Lyon to Carver Mead and beyond.” She emphasizes that, as Moore’s Law slows, hybrid analog‑digital systems like the DAS will become increasingly vital, potentially surpassing digital computation by orders of magnitude in energy efficiency.
> This article was originally published on our sister site, EE Times.