Build a Speech‑Controlled Robot with Windows 10 IoT Core on Raspberry Pi 2

Story

Early computing relied on punch cards, trackballs, and keyboards—each requiring direct physical contact. As technology evolved, wireless and touch interfaces streamlined user interaction. Today, visual and voice input offer even more intuitive ways to control devices.

This guide demonstrates how to leverage Windows 10 IoT Core’s built‑in Speech Recognition to command a robot built on a Raspberry Pi 2. By the end, you’ll have a robot that moves, turns, and manages obstacle detection through spoken commands.

New to Windows 10 IoT Core? Start with this overview.

Updated March 30, 2016

What is Speech Recognition?

Speech Recognition translates spoken words into text. It typically involves two core components: signal processing and a speech decoder. Microsoft’s Speech SDK abstracts these complexities, allowing developers to focus on application logic.

Step 1

Getting Started with Speech Recognition

The basic workflow is:

Create a grammar in SRGS (Speech Recognition Grammar Specification) format
Instantiate SpeechRecognizer and load the grammar
Subscribe to recognition events and implement handlers

Create Speech Recognition Grammar

Defining a custom grammar lets the app understand the specific commands you want the robot to accept. For this project, the vocabulary includes:

Move Forward
Move Reverse
Turn Right
Turn Left
Stop
Engage Obstacle Detection
Disengage Obstacle Detection

The grammar is expressed in SRGS XML. Key structural rules:

The root element must be <grammar>.
It must include version, xml:lang, and the SRGS namespace.
At least one <rule> element is required.
Each rule must have a unique id attribute.

For detailed SRGS guidance, consult the MSDN and W3C documentation.
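As an illustration of the rules above, a grammar for the seven commands might look like the following sketch. The rule id robotCommands and the en-US language tag are assumptions chosen for this example, not values taken from the original sample.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Root element with version, xml:lang, and the SRGS namespace. -->
<grammar version="1.0" xml:lang="en-US" root="robotCommands"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- A single rule with a unique id; "robotCommands" is an illustrative name. -->
  <rule id="robotCommands" scope="public">
    <one-of>
      <item>move forward</item>
      <item>move reverse</item>
      <item>turn right</item>
      <item>turn left</item>
      <item>stop</item>
      <item>engage obstacle detection</item>
      <item>disengage obstacle detection</item>
    </one-of>
  </rule>
</grammar>
```

The root attribute names the rule the recognizer starts from, and one-of lists the alternative phrases it will accept.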

Initialize Speech Recognizer and Load Grammar

SpeechRecognizer resides in the Windows.Media.SpeechRecognition namespace. Import it, create a recognizer, add your SRGS XML file as a grammar constraint, and compile the constraints. If grammar compilation fails, verify that a microphone is connected and recognized by IoT Core.
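A minimal sketch of this step, assuming the grammar file ships inside the app package at the hypothetical path Grammar\RobotCommands.xml. The class name RobotSpeech and the method StartAsync are illustrative and are not taken from the sample project.

```csharp
using System;
using System.Threading.Tasks;
using Windows.ApplicationModel;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

internal sealed class RobotSpeech
{
    private SpeechRecognizer recognizer;

    // Returns true when the grammar compiled and continuous listening started.
    public async Task<bool> StartAsync()
    {
        recognizer = new SpeechRecognizer();

        // Hypothetical location of the SRGS file inside the app package.
        StorageFile grammarFile = await Package.Current.InstalledLocation
            .GetFileAsync(@"Grammar\RobotCommands.xml");

        // Restrict recognition to the phrases defined in the grammar.
        recognizer.Constraints.Add(new SpeechRecognitionGrammarFileConstraint(grammarFile));

        SpeechRecognitionCompilationResult compilation =
            await recognizer.CompileConstraintsAsync();
        if (compilation.Status != SpeechRecognitionResultStatus.Success)
        {
            // Compilation failed, for example because no audio input is available.
            return false;
        }

        // Event handlers should be attached before this call so no result is missed.
        await recognizer.ContinuousRecognitionSession.StartAsync();
        return true;
    }
}
```

Returning a bool rather than throwing lets the caller decide how to report a failed start on a device that has no display.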

Register for Speech Recognizer Events and Create Handler

Once the recognizer is running, its continuous recognition session raises ResultGenerated each time it successfully parses speech. Use this event to read args.Result.Text and map it to robot actions. The recognizer’s StateChanged event reports when it starts or stops listening.

Register the handlers immediately after creating the SpeechRecognizer instance. In Visual Studio, typing += after the event name and pressing Tab generates a handler stub for you.
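The registration could be sketched as follows. The helper class RecognizerEvents and its Attach method are hypothetical names for this example. The handlers only log what they receive, and a result with Rejected confidence is skipped as a precaution.

```csharp
using System.Diagnostics;
using Windows.Media.SpeechRecognition;

internal static class RecognizerEvents
{
    // Call once, right after constructing the recognizer.
    public static void Attach(SpeechRecognizer recognizer)
    {
        recognizer.StateChanged += OnStateChanged;
        recognizer.ContinuousRecognitionSession.ResultGenerated += OnResultGenerated;
    }

    private static void OnStateChanged(
        SpeechRecognizer sender,
        SpeechRecognizerStateChangedEventArgs args)
    {
        // Reports transitions such as idle, capturing, and processing.
        Debug.WriteLine("Recognizer state: " + args.State);
    }

    private static void OnResultGenerated(
        SpeechContinuousRecognitionSession sender,
        SpeechContinuousRecognitionResultGeneratedEventArgs args)
    {
        // Ignore input the recognizer could not match to the grammar.
        if (args.Result.Confidence == SpeechRecognitionConfidence.Rejected)
        {
            return;
        }

        Debug.WriteLine("Heard: " + args.Result.Text);
    }
}
```

These handlers may run on a background thread, so any UI update made from them would need to be marshalled to the UI thread.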

Step 2

Drive the Robot from Recognized Speech

In the ResultGenerated handler, inspect args.Result.Text and perform conditional logic to control the robot’s motors. The MotorDriver class (included in the sample) abstracts GPIO manipulation. Full source is provided at the end of the article.
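One way to keep the conditional logic small is to translate the recognized text into an enum first and switch on that. The sketch below is an assumption about how this could be organized. RobotCommand and CommandParser are hypothetical names, and the sample's MotorDriver API is not reproduced here.

```csharp
using System;
using System.Collections.Generic;

public enum RobotCommand
{
    Unknown,
    MoveForward,
    MoveReverse,
    TurnRight,
    TurnLeft,
    Stop,
    EngageObstacleDetection,
    DisengageObstacleDetection
}

public static class CommandParser
{
    // Case-insensitive lookup from the spoken phrase to a command.
    private static readonly Dictionary<string, RobotCommand> Phrases =
        new Dictionary<string, RobotCommand>(StringComparer.OrdinalIgnoreCase)
        {
            { "move forward", RobotCommand.MoveForward },
            { "move reverse", RobotCommand.MoveReverse },
            { "turn right", RobotCommand.TurnRight },
            { "turn left", RobotCommand.TurnLeft },
            { "stop", RobotCommand.Stop },
            { "engage obstacle detection", RobotCommand.EngageObstacleDetection },
            { "disengage obstacle detection", RobotCommand.DisengageObstacleDetection }
        };

    // Returns Unknown for null, empty, or unrecognized text.
    public static RobotCommand Parse(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText))
        {
            return RobotCommand.Unknown;
        }

        RobotCommand command;
        return Phrases.TryGetValue(recognizedText.Trim(), out command)
            ? command
            : RobotCommand.Unknown;
    }
}
```

In the ResultGenerated handler, a switch over CommandParser.Parse(args.Result.Text) could then call the matching motor routine, with Unknown left as a no-op so stray phrases never move the robot.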

Step 3

Update Device Capability

Before deploying to the Raspberry Pi 2, add the microphone capability to your app’s package manifest. This grants the app permission to access audio input.
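In Package.appxmanifest the capability is a single element inside the Capabilities section. The fragment below shows only that section; the rest of the manifest is left as generated by Visual Studio.

```xml
<Capabilities>
  <!-- Grants the app access to audio input devices. -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```

The same setting can be enabled by ticking Microphone on the Capabilities tab of the manifest designer.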

Once the software is ready, wire the hardware as described below.

Step 4

Deploy & Register App as Startup Application

To ensure the robot listens for commands immediately after boot, register your app as a startup application. Deploy it first, then use either PowerShell or the IoT Core Web‑Management Portal.

It’s a good idea to change the package family name before deployment to avoid conflicts.

After deployment, register the app as a startup app using the Web‑Management Portal.

If you encounter registration issues, refer to this troubleshooting guide.

After a successful registration, reboot the Raspberry Pi 2 and confirm that the app starts automatically.

Schematic

The robot’s hardware consists of a chassis with DC motors, a Raspberry Pi 2 running Windows 10 IoT Core, a 9‑12 V motor battery, a distance sensor, and power supplies. The motor battery feeds the H‑Bridge driver, while the Raspberry Pi 2 requires a dedicated 5 V source—either a USB PowerBank or a 7805 regulator.

Why Resistors with Ultrasonic Distance Sensor?

The ultrasonic sensor outputs 5 V on its Echo pin, which exceeds the Raspberry Pi 2’s 3.3 V logic level. A voltage divider (R1 = 1 kΩ in series with the Echo pin, R2 = 2 kΩ to ground) reduces the voltage at the GPIO pin to about 3.3 V:

V_out = 5 V × (2 kΩ / (1 kΩ + 2 kΩ)) ≈ 3.33 V

WARNING: Connecting the 5 V Echo pin directly to a Pi GPIO can damage the board. Always use a level shifter or voltage divider.

Final Assembly

Known Issues

Speech Recognition Won’t Work (Build 10586)

Speech recognition fails on IoT devices running Windows IoT Core build 10586.

Solution: Revert to build 10240 until Microsoft releases an update that resolves the issue.

Microphone Problem

Recognition accuracy drops with low‑quality microphones, especially at distances over 1–2 m.

Solution: Use a high‑quality or wireless microphone. If necessary, amplify the signal or consider a noise‑cancelling headset.

Recognizer Processing Delay

Speech recognition introduces a latency of 600–2000 ms. On a fast‑moving robot, this means the robot can travel a noticeable distance before a command such as Stop takes effect.

Solution: No fix is available in current SDK versions. Future releases may reduce the delay.

Pronunciation Difference

Accents and regional pronunciations can affect recognition. Specify the language and region in the SRGS file (e.g., xml:lang="en-GB" for UK English).

Environmental Noise

Background noise can reduce accuracy. While it’s hard to eliminate, using a noise‑cancelling microphone can mitigate the issue.

USB Microphone / USB Sound Card Not Recognized

Starting with build 10531, Windows IoT Core supports generic USB audio devices. If your device uses a proprietary driver, it may not work.

Solution: Try a different USB microphone or sound card that uses a generic driver.

Future Enhancements

Extend the robot with visual feedback—e.g., a green LED lights up for successful commands and a red LED indicates errors. You can also add “listening” and “sleep” states to avoid accidental activations.
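The “listening” and “sleep” states could be sketched as a small gate in front of the command logic. Everything here is an assumption for illustration: the class name ListeningGate, the wake and sleep phrases passed to it, and the requirement that those phrases also be added to the SRGS grammar so the recognizer can return them.

```csharp
using System;

internal sealed class ListeningGate
{
    private readonly string wakePhrase;
    private readonly string sleepPhrase;

    public ListeningGate(string wakePhrase, string sleepPhrase)
    {
        this.wakePhrase = wakePhrase;
        this.sleepPhrase = sleepPhrase;
    }

    // The robot starts asleep and ignores commands until woken.
    public bool IsListening { get; private set; }

    // Returns true when the text should be forwarded as a robot command.
    public bool ShouldExecute(string recognizedText)
    {
        string text = (recognizedText ?? string.Empty).Trim();

        if (string.Equals(text, wakePhrase, StringComparison.OrdinalIgnoreCase))
        {
            IsListening = true;
            return false; // The wake phrase itself is not a motor command.
        }

        if (string.Equals(text, sleepPhrase, StringComparison.OrdinalIgnoreCase))
        {
            IsListening = false;
            return false;
        }

        return IsListening;
    }
}
```

IsListening could also drive a status LED, so the two suggestions in this section fit together naturally.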

Did You Notice?

The animated title showcases a feature not covered in this text. Explore the animation carefully and try to implement the hidden capability.

Good luck!

Source: Speech Controlled Robot
