This is the story of getting the omega-office voice satellite from breadboard to working wake word detection. If you’re here because you’ve got an ESP32-S3 AI Smart Speaker board, ESPHome, and a screen full of ESP_ERR_TIMEOUT errors — you’re in the right place.

The hardware

Waveshare ESP32-S3 AI Smart Speaker Development Board. Dual microphones, ES7210 ADC, ES8311 DAC, seven WS2812 RGB LEDs, 8MB PSRAM. Decent hardware, but there is not much current ESPHome documentation for it, and the reference configs don’t quite work on modern Home Assistant (HA) versions.

GPIO assignments, since you’ll need them:

  • I2C SDA/SCL: 11/10
  • MCLK/BCLK/LRCLK: 12/13/14
  • Mic DIN: 15
  • Speaker DOUT: 16
  • LED strip: 38
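
If you’d rather keep the pin map in one place, here is a minimal sketch of the same assignments as ESPHome substitutions. The substitution names are my own invention, not anything the board or ESPHome defines; you would reference them elsewhere in the config as ${i2c_sda_pin} and so on.

```yaml
# Pin map for the Waveshare ESP32-S3 AI Smart Speaker board.
# Key names are hypothetical; the GPIO numbers are the ones listed above.
substitutions:
  i2c_sda_pin: GPIO11
  i2c_scl_pin: GPIO10
  i2s_mclk_pin: GPIO12
  i2s_bclk_pin: GPIO13
  i2s_lrclk_pin: GPIO14
  mic_din_pin: GPIO15
  speaker_dout_pin: GPIO16
  led_strip_pin: GPIO38
```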

The first mistake: trusting the reference config

The most referenced community config for this board uses a two-bus I2S topology — one bus for the microphone, one for the speaker — with the ESP32-S3 in secondary/slave mode. On paper it’s fine. In practice, on ESPHome 2026.x, it produces persistent ESP_ERR_TIMEOUT errors on mic reads.

The errors are consistent enough that you’ll think something is wrong with the hardware. It isn’t. The two-bus secondary topology simply doesn’t behave correctly on this board with current ESPHome. Move on.

The fix: single shared bus, ESP32-S3 as primary

The working topology is a single i2s_shared bus instance with the ESP32-S3 driving all clocks — MCLK, BCLK, and LRCLK. Both the microphone and speaker components reference the same bus. No allow_other_uses, no force_master, no mclk_multiple on the DAC.

Use the official built-in ES8311 component, not the Dan333 external component. The built-in handles primary mode correctly. The external component introduces the secondary/slave behaviour that causes the timeout errors.
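
For reference, a minimal sketch of that topology looks roughly like this. It is not my full config: the IDs es8311_dac, board_mic, and board_speaker are placeholders, sample rate and channel settings are left at their defaults, and any ES7210 codec setup is omitted. The i2s_audio component runs in primary mode unless told otherwise, which is why no mode option appears.

```yaml
# Control bus for the codecs
i2c:
  sda: GPIO11
  scl: GPIO10

# One I2S bus; the ESP32-S3 drives MCLK, BCLK and LRCLK
i2s_audio:
  - id: i2s_shared
    i2s_mclk_pin: GPIO12
    i2s_bclk_pin: GPIO13
    i2s_lrclk_pin: GPIO14

# Built-in ES8311 component (placeholder ID)
audio_dac:
  - platform: es8311
    id: es8311_dac

# Microphone and speaker both reference the same bus
microphone:
  - platform: i2s_audio
    id: board_mic            # placeholder ID
    i2s_audio_id: i2s_shared
    i2s_din_pin: GPIO15
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: board_speaker        # placeholder ID
    i2s_audio_id: i2s_shared
    i2s_dout_pin: GPIO16
    dac_type: external
    audio_dac: es8311_dac
```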

The second mistake: skipping PSRAM

The board has 8MB PSRAM. ESPHome won’t use it unless you explicitly enable it:

psram:
  mode: octal
  speed: 80MHz

Without this, the API server becomes unstable under the combined memory pressure of WiFi, audio processing, and wake word inference. You’ll see the device connect and immediately drop, or refuse connections on port 6053. Enable PSRAM before you do anything else.

The third gotcha: wake word never starting

Once the hardware was stable, wake word detection still wasn’t firing. The device was online, HA could see it, but nothing happened when I said the wake word.

The cause: micro_wake_word wasn’t starting. The on_client_connected trigger calls micro_wake_word.start, but that only works if you’ve explicitly linked the component with an ID:

micro_wake_word:
  id: mww
  # ... rest of the micro_wake_word config

voice_assistant:
  micro_wake_word: mww
  # ... rest of the voice_assistant config

Without the explicit ID linkage, the component exists but the voice assistant doesn’t know to start it. This one is easy to miss in the docs. You either find it in a working config or you spend a while staring at logs wondering why nothing is happening.

Boot sequencing

Two more timing issues worth knowing about:

The LED crash. On first flash (not on restart), LED automations fire before the API server finishes initialising. The fix is a 500ms boot delay on the LED automation. This one is particularly annoying because it works fine on restart, so you flash, think it’s broken, power cycle, and it works — and you never quite identify the cause.
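
A sketch of what that delay can look like, assuming the LED automation lives in on_boot. The light ID led_ring, the colour, the boot priority, and the GRB colour order are illustrative assumptions rather than values from my config, and I’m assuming the device name matches the satellite’s name.

```yaml
esphome:
  name: omega-office         # assumed device name
  on_boot:
    priority: -100           # assumption: run late in the boot sequence
    then:
      - delay: 500ms         # give the API server time to finish initialising
      - light.turn_on:
          id: led_ring
          brightness: 50%
          red: 0%
          green: 0%
          blue: 100%

light:
  - platform: esp32_rmt_led_strip
    id: led_ring             # placeholder ID
    pin: GPIO38
    num_leds: 7
    chipset: WS2812
    rgb_order: GRB           # assumption: typical for WS2812, check your board
```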

The cold boot detection issue. Starting micro_wake_word immediately on boot produces unreliable detection. A 30-second delayed start via on_client_connected gives the I2S bus time to stabilise. Skip this and you’ll get inconsistent wake word response on cold boots.
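
Put together with the ID linkage, the delayed start can be sketched like this. The microphone ID board_mic is a placeholder, the model is referenced by its short name, and the on_client_disconnected and on_wake_word_detected handlers are the usual pattern rather than something specific to this board.

```yaml
micro_wake_word:
  id: mww
  microphone: board_mic      # placeholder ID for the I2S microphone
  models:
    - model: okay_nabu
  on_wake_word_detected:
    - voice_assistant.start:
        wake_word: !lambda return wake_word;

voice_assistant:
  microphone: board_mic
  micro_wake_word: mww
  on_client_connected:
    - delay: 30s             # let the I2S bus settle after a cold boot
    - micro_wake_word.start:
  on_client_disconnected:
    - micro_wake_word.stop:
```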

The openWakeWord detour

Before landing on on-device micro_wake_word, I tried server-side wake word detection via the openWakeWord HA add-on. This used to be the recommended path for ESPHome satellites.

It no longer works reliably. openWakeWord v2.0.0 removed preload model support, which breaks the model negotiation between the add-on and ESPHome satellites on current HA versions. You’ll get no_wake_word in the entity state and an empty wake word dropdown regardless of what models you install.

On-device micro_wake_word is the correct path for current ESPHome versions. It’s also architecturally better — audio stays on the chip until the wake word fires.

What’s working

  • ESPHome 2026.5.3
  • micro_wake_word with okay_nabu_20241226.3
  • Single shared I2S bus, primary mode
  • PSRAM enabled, octal, 80MHz
  • Wake word detection confirmed, API stable

Next up: full voice assistant pipeline — VAD, speech-to-text, and response playback on top of this confirmed architecture. That’ll be another post.