Whisper.cpp (PCM Fork)
Whisper.cpp with added stdin/PCM streaming support
Overview
A focused fork of whisper.cpp that adds whisper-stream-pcm: a stdin- and pipe-friendly streaming binary for raw PCM audio. It is useful when audio already comes from another process, service, named pipe, or agent runtime and you do not want SDL microphone capture in the loop.
Key additions
- stdin and pipe input - read from stdin by default, or from a named pipe/file with --input.
- Raw PCM formats - accepts little-endian s16 or f32 PCM.
- No SDL dependency - designed for process pipelines instead of microphone device capture.
- Optional VAD segmentation - use VAD mode for speech bursts, or fixed-step windows for continuous streams.
Input contract
Normalize audio before it reaches the binary: mono, 16 kHz, raw PCM. The tool does not decode compressed audio, parse WAV headers, resample, or mix channels. ffmpeg works well as the normalization step when the source is a file, stream, or device.
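When the audio is produced by your own code rather than ffmpeg, the contract above amounts to packing mono samples at 16 kHz as headerless little-endian integers. The following is a minimal sketch of that packing for the s16 format; the helper names floats_to_s16le and sine_tone are hypothetical and are not part of this fork.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz, the rate required by the input contract


def floats_to_s16le(samples):
    """Pack float samples in [-1.0, 1.0] as little-endian signed 16-bit PCM."""
    # Clip first so out-of-range values cannot overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def sine_tone(freq_hz, duration_s):
    """Generate a mono test tone as a list of floats (illustrative input only)."""
    n = int(SAMPLE_RATE * duration_s)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# One second of mono s16 audio at 16 kHz: 16000 samples * 2 bytes each.
pcm = floats_to_s16le(sine_tone(440.0, 1.0))
```

Writing pcm to sys.stdout.buffer and piping the script into whisper-stream-pcm with --format s16 --sample-rate 16000 would then satisfy the contract. No header is written, which is the point: the binary treats every byte it receives as audio.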
Build target
git clone --branch stream-pcm https://github.com/rmorse/whisper.cpp
cd whisper.cpp
cmake -B build
cmake --build build --config Release
Synced from rmorse/whisper.cpp stream-pcm README
Usage
Stream raw PCM (16 kHz, mono) into the tool (non-VAD):
./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500
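The millisecond flags in the command above translate into sample and byte counts by simple arithmetic, which helps when sizing buffers on the producer side. The sketch below assumes that --step, --length, and --keep are all durations in milliseconds, as in the upstream whisper.cpp stream example; the function names are hypothetical and the fork's internal buffering may differ.

```python
BYTES_PER_SAMPLE = {"s16": 2, "f32": 4}


def ms_to_samples(ms, sample_rate=16000):
    """Convert a duration in milliseconds to a whole number of mono samples."""
    return ms * sample_rate // 1000


def window_sizes(step_ms, length_ms, keep_ms, fmt="s16", sample_rate=16000):
    """Return sample and byte counts for each flag (assumed to be millisecond durations)."""
    bps = BYTES_PER_SAMPLE[fmt]
    sizes = {}
    for name, ms in (("step", step_ms), ("length", length_ms), ("keep", keep_ms)):
        n = ms_to_samples(ms, sample_rate)
        sizes[name] = {"samples": n, "bytes": n * bps}
    return sizes


# Values taken from the non-VAD command line: --step 1000 --length 10000 --keep 500
sizes = window_sizes(step_ms=1000, length_ms=10000, keep_ms=500)
```

For s16 at 16 kHz, a 1000 ms step is 16000 samples or 32000 bytes, so a producer needs to deliver at least that much data per second to keep up with real time.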
Enable VAD-based segmentation (optional, recommended for speech bursts):
./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --vad --vad-probe-ms 200 --vad-silence-ms 800 --vad-pre-roll-ms 300 --length 8000
You can also read from a named pipe (FIFO):
mkfifo /tmp/whisper.pcm
./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --input /tmp/whisper.pcm --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500
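On the producer side, any process that writes raw PCM bytes to the FIFO can feed the binary. The following is a rough sketch of such a writer that sends fixed-size chunks and can pace them at real-time speed; stream_pcm and its parameters are hypothetical names chosen for illustration, and the demo writes to an in-memory buffer so it runs without the binary.

```python
import io
import time


def stream_pcm(pcm, out, chunk_ms=100, sample_rate=16000, bytes_per_sample=2, realtime=True):
    """Write raw PCM to a binary stream in fixed-size chunks; return bytes written."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * bytes_per_sample
    written = 0
    for offset in range(0, len(pcm), chunk_bytes):
        chunk = pcm[offset:offset + chunk_bytes]
        out.write(chunk)
        out.flush()  # push each chunk through so the reader sees it promptly
        written += len(chunk)
        if realtime:
            # Sleep for the duration of audio just written to mimic a live source.
            time.sleep(len(chunk) / (sample_rate * bytes_per_sample))
    return written


# Demo against an in-memory buffer with pacing disabled.
# With the FIFO from above, pass open("/tmp/whisper.pcm", "wb") as `out` instead.
silence = bytes(16000 * 2)  # one second of s16 silence
buf = io.BytesIO()
total = stream_pcm(silence, buf, realtime=False)
```

On POSIX systems, opening a FIFO for writing blocks until a reader has opened the other end, so start whisper-stream-pcm before or alongside the writer.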
Pipe a WAV file through ffmpeg (the optional -re flag paces the input at real-time speed):
ffmpeg -re -i samples/jfk.wav -f s16le -ac 1 -ar 16000 - | \
./build/bin/whisper-stream-pcm -m ./models/ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500
Windows (PowerShell + cmd /c) pipe example:
cmd /c "ffmpeg -re -hide_banner -loglevel error -i samples\jfk.wav -f s16le -ac 1 -ar 16000 - | build-cpu\bin\Release\whisper-stream-pcm.exe -m models\ggml-base.en.bin --format s16 --sample-rate 16000 --step 1000 --length 10000 --keep 500"
Notes
- Input must be raw PCM, mono, 16 kHz. The tool does not resample.
- Supported formats: f32 or s16 (little-endian).
- Use --input - (default) for stdin.
- --step must be > 0 unless --vad is enabled.
- For VAD, --vad-probe-ms should be at least 200 ms; very small probes can fail to trigger.
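Because the binary expects headerless PCM, a quick pre-flight check on the producer side can catch two common mistakes: sending a WAV container as-is, and sending a byte count that does not divide into whole samples. The helpers below are a hypothetical sketch, not part of the tool.

```python
def looks_like_wav(head):
    """Return True if the first 12 bytes form a RIFF/WAVE container header."""
    return len(head) >= 12 and head[0:4] == b"RIFF" and head[8:12] == b"WAVE"


def check_raw_pcm(data, fmt="s16"):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    sample_size = {"s16": 2, "f32": 4}[fmt]
    problems = []
    if looks_like_wav(data[:12]):
        problems.append("data starts with a RIFF/WAVE header; convert to raw PCM first")
    if len(data) % sample_size != 0:
        problems.append("byte length is not a multiple of the %d-byte sample size" % sample_size)
    return problems
```

These checks cannot detect a wrong sample rate or channel count, since raw PCM carries no metadata; those still have to be guaranteed by the normalization step.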
Building
whisper-stream-pcm does not depend on SDL and builds with the default examples:
cmake -B build
cmake --build build --config Release