Running Whisper ASR on Android Phone/Tablet with Termux
Automatic speech recognition (ASR) on an Android Samsung tablet, using the Whisper model from OpenAI. Whisper accepts an audio file, so you can record audio through the microphone and then let the ASR model transcribe it. The same process works on any Android phone.
This guide shows how to install Termux, build Whisper (ggml / whisper.cpp), record audio, and transcribe it, all locally on your Android device.
Install Termux
Install Termux from F-Droid (not Play Store): https://f-droid.org/packages/com.termux/
Open Termux once to finish setup.
- Install dependencies & build Whisper
Update & basic tools
pkg update -y && pkg upgrade -y
pkg install -y git cmake clang make ffmpeg curl
Clone whisper.cpp
cd ~
git clone --depth 1 https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
Download a model (base.en is light, small.en/medium.en/large-v2 are more accurate)
bash ./models/download-ggml-model.sh base.en
Build without OpenMP (stable on Termux)
cmake -S . -B build -DGGML_OPENMP=OFF
cmake --build build -j"$(nproc)"
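If the build fails, a first thing to check is whether every tool from the install step is actually on the PATH. The snippet below is a minimal sketch of such a check; the function name missing_tools is made up for this example and is not part of Termux or whisper.cpp.

```shell
# missing_tools NAME...: print each command name that is not found on PATH.
# (Hypothetical helper for this guide.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools installed in the "Update & basic tools" step.
missing=$(missing_tools git cmake clang make ffmpeg curl)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
    echo "Try: pkg install -y" $missing
fi
```

If nothing is printed, all six tools were found and the problem is more likely in the build itself.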
- Download a WAV file & test transcription
Download a test WAV
curl -L -o demo.wav https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.wav
Transcribe with whisper.cpp
./build/bin/whisper-cli -m models/ggml-base.en.bin -f demo.wav -l en -otxt -of demo
cat demo.txt
Install Termux:API for microphone recording
Install the Termux:API app from F-Droid: https://f-droid.org/packages/com.termux.api/
Give it Microphone permission in Android Settings.
Install the Termux package:
pkg install -y termux-api
- Record audio with microphone
Start recording (in one Termux session)
termux-microphone-record -f mic_raw.wav -l 60
Records up to 60 seconds. (Change the number after -l to set the limit in seconds.)
Stop recording (in another Termux session)
termux-microphone-record -q
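For a fixed-length recording, starting and stopping can also be combined in a single session. The sketch below assumes the start command returns right away while recording continues in the background; if it blocks instead, the function still works, it just takes longer. The function record_for and the REC variable are names invented for this example.

```shell
# REC is the recorder command. Overriding it is only useful for dry runs.
REC=${REC:-termux-microphone-record}

# record_for SECONDS FILE: record for a fixed time, then stop.
# (Hypothetical helper for this guide.)
record_for() {
    "$REC" -f "$2" -l "$1"   # start recording with a time limit
    sleep "$1"               # wait while the recording runs
    "$REC" -q                # stop explicitly in case it is still running
}
```

For example, record_for 10 mic_raw.wav records about ten seconds into mic_raw.wav.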
- Transcribe your microphone audio
Convert to 16 kHz mono (Whisper format)
ffmpeg -y -loglevel error -i mic_raw.wav -ar 16000 -ac 1 mic16.wav
Transcribe
./build/bin/whisper-cli -m models/ggml-base.en.bin -f mic16.wav -l en -otxt -of mic
cat mic.txt
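If the transcript comes out empty or garbled, it can help to confirm that the converted file really is 16 kHz mono. The sketch below reads the two values straight from the WAV header with od. It assumes a canonical header (the "fmt " chunk directly after the RIFF header) and a little-endian CPU, which should hold for ffmpeg output on Android devices but is not guaranteed for every WAV file. The function names are made up for this example.

```shell
# wav_info FILE: print "<channels> <sample_rate>" from a canonical WAV header.
# Assumes channel count at byte offset 22 and sample rate at offset 24,
# read as little-endian integers on a little-endian CPU.
wav_info() {
    channels=$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' \n')
    rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n')
    echo "$channels $rate"
}

# is_whisper_ready FILE: succeed only for 16 kHz mono input.
is_whisper_ready() {
    [ "$(wav_info "$1")" = "1 16000" ]
}
```

A typical use is: is_whisper_ready mic16.wav || echo "Re-run the ffmpeg conversion"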
- Optional: Upgrade to larger Whisper models
If your phone or tablet has enough RAM (2 to 4 GB free), you can run larger models for better accuracy.
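To get a rough idea of which model fits, you can read the free memory from /proc/meminfo and map it to a model name. This is only a sketch: the thresholds below are assumptions that loosely follow the 2 to 4 GB guidance above, not measured limits, and the function names are made up for this example.

```shell
# available_mb: free memory in MB, taken from the MemAvailable line (in kB)
# of /proc/meminfo.
available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# pick_model MB: suggest a model name for the given amount of free memory.
# The thresholds are rough assumptions for illustration only.
pick_model() {
    if   [ "$1" -ge 4000 ]; then echo "large-v2"
    elif [ "$1" -ge 2200 ]; then echo "medium.en"
    elif [ "$1" -ge 900  ]; then echo "small.en"
    else                         echo "base.en"
    fi
}
```

Running pick_model "$(available_mb)" prints a suggestion for the current device. Treat it as a starting point and step down one size if transcription is too slow or the process gets killed.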
Small English-only (better than base.en, still fast)
bash ./models/download-ggml-model.sh small.en
Medium English-only (769M parameters, ~1.5 GB download, high accuracy)
bash ./models/download-ggml-model.sh medium.en
Large-v2 multilingual (~2.9 GB download, best accuracy, slower)
bash ./models/download-ggml-model.sh large-v2
Use them the same way by changing the -m option. Example with small.en:
./build/bin/whisper-cli -m models/ggml-small.en.bin -f mic16.wav -l en -otxt -of mic_small
cat mic_small.txt
For medium:
./build/bin/whisper-cli -m models/ggml-medium.en.bin -f mic16.wav -l en -otxt -of mic_med
cat mic_med.txt
For large:
./build/bin/whisper-cli -m models/ggml-large-v2.bin -f mic16.wav -l en -otxt -of mic_large
cat mic_large.txt
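Since the commands for each model differ only in the model file and the output name, they can be wrapped in a small function. The sketch below uses the same whisper-cli options as above and must be run from the whisper.cpp directory. The helper names and the output naming scheme (WAV name plus model name) are choices made for this example.

```shell
# model_file MODEL: path of a downloaded ggml model,
# e.g. small.en -> models/ggml-small.en.bin
model_file() { echo "models/ggml-$1.bin"; }

# out_prefix MODEL WAV: output name without extension,
# e.g. small.en + mic16.wav -> mic16_small.en
out_prefix() { echo "$(basename "$2" .wav)_$1"; }

# transcribe MODEL WAV: run whisper-cli and print the transcript on success.
transcribe() {
    ./build/bin/whisper-cli -m "$(model_file "$1")" -f "$2" -l en -otxt \
        -of "$(out_prefix "$1" "$2")" &&
        cat "$(out_prefix "$1" "$2").txt"
}
```

With this in place, transcribe small.en mic16.wav writes mic16_small.en.txt and prints it, which makes it easy to compare models on the same recording.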
Summary
You now have a complete offline automatic speech recognition setup on an Android phone or tablet.
Base models are light and fast.
Small/Medium/Large models trade speed for better accuracy.
You can record live speech, transcribe WAV files, and even process long recordings, all without internet.
This shows that modern ASR can run directly on small devices: phones and tablets can serve as standalone speech recognition machines.