Running Whisper ASR on Android Phone/Tablet with Termux

Community Article Published August 27, 2025

This article demonstrates automatic speech recognition (ASR) on an Android Samsung tablet using OpenAI's Whisper model. The model takes an audio file as input: you can record one through the microphone and then let the ASR model transcribe it. The same process works on any Android phone.


This guide shows how to install Termux, build Whisper (ggml / whisper.cpp), record audio, and transcribe it, all locally on your Android device.


  1. Install Termux

Install Termux from F-Droid (not the Play Store): 🔗 https://f-droid.org/packages/com.termux/

Open Termux once to finish setup.


  2. Install dependencies & build Whisper

Update & basic tools

pkg update -y && pkg upgrade -y
pkg install -y git cmake clang make ffmpeg curl
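If a later step fails with "command not found", it helps to confirm that every tool from the pkg install line is actually on PATH. The sketch below is a hypothetical helper (missing_tools is not part of Termux); it relies only on the shell built-in command -v:

```bash
#!/usr/bin/env bash
# missing_tools TOOL...: print every TOOL that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
# Hypothetical helper for illustration; not part of Termux.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Usage: check the packages installed by the pkg command above.
# missing_tools git cmake clang make ffmpeg curl && echo "all tools present"
```

Any tool it reports can be reinstalled with pkg install before moving on.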

Clone whisper.cpp

cd ~
git clone --depth 1 https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp

Download a model (base.en is light; small.en, medium.en and large-v2 are more accurate)

bash ./models/download-ggml-model.sh base.en

Build without OpenMP (stable on Termux)

cmake -S . -B build -DGGML_OPENMP=OFF
cmake --build build -j"$(nproc)"
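After the build finishes, a quick sanity check can confirm that the whisper-cli binary and the downloaded model are where the later commands expect them. This is a minimal sketch with a hypothetical function name, check_build; the paths simply follow the commands in this guide:

```bash
#!/usr/bin/env bash
# check_build [DIR] [MODEL]: verify that a whisper.cpp checkout in DIR
# (default: current directory) has a built whisper-cli binary and the
# ggml model file for MODEL (default: base.en). Hypothetical helper.
check_build() {
  local dir=${1:-.} model=${2:-base.en} rc=0
  if [ -x "$dir/build/bin/whisper-cli" ]; then
    echo "ok: build/bin/whisper-cli"
  else
    echo "missing: build/bin/whisper-cli (re-run the cmake commands)"
    rc=1
  fi
  if [ -s "$dir/models/ggml-$model.bin" ]; then
    echo "ok: models/ggml-$model.bin"
  else
    echo "missing: models/ggml-$model.bin (run download-ggml-model.sh $model)"
    rc=1
  fi
  return "$rc"
}

# Usage, from inside ~/whisper.cpp:
# check_build . base.en
```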


  3. Download a WAV file & test transcription

Download a test WAV

curl -L -o demo.wav https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.wav

Transcribe with whisper.cpp

./build/bin/whisper-cli -m models/ggml-base.en.bin -f demo.wav -l en -otxt -of demo
cat demo.txt
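The same whisper-cli call is repeated throughout this guide with only the model, input and output names changing, so it can be wrapped in a small shell function. A sketch, assuming the directory layout above; transcribe and the WHISPER_CLI override variable are hypothetical names, not part of whisper.cpp. An optional fourth argument sets the language passed to -l (for example auto, whisper-cli's auto-detect setting, with the multilingual large-v2 model):

```bash
#!/usr/bin/env bash
# transcribe MODEL INPUT_WAV OUT_PREFIX [LANG]: run whisper-cli with the
# flags used in this guide, then print OUT_PREFIX.txt.
# Hypothetical helper. WHISPER_CLI is a hypothetical override for the
# binary path; it defaults to the CMake build output used above.
transcribe() {
  local model=$1 input=$2 prefix=$3 lang=${4:-en}
  local cli=${WHISPER_CLI:-./build/bin/whisper-cli}
  local model_file="models/ggml-$model.bin"
  if [ ! -f "$model_file" ]; then
    echo "model not found: $model_file" >&2
    return 1
  fi
  # -otxt writes plain text; -of sets the output path without extension.
  "$cli" -m "$model_file" -f "$input" -l "$lang" -otxt -of "$prefix" || return 1
  cat "$prefix.txt"
}

# Usage, from inside ~/whisper.cpp (same as the two commands above):
# transcribe base.en demo.wav demo
# transcribe large-v2 mic16.wav mic_large auto
```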


  4. Install Termux:API for microphone recording

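Install the Termux:API app from F-Droid (the same source as Termux itself, since both apps must be signed with the same key): 🔗 https://f-droid.org/packages/com.termux.api/

Grant the Termux:API app the Microphone permission in Android Settings.

Install the matching command-line package inside Termux: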

pkg install -y termux-api


  5. Record audio with the microphone

Start recording (in one Termux session)

termux-microphone-record -f mic_raw.wav -l 60

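👉 Records for up to 60 seconds; change the number after -l to set a different limit in seconds. Despite the .wav name, Termux:API saves compressed audio (AAC by default) rather than PCM WAV, so the ffmpeg step below is what turns it into a real WAV file for Whisper.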

Stop recording (in another Termux session)

termux-microphone-record -q
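Instead of using two sessions, a fixed-length recording can also be scripted from one session. This sketch assumes that termux-microphone-record returns as soon as recording starts while the recording continues in the background; if your version blocks instead, the function still works but takes about twice as long. record_clip and the REC override are hypothetical names:

```bash
#!/usr/bin/env bash
# record_clip FILE SECONDS: record SECONDS of microphone audio into FILE
# from a single Termux session. Hypothetical helper.
# Assumption: termux-microphone-record returns once recording starts and
# keeps recording in the background. REC is a hypothetical override.
record_clip() {
  local file=$1 secs=$2
  local rec=${REC:-termux-microphone-record}
  "$rec" -f "$file" -l "$secs" || return 1
  sleep "$secs"
  # The -l limit should already have stopped the recording; -q makes sure.
  "$rec" -q >/dev/null 2>&1 || true
  # Brief pause so the file is fully written before ffmpeg reads it (assumed).
  sleep 1
}

# Usage:
# record_clip mic_raw.wav 10
```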


  6. Transcribe your microphone audio

Convert to 16 kHz mono (Whisper format)

ffmpeg -y -loglevel error -i mic_raw.wav -ar 16000 -ac 1 mic16.wav
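To confirm the conversion worked before running Whisper, the WAV header can be inspected with od from coreutils. A sketch under two assumptions: the fmt chunk directly follows the 12-byte RIFF header (the usual layout of ffmpeg's WAV output), and the CPU is little-endian (true on Android devices), because od decodes multi-byte values in host byte order. wav_is_16k_mono is a hypothetical name:

```bash
#!/usr/bin/env bash
# wav_is_16k_mono FILE: print the channel count and sample rate from a
# WAV header and succeed only for mono 16000 Hz files.
# Assumptions: the "fmt " chunk directly follows the 12-byte RIFF/WAVE
# header (ffmpeg's usual layout), and the CPU is little-endian, since od
# decodes multi-byte values in host byte order. Hypothetical helper.
wav_is_16k_mono() {
  local file=$1 riff wave channels rate
  riff=$(od -An -c -N 4 "$file" | tr -d ' \n')
  wave=$(od -An -c -j 8 -N 4 "$file" | tr -d ' \n')
  [ "$riff$wave" = "RIFFWAVE" ] || return 1
  channels=$(od -An -t u2 -j 22 -N 2 "$file" | tr -d ' ')   # bytes 22-23
  rate=$(od -An -t u4 -j 24 -N 4 "$file" | tr -d ' ')       # bytes 24-27
  echo "channels=$channels sample_rate=$rate"
  [ "$channels" = 1 ] && [ "$rate" = 16000 ]
}

# Usage:
# wav_is_16k_mono mic16.wav && echo "ready for whisper-cli"
```

Run against mic_raw.wav instead, the check should fail at the RIFF/WAVE test, since that file is not PCM WAV.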

Transcribe

./build/bin/whisper-cli -m models/ggml-base.en.bin -f mic16.wav -l en -otxt -of mic
cat mic.txt


  7. Optional: Upgrade to larger Whisper models

If your phone or tablet has enough free RAM (2–4 GB), you can run larger models for better accuracy.
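To pick one of the models listed in this section automatically, the available memory can be read from /proc/meminfo. The thresholds in this sketch are rough memory figures (small ~852 MB, medium ~2.1 GB, large ~3.9 GB) taken as an assumption from the whisper.cpp README; actual usage depends on the whisper.cpp version and build options, so treat the result as a hint. suggest_model is a hypothetical name, and it assumes /proc/meminfo is readable from Termux:

```bash
#!/usr/bin/env bash
# suggest_model [AVAILABLE_KB]: print the largest model from this guide
# whose approximate memory need fits into the available RAM.
# Without an argument, MemAvailable is read from /proc/meminfo (assumed
# readable from Termux). Thresholds in MB are rough figures assumed from
# the whisper.cpp README: small ~852, medium ~2100, large ~3900.
# Hypothetical helper.
suggest_model() {
  local kb=${1:-$(sed -n 's/^MemAvailable:[[:space:]]*\([0-9]*\).*/\1/p' /proc/meminfo)}
  local mb=$(( ${kb:-0} / 1024 ))
  if   [ "$mb" -ge 3900 ]; then echo large-v2
  elif [ "$mb" -ge 2100 ]; then echo medium.en
  elif [ "$mb" -ge 852 ];  then echo small.en
  else echo base.en
  fi
}

# Usage:
# bash ./models/download-ggml-model.sh "$(suggest_model)"
```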

Small English-only (better than base.en, still fast)

bash ./models/download-ggml-model.sh small.en

Medium English-only (769M parameters, roughly 1.5 GB download, high accuracy)

bash ./models/download-ggml-model.sh medium.en

Large-v2 multilingual (1550M parameters, roughly 3 GB download; the most accurate of these, and the slowest)

bash ./models/download-ggml-model.sh large-v2

Use them the same way by changing the -m option. Example with small.en:

./build/bin/whisper-cli -m models/ggml-small.en.bin -f mic16.wav -l en -otxt -of mic_small
cat mic_small.txt

For medium:

./build/bin/whisper-cli -m models/ggml-medium.en.bin -f mic16.wav -l en -otxt -of mic_med
cat mic_med.txt

For large:

./build/bin/whisper-cli -m models/ggml-large-v2.bin -f mic16.wav -l en -otxt -of mic_large
cat mic_large.txt
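To see the speed/accuracy trade-off on your own device, the same recording can be run through each downloaded model in turn. A minimal sketch: compare_models and the WHISPER_CLI override are hypothetical names, timing uses whole seconds from date +%s, and models that have not been downloaded are skipped:

```bash
#!/usr/bin/env bash
# compare_models INPUT_WAV MODEL...: transcribe INPUT_WAV with each
# downloaded model, report wall-clock seconds, and print each transcript.
# Hypothetical helper. WHISPER_CLI is a hypothetical override for the
# binary path (default: ./build/bin/whisper-cli).
compare_models() {
  local input=$1; shift
  local cli=${WHISPER_CLI:-./build/bin/whisper-cli}
  local model file start end
  for model in "$@"; do
    file="models/ggml-$model.bin"
    if [ ! -f "$file" ]; then
      echo "skip $model (not downloaded)"
      continue
    fi
    start=$(date +%s)
    "$cli" -m "$file" -f "$input" -l en -otxt -of "cmp_$model" >/dev/null 2>&1 \
      || { echo "$model failed"; continue; }
    end=$(date +%s)
    echo "== $model: $((end - start)) s"
    cat "cmp_$model.txt"
  done
}

# Usage, from inside ~/whisper.cpp:
# compare_models mic16.wav base.en small.en medium.en large-v2
```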


✅ Summary

You now have a complete offline automatic speech recognition setup on an Android phone or tablet.

Base models are light and fast.

Small/Medium/Large models trade speed for better accuracy.

You can record live speech, transcribe WAV files, and even process long recordings, all without an internet connection once the models are downloaded.

This shows that modern ASR can run entirely on small devices, turning phones and tablets into standalone speech recognition machines.

