Podscripter

A Whisper-based tool for creating podcast-ready .srt and .txt transcripts locally using Docker.


Overview

podscripter is a lightweight tool designed to transcribe audio using OpenAI’s Whisper model inside a Docker container. It supports multiple languages with automatic language detection, including English (en), Spanish (es), French (fr), and German (de).

podscripter lets you generate accurate transcriptions locally, which makes it a good fit for platforms like LingQ, where pairing text with audio boosts comprehension.


Features

- Transcribes audio and video locally with OpenAI's Whisper (via faster-whisper) inside Docker
- Automatic language detection, with first-class support for English, Spanish, French, and German
- Plain-text (.txt) and subtitle (.srt) output
- Overlapped chunking with VAD by default, or single-call processing with --single
- Punctuation restoration (Sentence-Transformers) and optional spaCy-based capitalization
- Local model caching, so models are downloaded only once


Quickstart

Minimal setup and a single run:

# Build image (Apple Silicon).
# On Intel Macs or other architectures, remove `--platform linux/arm64`.
docker build --platform linux/arm64 -t podscripter .

# Create cache folders (first time only)
mkdir -p audio-files models/sentence-transformers models/huggingface

# Transcribe one file (TXT output). Replace example.mp3 with your file.
docker run --rm --platform linux/arm64 \
  -v $(pwd):/app \
  -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
  -v $(pwd)/models/huggingface:/root/.cache/huggingface \
  -v $(pwd)/audio-files:/app/audio-files \
  podscripter python3 /app/podscripter.py \
  /app/audio-files/example.mp3 --output_dir /app/audio-files

Notes:

- The first run downloads models (~1-2 GB); later runs reuse the caches mounted above.
- On Intel Macs or other architectures, remove --platform linux/arm64.
- Add --output_format srt to produce subtitles instead of plain text.


Requirements

- Docker (Docker Desktop on macOS or Windows, or Docker Engine on Linux)
- Git
- Roughly 1-2 GB of free disk space for cached models


Installation

1. Install Prerequisites

Make sure you have the following tools installed on your system:

- Docker
- Git

2. Clone the Repository

Open a terminal and run:

  git clone https://github.com/algernon725/podscripter.git
  cd podscripter

3. Set Up Required Folders

Create folders to store audio files and model data:

  mkdir -p audio-files
  mkdir -p models/sentence-transformers models/huggingface

This creates the necessary directory structure for caching models:

- audio-files: input media files and generated transcripts
- models/sentence-transformers: cache for the Sentence-Transformers models used for punctuation restoration
- models/huggingface: cache for Whisper models downloaded from Hugging Face

4. Build the Docker Image

Build the container image that will run the transcription tool:

  docker build --platform linux/arm64 -t podscripter .

💡 If you’re on an Intel Mac or other architecture, remove --platform linux/arm64

5. Start the Docker Container

Run the container and mount the folders you just created:

  docker run --platform linux/arm64 -it \
    -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
    -v $(pwd)/models/huggingface:/root/.cache/huggingface \
    -v $(pwd)/audio-files:/app/audio-files \
    podscripter

This opens an interactive terminal inside the container. You’ll run all transcription commands from here.

💡 If you’re on an Intel Mac or other architecture, remove --platform linux/arm64
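For example, the interactive run on an Intel Mac is the same command without the platform flag:

  docker run -it \
    -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
    -v $(pwd)/models/huggingface:/root/.cache/huggingface \
    -v $(pwd)/audio-files:/app/audio-files \
    podscripter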

Alternative: Use the caching script

  ./docker-run-with-cache.sh

💡 Model Caching: The first run will download models (~1-2 GB). Subsequent runs will use cached models for faster startup.

⚙️ NLP Capitalization: The image enables spaCy-based capitalization by default (NLP_CAPITALIZATION=1). To disable per run, pass -e NLP_CAPITALIZATION=0 to docker run.
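For example, a one-off transcription with the capitalization pass disabled (same mounts as the Quickstart command):

  docker run --rm --platform linux/arm64 \
    -e NLP_CAPITALIZATION=0 \
    -v $(pwd):/app \
    -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
    -v $(pwd)/models/huggingface:/root/.cache/huggingface \
    -v $(pwd)/audio-files:/app/audio-files \
    podscripter python3 /app/podscripter.py \
    /app/audio-files/example.mp3 --output_dir /app/audio-files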

Usage

Basic Usage

From inside the Docker container, run:

python podscripter.py <media_file> --output_dir <output_dir> \
  [--language <code>|auto] [--output_format {txt|srt}] [--single] \
  [--compute-type {auto,int8,int8_float16,int8_float32,float16,float32}] \
  [--beam-size <int>] [--no-vad] [--vad-speech-pad-ms <int>] \
  [--quiet|--verbose]

Example:

To transcribe example.mp3 using default settings (auto-detect language, txt output):

python podscripter.py audio-files/example.mp3 --output_dir audio-files

Example with video file:

To transcribe example.mp4:

python podscripter.py audio-files/example.mp4 --output_dir audio-files

Examples

One example per scenario to keep things concise.

TXT (default, auto-detect language)

python podscripter.py audio-files/example.mp3 --output_dir audio-files

SRT (subtitles)

python podscripter.py audio-files/example.mp3 --output_dir audio-files --output_format srt

Single-call (no manual chunking)

python podscripter.py audio-files/example.mp3 --output_dir audio-files --single

Use --single when your hardware can handle longer files in one call; it preserves the most context across the file. The default mode uses overlapped chunking with VAD.
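These flags can be combined. For example, to force Spanish, emit subtitles, and process the file in a single call:

python podscripter.py audio-files/example.mp3 --output_dir audio-files --language es --output_format srt --single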

Expected output snippets

English (TXT):

Hello everyone, welcome to our show!
Today, we’re going to talk about travel tips.

Spanish (TXT):

Hola a todos, ¡bienvenidos a Españolistos!
Hoy vamos a hablar de algunos consejos de viaje.

Options

Argument             Description
media_file           Path to the audio or video file (e.g. audio-files/example.mp3 or audio-files/example.mp4)
--output_dir         Directory where the transcription file will be saved
--language           Language code. Primary: en, es, fr, de; others are experimental. Default: auto (auto-detect)
--output_format      Output format: txt or srt (default: txt)
--single             Bypass manual chunking and process the full file in one call
--compute-type       Compute type for faster-whisper: auto, int8, int8_float16, int8_float32, float16, float32 (default: auto)
--beam-size          Beam size for decoding (default: 3)
--no-vad             Disable VAD filtering (VAD is enabled by default)
--vad-speech-pad-ms  Padding in milliseconds when VAD is enabled (default: 200)
--quiet / --verbose  Toggle log verbosity (default: --verbose)
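As a sketch of how the tuning flags compose (the values here are illustrative, not recommendations):

python podscripter.py audio-files/example.mp3 --output_dir audio-files \
  --compute-type int8 --beam-size 5 --vad-speech-pad-ms 300 --quiet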

Supported Languages

podscripter supports automatic language detection and manual language selection for the following languages:

Language  Code
English   en
Spanish   es
French    fr
German    de

Note: Whisper can transcribe many additional languages, but only the four listed above have project-level optimization and tests. Other languages are considered experimental.

Optional NLP Capitalization (spaCy)

Punctuation restoration uses Sentence-Transformers. You can optionally enable an NLP capitalization pass (spaCy) that capitalizes named entities and proper nouns for English, Spanish, French, and German.

This pass runs on the CPU and uses the small (“sm”) spaCy models baked into the image, so it requires no extra downloads.

Development

See tests/README.md for details on running tests and using the ad-hoc script tests/test_transcription.py (supports raw dumps with --dump-raw).

Batch Transcription: All Media Files

To transcribe all .mp3 and .mp4 files in the audio-files folder with auto-detection (default), run this from inside the container:

  for f in audio-files/*.mp3 audio-files/*.mp4; do
    [ -e "$f" ] || continue  # skip the literal pattern when nothing matches
    python podscripter.py "$f" --output_dir audio-files
  done
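The same loop produces subtitles if you add the format flag:

  for f in audio-files/*.mp3 audio-files/*.mp4; do
    [ -e "$f" ] || continue
    python podscripter.py "$f" --output_dir audio-files --output_format srt
  done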

Why Use This?

When learning a new language, especially through podcasts, having accurate, aligned transcriptions is essential for comprehension and retention. Many language learning apps impose monthly transcription limits or rely on cloud-based AI. This tool gives you full control over your data, with no recurring costs, and the power of Whisper, all on your own hardware.

Model Caching

Podscripter caches models locally to avoid repeated downloads. Cache locations are created during Installation → “Set Up Required Folders” and are mounted into the container in the run commands above. In short:

- models/sentence-transformers (host) is mounted at /root/.cache/torch/sentence_transformers (container)
- models/huggingface (host) is mounted at /root/.cache/huggingface (container)

Note: The Sentence-Transformers loader first attempts to load from the local cache and prefers offline use when the cache is present, avoiding network calls. Once the caches are warm, you can set HF_HOME and/or HF_HUB_OFFLINE=1 to run fully offline.
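For example, assuming the caches are already warm, a fully offline run might look like this (pointing HF_HOME at the mounted Hugging Face cache is an assumption, not something the image requires):

docker run --rm --platform linux/arm64 \
  -e HF_HUB_OFFLINE=1 \
  -e HF_HOME=/root/.cache/huggingface \
  -v $(pwd):/app \
  -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
  -v $(pwd)/models/huggingface:/root/.cache/huggingface \
  -v $(pwd)/audio-files:/app/audio-files \
  podscripter python3 /app/podscripter.py \
  /app/audio-files/example.mp3 --output_dir /app/audio-files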

To clear cache and re-download models:

rm -rf models/sentence-transformers/* models/huggingface/*

Output

Transcriptions are saved as sentence-separated .txt files or time-coded .srt subtitles in the directory given by --output_dir.


Testing

Run the test suite inside Docker with caches mounted. See tests/README.md for details.

Quick run (default selection):

docker run --rm --platform linux/arm64 \
  -e NLP_CAPITALIZATION=1 \
  -v $(pwd):/app \
  -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
  -v $(pwd)/models/huggingface:/root/.cache/huggingface \
  -v $(pwd)/audio-files:/app/audio-files \
  podscripter python3 /app/tests/run_all_tests.py

Optional groups via env flags: RUN_ALL=1, RUN_MULTILINGUAL=1, RUN_TRANSCRIPTION=1, RUN_DEBUG=1.
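For example, to also run the multilingual group:

docker run --rm --platform linux/arm64 \
  -e NLP_CAPITALIZATION=1 \
  -e RUN_MULTILINGUAL=1 \
  -v $(pwd):/app \
  -v $(pwd)/models/sentence-transformers:/root/.cache/torch/sentence_transformers \
  -v $(pwd)/models/huggingface:/root/.cache/huggingface \
  -v $(pwd)/audio-files:/app/audio-files \
  podscripter python3 /app/tests/run_all_tests.py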


Troubleshooting