Parakeet-API: High-Performance Local Speech Transcription Service
The parakeet-api project is a local speech transcription service based on the NVIDIA Parakeet-tdt-0.6b model. It provides an interface compatible with the OpenAI API and a clean Web UI, allowing you to easily and quickly convert any audio or video file into high-precision SRT subtitles. It is also compatible with pyVideoTrans v3.72+.
Project Open Source Address: https://github.com/jianchang512/parakeet-api
Windows All-in-One Package Download:
Download Link: https://pan.baidu.com/s/1cCUyLw93hrn40eFP_qV6Dw?pwd=1234
Usage: After extracting, double-click 启动.bat (Start.bat). Wait until the interface shown below appears and the browser opens automatically, indicating a successful launch.
Successful Launch Interface

Usage with pyVideoTrans
Parakeet-API can be seamlessly integrated with the video translation tool pyVideoTrans (version v3.72 and above).

- Ensure your `parakeet-api` service is running locally.
- Open the `pyVideoTrans` software.
- In the menu bar, select Speech Recognition(R) -> Nvidia parakeet-tdt.
- In the pop-up configuration window, set the "http address" to: `http://127.0.0.1:5092/v1`
- Click "Save", and you can start using it.
Source Code Deployment
🛠️ Installation and Configuration Guide
This project supports Windows, macOS, and Linux. Please follow the steps below for installation and configuration.
Step 0: Set up Python 3.10 Environment
If you don't have Python 3 installed locally, please install it following this tutorial: https://pvt9.com/_posts/pythoninstall
Step 1: Prepare FFmpeg
This project uses ffmpeg for audio/video format preprocessing.
Windows (Recommended):
- Download from the FFmpeg GitHub Repository. After extracting, you will get `ffmpeg.exe`.
- Place the downloaded `ffmpeg.exe` file directly in the root directory of this project (at the same level as the `app.py` file). The program will automatically detect and use it; no environment variable configuration is needed.
macOS (using Homebrew):
```bash
brew install ffmpeg
```

Linux (Debian/Ubuntu):

```bash
sudo apt update && sudo apt install ffmpeg
```
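Before moving on, you can confirm that ffmpeg will be found. The following is a minimal sketch, not the project's actual detection code in `app.py`: it simply assumes that on Windows `ffmpeg.exe` sits in the project root, and that on other systems `ffmpeg` is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Prefer an ffmpeg.exe placed in the project root (Windows); otherwise fall back to PATH.
local_ffmpeg = Path(__file__).parent / "ffmpeg.exe"
ffmpeg_path = str(local_ffmpeg) if local_ffmpeg.exists() else shutil.which("ffmpeg")

if ffmpeg_path is None:
    raise SystemExit("ffmpeg not found: place ffmpeg.exe in the project root or install it system-wide")

# Print the version banner to confirm the binary actually runs.
subprocess.run([ffmpeg_path, "-version"], check=True)
```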
Step 2: Create Python Virtual Environment and Install Dependencies
Download or clone this project's code to your local computer (ideally into a folder whose path contains only English letters or digits, on a non-system drive).
Open a terminal or command-line tool and navigate to the project root directory (on Windows, simply type `cmd` in the folder's address bar and press Enter).
Create a virtual environment:
```bash
python -m venv venv
```

Activate the virtual environment:

- Windows (CMD/PowerShell): `.\venv\Scripts\activate`
- macOS / Linux (Bash/Zsh): `source venv/bin/activate`
Install dependencies:
If you do NOT have an NVIDIA GPU (CPU only):

```bash
pip install -r requirements.txt
```

If you have an NVIDIA GPU (using GPU acceleration):

a. Ensure you have the latest NVIDIA Driver and the corresponding CUDA Toolkit installed.

b. Uninstall any existing old PyTorch versions:

```bash
pip uninstall -y torch
```

c. Install PyTorch matching your CUDA version (using CUDA 12.6 as an example):

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu126
```
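After installing the GPU build, it is worth checking that PyTorch can actually see your GPU before starting the service. This uses only the standard PyTorch API, nothing project-specific:

```python
import torch

# True means the CUDA build of PyTorch is installed and a compatible GPU/driver was found.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```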
Step 3: Start the Service
In the terminal with the virtual environment activated, run the following command:
```bash
python app.py
```

You will see prompts indicating the service is starting. The first run will download the model (approx. 1.2 GB), so please be patient.
If a number of warnings appear during startup, you can safely ignore them.
Successful Launch Interface
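Once the terminal shows that the service is running, you can confirm it is reachable from another terminal. The snippet below is a minimal sketch that simply requests the Web UI page; it assumes the default port 5092 used throughout this guide and that the `requests` package is installed.

```python
import requests

# The Web UI is served at the service root; a 200 response means the server is up.
resp = requests.get("http://127.0.0.1:5092", timeout=5)
print("Service reachable, HTTP status:", resp.status_code)
```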

🚀 How to Use
Method 1: Using the Web Interface
- Open in your browser: http://127.0.0.1:5092
- Drag and drop or click to upload your audio/video file.
- Click "Start Transcription", wait for processing to complete, and you can view and download the SRT subtitle below.

Method 2: API Call (Python Example)
You can easily call this service using the openai library.
```python
from openai import OpenAI

# Point the client at the local parakeet-api service; any non-empty API key is accepted.
client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",
)

# Upload an audio file and request SRT-formatted subtitles.
with open("your_audio.mp3", "rb") as audio_file:
    srt_result = client.audio.transcriptions.create(
        model="parakeet",
        file=audio_file,
        response_format="srt"
    )

print(srt_result)
```
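If you want to keep the result, you can write the returned SRT text straight to a subtitle file. This is a small follow-up to the example above; the exact return type depends on your openai client version, so the `str()` call is a defensive assumption rather than part of the project's API.

```python
# Save the SRT text next to the audio file (str() guards against client versions
# that wrap the plain-text response in an object).
with open("your_audio.srt", "w", encoding="utf-8") as srt_file:
    srt_file.write(str(srt_result))
```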