Speech Recognition to Text Tool
Speech Recognition to Text Tool Open Source Address
This is an offline, locally run speech-to-text tool based on the openai-whisper open-source model. It recognizes human speech in video or audio files and converts it to text, which it can output as JSON, as SRT subtitles with timestamps, or as plain text. It can serve as a self-deployed alternative to OpenAI's speech recognition API or Baidu Speech Recognition, with accuracy roughly equivalent to the official OpenAI API.
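To make the SRT output format concrete, here is a minimal sketch of how recognized segments can be rendered as timestamped subtitle cues. It assumes each segment is a dict with start and end times in seconds plus a text field, which is the shape openai-whisper returns in the segments list of a transcription result. The function names format_timestamp and segments_to_srt are hypothetical and are not taken from this tool's source.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each cue: sequence number, time range, then the spoken text.
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)
```

Each cue consists of a sequence number, a time range, and the spoken text, so a subtitle player can show every line at the moment it is spoken.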
After deployment or download, double-click start.exe to automatically open the local web page in your browser. Drag and drop or click to select the audio/video file to recognize, then choose the spoken language, the output text format, and the model to use (the base model is built-in). Click "Start Recognition"; once recognition is complete, the result is shown on the same web page in the selected format.
The entire process requires no internet connection, runs completely locally, and can be deployed on an intranet.
The openai-whisper open-source model comes in base/small/medium/large/large-v3 variants. The base model is built-in. From base to large-v3, recognition accuracy improves, but so does the demand on computer resources. You can download other models as needed and place them in the models directory.
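As a rough illustration of the models directory convention, the sketch below lists which of the variants named above have a model file present. It assumes that each model is stored as NAME.pt (for example base.pt); the exact file names are an assumption, so check them against the archive you download. The helper available_models is hypothetical and not part of this tool.

```python
from pathlib import Path

# Variants named in the text above, smallest to largest.
KNOWN_MODELS = ["base", "small", "medium", "large", "large-v3"]


def available_models(models_dir="models"):
    """Return known model names that have a matching .pt file in models_dir."""
    root = Path(models_dir)
    # Assumed naming scheme: <variant>.pt directly inside the directory.
    return [name for name in KNOWN_MODELS if (root / f"{name}.pt").is_file()]
```

Calling available_models() from the project root shows which models can be selected without a further download.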
Pre-compiled Windows Version Usage / Linux and Mac Source Code Deployment
1. Open the Releases page and download the pre-compiled files.
2. After downloading, extract the files to a location, e.g., E:/stt.
3. Double-click start.exe and wait for the browser window to open automatically.
4. Click the upload area on the page and pick the audio or video file you want to recognize in the pop-up window, or drag and drop the file directly into the upload area. Then select the spoken language, the text output format, and the model to use.
5. Click "Start Recognition Now". After a moment, the recognition result is displayed in the selected format in the text box at the bottom.
If the machine has an NVIDIA GPU and the CUDA environment is correctly configured, CUDA acceleration will be used automatically.
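The automatic choice between GPU and CPU can be sketched as follows. This is not the tool's actual detection code, which is not shown here. It is a minimal illustration that assumes PyTorch (a dependency of openai-whisper) is the component reporting CUDA availability. The function name pick_device is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # installed as a dependency of openai-whisper
    except ImportError:
        # Without PyTorch there is nothing to accelerate; fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Recognition would run on:", pick_device())
```

If this reports cpu on a machine with an NVIDIA GPU, the CUDA environment or the installed PyTorch build is the likely place to look.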
Source Code Deployment (Linux/Mac/Windows)
Requires Python 3.9 to 3.11.
1. Create an empty directory, e.g., E:/stt. Open a command prompt window in this directory by typing cmd in the address bar and pressing Enter.
2. Use git to pull the source code into the current directory:
   git clone git@github.com:jianchang512/stt.git .
3. Create a virtual environment:
   python -m venv venv
4. Activate the environment.
   On Windows: %cd%/venv/scripts/activate
   On Linux and Mac: source ./venv/bin/activate
5. Install dependencies:
   pip install -r requirements.txt
   If you encounter version conflict errors, run:
   pip install -r requirements.txt --no-deps
6. On Windows, extract ffmpeg.7z and place ffmpeg.exe and ffprobe.exe in the project root directory. On Linux and Mac, download the matching version of ffmpeg from the official ffmpeg website, extract it, and place the ffmpeg and ffprobe binaries in the project root directory.
7. Download the model archive for the model you need, then place the xx.pt file from the archive into the models folder in the project root directory.
8. Run python start.py and wait for the local browser window to open automatically.
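
Before running start.py, the prerequisites from the steps above can be verified with a small script. The following is a sketch under stated assumptions: the helper preflight is hypothetical, and it assumes that the ffmpeg and ffprobe binaries sit directly in the project root and that at least one .pt file is present in the models folder, as the steps describe.

```python
import os
import sys
from pathlib import Path


def preflight(root=".", version=None):
    """Return a list of problems found; an empty list means ready to run."""
    root = Path(root)
    version = tuple(version or sys.version_info[:2])
    problems = []
    # Supported interpreter range stated above: Python 3.9 to 3.11.
    if not (3, 9) <= version <= (3, 11):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.9 to 3.11")
    # Windows binaries carry an .exe suffix; Linux and Mac binaries do not.
    suffix = ".exe" if os.name == "nt" else ""
    for tool in ("ffmpeg", "ffprobe"):
        if not (root / f"{tool}{suffix}").is_file():
            problems.append(f"{tool}{suffix} not found in project root")
    models = root / "models"
    if not (models.is_dir() and any(models.glob("*.pt"))):
        problems.append("no .pt model file in the models folder")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("Missing prerequisite:", problem)
```

Run from the project root, it prints one line per missing prerequisite and nothing when the checkout is ready.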
