What is the HuggingFace ASR Speech Recognition Channel?

HuggingFace ASR is a speech recognition channel added in pyVideoTrans v3.91 that runs open-source models hosted on HuggingFace. It covers recognition models for several languages and is intended mainly for non-Chinese speech, such as English, Japanese, Vietnamese, and Thai.

Prerequisites

pyVideoTrans v3.91 or later
On first use, models are downloaded automatically from HuggingFace, so a network connection is required
Users in some regions may need a mirror site or may have to download models manually (see Manual Download Method below)
GPU acceleration significantly improves recognition speed (an NVIDIA GPU is recommended)
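To confirm whether a GPU is actually available before a long job, the sketch below checks for one the way PyTorch-based tools commonly do. It assumes PyTorch may or may not be installed; the helper name pick_device and the float16/int8 compute types (values faster-whisper accepts) are illustrative choices, not pyVideoTrans's own logic.

```python
def pick_device() -> tuple[str, str]:
    """Return (device, compute_type) for inference.

    Hypothetical helper: uses CUDA with float16 when PyTorch reports a usable
    NVIDIA GPU, otherwise falls back to CPU with int8.
    """
    try:
        import torch  # optional; if it is missing, only CPU is assumed
    except ImportError:
        return "cpu", "int8"
    if torch.cuda.is_available():
        return "cuda", "float16"
    return "cpu", "int8"


if __name__ == "__main__":
    device, compute_type = pick_device()
    print(f"device={device} compute_type={compute_type}")
```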

Automatic Download Information

On first use, the software automatically downloads from:

International site: https://huggingface.co
Mirror site: https://hf-mirror.com

Network problems can cause automatic downloads to fail for some users. If this happens, follow the Manual Download Method section below.
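For a scripted download outside the software, here is a minimal sketch that tries the international site first and then the mirror. It assumes the huggingface_hub package is installed and that your version's snapshot_download accepts an endpoint argument (recent releases do). The try-order and the helper name download_with_fallback are assumptions, since this guide does not say how pyVideoTrans chooses between the two sites. With cache_dir set to the models folder, huggingface_hub uses its standard cache layout, which also starts with a models--{org}--{model} folder.

```python
from typing import Callable, Optional

# The two sites listed above; the order tried here is an assumption.
ENDPOINTS = ["https://huggingface.co", "https://hf-mirror.com"]


def download_with_fallback(
    repo_id: str,
    cache_dir: str,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Try each endpoint in turn and return the local path of the first success."""
    if downloader is None:
        # Imported lazily so the fallback logic can be exercised without huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    errors = []
    for endpoint in ENDPOINTS:
        try:
            return downloader(repo_id=repo_id, cache_dir=cache_dir, endpoint=endpoint)
        except Exception as exc:  # timeouts, connection errors, HTTP errors
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("All endpoints failed:\n" + "\n".join(errors))
```

Usage: download_with_fallback("kotoba-tech/kotoba-whisper-v2.0", "models").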

Supported Models and Languages

English Models

Model	Language	Backend	Notes
nvidia/parakeet-ctc-1.1b	English	pipe_asr	NVIDIA product, high accuracy

Japanese Models

Model	Language	Backend	Notes
reazon-research/japanese-wav2vec2-large-rs35kh	Japanese	pipe_asr	wav2vec2 architecture
kotoba-tech/kotoba-whisper-v2.0	Japanese	pipe_asr	Whisper optimized for Japanese
zh-plus/faster-whisper-large-v2-japanese-5k-steps	Japanese	faster_whisper	Supports VAD pre-segmentation
JhonVanced/whisper-large-v3-japanese-4k-steps-ct2	Japanese	faster_whisper	Supports VAD pre-segmentation
jonatasgrosman/wav2vec2-large-xlsr-53-japanese	Japanese	pipe_asr	XLSR-53 (multilingual wav2vec2) fine-tuned for Japanese

Vietnamese Models

Model	Language	Backend	Notes
suzii/vi-whisper-large-v3-turbo-v1	Vietnamese	pipe_asr	Fine-tuned whisper-large-v3-turbo

Thai Models

Model	Language	Backend	Notes
biodatlab/whisper-th-medium	Thai	pipe_asr	whisper-medium fine-tuned for Thai
biodatlab/whisper-th-large-v3	Thai	pipe_asr	whisper-large-v3 fine-tuned for Thai

Model Backend Types

pipe_asr: standard inference backend built on the HuggingFace Transformers pipeline. It is broadly compatible and suits most models.
faster_whisper: inference backend built on faster-whisper. It runs faster and supports whisper_prepare (VAD pre-segmentation), which performs voice activity detection before recognition so that only segments containing speech are transcribed, improving efficiency.
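The two backends correspond to two well-known Python APIs. The sketch below shows roughly how each could be called; it is not pyVideoTrans's code. It assumes transformers (with ffmpeg available for reading audio files) or faster-whisper is installed, uses faster-whisper's own vad_filter option as a stand-in for whisper_prepare, and copies the model-to-backend mapping from the tables above. Whether a particular model (for example, the Parakeet CTC model) loads through pipeline depends on your transformers version.

```python
# Model-to-backend mapping copied from the tables above.
MODEL_BACKENDS = {
    "nvidia/parakeet-ctc-1.1b": "pipe_asr",
    "reazon-research/japanese-wav2vec2-large-rs35kh": "pipe_asr",
    "kotoba-tech/kotoba-whisper-v2.0": "pipe_asr",
    "zh-plus/faster-whisper-large-v2-japanese-5k-steps": "faster_whisper",
    "JhonVanced/whisper-large-v3-japanese-4k-steps-ct2": "faster_whisper",
    "jonatasgrosman/wav2vec2-large-xlsr-53-japanese": "pipe_asr",
    "suzii/vi-whisper-large-v3-turbo-v1": "pipe_asr",
    "biodatlab/whisper-th-medium": "pipe_asr",
    "biodatlab/whisper-th-large-v3": "pipe_asr",
}


def transcribe(audio_path: str, model_id: str) -> str:
    """Hypothetical dispatcher: run one of the two backends on an audio file."""
    backend = MODEL_BACKENDS.get(model_id)
    if backend == "pipe_asr":
        from transformers import pipeline
        # chunk_length_s lets the pipeline handle audio longer than 30 seconds.
        asr = pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)
        return asr(audio_path)["text"]
    if backend == "faster_whisper":
        from faster_whisper import WhisperModel
        model = WhisperModel(model_id)  # accepts a repo ID or a local folder path
        # vad_filter=True runs voice activity detection first and skips non-speech audio.
        segments, _info = model.transcribe(audio_path, vad_filter=True)
        return "".join(seg.text for seg in segments)
    raise ValueError(f"Unknown model: {model_id}")
```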

Manual Download Method

If automatic download fails, you can download model files manually. All models follow the same storage rule:

Inside the models folder next to sp.exe/sp.py, create a folder named models--{org}--{model}, that is, the model ID with / replaced by -- and prefixed with models--
Open the model's download page and download all files
Place the downloaded files in the created folder

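Note: Keep the original file names; do not rename downloaded files. The one exception: if your browser saved a file as xxx(1) because a file with the same name already existed in the download folder, delete the old file first, then rename the new file to remove the (1) suffix.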
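The naming rule and the duplicate-name check can be sketched as two small helpers. The function names are hypothetical, and the regular expression assumes the browser appends (N), optionally preceded by a space, just before the file extension. That is the common pattern but not guaranteed for every browser.

```python
import re
from pathlib import Path

# Matches browser-added suffixes such as "config(1).json" or "model (2).bin".
DUPLICATE_RE = re.compile(r"\s?\(\d+\)(?=\.[^.]+$|$)")


def model_folder_name(repo_id: str) -> str:
    """'org/model' -> 'models--org--model', the folder naming used in this guide."""
    return "models--" + repo_id.replace("/", "--")


def find_renamed_duplicates(folder: Path) -> list[str]:
    """List files in a model folder whose names carry a browser-added (N) suffix."""
    return sorted(
        p.name for p in folder.iterdir() if p.is_file() and DUPLICATE_RE.search(p.name)
    )
```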

nvidia/parakeet-ctc-1.1b (English)

Create folder: models--nvidia--parakeet-ctc-1.1b inside the models folder
Download: https://huggingface.co/nvidia/parakeet-ctc-1.1b/tree/main
Download all files from the page into the folder

reazon-research/japanese-wav2vec2-large-rs35kh (Japanese)

Create folder: models--reazon-research--japanese-wav2vec2-large-rs35kh inside the models folder
Download: https://huggingface.co/reazon-research/japanese-wav2vec2-large-rs35kh/tree/main
Download all files from the page into the folder

kotoba-tech/kotoba-whisper-v2.0 (Japanese)

Create folder: models--kotoba-tech--kotoba-whisper-v2.0 inside the models folder
Download: https://huggingface.co/kotoba-tech/kotoba-whisper-v2.0/tree/main
Download all files from the page into the folder

zh-plus/faster-whisper-large-v2-japanese-5k-steps (Japanese)

Create folder: models--zh-plus--faster-whisper-large-v2-japanese-5k-steps inside the models folder
Download: https://huggingface.co/zh-plus/faster-whisper-large-v2-japanese-5k-steps/tree/main
Download all files from the page into the folder

JhonVanced/whisper-large-v3-japanese-4k-steps-ct2 (Japanese)

Create folder: models--JhonVanced--whisper-large-v3-japanese-4k-steps-ct2 inside the models folder
Download: https://huggingface.co/JhonVanced/whisper-large-v3-japanese-4k-steps-ct2/tree/main
Download all files from the page into the folder

jonatasgrosman/wav2vec2-large-xlsr-53-japanese (Japanese)

Create folder: models--jonatasgrosman--wav2vec2-large-xlsr-53-japanese inside the models folder
Download: https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-japanese/tree/main
Download all files from the page into the folder

suzii/vi-whisper-large-v3-turbo-v1 (Vietnamese)

Create folder: models--suzii--vi-whisper-large-v3-turbo-v1 inside the models folder
Download: https://huggingface.co/suzii/vi-whisper-large-v3-turbo-v1/tree/main
Download all files from the page into the folder

biodatlab/whisper-th-medium (Thai)

Create folder: models--biodatlab--whisper-th-medium inside the models folder
Download: https://huggingface.co/biodatlab/whisper-th-medium/tree/main
Download all files from the page into the folder

biodatlab/whisper-th-large-v3 (Thai)

Create folder: models--biodatlab--whisper-th-large-v3 inside the models folder
Download: https://huggingface.co/biodatlab/whisper-th-large-v3/tree/main
Download all files from the page into the folder

Common Errors and Troubleshooting

Automatic download fails

Cause: Network cannot access HuggingFace or mirror site is unstable
Solution: Use the manual download method above, or configure a proxy and retry

Incorrect model file names

Cause: Browser auto-renames during download (e.g., xxx(1))
Solution: Delete files with the (1) suffix and place correctly named files in the model folder

Empty recognition results or errors

Cause: Incomplete model files or incorrect folder structure
Solution: Verify the model folder contains all necessary files; re-download if needed
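To check a model folder before re-downloading, the sketch below looks for a minimal set of files and flags zero-byte files, which often indicate an interrupted download. The required file sets are assumptions based on the usual layouts of CTranslate2 (faster_whisper) and Transformers (pipe_asr) models; this guide does not list them, and some models need additional files.

```python
from pathlib import Path

# Assumed minimal requirements: each inner list means "at least one of these files".
EXPECTED = {
    "faster_whisper": [["model.bin"], ["config.json"]],
    "pipe_asr": [
        ["config.json"],
        ["model.safetensors", "model.safetensors.index.json",
         "pytorch_model.bin", "pytorch_model.bin.index.json"],
    ],
}


def check_model_folder(folder: Path, backend: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    if not folder.is_dir():
        return [f"folder not found: {folder}"]
    problems = []
    for options in EXPECTED[backend]:
        if not any((folder / name).is_file() for name in options):
            problems.append("missing one of: " + ", ".join(options))
    for f in sorted(folder.iterdir()):
        if f.is_file() and f.stat().st_size == 0:
            problems.append(f"empty file (incomplete download?): {f.name}")
    return problems
```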

Downloading Models for the openai-whisper Channel

Each of these models is a single .pt file. Download the file you need and place it in the models folder next to sp.py/sp.exe. Models with .en in the name are English-only.

tiny Model

Model	Download Link
tiny.en.pt	https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt
tiny.pt	https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt

base Model

Model	Download Link
base.en.pt	https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt
base.pt	https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt

small Model

Model	Download Link
small.en.pt	https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt
small.pt	https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt

medium Model

Model	Download Link
medium.en.pt	https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt
medium.pt	https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt

large Model

Model	Download Link
large-v1.pt	https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt
large-v2.pt	https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt
large-v3.pt	https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt
large-v3-turbo.pt	https://openaipublic.azureedge.net/main/whisper/models/aff26ae408abcba5fbf8813c21e62b0941638c5f6eebfb145be0c9839262a19a/large-v3-turbo.pt
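Each URL above embeds the file's SHA256 hash as the directory just before the file name, and openai-whisper's own loader checks files against that value. The sketch below performs the same check on a manually downloaded file so a truncated or corrupted .pt file can be caught before use; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def expected_sha256(url: str) -> str:
    """The path segment just before the file name is the SHA256 hash."""
    return url.rstrip("/").split("/")[-2]


def verify_whisper_pt(path: Path, url: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the file in chunks (the large models are several GB) and compare."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256(url)
```

Usage: verify_whisper_pt(Path("models/tiny.pt"), url), with url copied from the table row for that file.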

Downloading Models for the faster-whisper Channel

By default, models are automatically downloaded from https://huggingface.co. If download fails, use the manual method below.

Systran Standard Models

Model	Download Link
tiny	https://huggingface.co/Systran/faster-whisper-tiny/tree/main
tiny.en	https://huggingface.co/Systran/faster-whisper-tiny.en/tree/main
base	https://huggingface.co/Systran/faster-whisper-base/tree/main
base.en	https://huggingface.co/Systran/faster-whisper-base.en/tree/main
small	https://huggingface.co/Systran/faster-whisper-small/tree/main
small.en	https://huggingface.co/Systran/faster-whisper-small.en/tree/main
medium	https://huggingface.co/Systran/faster-whisper-medium/tree/main
medium.en	https://huggingface.co/Systran/faster-whisper-medium.en/tree/main
large-v1	https://huggingface.co/Systran/faster-whisper-large-v1/tree/main
large-v2	https://huggingface.co/Systran/faster-whisper-large-v2/tree/main
large-v3	https://huggingface.co/Systran/faster-whisper-large-v3/tree/main

large-v3-turbo Model

Model	Download Link
large-v3-turbo	https://huggingface.co/mobiuslabsgmbh/faster-whisper-large-v3-turbo/tree/main

Distilled (distil) Models

Model	Download Link
distil-small.en	https://huggingface.co/Systran/faster-distil-whisper-small.en/tree/main
distil-medium.en	https://huggingface.co/Systran/faster-distil-whisper-medium.en/tree/main
distil-large-v2	https://huggingface.co/Systran/faster-distil-whisper-large-v2/tree/main
distil-large-v3	https://huggingface.co/Systran/faster-distil-whisper-large-v3/tree/main
distil-large-v3.5	https://huggingface.co/distil-whisper/distil-large-v3.5-ct2/tree/main

Manual Download — General Steps

In the models folder next to sp.exe/sp.py, create a folder named models--{org}--{model} (for example, models--Systran--faster-whisper-large-v3)
Open the corresponding download link above
Download all .json, .bin, and .txt files from the page and place them in the created folder
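The general steps can also be scripted. The sketch below maps the short names from the tables above to their repositories, builds the models--{org}--{model} folder, and downloads only .json, .bin, and .txt files using huggingface_hub's allow_patterns. It assumes huggingface_hub is installed; the helper names are hypothetical, and pyVideoTrans may expect a slightly different layout than local_dir produces.

```python
from pathlib import Path

# Short names -> repository IDs, taken from the tables above.
FASTER_WHISPER_REPOS = {
    **{n: f"Systran/faster-whisper-{n}" for n in [
        "tiny", "tiny.en", "base", "base.en", "small", "small.en",
        "medium", "medium.en", "large-v1", "large-v2", "large-v3"]},
    "large-v3-turbo": "mobiuslabsgmbh/faster-whisper-large-v3-turbo",
    "distil-small.en": "Systran/faster-distil-whisper-small.en",
    "distil-medium.en": "Systran/faster-distil-whisper-medium.en",
    "distil-large-v2": "Systran/faster-distil-whisper-large-v2",
    "distil-large-v3": "Systran/faster-distil-whisper-large-v3",
    "distil-large-v3.5": "distil-whisper/distil-large-v3.5-ct2",
}


def target_folder(models_dir: Path, name: str) -> Path:
    """models/models--{org}--{model} for a short model name."""
    return models_dir / ("models--" + FASTER_WHISPER_REPOS[name].replace("/", "--"))


def fetch(models_dir: Path, name: str) -> Path:
    """Download only the .json, .bin and .txt files into the target folder."""
    from huggingface_hub import snapshot_download  # imported lazily
    folder = target_folder(models_dir, name)
    snapshot_download(
        repo_id=FASTER_WHISPER_REPOS[name],
        local_dir=folder,
        allow_patterns=["*.json", "*.bin", "*.txt"],
    )
    return folder
```

faster-whisper can then load the folder directly, for example WhisperModel(str(fetch(Path("models"), "large-v3"))).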

Downloading M2M100 Translation Model

Download link: https://modelscope.cn/models/himyworld/videotrans/resolve/master/m2m100_12b_model.zip

After extracting, you will get a folder named m2m100_12b. Copy this folder into the models folder next to sp.py/sp.exe.
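Extracting straight into the models folder has the same effect as extracting and then copying. Below is a minimal sketch using Python's standard zipfile module, assuming the archive's top-level entry is the m2m100_12b folder (this guide only says extraction yields that folder); the helper name is hypothetical.

```python
import zipfile
from pathlib import Path


def install_archive(zip_path: Path, models_dir: Path, expected_folder: str) -> Path:
    """Extract a model archive into models_dir and confirm the expected folder appears."""
    models_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(models_dir)
    target = models_dir / expected_folder
    if not target.is_dir():
        raise FileNotFoundError(f"{expected_folder} not found after extracting {zip_path.name}")
    return target
```

Usage: install_archive(Path("m2m100_12b_model.zip"), Path("models"), "m2m100_12b"). The same helper should also fit vits-tts.zip (folder vits) and piper-tts.zip (folder piper) below, under the same assumption about archive layout.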

VITS and Piper-TTS Dubbing Channel Model Downloads

VITS-TTS Dubbing Channel

Voice count: 175 Chinese voices, 109 English voices
Supported languages: Chinese and English only

Model download: https://modelscope.cn/models/himyworld/videotrans/resolve/master/vits-tts.zip

After downloading and extracting, you will see a vits folder. Copy it into the models folder next to sp.exe (or sp.py for source deployment).

Piper-TTS Dubbing Channel

Supported languages: 20 languages
Default voices: to keep the download small, only 1 Chinese voice and 10 English voices are included by default
Additional voices: download voice models for other languages as needed

Model download: https://modelscope.cn/models/himyworld/videotrans/resolve/master/piper-tts.zip

After downloading and extracting, you will see a piper folder. Copy it into the models folder next to sp.exe (or sp.py for source deployment).
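Piper voices normally come as pairs: a <voice>.onnx model plus a matching <voice>.onnx.json config, named like en_US-lessac-medium with the language code first. The sketch below lists which voices in the piper folder are complete, grouped by language. It assumes that naming convention and searches subfolders because this guide does not describe the folder's internal layout; the helper name is hypothetical.

```python
from pathlib import Path


def list_piper_voices(piper_dir: Path) -> dict[str, list[str]]:
    """Group complete voices by language code, e.g. {'en_US': ['en_US-lessac-medium']}."""
    voices: dict[str, list[str]] = {}
    for onnx in sorted(piper_dir.rglob("*.onnx")):
        config = onnx.with_name(onnx.name + ".json")
        if not config.is_file():
            continue  # a voice without its .onnx.json config cannot be loaded
        voice = onnx.stem
        lang = voice.split("-", 1)[0]
        voices.setdefault(lang, []).append(voice)
    return voices
```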
