An ideal translated video should have the following characteristics: accurate subtitles of appropriate length, voiceover timbre matching the original, and perfect synchronization between subtitles, audio, and visuals.

This guide details the four steps of video translation and provides optimal configuration recommendations for each step.

Step 1: Speech Recognition

  • Goal: Convert the speech in the video into a subtitle file in the corresponding language.

  • Corresponding Control Element: The "Speech Recognition" row

  • Best Configuration for Non-Chinese:

    • Free: faster-whisper (local) or openai-whisper (local), with the large-v3 model
    • Paid: the OpenAI API
  • Best Configuration for Chinese:

    • Free: Alibaba FunASR
    • Paid: Alibaba Bailian ASR, Doubao Speech Recognition Large Model
  • Best Configuration for Japanese:

    • Free: Huggingface_ASR -> kotoba-tech/kotoba-whisper-v2.0 or reazon-research/japanese-wav2vec2-large-rs35kh
  • Best Configuration for Other Languages:

    • Paid: Gemini Large Model Recognition
    • Paid: OpenAI API
  • Note: Without an Nvidia GPU and a properly configured CUDA environment, processing with local models will be extremely slow, and it may crash if VRAM is insufficient.
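Whichever recognition channel you choose, the output of this step is a subtitle file, typically SRT. As a minimal illustrative sketch (the segment data and helper names here are hypothetical, not pyvideotrans internals), this is roughly how recognized segments become an SRT file:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render recognized segments ({'start', 'end', 'text'}) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"
```

For example, a single segment from 0.0 to 2.5 seconds becomes an entry numbered `1` with the time range `00:00:00,000 --> 00:00:02,500` followed by its text.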

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in Step 1 into the target language.

  • Corresponding Control Element: The "Translation Channel" row

  • Best Configuration:

    • Preferred AI Channels (Paid): DeepSeek / Gemini / OpenAI ChatGPT / Alibaba Bailian
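Conceptually, only the subtitle text is sent to the translation channel; the indices and timestamps from Step 1 must pass through unchanged so the timeline is preserved. A minimal sketch of this idea, where `call_translation_api` is a hypothetical stand-in for whichever AI channel you configure:

```python
def call_translation_api(lines: list[str], target_lang: str) -> list[str]:
    # Hypothetical placeholder: a real channel (DeepSeek, Gemini, etc.)
    # would return actual translations of each line.
    return [f"[{target_lang}] {line}" for line in lines]

def translate_entries(entries: list[dict], target_lang: str) -> list[dict]:
    """Translate subtitle texts while leaving start/end times untouched."""
    translated = call_translation_api([e["text"] for e in entries], target_lang)
    return [dict(e, text=t) for e, t in zip(entries, translated)]
```

Batching all lines into one request, as sketched here, also gives the model surrounding context, which generally improves translation consistency across adjacent subtitles.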

Step 3: Dubbing

  • Goal: Generate voiceover based on the translated subtitle file.

  • Corresponding Control Element: The "Dubbing Channel" row

  • Best Configuration:

    • Free: Edge-TTS, which supports all languages.
    • Free (Chinese, English, Japanese, Korean): F5-TTS / Index-TTS / GPT-SOVITS / CosyVoice (all local)
    • Paid: Doubao TTS / Qwen-TTS / 302.AI / Minimaxi / OpenAI-TTS

    The corresponding F5-TTS / CosyVoice / clone-voice / GPT-SOVITS integration package must be installed separately. See the documentation: https://pyvideotrans.com/f5tts.html
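Each dubbing channel ultimately maps the target language to a specific voice. As an illustrative sketch only (the voice names below follow Edge-TTS conventions, but the mapping itself is a hypothetical example, not pyvideotrans's actual table):

```python
# Hypothetical language-to-voice table in the Edge-TTS naming style.
EDGE_TTS_VOICES = {
    "zh": "zh-CN-XiaoxiaoNeural",
    "en": "en-US-AriaNeural",
    "ja": "ja-JP-NanamiNeural",
    "ko": "ko-KR-SunHiNeural",
}

def pick_voice(target_lang: str, default: str = "en-US-AriaNeural") -> str:
    """Return a TTS voice for the target language, with a fallback default."""
    return EDGE_TTS_VOICES.get(target_lang, default)
```

A fallback default matters in practice: Edge-TTS covers all languages, but the local engines listed above only cover Chinese, English, Japanese, and Korean, so an unmapped language needs an explicit fallback rather than a silent failure.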

Step 4: Synchronization of Subtitles, Dubbing, and Video

  • Goal: Synchronize subtitles, dubbing, and video.
  • Corresponding Control Element: The Sync & Align row
  • Best Configuration:
    • Enable Secondary Recognition. This will perform speech recognition on the final voiceover file to generate subtitles with a more precise timeline.
    • When translating Chinese to English, you can set the Dubbing Speed value (e.g., 10 or 15) to speed up the voiceover, as English sentences are often longer.
    • Enable both the Speed Up Dubbing and Slow Down Video options to force alignment of subtitles, audio, and video.
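The forced alignment these two options perform comes down to simple arithmetic: if a dubbed clip is longer than its subtitle slot, first speed up the audio (up to a cap), then slow down the matching video segment to absorb whatever overrun remains. A minimal sketch of that logic, with parameter names of my own choosing:

```python
def align_clip(audio_dur: float, slot_dur: float, max_speed: float = 1.5):
    """Decide how to fit a dubbed clip into its subtitle slot.

    Returns (audio_speed, video_slowdown): audio_speed > 1 plays the
    audio faster; video_slowdown > 1 stretches the video segment.
    """
    if audio_dur <= slot_dur:
        return 1.0, 1.0  # already fits, nothing to do
    speed = min(audio_dur / slot_dur, max_speed)  # cap the audio speed-up
    remaining = audio_dur / speed                 # audio length after speed-up
    slowdown = remaining / slot_dur               # stretch video for the rest
    return speed, slowdown
```

For example, a 3-second dub in a 2-second slot is handled entirely by a 1.5x audio speed-up, while a 4-second dub hits the cap and also requires the video segment to be stretched.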

Output Video Quality Control

  • The default output uses lossy compression. If you need lossless output, go to Menu -> Tools -> Advanced Options -> Video Output Control section and set Video Transcoding Loss Control to 0.
  • Note: If the original video is not in mp4 format, or subtitles are hardcoded (burned in), re-encoding the video will cause some loss, though it is usually minimal. Higher video quality settings will significantly reduce processing speed and increase the output file size.