An ideal translated video should have the following characteristics: accurate subtitles of appropriate length, voiceover timbre matching the original, and perfect synchronization between subtitles, audio, and visuals.

This guide details the four steps of video translation and provides optimal configuration recommendations for each step.

Step 1: Speech Recognition

  • Goal: Convert the speech in the video into a subtitle file in the corresponding language.

  • Corresponding Control Element: The "Speech Recognition" row

  • Best Configuration for Non-Chinese:

    • Free: faster-whisper (local) or openai-whisper (local), with the large-v3 model
    • Paid: the OpenAI API
  • Best Configuration for Chinese:

    • Free: Alibaba FunASR
    • Paid: Alibaba Bailian ASR, Doubao Speech Recognition Large Model
  • Best Configuration for Japanese:

    • Free: Huggingface_ASR -> kotoba-tech/kotoba-whisper-v2.0 or reazon-research/japanese-wav2vec2-large-rs35kh
  • Best Configuration for Other Languages:

    • Paid: Gemini Large Model Recognition
    • Paid: OpenAI API
  • Note: Without an Nvidia GPU and a properly configured CUDA environment, processing with local models will be extremely slow, and it may crash if VRAM is insufficient.
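Whichever recognition channel you choose, the output of this step is a subtitle file, typically SRT. As a minimal illustrative sketch (the segment data and helper names here are hypothetical, not pyvideotrans internals), this is roughly how recognized segments become an SRT file:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render recognized segments ({'start', 'end', 'text'}) as SRT blocks."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}"
        )
    return "\n\n".join(blocks) + "\n"
```

For example, a single segment from 0.0 to 2.5 seconds becomes an entry numbered `1` with the time range `00:00:00,000 --> 00:00:02,500` followed by its text.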

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in Step 1 into the target language.

  • Corresponding Control Element: The "Translation Channel" row

  • Best Configuration:

    • Preferred AI Channels (Paid): DeepSeek / Gemini / OpenAI ChatGPT / Alibaba Bailian
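Conceptually, only the subtitle text is sent to the translation channel; the indices and timestamps from Step 1 must pass through unchanged so the timeline is preserved. A minimal sketch of this idea, where `call_translation_api` is a hypothetical stand-in for whichever AI channel you configure:

```python
def call_translation_api(lines: list[str], target_lang: str) -> list[str]:
    # Hypothetical placeholder: a real channel (DeepSeek, Gemini, etc.)
    # would return actual translations of each line.
    return [f"[{target_lang}] {line}" for line in lines]

def translate_entries(entries: list[dict], target_lang: str) -> list[dict]:
    """Translate subtitle texts while leaving start/end times untouched."""
    translated = call_translation_api([e["text"] for e in entries], target_lang)
    return [dict(e, text=t) for e, t in zip(entries, translated)]
```

Batching all lines into one request, as sketched here, also gives the model surrounding context, which generally improves translation consistency across adjacent subtitles.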

Step 3: Dubbing

  • Goal: Generate voiceover based on the translated subtitle file.

  • Corresponding Control Element: The "Dubbing Channel" row

  • Best Configuration:

    • Free: Edge-TTS, which supports all languages.
    • Free (Chinese, English, Japanese, Korean): F5-TTS / Index-TTS / GPT-SOVITS / CosyVoice (all local)
    • Paid: Doubao TTS / Qwen-TTS / 302.AI / Minimaxi / OpenAI-TTS

    The corresponding F5-TTS / CosyVoice / clone-voice / GPT-SOVITS integration package must be installed separately. See the documentation: https://pyvideotrans.com/f5tts.html
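Each dubbing channel ultimately maps the target language to a specific voice. As an illustrative sketch only (the voice names below follow Edge-TTS conventions, but the mapping itself is a hypothetical example, not pyvideotrans's actual table):

```python
# Hypothetical language-to-voice table in the Edge-TTS naming style.
EDGE_TTS_VOICES = {
    "zh": "zh-CN-XiaoxiaoNeural",
    "en": "en-US-AriaNeural",
    "ja": "ja-JP-NanamiNeural",
    "ko": "ko-KR-SunHiNeural",
}

def pick_voice(target_lang: str, default: str = "en-US-AriaNeural") -> str:
    """Return a TTS voice for the target language, with a fallback default."""
    return EDGE_TTS_VOICES.get(target_lang, default)
```

A fallback default matters in practice: Edge-TTS covers all languages, but the local engines listed above only cover Chinese, English, Japanese, and Korean, so an unmapped language needs an explicit fallback rather than a silent failure.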

Step 4: Synchronization of Subtitles, Dubbing, and Video

  • Goal: Synchronize subtitles, dubbing, and video.
  • Corresponding Control Element: The Sync & Align row
  • Best Configuration:
    • Enable Secondary Recognition. This will perform speech recognition on the final voiceover file to generate subtitles with a more precise timeline.
    • When translating Chinese to English, you can set the Dubbing Speed value (e.g., 10 or 15) to speed up the voiceover, as English sentences are often longer.
    • Enable both the Speed Up Dubbing and Slow Down Video options to force alignment of subtitles, audio, and video.
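The forced alignment these two options perform comes down to simple arithmetic: if a dubbed clip is longer than its subtitle slot, first speed up the audio (up to a cap), then slow down the matching video segment to absorb whatever overrun remains. A minimal sketch of that logic, with parameter names of my own choosing:

```python
def align_clip(audio_dur: float, slot_dur: float, max_speed: float = 1.5):
    """Decide how to fit a dubbed clip into its subtitle slot.

    Returns (audio_speed, video_slowdown): audio_speed > 1 plays the
    audio faster; video_slowdown > 1 stretches the video segment.
    """
    if audio_dur <= slot_dur:
        return 1.0, 1.0  # already fits, nothing to do
    speed = min(audio_dur / slot_dur, max_speed)  # cap the audio speed-up
    remaining = audio_dur / speed                 # audio length after speed-up
    slowdown = remaining / slot_dur               # stretch video for the rest
    return speed, slowdown
```

For example, a 3-second dub in a 2-second slot is handled entirely by a 1.5x audio speed-up, while a 4-second dub hits the cap and also requires the video segment to be stretched.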

Output Video Quality Control

  • The default output uses lossy compression. If you need lossless output, go to Menu -> Tools -> Advanced Options -> Video Output Control section and set Video Transcoding Loss Control to 0.
  • Note: If the original video is not in mp4 format, or subtitles are hardcoded (burned in), re-encoding the video will cause some loss, though it is usually minimal. Higher video quality settings will significantly reduce processing speed and increase the output file size.