pyVideoTrans Common Issues and Solutions

To help you use pyVideoTrans better, we have compiled the following common issues and their solutions.

The Help/About menu in the menu bar contains many useful links, such as model download addresses and CUDA configuration guides. Try opening them if you encounter problems.

Part 1: Installation and Startup Issues

1. After double-clicking `sp.exe`, the software fails to open or shows no response for a long time?

This is usually normal. Please don't worry.

Reason: This software is developed based on PySide6. The main interface contains many components and requires initialization upon first load, which takes some time. Depending on your computer's performance, startup time can range from 5 seconds to 2 minutes.
Solutions:
1. Wait patiently: Please wait for a while after double-clicking.
2. Check security software: Some antivirus or security software may block the program from starting. Try temporarily disabling them or adding this software to the trust/whitelist.
3. Check file path: Ensure the software's storage path contains only English letters and numbers. It should not contain Chinese characters, spaces, or special symbols. For example, D:\pyVideoTrans is a good path, while D:\program file\video tools may cause issues.
4. Update package issue: If the software no longer starts after you overwrote it with an update package, the update was probably applied incorrectly. Re-download the complete software package, extract it, and then extract the new update package over it, overwriting the existing files.
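
The path rule in step 3 can be checked with a short script. This is a minimal sketch, not part of pyVideoTrans: the function name `unsafe_path_parts` is hypothetical, and allowing `.`, `_`, and `-` in folder names is an assumption that goes slightly beyond the "letters and numbers only" rule above.

```python
import re
from pathlib import PureWindowsPath

# ASCII letters and digits, plus ".", "_" and "-" (the last three are an
# assumption; the FAQ itself only promises that letters and digits are safe).
_SAFE_PART = re.compile(r"[A-Za-z0-9._-]+")


def unsafe_path_parts(path: str) -> list[str]:
    """Return the folder names in `path` that contain spaces, non-ASCII
    characters, or other special symbols."""
    win_path = PureWindowsPath(path)
    # Skip the drive/root component (e.g. "D:\\") when the path has one.
    parts = win_path.parts[1:] if win_path.anchor else win_path.parts
    return [part for part in parts if not _SAFE_PART.fullmatch(part)]


if __name__ == "__main__":
    for candidate in (r"D:\pyVideoTrans", r"D:\program file\video tools"):
        print(candidate, "->", unsafe_path_parts(candidate) or "looks fine")
```

An empty result means every folder name passed the check; any names returned are the ones worth renaming.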

2. What to do if prompted that the `python310.dll` file is missing on startup?

This indicates you only downloaded the update patch and not the main program.

Solution:
1. Please go to the official website and download the complete software package.
2. Extract the complete package to a specified directory.
3. Then download the latest update patch and extract it into the directory of the complete package, overwriting the existing files.

3. Does the software need to be installed?

This software is a portable version and does not require installation. After downloading the complete package, extract it and double-click sp.exe to run directly.

4. Why does antivirus software report a virus or block it?

Reason: This software is packaged with PyInstaller and is not signed with a commercial code-signing certificate. Some security software issues risk warnings on that basis alone, which is a common false positive.
Solutions:
1. Add to trust: Add this software to the trust zone or whitelist of your antivirus software.
2. Run from source: If you are a developer, you can also choose to deploy and run directly from the source code to completely avoid this issue.

5. Does the software support Windows 7?

No, it does not. Many core components the software relies on no longer support Windows 7.

Part 2: Core Features and Settings

6. How to improve speech recognition accuracy?

Recognition accuracy mainly depends on the model size you choose.

Model Selection: In "faster" or "openai" mode, larger models offer higher accuracy but slower processing speed and higher resource consumption.
- tiny: Smallest size, fastest speed, but lower accuracy.
- base / small / medium: Moderate accuracy and resource consumption; the most commonly used options.
- large-v3: Largest size, best accuracy, highest hardware requirements.
Optimization Settings: Click Menu -> Tools -> Advanced Options

Find the faster/openai Speech Recognition Adjustment section and make the following modifications:

Speech Threshold set to 0.5
Minimum Duration / ms set to 0
Maximum Speech Duration / sec set to 5
Silence Split ms set to 140
Speech Padding set to 0
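
For readers who run faster-whisper from source, the values above could be expressed as voice activity detection (VAD) parameters roughly as follows. This is a sketch under assumptions: the mapping from the option labels in the interface to faster-whisper's parameter names is a guess and is not confirmed by this FAQ, and the audio file name in the commented usage is a placeholder.

```python
# Assumed mapping from the Advanced Options labels to faster-whisper's
# VAD parameter names; the values are the ones recommended above.
VAD_PARAMETERS = {
    "threshold": 0.5,                # Speech Threshold
    "min_speech_duration_ms": 0,     # Minimum Duration / ms
    "max_speech_duration_s": 5,      # Maximum Speech Duration / sec
    "min_silence_duration_ms": 140,  # Silence Split ms
    "speech_pad_ms": 0,              # Speech Padding
}

# Usage with faster-whisper (requires the package and a downloaded model):
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v3")
#   segments, info = model.transcribe(
#       "audio.wav", vad_filter=True, vad_parameters=VAD_PARAMETERS
#   )

if __name__ == "__main__":
    for name, value in VAD_PARAMETERS.items():
        print(f"{name} = {value}")
```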

7. Why is the processed video clarity/quality reduced?

Any operation involving re-encoding will inevitably lead to video quality loss. If you want to preserve the original quality as much as possible, ensure all the following conditions are met:

Original Video Format: Use an MP4 file encoded with H.264 (libx264), which is the most compatible format.
Disable Slow Processing: In the function options, do not check "Video Auto Slow".
Do Not Embed Hard Subtitles: You can choose not to embed subtitles or only embed soft subtitles. Hard subtitles force re-encoding of the entire video.
Do Not Change Audio or Duration: Do not perform dubbing, or when dubbing, disable the video end extension function.
Advanced Options - Video Output Quality Control: The default value is 23. It can be lowered to 18 or below (minimum 0). Lower values mean higher output video quality but larger file size.
Advanced Options - Output Video Compression Level: Default is fast. You can choose slow or slower for higher quality, but output time will increase.
Advanced Options - 264/265 Encoding: Default is 264. You can choose 265 for higher output video quality at the same file size.

8. Why is the output video extremely large?

Set Advanced Options - Video Output Quality Control to a value between 25 and 51. Higher values result in smaller output video size but lower quality.
Advanced Options - 264/265 Encoding: Choose 265. At the same quality, 265 results in a smaller size.
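
The quality and size options discussed in questions 7 and 8 behave like ffmpeg's `-crf`, `-preset`, and encoder options. The sketch below assumes that correspondence (the FAQ does not state how the software invokes ffmpeg), and the function name `build_encode_args` is hypothetical. It only builds an argument list, so it can be inspected without ffmpeg installed.

```python
def build_encode_args(src: str, dst: str, crf: int = 23,
                      preset: str = "fast", codec: str = "264") -> list[str]:
    """Build an ffmpeg command line for re-encoding `src` into `dst`.

    crf:    0-51; lower means higher quality and a larger file.
    preset: e.g. "fast", "slow", "slower"; slower presets compress better.
    codec:  "264" or "265".
    """
    if not 0 <= crf <= 51:
        raise ValueError("crf must be between 0 and 51")
    encoders = {"264": "libx264", "265": "libx265"}
    if codec not in encoders:
        raise ValueError("codec must be '264' or '265'")
    return [
        "ffmpeg", "-y", "-i", src,
        "-c:v", encoders[codec],
        "-crf", str(crf),
        "-preset", preset,
        "-c:a", "copy",  # leave the audio stream untouched
        dst,
    ]


if __name__ == "__main__":
    # Higher quality, larger file:
    print(" ".join(build_encode_args("in.mp4", "hq.mp4", crf=18, preset="slow")))
    # Smaller file, lower quality:
    print(" ".join(build_encode_args("in.mp4", "small.mp4", crf=28, codec="265")))
```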

9. How to configure a network proxy?

Some translation or dubbing services (e.g., Google, OpenAI, Gemini) cannot be accessed directly in some regions and require a network proxy.

Setup Method: In the "Network Proxy Address" text box on the main interface, enter your proxy service address.
Format Requirement: Usually in a format like http://127.0.0.1:10808 or socks5://127.0.0.1:10808 (the port number should be filled according to your proxy client settings).
Important Note: If you are not familiar with proxies or do not have an available proxy service, leave this field blank. Incorrect settings will cause all network functions (including domestic services) to fail.
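
Because a malformed address breaks every network function, it can help to sanity-check the value before pasting it in. The following is a minimal sketch using only the Python standard library; the function name `is_valid_proxy` is hypothetical, and restricting the check to the two schemes shown above and requiring an explicit port are assumptions, not rules enforced by the software.

```python
from urllib.parse import urlsplit

# Only the two schemes shown in this FAQ; extend if your client uses others.
ALLOWED_SCHEMES = {"http", "socks5"}


def is_valid_proxy(address: str) -> bool:
    """Return True for a blank value (no proxy) or a scheme://host:port address."""
    address = address.strip()
    if not address:
        return True  # leaving the field blank is always acceptable
    parts = urlsplit(address)
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return False
    return (
        parts.scheme in ALLOWED_SCHEMES
        and bool(parts.hostname)
        and port is not None
    )


if __name__ == "__main__":
    for value in ("http://127.0.0.1:10808", "socks5://127.0.0.1:10808",
                  "127.0.0.1:10808", ""):
        print(repr(value), "->", is_valid_proxy(value))
```

The most common mistake this catches is omitting the scheme, as in `127.0.0.1:10808`.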

10. How to customize subtitle font, color, and style?

Click Modify Hard Subtitles on the main interface.
Here you can modify the font, size, color, border style, etc., of hard subtitles.
Color Code Explanation: The color code format is &HBBGGRR&, which is the reverse of common RGB, in the order of Blue Green Red (BGR).
- White: &HFFFFFF&
- Black: &H000000&
- Pure Red: &H0000FF&
- Pure Green: &H00FF00&
- Pure Blue: &HFF0000&
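
Since most color pickers output RGB values such as #RRGGBB, the byte order has to be reversed by hand. A small helper along these lines does the conversion; the function name `rgb_to_subtitle_color` is illustrative and not part of the software.

```python
import string


def rgb_to_subtitle_color(rgb_hex: str) -> str:
    """Convert '#RRGGBB' (or 'RRGGBB') to the '&HBBGGRR&' subtitle format."""
    value = rgb_hex.strip().lstrip("#")
    if len(value) != 6 or any(ch not in string.hexdigits for ch in value):
        raise ValueError(f"expected six hex digits, got {rgb_hex!r}")
    red, green, blue = value[0:2], value[2:4], value[4:6]
    # Reverse the channel order: RGB becomes BGR.
    return f"&H{blue}{green}{red}&".upper()


if __name__ == "__main__":
    for rgb in ("#FF0000", "#00FF00", "#0000FF", "#FFA500"):
        print(rgb, "->", rgb_to_subtitle_color(rgb))
```

For example, orange (#FFA500) becomes &H00A5FF&.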

Part 3: Common Problems and Troubleshooting

10. When batch translating videos, e.g., 30-50-100 videos, it always gets stuck?

By default, batch tasks divide each task into multiple stages and process them concurrently. Too many tasks may exhaust resources. You can check Advanced Options -> Force Serial Execution for Batch Translation to change the execution method to serial. This means the second video starts only after the first one is completely translated, and so on sequentially.

11. Why is there audio, subtitle, and video desynchronization after processing?

This is a normal phenomenon in language translation.

Reason: When expressing the same meaning in different languages, sentence length and pronunciation duration change. For example, a 2-second Chinese sentence translated into English might have a dubbing duration of 4 seconds. This change in duration causes the dubbing not to align perfectly with the original video's lip movements and timeline.

12. Always prompted about insufficient VRAM (e.g., `Unable to allocate` error)?

This error means your graphics card does not have enough memory (VRAM) to perform the current task, usually due to using large models or processing long videos.

Solutions (try in recommended order):
1. Use a smaller model: Change the recognition model from large-v3 to medium, small, or base. The large-v3 model requires at least 8 GB of VRAM, and in practice other programs also consume VRAM, so even an 8 GB card may not be enough.
2. Adjust advanced settings: In the menu bar Tools/Options -> Advanced Options, make the following modifications to sacrifice some accuracy for lower VRAM usage:
  - CUDA Data Type: Change float32 to float16 or int8.
  - beam_size: Change 5 to 1.
  - best_of: Change 5 to 1.
  - Context: Change true to false.
3. Select Batch Inference in the overall recognition dropdown. This will pre-slice the audio into small segments and then execute multiple segments simultaneously.
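
For those running faster-whisper directly from source, the adjustments in steps 1 and 2 correspond roughly to the options below. This is a sketch under assumptions: `compute_type`, `beam_size`, `best_of`, and `condition_on_previous_text` are faster-whisper parameter names, but equating them with the "CUDA Data Type" and "Context" labels in the interface is a guess, and the audio file name in the commented usage is a placeholder.

```python
# Low-VRAM choices from the steps above, keyed by faster-whisper argument names.
LOW_VRAM_MODEL_OPTIONS = {
    "model_size_or_path": "medium",  # step 1: smaller model than large-v3
    "device": "cuda",
    "compute_type": "int8",          # "CUDA Data Type" (assumed mapping)
}

LOW_VRAM_TRANSCRIBE_OPTIONS = {
    "beam_size": 1,                        # was 5
    "best_of": 1,                          # was 5
    "condition_on_previous_text": False,   # "Context" (assumed mapping)
}

# Usage with faster-whisper (requires the package, a model, and a CUDA GPU):
#
#   from faster_whisper import WhisperModel
#   model = WhisperModel(**LOW_VRAM_MODEL_OPTIONS)
#   segments, info = model.transcribe("audio.wav", **LOW_VRAM_TRANSCRIBE_OPTIONS)

if __name__ == "__main__":
    print(LOW_VRAM_MODEL_OPTIONS)
    print(LOW_VRAM_TRANSCRIBE_OPTIONS)
```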

13. CUDA is already installed, why can't the software use GPU acceleration?

Please check the following possible reasons:

CUDA Version Incompatibility: This software is built against CUDA 12.8. If your installed CUDA version is older, GPU acceleration cannot be used.
Outdated Graphics Driver: Please update your NVIDIA graphics driver to the latest version.
Missing cuDNN: Ensure you have correctly installed cuDNN matching your CUDA version.
Hardware Incompatibility: GPU acceleration only supports NVIDIA graphics cards. AMD and Intel graphics cards cannot use CUDA.

14. An error occurs during execution, containing "ffprobe exec error" or `ffmpeg`?

This error is usually related to file paths being too long or containing special characters.

Reason: The Windows system has a maximum path length limit (usually 260 characters). If your video file name itself is very long (e.g., downloaded from YouTube) and stored in a deeply nested folder, the total path easily exceeds this limit.
Solution: Move the video file to a shallower directory (e.g., D:\videos) and rename it to a short English or numeric name.
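
To see whether a particular file is affected, compare its full path length against the limit. The sketch below is illustrative only (the function name `exceeds_max_path` is hypothetical); it relies on the fact that the classic 260-character Windows limit includes the terminating null character, so 259 visible characters is the practical maximum.

```python
MAX_PATH = 260  # classic Windows limit, including the terminating null


def exceeds_max_path(path: str, limit: int = MAX_PATH) -> bool:
    """Return True if `path` is too long for the classic Windows path limit."""
    # One character of the limit is reserved for the terminating null.
    return len(path) >= limit


if __name__ == "__main__":
    short = r"D:\videos\1.mp4"
    long_name = "D:\\downloads\\" + "very long video title " * 15 + ".mp4"
    for candidate in (short, long_name):
        print(len(candidate), "characters ->",
              "too long" if exceeds_max_path(candidate) else "ok")
```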

15. The software prompts that the video "contains no audio track"?

Possible Cause 1: The video indeed has no sound. For example, videos downloaded from YouTube and some other sites have separate video and audio tracks. An error during merging might cause audio loss.
Possible Cause 2: Excessive background noise. If the video environment is very noisy (e.g., street, concert), human speech may be masked, and the model might not detect valid speech.
Possible Cause 3: Incorrect language selection. Ensure the language selected in the "Original Speech" option matches the language actually spoken in the video. For example, if the video contains English dialogue, you must select "English" for correct recognition.
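
Cause 1 can be confirmed or ruled out with ffprobe, which ships with ffmpeg. The following sketch assumes `ffprobe` is available on your PATH; the helper names are hypothetical. The command lists only audio streams, so empty output means the file has no audio track.

```python
import subprocess


def ffprobe_audio_command(video_path: str) -> list[str]:
    """Build an ffprobe command that prints one line per audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",              # audio streams only
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",
        video_path,
    ]


def output_has_audio(ffprobe_stdout: str) -> bool:
    """Any non-empty output means at least one audio stream was listed."""
    return bool(ffprobe_stdout.strip())


def has_audio_track(video_path: str) -> bool:
    """Run ffprobe and report whether the file contains an audio stream."""
    result = subprocess.run(
        ffprobe_audio_command(video_path),
        capture_output=True, text=True, check=True,
    )
    return output_has_audio(result.stdout)
```

If `has_audio_track` returns True but the error persists, causes 2 and 3 are the more likely explanations.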

16. GPU usage is very low, is this normal?

Yes, it's normal. The software workflow is: Speech Recognition -> Text Translation -> Text-to-Speech -> Video Synthesis.

Only the first stage, "Speech Recognition", heavily uses the GPU for computation. Other stages (like translation, synthesis) mainly rely on the CPU, so it's expected for the GPU to be under low load most of the time.

17. Why do recognition results and subtitles remain unchanged when processing the same video repeatedly?

Reason: To save time and computing resources, the software has caching enabled by default. If it detects that subtitles have already been generated for a video, it will use the cached result directly instead of reprocessing.
Solution: If you want to force re-recognition and translation, check the Clean Generated checkbox in the top-left corner of the main interface.

18. After processing a few videos, the hard drive space is full?

This is usually due to enabling the "Video Slow" function, which generates a large number of temporary files.

Reason: This function cuts the video into many small segments based on subtitles and processes each segment, generating cache files far exceeding the original video size.
Solutions:
1. Manual Cleanup: After processing is complete, you can manually delete all contents in the tmp folder within the software's root directory.
2. Automatic Cleanup: When the software is closed normally, the program automatically cleans up these caches.
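
The manual cleanup in step 1 can also be scripted. This is a minimal sketch, assuming the temporary files live in a `tmp` folder directly under the software's root directory as described above; the function name `clear_tmp` and the example path are hypothetical. Run it only while the software is closed.

```python
import shutil
from pathlib import Path


def clear_tmp(software_root: str) -> int:
    """Delete everything inside <software_root>/tmp and return the number
    of top-level entries removed. The tmp folder itself is kept."""
    tmp_dir = Path(software_root) / "tmp"
    if not tmp_dir.is_dir():
        return 0
    removed = 0
    for entry in tmp_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a folder and all of its contents
        else:
            entry.unlink()        # remove a single file or symlink
        removed += 1
    return removed


if __name__ == "__main__":
    # Example root directory; adjust to where you extracted the software.
    print(clear_tmp(r"D:\pyVideoTrans"), "entries removed")
```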

Part 4: General Information

18. Does the software support Docker deployment?

Currently not supported.

19. Can it recognize hard subtitles in the video frame (OCR function)?

No. The principle of this software is to analyze the audio track of the video, recognize human speech, and convert it to text. It does not have Optical Character Recognition (OCR) functionality.

20. Can I add support for a new language?

No. Adding a language requires support from the speech recognition, subtitle translation, and dubbing channels, and each channel is backed by different local models or online APIs. These may not support the new language at all, and even when they do, different channels may require different codes for the same language (e.g., Chinese may be zh in some channels and zh-cn or chi in others). Adding a language arbitrarily can therefore lead to unexpected errors. Unless you can modify the source code yourself, you cannot add languages.

21. Is the software paid? Can it be used commercially?

Cost: This project is free and open-source software. You can use all features for free. Please note that if you use third-party translation, TTS (Text-to-Speech), or speech transcription interfaces, those service providers may charge fees, but this is unrelated to this software.
Commercial Use: Both individuals and companies are free to use this software. However, if you wish to integrate this project's code into your own commercial product, you must comply with the GPL-v3 open-source license. Additionally, some channels' models or online APIs may have their own license requirements regarding commercial use. Please consult the platform corresponding to the channel you are using, e.g., for Edge-tts channel consult Microsoft, for ChatTTS dubbing channel consult https://github.com/2noise/ChatTTS.

22. Is there human customer support?

No. This project is free, open-source software developed by an individual on a non-profit basis, so a dedicated human customer support team cannot be provided. If you encounter problems, please read this FAQ carefully first. Alternatively, you can scan the WeChat QR code in the lower right corner of the software, leave a tip, and leave your WeChat ID to obtain paid technical support.

23. Where to download the software and models?

Software Download Address: pyvideotrans.com/downpackage
Source Code Repository Address: github.com/jianchang512/pyvideotrans
