
FunASR Chinese Speech Recognition

FunASR is a set of open-source speech recognition models from Alibaba. For Chinese speech, it performs better than the Whisper series. The video translation software already supports it over HTTP through the zh_recogn and SenseVoice projects: deploy the corresponding integrated package (zh_recogn or SenseVoice), start it, and enter its API address in the video translation software.

However, many users still find this process confusing. Starting with version v2.97, FunASR is therefore integrated directly into the video translation software. You no longer need to deploy and start the zh_recogn and SenseVoice projects separately; simply select FunASR Chinese Recognition in the software.


Select FunASR Chinese in Speech Recognition

After selecting FunASR Chinese Recognition in the speech recognition settings, you can choose either the paraformer-zh model or the SenseVoiceSmall model. paraformer-zh is recommended, as it offers better recognition performance and is faster than SenseVoiceSmall.
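For readers curious what the two model choices correspond to outside the software, here is a minimal sketch that calls the FunASR Python package directly. It is not how the video translation software invokes FunASR internally, which this page does not describe. The helper names `MODEL_IDS`, `model_kwargs`, and `transcribe` are hypothetical, and the model identifiers and the `fsmn-vad` / `ct-punc` companions follow FunASR's own README rather than this document.

```python
# Sketch only: uses the FunASR package on its own (pip install funasr),
# independent of the video translation software.

# Keys are the two choices named above; values are the identifiers used in
# FunASR's README (an assumption, not taken from this page).
MODEL_IDS = {
    "paraformer-zh": "paraformer-zh",
    "SenseVoiceSmall": "iic/SenseVoiceSmall",
}


def model_kwargs(choice: str) -> dict:
    """Return AutoModel keyword arguments for one of the two model choices."""
    if choice not in MODEL_IDS:
        raise ValueError(f"unknown model choice: {choice!r}")
    # A voice-activity-detection model splits long audio into segments.
    kwargs = {"model": MODEL_IDS[choice], "vad_model": "fsmn-vad"}
    if choice == "paraformer-zh":
        # Paraformer emits unpunctuated text, so a punctuation model is added.
        kwargs["punc_model"] = "ct-punc"
    return kwargs


def transcribe(audio_path: str, choice: str = "paraformer-zh") -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so the helper above works without funasr installed.
    from funasr import AutoModel

    model = AutoModel(**model_kwargs(choice))
    results = model.generate(input=audio_path)
    return results[0]["text"]
```

Calling `transcribe("sample.wav")` with a hypothetical local file would load paraformer-zh and return its text. With SenseVoiceSmall, the raw text typically carries language and emotion tags that need extra post-processing.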


First-Time Use: FunASR Chinese Recognition Downloads Models Online

To keep the software package small, the FunASR models are not bundled with it. The first time you use FunASR Chinese Recognition, the models are downloaded automatically from modelscope.cn and saved in the hub folder inside the models directory under the software's main folder. Depending on your network conditions, the download may take anywhere from a few minutes to several tens of minutes. As long as no red error messages appear, please wait patiently for the download to complete.
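If the download seems stalled, one way to check is to watch the size of the hub folder grow. The sketch below is a minimal illustration that assumes only the folder layout described above (models/hub under the software's main folder). The function names and the `software_root` argument are hypothetical.

```python
from pathlib import Path


def hub_dir(software_root: str) -> Path:
    """Folder where, per the text above, downloaded models are stored."""
    return Path(software_root) / "models" / "hub"


def downloaded_megabytes(software_root: str) -> float:
    """Total size of all files under models/hub in MB (0.0 if the folder is missing)."""
    root = hub_dir(software_root)
    if not root.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)
```

Calling `downloaded_megabytes` on the software's main folder twice, a minute or so apart, and seeing a larger number the second time suggests the download is still progressing.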
