
Chatterbox TTS API Service

This is a high-performance Text-to-Speech (TTS) service based on Chatterbox-TTS. It provides an OpenAI TTS-compatible API interface, an enhanced interface supporting voice cloning, and a clean Web user interface.

This project aims to provide developers and content creators with a privately deployable, powerful, and easy-to-integrate TTS solution.

Project Repository: https://github.com/jianchang512/chatterbox-api


Usage in pyVideoTrans

This project can serve as a powerful TTS backend to provide high-quality English dubbing for pyVideoTrans.

  1. Start This Project: Ensure the Chatterbox TTS API service is running locally (http://127.0.0.1:5093).

  2. Update pyVideoTrans: Make sure your pyVideoTrans version is upgraded to v3.73 or higher.

  3. Configure pyVideoTrans:

    • In the pyVideoTrans menu, go to TTS Settings -> Chatterbox TTS.
    • API Address: Enter the address of this service, default is http://127.0.0.1:5093.
    • Reference Audio (Optional): If you want to use voice cloning, enter the filename of the reference audio here (e.g., my_voice.wav). Ensure this audio file is placed in the chatterbox folder within the pyVideoTrans root directory.
    • Adjust Parameters: Tune cfg_weight and exaggeration as needed for optimal results.

    Parameter Tuning Suggestions:

    • General Scenarios (TTS, Voice Assistant): Default settings (cfg_weight=0.5, exaggeration=0.5) work well for most cases.
    • Fast-Paced Reference Audio: If the reference audio has a fast speaking rate, try lowering cfg_weight to around 0.3 to improve the rhythm of the generated speech.
    • Expressive/Dramatic Speech: Try a lower cfg_weight (e.g., 0.3) and a higher exaggeration (e.g., 0.7 or higher). Increasing exaggeration often speeds up the speech, while lowering cfg_weight helps balance it for a more deliberate and clearer pace.
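The tuning suggestions above can be sketched as a small lookup helper that returns starting values per scenario. The preset names and the function itself are illustrative only and not part of the service API.

```python
# Illustrative starting presets for (cfg_weight, exaggeration), taken from
# the tuning suggestions above. The preset names are hypothetical.
PRESETS = {
    "general": {"cfg_weight": 0.5, "exaggeration": 0.5},
    "fast_reference": {"cfg_weight": 0.3, "exaggeration": 0.5},
    "expressive": {"cfg_weight": 0.3, "exaggeration": 0.7},
}

def suggest_params(scenario: str) -> dict:
    """Return suggested TTS parameters for a scenario, defaulting to 'general'."""
    return PRESETS.get(scenario, PRESETS["general"])
```

Treat these as starting points; the right values depend on the reference audio and the text being spoken.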

Quick Start Method 1: For Windows Users

We provide a portable package win.7z containing all dependencies for Windows users, greatly simplifying the installation process.

  1. Download and Extract:

    Baidu Netdisk download link (built-in model, ~4GB; CPU version, GPU upgrade method below): https://pan.baidu.com/s/1zXzRAQ0P7X8LJp4OrCvw7w?pwd=1234

  2. Start the Service:

    Double-click the 启动服务.bat ("Start Service") script in the root directory.

    When you see information similar to the following in the command window, the service has started successfully:

    ✅ Model loaded successfully.
    Service started successfully, HTTP address: http://127.0.0.1:5093

Method 2: For macOS, Linux, and Manual Installation Users

For macOS, Linux users, or Windows users who prefer a manual setup, please follow these steps.

1. Prerequisites

  • Python: Ensure Python 3.9 or higher is installed.
  • ffmpeg: This is a required audio/video processing tool.
    • macOS (using Homebrew): brew install ffmpeg
    • Debian/Ubuntu: sudo apt-get update && sudo apt-get install ffmpeg
    • Windows (Manual): Download ffmpeg and add it to your system's PATH environment variable.
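A quick way to confirm both prerequisites is a short check script. This is a sketch, not part of the project: `shutil.which` only verifies that ffmpeg is discoverable on PATH.

```python
import shutil
import sys

def python_ok(version=sys.version_info, minimum=(3, 9)) -> bool:
    """Check that the interpreter meets the minimum required Python version."""
    return tuple(version[:2]) >= minimum

def ffmpeg_ok() -> bool:
    """Check that ffmpeg is discoverable on the system PATH."""
    return shutil.which("ffmpeg") is not None

if __name__ == "__main__":
    print("Python >= 3.9:", python_ok())
    print("ffmpeg on PATH:", ffmpeg_ok())
```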

2. Installation Steps

```bash
# 1. Clone the repository
git clone https://github.com/jianchang512/chatterbox-api.git
cd chatterbox-api

# 2. Create and activate a Python virtual environment (recommended)
python3 -m venv venv
# on Windows:
# venv\Scripts\activate
# on macOS/Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the service
python app.py
```

Once the service starts successfully, you will see the service address http://127.0.0.1:5093 in the terminal.


⚡ Upgrade to GPU Version (Optional)

If your computer has an NVIDIA GPU with CUDA support and you have correctly installed the NVIDIA driver and CUDA Toolkit, you can upgrade to the GPU version for significant performance gains.

Windows Users (One-Click Upgrade)

  1. First, ensure you have successfully run 启动服务.bat ("Start Service") at least once to complete the basic environment setup.
  2. Double-click the 安装N卡GPU支持.bat ("Install NVIDIA GPU Support") script.
  3. The script will automatically uninstall the CPU version of PyTorch and install the GPU version compatible with CUDA 12.6.

Linux Manual Upgrade

After activating the virtual environment, execute the following commands:

```bash
# 1. Uninstall the existing CPU version of PyTorch
pip uninstall -y torch torchaudio

# 2. Install PyTorch matching your CUDA version
# The following command is for CUDA 12.6. Get the correct command for your CUDA version from the PyTorch website.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
```

You can visit the PyTorch website to get the installation command suitable for your system.

After upgrading, restart the service. You should see Using device: cuda in the startup logs.
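To verify the upgrade from inside the activated virtual environment, you can ask PyTorch directly which device it will use. This is plain PyTorch, not a script shipped with the project:

```python
def detect_device() -> str:
    """Report the compute device PyTorch would use, or note a missing install."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    return "cuda" if torch.cuda.is_available() else "cpu"

if __name__ == "__main__":
    # Matches the "Using device: ..." line in the service startup logs.
    print("Using device:", detect_device())
```

If this prints cpu after the upgrade, the installed PyTorch build does not match your driver/CUDA setup.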


📖 User Guide

1. Web Interface

After starting the service, open http://127.0.0.1:5093 in your browser to access the Web UI.

  • Input Text: Enter the text you want to convert in the text box.
  • Adjust Parameters:
    • cfg_weight: (Range 0.0 - 1.0) Controls speech rhythm. Lower values result in slower, more deliberate speech. For fast-paced reference audio, you can lower this value (e.g., 0.3).
    • exaggeration: (Range 0.25 - 2.0) Controls the emotional and intonational exaggeration of the speech. Higher values produce more expressive speech and may increase speed.
  • Voice Cloning: Click "Choose File" to upload a reference audio file (e.g., .mp3, .wav). If a reference audio is provided, the service will use the cloning interface.
  • Generate Speech: Click the "Generate Speech" button, wait a moment, and you can preview and download the generated MP3 file online.
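Since the two parameters have different valid ranges (cfg_weight 0.0 to 1.0, exaggeration 0.25 to 2.0), a client can clamp values before sending them. This helper is illustrative and not part of the service:

```python
def clamp(value: float, low: float, high: float) -> float:
    """Clamp a value into the closed interval [low, high]."""
    return max(low, min(high, value))

def normalize_params(cfg_weight: float, exaggeration: float) -> dict:
    """Clamp TTS parameters into the ranges documented by the Web UI."""
    return {
        "cfg_weight": clamp(cfg_weight, 0.0, 1.0),
        "exaggeration": clamp(exaggeration, 0.25, 2.0),
    }
```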

2. API Calls

Interface 1: OpenAI Compatible Interface (/v1/audio/speech)

This interface does not require reference audio and can be called directly using the OpenAI SDK.

Python Example (openai SDK):

```python
from openai import OpenAI

# Point the client to our local service
client = OpenAI(
    base_url="http://127.0.0.1:5093/v1",
    api_key="not-needed"  # The service does not check the key, but the SDK requires one
)

response = client.audio.speech.create(
    model="chatterbox-tts",   # This parameter is ignored
    voice="en",
    speed=0.5,                # Corresponds to the cfg_weight parameter
    input="Hello, this is a test from the OpenAI compatible API.",
    instructions="0.5",       # (Optional) Corresponds to the exaggeration parameter; must be a string
    response_format="mp3"     # Optional: 'mp3' or 'wav'
)

# Save the audio stream to a file
response.stream_to_file("output_api1.mp3")
print("Audio saved to output_api1.mp3")
```

Interface 2: Voice Cloning Interface (/v2/audio/speech_with_prompt)

This interface requires both the text and a reference audio file, uploaded together as multipart/form-data.

Python Example (requests library):

```python
import requests

API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
REFERENCE_AUDIO = "path/to/your/reference.mp3"  # Replace with your reference audio path

form_data = {
    'input': 'This voice should sound like the reference audio.',
    'cfg_weight': '0.5',
    'exaggeration': '0.5',
    'response_format': 'mp3'  # Optional: 'mp3' or 'wav'
}

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    files = {'audio_prompt': audio_file}
    response = requests.post(API_URL, data=form_data, files=files)

if response.ok:
    with open("output_api2.mp3", "wb") as f:
        f.write(response.content)
    print("Cloned audio saved to output_api2.mp3")
else:
    print("Request failed:", response.text)
```