Qwen3-TTS on Mac Mini M4: The Ultimate Installation & Optimization Guide

The Mac Mini M4 is a powerhouse for local AI, but running Qwen3-TTS (Alibaba's latest high-quality text-to-speech model) requires a few "under-the-hood" tweaks to move from NVIDIA-centric defaults to Apple’s Metal Performance Shaders (MPS).

Follow this guide to avoid common pitfalls and get the best performance out of your M4 chip.


1. Prerequisites: System Dependencies

macOS does not ship with several low-level audio tools and libraries that TTS pipelines depend on. Install them via Homebrew first:

brew install portaudio ffmpeg sox

Note: Skipping this will likely result in a /bin/sh: sox: command not found error during execution.
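
Before moving on, it is worth confirming the tools actually landed on your PATH:

# Each command should print a version string, not "command not found"
which sox ffmpeg
sox --version
ffmpeg -version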


2. Environment Setup

We recommend using Python 3.12 with a clean Conda environment to keep things stable.

# Create and activate the environment
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts

# Install the core inference library
pip install -U qwen-tts

# Clone the repository for local modifications
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
pip install -e .
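
Assuming the install pulled in a recent PyTorch wheel (MPS support requires macOS 12.3+ and PyTorch 1.12+), a quick one-liner confirms the Apple GPU is visible before you start editing scripts:

# Sanity check: should print the PyTorch version followed by "True"
python -c "import torch; print(torch.__version__, torch.backends.mps.is_available())"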

3. The "M4 Special": Code Modifications

The default scripts are hardcoded for NVIDIA GPUs. To run on your M4, you must modify examples/test_model_12hz_base.py.

A. Fix Model Path & Acceleration (Approx. Line 50)

Find the Qwen3TTSModel.from_pretrained section and update it to use sdpa (Mac-compatible attention) and the mps device.

# --- BEFORE ---
# MODEL_PATH = "Qwen/Qwen3-TTS-12Hz-1.7B-Base/"
# tts = Qwen3TTSModel.from_pretrained(..., attn_implementation="flash_attention_2")

# --- AFTER (Modified for M4) ---
MODEL_PATH = "Qwen/Qwen3-TTS-12Hz-1.7B-Base" # Remove the trailing slash
tts = Qwen3TTSModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,   # M4 fully supports bfloat16
    attn_implementation="sdpa",    # Use SDPA instead of FlashAttention2
    device_map="mps",              # Force use of Apple GPU
)
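
If you want the same script to run unmodified on both your Mac and a CUDA machine, you can replace the hardcoded "mps" with a small device-selection helper. This is a sketch using only standard PyTorch calls; pick_device is our own name, not part of the Qwen3-TTS API:

import torch

def pick_device() -> str:
    """Return the best available accelerator for this machine."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# Drop-in replacement for the block above, inside test_model_12hz_base.py
tts = Qwen3TTSModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
    device_map=pick_device(),  # "mps" on the M4, "cuda" elsewhere
)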

B. Fix Synchronization Logic (Crucial!)

The M4 chip uses torch.mps, not torch.cuda. Calling torch.cuda.synchronize() on Apple Silicon will crash your script. Replace any synchronization calls with this hardware-aware block:

# Replace torch.cuda.synchronize() with:
if torch.cuda.is_available():
    torch.cuda.synchronize()
elif torch.backends.mps.is_available():
    torch.mps.synchronize() # The correct instruction for M4
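
In practice you need this right before reading a timer, because MPS kernels run asynchronously and the CPU-side clock would otherwise stop early. A minimal benchmarking sketch (the generate(...) call is a placeholder for whatever inference call the script makes, not a confirmed API):

import time
import torch

def sync_device():
    """Block until all queued GPU work has finished, on CUDA or MPS."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elif torch.backends.mps.is_available():
        torch.mps.synchronize()

start = time.perf_counter()
# ... run inference here, e.g. audio = tts.generate(...) ...
sync_device()  # ensure the GPU is actually done before stopping the clock
print(f"Generation took {time.perf_counter() - start:.2f}s")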

4. Handling Large Downloads

The model is roughly 4GB. If you face slow speeds or connection timeouts with HuggingFace, use a mirror (if applicable) or ensure a stable connection.

To use a mirror in your terminal:

export HF_ENDPOINT=https://hf-mirror.com
python examples/test_model_12hz_base.py
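
You can also pre-fetch the weights so the script never has to download mid-run. This assumes the huggingface_hub CLI is installed (it ships with recent versions of the huggingface_hub package):

# Pre-download the weights into the local HF cache
pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com   # optional, only if you need the mirror
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base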

Troubleshooting: "InvalidHeaderDeserialization"

If you see a safetensors_rust.SafetensorError, it means your model download was interrupted and the file is corrupted.

  • The Fix: Go to ~/.cache/huggingface/hub, delete the model's cache folder (named models--Qwen--Qwen3-TTS-12Hz-1.7B-Base, following the hub's models--{org}--{name} layout), and run the script again to restart the download from scratch; see the commands below.
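
For reference, the delete-and-retry looks like this in the terminal (adjust the folder name if you downloaded a different variant):

# Inspect the cache, then remove the corrupted snapshot and re-run
ls ~/.cache/huggingface/hub
rm -rf ~/.cache/huggingface/hub/models--Qwen--Qwen3-TTS-12Hz-1.7B-Base
python examples/test_model_12hz_base.py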

5. Running the Model

Once the edits are saved, navigate to the examples folder and run:

cd examples
python test_model_12hz_base.py

If everything is set up correctly, the M4 will generate a series of high-fidelity audio samples in a newly created directory.
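
macOS ships with afplay, so you can audition the results straight from the terminal. The paths below are placeholders; substitute whatever directory and file names the script actually creates:

# List the generated samples, then play one (paths are illustrative)
ls output_dir/
afplay output_dir/sample_0.wav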


Pro-Tips for Mac Users

  • Verification: To confirm PyTorch can see the Apple GPU, run this in Python:
    import torch; print(torch.backends.mps.is_available())
    It should print True; a fuller check appears after this list.
  • The "Reboot" Rule: Apple Silicon drivers can occasionally hang after heavy environment switching. If you hit an inexplicable error, a system restart usually clears these driver-level hangs.
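
For a slightly more thorough verification than the one-liner above, the snippet below (plain PyTorch, no Qwen-specific calls) confirms MPS is both built into your wheel and usable for actual tensor math:

import torch

print("PyTorch:", torch.__version__)
print("MPS built:", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

# Round-trip a tensor through the GPU to prove it works end to end
x = torch.ones(3, device="mps") * 2
print("Test tensor on", x.device, "->", x.cpu().tolist())  # expect [2.0, 2.0, 2.0]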