Qwen3-TTS on Mac Mini M4: The Ultimate Installation & Optimization Guide
The Mac Mini M4 is a powerhouse for local AI, but running Qwen3-TTS (Alibaba's latest high-quality text-to-speech model) requires a few "under-the-hood" tweaks to move from NVIDIA-centric defaults to Apple’s Metal Performance Shaders (MPS).
Follow this guide to avoid common pitfalls and get the best performance out of your M4 chip.
1. Prerequisites: System Dependencies
macOS lacks some low-level audio processing libraries required by TTS engines. Install them via Homebrew first:
```bash
brew install portaudio ffmpeg sox
```
Note: Skipping this step will likely result in a `/bin/sh: sox: command not found` error during execution.
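To confirm the tools landed on your PATH before going further, a quick check (note that portaudio is a library, not a binary, so there is nothing to run for it):

```bash
# Confirm the CLI tools are reachable
which sox ffmpeg
sox --version
ffmpeg -version
```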
2. Environment Setup
We recommend using Python 3.12 with a clean Conda environment to keep things stable.
```bash
# Create and activate the environment
conda create -n qwen3-tts python=3.12 -y
conda activate qwen3-tts

# Install the core inference library
pip install -U qwen-tts

# Clone the repository for local modifications
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
pip install -e .
```
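Before moving on, it's worth a quick sanity check that the environment resolved the expected interpreter and that PyTorch is importable (assuming it came in as a dependency of the install above; install it manually if the import fails):

```bash
# Sanity check: confirm the interpreter and PyTorch build in the new env
python --version
python -c "import torch; print(torch.__version__)"
```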
3. The "M4 Special": Code Modifications
The default scripts are hardcoded for NVIDIA GPUs. To run on your M4, you must modify `examples/test_model_12hz_base.py`.
A. Fix Model Path & Acceleration (Approx. Line 50)
Find the `Qwen3TTSModel.from_pretrained` section and update it to use `sdpa` (Mac-compatible attention) and the `mps` device.
```python
# --- BEFORE ---
# MODEL_PATH = "Qwen/Qwen3-TTS-12Hz-1.7B-Base/"
# tts = Qwen3TTSModel.from_pretrained(..., attn_implementation="flash_attention_2")

# --- AFTER (Modified for M4) ---
MODEL_PATH = "Qwen/Qwen3-TTS-12Hz-1.7B-Base"  # Remove the trailing slash
tts = Qwen3TTSModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,    # M4 fully supports bfloat16
    attn_implementation="sdpa",    # Use SDPA instead of FlashAttention2
    device_map="mps",              # Force use of the Apple GPU
)
```
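If you want the same script to keep working on a CUDA machine too, a small sketch of a device picker can replace the hardcoded string (the `pick_device` helper is not part of the repo, just an illustration):

```python
import torch

def pick_device() -> str:
    # Hypothetical helper: prefer CUDA, then Apple's MPS, then CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# e.g. pass device_map=pick_device() in the from_pretrained() call above
```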
B. Fix Synchronization Logic (Crucial!)
The M4 chip uses `torch.mps`, not `torch.cuda`. Calling `torch.cuda.synchronize()` will crash your script. Replace any synchronization calls with this hardware-aware block:
```python
# Replace torch.cuda.synchronize() with:
if torch.cuda.is_available():
    torch.cuda.synchronize()
elif torch.backends.mps.is_available():
    torch.mps.synchronize()  # The correct instruction for M4
```
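Why this matters in practice: MPS ops execute asynchronously, so any timing taken without a barrier measures queueing rather than real work. A minimal sketch of correct benchmarking (the generation call is a placeholder, not the repo's actual API):

```python
import time
import torch

def device_synchronize():
    # Flush pending GPU work so the timer measures real compute
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elif torch.backends.mps.is_available():
        torch.mps.synchronize()

start = time.perf_counter()
# ... run the model's generation call here ...
device_synchronize()
print(f"Elapsed: {time.perf_counter() - start:.2f}s")
```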
4. Handling Large Downloads
The model weights are roughly 4 GB. If you face slow speeds or connection timeouts with Hugging Face, use a mirror (if applicable) or ensure a stable connection.
To use a mirror in your terminal:

```bash
export HF_ENDPOINT=https://hf-mirror.com
python test_model_12hz_base.py
```
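Alternatively, you can pre-fetch the weights with `huggingface_hub` before running the script; `snapshot_download` resumes interrupted transfers, which helps on flaky connections:

```python
from huggingface_hub import snapshot_download

# Pre-download the weights into the local HF cache; safe to re-run,
# since partially downloaded files are resumed rather than restarted
snapshot_download("Qwen/Qwen3-TTS-12Hz-1.7B-Base")
```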
Troubleshooting: "InvalidHeaderDeserialization"
If you see a `safetensors_rust.SafetensorError`, it means your model download was interrupted and the file is corrupted.
- The Fix: Go to `~/.cache/huggingface/hub`, delete the `Qwen` folder, and run the script again to restart the download.
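From the terminal, that looks like the following (the Hub cache names model folders `models--<org>--<repo>`; double-check the exact name on your machine before deleting):

```bash
# Remove the corrupted cached model, then re-run the script to re-download
rm -rf ~/.cache/huggingface/hub/models--Qwen--Qwen3-TTS-12Hz-1.7B-Base
```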
5. Running the Model
Once the edits are saved, navigate to the `examples` folder and run:

```bash
cd examples
python test_model_12hz_base.py
```
If everything is set up correctly, the M4 will generate a series of high-fidelity audio samples in a newly created directory.
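To audition the results straight from the terminal, the `play` command that ships with sox works well (the directory name below is a guess; substitute whatever path the script actually reports):

```bash
# 'play' is installed alongside sox
play output/*.wav
```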
Pro-Tips for Mac Users
- Verification: To confirm your GPU is being used, run this in Python: `import torch; print(torch.backends.mps.is_available())`.
- The "Reboot" Rule: Apple Silicon drivers can occasionally hang after heavy environment switching. If you get an inexplicable error, a system restart fixes 90% of driver-related issues.
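A slightly fuller smoke test verifies that a tensor can actually be allocated and computed on the Metal backend, not just that the backend reports as available:

```python
import torch

# Confirm the Metal backend is visible and usable end-to-end
print("MPS available:", torch.backends.mps.is_available())
if torch.backends.mps.is_available():
    x = torch.ones(4, device="mps")
    print("Tensor lives on:", x.device, "| sum:", x.sum().item())
```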
