
Speech & Transcription @paki81 Updated 2/26/2026

Qwen TTS OpenClaw Skill - ClawHub

Want your AI agent to automate Qwen TTS workflows? This free skill from ClawHub handles speech & transcription tasks so you do not have to build custom tools from scratch.

What this skill does

Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Install

npx clawhub@latest install qwen-tts

Full SKILL.md

name: qwen-tts
description: Local text-to-speech using Qwen3-TTS-12Hz-1.7B-CustomVoice. Use when generating audio from text, creating voice messages, or when TTS is requested. Supports 10 languages including Italian, 9 premium speaker voices, and instruction-based voice control (emotion, tone, style). Alternative to cloud-based TTS services like ElevenLabs. Runs entirely offline after initial model download.

Qwen TTS

Local text-to-speech using the Qwen3-TTS-12Hz-1.7B-CustomVoice model, downloaded from Hugging Face.

Quick Start

Generate speech from text:

scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

Installation

First-time setup (run once):

cd skills/public/qwen-tts
bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: The first synthesis automatically downloads the ~1.7GB model from Hugging Face.

Usage

scripts/tts.py [options] "Text to speak"

Options

  • -o, --output PATH - Output file path (default: qwen_output.wav)
  • -s, --speaker NAME - Speaker voice (default: Vivian)
  • -l, --language LANG - Language (default: Auto)
  • -i, --instruct TEXT - Voice instruction (emotion, style, tone)
  • --list-speakers - Show available speakers
  • --model NAME - Model name (default: CustomVoice 1.7B)
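
When the script is called from Python rather than a shell, the options above map onto an argument list. The following is a minimal sketch, assuming the flags behave exactly as listed; `build_tts_command` is a hypothetical helper and is not part of the skill.

```python
from typing import List, Optional


def build_tts_command(
    text: str,
    output: str = "qwen_output.wav",
    speaker: Optional[str] = None,
    language: Optional[str] = None,
    instruct: Optional[str] = None,
    script: str = "scripts/tts.py",
) -> List[str]:
    """Build an argv list for scripts/tts.py (hypothetical helper)."""
    cmd = [script, "-o", output]
    if speaker:
        cmd += ["-s", speaker]
    if language:
        cmd += ["-l", language]
    if instruct:
        cmd += ["-i", instruct]
    # The usage line places options before the text, so the text goes last.
    cmd.append(text)
    return cmd


print(build_tts_command("Hello world", output="hello.wav",
                        speaker="Ryan", language="English"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with text that contains quotes or exclamation marks.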

Examples

Basic Italian speech:

scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav

With emotion/instruction:

scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

scripts/tts.py --list-speakers

Available Speakers

The CustomVoice model includes 9 premium voices:

Speaker    Language            Description
Vivian     Chinese             Bright, slightly edgy young female
Serena     Chinese             Warm, gentle young female
Uncle_Fu   Chinese             Seasoned male, low mellow timbre
Dylan      Chinese (Beijing)   Youthful Beijing male, clear
Eric       Chinese (Sichuan)   Lively Chengdu male, husky
Ryan       English             Dynamic male, rhythmic
Aiden      English             Sunny American male
Ono_Anna   Japanese            Playful female, light nimble
Sohee      Korean              Warm female, rich emotion

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
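
The recommendation can be turned into a small lookup that picks a speaker whose native language matches the text. This is a sketch only: the mapping is copied from the speaker table, while `pick_speaker` and the fallback to Vivian (the documented default) are illustrative choices, not behavior of the skill itself.

```python
# Speaker -> native language, copied from the speaker table.
NATIVE_LANGUAGE = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese",    # Beijing dialect
    "Eric": "Chinese",     # Sichuan dialect
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

DEFAULT_SPEAKER = "Vivian"  # the script's documented default speaker


def pick_speaker(language: str) -> str:
    """Return the first speaker native to `language`, else the default."""
    for speaker, native in NATIVE_LANGUAGE.items():
        if native.lower() == language.lower():
            return speaker
    # No native speaker (e.g. Italian): any voice works, so use the default.
    return DEFAULT_SPEAKER


print(pick_speaker("English"))
print(pick_speaker("Italian"))
```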

Voice Instructions

Use -i, --instruct to control emotion, tone, and style:

Italian examples:

  • "Parla con entusiasmo"
  • "Tono serio e professionale"
  • "Voce calma e rilassante"
  • "Leggi come un narratore"

English examples:

  • "Speak with excitement"
  • "Very happy and energetic"
  • "Calm and soothing voice"
  • "Read like a narrator"

Integration with OpenClaw

The script outputs the audio file path to stdout (last line), making it compatible with OpenClaw's TTS workflow:

# OpenClaw captures the output path
cd skills/public/qwen-tts
OUTPUT=$(scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav 2>/dev/null)
# OUTPUT = /tmp/audio.wav
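
The same capture can be done from Python. The sketch below assumes only what is stated above, namely that the audio path is the last line of stdout; `last_line` and `run_tts` are hypothetical helper names.

```python
import subprocess
from typing import List


def last_line(stdout: str) -> str:
    """Return the last non-empty line of captured stdout."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("no output path on stdout")
    return lines[-1]


def run_tts(cmd: List[str], cwd: str = "skills/public/qwen-tts") -> str:
    """Run the TTS command and return the audio path it printed."""
    result = subprocess.run(
        cmd,
        cwd=cwd,
        check=True,                 # raise if the script exits non-zero
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # mirrors the 2>/dev/null in the shell example
    )
    return last_line(result.stdout)
```

For example, `run_tts(["scripts/tts.py", "Ciao", "-l", "Italian", "-o", "/tmp/audio.wav"])` would be expected to return `/tmp/audio.wav` once the skill is installed.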

Performance

  • GPU (CUDA): ~1-3 seconds for short phrases
  • CPU: ~10-30 seconds for short phrases
  • Model size: ~1.7GB (auto-downloads on first run)
  • Venv size: ~500MB (installed dependencies)

Troubleshooting

Setup fails:

# Ensure Python 3.10-3.12 is available
python3.12 --version

# Re-run setup
cd skills/public/qwen-tts
rm -rf venv
bash scripts/setup.sh
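
To check the interpreter from inside Python instead, a minimal sketch follows. It assumes the supported range is exactly 3.10 through 3.12, as the comment above states; `is_supported` is a hypothetical helper.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported(version: Tuple[int, int]) -> bool:
    """True if (major, minor) lies within the supported range, inclusive."""
    return MIN_VERSION <= version <= MAX_VERSION


current = sys.version_info[:2]
print(f"Python {current[0]}.{current[1]} supported: {is_supported(current)}")
```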

Model download slow/fails:

# Use mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com
scripts/tts.py "Test" -o test.wav
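
When the script is launched from Python, the mirror can be set for the child process only, leaving the parent environment untouched. This is a sketch; `mirror_env` is a hypothetical helper, and the endpoint is the one shown above.

```python
import os
from typing import Dict, Optional


def mirror_env(endpoint: str = "https://hf-mirror.com",
               base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with HF_ENDPOINT set."""
    env = dict(os.environ if base is None else base)
    env["HF_ENDPOINT"] = endpoint
    return env
```

The returned dictionary can be passed as `env=mirror_env()` to `subprocess.run` when invoking `scripts/tts.py`.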

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.
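
The skill's actual fallback code is not shown here. As an illustration of the general pattern only, the sketch below retries on the next device when an out-of-memory error occurs. It assumes the error surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch reports CUDA OOM; `run_with_fallback` and the `synthesize` callable are hypothetical.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def is_oom(exc: BaseException) -> bool:
    """Heuristic: treat RuntimeErrors mentioning 'out of memory' as OOM."""
    return isinstance(exc, RuntimeError) and "out of memory" in str(exc).lower()


def run_with_fallback(synthesize: Callable[[str], T],
                      devices: Sequence[str] = ("cuda", "cpu")) -> T:
    """Call synthesize(device) on each device in order until one succeeds."""
    last_error: Optional[BaseException] = None
    for device in devices:
        try:
            return synthesize(device)
        except RuntimeError as exc:
            if not is_oom(exc):
                raise  # unrelated failure: do not hide it
            last_error = exc
    raise RuntimeError("all devices ran out of memory") from last_error
```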

Audio quality issues:

  • Try a different speaker (list them with --list-speakers)
  • Add an instruction: -i "Speak clearly and slowly"
  • Check that the language matches the text: -l Italian for Italian text

Model Details

Original URL: https://github.com/openclaw/skills/blob/main/skills/paki81/qwen-tts
