Text to Speech

Text-to-speech (TTS) converts written text into spoken audio. The text can come from documents, a website, text messages or an embedded software system (for example, a device with no screen on which to display the text). The Language Technologies Portal’s Welsh text-to-speech resources are based on open-source systems.


Orpheus-TTS
Canopy Labs’ Orpheus-TTS is one of the most advanced open-source text-to-speech systems and is built on a Llama-3b backbone. It produces natural-sounding voices with lifelike emotion, nuanced intonation and rhythm that rival (and often surpass) closed-source services. It can generate new voices without lengthy training, supports expressive control through simple emotion tags, and offers low-latency streaming, which makes it well suited to real-time applications. Our fine-tuned Welsh version of Orpheus-TTS offers:
- Authentic Welsh voices: Trained on high-quality Welsh data, including natural speech and native intonation patterns.
- Accuracy and consistency: Carefully tuned to handle the unique sound structure and rhythm of Welsh speech.
- Welsh-specific emotion tags: Add cues like <chwerthin> (laughter) or <anadlu> (breathing) for expressive, culturally faithful delivery.
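Emotion tags are written inline in the input text. The sketch below is a minimal Python helper, not part of Orpheus-TTS itself, that pulls the tags out of a prompt and flags any that are not in a known set, which can catch typos before text is sent to the model. It assumes tags take the form <name> and only knows the two tags named above; the helper names find_tags and unknown_tags are hypothetical.

```python
import re

# Only the two Welsh tags named above; the fine-tuned model may support
# others, so this set is an assumption rather than the full list.
KNOWN_TAGS = frozenset({"chwerthin", "anadlu"})

# Matches inline cues such as <chwerthin>: any non-space text between < and >.
TAG_PATTERN = re.compile(r"<([^<>\s]+)>")


def find_tags(text: str) -> list[str]:
    """Return the emotion tag names embedded in the input text, in order."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str, known: frozenset[str] = KNOWN_TAGS) -> list[str]:
    """Return tags that are not in the known set (likely typos)."""
    return [tag for tag in find_tags(text) if tag not in known]


if __name__ == "__main__":
    prompt = "Bore da! <chwerthin> Mae hi'n braf heddiw. <anadlu> Ydy wir."
    print(find_tags(prompt))                # ['chwerthin', 'anadlu']
    print(unknown_tags("Helo <chwerthn>"))  # ['chwerthn']
```

Checking tags on the client side is a design choice: an unrecognised tag may simply be spoken aloud or ignored by the model, so catching it early is cheaper than listening for the mistake.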


Piper
We have fine-tuned Piper for the Welsh language. For the first time, developers, educators and creators can access natural, authentic Welsh voices that run in real time, even on small devices such as the Raspberry Pi.
What is Piper?
Piper is a modern, open-source TTS engine designed for speed and quality. It turns text into natural-sounding speech with minimal latency and runs on a wide range of hardware, from laptops to embedded systems. It is lightweight, simple to integrate and fast enough to generate speech in real time, which makes it well suited to assistive technology, smart devices and interactive apps.
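As an illustration of local integration, the sketch below calls the Piper command-line program from Python. It assumes the standalone piper executable from the rhasspy/piper project is on the PATH; that program reads text from standard input and accepts --model and --output_file options. The model file name is a hypothetical placeholder for a Welsh voice model, and the function names are illustrative only.

```python
import shutil
import subprocess


def piper_command(model_path: str, output_path: str, executable: str = "piper") -> list[str]:
    """Build the Piper command line; the text itself is passed on stdin."""
    return [executable, "--model", model_path, "--output_file", output_path]


def speak_to_file(text: str, model_path: str, output_path: str,
                  executable: str = "piper") -> None:
    """Synthesize text with a local Piper install and write a WAV file."""
    # Fail early with a clear message if Piper is not installed.
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    # Piper reads the text from stdin; check=True raises on a non-zero exit.
    subprocess.run(
        piper_command(model_path, output_path, executable),
        input=text.encode("utf-8"),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical model file name; substitute a real Welsh Piper voice.
    print(" ".join(piper_command("cy_GB-voice.onnx", "bore_da.wav")))
```

For example, speak_to_file("Bore da", "cy_GB-voice.onnx", "bore_da.wav") would write a WAV file, provided Piper and the model are installed locally.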
TTS Voices via REST API (Cloud-Based)
Workflow:
- The client application (mobile app, web app or server) sends an HTTP POST request to a TTS REST API containing:
  - text → the input text to convert.
  - voice → the chosen voice model (e.g., “piper/benyw-de”, “orpheus/gwryw-gogledd”).
- The cloud TTS engine processes the request:
  - The provider uses a neural voice model trained on large datasets.
  - It generates the audio on the fly.
- The response is sent back as an audio file or stream.
- The client can play, download or store the audio.
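The client side of this workflow can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the endpoint URL is a placeholder, and the JSON body with text and voice fields and the raw-audio response are assumptions about how such an API might look, not the documented interface of any particular service.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the real TTS service URL.
API_URL = "https://tts.example.com/api/synthesize"


def build_request(text: str, voice: str, url: str = API_URL) -> urllib.request.Request:
    """Build an HTTP POST request carrying the text and the chosen voice."""
    # Assumed request format: a JSON object with "text" and "voice" fields.
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, out_path: str, url: str = API_URL) -> None:
    """Send the request and save the returned audio bytes to a file."""
    request = build_request(text, voice, url)
    # Assumes the service replies with the audio file as the response body.
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)


if __name__ == "__main__":
    # Only builds the request; calling synthesize() needs a real endpoint.
    req = build_request("Bore da", "piper/benyw-de")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the network call in one place, so the payload can be inspected or tested without connectivity.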
Advantages:
- Access to high-quality neural voices.
- Variety of languages and accents.
- No need for local computation (lightweight client).
Disadvantages:
- Requires internet connectivity.
- Latency depends on network speed.
TTS Voices Offline (On-Device)
Offline TTS means the model runs locally without calling a remote API.
Approaches:
- Embedded TTS engines (traditional, smaller footprint):
  - e.g., eSpeak, Festival, MaryTTS.
  - Lightweight, but can sound robotic.
- Neural TTS models on the device:
  - Pre-trained deep learning TTS models deployed locally.
  - Examples: Piper, Coqui TTS, Orpheus.
  - Run on CPU or GPU, depending on the power of the device.
Advantages:
- Works without connectivity.
- Zero recurring API cost.
- Better privacy: the text never leaves the device.
Disadvantages:
- Voice quality may be lower unless you bundle large neural models.
- Storage footprint can be significant (hundreds of MB).
- Requires more CPU/GPU resources.
Several of our tools use this approach; see the following repositories for further information:
techiaith/llais_festival
techiaith/Festival_MSAPI
techiaith/docker-marytts
techiaith/piper-cy
techiaith/piper-ios-app