Text to Speech

Text-to-speech allows a computer system to speak any text using either a synthetic or natural voice.

The text can come from documents, a website, text messages or an embedded software system (where text cannot be displayed on a screen). The Language Technologies Portal’s Welsh text-to-speech resources are based on open-source systems.

Orpheus-TTS

Canopy AI’s Orpheus-TTS is one of the most advanced open-source text-to-speech systems, originally built on top of Llama-3b. It produces natural-sounding voices with lifelike emotion, nuanced intonation, and rhythm that rival (and often surpass) closed-source services. It can generate new voices without lengthy training and supports expressive control through simple emotion tags, which makes it well suited to real-time applications. Our fine-tuned version of Orpheus-TTS offers:

Authentic Welsh voices: Trained on high-quality Welsh data, including natural speech and native intonation patterns.
Accuracy and consistency: Carefully tuned to handle the unique sound structure and rhythm of Welsh speech.
Welsh-specific emotion tags: Add cues like <chwerthin> (laughter) or <anadlu> (breathing) for expressive, culturally faithful delivery.
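
As a rough illustration of how emotion tags sit inline in the input text, the sketch below checks a prompt for tags that are not among the two named above. The `KNOWN_TAGS` set and the `unknown_tags` helper are hypothetical; the full list of tags the fine-tuned model accepts is not given here, so extend the set as needed.

```python
import re

# Only the two tags named in the text above. The complete list supported by
# the model is an assumption to be filled in from its documentation.
KNOWN_TAGS = {"chwerthin", "anadlu"}

# Matches anything of the form <name> with no spaces or nested brackets.
TAG_PATTERN = re.compile(r"<([^<>\s]+)>")


def unknown_tags(text: str) -> list[str]:
    """Return tag names found in `text` that are not in KNOWN_TAGS."""
    return [name for name in TAG_PATTERN.findall(text) if name not in KNOWN_TAGS]


prompt = "Mae hynny'n ddoniol iawn <chwerthin> diolch i ti."
print(unknown_tags(prompt))          # tags in this prompt that need attention
print(unknown_tags("Helo <laugh>"))  # an English tag is not in the Welsh set
```

A check like this is useful before sending text for synthesis, since a misspelled tag would otherwise be treated as ordinary text.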

Examples of Orpheus in use

Piper

We have fine-tuned Piper for use with the Welsh language. For the first time, developers, educators, and creators can access natural, authentic Welsh voices that run in real time, even on small devices such as a Raspberry Pi.

What is Piper?

Piper is a modern, open-source TTS engine designed for speed and quality. It turns text into natural-sounding speech with minimal latency and runs on a wide range of hardware, from laptops to embedded systems. It is lightweight, simple to integrate, and capable of generating speech in real time, which makes it a good fit for assistive technology, smart devices, and interactive apps.

Examples of Piper in use

TTS Voices via REST API (Cloud-Based)

Workflow:

The client application (mobile app, web app, or server) sends a request to a TTS REST API.
The request is an HTTP POST with:
- text → the input text to convert.
- voice → the chosen voice model (e.g., “piper/benyw-de”, “orpheus/gwryw-gogledd”).
The cloud TTS engine processes the request.
- The provider uses a neural voice model trained on large datasets.
- It generates the audio on the fly.
The response is sent back as an audio file or stream.
- The client can play, download, or store it.
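
The workflow above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch under stated assumptions: the endpoint URL is a placeholder (the real address is not given here), the body is assumed to be JSON with `text` and `voice` fields, and the response body is assumed to be raw audio bytes.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the address of the actual TTS service.
API_URL = "https://example.invalid/api/tts"


def build_tts_request(text: str, voice: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST carrying the text and voice fields."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesise(text: str, voice: str, out_path: str, url: str = API_URL) -> None:
    """Send the request and save the returned audio bytes to `out_path`."""
    request = build_tts_request(text, voice, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()  # assumed to be the audio file itself
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio)


# Building the request needs no network access, so it is safe to inspect.
request = build_tts_request("Bore da", "piper/benyw-de")
print(request.get_method(), request.full_url)
```

Calling `synthesise("Bore da", "piper/benyw-de", "bore_da.wav")` would perform the network round trip once a real endpoint is substituted.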

Advantages:

Access to high-quality neural voices.
Variety of languages and accents.
No need for local computation (lightweight client).

Disadvantages:

Requires internet connectivity.
Latency depends on network speed.


TTS Voices Offline (On-Device)

Offline TTS means the model runs locally without calling a remote API.

Approaches:

Embedded TTS Engines (smaller footprint):
- e.g., eSpeak, Festival, MaryTTS, Piper.
- Lightweight. The older engines can sound robotic; Piper, which uses a compact neural model, sounds more natural.
Neural TTS Models on Device:
- Deploying pre-trained deep learning TTS models locally.
- Examples: Coqui TTS, Orpheus.
- Can run on CPU/GPU, depending on device power.
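
As one possible illustration of the on-device approach, the sketch below drives a locally installed Piper from Python. It assumes a `piper` executable is on the PATH and accepts `--model` and `--output_file`, with the text supplied on standard input, as in the original Piper command-line tool; other Piper builds may use different flags. The model file name is a placeholder, not a real file from the portal.

```python
import shutil
import subprocess


def piper_command(model_path: str, out_path: str) -> list[str]:
    """Argument list for the Piper CLI; the text itself goes to standard input."""
    return ["piper", "--model", model_path, "--output_file", out_path]


def speak_offline(text: str, model_path: str, out_path: str) -> bool:
    """Synthesise `text` to a WAV file locally. Returns False if Piper is missing."""
    if shutil.which("piper") is None:
        return False
    subprocess.run(
        piper_command(model_path, out_path),
        input=text.encode("utf-8"),
        check=True,  # raise if Piper exits with an error
    )
    return True


# "cy_voice.onnx" is a placeholder name for a downloaded Welsh voice model.
print(piper_command("cy_voice.onnx", "bore_da.wav"))
```

No network call is made at any point, which is the defining property of this approach: `speak_offline("Bore da", "cy_voice.onnx", "bore_da.wav")` depends only on the local binary and model file.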

Advantages:

Works without connectivity.
Zero recurring API cost.
Stronger privacy (text and audio never leave the device).

Disadvantages:

Voice quality may be lower unless you bundle large neural models.
Storage footprint can be significant (hundreds of MB).
Requires more CPU/GPU resources.

We have many tools that use this approach. Follow one of the links below for further information:

techiaith/llais_festival

techiaith/Festival_MSAPI

techiaith/docker-marytts

techiaith/piper-cy

techiaith/piper-ios-app
