November 5, 2025
High-quality text-to-speech model based on Continue-1-OSS, featuring 8 unique voices, emotional expression, and real-time generation—Open Source.

Today, SVECTOR announces Continue-TTS, a fine-tuned text-to-speech model based on the Continue-1-OSS architecture. This model is specifically trained for high-quality speech synthesis and delivers exceptional voice generation capabilities.
Continue-TTS provides natural speech with human-like intonation, emotion, and rhythm. With 8 unique voices, real-time generation capabilities (~200ms latency), and built-in emotional expression support, this model rivals commercial solutions while remaining fully open source.
Released under Apache 2.0 license, Continue-TTS is freely available for both research and commercial applications, bringing professional-grade speech synthesis to everyone.
Continue-TTS combines the Continue-1-OSS language model with advanced neural audio codecs to generate exceptionally natural speech from text. The model ships with 8 professionally designed voices, each with a distinct personality:
- Conversational and natural, perfect for general use
- Warm and friendly, excellent for storytelling
- Energetic and bright, great for upbeat content
- Deep and authoritative, ideal for narration
- Friendly and casual, perfect for conversational content
- Soft and gentle, excellent for calm narration
- Dynamic and expressive, great for engaging content
- Warm and engaging, perfect for emotional expression
Across all 8 voices, the model generates human-like speech with natural intonation, emotion, and rhythm, at an audio quality on par with commercial text-to-speech systems.
Each voice is carefully trained to maintain consistent personality and characteristics across different types of content, from conversational dialogue to professional narration.
Built-in support for natural emotions adds depth and authenticity to generated speech. Continue-TTS understands emotion tags and seamlessly integrates them into speech output:
- <laugh>: natural laughter
- <chuckle>: light laugh
- <sigh>: expressive sigh
- <gasp>: surprised gasp
- <cough>, <yawn>, <groan>, <sniffle>: additional natural sounds

With streaming support and low latency (~200ms on GPU), Continue-TTS enables interactive applications like voice assistants, real-time narration, and conversational AI. The model generates audio chunks progressively, allowing immediate playback without waiting for complete generation.
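Tags outside the supported set may be spoken literally or dropped, depending on how the model tokenizes them, so it can be worth checking prompts before synthesis. The helper below is a hypothetical convenience written for this post, not part of the continue_tts package; it simply scans a prompt for tags outside the set listed above.

```python
import re

# Emotion tags supported by Continue-TTS, per the list above.
SUPPORTED_TAGS = {"laugh", "chuckle", "sigh", "gasp",
                  "cough", "yawn", "groan", "sniffle"}

def unknown_emotion_tags(prompt: str) -> list:
    """Return any <tag> in the prompt that is not a supported emotion tag."""
    return [t for t in re.findall(r"<(\w+)>", prompt)
            if t not in SUPPORTED_TAGS]

print(unknown_emotion_tags("That was close! <laugh> Anyway... <sigh>"))  # []
print(unknown_emotion_tags("Hello <smile> world"))                       # ['smile']
```

A prompt that passes the check can then be handed to the model unchanged.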
Simple Python package for quick integration into applications:
```shell
pip install continue-speech
```

```python
from continue_tts import Continue1Model

model = Continue1Model(
    model_name="SVECTOR-CORPORATION/Continue-TTS"
)

audio = model.generate_speech(
    prompt="Hello from Continue-TTS!",
    voice="nova"
)
```

Continue-TTS excels across diverse applications requiring natural speech synthesis:
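The quickstart above leaves the returned audio in memory. The exact return type of generate_speech isn't documented in this post, so the sketch below assumes a float array in [-1, 1] at 24 kHz (the codec's output rate described later) and writes it to a WAV file with only the standard library; a synthetic sine wave stands in for model output so the snippet runs without the model.

```python
import wave
import numpy as np

SAMPLE_RATE = 24_000  # Continue-TTS outputs 24 kHz audio

def save_wav(audio, path, sample_rate=SAMPLE_RATE):
    """Write a float array in [-1, 1] to a 16-bit mono WAV file."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)       # mono
        f.setsampwidth(2)       # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Stand-in for the model's output: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
save_wav(0.5 * np.sin(2 * np.pi * 440 * t), "hello.wav")
```

If the package returns a different container (e.g. a tensor or a bytes object), only the conversion to int16 PCM would need to change.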
- Natural storytelling with emotional expression, perfect for creating engaging audiobooks with professional narration quality and character voices.
- Conversational AI with personality for voice-enabled applications, customer service bots, and interactive digital assistants.
- Text-to-speech for visually impaired users, screen readers, and accessibility tools requiring natural, easy-to-understand speech.
- Voiceovers for videos, podcasts, presentations, and multimedia content with multiple voice options for diverse characters.
- Dynamic character voices and dialogue generation for interactive gaming experiences with emotional and contextual speech.
- Interactive learning materials with voice, language learning applications, and automated tutoring systems with clear pronunciation.
- Natural-sounding automated responses for phone systems, chatbots, and support applications requiring professional voice interaction.
- Clear pronunciation models for language education, vocabulary training, and conversational practice with natural intonation.
Continue-TTS combines the Continue-1-OSS language model with the SNAC multi-scale neural audio codec. The model generates audio tokens autoregressively, which are then decoded into waveforms using the neural codec for high-quality 24kHz audio output.
Continue-TTS was fine-tuned from the Continue-1-OSS base model.
The model uses 7 audio tokens per frame in a hierarchical encoding structure, with a total vocabulary of 156,940 tokens (including 28,672 audio-specific tokens). This architecture enables efficient, high-quality speech generation with natural prosody and emotional expressiveness.
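The vocabulary figures above can be sanity-checked with simple arithmetic. The breakdown below, including the guess that the audio tokens split into 7 codebooks of equal size, is an inference from the numbers quoted in this post, not an official specification.

```python
TOTAL_VOCAB = 156_940     # total token vocabulary
AUDIO_TOKENS = 28_672     # audio-specific tokens
TOKENS_PER_FRAME = 7      # hierarchical audio tokens per frame

# Remaining tokens cover text and special tokens.
text_and_special = TOTAL_VOCAB - AUDIO_TOKENS
print(text_and_special)   # 128268

# 28,672 divides evenly by 7, consistent with 7 codebooks of
# 4,096 entries each (an assumption implied by the 7-token frames).
print(AUDIO_TOKENS // TOKENS_PER_FRAME)  # 4096
```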
SVECTOR is committed to responsible AI development. Continue-TTS users should follow ethical guidelines:
Always disclose when audio is AI-generated. Users should clearly indicate that content is synthesized and not recorded from real human voices to maintain transparency and trust.
Do not clone voices without explicit permission from the individual. Voice impersonation without consent is unethical and may be illegal in many jurisdictions.
Implement safeguards against deepfakes, misinformation, and deceptive content. Verify important audio content and use authentication mechanisms where appropriate.
Avoid generating harmful, deceptive, or illegal content. Users are responsible for ensuring their applications comply with applicable laws and regulations.
As with any text-to-speech model, Continue-TTS has certain limitations: