July 14, 2025
Experience the research preview of Spec-3.5, our most capable model yet. Now available to Plus and Pro users, and to developers worldwide.
We’re releasing a research preview of Spec-3.5, our most advanced model for chat and multimodal tasks to date. Spec-3.5 represents a significant leap in scaling both post-training and reinforcement learning, enabling deeper reasoning, broader knowledge, and more creative problem-solving.
In extensive testing, Spec-3.5 demonstrates more natural interactions, improved contextual understanding, and a stronger ability to follow user intent. Its expanded knowledge base and advanced reasoning make it highly effective for writing, programming, and complex real-world challenges, while also reducing hallucination rates.
By sharing Spec‑3.5 as a research preview, we aim to better understand its strengths and limitations, and invite the community to explore its capabilities across diverse applications. We look forward to seeing innovative uses and gathering feedback to guide future improvements.
The evaluation scores reported for Spec-3.5 are approximate; they were obtained in a controlled environment by the SVECTOR team, without involvement from external evaluators.
Spec-3.5 is built upon a Hybrid Transformer-Mixture of Experts (MoE) architecture, featuring modality-specific input encoders, Spec Core Processing Units (SCPUs) with entropy-regularized routing, and advanced cross-modal attention mechanisms.
The input processing layer tokenizes and embeds multimodal data (text, images, audio, structured data) into a unified embedding space. These embeddings are processed through SCPUs, which use an entropy-regularized gating function to dynamically select the top-k experts for each token, reducing inference latency by 10-15% and achieving a 92% cache hit rate.
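SVECTOR has not published the routing implementation, but the general idea of entropy-regularized top-k gating can be illustrated with a minimal NumPy sketch. The function names, the value of `beta`, and the routing details below are illustrative assumptions, not Spec-3.5's actual code; in practice the entropy term is added to the training loss to keep probability mass from collapsing onto a few experts, while here it is simply reported alongside the selection.

```python
import numpy as np

def entropy_regularized_gate(logits, k=2):
    """Sketch of top-k expert routing from per-token router logits.

    Returns the indices and renormalized weights of the k chosen
    experts, plus the routing entropy (a balance signal that an
    entropy regularizer would push upward during training).
    """
    # Softmax over the expert dimension
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    # Routing entropy: higher means more balanced expert load
    entropy = -(probs * np.log(probs + 1e-9)).sum(axis=-1)
    # Pick the top-k experts and renormalize their weights
    top_idx = np.argsort(probs, axis=-1)[..., -k:]
    top_w = np.take_along_axis(probs, top_idx, axis=-1)
    top_w = top_w / top_w.sum(axis=-1, keepdims=True)
    return top_idx, top_w, entropy

logits = np.array([[2.0, 0.5, 1.2, -0.3]])  # 1 token, 4 experts
idx, w, ent = entropy_regularized_gate(logits, k=2)
```

On this toy input the gate routes the token to experts 0 and 2 (the two largest logits), with their softmax weights renormalized to sum to one.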
For ultra-long context processing, Spec-3.5 implements chunked streaming inference and memory reuse protocols, enabling stable handling of 256k-token contexts with a 28% reduction in peak VRAM. The R3S variant is optimized for edge deployment with static top-2 expert selection and 4-bit quantization, while the T4Z variant integrates recursive reasoning layers and symbolic engines (SymPy/Z3) for complex problem-solving.
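The core of chunked streaming inference is that the model never materializes attention state for the whole 256k-token context at once: tokens are fed in fixed-size chunks, and a cache is carried forward between chunks so peak memory scales with the chunk size rather than the context length. The sketch below shows only that control flow; the `process` callable stands in for one forward pass, and the real cache layout and reuse protocol are model-specific and not public.

```python
def stream_chunks(tokens, process, chunk_size=4096):
    """Feed a long token sequence to a model chunk by chunk.

    `process(chunk, cache)` stands in for one forward pass that
    consumes a chunk and the cache from previous chunks, and
    returns the updated cache. Only one chunk is resident at a
    time, so peak memory tracks `chunk_size`, not `len(tokens)`.
    """
    cache = None
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        cache = process(chunk, cache)
    return cache

# Toy "model": the cache is just a running count of tokens seen.
total = stream_chunks(list(range(10)),
                      process=lambda c, cache: (cache or 0) + len(c),
                      chunk_size=4)
```

With 10 tokens and a chunk size of 4, the loop makes three passes (4 + 4 + 2 tokens) and the final cache value is 10.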
The architecture concludes with an RL-optimized value head for preference learning and context-aware decoders tailored for variant-specific tasks, ensuring high performance and efficiency.
Spec-3.5 marks a major step toward Artificial General Intelligence (AGI), combining scaled pretraining and reinforcement learning for enhanced intelligence. It introduces seamless switching between thinking and non-thinking modes, lets users adjust the thinking budget, and broadens language support for greater global reach.
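This post does not specify how the mode switch and thinking budget are exposed to developers, so the request shape below is purely hypothetical: the field names (`mode`, `thinking_budget`) are illustrative placeholders, not a documented SVECTOR API. It only sketches the idea of a per-request toggle with a token cap on visible reasoning.

```python
def build_request(prompt, thinking=True, budget_tokens=2048):
    """Hypothetical request builder for a thinking-mode toggle.

    Field names are assumptions for illustration: `mode` selects
    thinking vs. fast responses, and `thinking_budget` caps how
    many tokens the model may spend reasoning before it answers.
    """
    req = {"model": "spec-3.5", "prompt": prompt}
    if thinking:
        req["mode"] = "thinking"
        req["thinking_budget"] = budget_tokens
    else:
        req["mode"] = "fast"
    return req

fast = build_request("Summarize this paragraph.", thinking=False)
deep = build_request("Prove the claim step by step.")  # default budget
```

A per-request toggle like this keeps latency-sensitive calls cheap while still allowing a larger reasoning budget for hard problems.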
Looking ahead, we plan to advance our models across several dimensions: refining architectures, scaling data, increasing model size, extending context length, broadening modalities, and improving RL with environmental feedback for long-horizon reasoning. We see a shift from training models to training agents, and our next iteration will deliver meaningful improvements for everyone’s work and life.