Theta-35-Mini is a lightweight, 3-billion-parameter language model developed by SVECTOR as a distilled and optimized variant of the flagship Theta-35 (33B). Built on a Qwen2-style architecture and trained with Group Relative Policy Optimization (GRPO), it is designed for efficient language understanding and reasoning in resource-constrained environments.
It inherits architectural principles and alignment strategies from its larger sibling, while significantly reducing the computational footprint—making it suitable for deployment on local machines, edge devices, and constrained inference clusters.
- Maintains high reasoning quality and coherence at a fraction of the size of Theta-35.
- Trained with GRPO for better output control and instruction-following.
- Designed for low-latency use cases, including on-device assistants and embedded systems (see the loading sketch after this list).
- Released under a permissive license to encourage community research and deployment.
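Because Theta-35-Mini uses a Qwen2-style architecture, it should load through standard Hugging Face `transformers` tooling. The sketch below is illustrative only: the repository id `SVECTOR-CORPORATION/Theta-35-Mini` is an assumption, so substitute whatever id SVECTOR actually publishes.

```python
# Minimal local-inference sketch for a Qwen2-style 3B checkpoint.
# ASSUMPTION: the repository id below is illustrative; use the id
# actually published by SVECTOR on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SVECTOR-CORPORATION/Theta-35-Mini"  # assumed, not confirmed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision: a 3B model fits in ~6 GB
    device_map="auto",          # spread layers across available GPU/CPU
)

messages = [{"role": "user", "content": "Summarize GRPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

At half precision a 3B model fits comfortably on a single consumer GPU, which matches the low-latency, on-device profile described above.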
Group Relative Policy Optimization (GRPO) is an advanced training methodology that enables fine-grained reward shaping and efficient policy learning. This technique allows Theta-35-Mini to achieve better output control and enhanced instruction-following behavior compared to traditional training approaches.
GRPO organizes training samples into groups: several candidate responses are generated for each prompt, and each response is scored relative to the performance of its own group rather than against an absolute baseline. Optimizing the policy against these group-relative signals leads to more stable and effective learning dynamics, which in turn improves alignment and yields more reliable outputs across diverse tasks.
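To make the group-relative idea concrete, here is a minimal sketch of the advantage computation used in GRPO-style training: several completions are sampled per prompt, and each completion's reward is standardized against its own group. This is a generic illustration, not SVECTOR's actual training code; the rewards are placeholders.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """GRPO-style advantages for rewards of shape (num_prompts, group_size).

    Each sampled completion is scored relative to the other completions
    for the same prompt, so no separate learned value function (critic)
    is needed as a baseline.
    """
    mean = rewards.mean(dim=1, keepdim=True)  # per-group average reward
    std = rewards.std(dim=1, keepdim=True)    # per-group spread
    return (rewards - mean) / (std + eps)     # standardize within each group

# Example: 2 prompts, 4 sampled completions each (placeholder rewards).
rewards = torch.tensor([[0.1, 0.9, 0.4, 0.6],
                        [0.2, 0.2, 0.8, 0.3]])
print(group_relative_advantages(rewards))
# Completions above their group's mean get positive advantages and are
# reinforced; the policy-gradient update weights each completion's token
# log-probabilities by its advantage.
```

Because the baseline comes from the group itself rather than from a critic network, updates stay comparatively low-variance, which is the stability property noted above.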
While Theta-35-Mini does not match the raw capabilities of larger models like Theta-35, it demonstrates strong performance across lightweight reasoning benchmarks and few-shot language tasks, particularly when optimized with GRPO. Its performance-to-size ratio makes it highly competitive among models in the 3B parameter class.