July 16, 2021
Introducing Spec-1-Mini
A compact, transformer-based 130M-parameter language model designed for efficient training and inference on commodity hardware.
Model Architecture
- Architecture: Decoder-only Transformer
- Parameters: 130 million
- Layers: 12 transformer blocks
- Hidden size: 768
- Attention heads: 12
- Max context length: 1024 tokens
- Tokenizer: Byte-Pair Encoding (BPE) with a 50,000-token vocabulary (see the configuration sketch after this list)
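These hyperparameters map onto a standard decoder-only configuration. Below is a minimal sketch, assuming a GPT-2-style block layout in Hugging Face Transformers; the config class and field names are illustrative, not the actual Spec-1-Mini source.

```python
# Minimal configuration sketch, assuming a GPT-2-style decoder-only block
# layout in Hugging Face Transformers; not the actual Spec-1-Mini source.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50_000,   # BPE vocabulary size
    n_positions=1024,    # max context length in tokens
    n_embd=768,          # hidden size
    n_layer=12,          # transformer blocks
    n_head=12,           # attention heads
)
model = GPT2LMHeadModel(config)

# Sanity check on the parameter count (the exact total depends on details
# such as weight tying and bias terms).
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```

With tied input/output embeddings, this configuration lands close to the quoted 130M parameters; the exact count depends on implementation details such as bias terms and the output head.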
Training Details
- Training corpus: Under 10 GB of curated multilingual text
- Domains: Open-domain web text, books, documentation, conversations
- Training hardware: Low-end CPU clusters (8th-gen Intel i5/i7 equivalents)
- Epochs: 8
- Effective batch size: 256, reached via gradient accumulation (see the training sketch after this list)
- Precision: FP32 with manual memory optimization
- Frameworks: PyTorch + Hugging Face Transformers
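To make the gradient-accumulation setup concrete, here is a sketch of a plain PyTorch training loop on CPU: 8 epochs, FP32, and an effective batch of 256 built from small micro-batches. The micro-batch size, learning rate, and dataset are placeholders, and `model` is the instance from the configuration sketch above.

```python
# CPU training-loop sketch: 8 epochs, FP32, effective batch size 256 via
# gradient accumulation. Dataset, micro-batch size, and learning rate are
# placeholders; `model` comes from the architecture sketch above.
import torch
from torch.utils.data import DataLoader, TensorDataset

micro_batch = 8                     # what fits comfortably in CPU memory
accum_steps = 256 // micro_batch    # accumulate up to the effective batch of 256

# Placeholder corpus: random token ids standing in for the curated text.
dummy_ids = torch.randint(0, 50_000, (1024, 128))
loader = DataLoader(TensorDataset(dummy_ids), batch_size=micro_batch, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # lr is an assumption

model.train()
for epoch in range(8):
    optimizer.zero_grad()
    for step, (input_ids,) in enumerate(loader):
        # Causal LM loss: the model shifts the labels internally.
        loss = model(input_ids=input_ids, labels=input_ids).loss
        (loss / accum_steps).backward()   # scale so accumulated gradients average
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```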
Use Cases
- Text generation for chatbots and interfaces
- Semantic search and document summarization
- Offline inference on mobile and edge devices (see the example after this list)
- Proof-of-concept NLP pipelines for startups
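As a sketch of the offline-inference use case, the snippet below loads the model with the standard Transformers auto classes and generates a short completion on CPU. The identifier "spec-1-mini" is a placeholder for wherever the weights and tokenizer files live locally, not a published model id.

```python
# Offline CPU text generation sketch. "spec-1-mini" is a placeholder path to
# locally stored weights and tokenizer files.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("spec-1-mini")
model = AutoModelForCausalLM.from_pretrained("spec-1-mini")
model.eval()

prompt = "Summarize the following document:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```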
Comparison to Industry Models
- Memory footprint: < 500 MB model size on disk
- Inference speed: Real-time responses on CPU (< 1 s per response; see the latency sketch below)
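For context, a rough way to reproduce the latency figure on a given machine, reusing the model and tokenizer from the previous snippet; the thread count and token budget are assumptions, and actual numbers depend on hardware and prompt length.

```python
# Rough CPU latency check; results vary with hardware, thread count,
# prompt length, and the number of generated tokens.
import time
import torch

torch.set_num_threads(4)  # assumption: a typical quad-core desktop CPU

inputs = tokenizer("Hello, Spec-1-Mini!", return_tensors="pt")
with torch.no_grad():
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=32)
    elapsed = time.perf_counter() - start
print(f"Generated 32 tokens in {elapsed:.2f}s")
```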