July 16, 2021

Introducing Spec-1-Mini

A compact, Transformer-based language model with 130 million parameters, built for efficient inference on commodity CPUs and edge hardware.

Figure: Spec-1-Mini architecture overview

Model Architecture

  • Architecture: Decoder-only Transformer
  • Parameters: 130 Million
  • Layers: 12 transformer blocks
  • Hidden size: 768
  • Attention heads: 12
  • Max context length: 1024 tokens
  • Tokenizer: Byte-Pair Encoding (BPE) with a 50,000-token vocabulary
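For readers who want to reproduce the shape of the model, the numbers above map directly onto a standard decoder-only configuration in Hugging Face Transformers. The sketch below uses GPT2Config and GPT2LMHeadModel as stand-ins; the class choice and the tied input/output embeddings it implies are illustrative assumptions, not a description of the actual Spec-1-Mini implementation.

  from transformers import GPT2Config, GPT2LMHeadModel

  # Decoder-only configuration matching the dimensions listed above.
  config = GPT2Config(
      vocab_size=50_000,   # BPE vocabulary size
      n_positions=1024,    # max context length in tokens
      n_embd=768,          # hidden size
      n_layer=12,          # transformer blocks
      n_head=12,           # attention heads
  )

  model = GPT2LMHeadModel(config)

  # Should land in the neighborhood of 130M parameters.
  print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")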

Training Details

  • Training corpus: Less than 10 GB of curated multilingual text
  • Domains: Open-domain internet corpus, books, documentation, conversations
  • Training hardware: Low-end CPU clusters (Intel i5/i7 8th-Gen equivalents)
  • Epochs: 8
  • Batch size: 256 effective, via gradient accumulation (see the sketch after this list)
  • Precision: FP32 with manual memory optimization
  • Frameworks: PyTorch and Hugging Face Transformers
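As a rough illustration of the gradient-accumulation setting, the hypothetical loop below accumulates 8-sample micro-batches into an effective batch of 256 in plain FP32 PyTorch, reusing the model object from the configuration sketch above. The micro-batch size, optimizer, learning rate, and random dummy data are assumptions for illustration, not Spec-1-Mini's published hyperparameters.

  import torch
  from torch.utils.data import DataLoader, TensorDataset

  # Dummy token-id data standing in for the curated corpus.
  dummy_ids = torch.randint(0, 50_000, (1024, 256))           # 1024 samples, 256 tokens each
  train_loader = DataLoader(TensorDataset(dummy_ids), batch_size=8, shuffle=True)

  accumulation_steps = 256 // 8                                # 32 micro-batches per optimizer step
  optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)   # optimizer and lr are assumptions
  model.train()

  for step, (input_ids,) in enumerate(train_loader):
      outputs = model(input_ids=input_ids, labels=input_ids)   # causal LM loss
      (outputs.loss / accumulation_steps).backward()           # scale so gradients average correctly
      if (step + 1) % accumulation_steps == 0:
          optimizer.step()
          optimizer.zero_grad()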

Use Cases

  • Text generation for chatbots and conversational interfaces
  • Semantic search and document summarization
  • Offline inference on mobile or edge devices
  • Proof-of-concept NLP pipelines for startups
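For the chatbot and offline-inference cases, generation on CPU follows the standard Transformers pattern. In the sketch below the checkpoint path is a placeholder, since the post does not publish a model identifier; the prompt and sampling settings are likewise illustrative.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint = "path/to/spec-1-mini"                 # placeholder, not a published Hub id
  tokenizer = AutoTokenizer.from_pretrained(checkpoint)
  model = AutoModelForCausalLM.from_pretrained(checkpoint)
  model.eval()

  prompt = "User: How do I reset my password?\nAssistant:"
  inputs = tokenizer(prompt, return_tensors="pt")

  with torch.no_grad():                              # plain FP32 CPU inference
      output_ids = model.generate(
          **inputs,
          max_new_tokens=64,
          do_sample=True,
          temperature=0.7,
          top_p=0.9,
          pad_token_id=tokenizer.eos_token_id,
      )

  print(tokenizer.decode(output_ids[0], skip_special_tokens=True))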

Comparison to Industry Models

  • Memory footprint: model size under 500 MB on disk
  • Inference speed: real-time responses on CPU (under 1 second per response)
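Both figures are straightforward to sanity-check locally. The sketch below measures the on-disk size of a saved checkpoint and the wall-clock latency of one CPU generation pass, assuming the model and tokenizer objects from the earlier sketches; the save directory name is illustrative.

  import os
  import time
  import torch

  save_dir = "spec-1-mini-checkpoint"                # illustrative local directory name
  model.save_pretrained(save_dir)

  # Sum the sizes of all files written by save_pretrained().
  size_mb = sum(
      os.path.getsize(os.path.join(root, name))
      for root, _, files in os.walk(save_dir)
      for name in files
  ) / 1e6
  print(f"On-disk size: {size_mb:.0f} MB")

  inputs = tokenizer("Summarize: edge inference keeps data on the device.", return_tensors="pt")
  start = time.perf_counter()
  with torch.no_grad():
      model.generate(**inputs, max_new_tokens=32, do_sample=False)
  print(f"CPU generation latency: {time.perf_counter() - start:.2f}s")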

Explore More