TonoFabric Architecture
The Brain Behind Every AI Factory
TonoFabric™ is the enterprise orchestration platform that transforms distributed TonoForge™ hardware into a unified, intelligent AI cloud. Ten microservices, six global regions, one seamless API.
Four Layers, One API
TonoFabric™ is structured as a layered microservices platform. Every request traverses the same deterministic path — from authentication through intelligent routing to distributed inference — with full telemetry at every hop.
From Prompt to Response in Seven Hops
Every API call follows a deterministic path through the platform. Authentication, quota enforcement, model selection, cluster allocation, and inference happen in a single round-trip.
// Request
{
  "prompt": "Analyse this turbine sensor data for anomalies",
  "purpose": "general",
  "region_pref": "eu-west",
  "session_id": null  // auto-created if omitted
}
// Response
{
  "session_id": "a3f8…",
  "model_id": "m-70b-general",
  "answer": "Based on the sensor readings, I detect 3 anomalies…",
  "latency_ms": 142
}
Ten Services, Zero Single Points of Failure
Six Regions, Dual Data Centres
The Cluster Manager seeds 200 GPU clusters across six regions, each with dual data centres for redundancy. The three-tier allocation algorithm guarantees a healthy endpoint for every request — with graceful failover across DCs and regions.
Three-Tier Failover
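The three tiers can be sketched as a preference cascade: same-region primary DC, then same-region secondary DC, then any healthy cluster in another region. The cluster record shape below is an assumption for illustration; the Cluster Manager's real data model is not shown here:

```python
def allocate(clusters, region_pref):
    """Three-tier allocation sketch (hypothetical cluster records).

    Tier 1: healthy cluster in the preferred region's primary DC.
    Tier 2: healthy cluster in the preferred region's secondary DC.
    Tier 3: any healthy cluster in another region.
    Each cluster is a dict: {"id", "region", "dc", "healthy"}.
    """
    healthy = [c for c in clusters if c["healthy"]]
    tiers = (
        lambda c: c["region"] == region_pref and c["dc"] == "primary",
        lambda c: c["region"] == region_pref,  # falls through to secondary DC
        lambda c: True,                        # cross-region failover
    )
    for matches in tiers:
        for c in healthy:
            if matches(c):
                return c
    return None  # no healthy endpoint anywhere
```

Because each tier is only consulted when the previous one is empty, a request degrades gracefully from local DC to remote region without the caller changing anything.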
Six Model Classes, One API
| Model ID | Parameters | Family | Hardware | Max Context | Throughput Hint |
|---|---|---|---|---|---|
| m-10b-general | 10 B | General | Any (NVIDIA / AMD) | 8,192 tokens | 2,000 rps |
| m-34b-general | 34 B | General | Any (NVIDIA / AMD) | 16,384 tokens | 1,200 rps |
| m-70b-general | 70 B | General | NVIDIA preferred | 32,768 tokens | 600 rps |
| m-120b-coder | 120 B | Coder | NVIDIA preferred | 32,768 tokens | 400 rps |
| m-250b-general | 250 B | General | NVIDIA preferred | 65,536 tokens | 200 rps |
| m-1000b-research | 1 T | Research | NVIDIA required | 131,072 tokens | 50 rps |
The recommendation engine matches prompt token count, purpose, and latency SLA to the optimal model. Hardware affinity ensures large models run on NVIDIA clusters with sufficient HBM3e capacity.
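The matching rule can be sketched directly from the table: walk the catalogue smallest-first and pick the first model whose family matches the purpose and whose context window fits the prompt. This is a simplification — the production engine also weighs latency SLA and hardware affinity:

```python
# Model catalogue from the table above: (model_id, family, max context tokens),
# ordered smallest-first so the cheapest adequate model wins.
MODELS = [
    ("m-10b-general",    "general",    8_192),
    ("m-34b-general",    "general",   16_384),
    ("m-70b-general",    "general",   32_768),
    ("m-120b-coder",     "coder",     32_768),
    ("m-250b-general",   "general",   65_536),
    ("m-1000b-research", "research", 131_072),
]

def recommend(prompt_tokens, purpose="general"):
    """Return the smallest model of the requested family that fits the prompt."""
    for model_id, family, max_ctx in MODELS:
        if family == purpose and prompt_tokens <= max_ctx:
            return model_id
    return None  # no model of that family can hold the prompt
```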
Full-Stack Visibility
Every service is instrumented with OpenTelemetry. Traces flow through the OTel Collector to Prometheus for metrics and Grafana for visualisation. Every hop is observable — from API Gateway to GPU rack.
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318
# Grafana access
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASS=[secure-password]
Defence in Depth
From Docker Compose to Kubernetes
Development Stack
Single-command local development with Docker Compose. All 10 services, plus the observability stack (OTel Collector, Prometheus, Grafana), spin up with docker compose up.
api_gateway:       # :8000
auth_service:      # :8001
model_registry:    # :8002
cluster_manager:   # :8003
router_service:    # :8004
session_service:   # :8005
storage_service:   # :8006
inference_proxy:   # :8007 + gRPC :9000
telemetry_service: # :8008
usage_service:     # :8009
otel-collector:    # :4317/:4318
prometheus:        # :9090
grafana:           # :3000
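An illustrative Compose entry for one of the services above — the service name and port come from the list, but the build path, healthcheck, and dependency wiring are assumptions, not the shipped compose file:

```yaml
services:
  api_gateway:
    build: ./services/api_gateway      # path is illustrative
    ports:
      - "8000:8000"
    environment:
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
    depends_on:
      - auth_service
      - otel-collector
```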
Production Stack
Kubernetes deployment with Kustomize overlays or Helm umbrella chart. Horizontal Pod Autoscalers for central and edge clusters. CI/CD via GitHub Actions — build, push to GHCR, deploy.
Sealed Secrets — Encrypt secrets with kubeseal, commit safely
External Secrets — Sync from AWS/GCP/Azure secret stores
Kustomize — Environment overlays (dev, staging, prod)
Helm Chart — Umbrella chart for full platform deployment
GitHub Actions — Build → push GHCR → deploy overlay/Helm
Built for Scale
| Layer | Technology | Purpose |
|---|---|---|
| API Framework | FastAPI (Python) | Async HTTP services with Pydantic validation |
| RPC Transport | gRPC + protobuf | High-performance inference calls (:9000) |
| HTTP Client | httpx (async) | Inter-service communication |
| Auth | PyJWT + HS256 | Token signing & verification |
| Database | SQLAlchemy (async) + Alembic | User store, session store, schema migrations |
| Scheduling | APScheduler | Session TTL expiry & archival |
| Tracing | OpenTelemetry SDK | Distributed tracing across all services |
| Metrics | Prometheus + Grafana | Time-series metrics & dashboards |
| Orchestration | Docker Compose / Kubernetes | Dev & production deployment |
| CI/CD | GitHub Actions | Build, push GHCR, deploy Kustomize/Helm |
| Secrets | Sealed Secrets / External Secrets | Encrypted in-cluster or cloud-synced secrets |
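The Auth row above names PyJWT with HS256. Under the hood an HS256 token is just two base64url-encoded JSON segments plus an HMAC-SHA256 signature over them — a stdlib sketch of that mechanism (in the platform itself PyJWT handles this, plus claims like `exp`):

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: bytes) -> str:
    """Produce header.payload.signature, signed with HMAC-SHA256."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_hs256(token: str, secret: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split(".")
    expected = _b64url(hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)
```

Because HS256 is symmetric, every service that verifies tokens holds the same secret — which is exactly why the secrets rows above (Sealed Secrets / External Secrets) matter for distribution.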
Deploy TonoFabric™ in your infrastructure
One API to orchestrate AI across every TonoForge™ node. Contact us to schedule an architecture review.
