LIVE ON LOCAL GPU

Talk to a real AI agent

27 billion parameters running on dedicated hardware. No cloud APIs, no data collection. Your conversation stays between you and the machine.

Base Model: Core

Upgrade Path: 3 Tiers

Inference Speed: 25 t/s (tokens per second)

Context Window: 32k

Free Messages: 3 / Day

Pick your tier

Each tier unlocks more capability. Credits never expire.

Starter

$10

100 credits

$0.10 / message

Norax Core — private inference
Streaming responses (~25 t/s)
Code, research & analysis
Persistent memory across sessions
Web search & URL fetching
Credits never expire

Plus

$15

250 credits

$0.06 / message

Everything in Starter, plus:
Access to 70B+ frontier models
Extended 32k context window
Priority inference queue
Multi-turn deep reasoning mode
File upload & document analysis
40% lower cost per message than Starter

Pro

$20

500 credits

$0.04 / message

Everything in Plus, plus:
Full agent mode — tool use & automation
Code execution sandbox
Browser automation & web scraping
Scheduled tasks & cron agents
API access (bring your own integrations)
Best value: 60% lower cost per message than Starter
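
The per-message prices and savings figures follow directly from the listed price and credit counts. Below is a minimal Python sketch of that arithmetic. It assumes one credit buys one message, which the listed per-message prices imply but the page does not state outright. The `Tier` class and the helper names are illustrative, not part of any Norax API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price_usd: int
    credits: int

    @property
    def cost_per_message(self) -> float:
        # Assumption: one credit is spent per message.
        return self.price_usd / self.credits


# Prices and credit counts as listed on the tier cards.
TIERS = [Tier("Starter", 10, 100), Tier("Plus", 15, 250), Tier("Pro", 20, 500)]


def savings_vs(base: Tier, tier: Tier) -> float:
    """Fractional reduction in per-message cost relative to `base`."""
    return 1 - tier.cost_per_message / base.cost_per_message


def extra_credits_per_dollar(base: Tier, tier: Tier) -> float:
    """Fractional increase in credits bought per dollar relative to `base`."""
    return (tier.credits / tier.price_usd) / (base.credits / base.price_usd) - 1


if __name__ == "__main__":
    starter = TIERS[0]
    for tier in TIERS:
        print(
            f"{tier.name}: ${tier.cost_per_message:.2f}/message, "
            f"{savings_vs(starter, tier):.0%} cheaper per message than Starter"
        )
```

Under these assumptions, Plus works out to 40% lower cost per message than Starter (about 67% more credits per dollar) and Pro to 60% lower cost per message.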

Not another chatbot wrapper

Full AI agent stack — memory, reasoning, tool use, running on hardware we own.

🔒

Private by default

Local GPU inference — your prompts never leave our hardware. No third-party API calls, no data harvesting.

🧠

Cognitive architecture

Multi-layer memory system with semantic retrieval, procedural knowledge, and cross-session persistence.

⚡

Real-time streaming

Token-streamed responses from optimized models running on dedicated hardware. No cloud API round-trips.

🛠

Tool use

Code execution, web research, file management, browser automation — the full version operates autonomously.

🔄

Self-improving

Learns from every interaction. Mistakes become procedural memory. Performance compounds over time.

🌐

Multi-surface

Discord, Telegram, Signal, web — same agent, same memory, any interface. Operates 24/7 without supervision.

User → Web Interface
  ↓
Agent Gateway → Brain Router
  ↓              ↓
Memory        Tools
  ↓              ↓
Local LLM ← Norax Core
  ↓
Response → Streaming SSE

Full agent pipeline, not a proxy

Your message flows through a complete cognitive architecture: brain routing, memory retrieval, tool orchestration, and local inference — before streaming back token by token. This demo runs the same pipeline as the production system.
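
The stages named above can be pictured with a toy Python sketch. Everything in it is hypothetical: the function names, the keyword-based routing rule, the word-overlap memory lookup, the echoing stand-in for the local model, and the `[DONE]` end marker are assumptions made for illustration, not the production implementation. Only the `data: ...` line followed by a blank line reflects the standard Server-Sent Events framing.

```python
from typing import Iterator


def route(message: str) -> str:
    """Hypothetical brain router: choose a path from the message text."""
    return "tools" if message.lower().startswith(("search ", "fetch ")) else "chat"


def retrieve_memory(message: str, store: list[str]) -> list[str]:
    """Toy retrieval: return stored notes that share a word with the message."""
    words = set(message.lower().split())
    return [note for note in store if words & set(note.lower().split())]


def run_tools(path: str, message: str) -> list[str]:
    """Placeholder tool layer: a real one would run a search or fetch here."""
    return [f"tool-result for: {message}"] if path == "tools" else []


def local_llm(prompt: str) -> Iterator[str]:
    """Stand-in for local inference: echoes the prompt one token at a time."""
    yield from prompt.split()


def sse_event(data: str) -> str:
    """Frame one payload as a Server-Sent Events message."""
    return f"data: {data}\n\n"


def handle(message: str, store: list[str]) -> Iterator[str]:
    """Run routing, memory retrieval, tools, and inference, then stream tokens."""
    path = route(message)
    context = retrieve_memory(message, store) + run_tools(path, message)
    prompt = " ".join(context + [message])
    for token in local_llm(prompt):
        yield sse_event(token)
    yield sse_event("[DONE]")  # assumed end-of-stream marker


if __name__ == "__main__":
    for event in handle("hello world", ["world facts"]):
        print(event, end="")
```

The generator shape is the point of the sketch: each token is framed and yielded as soon as it is produced, so a web handler could forward events to the browser without waiting for the full response.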
