NEW Halcyon 2.0 — sub‑second routing is live

The command center for AI‑native engineering teams

Halcyon captures every model call, routes it through policy, executes it at the edge, and gives you one signal to watch — instead of ten dashboards.

Start building — free · See how it works

No credit card · Deploy in under 5 minutes

route.checkout-agent → gpt-tier-2 184ms

route.support-copilot → claude-tier-1 211ms

route.data-extractor → internal-model 92ms

Trusted by engineering teams at

Orbital · Ferrovia · Northbeam · Kindred Labs · Vantpoint · Meridian

The pipeline

One signal, four stages

Every request your team sends to a model passes through the same four stages — so you always know where it is, and why.

01 / Capture

Capture every call

A single SDK line hooks into every model request your services make — no proxies to babysit, no logs to stitch together after the fact.

02 / Route

Route by policy

Send each request to the right model for its cost, latency, and accuracy needs — set once, enforced automatically on every call.

03 / Execute

Execute at the edge

Requests run from the region closest to your user, with automatic failover to a backup model the instant one provider degrades.

04 / Observe

Observe in one view

Cost, latency, and quality for every route, model, and team — in a single live view instead of ten separate dashboards.

2.4B

Requests routed / mo

41ms

Median added latency

99.98%

Routing uptime

120+

Teams shipping on Halcyon

Customers

Teams who stopped guessing

★★★★★

“We replaced four separate logging tools with Halcyon's single view. Incident response time dropped by more than half in the first month.”

Maya Okonkwo, Staff Engineer, Orbital

★★★★★

“The routing layer alone paid for itself. We cut model spend by 34% without touching a single line of application code.”

Daniel Reyes, VP Engineering, Ferrovia

★★★★★

“Setup took an afternoon. Six months later it's the first place any of us look when something feels off in production.”

Priya Chandran, Head of Platform, Northbeam

Pricing

Straightforward, by request volume

Start free. Move up only when your traffic does.

Starter

$0 / month

For small teams testing their first routed workflow.

Up to 50K requests / mo
2 routing policies
7‑day log retention
Community support

Start free

Questions, answered

How does Halcyon route between models?

You define policies based on cost ceiling, latency budget, and required accuracy tier. Halcyon evaluates every request against your active policy in under a millisecond and picks the best available model in real time.
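To make that concrete, here is a minimal sketch of policy-based selection. It is not Halcyon's routing engine: the `Model` and `Policy` types, the `pick_model` function, the cost figures, the tier numbers, and the cheapest-first tie-break are all assumptions made for illustration. Only the model names and latencies echo the sample routes shown at the top of this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k: float    # hypothetical USD cost per 1K tokens
    latency_ms: int       # typical latency for this model
    accuracy_tier: int    # assumption: 1 is the most accurate tier
    available: bool = True


@dataclass(frozen=True)
class Policy:
    cost_ceiling: float       # maximum acceptable cost per 1K tokens
    latency_budget_ms: int    # maximum acceptable latency
    required_tier: int        # least accurate tier that is still acceptable


def pick_model(policy: Policy, models: Sequence[Model]) -> Optional[Model]:
    """Return the cheapest available model that satisfies the policy, or None."""
    eligible = [
        m for m in models
        if m.available
        and m.cost_per_1k <= policy.cost_ceiling
        and m.latency_ms <= policy.latency_budget_ms
        and m.accuracy_tier <= policy.required_tier
    ]
    if not eligible:
        return None
    # Assumed tie-break: lowest cost first, then lowest latency.
    return min(eligible, key=lambda m: (m.cost_per_1k, m.latency_ms))


# Illustrative catalog; costs and tiers are invented for this example.
MODELS = [
    Model("gpt-tier-2", cost_per_1k=0.60, latency_ms=184, accuracy_tier=2),
    Model("claude-tier-1", cost_per_1k=0.90, latency_ms=211, accuracy_tier=1),
    Model("internal-model", cost_per_1k=0.10, latency_ms=92, accuracy_tier=3),
]

checkout_policy = Policy(cost_ceiling=1.00, latency_budget_ms=250, required_tier=2)
chosen = pick_model(checkout_policy, MODELS)
```

With this policy, `internal-model` is filtered out because its tier is below the required one, and the cheaper of the two remaining models is chosen.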

Do I need to change my application code?

Almost none. You add a single SDK line, and because the SDK mirrors the API shape of the major model providers, most teams migrate a service in under an hour with no changes to their call sites.

What happens if a model provider goes down?

Halcyon detects degraded latency or error rates within seconds and automatically fails over to your configured backup model, then routes back once the primary recovers.
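As a rough picture of that behavior, the sketch below fails over on error rate across a sliding window of recent results and routes back once the window is healthy again. It is an illustration under stated assumptions, not Halcyon's implementation: the `FailoverRouter` class, the window size, and the threshold are hypothetical, and latency-based detection is left out for brevity.

```python
from collections import deque


class FailoverRouter:
    """Chooses between a primary and a backup model from recent primary results."""

    def __init__(self, primary: str, backup: str,
                 window: int = 20, max_error_rate: float = 0.5) -> None:
        self.primary = primary
        self.backup = backup
        self.max_error_rate = max_error_rate
        # Sliding window of primary outcomes: True = success, False = failure.
        self._results: deque = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        """Record one primary outcome (a live call or a background health probe)."""
        self._results.append(ok)

    def error_rate(self) -> float:
        if not self._results:
            return 0.0
        return self._results.count(False) / len(self._results)

    def target(self) -> str:
        """Return the backup while the primary looks degraded, else the primary."""
        if self.error_rate() > self.max_error_rate:
            return self.backup
        return self.primary


router = FailoverRouter("claude-tier-1", "gpt-tier-2", window=4)
healthy_target = router.target()

for _ in range(3):
    router.record(False)   # primary starts failing
degraded_target = router.target()

for _ in range(4):
    router.record(True)    # health probes succeed again
recovered_target = router.target()
```

In this sketch, recovery depends on something continuing to probe the primary while traffic flows to the backup; otherwise the window would never refill with successes.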

Can I self‑host Halcyon?

Yes. Enterprise plans include a dedicated deployment inside your own VPC, with the same routing engine and dashboard running entirely on your infrastructure.

Is there a limit on the number of routing policies?

Starter includes two policies to get you testing quickly. Pro and Enterprise plans include unlimited policies, so you can tune routing separately for every team and workload.

Ready to see your traffic in one view?

Connect your first workflow in under five minutes. No credit card required.

Start building — free