Pricing & Latency

How Neurovn estimates cost and latency for every node in your workflow. All numbers are computed locally — zero external API calls.

How pricing works

Neurovn maintains a local pricing registry — a JSON file with per-model rates for 38 models across 7 providers. When you assign a model to an Agent node, the estimator looks up that model's input and output token rates.

Actual pricing is set by each provider and can change without notice. Neurovn's registry is updated regularly, but its estimates are approximations for planning; always verify sensitive or production-critical budgets against the provider's official pricing page.

Cost formulas

LLM (Agent) Node

input_cost = (input_tokens / 1,000,000) × model.input_per_million
output_cost = (output_tokens / 1,000,000) × model.output_per_million
node_cost = input_cost + output_cost

Input tokens are counted from the system prompt + context using native tokenization. Output tokens are estimated via a task-type multiplier.
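The formula above can be sketched in a few lines of Python. The rates and token counts here are illustrative placeholders, not values from Neurovn's registry:

```python
def llm_node_cost(input_tokens: int, output_tokens: int,
                  input_per_million: float, output_per_million: float) -> float:
    """Cost of one Agent node: per-million-token rates applied to each direction."""
    input_cost = (input_tokens / 1_000_000) * input_per_million
    output_cost = (output_tokens / 1_000_000) * output_per_million
    return input_cost + output_cost

# Hypothetical rates: $3.00 / 1M input tokens, $15.00 / 1M output tokens
cost = llm_node_cost(12_000, 2_000, 3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0660
```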

Tool Node

tool_overhead = schema_tokens + avg_response_tokens   (from the tool registry)
fallback = 200 schema tokens + 800 response tokens   (when tool metadata is missing)

Tool latency/cost effects come from registry definitions. The fallback latency is 200 ms when a tool is undefined.
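A minimal sketch of the registry lookup with fallback defaults. The registry's field names here are assumptions for illustration, not Neurovn's actual schema:

```python
def tool_overhead(tool_name: str, registry: dict) -> dict:
    """Token and latency overhead for a Tool node. Falls back to defaults
    (200 schema tokens, 800 response tokens, 200 ms) when the tool is
    missing from the registry."""
    entry = registry.get(tool_name)
    if entry is None:
        return {"schema_tokens": 200, "response_tokens": 800, "latency_ms": 200}
    return {
        "schema_tokens": entry["schema_tokens"],
        "response_tokens": entry["avg_response_tokens"],
        "latency_ms": entry["latency_ms"],
    }

# Hypothetical registry entry
registry = {"web_search": {"schema_tokens": 350, "avg_response_tokens": 1200, "latency_ms": 450}}
print(tool_overhead("web_search", registry))    # registry-defined values
print(tool_overhead("unknown_tool", registry))  # fallback defaults
```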

Workflow Total

workflow_cost = SUM(node_costs) + branch_probabilities × branch_costs + loop_iterations × loop_body_costs

For branched workflows, each path's cost is weighted by its probability. For loops, cost scales by expected iterations (with a configurable max).
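The aggregation rule can be sketched as follows; the node costs, branch probabilities, and iteration counts are made-up numbers for illustration:

```python
def workflow_cost(node_costs, branches=(), loops=()):
    """Total estimated cost: flat node costs, plus probability-weighted
    branch costs, plus loop body costs scaled by expected iterations."""
    total = sum(node_costs)
    total += sum(p * cost for p, cost in branches)        # weighted branches
    total += sum(iters * body for iters, body in loops)   # expected loop cost
    return total

# Hypothetical workflow: two plain nodes, a 70/30 branch, a loop expected to run 3x
total = workflow_cost(
    node_costs=[0.010, 0.004],
    branches=[(0.7, 0.020), (0.3, 0.050)],
    loops=[(3, 0.008)],
)
print(f"${total:.4f}")  # → $0.0670
```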

Latency model

Per-Node Latency

agent_latency = (output_tokens / model.tokens_per_sec) × 1000 ms
tool_latency = tool_registry.latency_ms   (fallback 200 ms)
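As a quick sketch of the agent latency formula, with an illustrative (not registry-sourced) streaming speed:

```python
def agent_latency_ms(output_tokens: int, tokens_per_sec: float) -> float:
    """Agent node latency, dominated by output decoding speed."""
    return (output_tokens / tokens_per_sec) * 1000

# Hypothetical model streaming at 80 tokens/sec, producing 2,000 tokens
print(agent_latency_ms(2_000, 80))  # → 25000.0 (ms)
```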

Graph Latency

sequential = SUM(node_latencies) along the path
parallel = MAX(branch_latencies) across branches
loop/retry = expected_iterations × single_lap_latency

The critical path — the longest-latency path through the graph — determines the end-to-end P95 latency estimate.
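The composition rules can be sketched on a small made-up graph: sequential nodes sum, parallel branches take the MAX, and the slowest branch sets the end-to-end estimate:

```python
def sequential(latencies_ms):
    """Nodes on one path run back to back: latencies add up."""
    return sum(latencies_ms)

def parallel(branch_latencies_ms):
    """Parallel branches finish when the slowest one does."""
    return max(branch_latencies_ms)

# Hypothetical graph: an entry node, two parallel branches, an exit node
branch_a = sequential([120, 300])      # 420 ms
branch_b = sequential([90, 150, 100])  # 340 ms
total = sequential([200, parallel([branch_a, branch_b]), 80])
print(total)  # → 700 (ms; branch_a is on the critical path)
```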

Supported providers

Neurovn ships with pricing data for 38 models across 7 providers. Adding a new model is a single entry in backend/data/model_pricing.json — no frontend code changes needed.
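The actual schema of model_pricing.json isn't documented here, so the entry shape below is an assumption for illustration only; the real field names may differ:

```python
import json

# Hypothetical shape of one registry entry. The real field names in
# backend/data/model_pricing.json may differ.
new_model = {
    "provider": "ExampleAI",        # hypothetical provider
    "model": "example-large",       # hypothetical model id
    "input_per_million": 2.50,      # USD per 1M input tokens
    "output_per_million": 10.00,    # USD per 1M output tokens
    "tokens_per_sec": 90,           # used by the latency estimator
}
print(json.dumps(new_model, indent=2))
```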

| Provider | Models | Pricing |
| --- | --- | --- |
| OpenAI | GPT-4, GPT-4o, GPT-4o mini, o3, o4 mini + more | Official |
| Anthropic | Claude 4 Sonnet, Claude 4 Opus, Claude 3.7 Sonnet + more | Official |
| Google | Gemini 1.5 Pro/Flash, Gemini 2.0 Pro/Flash, Gemini Exp-1206 + more | Official |
| Meta | Llama 3.1 (405B/70B/8B), Llama 3.2, Llama 3.3 | Official |
| Mistral | Mistral Large/Medium/Small, Codestral | Official |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 | Official |
| Cohere | Command R, Command R+, Command Nightly | Official |
