Pricing & Latency

How Neurovn estimates cost and latency for every node in your workflow. All numbers are computed locally — zero external API calls.

How pricing works

Neurovn maintains a local pricing registry — a JSON file with per-model rates for 38 models across 7 providers. When you assign a model to an Agent node, the estimator looks up that model's input and output token rates.

Actual pricing is set by each provider and can change without notice. Neurovn's registry is updated regularly, but its estimates are approximations for planning; always verify sensitive or production-critical budgets against the provider's official pricing page.

Cost formulas

LLM (Agent) Node

input_cost = (input_tokens / 1,000,000) × model.input_per_million
output_cost = (output_tokens / 1,000,000) × model.output_per_million
node_cost = input_cost + output_cost

Input tokens are counted from the system prompt + context using native tokenization. Output tokens are estimated via a task-type multiplier.
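The formula above can be sketched in a few lines of Python. The rates and token counts here are illustrative placeholders, not values from Neurovn's registry:

```python
def llm_node_cost(input_tokens: int, output_tokens: int,
                  input_per_million: float, output_per_million: float) -> float:
    """Cost of one Agent node: per-million-token rates applied to each direction."""
    input_cost = (input_tokens / 1_000_000) * input_per_million
    output_cost = (output_tokens / 1_000_000) * output_per_million
    return input_cost + output_cost

# Hypothetical rates: $3.00 / 1M input tokens, $15.00 / 1M output tokens
cost = llm_node_cost(12_000, 2_000, 3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0660
```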

Tool Node

tool_overhead = schema_tokens + avg_response_tokens   (from the tool registry)
fallback = 200 schema tokens + 800 response tokens   (when tool metadata is missing)

Tool latency/cost effects come from registry definitions. The fallback latency is 200 ms when a tool is undefined.
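A minimal sketch of the registry lookup with fallback defaults. The registry's field names here are assumptions for illustration, not Neurovn's actual schema:

```python
def tool_overhead(tool_name: str, registry: dict) -> dict:
    """Token and latency overhead for a Tool node. Falls back to defaults
    (200 schema tokens, 800 response tokens, 200 ms) when the tool is
    missing from the registry."""
    entry = registry.get(tool_name)
    if entry is None:
        return {"schema_tokens": 200, "response_tokens": 800, "latency_ms": 200}
    return {
        "schema_tokens": entry["schema_tokens"],
        "response_tokens": entry["avg_response_tokens"],
        "latency_ms": entry["latency_ms"],
    }

# Hypothetical registry entry
registry = {"web_search": {"schema_tokens": 350, "avg_response_tokens": 1200, "latency_ms": 450}}
print(tool_overhead("web_search", registry))    # registry-defined values
print(tool_overhead("unknown_tool", registry))  # fallback defaults
```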

Workflow Total

workflow_cost = SUM(node_costs) + branch_probabilities × branch_costs + loop_iterations × loop_body_costs

For branched workflows, each path's cost is weighted by its probability. For loops, cost scales by expected iterations (with a configurable max).
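The aggregation rule can be sketched as follows; the node costs, branch probabilities, and iteration counts are made-up numbers for illustration:

```python
def workflow_cost(node_costs, branches=(), loops=()):
    """Total estimated cost: flat node costs, plus probability-weighted
    branch costs, plus loop body costs scaled by expected iterations."""
    total = sum(node_costs)
    total += sum(p * cost for p, cost in branches)        # weighted branches
    total += sum(iters * body for iters, body in loops)   # expected loop cost
    return total

# Hypothetical workflow: two plain nodes, a 70/30 branch, a loop expected to run 3x
total = workflow_cost(
    node_costs=[0.010, 0.004],
    branches=[(0.7, 0.020), (0.3, 0.050)],
    loops=[(3, 0.008)],
)
print(f"${total:.4f}")  # → $0.0670
```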

Latency model

Per-Node Latency

agent_latency = (output_tokens / model.tokens_per_sec) × 1000 ms
tool_latency = tool_registry.latency_ms   (fallback 200 ms)
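As a quick sketch of the agent latency formula, with an illustrative (not registry-sourced) streaming speed:

```python
def agent_latency_ms(output_tokens: int, tokens_per_sec: float) -> float:
    """Agent node latency, dominated by output decoding speed."""
    return (output_tokens / tokens_per_sec) * 1000

# Hypothetical model streaming at 80 tokens/sec, producing 2,000 tokens
print(agent_latency_ms(2_000, 80))  # → 25000.0 (ms)
```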

Graph Latency

sequential = SUM(node_latencies) along the path
parallel = MAX(branch_latencies) across branches
loop/retry = expected_iterations × single_lap_latency

The critical path — the longest-latency path through the graph — determines the end-to-end P95 latency estimate.
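The composition rules can be sketched on a small made-up graph: sequential nodes sum, parallel branches take the MAX, and the slowest branch sets the end-to-end estimate:

```python
def sequential(latencies_ms):
    """Nodes on one path run back to back: latencies add up."""
    return sum(latencies_ms)

def parallel(branch_latencies_ms):
    """Parallel branches finish when the slowest one does."""
    return max(branch_latencies_ms)

# Hypothetical graph: an entry node, two parallel branches, an exit node
branch_a = sequential([120, 300])      # 420 ms
branch_b = sequential([90, 150, 100])  # 340 ms
total = sequential([200, parallel([branch_a, branch_b]), 80])
print(total)  # → 700 (ms; branch_a is on the critical path)
```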

Supported providers

Neurovn ships with pricing data for 38 models across 7 providers. Adding a new model is a single entry in backend/data/model_pricing.json — no frontend code changes needed.
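The actual schema of model_pricing.json isn't documented here, so the entry shape below is an assumption for illustration only; the real field names may differ:

```python
import json

# Hypothetical shape of one registry entry. The real field names in
# backend/data/model_pricing.json may differ.
new_model = {
    "provider": "ExampleAI",        # hypothetical provider
    "model": "example-large",       # hypothetical model id
    "input_per_million": 2.50,      # USD per 1M input tokens
    "output_per_million": 10.00,    # USD per 1M output tokens
    "tokens_per_sec": 90,           # used by the latency estimator
}
print(json.dumps(new_model, indent=2))
```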

| Provider | Models | Pricing |
| --- | --- | --- |
| OpenAI | GPT-4, GPT-4o, GPT-4o mini, o3, o4 mini + more | Official |
| Anthropic | Claude 4 Sonnet, Claude 4 Opus, Claude 3.7 Sonnet + more | Official |
| Google | Gemini 1.5 Pro/Flash, Gemini 2.0 Pro/Flash, Gemini Exp-1206 + more | Official |
| Meta | Llama 3.1 (405B/70B/8B), Llama 3.2, Llama 3.3 | Official |
| Mistral | Mistral Large/Medium/Small, Codestral | Official |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 | Official |
| Cohere | Command R, Command R+, Command Nightly | Official |
