How to price an AI product in 2026: token costs and margins

Pricing an AI product in 2026 starts with modeling per-user inference cost, then choosing a pricing model that protects gross margin: a flat tier with usage caps, prepaid credits, or true usage-based pricing with a markup. Aim for 50-70% gross margin (not the old 80% SaaS rule), and never pass token costs through 1:1 to customers.

Most AI founders price as if they are selling pure SaaS. They are not. Your COGS scales with every request, and the 80% gross-margin reflex from 2018 SaaS playbooks will quietly eat your runway. Here is the framework for AI product pricing that survives the next token-price swing.

Model your true per-user inference cost first

Before you pick a pricing model, you need a per-user cost number. Tokens per action times price per token (the per-million rate divided by 1,000,000) gives you cost per action. Multiply that by P50 and P90 actions per user per month and you have the cost floor for your median and heavy users.

Run two scenarios. The median user (P50) tells you whether your headline price covers cost. The heavy user (P90) tells you whether your ten worst customers will erase your gross margin in a single quarter.

A workable model for an AI SaaS in the 11-50 user range:

inputs: 6,000 tokens/action @ $2.50/1M = $0.015
outputs: 2,000 tokens/action @ $10/1M = $0.020
total per action: $0.035

P50 user: 120 actions/month = $4.20 inference cost
P90 user: 600 actions/month = $21.00 inference cost

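If you charge $29/month flat with no cap, the P90 user costs you $21 in raw tokens before any infra, support, or model retraining. That leaves $8, a gross margin of roughly 28% on that cohort before those other costs are counted. It is already far below the 50-70% target and can go underwater once they are added.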
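The arithmetic above can be reproduced in a few lines. This is a minimal sketch using the example rates from this section; the function names are illustrative, and the margin it prints counts token cost only, with infra, support, and retraining left out.

```python
def cost_per_action(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Token cost of one action in dollars; prices are quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def token_margin(price, monthly_token_cost):
    """Gross margin on token cost only (no infra, support, or retraining)."""
    return (price - monthly_token_cost) / price


# Example rates from this section: 6,000 in @ $2.50/1M, 2,000 out @ $10/1M.
per_action = cost_per_action(6_000, 2_000, 2.50, 10.00)

for label, actions in [("P50", 120), ("P90", 600)]:
    monthly = per_action * actions
    print(f"{label}: ${monthly:.2f}/month in tokens, "
          f"{token_margin(29.00, monthly):.0%} token-only margin at $29 flat")
```

Swap in your own token counts and provider rates. Whatever other COGS you carry comes off the token-only margin shown here.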

The 5-step usage-based AI pricing model that protects margin

Use this sequence. Skip the "experiment with three pricing models in parallel" detour and commit, ship, iterate, as Y Combinator recommends for B2B pricing.

1. Compute per-action cost at current provider rates, with a 30% buffer for input variance.
2. Project P50 and P90 monthly usage per active user from the last 30 days of logs.
3. Set a base price that delivers a 60% gross margin on the P50 user.
4. Cap usage at roughly 3x the P50 to prevent the P90 user from breaking your unit economics. Charge overages or upgrade them.
5. Reprice every 90 days against current token prices and observed usage. Do not wait for the annual cycle.
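The first four steps can be sketched as a small calculation. This is one possible reading, not a definitive implementation: the helper names and the sample usage list are hypothetical, the 30% buffer is applied as a multiplier on per-action cost, and the margin target is measured against buffered token cost only.

```python
import statistics


def percentile_usage(actions_per_user, pct):
    """Return the pct-th percentile (1-99) of monthly actions across users."""
    cuts = statistics.quantiles(actions_per_user, n=100, method="inclusive")
    return cuts[pct - 1]


def base_price(per_action_cost, p50_actions, buffer=0.30, target_margin=0.60):
    """Price that yields target_margin on the P50 user after the cost buffer."""
    buffered_monthly_cost = per_action_cost * (1 + buffer) * p50_actions
    return buffered_monthly_cost / (1 - target_margin)


def usage_cap(p50_actions, multiple=3):
    """Monthly action cap set as a multiple of P50 usage."""
    return multiple * p50_actions


# Hypothetical monthly action counts per active user (stand-in for real logs).
logs = [30, 45, 60, 90, 110, 120, 125, 140, 180, 260, 420, 600, 750]
p50 = percentile_usage(logs, 50)
p90 = percentile_usage(logs, 90)
price = base_price(0.035, p50)
print(f"P50={p50:.0f}, P90={p90:.0f}, "
      f"base price=${price:.2f}, cap={usage_cap(p50):.0f} actions")
```

With the numbers from the cost model earlier ($0.035 per action, 120 actions at P50), `base_price` returns $13.65 and `usage_cap` returns 360 actions, so a 600-action P90 user lands in overage territory. Step 5 amounts to rerunning this with fresh rates and logs every 90 days.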

The inference cost pass-through trap: don't tie price 1:1 to provider tokens

Founders who price as "GPT-4o cost + 20%" think they have built a hedge. They have built the opposite: a contract that erodes pricing power in both directions.

When OpenAI cuts prices, as it did in December 2024 for the GPT-4o realtime API (60% on audio input and output tokens, 87.5% on cached audio input), pass-through pricing means your revenue per user drops in lockstep. You captured none of the margin expansion. When the next model launches at 3x cost for a 2x quality jump and your customers want it, you cannot raise prices without renegotiating every contract.
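A short comparison makes the lockstep effect concrete. This sketch assumes a 20% markup, a hypothetical flat 60% provider cut applied to all tokens, and the $21 monthly token cost of the P90 user from the cost model earlier; the function names are illustrative.

```python
def cost_plus_price(token_cost, markup=0.20):
    """Pass-through pricing: provider token cost plus a fixed markup."""
    return token_cost * (1 + markup)


def margin(price, token_cost):
    """Gross margin on token cost only."""
    return (price - token_cost) / price


cost_before = 21.00                     # P90 monthly token cost from the example
cost_after = cost_before * (1 - 0.60)   # hypothetical 60% provider price cut

# A price set once (here at the same starting level) and then held.
fixed_price = cost_plus_price(cost_before)

for label, cost in [("before cut", cost_before), ("after cut", cost_after)]:
    passthrough = cost_plus_price(cost)
    print(f"{label}: pass-through ${passthrough:.2f} "
          f"({margin(passthrough, cost):.0%} margin), "
          f"fixed ${fixed_price:.2f} ({margin(fixed_price, cost):.0%} margin)")
```

Under pass-through, revenue per user falls with the provider price while the margin percentage stays pinned at the markup. The held price keeps the revenue and takes the whole cut as margin.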

The fix: charge for value or output, then absorb token cost shifts inside your gross margin. Sequoia's framing of outcome-based and agent-based pricing for AI exists for exactly this reason. You price the deliverable (qualified meeting booked, contract reviewed, ticket resolved), not the LLM call underneath.

AI gross margin in 2026: the 50-70% floor VCs accept

VCs writing AI checks have re-baselined. The old "80%+ or it is not a real SaaS" reflex still applies to pure software, but AI-native companies are getting funded at gross margins that would have been disqualifying three years ago.

Communicate the number explicitly in your deck. Say "current gross margin: 58%, with a path to 70% as we shift the long tail of queries to a cheaper model." Hiding it forces the partner to assume the worst. Capital is available: software startups raised $8.98 billion on Carta in Q3 2025 alone. What is scarce is the founder who can defend 55% with a clear path to 70%.
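To back a "path to 70%" claim with a number, you can estimate how much inference spend has to move to the cheaper model. The sketch below rests on two loud assumptions that are not from this article: all COGS is inference, and the cheaper model costs a fixed fraction (`cheap_cost_ratio`, a hypothetical parameter) of the current one per query.

```python
def share_to_shift(current_margin, target_margin, cheap_cost_ratio):
    """Fraction of current inference spend that must move to a cheaper model.

    Assumes all COGS is inference and that the cheaper model costs
    cheap_cost_ratio (between 0 and 1) times the current model per query.
    """
    current_cogs = 1 - current_margin
    target_cogs = 1 - target_margin
    share = (1 - target_cogs / current_cogs) / (1 - cheap_cost_ratio)
    if share > 1:
        raise ValueError("target not reachable with this cheaper model alone")
    return share


# 58% today, 70% target, cheaper model at a hypothetical 20% of current cost.
print(f"{share_to_shift(0.58, 0.70, 0.20):.0%} of inference spend must move")
```

If non-inference COGS is material, subtract it from both margins first, otherwise the estimate will understate how much traffic has to move.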

Signals that read as weak unit economics to an investor:

No per-user inference number in your data room: they will assume you have not measured it.
A pricing page with no usage cap: they will assume P90 users are crushing you.
"We pass through OpenAI costs" in your pricing model: they will assume zero pricing power.

Falling token costs are a margin opportunity, not a price cut

When your provider drops prices, the instinct is to drop yours to match. Resist that for at least one quarter. Token deflation is rewriting the usage-based pricing playbook, and the early movers are banking the margin instead of giving it back.

Use the savings to lift gross margin toward 70%, fund a higher-context model for power users on the top tier, or extend usage caps without changing the headline price. All three create defensibility. None of them shows up as a price cut on your invoice, which is the only number your customer remembers when their next renewal lands.
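Two of these options are easy to quantify. The sketch below uses the $29 price, $4.20 P50 token cost, and 360-action cap from earlier in this article, plus a hypothetical 40% provider price cut; the function names are illustrative and margins are token-only.

```python
def margin_after_cut(price, monthly_token_cost, cut):
    """Token-only gross margin if you hold price through a provider cut."""
    new_cost = monthly_token_cost * (1 - cut)
    return (price - new_cost) / price


def cap_at_same_cost(old_cap, cut):
    """Largest cap whose worst-case token cost equals the old cap's cost."""
    return old_cap / (1 - cut)


cut = 0.40  # hypothetical provider price cut
print(f"Option 1, bank it: P50 margin at $29 becomes "
      f"{margin_after_cut(29.00, 4.20, cut):.1%}")
print(f"Option 2, extend caps: 360 actions becomes "
      f"{cap_at_same_cost(360, cut):.0f} at the same worst-case cost")
```

Either way the headline price on the invoice does not move.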

Why this matters for your raise

AI pricing is the single hardest line item in a Series A AI deck right now. A 58% gross margin you can defend with a per-user inference model wins; a 75% gross margin you cannot explain loses. Get your token math right at 11-50 users and you walk into the next conversation with the slide that closes the round.

FAQ

How do you price an AI SaaS product? Start by modeling per-user inference cost at P50 and P90 usage, then set a base price that delivers a 60% gross margin on the median user and cap usage at roughly 3x that level. Most AI SaaS in 2026 lands on a flat tier with a usage cap plus overages, or a credits-based model. Avoid 1:1 pass-through pricing.

Should AI pricing be usage-based or flat? Hybrid wins for most early-stage AI products: a flat base tier with a generous usage cap, then overage charges or a higher tier for heavy users. Pure usage-based pricing creates revenue volatility that hurts forecasting and enterprise procurement. Pure flat pricing exposes you on P90 users who can crush your gross margin alone.

What gross margin should an AI startup target? 50-70% is the acceptable range for AI-native companies in 2026, with roughly 60% as the working target for a credible Series A story. The old SaaS reflex of 80%+ no longer applies because COGS scales with inference. Show a defended path to 70% in your data room and you keep the partner in the room.

How do you handle rising or falling token costs in pricing? When token costs fall, hold your headline price for at least one quarter and bank the margin or reinvest it in better models for top-tier users. When costs rise, raise prices on new customers and grandfather existing accounts for one renewal cycle. Never use a 1:1 pass-through model; it kills pricing power in both directions.

What is the pass-through trap when pricing AI products? The pass-through trap is pricing your product as provider token cost plus a fixed markup. It feels like a hedge but eliminates pricing power: when providers cut prices your revenue drops in lockstep, and when better models launch at higher costs you cannot raise prices without contract renegotiation. Charge for the value or outcome instead.
