Frequently asked questions
Common questions about shipping AI in B2B SaaS
How long does it take to ship a customer-facing AI feature in B2B SaaS?
A narrow, well-scoped customer-facing AI feature ships to production in 10 to 16 weeks. In a typical 12-week plan, the first 2 weeks are scoping and evaluation-set building. Weeks 3 to 8 are the build, with continuous testing against the evaluation set. Weeks 9 to 10 are a closed beta with 5 to 10 friendly customers. Weeks 11 to 12 are the production launch with monitoring and rollback paths in place. Wider features that touch many workflows or carry strict regulatory constraints take longer, typically 4 to 6 months.
What is the typical cost of a customer-facing AI feature build?
For Finnish mid-market B2B SaaS in 2026, the typical custom build cost for one customer-facing AI feature lands between 80,000 and 250,000 euros. The lower end buys a narrow, well-scoped feature (a smart search bar, a draft-generation tool, a single in-app assistant on top of existing data). The higher end buys a feature that touches multiple workflows, integrates with several internal systems, and carries strict quality requirements. Ongoing run cost (foundation model API, infrastructure, monitoring) typically adds 1,000 to 5,000 euros per month at moderate customer volume.
Should we build the AI feature or buy a vendor product?
For an AI feature inside your own product, build. If the AI appears in front of your customers as part of your product, buying it from a SaaS vendor means the same feature appears in every competitor's product within twelve months. Building keeps the feature distinct, keeps your customer data inside your perimeter, and avoids the per-end-user fees a vendor would charge, which you would otherwise have to pass through to your customers. The case for buying is when the AI serves your internal team rather than your customers.
What are the four customer-facing AI patterns that consistently work?
Assistant: a chat surface that answers questions over the customer's own data (their documents, their reports, their account). Generation: AI drafts content the customer reviews and edits before using (email drafts, summaries, copy, replies). Automation: AI takes actions the customer would have taken manually, with their approval (auto-routing, auto-tagging, auto-scheduling). Intelligence: AI invisible behind a number or a list (lead scoring, churn prediction, content recommendations). Most successful first AI features in B2B SaaS land in the assistant or generation pattern because the human is in the loop and quality issues stay recoverable.
What architecture is right for a B2B SaaS AI feature?
A foundation model layer (Claude, GPT, Gemini, Llama, Mistral) accessed through an API or self-hosted depending on data sensitivity. A retrieval layer that pulls the customer's own data into the prompt as context. A tools layer that lets the model query the customer's data through audited operations, typically exposed through MCP. A UI layer in the product where the AI appears (chat, inline suggestion, dashboard widget). Per-tenant isolation across every layer so one customer's data never reaches another customer's prompt or model. The application that customers see is custom-built (React, Next.js, Vue on the front; FastAPI, Express, NestJS on the back) and runs in the company's own cloud.
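The shape of that stack fits in a short sketch. The endpoint below is illustrative only: retrieve_chunks and call_model are hypothetical placeholders for the retrieval store and the foundation model client, and the tenant header is stand-in wiring for real authentication.

```python
# Minimal sketch of the layered shape: the UI layer calls this endpoint, the
# endpoint runs tenant-scoped retrieval, builds a grounded prompt, and calls
# the foundation model layer. Not a production implementation.
from fastapi import Depends, FastAPI, Header
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

def current_tenant(x_tenant_id: str = Header(...)) -> str:
    # In a real product the tenant id comes from the authenticated session or
    # API key, never from an unverified header; this is placeholder wiring.
    return x_tenant_id

def retrieve_chunks(tenant_id: str, query: str, top_k: int = 5) -> list[dict]:
    # Placeholder for the retrieval layer: search a vector index scoped to this
    # tenant and return matching text chunks with their source ids.
    return []

def call_model(system: str, user: str) -> str:
    # Placeholder for the foundation model layer (API or self-hosted).
    return ""

@app.post("/ai/ask")
def ask(req: AskRequest, tenant_id: str = Depends(current_tenant)) -> dict:
    # Retrieval layer: only this tenant's data enters the prompt.
    chunks = retrieve_chunks(tenant_id=tenant_id, query=req.question)
    context = "\n\n".join(chunk["text"] for chunk in chunks)
    # Foundation model layer: grounded prompt; source ids go back to the UI
    # layer so the answer can cite them.
    answer = call_model(
        system="Answer only from the provided context and cite source ids.",
        user=f"Context:\n{context}\n\nQuestion: {req.question}",
    )
    return {"answer": answer, "sources": [chunk["id"] for chunk in chunks]}
```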
How do we prevent hallucinations in a customer-facing AI feature?
Three discipline layers. First, ground the model in the customer's own data using RAG with strict source citation so every claim the AI makes is traceable back to a document or row. Second, evaluate the AI continuously on a 100 to 500 example test set that covers common cases and known failure modes, and run the evaluation on every model update or prompt change. Third, design the UI so the AI's output is editable, not authoritative: the customer reviews and approves before the action commits. Hallucinations cannot be eliminated, but they can be contained inside a workflow where the customer catches them before they cause damage.
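The citation part of the first layer can be enforced mechanically. A minimal sketch, assuming the model is instructed to mark claims with inline [source:id] tags; that tag convention is an illustrative assumption, not a standard.

```python
# Flag any answer whose citations do not map back to the retrieved sources,
# so the UI can show it as "needs review" instead of presenting it as fact.
import re

def check_citations(answer: str, retrieved_ids: set[str]) -> dict:
    cited = set(re.findall(r"\[source:([\w-]+)\]", answer))
    unknown = cited - retrieved_ids
    return {
        "cited_any_source": bool(cited),
        "all_citations_grounded": not unknown,
        "unknown_citations": sorted(unknown),
    }

# Example: a claim citing a document that was never retrieved gets flagged.
result = check_citations(
    "Revenue grew 12% [source:report-2025-q3].",
    retrieved_ids={"report-2025-q2"},
)
print(result)  # all_citations_grounded is False, unknown_citations lists the q3 id
```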
What is per-tenant data isolation and why does it matter?
Per-tenant isolation is the architectural rule that one customer's data never reaches another customer's prompt, retrieval, or model context. In multi-tenant B2B SaaS this is non-negotiable: a finance customer's transactions cannot show up in a different customer's AI assistant. Practically it means tenant-scoped vector indexes, tenant-scoped retrieval queries, tenant-scoped tool calls, and tenant-scoped audit logs. Skipping this is the single most common compliance failure in early B2B SaaS AI features and the one that breaks customer trust the moment it surfaces.
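One way to make the rule hard to break is to route every retrieval call through a wrapper that injects the tenant filter, so "forgot the filter" cannot happen at the call site. A sketch under that assumption; the VectorIndex interface is illustrative, not a specific vector database API.

```python
# Tenant-scoped retrieval: the tenant filter is applied in one place,
# unconditionally, rather than remembered at every call site.
from dataclasses import dataclass
from typing import Protocol

class VectorIndex(Protocol):
    def search(self, query: str, filters: dict, top_k: int) -> list[dict]: ...

@dataclass
class TenantScopedRetriever:
    index: VectorIndex
    tenant_id: str

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        hits = self.index.search(
            query=query,
            filters={"tenant_id": self.tenant_id},
            top_k=top_k,
        )
        # Defense in depth: drop anything returned for another tenant, because
        # that is the failure you want to catch before a customer does.
        return [h for h in hits if h.get("tenant_id") == self.tenant_id]
```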
How should we price an AI feature in B2B SaaS?
Three pricing models work in B2B SaaS. Bundled into an existing tier: AI becomes a tier upsell driver and ARPU rises through tier moves rather than separate billing. Per-seat add-on: 10 to 50 euros per seat per month on top of the base price, sold as an AI module. Per-usage: priced per resolution, per draft, per task. The per-use cost of running the AI typically lands between 0.05 and 2.00 euros depending on model and complexity, and feature pricing should sit at 5 to 20 times the per-use cost to leave margin for support, evaluation, and rebuilds.
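As a worked example of that multiple for the per-usage model, with illustrative usage figures rather than benchmarks:

```python
# Back-of-the-envelope margin check; all numbers are illustrative assumptions.
cost_per_use_eur = 0.20          # model + infrastructure cost for one AI task
price_per_use_eur = 2.00         # priced at 10x per-use cost, inside the 5-20x band
uses_per_customer_month = 100

run_cost = cost_per_use_eur * uses_per_customer_month     # 20 EUR per customer per month
revenue = price_per_use_eur * uses_per_customer_month     # 200 EUR per customer per month
gross_margin = (revenue - run_cost) / revenue             # 0.90

print(f"run cost {run_cost:.0f} EUR, revenue {revenue:.0f} EUR, margin {gross_margin:.0%}")
```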
What evaluation discipline does a customer-facing AI feature need?
Five practices. A test set of 100 to 500 representative examples built during scoping, including known failure modes. Continuous evaluation that runs against the test set on every prompt change, model update, or retrieval index change. Production sampling that runs the same evaluation on a percentage of real customer interactions. A quality dashboard the team checks weekly. A rollback path when quality regresses. Without these, an AI feature that works at launch silently regresses over the following months and the regression surfaces as customer complaints rather than as an internal signal.
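The continuous-evaluation step is small enough to sketch. This assumes the test set is a JSON Lines file with id, input, and expected fields, and grade is a placeholder for whatever pass criterion fits the feature.

```python
# Run every test case through the current feature and report the pass rate.
import json

def grade(expected: str, actual: str) -> bool:
    # Placeholder criterion; real features usually need richer checks
    # (rubrics, structured comparisons, or an LLM-as-judge step).
    return expected.strip().lower() in actual.strip().lower()

def run_eval(test_set_path: str, generate) -> float:
    # generate is the feature under test: a callable that takes the input text
    # and returns the AI's answer for it.
    with open(test_set_path) as f:
        cases = [json.loads(line) for line in f if line.strip()]
    failures = [c["id"] for c in cases if not grade(c["expected"], generate(c["input"]))]
    score = 1 - len(failures) / len(cases)
    print(f"{len(cases) - len(failures)}/{len(cases)} passed ({score:.0%}), failed: {failures[:10]}")
    return score

# Run on every prompt change, model update, or retrieval index change, and
# block the release (or roll back) if the score drops below the current baseline.
```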
What goes wrong most often when shipping an AI feature in B2B SaaS?
Six patterns repeat. Scoping too broad: trying to ship an assistant that handles every workflow rather than one narrow workflow well. Skipping the evaluation set: no test set means no way to tell if quality is moving. Sending data across tenants accidentally: a retrieval query that returns rows from a different customer. Coupling the AI feature to the rest of the product without an off switch: when quality regresses, there is no way to turn the AI off cleanly. Underestimating monitoring: the AI is shipped without dashboards that show error rate, latency, and cost per call. Pricing the feature below run cost: the feature ships, customers use it, and gross margin drops at every renewal.
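The off switch is cheap to build up front and expensive to retrofit. A small sketch, assuming a per-tenant flag store; the in-memory dict here is for illustration, and in production it would be a config service or a database table.

```python
# Per-tenant kill switch for the AI path: support or engineering can turn the
# feature off for one customer, or for everyone, without a deploy.
AI_FEATURE_FLAGS: dict[str, bool] = {}   # tenant_id -> enabled
AI_FEATURE_DEFAULT = True

def ai_enabled(tenant_id: str) -> bool:
    return AI_FEATURE_FLAGS.get(tenant_id, AI_FEATURE_DEFAULT)

def handle_question(tenant_id: str, question: str, ai_pipeline) -> dict:
    # ai_pipeline is the retrieval + model path from the earlier sketches.
    if not ai_enabled(tenant_id):
        # Clean degradation: the rest of the product keeps working and the AI
        # surface disappears instead of serving regressed answers.
        return {"ai": False, "message": "The AI assistant is temporarily unavailable."}
    return {"ai": True, "answer": ai_pipeline(tenant_id, question)}

# Setting AI_FEATURE_FLAGS["tenant-123"] = False turns the feature off for one
# customer; setting AI_FEATURE_DEFAULT = False turns it off globally.
```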