How to run enterprise GenAI like a production service


Scale becomes manageable when GenAI is treated as a service with explicit constraints and measurable outcomes. It’s best to rely on a set of production disciplines to get there.

Define the production contract

Write the contract for the experience you plan to operate. Put numbers on it. Include p95 latency, availability, error budget, and expected behavior under load. Add a cost envelope per request. Capture policy requirements for data access, citation, and tool use.
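One way to make the contract operational is to encode it as data and check each measurement window against it. The sketch below is a minimal illustration under stated assumptions. The class and function names (`ProductionContract`, `p95`, `error_budget_remaining`, `check`) are hypothetical. It uses a nearest-rank p95, assumes every request in the window produced one latency sample, and compares the cost envelope against the window's mean cost per request. Policy clauses (data access, citation, tool use) are left out because they need their own enforcement logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionContract:
    """Hypothetical contract for one GenAI experience; field names are illustrative."""
    p95_latency_s: float             # target 95th-percentile end-to-end latency, seconds
    availability: float              # e.g. 0.99 means at most 1% of requests may fail
    max_cost_per_request_usd: float  # cost envelope, checked here against the mean cost


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of values."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def error_budget_remaining(contract, total, failed):
    """Fraction of the window's error budget left; negative means it is overspent."""
    allowed_failures = (1 - contract.availability) * total
    if allowed_failures == 0:
        return 1.0 if failed == 0 else float("-inf")
    return 1 - failed / allowed_failures


def check(contract, latencies_s, failed, costs_usd):
    """List the contract clauses violated in one window (one latency sample per request)."""
    violations = []
    if p95(latencies_s) > contract.p95_latency_s:
        violations.append("p95 latency")
    if error_budget_remaining(contract, len(latencies_s), failed) < 0:
        violations.append("error budget")
    if sum(costs_usd) / len(costs_usd) > contract.max_cost_per_request_usd:
        violations.append("cost envelope")
    return violations


contract = ProductionContract(p95_latency_s=2.5, availability=0.99,
                              max_cost_per_request_usd=0.03)

# One window that meets the contract and one that breaks every clause.
print(check(contract, [1.0] * 19 + [3.0], failed=0, costs_usd=[0.02] * 20))  # []
print(check(contract, [1.0] * 18 + [3.0, 3.0], failed=1, costs_usd=[0.05] * 20))
# ['p95 latency', 'error budget', 'cost envelope']
```

Returning a list of violated clauses, rather than a single pass/fail flag, makes it easier to route each breach to the team that owns that part of the system.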

This step changes design choices quickly. A team with a 2.5-second p95 target makes different retrieval and routing choices than a team that can tolerate 10 seconds. A team with a three-cent-per-answer budget makes different model tier choices than a team with a fifty-cent budget.
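The budget effect is plain arithmetic on token counts. The sketch below assumes token-metered pricing with separate input and output rates. The tier names and prices are hypothetical placeholders, not any provider's actual rates, and the 6,000-input / 500-output token profile is an assumed request shape. Retrieval and re-ranking costs are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1m_input: float   # HYPOTHETICAL price per million input tokens
    usd_per_1m_output: float  # HYPOTHETICAL price per million output tokens


def cost_per_answer(tier, input_tokens, output_tokens):
    """Token-metered cost of one answer in USD (ignores retrieval and re-ranking spend)."""
    return (input_tokens * tier.usd_per_1m_input
            + output_tokens * tier.usd_per_1m_output) / 1_000_000


def tiers_within_budget(tiers, input_tokens, output_tokens, budget_usd):
    """Names of the tiers whose per-answer cost fits the budget."""
    return [t.name for t in tiers
            if cost_per_answer(t, input_tokens, output_tokens) <= budget_usd]


# Illustrative tiers and prices only; substitute your provider's actual rates.
TIERS = [
    ModelTier("small", 0.50, 1.50),
    ModelTier("medium", 3.00, 15.00),
    ModelTier("large", 15.00, 75.00),
]

# Assume ~6,000 prompt tokens (mostly retrieved context) and ~500 output tokens.
for budget in (0.03, 0.50):
    print(budget, tiers_within_budget(TIERS, 6_000, 500, budget))
# 0.03 ['small', 'medium']
# 0.5 ['small', 'medium', 'large']
```

With these assumed prices the large tier costs about 12.75 cents per answer, so a three-cent budget rules it out before any quality comparison starts. Shrinking the retrieved context is often the cheaper lever, since input tokens dominate this request shape.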

Treat retrieval as the main system

Most enterprise assistants rely on retrieval-augmented generation. Retrieval drives answer quality. Retrieval also drives unit economics through context size, re-ranking, and repeat work. I spend more time on retrieval quality than on prompt wording.
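Measuring retrieval quality and context cost together makes the trade-off visible. The sketch below assumes a small labeled evaluation set in which reviewers marked which documents each query needs. It reports mean recall@k alongside the mean context tokens sent to the model at that k. The dictionary schema, `recall_at_k`, `evaluate`, and the sample data are all hypothetical illustrations.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labeled relevant documents found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one labeled relevant document")
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def evaluate(eval_set, k):
    """Mean recall@k and mean context tokens passed to the model at a given k.

    eval_set: list of dicts with keys (hypothetical schema):
      'retrieved' -> ranked list of (doc_id, token_count) from the retriever
      'relevant'  -> doc_ids a reviewer labeled as needed to answer the query
    """
    recalls, tokens = [], []
    for case in eval_set:
        top = case["retrieved"][:k]
        recalls.append(recall_at_k([doc for doc, _ in top], case["relevant"], k))
        tokens.append(sum(n for _, n in top))
    return sum(recalls) / len(recalls), sum(tokens) / len(tokens)


# Toy data for illustration only.
EVAL_SET = [
    {"retrieved": [("a", 400), ("b", 300), ("c", 500), ("d", 200)],
     "relevant": ["a", "c"]},
    {"retrieved": [("x", 350), ("y", 450), ("z", 250), ("w", 300)],
     "relevant": ["y"]},
]

for k in (1, 2, 3):
    mean_recall, mean_tokens = evaluate(EVAL_SET, k)
    print(k, mean_recall, mean_tokens)
# 1 0.25 375.0
# 2 0.75 750.0
# 3 1.0 1125.0
```

Each extra chunk raises recall but also adds context tokens to every request. A reasonable policy is to pick the smallest k that meets a recall target, then improve ranking so that target is reached at a lower k.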
