GPT-4o mini
Dirt-cheap workhorse for customer questions and classification.
via OpenAI
What is this model?
GPT-4o mini is the "small" variant of OpenAI's GPT-4o, optimised for low cost and high throughput without major compromises on quality for business conversations. At $0.15 per million input tokens, it is one of the cheapest capable models on the market, significantly cheaper than comparable models from Anthropic or Google.
Strengths
- Extremely low cost (~6× cheaper than Claude Haiku)
- High inference speed (100+ tokens/sec)
- 128K-token context window
- Native vision support (images can be used as input)
- Solid instruction-following for business conversations
For most customer-service use cases this is the cost-efficient default pick.
Best suited for
- General customer questions and chatbot conversations
- Fast classification and routing
- Real-time interactions with low latency
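The classification-and-routing use case above can be sketched in a few lines. This is an illustrative example, not ZelixAI production code: the intent labels, prompt wording, and helper names are assumptions, and the actual network call to the Chat Completions endpoint is only described in a comment so the sketch stays self-contained.

```python
# Minimal sketch of intent routing with GPT-4o mini.
# Labels, prompt wording, and helpers are illustrative assumptions.
INTENTS = ["order_status", "returns", "product_info", "other"]

def build_routing_messages(email_text: str) -> list[dict]:
    """Build a Chat Completions messages payload asking for exactly one label."""
    system = (
        "You are an email router. Reply with exactly one label from: "
        + ", ".join(INTENTS)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": email_text},
    ]

def parse_intent(raw_reply: str) -> str:
    """Map the model's raw reply onto a known label; fall back to 'other'."""
    label = raw_reply.strip().lower()
    return label if label in INTENTS else "other"

# In production, `build_routing_messages(...)` would be sent to the
# Chat Completions endpoint with model="gpt-4o-mini"; here we only
# show payload construction and defensive parsing of the reply.
msgs = build_routing_messages("Where is my package? Order #1234")
print(parse_intent("order_status"))   # well-behaved reply maps straight through
print(parse_intent("Order Status!"))  # unexpected reply falls back to "other"
```

Constraining the model to a fixed label set and parsing defensively keeps routing deterministic even when the model replies with extra punctuation or casing.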
How ZelixAI uses this model
We position GPT-4o mini as the default "scale-without-budget-worries" pick for customers with high conversation volume where every cent counts. It is a good alternative to Claude Haiku when price is the main criterion. Not recommended when EU data residency is a hard requirement; use Mistral Small in the Privacy Cluster instead.
Real-world examples within ZelixAI
- A webshop uses GPT-4o mini to handle 5,000+ daily order-status questions, return requests and product-info queries. Cost: approximately €15 per day at this volume, comparable to one hour of human support work.
- An insurance company uses the same model for intent routing of incoming emails (which team, which priority): average latency 200 ms, throughput >5,000 calls/hour.
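The economics of a deployment like this can be estimated with simple per-token arithmetic. The prices below are OpenAI's published list rates for GPT-4o mini ($0.15 per million input tokens, $0.60 per million output tokens); the traffic profile in the usage line is a hypothetical illustration, not a measured ZelixAI figure, and real daily spend depends heavily on prompt length (retrieved order data, policy snippets, conversation history).

```python
# Back-of-the-envelope daily cost estimator for a high-volume chatbot.
# Prices: published GPT-4o mini list rates; token counts are assumptions.
IN_PRICE_PER_M = 0.15   # USD per 1M input tokens
OUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

def daily_cost_usd(requests_per_day: int,
                   avg_input_tokens: int,
                   avg_output_tokens: int) -> float:
    """Estimated USD spend per day for a given traffic profile."""
    input_cost = requests_per_day * avg_input_tokens / 1e6 * IN_PRICE_PER_M
    output_cost = requests_per_day * avg_output_tokens / 1e6 * OUT_PRICE_PER_M
    return input_cost + output_cost

# Hypothetical profile: 5,000 conversations/day, a 2,000-token
# retrieval-augmented prompt, and a 300-token answer.
print(round(daily_cost_usd(5_000, 2_000, 300), 2))  # → 2.4
```

Plugging in your own measured token counts per conversation gives a quick first-order budget check before committing to a model tier.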
Limitations and caveats
- US cloud provider: not suitable for strict EU data residency.
- Less capable than larger models on complex multi-step reasoning, long-document analysis or nuance-heavy content creation; for heavy analysis use o3 or GPT-5.5.
- For critical decisions, always build in human verification.
Technical specifications
| Specification | Value |
| --- | --- |
| Provider | OpenAI |
| Context window | 128K tokens |
| Throughput | 100+ tokens/s (very fast) |
| Cost tier | Affordable |
| Tool / function calling | Yes |
| Data residency | United States (cloud provider) |