GPT-4o (omni)
Multimodal all-rounder — text, image and audio in one model.
via OpenAI →What is this model?
GPT-4o ("o" for "omni") launched in May 2024 as OpenAI's first truly multimodal model. In a single neural network it can read text, analyse images and process audio — without intermediate steps via separate models. It has been broadly validated in production at thousands of companies and is used as a baseline in countless AI benchmarks.
Strengths
Strengths: native multimodal (images + text in one call), 128K context window, strong tool use with reliable argument types, broad language coverage (50+ languages at quality level), and the most stable OpenAI API version. For use cases that require vision (invoice OCR, product photo analysis, document scans) this is often the only right pick within ZelixAI.
Best suited for
- Multimodal tasks — text, image and audio in one model
- Tool-use / function-calling workflows
- Text creation, marketing copy, letters
How ZelixAI uses this model
We deploy GPT-4o within ZelixAI as the "vision bot": if your use case includes images, photos or documents as input, this is the primary model. For pure text conversations GPT-4o mini is often sufficient and cheaper. GPT-4o remains the choice for production-grade stability where newer models (GPT-5.5) are still too recent.
Real-world examples within ZelixAI
Real example: a fashion retailer uses GPT-4o to automatically describe and categorise product photos — colour, style, suitable occasion. An insurer uses vision capabilities to automatically triage uploaded damage photos ("windscreen broken — category: glass damage"). A logistics company has GPT-4o compare packing-list photos to the purchase order to detect discrepancies.
Limitations and caveats
Limitations: US cloud provider — not for strict EU data residency. Higher priced than GPT-4o mini ($2.5/1M input vs $0.15) — deploy only where the extra capacity is needed. For pure reasoning o3 is often stronger; for absolute flagship level GPT-5.5 is now superior. GPT-4o remains the most predictable choice however.
Technical specifications
| Provider | OpenAI |
| Context window | 128K tokens |
| Throughput | 40–100 tokens/s (Fast) |
| Cost tier | Mid-range |
| Tool / function-calling | yes |
| Data residency | United States (cloud provider) |