Dataset - 2026-05-01 to 2026-05-12
LLM API Latency Benchmark - May 2026
This leaderboard is server-rendered as a native HTML table. It can be fetched with curl, parsed without JavaScript, and cited with collection timestamps. JSON and markdown versions are linked directly from this page.
Temporal coverage: 2026-05-01/2026-05-12
Generated at: May 12, 2026, 02:00 PM UTC
Benchmark rows: 10
Every row includes data attributes for provider, model, region, and collection time.
| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Samples | Collected |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o | US East | 342ms | 891ms | 1430ms | 410ms | 72 | 1440 | |
| OpenAI | gpt-4o-mini | US West | 378ms | 936ms | 1518ms | 442ms | 86 | 1440 | |
| Anthropic | claude-3-5-sonnet | US East | 416ms | 1048ms | 1640ms | 492ms | 63 | 1440 | |
| Anthropic | claude-3-haiku | Europe | 536ms | 1280ms | 1984ms | 610ms | 94 | 1440 | |
| Google | gemini-1.5-pro | US East | 458ms | 1165ms | 1880ms | 535ms | 68 | 1440 | |
| Google | gemini-1.5-flash | Asia Pacific | 624ms | 1490ms | 2240ms | 705ms | 102 | 1440 | |
| DeepSeek | deepseek-chat | Singapore | 388ms | 990ms | 1570ms | 456ms | 78 | 1440 | |
| OpenRouter | router-best | Japan | 710ms | 1685ms | 2520ms | 804ms | 55 | 1440 | |
| Groq | llama-3.3-70b | US East | 302ms | 770ms | 1220ms | 360ms | 186 | 1440 | |
| Together AI | mixtral-8x7b | US West | 430ms | 1108ms | 1710ms | 511ms | 112 | 1440 | |
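Because every row carries data attributes, the table can be read with a standard-library HTML parser and no JavaScript. The sketch below is illustrative only. The page states that the attributes exist but not what they are called, so the names `data-provider`, `data-model`, `data-region`, and `data-collected` are assumptions. The sample markup is likewise hypothetical, with a placeholder in place of a real collection timestamp.

```python
from html.parser import HTMLParser

# Hypothetical markup: attribute names are assumed, and "TIMESTAMP" is a
# placeholder rather than a published collection time.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr data-provider="OpenAI" data-model="gpt-4o"
        data-region="US East" data-collected="TIMESTAMP">
      <td>OpenAI</td><td>gpt-4o</td><td>US East</td><td>342ms</td>
    </tr>
    <tr data-provider="Groq" data-model="llama-3.3-70b"
        data-region="US East" data-collected="TIMESTAMP">
      <td>Groq</td><td>llama-3.3-70b</td><td>US East</td><td>302ms</td>
    </tr>
  </tbody>
</table>
"""


class RowAttributeParser(HTMLParser):
    """Collect the data-* attributes of every <tr> that has any."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # Strip the "data-" prefix so keys read as provider, model, ...
        data = {
            name[len("data-"):]: value
            for name, value in attrs
            if name.startswith("data-")
        }
        if data:
            self.rows.append(data)


def parse_rows(html):
    """Return one dict of data attributes per table row."""
    parser = RowAttributeParser()
    parser.feed(html)
    return parser.rows


rows = parse_rows(SAMPLE_HTML)
for row in rows:
    print(row["provider"], row["model"], row["region"])
```

In practice the HTML would come from fetching the page (for example with curl) rather than from an inline string. The attribute names should be checked against the live markup before relying on them.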
How to cite this dataset
Cite the page URL, the dataset window, and the collection timestamp for the row. For example: "llmping measured OpenAI gpt-4o from US East at 342ms P50 in the May 1-12, 2026 benchmark window."
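A citation in this form can be assembled mechanically from a row. The following is a minimal sketch. The `BenchmarkRow` class and `format_citation` function are hypothetical helpers written for illustration and are not part of any llmping tooling. The values come from the first row of the table above.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRow:
    """Hypothetical container for the fields a citation needs."""
    provider: str
    model: str
    region: str
    p50_ms: int


def format_citation(row: BenchmarkRow, window: str) -> str:
    """Build a citation sentence in the style of the example above."""
    return (
        f"llmping measured {row.provider} {row.model} from {row.region} "
        f"at {row.p50_ms}ms P50 in the {window} benchmark window."
    )


row = BenchmarkRow("OpenAI", "gpt-4o", "US East", 342)
citation = format_citation(row, "May 1-12, 2026")
print(citation)
```

A full citation would also include the page URL and the row's collection timestamp, which are left out here because they are not given in the text above.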
The benchmark methodology is documented in "How to Measure LLM Latency Correctly".