# LLM API Latency Benchmark - May 2026

TL;DR: llmping publishes timestamped LLM API latency measurements. Each row records the provider, model, region, P50/P95/P99 latency, time to first token (TTFT), tokens per second, sample count, and collection time.

Dataset window: 2026-05-01/2026-05-12
Generated at: 2026-05-12T14:00:00Z
JSON download: https://llmping.com/data/latency-benchmark.json

| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Samples | Collected at |
|---|---|---|---:|---:|---:|---:|---:|---:|---|
| OpenAI | gpt-4o | US East | 342ms | 891ms | 1430ms | 410ms | 72 | 1440 | 2026-05-12T13:55:00Z |
| OpenAI | gpt-4o-mini | US West | 378ms | 936ms | 1518ms | 442ms | 86 | 1440 | 2026-05-12T13:54:00Z |
| Anthropic | claude-3-5-sonnet | US East | 416ms | 1048ms | 1640ms | 492ms | 63 | 1440 | 2026-05-12T13:56:00Z |
| Anthropic | claude-3-haiku | Europe | 536ms | 1280ms | 1984ms | 610ms | 94 | 1440 | 2026-05-12T13:57:00Z |
| Google | gemini-1.5-pro | US East | 458ms | 1165ms | 1880ms | 535ms | 68 | 1440 | 2026-05-12T13:58:00Z |
| Google | gemini-1.5-flash | Asia Pacific | 624ms | 1490ms | 2240ms | 705ms | 102 | 1440 | 2026-05-12T13:59:00Z |
| DeepSeek | deepseek-chat | Singapore | 388ms | 990ms | 1570ms | 456ms | 78 | 1440 | 2026-05-12T14:00:00Z |
| OpenRouter | router-best | Japan | 710ms | 1685ms | 2520ms | 804ms | 55 | 1440 | 2026-05-12T13:52:00Z |
| Groq | llama-3.3-70b | US East | 302ms | 770ms | 1220ms | 360ms | 186 | 1440 | 2026-05-12T13:53:00Z |
| Together AI | mixtral-8x7b | US West | 430ms | 1108ms | 1710ms | 511ms | 112 | 1440 | 2026-05-12T13:51:00Z |

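The rows above can be loaded into typed records for sorting or filtering. The following is a minimal sketch that parses the Markdown table as published on this page. The names `LatencyRow`, `parse_ms`, and `parse_row` are hypothetical helpers introduced for illustration. The schema of the JSON download is not described here, so this sketch deliberately does not assume any JSON field names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LatencyRow:
    """One benchmark row; field names are illustrative, not an official schema."""
    provider: str
    model: str
    region: str
    p50_ms: int
    p95_ms: int
    p99_ms: int
    ttft_ms: int
    tokens_per_sec: int
    samples: int
    collected_at: datetime


def parse_ms(cell: str) -> int:
    """Convert a cell such as '342ms' to an integer number of milliseconds."""
    return int(cell.removesuffix("ms"))


def parse_row(line: str) -> LatencyRow:
    """Parse one Markdown table body row into a LatencyRow."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 10:
        raise ValueError(f"expected 10 cells, got {len(cells)}")
    provider, model, region, p50, p95, p99, ttft, tps, samples, collected = cells
    return LatencyRow(
        provider, model, region,
        parse_ms(p50), parse_ms(p95), parse_ms(p99), parse_ms(ttft),
        int(tps), int(samples),
        # Timestamps end in 'Z' (UTC); normalize so older Pythons accept them.
        datetime.fromisoformat(collected.replace("Z", "+00:00")),
    )


# Two rows copied verbatim from the table above.
SAMPLE = [
    "| OpenAI | gpt-4o | US East | 342ms | 891ms | 1430ms | 410ms | 72 | 1440 | 2026-05-12T13:55:00Z |",
    "| Groq | llama-3.3-70b | US East | 302ms | 770ms | 1220ms | 360ms | 186 | 1440 | 2026-05-12T13:53:00Z |",
]

rows = [parse_row(line) for line in SAMPLE]
fastest = min(rows, key=lambda r: r.p50_ms)
print(fastest.provider, fastest.model, fastest.p50_ms)
```

Storing latencies as integer milliseconds and timestamps as timezone-aware `datetime` values keeps comparisons across rows straightforward. `str.removesuffix` requires Python 3.9 or later.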