Dataset - 2026-05-01 to 2026-05-12

LLM API Latency Benchmark - May 2026

This leaderboard is server-rendered as a native HTML table. It can be fetched with curl, parsed without JavaScript, and cited with collection timestamps. JSON and Markdown versions are linked directly from this page.

Temporal coverage: 2026-05-01/2026-05-12
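The coverage value uses ISO 8601 interval notation (start/end). Here is a minimal Python sketch for reading it. It assumes both endpoints are inclusive: the "May 1-12" wording in the citation example suggests this, but the page does not state it outright. `parse_coverage` is a hypothetical helper name.

```python
from datetime import date


def parse_coverage(interval: str) -> tuple[date, date, int]:
    """Split an ISO 8601 date interval such as '2026-05-01/2026-05-12'.

    Returns the start date, the end date, and the number of calendar days
    covered. Both endpoints are ASSUMED to be inclusive.
    """
    start_text, end_text = interval.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end < start:
        raise ValueError(f"interval ends before it starts: {interval}")
    return start, end, (end - start).days + 1


start, end, days = parse_coverage("2026-05-01/2026-05-12")
print(start, end, days)  # 2026-05-01 2026-05-12 12
```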

Generated at: May 12, 2026, 02:00 PM UTC

Benchmark rows: 10

In the HTML source, every table row carries data-* attributes for provider, model, region, and collection time.
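Because the table is plain server-rendered HTML, Python's standard-library `html.parser` can read it without a browser or JavaScript engine. The sketch below collects each row's `data-*` attributes and cell text. The `SAMPLE` markup and its attribute names (`data-provider`, `data-model`, `data-region`) are hypothetical. The page's actual attribute names are not given here, so the parser keeps whatever `data-*` keys it finds.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LeaderboardParser(HTMLParser):
    """Collect each <tr>'s data-* attributes and the text of its <th>/<td> cells."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict] = []
        self._row: dict | None = None        # row currently being read
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Keep only data-* attributes, with the "data-" prefix stripped.
            data = {name[len("data-"):]: value
                    for name, value in attrs if name.startswith("data-")}
            self._row = {"data": data, "cells": []}
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None and self._cell is not None:
            self._row["cells"].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup; the real page's attribute names and layout may differ.
SAMPLE = """
<table>
  <tr><th>Provider</th><th>Model</th><th>P50</th></tr>
  <tr data-provider="OpenAI" data-model="gpt-4o" data-region="us-east">
    <td>OpenAI</td><td>gpt-4o</td><td>342ms</td>
  </tr>
</table>
"""

parser = LeaderboardParser()
parser.feed(SAMPLE)
for row in parser.rows:
    print(row["data"], row["cells"])
```

In practice the HTML would come from `curl` or `urllib.request.urlopen` against the page URL, which this text does not include.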

P50, P95, P99, and TTFT (time to first token) values are in milliseconds. Output speed (Tokens/sec) is in tokens per second. Collection timestamps are in UTC.
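When consuming the table programmatically, these unit conventions come down to two small conversions. One strips the `ms` suffix from latency cells. The other inverts output speed to get an average time per generated token. Below is a minimal sketch; `parse_ms` and `ms_per_token` are hypothetical helper names, and the per-token figure assumes a steady generation rate.

```python
def parse_ms(cell: str) -> int:
    """Convert a latency cell such as '1430ms' or '1430' to integer milliseconds."""
    text = cell.strip()
    if text.endswith("ms"):
        text = text[:-2]
    return int(text)  # raises ValueError for non-numeric cells


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time per output token implied by a tokens/sec rate (assumes a steady rate)."""
    return 1000.0 / tokens_per_sec


print(parse_ms("1430ms"))          # 1430
print(parse_ms("342"))             # 342
print(round(ms_per_token(72), 1))  # 13.9
```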
| Provider    | Model             | Region       | P50 (ms) | P95 (ms) | P99 (ms) | TTFT (ms) | Tokens/sec | Samples | Collected (UTC) |
|-------------|-------------------|--------------|----------|----------|----------|-----------|------------|---------|-----------------|
| OpenAI      | gpt-4o            | US East      | 342      | 891      | 1430     | 410       | 72         | 1440    |                 |
| OpenAI      | gpt-4o-mini       | US West      | 378      | 936      | 1518     | 442       | 86         | 1440    |                 |
| Anthropic   | claude-3-5-sonnet | US East      | 416      | 1048     | 1640     | 492       | 63         | 1440    |                 |
| Anthropic   | claude-3-haiku    | Europe       | 536      | 1280     | 1984     | 610       | 94         | 1440    |                 |
| Google      | gemini-1.5-pro    | US East      | 458      | 1165     | 1880     | 535       | 68         | 1440    |                 |
| Google      | gemini-1.5-flash  | Asia Pacific | 624      | 1490     | 2240     | 705       | 102        | 1440    |                 |
| DeepSeek    | deepseek-chat     | Singapore    | 388      | 990      | 1570     | 456       | 78         | 1440    |                 |
| OpenRouter  | router-best       | Japan        | 710      | 1685     | 2520     | 804       | 55         | 1440    |                 |
| Groq        | llama-3.3-70b     | US East      | 302      | 770      | 1220     | 360       | 186        | 1440    |                 |
| Together AI | mixtral-8x7b      | US West      | 430      | 1108     | 1710     | 511       | 112        | 1440    |                 |
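To compare rows, the table can be loaded as typed records. The sketch below ranks rows by P50 and prints the P99/P50 ratio as one rough indicator of tail spread. That ratio is an illustrative choice, not a metric the benchmark itself reports. `Row` and `ROWS` are hypothetical names, and the Collected column is left out.

```python
from typing import NamedTuple


class Row(NamedTuple):
    provider: str
    model: str
    region: str
    p50_ms: int
    p95_ms: int
    p99_ms: int
    ttft_ms: int
    tokens_per_sec: int
    samples: int


# Values transcribed from the May 2026 table (Collected omitted).
ROWS = [
    Row("OpenAI", "gpt-4o", "US East", 342, 891, 1430, 410, 72, 1440),
    Row("OpenAI", "gpt-4o-mini", "US West", 378, 936, 1518, 442, 86, 1440),
    Row("Anthropic", "claude-3-5-sonnet", "US East", 416, 1048, 1640, 492, 63, 1440),
    Row("Anthropic", "claude-3-haiku", "Europe", 536, 1280, 1984, 610, 94, 1440),
    Row("Google", "gemini-1.5-pro", "US East", 458, 1165, 1880, 535, 68, 1440),
    Row("Google", "gemini-1.5-flash", "Asia Pacific", 624, 1490, 2240, 705, 102, 1440),
    Row("DeepSeek", "deepseek-chat", "Singapore", 388, 990, 1570, 456, 78, 1440),
    Row("OpenRouter", "router-best", "Japan", 710, 1685, 2520, 804, 55, 1440),
    Row("Groq", "llama-3.3-70b", "US East", 302, 770, 1220, 360, 186, 1440),
    Row("Together AI", "mixtral-8x7b", "US West", 430, 1108, 1710, 511, 112, 1440),
]

# Lowest median latency first; P99/P50 shows how far the tail stretches.
by_p50 = sorted(ROWS, key=lambda r: r.p50_ms)
for r in by_p50:
    print(f"{r.provider:<12}{r.model:<19}{r.region:<14}"
          f"P50={r.p50_ms}ms  P99/P50={r.p99_ms / r.p50_ms:.2f}")

fastest_output = max(ROWS, key=lambda r: r.tokens_per_sec)
print("Highest tokens/sec:", fastest_output.provider, fastest_output.model)
```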

How to cite this dataset

Cite the page URL, the dataset window, and the collection timestamp for the row. For example: "llmping measured OpenAI gpt-4o from US East at 342ms P50 in the May 1-12, 2026 benchmark window."
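A minimal sketch of a helper that builds citation sentences in the format of the example above. `cite_row` is a hypothetical name. The parenthesized layout it uses for the optional page URL and row timestamp is an assumption, not a format the page prescribes.

```python
from __future__ import annotations


def cite_row(provider: str, model: str, region: str, p50_ms: int,
             window: str = "May 1-12, 2026",
             page_url: str | None = None,
             collected_utc: str | None = None) -> str:
    """Build a citation sentence following the example wording.

    The optional "(URL; collected ... UTC)" suffix is an ASSUMED layout.
    """
    sentence = (f"llmping measured {provider} {model} from {region} "
                f"at {p50_ms}ms P50 in the {window} benchmark window.")
    extras = []
    if page_url:
        extras.append(page_url)
    if collected_utc:
        extras.append(f"collected {collected_utc} UTC")
    if extras:
        sentence += " (" + "; ".join(extras) + ")"
    return sentence


# Reproduces the example sentence from the citation guidance.
print(cite_row("OpenAI", "gpt-4o", "US East", 342))
```

Passing `page_url` and `collected_utc` attaches the URL and per-row timestamp that the guidance asks for.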

The benchmark methodology is documented in "How to Measure LLM Latency Correctly."