Dataset - 2026-05-01 to 2026-05-12

LLM API Latency Benchmark - May 2026

Name: LLM API Latency Benchmark - May 2026
Creator: llmping

This leaderboard is server-rendered as a native HTML table. It can be fetched with curl, parsed without JavaScript, and cited with collection timestamps. JSON and markdown versions are linked directly from this page.
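To illustrate parsing without JavaScript, here is a minimal sketch that uses Python's standard-library `html.parser` to collect the data attributes from each table row. The markup is a hypothetical inline sample that stands in for a page fetched with curl. The attribute names (`data-provider`, `data-model`, `data-region`, `data-collected`) and the ISO timestamp format are assumptions, because the page does not document its exact schema.

```python
from html.parser import HTMLParser

# Hypothetical markup: the data-* attribute names and the ISO timestamp format
# are assumptions, not the page's documented schema.
SAMPLE_HTML = """
<table>
  <tr><th>Provider</th><th>Model</th><th>Region</th><th>P50</th></tr>
  <tr data-provider="OpenAI" data-model="gpt-4o" data-region="US East"
      data-collected="2026-05-12T13:55:00Z">
    <td>OpenAI</td><td>gpt-4o</td><td>US East</td><td>342ms</td>
  </tr>
  <tr data-provider="Groq" data-model="llama-3.3-70b" data-region="US East"
      data-collected="2026-05-12T13:53:00Z">
    <td>Groq</td><td>llama-3.3-70b</td><td>US East</td><td>302ms</td>
  </tr>
</table>
"""


class RowAttributeParser(HTMLParser):
    """Collects the data-* attributes of every <tr> element that has any."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        # Keep only data-* attributes and strip the "data-" prefix from each name.
        data = {name[5:]: value or "" for name, value in attrs if name.startswith("data-")}
        if data:  # the header row has no data attributes and is skipped
            self.rows.append(data)


def parse_rows(html: str) -> list[dict[str, str]]:
    parser = RowAttributeParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_HTML):
        print(row["provider"], row["model"], row["region"], row["collected"])
```

Reading the attributes instead of the visible cells means the parser does not depend on column order or on the "ms" suffixes in the rendered text.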

Historical snapshot: these measurements cover May 1–12, 2026. Use them to compare providers and models within that collection window, not as a current model ranking or a live provider benchmark.

Downloads: JSON, Markdown

Temporal coverage: 2026-05-01/2026-05-12

Generated at: May 12, 2026, 02:00 PM UTC

Benchmark rows

Every row includes data attributes for provider, model, region, and collection time.

P50, P95, P99, and TTFT (time to first token) are in milliseconds. Output speed (Tokens/sec) is in tokens per second. Collection timestamps are UTC.
Provider	Model	Region	P50	P95	P99	TTFT	Tokens/sec	Samples	Collected
OpenAI	gpt-4o	US East	342ms	891ms	1430ms	410ms	72	1440	May 12, 2026, 01:55 PM UTC
OpenAI	gpt-4o-mini	US West	378ms	936ms	1518ms	442ms	86	1440	May 12, 2026, 01:54 PM UTC
Anthropic	claude-3-5-sonnet	US East	416ms	1048ms	1640ms	492ms	63	1440	May 12, 2026, 01:56 PM UTC
Anthropic	claude-3-haiku	Europe	536ms	1280ms	1984ms	610ms	94	1440	May 12, 2026, 01:57 PM UTC
Google	gemini-1.5-pro	US East	458ms	1165ms	1880ms	535ms	68	1440	May 12, 2026, 01:58 PM UTC
Google	gemini-1.5-flash	Asia Pacific	624ms	1490ms	2240ms	705ms	102	1440	May 12, 2026, 01:59 PM UTC
DeepSeek	deepseek-chat	Singapore	388ms	990ms	1570ms	456ms	78	1440	May 12, 2026, 02:00 PM UTC
OpenRouter	router-best	Japan	710ms	1685ms	2520ms	804ms	55	1440	May 12, 2026, 01:52 PM UTC
Groq	llama-3.3-70b	US East	302ms	770ms	1220ms	360ms	186	1440	May 12, 2026, 01:53 PM UTC
Together AI	mixtral-8x7b	US West	430ms	1108ms	1710ms	511ms	112	1440	May 12, 2026, 01:51 PM UTC
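The sketch below shows how the units above map to typed values. It parses rows in the tab-separated text rendering, converts "342ms" cells to integer milliseconds, reads the UTC collection timestamp, and checks each row against the 2026-05-01/2026-05-12 coverage window. It is only an illustration that assumes the tab-separated layout shown here. The `Row` class and helper names are hypothetical, and the JSON version would likely be easier to consume, but its field names are not documented on this page.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone

# Two rows copied from the table above, tab-separated as in the text rendering.
ROWS = """\
OpenAI\tgpt-4o\tUS East\t342ms\t891ms\t1430ms\t410ms\t72\t1440\tMay 12, 2026, 01:55 PM UTC
Groq\tllama-3.3-70b\tUS East\t302ms\t770ms\t1220ms\t360ms\t186\t1440\tMay 12, 2026, 01:53 PM UTC
"""

# Coverage window from the page's temporal coverage field (both ends inclusive).
WINDOW_START, WINDOW_END = (date.fromisoformat(d) for d in "2026-05-01/2026-05-12".split("/"))


@dataclass
class Row:  # hypothetical record type for one benchmark row
    provider: str
    model: str
    region: str
    p50_ms: int
    p95_ms: int
    p99_ms: int
    ttft_ms: int
    tokens_per_sec: int
    samples: int
    collected: datetime


def ms(value: str) -> int:
    """Convert a latency cell such as '342ms' to an integer number of milliseconds."""
    return int(value.removesuffix("ms"))


def parse_row(line: str) -> Row:
    provider, model, region, p50, p95, p99, ttft, tps, samples, collected = line.split("\t")
    # Timestamps look like 'May 12, 2026, 01:55 PM UTC'; the page states they are UTC.
    ts = datetime.strptime(collected.removesuffix(" UTC"), "%b %d, %Y, %I:%M %p")
    return Row(provider, model, region, ms(p50), ms(p95), ms(p99), ms(ttft),
               int(tps), int(samples), ts.replace(tzinfo=timezone.utc))


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

# Sort by median latency and report whether each row falls inside the window.
for row in sorted(rows, key=lambda r: r.p50_ms):
    in_window = WINDOW_START <= row.collected.date() <= WINDOW_END
    print(f"{row.provider} {row.model}: P50 {row.p50_ms} ms, in window: {in_window}")
```

Keeping latencies as integers and timestamps as timezone-aware values makes comparisons across rows unambiguous. Note that `str.removesuffix` requires Python 3.9 or later.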

How to cite this dataset

Cite the page URL, the dataset window, and the collection timestamp for the row. For example: "llmping measured OpenAI gpt-4o from US East at 342ms P50 (collected May 12, 2026, 01:55 PM UTC) in the May 1-12, 2026 benchmark window."
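If you generate citations programmatically, a small helper can fill the recommended pattern from a row's values. This is a hypothetical sketch. The `cite` function is not part of any llmping tooling, and the function includes the row's collection timestamp because the guidance above asks for it. The page URL is left for the caller to add, since it is not given here.

```python
def cite(provider: str, model: str, region: str, p50_ms: int, collected: str) -> str:
    """Build a citation sentence following the suggested pattern (hypothetical helper)."""
    return (
        f"llmping measured {provider} {model} from {region} at {p50_ms}ms P50 "
        f"(collected {collected}) in the May 1-12, 2026 benchmark window."
    )


print(cite("OpenAI", "gpt-4o", "US East", 342, "May 12, 2026, 01:55 PM UTC"))
```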

The benchmark methodology is documented in How to Measure LLM Latency Correctly.