Region benchmark
LLM API Latency from US East
From US East, developers reach OpenAI, Anthropic, Google, and other LLM APIs with a median (P50) latency of 302-458ms in the current snapshot. Best provider for US East right now: Groq, which has the lowest P50 at 302ms.
Current latency
| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Collected |
|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o | US East | 342ms | 891ms | 1430ms | 410ms | 72 | |
| Anthropic | claude-3-5-sonnet | US East | 416ms | 1048ms | 1640ms | 492ms | 63 | |
| Google | gemini-1.5-pro | US East | 458ms | 1165ms | 1880ms | 535ms | 68 | |
| Groq | llama-3.3-70b | US East | 302ms | 770ms | 1220ms | 360ms | 186 | |
Provider-by-provider breakdown
Best provider for US East by use case
| Use case | Best provider/model | Reason |
|---|---|---|
| Real-time chat | Groq llama-3.3-70b | Lowest P50 and highest output speed in this snapshot. |
| General product assistant | OpenAI gpt-4o | Balanced TTFT, tail latency, and broad model capability. |
| Long reasoning response | Anthropic claude-3-5-sonnet | Slightly slower first token, but stable long-form throughput. |
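The latency-based picks above can be checked against the snapshot with a few lines of Python. This is a minimal sketch: the rows are copied from the latency table on this page, while the `SNAPSHOT` list and the `lowest` and `highest` helpers are illustrative names, not part of any benchmark tooling. The Google provider label for gemini-1.5-pro is inferred from the model name.

```python
# Rows copied from the latency table on this page.
# Latencies are in milliseconds; "tps" is output tokens per second.
SNAPSHOT = [
    {"provider": "OpenAI", "model": "gpt-4o", "p50": 342, "p95": 891, "p99": 1430, "ttft": 410, "tps": 72},
    {"provider": "Anthropic", "model": "claude-3-5-sonnet", "p50": 416, "p95": 1048, "p99": 1640, "ttft": 492, "tps": 63},
    {"provider": "Google", "model": "gemini-1.5-pro", "p50": 458, "p95": 1165, "p99": 1880, "ttft": 535, "tps": 68},
    {"provider": "Groq", "model": "llama-3.3-70b", "p50": 302, "p95": 770, "p99": 1220, "ttft": 360, "tps": 186},
]


def lowest(metric: str) -> dict:
    """Return the row with the smallest value for a latency metric."""
    return min(SNAPSHOT, key=lambda row: row[metric])


def highest(metric: str) -> dict:
    """Return the row with the largest value for a throughput metric."""
    return max(SNAPSHOT, key=lambda row: row[metric])


if __name__ == "__main__":
    print("Lowest P50:", lowest("p50")["provider"], lowest("p50")["model"])
    print("Highest tokens/sec:", highest("tps")["provider"], highest("tps")["model"])
```

Raw latency only settles the real-time chat row. The other two recommendations also weigh model capability and long-form behavior, which the numbers in this snapshot do not capture.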
How US East developers can reduce latency
US East is the lowest-latency region in this snapshot because most providers terminate traffic close to Virginia, Ohio, or New York network hubs.
- Host server-side inference callers in us-east when most users are in North America and the provider has a US endpoint.
- Measure TTFT separately from total completion time because streaming chat feels fast only when the first token arrives quickly.
- Keep retry budgets small for chat. A retry that starts after the P95 latency has already elapsed often feels worse than a graceful fallback to another model.
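The last two points can be sketched in Python without tying the example to any provider SDK. Everything here is an assumption for illustration: `measure_stream` takes any iterable of text chunks (for example, a streaming response adapted to yield strings), and `call_with_budget` takes two hypothetical zero-argument callables for the primary and fallback models. The `clock` parameter exists only so the timing can be tested deterministically.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Time a streamed response: TTFT and total completion time, in seconds."""
    start = clock()
    first_token_at: Optional[float] = None
    pieces = []
    for chunk in chunks:
        if first_token_at is None and chunk:
            first_token_at = clock()  # first non-empty chunk marks TTFT
        pieces.append(chunk)
    end = clock()
    return {
        "ttft_s": None if first_token_at is None else first_token_at - start,
        "total_s": end - start,
        "text": "".join(pieces),
    }


def call_with_budget(primary: Callable[[], str],
                     fallback: Callable[[], str],
                     budget_s: float,
                     max_retries: int = 1,
                     clock: Callable[[], float] = time.perf_counter) -> str:
    """Retry the primary call only while elapsed time stays inside the budget.

    Once the budget is spent (or retries run out), use the fallback instead
    of starting another slow attempt.
    """
    start = clock()
    for _ in range(max_retries + 1):
        try:
            return primary()
        except Exception:  # real code should catch the provider's specific errors
            if clock() - start >= budget_s:
                break  # too late to retry; a fallback will feel faster
    return fallback()
```

One reasonable starting point is to set `budget_s` near the primary model's P95 from the table above (for example, about 0.9 seconds for gpt-4o), so a retry is attempted only when the first failure came back quickly. The right value depends on the product and should be tuned against measured data.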
Compare this page with the global leaderboard and the benchmark method in How to Measure LLM Latency Correctly.