# LLM API Latency from US East - Real-time Benchmarks

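TL;DR: US East developers see 302-458ms median (P50) latency across the four providers in the current llmping benchmark snapshot. Best provider for US East right now: Groq (llama-3.3-70b), which has the lowest P50, P95, P99, and TTFT in the table below.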

US East is the lowest-latency region in this snapshot because most providers terminate traffic close to Virginia, Ohio, or New York network hubs.

## Current latency


| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Samples | Collected at |
|---|---|---|---:|---:|---:|---:|---:|---:|---|
| OpenAI | gpt-4o | US East | 342ms | 891ms | 1430ms | 410ms | 72 | 1440 | 2026-05-12T13:55:00Z |
| Anthropic | claude-3-5-sonnet | US East | 416ms | 1048ms | 1640ms | 492ms | 63 | 1440 | 2026-05-12T13:56:00Z |
| Google | gemini-1.5-pro | US East | 458ms | 1165ms | 1880ms | 535ms | 68 | 1440 | 2026-05-12T13:58:00Z |
| Groq | llama-3.3-70b | US East | 302ms | 770ms | 1220ms | 360ms | 186 | 1440 | 2026-05-12T13:53:00Z |
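
The P50, P95, and P99 columns are percentiles over each row's latency samples. The snapshot does not say which percentile method llmping uses, so the following is only a minimal sketch assuming the nearest-rank definition; `percentile` and `summarize` are illustrative names, not part of any benchmark tooling.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still maps to a sample.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the three percentiles reported in the table above."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With a small sample set the tail percentiles collapse onto the few slowest requests, so P99 from a handful of samples says little; the 1440 samples per row here give the tail columns more to work with.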


## Best provider for US East by use case

| Use case | Winner | Reason |
|---|---|---|
| Real-time chat | Groq llama-3.3-70b | Lowest P50 and highest output speed in this snapshot. |
| General product assistant | OpenAI gpt-4o | Balanced TTFT, tail latency, and broad model capability. |
| Long reasoning response | Anthropic claude-3-5-sonnet | Slower first token than OpenAI or Groq (492ms TTFT), but stable long-form throughput. |

## How US East developers can reduce latency

- Run the server-side code that calls the provider from a US East cloud region when most users are in North America and the provider has a US endpoint.
- Measure time to first token (TTFT) separately from total completion time, because streaming chat feels fast only when the first token arrives quickly.
- Keep retry budgets small for chat. A retry that starts only after the P95 latency has already elapsed often feels worse to the user than falling back gracefully to another model.
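
Measuring TTFT separately from total time can be sketched as follows. This is a minimal example that assumes the response arrives as any Python iterable of text chunks; `measure_stream` and `fake_stream` are hypothetical helpers, and `fake_stream` only simulates a provider with sleeps.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> dict:
    """Consume a stream and report TTFT and total time in milliseconds."""
    start = time.perf_counter()
    first_at = None
    count = 0
    for _chunk in chunks:
        if first_at is None:
            first_at = time.perf_counter()  # first chunk has arrived
        count += 1
    end = time.perf_counter()
    return {
        "ttft_ms": None if first_at is None else (first_at - start) * 1000,
        "total_ms": (end - start) * 1000,
        "chunks": count,
    }


def fake_stream(first_delay_s: float = 0.05, gap_s: float = 0.01, n: int = 5) -> Iterator[str]:
    """Stand-in for a streaming response: one initial wait, then steady chunks."""
    time.sleep(first_delay_s)
    for i in range(n):
        if i:
            time.sleep(gap_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream()))
```

In real use the iterable would be the provider SDK's streaming response, and the timer should start just before the request is sent so that network and queueing time count toward TTFT.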
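
The small-retry-budget advice can be illustrated with a time budget that switches to a fallback instead of retrying. This is a sketch under stated assumptions: `call_with_fallback`, `slow_primary`, and `fast_fallback` are hypothetical names, the two coroutines stand in for real provider calls, and the budget value is arbitrary rather than a recommendation.

```python
import asyncio


async def call_with_fallback(primary, fallback, budget_s):
    """Await primary(); if it takes longer than budget_s seconds,
    cancel it and return the result of fallback() instead."""
    try:
        return await asyncio.wait_for(primary(), timeout=budget_s)
    except asyncio.TimeoutError:
        return await fallback()


async def slow_primary():
    # Simulates a request stuck in the latency tail.
    await asyncio.sleep(0.2)
    return "primary answer"


async def fast_fallback():
    # Simulates a faster fallback model.
    return "fallback answer"


if __name__ == "__main__":
    result = asyncio.run(call_with_fallback(slow_primary, fast_fallback, budget_s=0.05))
    print(result)  # prints "fallback answer"
```

One way to pick the budget is to set it near the primary model's P95 from the table, so the fallback fires only for requests already in the tail; that choice is an assumption to validate against your own traffic.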
