Region benchmark

LLM API Latency from US East

Developers in US East reach OpenAI, Anthropic, Google, and other LLM APIs with a median (P50) latency of 302-458ms in the current snapshot. Best provider for US East right now: Groq, which has the lowest P50, P95, P99, and TTFT in the table below.

Current latency

Provider   Model              Region   P50    P95     P99     TTFT   Tokens/sec  Collected
OpenAI     gpt-4o             US East  342ms  891ms   1430ms  410ms  72
Anthropic  claude-3-5-sonnet  US East  416ms  1048ms  1640ms  492ms  63
Google     gemini-1.5-pro     US East  458ms  1165ms  1880ms  535ms  68
Groq       llama-3.3-70b      US East  302ms  770ms   1220ms  360ms  186
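
To make the P50, P95, and P99 columns concrete, here is a minimal sketch of how such percentiles could be derived from raw per-request latencies. It assumes the nearest-rank method, which this page does not confirm as the benchmark's actual method, and the sample values and the names percentile and summarize are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms):
    """Return the three percentile columns used in the table above."""
    return {name: percentile(latencies_ms, pct)
            for name, pct in (("p50", 50), ("p95", 95), ("p99", 99))}


# Hypothetical samples for illustration only, not data from this benchmark.
samples_ms = [310, 295, 342, 880, 330, 301, 1400, 325, 318, 299]
print(summarize(samples_ms))
```

With only ten samples, P95 and P99 both land on the single slowest request, which is why tail percentiles need many more samples than the median before they mean much.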

Provider-by-provider breakdown

Best provider for US East by use case

Use case                   Best provider/model          Reason
Real-time chat             Groq llama-3.3-70b           Lowest P50 and highest output speed in this snapshot.
General product assistant  OpenAI gpt-4o                Balanced TTFT, tail latency, and broad model capability.
Long reasoning response    Anthropic claude-3-5-sonnet  Slightly slower first token, but stable long-form throughput.

How US East developers can reduce latency

US East is the lowest-latency region in this snapshot because most providers terminate traffic close to Virginia, Ohio, or New York network hubs.

  • Host server-side inference callers in us-east when most users are in North America and the provider has a US endpoint.
  • Measure TTFT separately from total completion time because streaming chat feels fast only when the first token arrives quickly.
  • Keep retry budgets small for chat. A retry that starts only after the first attempt has exceeded P95 often feels worse than falling back gracefully to a faster model.
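
The last two points can be sketched together: time the first token separately from the full completion, and switch to a fallback stream when the first token misses a small budget instead of retrying the same model. This is a rough illustration rather than any provider's SDK; stream_with_budget, fake_stream, and the delay values are hypothetical stand-ins for real streaming clients.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List, Tuple

# Hypothetical type: a function that opens a streaming request and yields text chunks.
StreamFactory = Callable[[], AsyncIterator[str]]


async def stream_with_budget(
    primary: StreamFactory,
    fallback: StreamFactory,
    first_token_budget_s: float,
) -> Tuple[str, str, float, float]:
    """Return (source, text, ttft_ms, total_ms).

    If the primary stream yields no first token within the budget, switch to
    the fallback instead of retrying the primary. Empty streams are not handled.
    """
    start = time.perf_counter()
    source = "primary"
    stream = primary()
    try:
        first = await asyncio.wait_for(stream.__anext__(), first_token_budget_s)
    except asyncio.TimeoutError:
        source = "fallback"
        stream = fallback()
        first = await stream.__anext__()  # no second budget: fallback is the last resort
    # TTFT as the user experiences it, including any time lost waiting on the primary.
    ttft_ms = (time.perf_counter() - start) * 1000
    parts: List[str] = [first]
    async for chunk in stream:
        parts.append(chunk)
    total_ms = (time.perf_counter() - start) * 1000  # full completion time
    return source, "".join(parts), ttft_ms, total_ms


async def fake_stream(first_delay_s: float, tokens: List[str]) -> AsyncIterator[str]:
    """Stand-in for a provider stream; only the first token is delayed."""
    await asyncio.sleep(first_delay_s)
    for tok in tokens:
        yield tok


async def main():
    return await stream_with_budget(
        primary=lambda: fake_stream(0.5, ["slow", " answer"]),
        fallback=lambda: fake_stream(0.01, ["fast", " answer"]),
        first_token_budget_s=0.05,
    )


source, text, ttft_ms, total_ms = asyncio.run(main())
print(source, repr(text), round(ttft_ms), round(total_ms))
```

In this sketch the reported TTFT includes the budget spent on the primary, so the cost of a slow first attempt stays visible in the metric rather than being hidden by the switch.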

Compare this page with the global leaderboard and the benchmark method in How to Measure LLM Latency Correctly.