Region benchmark

LLM API Latency from US West

US West developers reach OpenAI, Anthropic, Google, and other LLM APIs. In the current snapshot, median (P50) latency across the measured providers ranges from 378ms to 430ms. Best provider for US West right now: OpenAI (gpt-4o-mini, 378ms P50).

Current latency

| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Collected |
|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o-mini | US West | 378ms | 936ms | 1518ms | 442ms | 86 | |
| Together AI | mixtral-8x7b | US West | 430ms | 1108ms | 1710ms | 511ms | 112 | |
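
For readers who want to reproduce the P50, P95, and P99 columns from their own measurements, the following is a minimal sketch. It assumes you already have a list of per-request latencies in milliseconds. The function name `summarize_latency` and the sample data are hypothetical, and the interpolation method used here may differ from the one behind this page's numbers.

```python
import statistics


def summarize_latency(samples_ms):
    """Return P50, P95, and P99 for a list of latency samples (milliseconds)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points; percentile k sits at index k - 1.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical samples for illustration only, not data from the table above.
example = summarize_latency(list(range(1, 102)))
print(example)
```

With only a handful of samples the P99 value is dominated by one or two requests, so tail percentiles are most meaningful over larger collections.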

Provider-by-provider breakdown

Best provider for US West by use case

| Use case | Best provider/model | Reason |
|---|---|---|
| Real-time chat | OpenAI gpt-4o-mini | Lowest P50 in the US West sample and strong token speed. |
| Batch processing | Together AI mixtral-8x7b | Higher output speed can beat slightly slower TTFT for long jobs. |
| Cost-sensitive routing | OpenAI mini-class models | Lower latency and smaller model cost usually align. |
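
The batch-processing recommendation can be checked with rough arithmetic. The sketch below assumes a simplified model in which total response time is TTFT plus output tokens divided by tokens per second, with a steady decode rate. It uses the TTFT and tokens/sec figures from the latency table. The helper names `total_seconds` and `break_even_tokens` are hypothetical, and the model ignores differences in output length and quality between the two models.

```python
def total_seconds(ttft_ms, tokens_per_sec, output_tokens):
    """Estimated time to finish a response: first-token wait plus streaming time."""
    return ttft_ms / 1000 + output_tokens / tokens_per_sec


def break_even_tokens(fast_start, fast_stream):
    """Output length at which the faster-streaming provider catches up.

    Each argument is a (ttft_ms, tokens_per_sec) pair. fast_start has the
    lower TTFT; fast_stream has the higher tokens/sec.
    """
    ttft_a, tps_a = fast_start
    ttft_b, tps_b = fast_stream
    if tps_b <= tps_a:
        raise ValueError("fast_stream must have higher tokens/sec than fast_start")
    return ((ttft_b - ttft_a) / 1000) / (1 / tps_a - 1 / tps_b)


# (TTFT in ms, tokens/sec) taken from the US West latency table.
openai_mini = (442, 86)
together_mixtral = (511, 112)

crossover = break_even_tokens(openai_mini, together_mixtral)
print(f"break-even at about {crossover:.1f} output tokens")
print(f"1000 tokens, OpenAI:   {total_seconds(*openai_mini, 1000):.2f}s")
print(f"1000 tokens, Together: {total_seconds(*together_mixtral, 1000):.2f}s")
```

Under these assumptions the crossover falls at a few dozen output tokens, so the TTFT advantage matters mainly for short replies, while long generations favor the higher tokens/sec figure.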

How US West developers can reduce latency

US West works well for teams deployed in west coast cloud regions. Cross-country hops add measurable latency, but the tail in this snapshot (P95 of 936-1108ms, P99 of 1518-1710ms) is still suitable for interactive chat.

  • Do not route west coast user traffic through east coast application servers just to call an LLM API.
  • Cache system prompts and retrieval snippets near the worker or serverless region that performs the model call.
  • Track provider-specific status codes because network latency and rate limiting look similar in aggregate charts.
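
As one way to act on the last point, the sketch below separates rate limiting from slow but successful responses before they are aggregated. It assumes each request is logged as a `(provider, status_code, latency_ms)` tuple and that rate limits are signalled with HTTP 429, which is the standard "Too Many Requests" status. Individual providers may use additional codes or headers. The function names, the category labels, and the 1000ms threshold are hypothetical.

```python
from collections import Counter


def classify(status_code, latency_ms, slow_threshold_ms=1000):
    """Label one request so throttling and slow networks are counted separately."""
    if status_code == 429:
        return "rate_limited"
    if 500 <= status_code < 600:
        return "server_error"
    if 200 <= status_code < 300:
        return "slow_success" if latency_ms > slow_threshold_ms else "ok"
    return "other"


def tally(events):
    """Count outcomes per provider from (provider, status_code, latency_ms) tuples."""
    counts = Counter()
    for provider, status_code, latency_ms in events:
        counts[(provider, classify(status_code, latency_ms))] += 1
    return counts


# Hypothetical log entries for illustration only.
events = [
    ("openai", 200, 380),
    ("openai", 429, 1200),
    ("together", 200, 1500),
    ("together", 503, 900),
]
summary = tally(events)
for key, count in sorted(summary.items()):
    print(key, count)
```

Charting `rate_limited` and `slow_success` as separate series makes it easier to tell whether a latency spike calls for a routing change or a quota increase.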

Compare this page with the global leaderboard, and see the benchmark method in "How to Measure LLM Latency Correctly."