Region benchmark
LLM API Latency from US West
US West developers reach OpenAI, Anthropic, Google, and other LLM APIs; in the current snapshot, median (P50) latency for the sampled models ranges from 378ms to 430ms. Best provider for US West right now: OpenAI.
Current latency
| Provider | Model | Region | P50 | P95 | P99 | TTFT | Tokens/sec | Collected |
|---|---|---|---|---|---|---|---|---|
| OpenAI | gpt-4o-mini | US West | 378ms | 936ms | 1518ms | 442ms | 86 | |
| Together AI | mixtral-8x7b | US West | 430ms | 1108ms | 1710ms | 511ms | 112 | |
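For readers who want to reproduce columns like P50, P95, and P99 from their own measurements, the sketch below uses the nearest-rank percentile method. This is an assumption about one reasonable approach, not a description of how this page's numbers were computed, and the sample values are synthetic placeholders rather than benchmark data.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

# Synthetic request latencies in milliseconds (illustrative only, not measured data).
latencies_ms = [350, 360, 378, 400, 420, 900, 1500]

for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)}ms")
```

With small sample counts the P95 and P99 values collapse onto the slowest request, so tail percentiles are only meaningful once many requests have been collected.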
Provider-by-provider breakdown
Best provider for US West by use case
| Use case | Best provider/model | Reason |
|---|---|---|
| Real-time chat | OpenAI gpt-4o-mini | Lowest P50 (378ms) and TTFT (442ms) in the US West sample, with adequate token speed (86 tokens/sec). |
| Batch processing | Together AI mixtral-8x7b | Higher output speed (112 vs. 86 tokens/sec) can outweigh the slower TTFT (511ms vs. 442ms) on long jobs. |
| Cost-sensitive routing | OpenAI mini-class models | Smaller models tend to be both cheaper and faster, so cost and latency goals usually align. |
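The batch-processing trade-off can be made concrete with a rough estimate. The sketch below assumes a simplified model in which total generation time is TTFT plus output tokens divided by tokens/sec, using the TTFT and Tokens/sec values from the latency table. The function names are hypothetical, and the model ignores queueing, retries, and run-to-run variance.

```python
def total_seconds(ttft_ms, tokens_per_sec, output_tokens):
    """Estimated time to stream a full response under the simplified model."""
    return ttft_ms / 1000 + output_tokens / tokens_per_sec

def break_even_tokens(fast_start_ttft_ms, fast_start_tps, fast_stream_ttft_ms, fast_stream_tps):
    """Output length at which the higher-throughput model catches up.

    Assumes the second model has the slower TTFT and the higher tokens/sec.
    """
    ttft_gap_s = (fast_stream_ttft_ms - fast_start_ttft_ms) / 1000
    per_token_gain_s = 1 / fast_start_tps - 1 / fast_stream_tps
    return ttft_gap_s / per_token_gain_s

# Values taken from the US West table: (TTFT in ms, tokens/sec).
openai_mini = (442, 86)
together_mixtral = (511, 112)

crossover = break_even_tokens(*openai_mini, *together_mixtral)
print(f"Break-even output length: about {crossover:.0f} tokens")

for n in (10, 200, 2000):
    a = total_seconds(*openai_mini, n)
    b = total_seconds(*together_mixtral, n)
    print(f"{n} tokens: gpt-4o-mini {a:.2f}s, mixtral-8x7b {b:.2f}s")
```

Under these assumptions the crossover falls at roughly 26 output tokens, so the TTFT advantage matters mainly for very short replies, while longer outputs favor the higher tokens/sec figure.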
How US West developers can reduce latency
US West performs well for teams deployed in west coast cloud regions. Cross-country hops add measurable latency, but the tail (P99 of 1518-1710ms in this snapshot) is still workable for interactive chat.
- Do not route west coast user traffic through east coast application servers just to call an LLM API.
- Cache system prompts and retrieval snippets near the worker or serverless region that performs the model call.
- Track provider-specific status codes because network latency and rate limiting look similar in aggregate charts.
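As a minimal sketch of the last point, the snippet below splits latency samples by response class before aggregating, so rate-limited requests do not inflate the latency numbers for successful ones. It assumes each sample is a `(provider, status_code, latency_ms)` tuple and treats HTTP 429 as the rate-limit signal. The function names and sample values are hypothetical, and individual providers may use additional codes that are not covered here.

```python
from collections import defaultdict
from statistics import median

def classify(status_code):
    """Map an HTTP status code to a coarse bucket."""
    if status_code == 429:  # Too Many Requests
        return "rate_limited"
    if 200 <= status_code < 300:
        return "ok"
    if 500 <= status_code < 600:
        return "server_error"
    return "other"

def summarize(samples):
    """Group samples by (provider, bucket) and report count and median latency."""
    buckets = defaultdict(list)
    for provider, status_code, latency_ms in samples:
        buckets[(provider, classify(status_code))].append(latency_ms)
    return {
        key: {"count": len(values), "median_ms": median(values)}
        for key, values in buckets.items()
    }

# Hypothetical samples for illustration only.
samples = [
    ("provider-a", 200, 380),
    ("provider-a", 200, 400),
    ("provider-a", 429, 1500),
    ("provider-b", 503, 2100),
]

for key, stats in sorted(summarize(samples).items()):
    print(key, stats)
```

Charting the `ok` bucket separately from `rate_limited` and `server_error` makes it easier to tell whether a spike comes from the network path or from provider-side throttling.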
Compare this page with the global leaderboard and the benchmark method in How to Measure LLM Latency Correctly.