Current latency table

Provider	Model	Region	P50	P95	P99	TTFT	Tokens/sec	Collected
OpenAI	gpt-4o-mini	US West	378ms	936ms	1518ms	442ms	86	May 12, 2026, 01:54 PM UTC
Together AI	mixtral-8x7b	US West	430ms	1108ms	1710ms	511ms	112	May 12, 2026, 01:51 PM UTC

Provider-by-provider breakdown

Best provider for US West by use case

Use case	Best provider/model	Reason
Real-time chat	OpenAI gpt-4o-mini	Lowest P50 (378ms) and lowest TTFT (442ms) in the US West sample.
Batch processing	Together AI mixtral-8x7b	Higher output speed (112 vs. 86 tokens/sec) can outweigh the slower TTFT (511ms vs. 442ms) on long generations.
Cost-sensitive routing	OpenAI mini-class models	Smaller models usually cost less per token and respond faster, so the two goals tend to align.
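
The batch-processing trade-off can be made concrete with a little arithmetic. The sketch below assumes a deliberately simple model that this page does not itself state: total response time is TTFT plus output length divided by tokens/sec, using the two table rows as fixed inputs. The function names are hypothetical, and real requests vary with prompt length, load, and network conditions.

```python
# Simplified model (an assumption, not the page's benchmark method):
#   total_time = TTFT + output_tokens / tokens_per_sec
# TTFT and tokens/sec are copied from the US West latency table.

def completion_time_s(ttft_ms: float, tokens_per_sec: float, output_tokens: int) -> float:
    """Estimated seconds until the last output token arrives."""
    return ttft_ms / 1000.0 + output_tokens / tokens_per_sec


def break_even_tokens(ttft_a_ms: float, tps_a: float,
                      ttft_b_ms: float, tps_b: float) -> float:
    """Output length at which B (slower TTFT, faster stream) catches up with A."""
    return ((ttft_b_ms - ttft_a_ms) / 1000.0) / (1.0 / tps_a - 1.0 / tps_b)


OPENAI = (442.0, 86.0)     # (TTFT ms, tokens/sec) for gpt-4o-mini
TOGETHER = (511.0, 112.0)  # (TTFT ms, tokens/sec) for mixtral-8x7b

crossover = break_even_tokens(*OPENAI, *TOGETHER)
print(f"break-even at about {crossover:.1f} output tokens")

for n in (10, 1000):
    a = completion_time_s(*OPENAI, n)
    b = completion_time_s(*TOGETHER, n)
    print(f"{n:>5} tokens: OpenAI {a:.2f}s, Together AI {b:.2f}s")
```

Under this model the crossover falls at roughly 26 output tokens: shorter replies finish sooner on the lower-TTFT row, while a 1,000-token generation finishes a few seconds sooner on the higher-throughput row.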

How US West developers can reduce latency

US West is a strong region for teams already deployed on west coast clouds. Cross-country hops add measurable latency, but the P99 values in the table above (1518ms and 1710ms) are still workable for interactive chat.

Do not route west coast user traffic through east coast application servers just to call an LLM API.
Cache system prompts and retrieval snippets near the worker or serverless region that performs the model call.
Track provider-specific status codes because network latency and rate limiting look similar in aggregate charts.
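
As a rough illustration of the last point, the sketch below splits latency samples by HTTP outcome before aggregating them, so that rate-limited calls (HTTP 429) do not blend into the same chart as ordinary network latency. The record shape, function names, and sample values are hypothetical and invented for the example; they are not measurements from this page.

```python
from collections import defaultdict
from statistics import median


def classify(status_code: int) -> str:
    """Map an HTTP status code to a coarse outcome bucket."""
    if status_code == 429:            # Too Many Requests: rate limiting
        return "rate_limited"
    if 500 <= status_code <= 599:     # provider-side failure
        return "server_error"
    if 200 <= status_code <= 299:
        return "ok"
    return "other"


def median_latency_by_outcome(samples):
    """Group (provider, status_code, latency_ms) records and take the median per group."""
    buckets = defaultdict(list)
    for provider, status_code, latency_ms in samples:
        buckets[(provider, classify(status_code))].append(latency_ms)
    return {key: median(values) for key, values in buckets.items()}


# Made-up records for illustration only.
samples = [
    ("provider-a", 200, 380.0),
    ("provider-a", 200, 420.0),
    ("provider-a", 429, 1500.0),
    ("provider-a", 429, 1700.0),
    ("provider-b", 200, 430.0),
]

for (provider, outcome), value in sorted(median_latency_by_outcome(samples).items()):
    print(f"{provider:<12} {outcome:<13} median {value:.0f}ms")
```

Keeping the outcome as part of the grouping key means a spike in one provider's rate-limited bucket shows up separately from a shift in its successful-request latency.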

Compare this page with the global leaderboard and the benchmark method in How to Measure LLM Latency Correctly.
