AI API Latency Test
Measure model response time, first token latency, streaming behavior, and OpenAI-style compatibility. Built for relays, gateways, and compatible endpoints.
How it works
1) Send a GET request to /v1/models and time the response.
2) Send a non-stream chat completion and time the full reply.
3) Send a streaming chat completion and measure time to first token (TTFT) and total stream time, as sketched below.
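A minimal sketch of the three probes, written in TypeScript against the standard fetch API. The base URL, API key placeholder, model name, and prompt are illustrative assumptions; only the endpoint paths and the measurements come from the steps above.

// Minimal latency probe for an OpenAI-compatible endpoint.
// BASE_URL, API_KEY, and MODEL are placeholders, not the app's values.
const BASE_URL = "https://relay.example.com";
const API_KEY = "YOUR_API_KEY"; // used only for the live requests, never stored
const MODEL = "gpt-4o-mini";    // any model the endpoint serves

const HEADERS = {
  Authorization: `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

// Step 1: GET /v1/models, timed end to end.
async function timeModels(): Promise<number> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/models`, { headers: HEADERS });
  await res.json(); // include body download in the measurement
  return performance.now() - t0; // milliseconds
}

// Step 2: non-stream chat completion, timed end to end.
async function timeChat(): Promise<number> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: "Say hi in one word." }],
    }),
  });
  await res.json();
  return performance.now() - t0;
}

// Step 3: streaming chat completion. TTFT is taken at the first SSE
// chunk, a close proxy for the first token; total is the full stream.
async function timeStream(): Promise<{ ttftMs: number; totalMs: number }> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: "Say hi in one word." }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  let ttftMs = -1;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (ttftMs < 0 && value.length > 0) ttftMs = performance.now() - t0;
  }
  return { ttftMs, totalMs: performance.now() - t0 };
}

async function main() {
  console.log("/v1/models:", await timeModels(), "ms");
  console.log("chat (non-stream):", await timeChat(), "ms");
  console.log("chat (stream):", await timeStream());
}
main().catch(console.error);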
When to use it
Use this page when you have an OpenAI-compatible endpoint, a relay, or a proxy and want to know if it is actually fast enough for real chat usage. Compatibility alone is not enough. A slow endpoint can still pass the API shape test, so the latency metrics matter just as much as the status label.
How to read the result
PASS means the tested calls follow the expected OpenAI-style API shape.
The speed label (Fast, Usable, Slow, or Very slow) is derived from chat latency, TTFT, and streaming total time.
Time to first token matters most for chat UX: under 2 seconds is good, and 2-6 seconds is usable.
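As a rough illustration, the TTFT portion of that rating could be expressed as below. Only the under-2-seconds and 2-6-seconds thresholds are stated on this page; the cutoff between Slow and Very slow is an assumption, not the app's published rule.

// Hedged sketch: map TTFT to a speed label using the thresholds above.
type SpeedLabel = "Fast" | "Usable" | "Slow" | "Very slow";

function labelFromTtft(ttftMs: number): SpeedLabel {
  if (ttftMs < 2000) return "Fast";    // under 2 seconds: good
  if (ttftMs <= 6000) return "Usable"; // 2-6 seconds: usable
  if (ttftMs <= 15000) return "Slow";  // assumed cutoff, illustrative only
  return "Very slow";
}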
FAQ
What is an AI API latency test?
It checks how quickly an AI endpoint answers, how fast the first token appears, and whether streaming is supported.
What does TTFT mean?
TTFT is time to first token. It is the first streaming delay users feel in chat-like interfaces.
Does PASS mean the API is fast?
No. PASS means the tested API calls worked in an OpenAI-style way. The endpoint can still be slow.
Why do latency results change?
Network path, endpoint load, model choice, and relay quality can all change the numbers from one test to the next.
Can this detect a bad relay or proxy?
It can reveal slow or incompatible behavior, which is often enough to spot a shaky relay. It is a practical test, not a full audit.
Do you store my API key?
No. The key is only used for the live request and is not saved in the app.