AI API Latency Test
Measure model response time, first token latency, streaming behavior, and OpenAI-style compatibility. Built for relays, gateways, and compatible endpoints.
How it works
1) Send a GET request to /v1/models and time the response.
2) Send a non-stream chat completion and time the full reply.
3) Send a streaming chat completion and measure time to first token (TTFT) and total stream time, as sketched below.
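A minimal sketch of the three probes, written in TypeScript against the standard fetch API. The base URL, API key placeholder, model name, and prompt are illustrative assumptions; only the endpoint paths and the measurements come from the steps above.

// Minimal latency probe for an OpenAI-compatible endpoint.
// BASE_URL, API_KEY, and MODEL are placeholders, not the app's values.
const BASE_URL = "https://relay.example.com";
const API_KEY = "YOUR_API_KEY"; // used only for the live requests, never stored
const MODEL = "gpt-4o-mini";    // any model the endpoint serves

const HEADERS = {
  Authorization: `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

// Step 1: GET /v1/models, timed end to end.
async function timeModels(): Promise<number> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/models`, { headers: HEADERS });
  await res.json(); // include body download in the measurement
  return performance.now() - t0; // milliseconds
}

// Step 2: non-stream chat completion, timed end to end.
async function timeChat(): Promise<number> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: "Say hi in one word." }],
    }),
  });
  await res.json();
  return performance.now() - t0;
}

// Step 3: streaming chat completion. TTFT is taken at the first SSE
// chunk, a close proxy for the first token; total is the full stream.
async function timeStream(): Promise<{ ttftMs: number; totalMs: number }> {
  const t0 = performance.now();
  const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: "Say hi in one word." }],
      stream: true,
    }),
  });
  const reader = res.body!.getReader();
  let ttftMs = -1;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (ttftMs < 0 && value.length > 0) ttftMs = performance.now() - t0;
  }
  return { ttftMs, totalMs: performance.now() - t0 };
}

async function main() {
  console.log("/v1/models:", await timeModels(), "ms");
  console.log("chat (non-stream):", await timeChat(), "ms");
  console.log("chat (stream):", await timeStream());
}
main().catch(console.error);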
When to use it
Use this page when you have an OpenAI-compatible endpoint, a relay, or a proxy and want to know if it is actually fast enough for real chat usage. Compatibility alone is not enough. A slow endpoint can still pass the API shape test, so the latency metrics matter just as much as the status label.
How to read the result
PASS means the tested calls follow the expected OpenAI-style API shape.
The speed label (Fast, Usable, Slow, or Very slow) is derived from chat latency, TTFT, and streaming total time.
Time to first token matters most for chat UX: under 2 seconds is good, and 2-6 seconds is usable.
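As a rough illustration, the TTFT portion of that rating could be expressed as below. Only the under-2-seconds and 2-6-seconds thresholds are stated on this page; the cutoff between Slow and Very slow is an assumption, not the app's published rule.

// Hedged sketch: map TTFT to a speed label using the thresholds above.
type SpeedLabel = "Fast" | "Usable" | "Slow" | "Very slow";

function labelFromTtft(ttftMs: number): SpeedLabel {
  if (ttftMs < 2000) return "Fast";    // under 2 seconds: good
  if (ttftMs <= 6000) return "Usable"; // 2-6 seconds: usable
  if (ttftMs <= 15000) return "Slow";  // assumed cutoff, illustrative only
  return "Very slow";
}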
FAQ
What is an AI API latency test?
It checks how quickly an AI endpoint answers, how fast the first token appears, and whether streaming is supported.
What does TTFT mean?
TTFT is time to first token. It is the first streaming delay users feel in chat-like interfaces.
Does PASS mean the API is fast?
No. PASS means the tested API calls worked in an OpenAI-style way. The endpoint can still be slow.
Why do latency results change?
Network path, endpoint load, model choice, and relay quality can all change the numbers from one test to the next.
Can this detect a bad relay or proxy?
It can reveal slow or incompatible behavior, which is often enough to spot a shaky relay. It is a practical test, not a full audit.
Do you store my API key?
No. The key is only used for the live request and is not saved in the app.