AI API Trust Layer, not another cheap proxy

Compare, benchmark, and verify AI APIs before production.

youtoken.cc helps developers test any OpenAI-compatible endpoint with their own key, compare model output in a blind arena, and publish reproducible reports. We do not store BYOK secrets. We do not call verification absolute truth. We show evidence.


BYOK: Benchmark without saving provider API keys.

Elo: Blind Arena voting updates transparent rankings.

Risk score: Verifier reports probability, evidence, and limits.

Reports: Shareable snapshots include time, node, model, and params.

live preview / phase 1 static prototype

Arena prompt

Model A, anonymous (streaming)

Route by policy, not by one metric. Cheap models handle low-risk utility calls; verified channels handle user-facing or regulated output; latency caps protect interactive flows.

Model B, anonymous (complete)

Cost is only useful after confidence is known. The system should record provider behavior, compare error modes, then pick the lowest-cost route that clears the trust threshold.

#1 Claude Sonnet 4.5: Elo 1548 (+18)

#2 GPT-4.1 mini: Elo 1514 (+7)

#3 DeepSeek V3.2: Elo 1496 (-4)

#4 Gemini 2.5 Flash: Elo 1479 (+3)

Three trust tools first. API resale later.

The MVP earns confidence before charging for routing. Each tool produces evidence users can inspect, share, and challenge.

P0 / Arena

Blind model comparison

Two models answer the same prompt anonymously. Users vote on usefulness, then the system reveals names and updates Elo.

Latest match

gpt-4.1-mini beat deepseek-chat with 61% of the preference across 93 votes.

Match pool

P0 / Perf Tester

BYOK endpoint benchmark

Paste a Base URL and temporary key for the current browser session. Reports include TTFB, tokens/sec, total latency, node, and params.

Base URL / API Key, never stored

TTFB 318ms / 46 tok/s
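The numbers on this card can be derived from three timestamps and a token count. Below is a minimal sketch of that arithmetic, not the benchmark proxy's actual code. The function name `summarize_run`, the `PerfResult` type, and the choice to measure tokens/sec over the generation window (first byte to last byte) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfResult:
    ttfb_ms: float           # time to first byte, in milliseconds
    total_latency_ms: float  # request sent to last byte, in milliseconds
    tokens_per_sec: float    # completion tokens over the generation window


def summarize_run(request_sent_s: float, first_byte_s: float,
                  last_byte_s: float, completion_tokens: int) -> PerfResult:
    """Turn raw timestamps (seconds) into the metrics shown in a perf report."""
    if not (request_sent_s <= first_byte_s <= last_byte_s):
        raise ValueError("timestamps must be in chronological order")
    if completion_tokens < 0:
        raise ValueError("completion_tokens must be non-negative")

    ttfb_ms = (first_byte_s - request_sent_s) * 1000
    total_ms = (last_byte_s - request_sent_s) * 1000

    # Assumption: throughput excludes the wait before the first byte.
    generation_s = last_byte_s - first_byte_s
    tps = completion_tokens / generation_s if generation_s > 0 else 0.0

    return PerfResult(round(ttfb_ms, 1), round(total_ms, 1), round(tps, 1))


# Made-up timestamps: first byte after 0.318 s, 92 tokens streamed over 2 s.
print(summarize_run(0.0, 0.318, 2.318, 92))
```

Measuring throughput from the first byte keeps a slow connection setup from being counted twice, once in TTFB and again in tokens/sec. A report that divides by total latency instead would show a lower figure for the same run, which is why the definition belongs in the report metadata.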

P0 / Verifier

Provider claim verification

Run knowledge cutoff checks, response pattern analysis, optional logprob fingerprinting, and output a probability score with caveats.

Provider claim / Target model

Confidence 74%, medium risk
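One way to turn several checks into a single confidence figure with caveats is a weighted average that skips checks that did not run. The sketch below is illustrative only: the weights, the risk-band thresholds, and the names `WEIGHTS` and `combine_checks` are all hypothetical, and the real Verifier may score evidence differently.

```python
from typing import Optional

# Hypothetical weights; a real methodology page would have to justify them.
WEIGHTS = {
    "knowledge_cutoff": 0.4,
    "response_pattern": 0.3,
    "logprob_fingerprint": 0.3,  # optional check, often unavailable
}


def combine_checks(checks: dict[str, Optional[float]]) -> dict:
    """Combine per-check scores in [0, 1] into a confidence estimate.

    A value of None (or a missing key) means the check did not run. It is
    excluded from the average and listed under "limits" instead.
    """
    unknown = set(checks) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown checks: {sorted(unknown)}")

    available = {k: v for k, v in checks.items() if v is not None}
    if not available:
        raise ValueError("at least one check must have a score")

    total_weight = sum(WEIGHTS[k] for k in available)
    confidence = sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight

    # Hypothetical bands: higher confidence in the claim means lower risk.
    if confidence >= 0.85:
        risk = "low"
    elif confidence >= 0.60:
        risk = "medium"
    else:
        risk = "high"

    limits = [f"{name} not run" for name in WEIGHTS if checks.get(name) is None]
    return {"confidence": round(confidence, 2), "risk": risk, "limits": limits}


print(combine_checks({
    "knowledge_cutoff": 0.8,
    "response_pattern": 0.6,
    "logprob_fingerprint": None,  # provider does not expose logprobs
}))
```

Listing skipped checks next to the score keeps the output honest: the same number means less when it rests on two probes instead of three.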

Public reports are the growth loop.

Every useful test becomes a shareable artifact with enough metadata to reproduce or dispute the result.

perf/kr-seoul/gpt-4.1-mini/2026-05-21 (public)
Node: Korea VPS, cold start: no, concurrency: 1, prompt hash: 8f4c1a

verify/provider-alpha/gpt-4-claim (medium risk)
Evidence: model list, cutoff questions, style fingerprint, user appeal open

arena/code-generation/week-21 (ranked)
2,418 votes, session/IP limits enabled, Elo K=32, new model K=64

Report Service

Reproducible by default

Reports include model, endpoint class, timestamp, geographic node, parameters, tool version, limitations, and appeal status. No raw API keys. No private prompts in public reports unless the user explicitly publishes them.
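The field list above maps naturally onto a record type. The following sketch shows one possible shape, assuming hypothetical names (`Report`, `to_public_json`) and example values. It has no API key field at all, so a key cannot leak through serialization, and it drops the prompt unless the user has opted in.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Report:
    model: str
    endpoint_class: str       # redacted class, not the raw Base URL
    timestamp: str            # ISO 8601, UTC
    node: str                 # geographic node that ran the test
    params: dict
    tool_version: str
    limitations: list
    appeal_status: str
    prompt: Optional[str] = None   # private unless explicitly published
    publish_prompt: bool = False   # user opt-in flag


def to_public_json(report: Report) -> str:
    """Serialize a report for public sharing, dropping the private prompt."""
    data = asdict(report)
    publish = data.pop("publish_prompt")
    if not publish:
        data.pop("prompt")
    return json.dumps(data, sort_keys=True)


# All values below are illustrative placeholders.
report = Report(
    model="gpt-4.1-mini",
    endpoint_class="openai-compatible",
    timestamp="2026-05-21T00:00:00Z",
    node="kr-seoul",
    params={"temperature": 0, "max_tokens": 256},
    tool_version="0.1.0",
    limitations=["single run", "concurrency 1"],
    appeal_status="none",
    prompt="private benchmark prompt",
)
print(to_public_json(report))
```

Making the prompt opt-in at the serialization boundary means a forgotten flag fails closed: the default output is the public-safe one.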

Commercial API routing stays behind the compliance gate: legal entity, payment path, privacy terms, abuse limits, and upstream cost model must exist first.

Trust products must explain themselves.

The methodology page is part of the product, not documentation nobody reads. It tells users where the data is strong and where it is not.

Elo Arena: Expected score = 1 / (1 + 10 ^ ((opponentRating - modelRating) / 400)). Regular K=32. New-model K=64 for the first 10 matches. Ties split the score.

BYOK Perf: The browser sends a temporary key to the benchmark proxy for the current run only. The result stores timing, token counts, node, model, params, and a redacted endpoint class.

Verifier: Reports combine connection checks, knowledge cutoff probes, response pattern analysis, and optional logprob fingerprinting. Output is a confidence estimate, not a legal finding.

Appeals and retests: Providers can request retest, submit context, and attach public remediation notes. Paid promotion never changes independent report scoring.
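The Elo Arena rule in the methodology above is small enough to sketch in full. This is an illustration, not the production rating code. The expected-score formula, K=32, K=64, the 10-match window, and the tie split come from the methodology; the `Rating` type, the function names, and the choice to give each model its own K-factor within a match are assumptions.

```python
from dataclasses import dataclass

REGULAR_K = 32
NEW_MODEL_K = 64
NEW_MODEL_MATCHES = 10  # the higher K applies to a model's first 10 matches


@dataclass(frozen=True)
class Rating:
    rating: float
    matches_played: int = 0


def expected_score(model_rating: float, opponent_rating: float) -> float:
    """Expected score = 1 / (1 + 10 ^ ((opponentRating - modelRating) / 400))."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - model_rating) / 400))


def k_factor(matches_played: int) -> int:
    return NEW_MODEL_K if matches_played < NEW_MODEL_MATCHES else REGULAR_K


def update(a: Rating, b: Rating, score_a: float) -> tuple[Rating, Rating]:
    """Apply one match result. score_a is 1.0 (A wins), 0.0 (B wins), 0.5 (tie)."""
    if score_a not in (0.0, 0.5, 1.0):
        raise ValueError("score_a must be 0.0, 0.5, or 1.0")
    score_b = 1.0 - score_a
    # Assumption: each side moves by its own K, so a new model can gain
    # more points than its established opponent loses.
    new_a = Rating(
        a.rating + k_factor(a.matches_played) * (score_a - expected_score(a.rating, b.rating)),
        a.matches_played + 1,
    )
    new_b = Rating(
        b.rating + k_factor(b.matches_played) * (score_b - expected_score(b.rating, a.rating)),
        b.matches_played + 1,
    )
    return new_a, new_b


# A brand-new model beats an established one at equal ratings.
newcomer, veteran = update(Rating(1500.0, 0), Rating(1500.0, 40), 1.0)
print(newcomer.rating, veteran.rating)  # 1532.0 1484.0
```

With equal ratings the expected score is 0.5 for both sides, so a win moves the newcomer by 64 × 0.5 = 32 points and the veteran by 32 × 0.5 = 16. A tie at equal ratings changes nothing.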

Status, risk, and operations are first-class.

Developers will not trust a trust platform that hides incidents. The status board becomes the public operating ledger.

99.93%: Gateway adapter uptime, last 30 days

418ms: Median Korea node TTFB across public tests

0 keys: BYOK secrets persisted after benchmark runs

12: Open provider appeals waiting for retest

Implementation path

The static page ships the story now. The product grows only where trust and compliance are ready.

Phase 1

Static trust prototype

Replace the old identity positioning with the AI Trust Layer landing page, tools mock, methodology, status board, and waitlist.

Phase 2

Working tools

Build Arena, BYOK Perf, Verifier, public reports, limits, and provider appeals without payments.

Phase 3

Growth back office

Launch tools directory, code snippets, TrustMark workflow, audit log, SEO report pages, and moderation.

Phase 4

Commercial API

Only after entity, payment, compliance, cost model, abuse ceilings, and observability are verified.

Join the trust layer before it becomes a gateway.

Early users get Arena voting, BYOK benchmarking, verifier reports, and provider retest workflows first. Paid routing comes later, after the boring parts are real: entity, tax, privacy, audit logs, risk controls, payment rails.

hello@youtoken.cc