AI Reviewer Agent
A two-pass LQA agent that diagnoses translation issues and surgically polishes the output using advanced AI models.
How it works
The AI Reviewer runs a two-pass LQA (Language Quality Assessment) process on every reviewed translation item:
- Diagnose — a fast model scans the source and translated text for specific issue types, assigns a quality score from 0 to 100, and returns a structured list of problems with suggested fixes.
- Polish — if issues are found, the full model applies the suggested fixes surgically, leaving all template variables, HTML tags, and ICU tokens untouched.
When the quality score is ≥ 95 the polish pass is skipped entirely — no second AI call, no extra cost.
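The control flow above can be sketched as follows. The diagnose and polish helpers here are hypothetical stand-ins for the two model calls, which the CLI does not expose directly:

```typescript
// Hypothetical sketch of the two-pass LQA flow; diagnose() and polish()
// stand in for the real model calls, which are not part of the public API.
interface Issue {
  originalSegment: string;
  issueType: string;
  explanation: string;
  suggestedFix: string;
}

interface Diagnosis {
  qualityScore: number; // 0 to 100
  issues: Issue[];
}

function review(
  diagnose: (src: string, dst: string) => Diagnosis,
  polish: (dst: string, issues: Issue[]) => string,
  source: string,
  translated: string
): string {
  const d = diagnose(source, translated);
  // Score >= 95 or no issues found: skip the polish pass entirely (no second call).
  if (d.qualityScore >= 95 || d.issues.length === 0) return translated;
  return polish(translated, d.issues);
}
```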
Configuration
Add the reviewer fields to your existing buildtranslator.json. Below is a complete example with all standard fields and the reviewer enabled:
{
"sourceLanguage": "en",
"targetLanguages": ["it", "fr", "de"],
"localesPath": "./locales",
"context": "SaaS web application for project management",
"tone": "Professional but friendly",
"aiReviewer": true,
"aiReviewerExclude": ["de"]
}
- aiReviewer — set to true to enable the agent globally for all target languages.
- aiReviewerExclude — optional array of locale codes to skip, using the same format as targetLanguages (e.g. "de", "it"). Useful when a language already has a dedicated human reviewer or when the target market is low-priority.
The reviewer reuses the context and tone fields you already set for translation — no extra configuration is needed for those.
Running the reviewer
There are two ways to invoke the AI Reviewer. They differ in scope: one reviews only what was just translated, the other reviews your entire translated output.
As part of a translation run — delta-scoped
npx @polycli/cli run --review
The --review flag chains the review step immediately after translation completes. Only the strings translated in the current run are reviewed — i.e. the same delta used by the translation phase. If three keys changed and were translated, only those three keys are sent to the reviewer. Strings that were already correct and untouched in this run are never re-reviewed.
This is the recommended mode for CI/CD pipelines: cost and time scale with the size of your commit, not the size of your entire locale file. Setting aiReviewer: true in your config is equivalent to passing --review on every run.
Standalone command — full review
npx @polycli/cli review
Runs the reviewer over all previously translated strings that exceed the 15-word threshold, regardless of whether they changed recently. No translation pass is triggered. Use this for:
- A one-time quality audit of translations produced by a third party or an older run.
- After changing tone or context in your config, to re-assess the full corpus under the new criteria.
- Periodic full-corpus QA outside the normal CI/CD flow.
Credit cost
Credits are charged only when the AI actually rewrites a string — i.e. when the diagnose pass finds actionable issues and the polish pass produces a different text. Diagnosis-only runs (no issues found, or score ≥ 95) always cost zero.
cost = 3 × words(translated text) — only when polishing occurs
cost = 0 — when diagnosis finds no issues
cost = 0 — when quality score is ≥ 95
For example, a 40-word string that needs fixing costs 120 credits (40 × 3). A 40-word string that is already correct costs 0 credits. The 3× multiplier covers the diagnose, polish, and re-score passes, and is charged only when all three are needed.
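The pricing rules amount to a small cost function. A sketch, where wordsOf is a naive whitespace word counter that may differ from the billing engine's own tokenisation:

```typescript
// Naive whitespace word count; the billing engine's tokenisation may differ.
function wordsOf(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Credits charged for one reviewed string, per the rules above.
function reviewCost(translated: string, qualityScore: number, issueCount: number): number {
  if (qualityScore >= 95 || issueCount === 0) return 0; // diagnosis-only: free
  return 3 * wordsOf(translated); // diagnose + polish + re-score
}
```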
Markdown files
For Markdown, billing is based on the affected segments only, not the entire file. After diagnosis, the word count of each originalSegment returned in the issues list is summed, and the cost is 3 × words(affected segments). Paragraphs with no issues are never billed.
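Under these rules, the Markdown cost can be computed from the issues array alone. A sketch, again assuming a naive whitespace word counter:

```typescript
// Naive whitespace word count; the billing engine's tokenisation may differ.
function wordsOf(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Markdown billing: only the flagged segments are counted, then the 3x multiplier applies.
function markdownReviewCost(issues: { originalSegment: string }[]): number {
  const affectedWords = issues.reduce((sum, i) => sum + wordsOf(i.originalSegment), 0);
  return 3 * affectedWords;
}
```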
A balance check for the worst-case cost happens before any AI call. If your balance would be insufficient, the request returns 402 Payment Required and no credits are consumed.
Content filtering
Not every string is worth sending to the reviewer. The agent applies the following thresholds:
- JSON / ARB strings — only strings with more than 15 words are reviewed. Short UI labels such as button text or single-word values are skipped automatically.
- Markdown files — entire files are passed to the reviewer with no word-count threshold applied.
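The filtering rule can be sketched as a predicate (assuming a naive whitespace word count, which may differ slightly from the CLI's own counting):

```typescript
// Naive whitespace word count; the CLI's own counting may differ slightly.
function wordsOf(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// JSON/ARB strings must exceed 15 words to be reviewed; Markdown always qualifies.
function shouldReview(text: string, sourceType: "json" | "arb" | "markdown"): boolean {
  if (sourceType === "markdown") return true;
  return wordsOf(text) > 15;
}
```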
Issue types
The diagnose pass classifies each detected problem into one of the following issue types:
- false_friend — a word looks similar to a source-language word but carries a different meaning in the target language.
- context_error — the translation is grammatically correct but wrong given the surrounding UI or domain context.
- unnatural_phrasing — the text reads like a literal translation rather than natural target-language prose.
- register_mismatch — the formality level (e.g. formal vs. informal address) differs from the rest of the product.
- omission — part of the source content is missing from the translation.
- other — any other quality issue that does not fit the categories above.
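If you consume the response in TypeScript, the classification maps naturally onto a union type. These definitions are illustrative and not shipped by the CLI:

```typescript
// Illustrative types for the review response; not exported by the CLI itself.
type IssueType =
  | "false_friend"
  | "context_error"
  | "unnatural_phrasing"
  | "register_mismatch"
  | "omission"
  | "other";

interface ReviewIssue {
  originalSegment: string;
  issueType: IssueType;
  explanation: string;
  suggestedFix: string;
}

// All documented issue types, e.g. for validating API responses.
const ISSUE_TYPES: IssueType[] = [
  "false_friend",
  "context_error",
  "unnatural_phrasing",
  "register_mismatch",
  "omission",
  "other",
];
```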
API endpoint
The CLI communicates with the following endpoint. You can also call it directly if you want to integrate the reviewer into your own pipeline.
POST /api/translate/review
Authentication: pass your API key in the x-api-key header.
Request body
{
"originalText": "Welcome back! You have {count} unread messages.",
"translatedText": "Willkommen zurück! Du hast {count} ungelesene Nachrichten.",
"sourceLang": "English",
"targetLang": "German",
"sourceType": "json",
"context": "SaaS web application for project management",
"tone": "Professional but friendly"
}
Response body
{
"qualityScore": 72,
"issues": [
{
"originalSegment": "Du hast {count} ungelesene Nachrichten.",
"issueType": "register_mismatch",
"explanation": "Informal 'Du' used; product uses formal 'Sie' throughout.",
"suggestedFix": "Sie haben {count} ungelesene Nachrichten."
}
],
"polishedText": "Willkommen zurück! Sie haben {count} ungelesene Nachrichten.",
"wordsConsumed": 18
}
When no issues are found, issues is an empty array, qualityScore is ≥ 95, and polishedText equals the original translatedText.
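A minimal direct-integration sketch. The base URL is a placeholder for your API host and buildReviewRequest is a hypothetical helper; the path, x-api-key header, and body fields follow the documentation above:

```typescript
// Hypothetical helper that assembles the request; the path, header, and
// body shape follow the documented endpoint.
function buildReviewRequest(apiKey: string, payload: object): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: { "content-type": "application/json", "x-api-key": apiKey },
    body: JSON.stringify(payload),
  };
}

// Calls POST /api/translate/review; baseUrl is a placeholder for your API host.
async function reviewString(baseUrl: string, apiKey: string, payload: object) {
  const res = await fetch(`${baseUrl}/api/translate/review`, buildReviewRequest(apiKey, payload));
  if (res.status === 402) throw new Error("Insufficient credits for worst-case cost (402)");
  if (!res.ok) throw new Error(`Review failed: ${res.status}`);
  return res.json(); // { qualityScore, issues, polishedText, wordsConsumed }
}
```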
Frequently asked questions
How can I measure the quality of my translations?
Every review returns a quality score from 0 to 100. The score is assigned by an AI model based on faithfulness to the original, naturalness of phrasing, register consistency, and completeness. A score above 90 indicates a near-perfect translation. Your PolyCLI dashboard shows the average quality score across all reviewed items, giving you a measurable, objective view of your localisation quality over time.
Does the reviewer change my existing translation files?
Yes — when issues are found, the polished text overwrites the draft translation directly in your output files (e.g. locales/it.json). The original source file is never modified.
Can I exclude a language from review?
Yes. Add the locale code to aiReviewerExclude in your buildtranslator.json. The reviewer will skip those languages entirely and log them as skipped in the CLI output.
Why are short strings skipped for JSON / ARB files?
Short strings like button labels or single words rarely have nuanced translation issues, and reviewing them would consume credits with negligible quality benefit. The 15-word threshold is fixed and applies to the translated string length.
What happens if my credits run out mid-review?
The credit check happens before running the review. If your balance is too low for a particular string, that string returns a 402 error and the CLI stops with a clear message. Strings reviewed before the error have already been written to disk.