AI Reviewer Agent

A two-pass LQA agent that uses AI models to diagnose translation issues and apply targeted fixes to the translated output.

How it works

The AI Reviewer runs a two-pass LQA (Language Quality Assessment) process on every reviewed translation item:

  1. Diagnose — a fast model scans the source and translated text for specific issue types, assigns a quality score from 0 to 100, and returns a structured list of problems with suggested fixes.
  2. Polish — if issues are found, the full model applies the suggested fixes surgically, preserving all template variables, HTML tags, and ICU tokens untouched.

When the quality score is ≥ 95 the polish pass is skipped entirely — no second AI call, no extra cost.
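
The control flow described above can be sketched in a few lines of Python. This is an illustration only, not PolyCLI's actual implementation: `diagnose` and `polish` are hypothetical callables standing in for the fast-model and full-model calls, and the `Diagnosis` shape is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

SKIP_POLISH_SCORE = 95  # from the text above: scores >= 95 skip the polish pass


@dataclass
class Diagnosis:
    """Assumed shape of the diagnose-pass result (hypothetical)."""
    quality_score: int                       # 0 to 100
    issues: List[dict] = field(default_factory=list)


def review_item(
    source: str,
    translated: str,
    diagnose: Callable[[str, str], Diagnosis],     # hypothetical fast-model call
    polish: Callable[[str, str, List[dict]], str], # hypothetical full-model call
) -> Tuple[str, bool]:
    """Return (final_text, was_polished) for one translation item."""
    diagnosis = diagnose(source, translated)
    # Pass 2 only runs when the score is below the threshold and issues exist.
    if diagnosis.quality_score >= SKIP_POLISH_SCORE or not diagnosis.issues:
        return translated, False
    return polish(source, translated, diagnosis.issues), True
```

With this shape, a high-scoring item never reaches `polish`, which mirrors the "no second AI call" behaviour described above.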

Configuration

Add the reviewer fields to your existing buildtranslator.json. Below is a complete example with all standard fields and the reviewer enabled:

buildtranslator.json
{
  "sourceLanguage": "en",
  "targetLanguages": ["it", "fr", "de"],
  "localesPath": "./locales",
  "context": "SaaS web application for project management",
  "tone": "Professional but friendly",
  "aiReviewer": true,
  "aiReviewerExclude": ["de"]
}
  • aiReviewer — set to true to enable the agent globally for all target languages.
  • aiReviewerExclude — optional array of locale codes to skip, using the same format as targetLanguages (e.g. "de", "it"). Useful when a language already has a dedicated human reviewer or when the target market is low-priority.

The reviewer reuses the context and tone fields you already set for translation, so neither needs any extra configuration.
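
To make the interaction between the two fields concrete, here is a minimal sketch of how the set of reviewed languages could be derived from the config. The function name `languages_to_review` is hypothetical, and treating a missing `aiReviewer` field as disabled is an assumption, not documented behaviour.

```python
import json
from typing import List

# A trimmed copy of the example config above.
CONFIG_TEXT = """
{
  "sourceLanguage": "en",
  "targetLanguages": ["it", "fr", "de"],
  "aiReviewer": true,
  "aiReviewerExclude": ["de"]
}
"""


def languages_to_review(config: dict) -> List[str]:
    """Target languages minus the excluded locale codes (hypothetical helper)."""
    # Assumption: a missing aiReviewer field means the reviewer is off.
    if not config.get("aiReviewer", False):
        return []
    excluded = set(config.get("aiReviewerExclude", []))
    return [lang for lang in config["targetLanguages"] if lang not in excluded]


config = json.loads(CONFIG_TEXT)
print(languages_to_review(config))  # ['it', 'fr']
```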

Running the reviewer

There are two ways to invoke the AI Reviewer. They differ in scope: one reviews only what was just translated, the other reviews your entire translated output.

As part of a translation run — delta-scoped

bash
npx @polycli/cli run --review

The --review flag chains the review step immediately after translation completes. Only the strings that were translated in the current run are reviewed — i.e. the same delta used by the translation phase. If three keys changed and were translated, only those three keys are sent to the reviewer. Strings that were already correct and untouched in this run are never re-reviewed.

This is the recommended mode for CI/CD pipelines: cost and time scale with the size of your commit, not the size of your entire locale file. aiReviewer: true in your config is equivalent to passing --review on every run.

When a new target language is added for the first time, its entire source is translated (no prior lock entry exists). In that case the reviewer also covers all translated strings for that language, since everything was freshly produced.
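
The delta idea can be illustrated with a small sketch. Everything about the lock here is an assumption made for the example: the docs only say that a lock entry exists per language, so the format used below (key mapped to a hash of the source text) and the helper names are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Stable hash of a source string (assumed lock format, for illustration)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def delta_keys(source: Dict[str, str], lock: Dict[str, str]) -> List[str]:
    """Keys that are new or changed relative to the lock for one language."""
    return [key for key, text in source.items() if lock.get(key) != fingerprint(text)]


source = {"greeting": "Welcome back!", "farewell": "See you soon."}

# Existing language: only "farewell" changed since the last run.
lock_it = {"greeting": fingerprint("Welcome back!"), "farewell": fingerprint("Bye.")}
# Newly added language: no prior lock entry, so everything is in the delta.
lock_pt: Dict[str, str] = {}

print(delta_keys(source, lock_it))  # ['farewell']
print(delta_keys(source, lock_pt))  # ['greeting', 'farewell']
```

In delta-scoped mode, the keys returned for a language are the ones that get translated and then reviewed.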

Standalone command — full review

bash
npx @polycli/cli review

Runs the reviewer over all previously translated strings that pass the content filter (for JSON / ARB, strings longer than 15 words), regardless of whether they changed recently. No translation pass is triggered. Use this for:

  • A one-time quality audit of translations produced by a third party or an older run.
  • After changing tone or context in your config and wanting to re-assess the full corpus under the new criteria.
  • Periodic full-corpus QA outside the normal CI/CD flow.
A full standalone review on a large locale file can consume significantly more credits than a delta-scoped run. Check your estimated word count before running it across many languages.

Credit cost

Credits are charged only when the AI actually rewrites a string — i.e. when the diagnose pass finds actionable issues and the polish pass produces a different text. Diagnosis-only runs (no issues found, or score ≥ 95) always cost zero.

text
cost = 3 × words(translated text)   — only when polishing occurs
cost = 0                            — when diagnosis finds no issues
cost = 0                            — when quality score is ≥ 95

For example, a 40-word string that needs fixing costs 120 credits (40 × 3), while a 40-word string that is already correct costs 0 credits. The 3× multiplier covers the diagnosis, polish, and re-score passes, and it is charged only when polishing occurs.
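
The rules above can be expressed as a small function. This is a sketch for estimating costs on your side, not the billing code itself. In particular, counting words by splitting on whitespace is an assumption; the service may tokenise differently (for example around placeholders).

```python
from typing import List

CREDITS_PER_WORD = 3     # the 3x multiplier from the formula above
SKIP_POLISH_SCORE = 95   # scores >= 95 never trigger the polish pass


def count_words(text: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(text.split())


def review_cost(translated: str, polished: str, quality_score: int, issues: List[dict]) -> int:
    """Credits charged for one reviewed string under the stated rules."""
    if quality_score >= SKIP_POLISH_SCORE or not issues:
        return 0  # diagnosis-only: nothing to fix
    if polished == translated:
        return 0  # polish produced no change, so nothing was rewritten
    return CREDITS_PER_WORD * count_words(translated)


forty_words = " ".join(["word"] * 40)
print(review_cost(forty_words, forty_words + "!", 72, [{"issueType": "other"}]))  # 120
print(review_cost(forty_words, forty_words, 98, []))                              # 0
```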

Markdown files

For Markdown, billing is based on the affected segments only, not the entire file. After diagnosis, the word count of each originalSegment returned in the issues list is summed, and the cost is 3 × words(affected segments). Paragraphs with no issues are never billed.

Before any AI call, your balance is checked against the worst-case cost of the request. If the balance is insufficient, the request returns 402 Payment Required and no credits are consumed.
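
As a rough illustration of the Markdown billing rule and the up-front balance check, consider the sketch below. Two things are assumptions: words are counted by whitespace splitting, and the worst case is taken to be 3 × the word count of the whole translated text (the docs do not spell out how the worst case is computed).

```python
from typing import List

CREDITS_PER_WORD = 3


def count_words(text: str) -> int:
    # Assumption: words are whitespace-separated tokens.
    return len(text.split())


def markdown_cost(issues: List[dict]) -> int:
    """3 x the summed word count of each issue's originalSegment."""
    return CREDITS_PER_WORD * sum(count_words(i["originalSegment"]) for i in issues)


def worst_case_cost(translated_text: str) -> int:
    # Assumption: the worst case is that the entire text needs polishing.
    return CREDITS_PER_WORD * count_words(translated_text)


def can_start_review(balance: int, translated_text: str) -> bool:
    """Mirror of the pre-flight check: False corresponds to a 402 response."""
    return balance >= worst_case_cost(translated_text)


issues = [
    {"originalSegment": "one two three four five"},
    {"originalSegment": "six seven eight"},
]
print(markdown_cost(issues))  # 24
```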

Content filtering

Not every string is worth sending to the reviewer. The agent applies the following thresholds:

  • JSON / ARB strings — only strings with more than 15 words are reviewed. Short UI labels such as button text or single-word values are skipped automatically.
  • Markdown files — entire files are passed to the reviewer with no word-count threshold applied.
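
The thresholds above amount to a simple predicate. The sketch below is illustrative: `"json"` appears as a `sourceType` value in the API example further down, while `"arb"` and `"markdown"` are assumed values, and whitespace splitting is an assumed way of counting words.

```python
JSON_ARB_MIN_WORDS = 15  # strings must have MORE than this many words


def should_review(text: str, source_type: str) -> bool:
    """Decide whether a translated string is sent to the reviewer."""
    if source_type == "markdown":  # assumed identifier for Markdown files
        return True                # no word-count threshold for Markdown
    # JSON / ARB: skip short UI labels, review only longer strings.
    return len(text.split()) > JSON_ARB_MIN_WORDS


print(should_review("Save", "json"))                    # False
print(should_review(" ".join(["word"] * 16), "json"))   # True
print(should_review("# Title", "markdown"))             # True
```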

Issue types

The diagnose pass classifies each detected problem into one of the following issue types:

false_friend

A word looks similar to a source-language word but carries a different meaning in the target language.

context_error

The translation is grammatically correct but wrong given the surrounding UI or domain context.

unnatural_phrasing

The text reads like a literal translation rather than natural target-language prose.

register_mismatch

The formality level (e.g. formal vs. informal address) differs from the rest of the product.

omission

Part of the source content is missing from the translation.

other

Any other quality issue that does not fit the categories above.
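
If you process review results yourself, it can help to tally issues by type. The sketch below assumes each issue is a dictionary with an `issueType` key, as in the response example in the API section. Folding unrecognised values into `other` is a design choice of this example, not documented behaviour.

```python
from collections import Counter
from typing import Dict, Iterable

KNOWN_ISSUE_TYPES = frozenset({
    "false_friend",
    "context_error",
    "unnatural_phrasing",
    "register_mismatch",
    "omission",
    "other",
})


def tally_issue_types(issues: Iterable[Dict[str, str]]) -> Counter:
    """Count issues per type, folding unknown or missing types into 'other'."""
    counts: Counter = Counter()
    for issue in issues:
        issue_type = issue.get("issueType", "other")
        counts[issue_type if issue_type in KNOWN_ISSUE_TYPES else "other"] += 1
    return counts


sample = [
    {"issueType": "register_mismatch"},
    {"issueType": "register_mismatch"},
    {"issueType": "omission"},
]
print(tally_issue_types(sample).most_common(1))  # [('register_mismatch', 2)]
```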

API endpoint

The CLI communicates with the following endpoint. You can also call it directly if you want to integrate the reviewer into your own pipeline.

text
POST /api/translate/review

Authentication: pass your API key in the x-api-key header.

Request body

json
{
  "originalText": "Welcome back! You have {count} unread messages.",
  "translatedText": "Willkommen zurück! Du hast {count} ungelesene Nachrichten.",
  "sourceLang": "English",
  "targetLang": "German",
  "sourceType": "json",
  "context": "SaaS web application for project management",
  "tone": "Professional but friendly"
}

Response body

json
{
  "qualityScore": 72,
  "issues": [
    {
      "originalSegment": "Du hast {count} ungelesene Nachrichten.",
      "issueType": "register_mismatch",
      "explanation": "Informal 'Du' used; product uses formal 'Sie' throughout.",
      "suggestedFix": "Sie haben {count} ungelesene Nachrichten."
    }
  ],
  "polishedText": "Willkommen zurück! Sie haben {count} ungelesene Nachrichten.",
  "wordsConsumed": 18
}

When no issues are found, issues is an empty array, qualityScore is ≥ 95, and polishedText equals the original translatedText.
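
For a direct integration, a minimal client using only the Python standard library might look like the sketch below. The base URL is a placeholder (the docs give only the path), the `Content-Type` header is an assumption, and the function names are hypothetical. Only the path, the `x-api-key` header, the body fields, and the 402 status come from this page.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: substitute your real API host

EXAMPLE_PAYLOAD = {
    "originalText": "Welcome back! You have {count} unread messages.",
    "translatedText": "Willkommen zurück! Du hast {count} ungelesene Nachrichten.",
    "sourceLang": "English",
    "targetLang": "German",
    "sourceType": "json",
    "context": "SaaS web application for project management",
    "tone": "Professional but friendly",
}


def build_review_request(api_key: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the review endpoint."""
    return urllib.request.Request(
        url=base_url + "/api/translate/review",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",  # assumed, not stated in the docs
        },
        method="POST",
    )


def send_review(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 402:  # insufficient balance; no credits were consumed
            raise RuntimeError("Insufficient credits for this review") from err
        raise
```

A call would then look like `send_review(build_review_request("YOUR_API_KEY", EXAMPLE_PAYLOAD))`, after which `result["polishedText"]` and `result["qualityScore"]` can be read from the returned dictionary.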

Frequently asked questions

How can I measure the quality of my translations?

Every review returns a quality score from 0 to 100. An AI model assigns the score based on four criteria: faithfulness to the original, naturalness of phrasing, register consistency, and completeness. A score above 90 indicates a near-perfect translation, and at 95 or above the polish pass is skipped. Your PolyCLI dashboard shows the average quality score across all reviewed items, which gives you a measurable view of your localisation quality over time.

Does the reviewer change my existing translation files?

Yes — when issues are found, the polished text overwrites the draft translation directly in your output files (e.g. locales/it.json). The original source file is never modified.

Can I exclude a language from review?

Yes. Add the language's locale code (e.g. "de") to aiReviewerExclude in your buildtranslator.json, using the same format as targetLanguages. The reviewer will skip those languages entirely and log them as skipped in the CLI output.

Why are short strings skipped for JSON / ARB files?

Short strings like button labels or single words rarely have nuanced translation issues, and reviewing them would consume credits with negligible quality benefit. The 15-word threshold is fixed and is measured on the translated string, not the source string.

What happens if my credits run out mid-review?

The credit check runs before each review. If your balance is too low for a particular string, the request for that string returns 402 Payment Required and the CLI stops with an error message. Strings reviewed before that point have already been written to disk.