When a Mexican company processes legal documents with artificial intelligence, it pays more than an American company for the same work. Not because providers charge differently, but because AI literally doesn't speak your language as fluently.
We tested 8 AI models with real legal documents (NDA clauses, articles of incorporation, tax provisions) in English and Spanish. The results reveal an invisible "language tax."
What is a token and why should you care
AI models don't read words. They read text fragments called tokens. Sometimes a token is a complete word ("contract" = 1 token). Sometimes it's a piece of a word ("cumplimiento" = "cum" + "pl" + "imiento" = 3 tokens).
Why does it matter? Because everything is charged by token. API pricing, memory limits, response speed: everything depends on how many tokens the model processes.
And here's the problem: tokenizers, the algorithms that split text into tokens, were trained predominantly on English text.
How a tokenizer works
The algorithm is called BPE (Byte Pair Encoding). It takes a massive corpus of text and starts with individual characters. It finds which pairs appear together most frequently and merges them. "t" and "h" appear together millions of times in English, so they merge into "th". Then "th" + "e" becomes "the". And so on for thousands of iterations, until a full vocabulary is built.
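The merge loop described above can be sketched in a few lines of Python. This is a toy version under simplifying assumptions: it works on whole words instead of bytes, uses a tiny made-up corpus, and the function names (`train_bpe`, `merge_pair`, `encode`) are illustrative rather than taken from any real tokenizer library.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a tuple of single characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair of symbols occurs.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # The most frequent pair becomes a new vocabulary entry.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges


def encode(word, merges):
    """Split a word into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Made-up corpus: English-heavy, with a single Spanish legal term.
corpus = ["the"] * 10 + ["then"] * 5 + ["fideicomiso"]
merges = train_bpe(corpus, num_merges=2)

print(encode("the", merges))               # ['the']
print(len(encode("fideicomiso", merges)))  # 11
```

With only two merges, the frequent English word collapses into a single token while the rare Spanish word stays as 11 single characters. Real tokenizers run many thousands of merges on far larger corpora, but the frequency bias works the same way.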
The result: common English words end up as compact tokens. Spanish words, especially technical, legal, or tax-related ones, never accumulated enough frequency to merge. They remain fragmented.
The word "fideicomiso" (trust), a concept that moves billions of dollars annually in Mexico, breaks down into 4 meaningless fragments: "f", "ide", "icom", "iso". The tokenizer treats it as if it were a typo.
What we measured: 8 models, real legal documents
We took NDA, incorporation, and tax clauses in both languages and ran them through the tokenizers of 8 models. Results obtained through an API were corrected by subtracting the system prompt baseline, so the figures reflect the overhead of the text itself.
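As a rough sketch of that correction, the snippet below subtracts a fixed baseline from API-reported token counts and then computes the Spanish overhead. It assumes overhead means the relative increase of Spanish tokens over English tokens. The function names and all numbers are illustrative, not measurements from this article.

```python
def text_tokens(reported_tokens, baseline_tokens):
    """Tokens attributable to the document text once the fixed prompt baseline is removed."""
    return reported_tokens - baseline_tokens


def overhead_pct(english_tokens, spanish_tokens):
    """Extra tokens needed for Spanish relative to English, in percent."""
    return (spanish_tokens / english_tokens - 1) * 100


# Illustrative numbers only (hypothetical API responses).
baseline = 20            # tokens the API reports with no document text at all
english_reported = 100   # baseline + English clause
spanish_reported = 150   # baseline + Spanish clause

english = text_tokens(english_reported, baseline)  # 80
spanish = text_tokens(spanish_reported, baseline)  # 130

print(f"{overhead_pct(english, spanish):+.1f}%")                    # +62.5%
print(f"{overhead_pct(english_reported, spanish_reported):+.1f}%")  # +50.0%
```

The second line shows why the correction matters: leaving the baseline in dilutes the difference and understates the overhead.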
GPT-4 and GPT-4o (OpenAI): the most popular API models
Claude Sonnet 4 (Anthropic): measured directly via API
Llama 2 (Meta): open source model with 32K vocabulary
Llama 3.3 70B (Meta): run via Groq
Qwen 3.5 9B (Alibaba): running locally on our own server
Qwen3 32B (Alibaba): run via Groq
GPT-OSS 120B (OpenAI): open-weight model, Apache 2.0 license
Result 1: The NDA clause
The same confidentiality agreement, processed by each model with baseline subtracted:
The range goes from +22.0% (Qwen 3.5 9B local) to +66.7% (Llama 3.3 and Qwen3 32B). Claude Sonnet 4 lands at +43.6%, worse than GPT-4 (+24.5%).
Result 2: Mexican legal vocabulary
Measured with Claude Sonnet 4, the most token-hungry terms:
"Sociedad AnΓ³nima Promotora de InversiΓ³n de Capital Variable" consumes 17 tokens. This name appears in every notarial document for thousands of Mexican companies.
Result 3: Bigger doesn't mean more efficient
Qwen3 32B tied with Llama 3.3 as the worst model for Spanish (+66.7%), while Qwen 3.5 9B, from the same vendor but smaller, achieved +22.0%. In our tests, parameter count showed no relationship with tokenizer efficiency.
Result 4: The local model wins
Qwen 3.5 9B running locally on our server had the lowest overhead (+22.0%). With local inference, the per-token cost is also zero: you pay for electricity and hardware, regardless of language.
GPT-OSS: open weights, good tokenizer
GPT-OSS 120B from OpenAI, at +34.4% overhead, was the best open-weight model. It has an Apache 2.0 license and 117B parameters (5.1B active, MoE architecture), and it fits on a single 80GB GPU. The 20B version runs on 16GB of VRAM.
What this means for your company
If your company processes legal, tax, or notarial documents through AI APIs, you're paying an invisible surcharge. With local inference, this "language tax" disappears.
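To put a number on that surcharge for API usage, a back-of-the-envelope estimate might look like the sketch below. The monthly volume and the price per million tokens are hypothetical placeholders, and the function name is illustrative. The +43.6% figure is the Claude Sonnet 4 overhead reported above for the NDA clause.

```python
def language_tax(english_tokens_per_month, usd_per_million_tokens, overhead_pct):
    """Return (cost of the English-equivalent workload, extra cost caused by Spanish overhead)."""
    base_cost = english_tokens_per_month / 1_000_000 * usd_per_million_tokens
    surcharge = base_cost * overhead_pct / 100
    return base_cost, surcharge


# Hypothetical workload: 50M English-equivalent tokens a month at $3.00 per million.
base, extra = language_tax(50_000_000, 3.00, 43.6)
print(f"English-equivalent cost: ${base:.2f}")  # English-equivalent cost: $150.00
print(f"Spanish surcharge:       ${extra:.2f}")  # Spanish surcharge:       $65.40
```

Swap in your own volume, your provider's actual price, and the overhead measured for your model to get a figure for your workload.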
Methodology
Token counts were measured with each model's native tokenizer or API. All API results (Groq, Anthropic) were corrected by subtracting the system prompt baseline to get the token count of the text itself. Offline tokenizers (GPT-4, Llama 2) need no correction. Test texts are standard legal clauses translated with semantic equivalence.
At Leeuwwolk, we process Mexican legal documents with AI models running on our own infrastructure. If you want to know how much you could be saving, contact us.