Perspective

How Measurement Can Fix Content Moderation's Language Equity Gap

Sujata Mukherjee, Sasha Maria Mathew / Feb 13, 2026

Jamillah Knowles & Reset.Tech Australia / Better Images of AI / People with phones / CC-BY 4.0

As the world’s tech leaders prepare to gather for the India AI Impact Summit on February 16, one topic is no longer optional: language equity. In a country like India, where linguistic diversity isn’t just a feature but a way of life, the "language gap" in AI isn't just a technical glitch—it’s a barrier to safety, participation, and economic opportunity.

Of roughly 7,000 global languages, only a handful are thriving digitally, with a mere 10 languages making up 82% of all internet content. This massive resource disparity is the core challenge for content moderation in non-English languages. Before Large Language Models (LLMs) emerged, moderation tools and human reviewers already struggled, often failing to capture the linguistic complexity of transliteration, code-switching, and sophisticated algospeak, which is common in non-English content.

The rise of LLMs presents a paradox. While these systems are often hailed as language-agnostic, research shows they are built on a foundation that heavily favors English and a few other dominant languages, creating a typological echo chamber and a "poor get poorer" cycle in the digital space. High-resource languages get the best moderation tools, the most accurate chatbots, and the safest filters, and they dominate performance leaderboards. Meanwhile, communities speaking "low-resource" languages are left with tools that don't understand their slang, their cultural nuances, or their safety risks.

How should platforms distribute finite investment resources across different languages? Most platforms default to a utilitarian, welfare-maximizing approach: they look at where the most users are and put their money there. On paper, this makes sense for shareholders: invest in English and Spanish first, because that's where the volume is. However, this approach entrenches existing digital exclusion, fails to mitigate risks for smaller, highly vulnerable communities, and leaves trust and safety gaps unaddressed. In the long run, it also limits market expansion and user retention.

An alternative approach: focusing on capabilities

Instead of just counting heads, we propose a capabilities approach. This means shifting the focus from inputs (how much money we spend) to outcomes (what users can actually do). Language "support" should not mean clunky or tone-deaf translation. Real equity means a user in a rural village in Southeast Asia should have the same level of safety and agency on a platform as a user in New York. For example, relying on machine translation for low-resource languages might look like language coverage, but it fails to capture critical contextual, cultural, and idiomatic nuance, and is sometimes just plain wrong.

We believe that bridging the language divide isn't just a matter of how much money you throw at it; it's about where that money goes. Investment should be deployed toward strengthening the digital capabilities of language communities, rather than toward brittle off-the-shelf translation tools that don't expand what communities are capable of. A capability-enhancing investment opens doors for people to actually participate in digital spaces, whether that's finding reliable information, joining a conversation, or sharing their own culture. Such investments might look like: adapting a model to encode local cultural awareness (making sure a model understands local social norms, traditional foods, or the importance of a specific festival) and engaging community partners (working with people on the ground to create high-quality data that reflects how they actually speak). Since no budget is bottomless, we suggest a risk-based approach. By targeting the communities that need it most, we can find cost-effective ways to make a huge difference, backed by clear metrics that prove the value to both society and the business.

Risk-based targeting

We suggest using a simple formula to find out who needs help most: R = f(Vulnerability, Service Impact), where

  • Vulnerability: Does this community face social instability, conflict, or high numbers of younger users?
  • Service impact: How much do they rely on the platform, and how bad is the current quality of the AI they are using?
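
To illustrate how such a risk function might be operationalized, the sketch below combines a handful of vulnerability and service-impact signals into a single score used to rank language communities for investment. The signal names, the normalization to a 0-1 scale, and the weighted-sum form of f are illustrative assumptions, not a prescribed implementation.

```python
from dataclasses import dataclass

@dataclass
class LanguageCommunity:
    name: str
    # Illustrative vulnerability signals, each normalized to the range [0, 1].
    conflict_exposure: float       # social instability or conflict risk
    youth_share: float             # proportion of younger users
    # Illustrative service-impact signals, each normalized to the range [0, 1].
    platform_reliance: float       # how heavily the community depends on the platform
    moderation_quality_gap: float  # e.g., 1 minus the F1 parity ratio vs. English

def risk_score(c: LanguageCommunity,
               w_vulnerability: float = 0.5,
               w_service_impact: float = 0.5) -> float:
    """Toy instance of R = f(Vulnerability, Service Impact).

    Here f is a weighted sum of two averaged sub-scores; any monotonic
    combination would serve the same prioritization purpose.
    """
    vulnerability = (c.conflict_exposure + c.youth_share) / 2
    service_impact = (c.platform_reliance + c.moderation_quality_gap) / 2
    return w_vulnerability * vulnerability + w_service_impact * service_impact

# Rank hypothetical communities so the highest-risk ones are considered first.
communities = [
    LanguageCommunity("Language A", 0.8, 0.6, 0.9, 0.7),
    LanguageCommunity("Language B", 0.2, 0.3, 0.4, 0.2),
]
for c in sorted(communities, key=risk_score, reverse=True):
    print(f"{c.name}: R = {risk_score(c):.2f}")
```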

Populations that are highly vulnerable and experience high service impact present the potential for the greatest 'capability gains'. This allows platforms to make continuous, incremental, and targeted investments. Examples of cost-effective interventions include:

  • Community-in-the-loop: micro annotation sprints with local speakers and cultural experts to create small, high-quality fine-tuning datasets for priority low-resource languages.
  • Artificial corpora generation: Using LLMs to annotate and parse cultural datasets, and creating datasets that replicate low-resource language features using generative models and culturally curated seed data
  • Multi-agent interaction: Simulating cross-cultural dialogue to generate diverse, high-quality cultural datasets that improve understanding and alignment
  • Leveraging regional models: Utilizing smaller, specialized multilingual models developed by local actors (e.g., AfriBERTa, SEA-LION, IndicBERT), which are more cost-effective, require fewer computational resources, and avoid the biases of larger models

Measuring progress: core metrics for language equity

How will platforms know they’re making progress? We suggest measuring at both ends of the investment: baseline assessments on the language equity gap (which should guide investments and be monitored for improvement over time) and post-facto assessments that show impact in quantifiable business outcomes and societal justice. Detailed tables are attached to this article with full descriptions of suggested metrics, alternatives and their limitations, but here is the gist of what we should be looking at:

  1. State of Equity, or Estimating the Language Equity Gap (Table 1 below). In order to measure how “digitally ready” a language is, we suggest three “buckets” of metrics to track.
    1. Resource parity, using tools like the Digital Language Equity (DLE) metric. This helps us see which languages are starving for data. If a language has a lot of speakers but a low DLE score, that’s a red flag for investment.
    2. Model performance and fairness, to compare moderation outcomes across languages. Metrics like F1 parity compare how well an LLM moderation model works in English versus another language. The F1 score is a standard accuracy metric adopted by most model providers to balance precision and recall, and F1 parity requires the same standard of performance across languages. We also look for equalized odds: basically, is the model accidentally flagging "safe" content as "harmful" more often in some languages than others? (A computational sketch of both metrics follows this list.)
    3. Cultural competence: This is the most overlooked part. Does the model understand local food, festivals, or social norms? New benchmarks (like DOSA for Indian cultures) help us test if an LLM is actually "smart" or just repeating Western biases.
  2. Measuring the real-world impact, in business and societal terms (Table 2 below). We look at four main areas to see if changes are actually working. While some of these are hard to measure and require expert audits, they provide the return on investment (ROI) story that sets up language equity as a core business priority rather than a side project.
    1. Reducing real-world risk: We track the capability gain—basically, how much better the model performs after we give it a boost with local community data. We also measure if the overall "Risk Function" for a community is dropping over time.
    2. Encouraging user participation: One of the biggest hurdles for non-English speakers is false positives, that is, when the model removes a harmless post because it doesn't understand the slang or the script (like transliteration). If persistent, such removals can have a silencing effect on a language community. We also look at appeal parity: if a user in Ethiopia appeals a deleted post, do they have the same chance of a fair review as someone in Germany?
    3. Building transparency: When content is removed, users deserve to know why. This metric tracks whether platforms are providing clear, culturally relevant explanations in the user's native language rather than just a generic template.
    4. The business bottom line: Language equity is good for growth. We track "engagement rate lift" (do people post more when they feel safe and understood?) and "safety-related churn" (how many users are leaving the platform because they’re tired of being misunderstood or harassed?).
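
As a concrete, deliberately simplified sketch of the model-performance metrics mentioned above, the snippet below computes an F1 parity ratio and the equalized odds ratio defined in Table 1 from per-language confusion-matrix counts. The counts and the "Language X" label are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

def f1_parity_ratio(f1_x: float, f1_english: float) -> float:
    """F1(X) / F1(English); 1.0 indicates parity with the English baseline."""
    return f1_x / f1_english if f1_english else 0.0

def equalized_odds_ratio(tpr_x: float, fpr_x: float,
                         tpr_en: float, fpr_en: float) -> float:
    """Minimum of the TPR and FPR ratios across the two languages
    (1 = parity, values near 0 = large gap), per the Table 1 formula."""
    tpr_ratio = min(tpr_x, tpr_en) / max(tpr_x, tpr_en)
    fpr_ratio = min(fpr_x, fpr_en) / max(fpr_x, fpr_en)
    return min(tpr_ratio, fpr_ratio)

# Hypothetical per-language moderation outcomes: (TP, FP, FN, TN).
counts = {
    "English":    (900,  50, 100, 8950),
    "Language X": (700, 200, 300, 8800),
}

def rates(tp, fp, fn, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

f1_en = f1_score(*counts["English"][:3])
f1_x = f1_score(*counts["Language X"][:3])
tpr_en, fpr_en = rates(*counts["English"])
tpr_x, fpr_x = rates(*counts["Language X"])

print(f"F1 parity ratio:      {f1_parity_ratio(f1_x, f1_en):.2f}")
print(f"Equalized odds ratio: {equalized_odds_ratio(tpr_x, fpr_x, tpr_en, fpr_en):.2f}")
```

The capability gains score described in Table 2 can then be tracked as the change in these ratios before and after an intervention.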

Conclusion

The current utilitarian approach to internationalization is regressive. It keeps the best tools in the hands of the few while leaving the rest of the world to hobble along. But as we head into the India AI Impact Summit, the goal is clear: we need to move from an "English-first" mindset to an "equity-first" reality.

Using our more progressive, risk-based strategy, even platforms with modest budgets can stop the language gap from widening. The Risk Function R = f(Vulnerability, Service Impact) enables focus on where vulnerability is highest through cost-effective solutions like smaller regional models and local community partnerships. In future work, we will explore next-generation AI solutions, like using generative models for artificial corpora and multi-agent dialogue simulations, to augment community-sourced data and fill remaining gaps.

Ultimately, investing in language equity isn't a passion project. We believe language equity is good business. When platforms work better for more people, user churn goes down (people don't leave because of trust and safety issues) and engagement goes up. The platform is “sticky” and the community thrives. Tools for LLM-driven content moderation are becoming more widely adopted every day, and this article proposes a set of metrics to help make them work better than traditional review methods. Now, we just need the collective will to point these tools in the right direction.

* * *

Table 1: Metrics for Assessing Language Equity in Content Moderation
For each metric category, the entries below list the metric or benchmark, its definition and formula (where applicable), and its significance and trade-offs.

LANGUAGE RESOURCE PARITY

Language resource parity is difficult to measure, but quantitative (token count), qualitative (multilingual data quality index) and holistic language resourcedness measures like the Language Ranker Score and the region-specific MasakhaNER are good proxies to start with.

Digital Language Equity (DLE) metric

A comparative score ranking languages by their digital resourcing against a number of technological factors, including the availability of corpora, language models, computational grammars, and lexical resources specific to those languages.

Supports identification of investment opportunities: languages with low DLE scores and medium-to-high speaker populations should automatically qualify for equity advancement measures.

The DLE metric was created for European languages and has limited applicability to other language families; it is largely quantitative and favours large-scale datasets over smaller, high-quality datasets.

MasakhaNER

A named-entity recognition benchmark that measures representational equity across 20 African languages via variance (ΔF1) and zero-shot transfer ratio.

Offers unique insights into language performance gaps, though it is only applicable to African languages and its domain coverage is limited to news.

MODEL PERFORMANCE

Model performance is typically assessed via a host of capability assessment benchmarks, which run on a variety of tasks and are measured by accuracy and other metrics. For content moderation, we use F1 scores and equalized odds.

F1 parity ratio

The ratio of F1 performance in a non-English language to the English-language baseline; it offers a crude view of the language gap and enables cross-language performance comparisons.

\( \frac{F1_{(X)}}{F1_{(English)}} \)

Where X is a given low-resource language.

Useful for highlighting critical investment needs, but only meaningful when comparing the same or similar models, on similar tasks, with similar evaluation datasets.
Equalized odds

The minimum of the TPR and FPR ratios across differently resourced languages provides a relative measure of fairness, where 1 = perfect equity and 0 = perfect inequity.

\( Equalized\; odds\; ratio = \min\!\left(\tfrac{\min(TPR_X,\, TPR_{English})}{\max(TPR_X,\, TPR_{English})} \;,\; \tfrac{\min(FPR_X,\, FPR_{English})}{\max(FPR_X,\, FPR_{English})}\right) \)

Where TPR = \( \tfrac{TP}{TP + FN} \) and FPR = \( \tfrac{FP}{FP + TN} \)

Detecting harmful content and flagging benign content at the same rates across languages is the equity ideal, but it is practically impossible to achieve; the ratio can serve as a directional reference point.

CULTURAL COMPETENCE

Cultural competence or awareness metrics are cross-cutting and typically span: cultural common sense and knowledge, bias (behavior toward social groups in different sociocultural and linguistic settings), and safety (toxicity, malicious instructions, etc.).

Culture-specific common-sense knowledge benchmarks may be auto-generated and multicultural (CANDLE-CSSK) or supplemented with local native authorship, and mono- or multilingual: CLIcK (South Korea), DOSA (Indian cultural geographies), AraDiCE-Culture (Arabic sub-cultures). CDEval tests sociocultural value systems.

CANDLE-CSSK measures cultural knowledge recall and common-sense reasoning in relation to a target group or facet. It covers over 386 cultural groups and multiple cultural facets (clothing, food, rituals, etc.).

Region-specific benchmarks have local authenticity and native authorship. Some examples are CLIcK (Korea-specific cultural common sense and reasoning), DOSA (Indian subcultures), and AraDiCE-Culture (Arabic subcultures).

The CDEval benchmark evaluates the cultural orientation of LLMs across 6 dimensions and 7 domains (education, family, wellness, work, etc.).

Cultural competence measures assess how well a model is able to function in regional and linguistic cultural contexts. From cultural 'common sense' reasoning and cultural value orientation, to understanding of local biases and safety that is resilient across diverse language typologies, cultural competence is critical for parsing of context in global content moderation tasks. This is an often overlooked facet of model fitness for sociolinguistic moderation.

Notable limitations: can be expensive to construct natively; non-standardized (due to inconsistent conceptions of cultural boundaries) and non-replicable across cultures; may be reductive of sub-national/regional cultural diversity and dialectal variation. Many benchmarks (inclusive of tasks and datasets) rely on machine translation with light human validation, which risks the importation of non-local norms and biases.

Demographic bias

MBBQ (Multilingual bias benchmark for question answering); Multilingual HolisticBias (MMHB); Multilingual HateCheck (MHC)

MBBQ tests if a model's bias behavior is consistent across languages (English, Dutch, Spanish, Turkish). BharatBBQ and KoBBQ evaluate model fairness in the Indian and Korean socio-cultural contexts (specific identity axes), respectively.

MMHB detects representational bias in open-ended text generation across 50+ languages, based on descriptors of social groups and measured by sentiment polarity, affect/regard, toxicity score, and lexical bias.

MHC is a test suite covering 20 functional categories of hate speech across 10 languages, written by native speakers using a shared template.

SAFETY

X-Safety

Assesses whether models trained largely in English are able to generalise safely in non-English languages; covers 10 languages and 14 types of safety issues (toxicity, discrimination, misinformation, malicious instructions, etc.).

Multilingual safety benchmarks skew toward high-resource languages and are often based on English-origin translations, which may center non-local conceptions of harm; they are nonetheless useful for exposing the large gap in safety performance for non-English languages, which is crucial for global deployment.
Table 2: Outcome-Based Metrics for Language Equity Interventions
For each metric category, the entries below list the metric or benchmark, its definition and formula (where applicable), its significance in content moderation, and trade-offs or considerations.
RISK REDUCTION

Capability Gains Score

The change in model performance metrics (e.g., ΔF1 parity ratio) for a priority low-resource language (LRL) post-intervention (e.g., after fine-tuning with a community-in-the-loop dataset).

Directly measures the return on investment (ROI) of a capabilities-based intervention (e.g., Δ investment leads to Δ performance gain). Gains may regress without continuous community input.

Risk Index Reduction

The percentage reduction in a language community's Risk Index score over a defined period. The Risk Index combines Vulnerability and Service Impact, as defined by R = f(V, I).

Quantifies the success of the pragmatic approach: prioritizing and mitigating risk for the most vulnerable and most impacted communities. Can be resource-intensive to track accurately.

USER PARTICIPATION

False Positive Impact Score

The rate of benign user content incorrectly flagged and removed (false positives) specifically tied to complex linguistic phenomena (e.g., transliteration, code-switching) in an LRL post-intervention.

Requires a sophisticated labeling schema.

Appeal Success Rate Parity

The ratio of the appeal success rate for removed content in a priority LRL to the appeal success rate in a high-resource language (HRL), e.g., English.

Reveals whether the moderation decision review process (human or automated) is equally fair and accurate for LRL users compared to HRL users.

A high parity ratio could mask problems if the baseline (HRL) appeal rate is low or if LRL users are less aware of their right to appeal (digital literacy issues).
TRANSPARENCY

Contextual Transparency

Quantitative: the proportion of high-risk language communities that receive regulation-standard explanations for moderation actions.

Qualitative: a measure of the accuracy and cultural relevance of moderation transparency notifications, especially for nuanced policy violations in an LRL.

Assesses the coverage and quality of communication to users impacted by moderation actions.

Primarily qualitative; scores are subjective and rely on expert-led audits or user surveys, which are expensive and slow to conduct at scale across many LRLs.
BUSINESS IMPACT

Community-based engagement rate lift

Engagement rate is a simplistic measure that offers insight into content performance over a period.

\( Engagement\; rate = \tfrac{Total\; engagement}{Total\; reach} \times 100 \)

\( ER\; growth\; rate = \tfrac{ER_{Current\; period} - ER_{Previous\; period}}{ER_{Previous\; period}} \times 100 \)

Moderation is considered a baseline condition for content performance; measuring the improved reach of content by impressions rather than user count offers a simple insight into whether improved moderation improves content performance.

Qualitative, language-specific surveys may be expensive to construct and scale (particularly in terms of eliciting safety-related responses), and low participation may compromise significance. As lagging indicators, causal attribution may be challenging. In addition, defining language communities for ER measurement is difficult due to international and sub-national complexity and diversity.

Safety-related user churn

Safety-related user churn requires intentional exit surveys or analysis of customer support tickets, denominated by the target language community

\( Safety\; churn = \tfrac{\#\text{ users churning (safety)}}{\#\text{ users at start}} \)

Offers insight into the negative business impact of ineffective moderation.
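
For completeness, here is a minimal sketch of the business-impact arithmetic from Table 2. The sample figures, variable names, and the single-community framing are illustrative assumptions only.

```python
def engagement_rate(total_engagement: int, total_reach: int) -> float:
    """Engagement rate = total engagement / total reach * 100."""
    return total_engagement / total_reach * 100 if total_reach else 0.0

def engagement_rate_lift(er_previous: float, er_current: float) -> float:
    """Percentage change in engagement rate between two periods."""
    return (er_current - er_previous) / er_previous * 100 if er_previous else 0.0

def safety_churn(users_churning_for_safety: int, users_at_start: int) -> float:
    """Share of the period's starting user base lost for safety-related reasons."""
    return users_churning_for_safety / users_at_start if users_at_start else 0.0

# Hypothetical figures for one language community, before and after an intervention.
er_prev = engagement_rate(total_engagement=12_000, total_reach=400_000)
er_curr = engagement_rate(total_engagement=15_000, total_reach=410_000)

print(f"Engagement rate lift: {engagement_rate_lift(er_prev, er_curr):.1f}%")
print(f"Safety-related churn: {safety_churn(1_200, 80_000):.2%}")
```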

Authors

Sujata Mukherjee
Sujata Mukherjee is a problem solver, trust & safety leader and digital anthropologist with 20 years building trusted product experiences, leading research and scaling CX functions. She is currently Senior Director of Trust & Safety Product Management at Upwork, and previously led research, advocacy...
Sasha Maria Mathew
Sasha Mathew is a technology policy leader with 10+ years at the intersection of digital rights, online safety and regulatory policy, in jurisdictions across the US, UK, Europe and India. She is currently the global head of the platform policy team at Bumble. Previously, she was a product policy lea...
