Toxicity Detection
On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection
Language models have become the state of the art in natural language processing (NLP) and are increasingly applied to a wide range of NLP tasks. While research has shown that these models exhibit bias, the extent to which this bias affects the fairness of downstream NLP tasks remains underexplored. Additionally, despite the introduction of numerous debiasing techniques, their impact on the fairness of NLP tasks has not been thoroughly studied.
In this work, we examine three key sources of bias in NLP models—representation bias, selection bias, and overamplification bias—and analyse their influence on the fairness of toxicity detection. Furthermore, we assess how applying different bias removal techniques affects the fairness of this task.
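As an illustration of how fairness in toxicity detection is often quantified (the paper's exact metrics may differ), the sketch below computes per-group false positive rates and the gap between the best- and worst-treated identity groups. The record layout and the function name are hypothetical, chosen only for this example.

```python
from collections import defaultdict

def false_positive_rate_gap(records):
    """Compute per-group false positive rates and their spread.

    `records` is assumed to be an iterable of (group, true_label, predicted_label)
    tuples, where label 1 means toxic. A large gap indicates that benign text
    mentioning some identity groups is flagged as toxic more often than others.
    """
    fp = defaultdict(int)   # benign examples wrongly flagged as toxic
    neg = defaultdict(int)  # all benign (non-toxic) examples
    for group, y_true, y_pred in records:
        if y_true == 0:
            neg[group] += 1
            if y_pred == 1:
                fp[group] += 1
    rates = {g: fp[g] / neg[g] for g in neg if neg[g] > 0}
    return rates, max(rates.values()) - min(rates.values())

# Example usage with toy predictions:
rates, gap = false_positive_rate_gap([
    ("group_a", 0, 1), ("group_a", 0, 0),
    ("group_b", 0, 0), ("group_b", 0, 0),
])
print(rates, gap)  # {'group_a': 0.5, 'group_b': 0.0} 0.5
```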
Our results provide strong evidence that downstream sources of bias, particularly overamplification bias, have the most significant impact on the fairness of toxicity detection. Additionally, we find strong evidence that mitigating overamplification bias by fine-tuning language models on datasets with balanced contextual representations and balanced ratios of positive examples across identity groups can enhance fairness in toxicity detection. Based on our findings, we propose a set of guidelines to ensure the fairness of the task of toxicity detection.
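To make the balancing step concrete, the following is a minimal sketch of one way to equalise the ratio of positive (toxic) examples across identity groups before fine-tuning. It is not the authors' exact pipeline; the example dictionary keys ('text', 'label', 'group'), the target ratio, and the helper name are assumptions for illustration only.

```python
import random
from collections import defaultdict

def balance_positive_ratio(examples, target_ratio=0.5, seed=0):
    """Downsample negatives per identity group so that toxic (positive)
    examples make up the same fraction of every group.

    `examples` is assumed to be a list of dicts with keys 'text',
    'label' (1 = toxic, 0 = non-toxic), and 'group' (identity group).
    """
    rng = random.Random(seed)
    by_group = defaultdict(lambda: {0: [], 1: []})
    for ex in examples:
        by_group[ex["group"]][ex["label"]].append(ex)

    balanced = []
    for group, split in by_group.items():
        pos, neg = split[1], split[0]
        # Keep all positives; sample negatives so positives reach target_ratio.
        n_neg = min(int(len(pos) * (1 - target_ratio) / target_ratio), len(neg))
        balanced.extend(pos)
        balanced.extend(rng.sample(neg, n_neg))
    rng.shuffle(balanced)
    return balanced
```

The balanced list can then be used as the fine-tuning set for the toxicity classifier, so that no identity group is disproportionately associated with the toxic class during training.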
More details are available in the related outputs:
Fatma Elsafoury, Stamos Katsigiannis. 2023. On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection. arXiv preprint, arXiv:2305.12829v3. https://doi.org/10.48550/arXiv.2305.12829