Abstract
In recent years, headlines such as ‘Is Google Translate Sexist?’ (Mail Online 2017) or ‘The Algorithm that Helped Google Translate Become Sexist’ (Olson 2018) have appeared in the technology sections of the world’s news providers. Our highly interconnected world has made online translators indispensable tools in our daily lives. However, their output has the potential to cause great social harm. Due to the continuous pursuit of ever larger language models and, as a consequence, the opaque nature of unsupervised training datasets, language-based AI systems, such as online translators, can easily produce biased content. If left unchecked, this will inevitably have detrimental consequences. This chapter addresses the nature, impact and risks of bias in training data by looking at the concrete example of gender bias in machine translation (MT). The first section will provide an introduction to recent proposals for ethical AI guidelines in different sectors and will present the field of natural language processing (NLP). Next, I will explain different types of bias in machine learning and how they can manifest themselves in language models. I will then present the results of a corpus-linguistic analysis I performed on a sample dataset that was later used to train an MT system, exploring the gender-related imbalances in the corpus that are likely to give rise to biased results. In the final section of this chapter, I will discuss different approaches to reducing gender bias in MT and present findings from a set of experiments my colleagues and I conducted to mitigate bias in MT. The research presented in this chapter takes a highly interdisciplinary approach, drawing on expertise from linguistics, philosophy, computer science and engineering to dismantle and solve the complex problem of bias in NLP.
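To make the corpus-linguistic analysis described above concrete, the following minimal Python example shows one common way of quantifying gender imbalance in text: counting how often occupation nouns co-occur with gendered pronouns. The word lists and sample sentences are invented for illustration; this is a generic sketch of the technique, not the chapter’s actual analysis pipeline.

```python
# Minimal illustrative sketch: measure how often occupation nouns co-occur
# with gendered pronouns in the same sentence. Word lists and sample
# sentences are invented; a real analysis would run over the full corpus.
import re
from collections import defaultdict

OCCUPATIONS = {"doctor", "nurse", "engineer", "cleaner"}  # illustrative subset
MALE = {"he", "him", "his"}
FEMALE = {"she", "her", "hers"}

def cooccurrence_counts(sentences):
    """Return {occupation: {'male': n, 'female': n}} co-occurrence counts."""
    counts = defaultdict(lambda: {"male": 0, "female": 0})
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        for occupation in OCCUPATIONS & tokens:
            if tokens & MALE:
                counts[occupation]["male"] += 1
            if tokens & FEMALE:
                counts[occupation]["female"] += 1
    return dict(counts)

sample = [
    "The doctor said he would call back.",
    "The nurse told me she was busy.",
    "The engineer parked his car outside.",
]
print(cooccurrence_counts(sample))
# {'doctor': {'male': 1, 'female': 0}, 'nurse': {'male': 0, 'female': 1}, ...}
```

Skewed counts for role nouns such as ‘nurse’ or ‘engineer’ are precisely the training-data imbalances that an MT system can absorb and then reproduce, or even amplify (Zhao et al. 2017), in its translations.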
Notes
1. See Engström’s chapter in this anthology for a critique of these international frameworks.
2. Note, however, that not all language communities and countries have equal access to NLP systems (Siavoshi 2020).
3. The most common equivalent of a female ‘nurse’ in German is Krankenschwester or its abbreviated form Schwester, but counts for other, less frequent terms such as Arzthelferin were also included, since the English ‘nurse’ has multiple equivalents in German. Note that the German Schwester can also be translated as the English ‘sister’; such occurrences, however, were kept separate in the word counts (a schematic sketch of this aggregation follows these notes).
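As note 3 indicates, counts for several German equivalents of ‘nurse’ were pooled, while ‘sister’ readings of Schwester were tracked separately. The minimal Python sketch below illustrates that aggregation step; the function name and all counts are invented for illustration and are not the chapter’s actual data.

```python
# Hypothetical illustration of the aggregation described in note 3: counts
# for several German equivalents of 'nurse' are pooled, while occurrences of
# 'Schwester' meaning 'sister' are kept in a separate bucket. All numbers
# here are invented.
NURSE_EQUIVALENTS = {"Krankenschwester", "Schwester", "Arzthelferin"}

def aggregate_nurse_counts(token_counts, sister_readings=0):
    """Pool all 'nurse' equivalents; keep 'sister' readings separate."""
    total = sum(token_counts.get(term, 0) for term in NURSE_EQUIVALENTS)
    return {"nurse": total - sister_readings, "sister": sister_readings}

print(aggregate_nurse_counts(
    {"Krankenschwester": 120, "Schwester": 80, "Arzthelferin": 15},
    sister_readings=30,  # 'Schwester' tokens that translate as 'sister'
))
# {'nurse': 185, 'sister': 30}
```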
References
Ackerman, L. 2019. Syntactic and cognitive issues in investigating gendered coreference. Glossa: A Journal of General Linguistics 4(1): 117. https://doi.org/10.5334/gjgl.721.
BBC News. 2021. Reddit removed 6% of all posts made last year. 17 February. https://www.bbc.co.uk/news/technology-56099232 (accessed 23 May 2021).
Bender, E.M., and B. Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6: 587–604.
Bender, E.M., T. Gebru, A. McMillan-Major, and S. Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), 610–623. https://doi.org/10.1145/3442188.3445922.
Beukeboom, C.J. 2014. Mechanisms of linguistic bias: How words reflect and maintain stereotypic expectancies. In Sydney symposium of social psychology: Social cognition and communication, eds. J.P. Forgas, J. Laszlo, and O. Vincze, 313–330. New York: Psychology Press.
Blodgett, S.L., S. Barocas, H. Daumé III, and H. Wallach. 2020. Language (technology) is power: A critical survey of bias in NLP. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/2005.14050.
Boddington, P. 2017. Towards a Code of Ethics for Artificial Intelligence. Cham: Springer.
Bolukbasi, T., K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Proceedings of the 30th International Conference on Neural Information Processing Systems, 4356–4364.
Caliskan, A., J.J. Bryson, and A. Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356(6334): 183–186.
Chen, I.Y., F.D. Johansson, and D. Sontag. 2018. Why is my classifier discriminatory? Advances in Neural Information Processing Systems 31: 3543–3554.
Chowdhury, G.G. 2003. Natural language processing. Annual Review of Information Science and Technology 37(1): 51–89. https://doi.org/10.1002/aris.1440370103.
Criado-Perez, C. 2019. Invisible Women: Exposing Data Bias in a World Designed for Men. London: Penguin.
Darwin, H. 2017. Doing gender beyond the binary: A virtual ethnography. Symbolic Interaction 40(3): 317–334.
Davidson, T., D. Bhattacharya, and I. Weber. 2019. Racial bias in hate speech and abusive language detection datasets. https://arxiv.org/abs/1905.12516v1.
Dignum, V. 2018. Ethics in artificial intelligence: Introduction to the special issue. Ethics and Information Technology 20: 1–3. https://doi.org/10.1007/s10676-018-9450-z.
Equality and Human Rights Commission. 2018. Equality Act 2010. https://www.equalityhumanrights.com/en/equality-act/equality-act-2010 (accessed 23 May 2021).
Etzioni, A., and O. Etzioni. 2017. Incorporating Ethics into Artificial Intelligence. The Journal of Ethics 21: 403–418. https://doi.org/10.1007/s10892-017-9252-2.
Eubanks, V. 2017. Automating Inequality: How high-tech tools profile, police, and punish the poor. New York: St. Martin’s Press.
Floridi, L., J. Cowls, T.C. King, and M. Taddeo. 2020. How to design AI for social good: Seven essential factors. Science and Engineering Ethics 26: 1771–1796. https://doi.org/10.1007/s11948-020-00213-5.
Friedman, B., and H. Nissenbaum. 1996. Bias in computer systems. ACM Transactions on Information Systems (TOIS) 14(3): 330–347.
Garvey, S.C. 2021. Unsavory medicine for technological civilization: Introducing ‘Artificial Intelligence & its Discontents’. Interdisciplinary Science Reviews 46(1–2): 1–18. https://doi.org/10.1080/03080188.2020.1840820.
Gehman, S., S. Gururangan, M. Sap, Y. Choi, and N.A. Smith. 2020. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. Findings of the Association for Computational Linguistics: EMNLP 2020, 3356–3369.
Google AI. 2020. Artificial intelligence at Google: Our principles. https://ai.google/principles.
Government Digital Service (GDS) and Office for Artificial Intelligence (OAI). 2019. Understanding artificial intelligence ethics and safety. https://www.gov.uk/guidance/understanding-artificial-intelligence-ethics-and-safety.
Hagendorff, T. 2020. The Ethics of AI Ethics: An Evaluation of Guidelines. Minds & Machines 30: 99–120. https://doi.org/10.1007/s11023-020-09517-8.
Hagerty, A., and I. Rubinov. 2019. Global AI ethics: A review of the social impacts and ethical implications of artificial intelligence. https://arxiv.org/ftp/arxiv/papers/1907/1907.07892.pdf.
Heaven, W.D. 2020. OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless. MIT Technology Review, 20 July. https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/ (accessed 23 May 2021).
HLEGAI (High Level Expert Group on Artificial Intelligence), European Commission. 2019. Ethics guidelines for trustworthy AI. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai.
Indurkhya, N., and F.J. Damerau, eds. 2010. Handbook of Natural Language Processing, 2nd ed. Boca Raton: CRC Press.
Jakobson, R., L.R. Waugh, and M. Monville-Burston. 1990. On language. Cambridge, MA: Harvard University Press.
Jobin, A., M. Ienca, and E. Vayena. 2019. The global landscape of AI ethics guidelines. Nature Machine Intelligence 1: 389–399. https://doi.org/10.1038/s42256-019-0088-2.
Kilgarriff, A., V. Baisa, J. Bušta, M. Jakubíček, V. Kovář, J. Michelfeit, P. Rychlý, and V. Suchomel. 2014. The sketch engine: Ten years on. Lexicography 1(1): 7–36. http://www.sketchengine.eu.
Korteling, J.E., A.-M. Brouwer, and A. Toet. 2018. A neural network framework for cognitive bias. Frontiers in Psychology. https://doi.org/10.3389/fpsyg.2018.01561.
Mail Online. 2017. Is Google translate SEXIST? Users report biased results when translating gender-neutral languages into English. Mail Online, 1 December. https://www.dailymail.co.uk/sciencetech/article-5136607/Is-Google-Translate-SEXIST.html (accessed 13 May 2021).
Mittelstadt, B. 2019. Principles alone cannot guarantee ethical AI. Nature Machine Intelligence 1: 501–507. https://doi.org/10.1038/s42256-019-0114-4.
Nadkarni, P.M., L. Ohno-Machado, and W.W. Chapman. 2011. Natural language processing: An introduction. Journal of the American Medical Informatics Association 18(5): 544–551. https://doi.org/10.1136/amiajnl-2011-000464.
Nosek, B.A., M.R. Banaji, and A.G. Greenwald. 2002. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice 6(1): 101–115. https://doi.org/10.1037/1089-2699.6.1.101.
Olson, P. 2018. The algorithm that helped Google translate become sexist. Forbes, 15 February. https://www.forbes.com/sites/parmyolson/2018/02/15/the-algorithm-that-helped-google-translate-become-sexist/?sh=7e5e82c87daa (accessed 13 May 2021).
Prates, M.O., P.H. Avelar, and L.C. Lamb. 2019. Assessing gender bias in machine translation: A case study with Google Translate. Neural Computing and Applications 32: 6363–6381. https://doi.org/10.1007/s00521-019-04144-6.
Quah, C.K. 2006. Machine translation systems. In Translation and technology, Palgrave textbooks in translating and interpreting, 57–92. London: Palgrave Macmillan. https://doi.org/10.1057/9780230287105_4.
Reddy, S., and K. Knight. 2016. Obfuscating gender in social media writing. Proceedings of the 2016 EMNLP Workshop on Natural Language Processing and Computational Social Science, 17–26.
Rudinger, R., J. Naradowsky, B. Leonard, and B. Van Durme. 2018. Gender bias in coreference resolution. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2: 8–14.
Sap, M., D. Card, S. Gabriel, Y. Choi, and N.A. Smith. 2019. The risk of racial bias in hate speech detection. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1668–1678.
Sattelberg, W. 2021. The demographics of reddit: Who uses the site? Alphr, 6 April. https://www.alphr.com/demographics-reddit/ (accessed 25 May 2021).
Shah, D., H.A. Schwartz, and D. Hovy. 2020. Predictive biases in natural language processing models: A conceptual framework and overview. https://arxiv.org/pdf/1912.11078.pdf.
Siavoshi, M. 2020. The importance of natural language processing for non-English languages. Towards Data Science, 22 September. https://towardsdatascience.com/the-importance-of-natural-language-processing-for-non-english-languages-ada463697b9d (accessed 22 May 2021).
Swan, O. 2015. Polish gender, subgender, and quasi-gender. Journal of Slavic Linguistics 23(1): 83–122. https://www.jstor.org/stable/24602179.
Tomalin, M., B. Byrne, S. Concannon, D. Saunders, and S. Ullmann. 2021. The practical ethics of bias reduction in machine translation: Why domain adaptation is better than data debiasing. Ethics and Information Technology. https://doi.org/10.1007/s10676-021-09583-1.
Tsamados, A., N. Aggarwal, J. Cowls, J. Morley, H. Roberts, M. Taddeo, and L. Floridi. 2021. The ethics of algorithms: Key problems and solutions. AI & Society. https://doi.org/10.1007/s00146-021-01154-8.
UNESCO. 2020. Elaboration of a recommendation on the ethics of artificial intelligence. https://en.unesco.org/artificial-intelligence/ethics.
Wagner, C., D. Garcia, M. Jadidi, and M. Strohmaier. 2015. It’s a man’s Wikipedia? Assessing gender inequality in an online encyclopaedia. Ninth International AAAI Conference on Web and Social Media. https://arxiv.org/abs/1501.06307.
Webster, K., M. Recasens, V. Axelrod, and J. Baldridge. 2018. Mind the GAP: A balanced corpus of gendered ambiguous pronouns. https://arxiv.org/abs/1810.05201.
Wesslen, R., D. Markant, A. Karduni, and W. Dou. 2020. Using resource-rational analysis to understand cognitive biases in interactive data visualizations. IEEE VIS 2020 Workshop on Visualization Psychology (VisPsych). https://arxiv.org/abs/2009.13368v2.
Wikimedia Foundation. 2020. Addressing Wikipedia’s gender gap. https://wikimediafoundation.org/our-work/addressing-wikipedias-gender-gap/ (accessed 23 May 2021).
Yu, H., Z. Shen, C. Miao, C. Leung, V. R. Lesser, and Q. Yang. 2018. Building ethics into artificial intelligence. Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), 5527–5533. https://arxiv.org/abs/1812.02953.
Zhao, J., T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. https://arxiv.org/pdf/1707.09457.pdf.
Zhao, J., T. Wang, M. Yatskar, V. Ordonez, and K.-W. Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2: 15–20.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ullmann, S. (2022). Gender Bias in Machine Translation Systems. In: Hanemaayer, A. (eds) Artificial Intelligence and Its Discontents. Social and Cultural Studies of Robots and AI. Palgrave Macmillan, Cham. https://doi.org/10.1007/978-3-030-88615-8_7
DOI: https://doi.org/10.1007/978-3-030-88615-8_7
Publisher Name: Palgrave Macmillan, Cham
Print ISBN: 978-3-030-88614-1
Online ISBN: 978-3-030-88615-8
eBook Packages: Social Sciences, Social Sciences (R0)