English is no longer the language of the web

The internet sings in the full spectrum of languages.
Image: AP Photo/Fabrice Coffrini

Conventional wisdom suggests that English is becoming “the world’s second language,” a lingua franca that many forward-looking organizations are adopting as a working language. Optimists about the spread of English as a global second language suggest it will enable collaboration and ease problem solving without threatening the survival of mother tongues. Pointing to hundreds of thousands of Chinese children who learn English by shouting phrases back at teachers, the American entrepreneur Jay Walker offers the idea that English will be a language of economic opportunity for most speakers: they’ll work and think in their mother tongue, but English will allow them to communicate, share, and transact.

Cultural-preservation organizations like UNESCO aren’t as confident of this vision. They warn that English may crowd out less widely spoken languages as it spreads around the world through television, music, and film. But something more subtle and complicated appears to be going on. While English may be emerging as a bridge language, a wave of media is being produced in other languages, in newspapers, on television, and on the Internet. As technologies make it easier for people to communicate to broad and narrow audiences in their native languages, we’re discovering that linguistic difference is surprisingly persistent.

One way to consider the future of language in a connected world is to ask, “What percent of the Internet’s content is written in English?”

Look online for an answer to that query—posed in English—and you’re likely to encounter a website last updated in 2003, EnglishEnglish.com. The site’s “English Facts and Figures” page asserts that “80% of home pages on the Web are in English, while the next greatest, German, has only 4.5% and Japanese 3.1%.” The sources behind this assertion are unclear, but it’s consistent with early research on linguistic diversity online. In 1997, Geoffrey Nunberg and Hinrich Schütze released a study estimating that 80 percent of the World Wide Web’s content was in English. The Online Computer Library Center followed in 2003 with a study estimating that 72 percent of online content was in English.

These early studies led researchers to suggest that English had a “head start” that other languages would find difficult to overcome. With such a large user base of English speakers online, many websites would publish content only in English, and web users would adapt to monolingualism by improving their language skills, which in turn would increase the incentive to publish in English. Neil Gandal of Tel Aviv University analyzed web use in Quebec, Canada, in 2001 and observed that native French speakers spent 66 percent of their online time on English-language websites. Furthermore, young Quebecois looked at more English content than their elders, suggesting that language barriers would be even less relevant for a future generation of web users. Given that Francophone Quebecois were willing to read English content online, Gandal argued, website developers wouldn’t bother to localize their content, leading to a future with more sites entirely in English.

Both the 70–80 percent English “fact” and the head start theory have lingered despite evidence that the linguistic shape of the World Wide Web has changed dramatically in the past ten years as it expanded both in scale and in the number of authors creating content. One reason the “fact” persists is that it’s incredibly difficult to generate a believable estimate of language diversity online. Early studies tried to create a random sample of websites by choosing a selection of IP addresses, loading whatever page emerged, and using automated tools to determine what language it was written in. This method works poorly these days, when sites like Facebook, reached via a single IP address, include multilingual content generated by more than half a billion users. Newer methods rely on search engines to index the web, then attempt to estimate coverage of different languages on the basis of the comparative frequency of words in different languages.

Álvaro Blanco leads a team at FUNREDES (Foundation for Networks and Development), a Dominican Republic-based nonprofit organization focused on technology in the developing world, that has been researching linguistic diversity since 1996 by means of these new methods. Try your search query about English-language content online, posed in Spanish or most other Romance languages, and it’s his research that usually tops the search results. His team searches for “word concepts” in different languages, counting the results for “Monday” versus “Lunes” (Spanish) versus “Lundi” (French). In 1996, his research estimated that 80 percent of the content online was in English. That percentage fell steadily through successive experiments until 2005, when he estimated that 45 percent of online content was in English.
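The word-concept comparison described above can be illustrated with a rough sketch. It assumes we already have search-result counts for the same concept in several languages, and it simply averages each language's share of hits across concepts. The hit counts, the `HITS` table, and the `language_shares` function are all hypothetical, and the averaging is a simplification for illustration, not the FUNREDES methodology itself.

```python
from statistics import mean

# Hypothetical hit counts for illustration only; these are NOT FUNREDES data.
# Each key is a word concept; each value maps a language code to the number
# of search results for that concept's word in that language.
HITS = {
    "monday": {"en": 900, "es": 300, "fr": 300},  # "Monday" / "Lunes" / "Lundi"
    "water": {"en": 700, "es": 200, "fr": 100},   # "water" / "agua" / "eau"
}


def language_shares(hits):
    """Average each language's share of hits across all word concepts."""
    languages = sorted({lang for counts in hits.values() for lang in counts})
    per_concept = []
    for counts in hits.values():
        total = sum(counts.values())
        if total == 0:
            continue  # skip concepts that returned no results at all
        per_concept.append(
            {lang: counts.get(lang, 0) / total for lang in languages}
        )
    return {lang: mean(share[lang] for share in per_concept) for lang in languages}


if __name__ == "__main__":
    for lang, share in language_shares(HITS).items():
        print(f"{lang}: {share:.1%}")
```

Averaging across many concepts is meant to dampen the effect of any single word that happens to be unusually common, or ambiguous, in one language. A real study would also have to correct for words shared between languages and for uneven search-engine coverage.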

While Blanco’s research continues, he warns that search engines may no longer offer a representative sample of content online. “Twitter, Facebook, social networks—these are all difficult for search engines to index fully.” Blanco estimates that search engines now index less than 30 percent of the visible web, and suggests that the indexed subset skews toward English-language sites, often because those sites are the most profitable places to sell advertising. “My personal opinion is that English now represents less than 40% of online content,” Blanco offers, though he believes he’ll need to refine his methodology to prove his hunch.

Statistics about Internet usage show that growth has been much faster in countries where English is not the dominant language. In 1996, more than 80 percent of Internet users were native English speakers. By 2010, that percentage had dropped to 27.3 percent. While the number of English-speaking Internet users has almost trebled since 2000, twelve times as many people in China use the Internet now as in 1996. Growth is even more dramatic in the Arabic-speaking world, where twenty-five times as many people are online as in 1996.

But that’s not the most important shift. When Gandal predicted that Quebecois web users would get accustomed to using sites like Amazon.com in English, he didn’t realize that most web users in 2010 would be creating content as well as consuming it. More than half of China’s 450 million Internet users regularly use a social media platform, writing blog posts, posting updates on Renren (China’s Facebook equivalent) or status messages to Sina Weibo, a microblogging site similar to Twitter. And the vast majority of those updates are in Chinese, not English.

On a visit to Amman, Jordan, in July 2005, the high point for me was a leisurely dinner with a dozen Jordanian bloggers, whose websites I’d been following in the run-up to the trip. As we looked over the ancient stone houses of Jabal Amman from the terrace of the restaurant, our conversation bounced between English and Arabic. “You guys all speak Arabic as a first language—why do you all blog in English?” I asked. Ahmad Humeid, a talented designer and the proprietor of the 360 East blog, explained, “I want my perspectives on Jordan to be read around the world, which means I need to write in English. Besides, the people who only read Arabic aren’t reading blogs.”

Seven years on, Ahmad still blogs in English, but many newer Middle Eastern bloggers write primarily in Arabic. For multilingual web users, there’s a tipping point associated with language use. So long as most of your potential audience doesn’t speak your language, it makes sense to write in a second, more globally popular language. But once your compatriots have joined you online, the equation shifts. If you want to reach your friends, you may write to them in one language. If you want to engage a wider audience, you may use another. Haitham Sabbah, a passionate Jordanian Palestinian activist who served as Middle East editor for Global Voices from 2005 to 2007, now writes in English to criticize American and Israeli policy in the Middle East and in Arabic to critique Arab leaders, making those criticisms more opaque to international audiences. English is for engaging with a wide audience, while Arabic is a private language for disagreements he has with fellow Arabs, which he wants to keep “within the family.”

Gandal’s Quebecois research subjects may have read a lot of English-language content, but that doesn’t mean they preferred reading in a second language. While most of India’s 50 million Internet users speak English, a survey by the Indian market research company JuxtConsult revealed that almost three-quarters prefer and seek out content in their first languages. Cognizant of this preference, Google offers interfaces to its search engine in nine different Indian languages, and in over 120 languages in total. Given that 68 languages are spoken by at least 10 million speakers worldwide, other companies with global ambitions may be looking at developing Tagalog and Telugu interfaces in the near future.

When we began curating blog posts to publish on Global Voices, Rebecca and I realized we would need to address issues of language and translation. We hired editors fluent in French, Arabic, Russian, Chinese, and Spanish to translate conversations into English for publication on the site. In those early days, we never seriously considered publishing an edition other than in English, assuming that translating our work into other languages would be prohibitively expensive and that, since our community of editors and authors used English as a “working language,” everyone could read and appreciate our output.

Less than a year after we started the project, Portnoy Zheng, a Taiwanese university student, launched a Chinese edition of the Global Voices site. Taking advantage of the fact that Global Voices publishes using a Creative Commons license, Zheng and friends began selecting stories from Global Voices that caught their attention and posting Chinese translations on his website. After Portnoy accepted our offer to turn his site into an official Chinese edition of our site, hosted on our servers, Rebecca and I were flooded with requests to build other-language editions of Global Voices.

Why does it make sense to produce Global Voices in Malagasy, a language rarely spoken outside Madagascar, a country where only 1.5 percent of the population has access to the Internet? Our Malagasy contributors were worried that their language wouldn’t make the leap from the analog to the digital world. French, not Malagasy, is taught in schools, and French enjoys a higher prestige than Madagascar’s indigenous language. Our contributors were willing to do the work to publish the edition and help preserve the language. Though they personally were trilingual, they wanted to share their work, and the broader coverage of Global Voices, with friends and family who weren’t as comfortable reading English or French as they were.

Our Malagasy site is now read by a significant fraction of Madagascar’s online community and has inspired a new humility on the part of our editorial team regarding the importance of language. Translators, responsible for making our content accessible in more than thirty languages, now outnumber writers of original content for the site, and the translated editions, collectively, receive as much traffic as our English-language site.

In 2010, members of our community asked for an additional change to Global Voices: they wanted to publish original content in French, Spanish, and other languages besides English. This presents a challenge for our editorial team. While virtually everyone involved with the project speaks multiple languages, it’s hard for our editor in chief to take responsibility for posts in languages she does not speak. After a long debate, we reached a consensus, and now our multilingual newsroom translates stories written in over a dozen languages into English. This leads to uncomfortable moments: I sometimes glance at our servers and discover our most popular (often our most controversial) story is in a language I don’t read well, and I find myself waiting for our French-to-English translators to catch up so that I can understand what our team is publishing. But it’s clearly been the correct step to take. Our coverage of Francophone Africa is much stronger than in past years, because authors who can write easily in French can now compose in that language, then rely on a community of translators to make their work accessible in English.

Excerpted from Rewire: Digital Cosmopolitans in the Age of Connection, by Ethan Zuckerman. Copyright © 2013 by Ethan Zuckerman. With permission of the publisher, W.W. Norton & Company, Inc.