Google’s New Dermatology App Wasn’t Designed for People With Darker Skin

The company trained the system to recognize different skin conditions. But like Google itself, the app's data has a diversity problem.

by Todd Feathers

May 20, 2021, 1:40pm

A cartoon of a person in a peach shirt holding up a smartphone to their hand

Image: Google

Google on Tuesday unveiled a dermatology app that it says can recognize 288 different skin conditions from pictures, and there’s something very Google about it. The deep learning system the app is based on was originally trained and tested on a dataset that—like the company itself—vastly underrepresents people with dark skin tones.

The app, according to the company’s announcement, grew out of a May 2020 study Google researchers published in the journal Nature Medicine in which they demonstrated the efficacy of using deep learning to recognize skin conditions.

To accomplish the task, the researchers used a training dataset of 64,837 images of 12,399 patients located in two states. But of the thousands of skin conditions pictured, only 3.5 percent came from patients with Fitzpatrick skin types V and VI, which represent brown skin and dark brown or black skin, respectively. Ninety percent of the database was composed of people with fair skin, darker white skin, or light brown skin, according to the study.

As a result of the biased sampling, dermatologists say the app could end up over- or under-diagnosing people who aren’t white.

“This kind of cavalier attitude that some in tech have had when it comes to health is not surprising. They’re rolling out things without necessarily thinking about the public health implications,” Dr. Ade Adamson, a dermatologist at the University of Texas’s Dell Medical School, told Motherboard. “There should have been some caution here. There should have been a prospective [bias and accuracy] study that they put out for us to look at.”

In its announcement, Google wrote that it had accounted “for factors like age, sex, race and skin types—from pale skin that does not tan to brown skin that rarely burns.” The dataset for the Nature paper was composed of 43.7 percent people who self-identified as Hispanic or Latinx, 34 percent White, 11 percent Asian, and 6.8 percent Black, according to the study.

Google then published another study in JAMA Network Open in April 2021. It compared the accuracy of the system’s diagnoses when used alone with the accuracy of general physicians and specialists diagnosing on their own, and with their accuracy when they used the system as an aid.

According to slides Google shared with Motherboard (from a presentation given after the publication of that paper), its deep learning system was 87.9 percent accurate at identifying skin conditions for Black patients, the highest of any ethnicity. Black patients also saw some of the largest gains in accuracy when specialists worked with Google’s tool rather than diagnosing on their own, according to the slides.

But those analyses were done using ethnicity, not Fitzpatrick skin types, which correspond to how dark a person’s skin is. And the JAMA study authors noted that their work contained important limitations, namely that Fitzpatrick skin type V (brown) was underrepresented in the data and type VI (dark brown or black) was completely absent from the dataset. As a result, the accuracy rates Google included in its slides would have excluded those for darker-skinned patients, regardless of their ethnicity.

A Google spokesperson told Motherboard that the entire field of dermatology suffers from a lack of data and images of non-white patients, and that accounting for that problem was at the forefront of researchers’ minds as they designed the app, which they plan to refine further before its public release.

“Our AI-powered dermatology assist tool is the culmination of more than three years of research,” Johnny Luu, the spokesperson for Google Health, wrote in an email to Motherboard. “Since our work was featured in Nature Medicine, we’ve continued to develop and refine our technology with the incorporation of additional datasets that includes data donated by thousands of people, and millions of more curated skin concern images.”

In his email, Luu also provided a statement attributed to Dr. Steven Waldren, chief medical informatics officer of the American Academy of Family Physicians, saying in part: “By augmenting primary care with a robust AI solution, we can potentially improve the quality of care that is provided [for skin conditions].”

The medical tech field has a history of racially biased products, from optical heart rate sensors to algorithms hospitals use to decide which patients need further care. And Google has a representation problem of its own—the company is only 3.7 percent Black, according to its 2020 diversity report, and many of those employees are in non-technical roles.

The problem has come to a head over the last year following the company’s firing of respected AI ethicist Timnit Gebru and the revelations that followed about its treatment of other Black employees.

The news prompted a major recruiting organization for graduates of historically Black colleges and universities (HBCUs) to end its partnership with Google. And recently, three organizations working to boost the careers of Black and queer tech workers announced they would stop accepting funding from the company.

Without a diverse body of researchers and executives, critics have warned, the company will end up making technology that hurts the communities who are left out.

“The low representation of Black people within this [dermatology app] dataset suggests it is optimized for white consumers,” Mutale Nkonde, a Stanford Digital Society Lab fellow and founder of AI for the People, told Motherboard. “This preferencing of white people seems to be inline with their treatment of Black people within the company.”

Google said that while building the dermatology app it took measures to “make sure we’re building for everyone.” It also clarified that while the app has received certification in Europe as a Class I medical device, users should not be using it to diagnose themselves.

Dr. Adamson said it’s naive to think that people won’t consider the suggested conditions as diagnoses or evidence that nothing is wrong, particularly when it comes to serious conditions like cancer. So if the app is widely used, and especially if its accuracy is poor on darker skin tones, it could lead to dangerous strains on public health systems.

“Google claims to turn up the sensitivity on things that are possibly skin cancers,” he said. “From a clinical perspective, all that’s going to do is crank up the false positives—the people who are told they have skin cancer when that’s not true—and they’re going to flood dermatologists’ offices.”

