Gender Shades

Why This Matters

As more data is gathered from individuals and artificial intelligence (AI) is employed to make important decisions, we must continue to ask precisely who is benefiting. We must also be intentional about building inclusive and ethical AI. Only then can the benefits of AI be equitably distributed.

For example, in the last few years, advances in AI have renewed hope in the capacity of technology to help drive precision healthcare: the ability to deliver the right treatment to the right person at the right time seems ever more attainable. With enough health data, we can train algorithms to make precise medical decisions. Yet my work with the Algorithmic Justice League, along with mounting research studies (including Gender Shades), shows that artificial intelligence can be biased when its creators are not intentional about gathering inclusive data or neglect to evaluate results across diverse subgroups.

In her talk "His and Hers ... Healthcare," Dr. Paula Johnson demonstrates the perils of data collection and analysis that ignore sex differences in healthcare. I was surprised to learn that it wasn't until 1993 that clinical trials funded by the US government were required to include women. Even when women are included in clinical trials, sex differences are often overlooked. By paying attention to these differences, Dr. Johnson's work advances science and medicine for both men and women. Her talk reminds us to appreciate our differences instead of ignoring them.

Because algorithmic fairness rests on different contextual assumptions and accuracy optimizations, this work aimed to show why we need rigorous reporting of the performance metrics on which fairness debates center. The work focuses on increasing phenotypic and demographic representation in face datasets and algorithmic evaluation. Inclusive benchmark datasets and subgroup accuracy reports will be necessary to increase transparency and accountability in artificial intelligence. For human-centered computer vision, I define transparency as providing information on the demographic and phenotypic composition of training and benchmark datasets.
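To make this definition of transparency concrete, the sketch below tabulates the demographic and phenotypic composition of a face benchmark from a metadata table. This is a minimal sketch, not the project's actual tooling: the `gender` and `skin_type` columns (the latter could hold, e.g., Fitzpatrick skin-type categories) and the CSV layout are assumptions for illustration.

```python
# Minimal sketch of a dataset "transparency report": the share of benchmark
# images in each gender x skin-type subgroup. Column names and the CSV
# schema are hypothetical, assumed for illustration.
import pandas as pd

def composition_report(metadata_csv: str) -> pd.DataFrame:
    """Count and share of images per intersectional subgroup."""
    df = pd.read_csv(metadata_csv)
    counts = df.groupby(["gender", "skin_type"]).size().rename("count")
    report = counts.to_frame()
    report["share"] = report["count"] / report["count"].sum()
    return report.reset_index()

# Example: composition_report("benchmark_metadata.csv") might reveal that
# one subgroup, such as darker-skinned females, is severely underrepresented.
```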

I define accountability as reporting algorithmic performance on demographic and phenotypic subgroups and actively working to close performance gaps where they arise. Algorithmic transparency and accountability reach beyond technical reports and should include mechanisms for consent and redress, which we do not focus on here. Nonetheless, the findings from this work concerning benchmark representation and intersectional auditing provide empirical support for increased demographic and phenotypic transparency and accountability in artificial intelligence.
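To make the accountability definition concrete as well, the sketch below disaggregates a classifier's accuracy by intersectional subgroup, in the spirit of the Gender Shades audit. It is again a minimal sketch under assumed column names (`gender`, `skin_type`, `y_true`, `y_pred`), not the study's actual evaluation code.

```python
# Minimal sketch of an intersectional accuracy audit: per-subgroup accuracy
# plus each subgroup's gap to the best-served subgroup. The expected columns
# are assumptions for illustration.
import pandas as pd

def subgroup_accuracy(results: pd.DataFrame) -> pd.DataFrame:
    """Accuracy per gender x skin-type subgroup, sorted worst to best.

    Expects columns: gender, skin_type, y_true, y_pred.
    """
    results = results.assign(correct=results["y_true"] == results["y_pred"])
    acc = results.groupby(["gender", "skin_type"])["correct"].mean()
    audit = acc.rename("accuracy").to_frame().reset_index()
    audit["gap_to_best"] = audit["accuracy"].max() - audit["accuracy"]
    return audit.sort_values("accuracy")
```

Sorting subgroups from worst to best and reporting each one's gap to the best-served subgroup makes performance disparities visible at a glance, which is the prerequisite for the second half of accountability: actively working to close those gaps.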