
Gender Shades

Joy Buolamwini

Results

High-Level Benchmark Evaluation Results

High-Level Gender Classification Results

All classifiers perform better on male faces than on female faces (a difference in error rate of 8.1 to 20.6 percentage points)
All classifiers perform better on lighter faces than on darker faces (a difference in error rate of 11.8 to 19.2 percentage points)
All classifiers perform worst on darker female faces (error rates of 20.8% to 34.7%)
The Microsoft and IBM classifiers perform best on lighter male faces (error rates of 0.0% and 0.3%, respectively)
The Face++ classifier performs best on darker male faces (0.7% error rate)
The maximum difference in error rate between the best- and worst-classified groups is 34.4 percentage points
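The comparisons above rest on simple arithmetic: an error rate is computed per subgroup, and a "difference in error rate" is the gap between two such rates. The sketch below illustrates that arithmetic under stated assumptions. The counts are hypothetical toy numbers, not Gender Shades data, and the helper names (error_rate, pooled_rate, max_gap) are illustrative only, not part of any published tooling.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical (misclassified, total) counts per subgroup.
# Toy numbers for illustration only; NOT Gender Shades results.
COUNTS: Dict[str, Tuple[int, int]] = {
    "darker female": (30, 100),
    "darker male": (5, 100),
    "lighter female": (10, 100),
    "lighter male": (1, 100),
}


def error_rate(misclassified: int, total: int) -> float:
    """Error rate as a percentage of faces in a subgroup."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * misclassified / total


def pooled_rate(groups: Iterable[str]) -> float:
    """Error rate over several subgroups combined (e.g. all female faces)."""
    groups = list(groups)
    misclassified = sum(COUNTS[g][0] for g in groups)
    total = sum(COUNTS[g][1] for g in groups)
    return error_rate(misclassified, total)


def max_gap(rates: Dict[str, float]) -> Tuple[str, str, float]:
    """Return (best group, worst group, gap in percentage points)."""
    best = min(rates, key=rates.get)
    worst = max(rates, key=rates.get)
    return best, worst, rates[worst] - rates[best]


rates = {g: error_rate(m, n) for g, (m, n) in COUNTS.items()}
female = pooled_rate(["darker female", "lighter female"])
male = pooled_rate(["darker male", "lighter male"])
best, worst, gap = max_gap(rates)

print(f"female - male gap: {female - male:.1f} points")
print(f"best: {best}, worst: {worst}, gap: {gap:.1f} points")
```

With these toy counts the pooled female and male error rates are 20.0% and 3.0% (a 17.0-point gap), and the largest subgroup gap is 29.0 points, between lighter male and darker female faces. Reporting gaps in percentage points keeps them directly comparable to the per-group error rates listed above.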