What teaching a robot to recognise black women’s hair taught me about the future of AI

New, accessible platforms help under-represented groups create much-needed machine learning models.

Jan 21, 2020 (6 min read)

3 black women with various hairstyles — Image via Eloise Ambursley for Unsplash

A few weeks ago I set out to create the first machine learning model that can identify and classify the hairstyles of black women.
Here’s why I did it.

The bias of machine learning models against women and people of colour is so well documented that unpacking it here would be a poor use of your time. As humanity speeds headlong into a future where machine learning informs decisions that affect people’s lives on the road, at work, and at home, the authors of this technology are still not representative of the societies in which it will proliferate.

Even as recently as a few months ago, the US government’s top-performing general-application facial recognition systems were found to misidentify people of colour at rates five to ten times higher than they do white people. In fact, very little progress has been made in training machine learning models to accurately recognise the facial features of black women since Joy Buolamwini’s memorable dissection of the “coded gaze” in 2015.

It was with this context in mind that I decided to train a machine learning model on a dataset that exclusively serves black women: the least represented demographic in the field of machine learning. What began as an act of deliberate exclusivity for the sake of representation transformed into a process of exclusivity as a matter of survival.

An idea whose time has come

In an impassioned post on social media, a user of blackgirlhair.js shared her struggle to find relevant content on the internet. She recounted her experience, stating:

‘.. in order to collate and aggregate information on the internet which is ethnically linked to me, I have to input the words “black + woman” on the search. I know, tedious and quite honestly it just makes me so mad.’

Her story is a familiar one and speaks to a generation of internet users from the global south whose experience of algorithmic personalisation is that of being treated as second-class citizens on platforms that appear to cater seamlessly to their counterparts elsewhere around the world.

Africa is currently experiencing faster internet penetration growth than any other region, with 300 million new users expected to come online for the first time by 2025. Unfortunately, despite the strides technology firms are making to deliver a personalised internet of me, many of Sub-Saharan Africa’s new connectees will come online to experiences curated by hand-me-down algorithms, as the world’s disenfranchised continue to reap the unfamiliar fruits of a centralised internet.

This is where accessible machine learning platforms like Google’s Teachable Machine come in.

Teachable Machine 2.0

This is a tool made by the team at Creative Lab NYC, and it helps you train a computer to recognize your own images, sounds, & poses. Powered by TensorFlow.js, Teachable Machine presents an easy-to-use, no-frills web interface for non-technical people to quickly get up and running with machine learning.

What I love about platforms such as this is that they demystify the process of machine learning, allowing makers instead to focus their efforts on curating datasets and thereafter creating meaningful experiences. More importantly, I am excited at the prospect of African creators using data familiar to them to build new machine learning models that help solve pressing socio-economic problems.

An example that immediately jumps to mind is the field of dermatology and the opportunities such platforms present for medical practitioners to curate their own data and train much-needed models to identify potentially fatal skin conditions without supervision.
To understand why this is important, consider that African Americans, with a five-year survival rate of 73 percent, have the highest mortality rate for skin cancer according to the American Academy of Dermatology, yet only a few doctors practising in the United States are trained on that particular skin type. Additionally, machine learning models often lauded for their ability to accurately detect melanomas on fair skin are often ineffective at doing the same for darker-skinned patients.
With diligence and collaboration, small players across the region can easily curate critical community datasets and train models on them using tools like Teachable Machine, in line with the UNDP’s guidelines on using AI to help achieve Sustainable Development Goals.

Using Teachable Machine

Step 1: Data collection

The trick to any good machine learning model is the quality of the data. Teachable Machine makes the training process easy and the resultant model available for use immediately thereafter. This makes it very easy to validate the accuracy of your model and have a fair idea of what parts of your data need cleaning up.
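One way to get that "fair idea" of which data needs cleaning is to run the trained model on a handful of images it has not seen and tally accuracy per class. The following is a minimal sketch of that tally, not part of Teachable Machine itself; the class names and the held-out results are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict


def per_class_accuracy(results):
    """Return {class: fraction correct} from (true_label, predicted_label) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_label, predicted_label in results:
        totals[true_label] += 1
        if true_label == predicted_label:
            correct[true_label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical results noted down by hand while testing held-out images.
held_out = [
    ("afro", "afro"),
    ("afro", "braids"),
    ("braids", "braids"),
    ("locs", "locs"),
]

accuracy = per_class_accuracy(held_out)
# The class with the lowest score is the first folder worth re-checking.
weakest = min(accuracy, key=accuracy.get)
print(accuracy, weakest)
```

A low score for one class usually points back at that class's folder: mislabelled images, too few samples, or crops where the hairstyle is not the focal point.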

Some of the data samples used in the blackgirlhair.js model

For my image classification model I used Teachable Machine’s “Image Project.” To obtain hairstyle data I scraped popular image cataloguing websites for images of black women’s hairstyles and saved the images in corresponding folders on my desktop. I then sifted through each folder, checking for and removing mislabelled images.

Thereafter, I cropped each image to a 1:1 (square) format, making sure the hairstyle was the focal point of the image. Teachable Machine does this automatically for you, but an automatic crop will of course not always be accurate. It is also good practice to prepare your own datasets.
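For anyone who would rather script the square crop than do it by hand, the arithmetic is simple. The sketch below is an illustration under my own assumptions, not the method used for blackgirlhair.js: the function name and the optional focal-point parameters are hypothetical. It computes a square crop box centred on a chosen point and clamped to the image bounds.

```python
def square_crop_box(width, height, focus_x=None, focus_y=None):
    """Return (left, top, right, bottom) for the largest square crop.

    The square is centred on (focus_x, focus_y) where possible and
    shifted back inside the image when the focal point is near an edge.
    With no focal point given, the crop is centred on the image.
    """
    side = min(width, height)
    centre_x = width / 2 if focus_x is None else focus_x
    centre_y = height / 2 if focus_y is None else focus_y
    left = int(round(centre_x - side / 2))
    top = int(round(centre_y - side / 2))
    # Clamp so the square never extends past the image edges.
    left = max(0, min(left, width - side))
    top = max(0, min(top, height - side))
    return (left, top, left + side, top + side)


# A landscape photo, centre crop.
print(square_crop_box(800, 600))
# A portrait photo where the hairstyle sits near the top of the frame.
print(square_crop_box(600, 800, focus_y=100))
```

The returned tuple follows the (left, upper, right, lower) ordering that common image libraries such as Pillow expect for a crop box, so it can be handed to whichever tool actually does the cropping.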

Step 2: Model training

I uploaded my images to their corresponding class on Teachable Machine and clicked “Train Model.” Once that was done, my model was ready to export for use in my website or application.
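Once the model is exported, the application still has to decide what to do with its output. As a rough sketch, assuming the model yields one probability per class, the snippet below picks the most likely class and declines to answer when the model is unsure. The class names, the probabilities, and the 0.6 threshold are all hypothetical.

```python
def top_prediction(probabilities, threshold=0.6):
    """Return (label, score) for the most likely class.

    If the best score is below the threshold, the label is None so the
    application can say "not sure" instead of guessing.
    """
    label, score = max(probabilities.items(), key=lambda item: item[1])
    if score < threshold:
        return None, score
    return label, score


# Hypothetical model outputs for two images.
confident = {"afro": 0.10, "braids": 0.85, "locs": 0.05}
uncertain = {"afro": 0.40, "braids": 0.35, "locs": 0.25}

print(top_prediction(confident))
print(top_prediction(uncertain))
```

A threshold like this matters for a single-purpose model: shown an image that fits none of its classes, it will still spread its probabilities across the classes it knows.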

That’s it. It’s really easy.

Learnings

While Teachable Machine’s solution alone deserves mountains of praise, it is the questions it raised that made me sit up and pay attention.

2 young black women looking at a mobile phone — Image via Reginald Sebopela for Unsplash

For starters, what does it mean for underserved communities to finally have access to an intuitive means of machine learning model training and dataset curation? In this future, what does community dataset curation look like, especially when community is still geographically constrained?

As a fledgling technologist building tools for future Africa, I am excited at the prospect of decentralised algorithms, considering how “catch-all” machine learning models continue to fail those whose skin is a few shades shy of pale.
Practically speaking, when technology giants spread their transatlantic tentacles further past African shores, it is these localised models that will grant self-driving cars sight on African roads and train the ears of digital assistants on every Nguni click and Yoruba intonation.

Drawing my sights closer to the present, this foray into the world of GUI-enhanced machine learning also taught me that it is okay for a machine learning model to know how to do only one thing, as long as it does it well. History has taught us that general-application models do not scale well beyond the confines of an MIT lab and are barely future-proof.

Through the use of emerging democratic platforms such as Teachable Machine, oft-ignored communities can finally seize the means of production and chart a path towards an urgent and relevant technologically enhanced future.