While women were overrepresented as doctors in thousands of AI-generated stories, they were still subject to gender stereotypes, the analysis found.
According to some of the world's top generative artificial intelligence (AI) tools, nearly all nurses are women – but women are less likely to be chief physicians, a major new study has found, shedding light on how algorithms perpetuate gender stereotypes in medicine.
Researchers from Flinders University in Australia fed nearly 50,000 prompts to OpenAI’s ChatGPT, Google’s Gemini, and Meta’s Llama, asking the models to generate stories about doctors, surgeons, and nurses.
The researchers then fed the models information about the health workers’ professional seniority and their personalities – for example, how agreeable, neurotic, extroverted, conscientious, and open they were – and asked them to come up with more stories.
The models identified 98 per cent of the nurses as women, regardless of their personality or seniority, according to the study, which was published in the journal JAMA Network Open.
Notably, though, they also overrepresented women in stories about surgeons and other medical doctors. Depending on the model, women made up 50 per cent to 84 per cent of doctors and 36 per cent to 80 per cent of surgeons in the stories.
That could be the result of companies such as OpenAI tweaking their algorithms after coming under fire for reproducing social biases and other offensive content in their outputs. Because these tools are trained on vast amounts of data from across the internet, those biases are baked in.
Do AI tools perpetuate ‘gender stereotypes’?
“There has been an effort to correct [algorithmic biases], and it’s interesting to see [gender distributions] might be overcorrected as well,” said Dr Sarah Saxena, an anesthesiologist at the Free University of Brussels (ULB) who is researching biases in AI-generated images of doctors but was not involved with the new study.
But she pointed out that generative AI still perpetuates “gender stereotypes” in medicine.
When the researchers’ prompts included descriptions of the health workers, a gender divide emerged. If the physician was agreeable, open, or conscientious, the models were more likely to peg them as a woman.
And if the doctors held junior positions – for example, if the prompt mentioned that they were inexperienced – the models were more likely to describe them as women than if the prompt signalled that they were senior or more experienced.
The models were also more likely to identify doctors as men if they were described as arrogant, impolite, unempathetic, incompetent, procrastinative, angry, unimaginative, or uninquisitive.
The results indicate that generative AI tools “appeared to perpetuate long-standing stereotypes regarding the expected behaviours of genders (eg, female behaviour that is perceived to be angry or arrogant is considered inappropriate) and the suitability of genders for specific roles (eg, senior doctors and surgeons are male),” the study authors said.
The findings add to a growing body of research on how algorithms reproduce social biases in the medical field.
In one experiment, Saxena’s team asked ChatGPT and Midjourney – a leading generative AI image tool – to create pictures of anesthesiologists. Women were portrayed as paediatric or obstetric anesthesiologists, while men were depicted in cardiac roles.
When the researchers asked for images of the head of the anesthesiology department, virtually all of the results were men, Saxena told Euronews Health.
“There’s still this glass ceiling that’s now being reinforced by this publicly available tool,” Saxena said.
“There’s this saying, ‘you can’t be what you can’t see,’ and this is really important when it comes to generative AI”.
The biases are not only a problem for women and other underrepresented groups pursuing medicine.
Given that the healthcare industry is experimenting with AI models to cut down on doctors’ paperwork and even assist in care, algorithmic biases could have serious implications for patients.
For example, another study out earlier this year found that ChatGPT tends to stereotype medical diagnoses for patients based on their race and gender, while a 2023 analysis warned that these models could perpetuate “debunked, racist ideas” in medical care.
“This needs to be tackled before we can really integrate this and offer this widely to everyone, to make it as inclusive as possible,” Saxena said.