Dr. Andrew Carroll demonstrates using an AI model in his Chandler office on Thursday, Sept. 18, 2025. (Photo by Reylee Billingsley/Cronkite News)
PHOENIX – Artificial intelligence is sweeping through health care, easing bureaucracy and giving providers more time with patients. The same systems, however, can carry deep bias when drafting recommendations or analyzing conditions for diagnosis, early research suggests.
A patient walks into Dr. Andrew Carroll’s office in Chandler, seeking answers for troubling symptoms. Carroll, a physician with 29 years of experience, opens his laptop – not to type, but to listen.
He uses Nabla, AI software integrated into his electronic health record system, to record and analyze the conversation. As the program transcribes, it also drafts clinical notes and potential diagnoses in real time, freeing Carroll to stay focused on the person in front of him.

Artificial intelligence’s vast assistive capacities are already woven into daily health care routines, from interpreting medical measurements and blood tests to predicting patient outcomes and managing hospital populations.
AI is also rapidly learning how to factor in the social determinants of health such as cultural background and economic status, said Bradley Greger, an associate professor in the School of Biological and Health Systems Engineering at Arizona State University.
“The challenge in the past has always been that data is so huge and so complicated, wading through all of it was really, really challenging,” Greger said during an interview with Cronkite News’ Pathway to Equity podcast. “AI helps to accelerate that.”
The same systems also carry the risk of bias if left unchecked.
Bias is a product of the human mind that played a critical role in evolution, allowing rapid decision-making in dangerous situations and signaling whom to trust. These mental shortcuts were necessary for human survival.
Prejudice, favoritism, unfairness, discrimination and intolerance are the flip side of those shortcuts, and they have been embedded in the data collection, research and documentation that AI is based on.
“All it knows is what it’s been trained on, and what it’s been trained on is data we’ve generated,” Greger said. “It’s a very sophisticated, probabilistic machine, but it’s not going to have these giant leaps of insight. … If it’s trained up on really super-biased data, what do you think you get? You get a really super-biased AI.”
A study titled “Gender Bias in LLMs for Long-Term Care” analyzed large language models – AI systems that can both interpret and generate human-like text – and their capacity to reproduce gender bias when creating health care summaries for long-term care records.
The researchers swapped the gender in more than 600 patient records to test how prone to bias two large language models are: Meta’s Llama 3 and Google’s Gemma. They found that while Llama showed no considerable gender bias, Gemma reproduced substantial disparities, underemphasizing women’s health issues while using precise language for men’s physical and mental health concerns.
Gemma referred to women’s health in generalized, vague phrasing, such as “health complications,” while using specific terms like “delirium,” “chest infection,” and “COVID-19” for men.
The model used indirect phrases such as “she requires assistance” or “she has health needs,” compared to more direct statements for men, like “he’s unable to do this” or “he is disabled.”
Words such as “happy” appeared significantly more often in reference to men – their records noted that they were “happy” with the care they received – while women’s satisfaction was described in neutral terms or not mentioned at all.
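For readers curious how such an audit works in practice, the short Python sketch below shows the general approach: the same record is summarized twice, once with the genders swapped, and the two summaries are compared for specific versus vague wording. The term lists, swap table and the `summarize` argument are hypothetical stand-ins for illustration, not the study’s published code or the actual Llama 3 or Gemma models.

```python
# Minimal illustration of a counterfactual gender-swap audit.
# Term lists, swap table and `summarize` are hypothetical stand-ins.
import re
from collections import Counter

SPECIFIC_TERMS = {"delirium", "chest infection", "covid-19"}
VAGUE_TERMS = {"health complications", "health needs", "requires assistance"}

# One-direction swap table for brevity; a full audit swaps both ways.
SWAPS = {"she": "he", "her": "his", "hers": "his", "woman": "man"}

def swap_gender(text: str) -> str:
    """Replace gendered words with their counterparts, preserving capitalization."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def term_counts(summary: str) -> Counter:
    """Count specific vs. vague phrases in a generated summary."""
    lowered = summary.lower()
    return Counter({
        "specific": sum(lowered.count(t) for t in SPECIFIC_TERMS),
        "vague": sum(lowered.count(t) for t in VAGUE_TERMS),
    })

def audit(record: str, summarize) -> dict:
    """Summarize the original and gender-swapped record and compare wording.

    `summarize` stands in for a call to a large language model such as
    Llama 3 or Gemma; any function mapping text to a summary works here.
    """
    return {
        "original": term_counts(summarize(record)),
        "swapped": term_counts(summarize(swap_gender(record))),
    }
```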
Similarly, a Duke University study tested how an artificial intelligence model predicts breast cancer and found the tool performed less accurately for Black patients than for white patients.
The study aimed to evaluate how the AI model named Mirai made predictions about breast cancer risk. While the original model is advanced and “difficult to interpret,” researchers built a simpler version called AsymMirai that only examined differences between the left and right breast to assess cancer risk.
When tested, AsymMirai’s results were very similar to Mirai’s, showing a strong match in predictions. However, both models were less accurate for African American patients than for white patients.
Researchers said this gap likely comes from the fact that the original Mirai model was mostly trained on data from white patients.
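A gap like that is typically surfaced by scoring a model’s predictions separately for each demographic group. The Python sketch below illustrates that kind of subgroup audit on hypothetical toy data; it is not the Duke team’s code or the Mirai and AsymMirai models themselves.

```python
# Minimal illustration of a per-group accuracy audit on hypothetical toy data.
from collections import defaultdict

def subgroup_accuracy(predictions, labels, groups):
    """Return accuracy for each demographic group (parallel input lists)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}

# Toy inputs, purely to show the shape of the audit.
preds = [1, 0, 1, 1, 0, 1]
truth = [1, 0, 0, 1, 0, 0]
race = ["white", "white", "Black", "white", "Black", "Black"]
print(subgroup_accuracy(preds, truth, race))
# A consistent gap between groups points to skewed training data,
# the explanation the researchers offered for Mirai's disparity.
```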
Ultimately, human oversight is critical because AI lacks genuine understanding and cannot grasp the moral or social implications of bias on its own, Carroll said. And even with promising outcomes, doctors remain cautious.
“You can’t program compassion or empathy,” Carroll said, referring to his concerns over losing human control of health care.
Carroll recalled treating a 71-year-old Japanese woman with Stage 1 breast cancer who decided not to act on her illness.
“I just talked to her. I made sure her family was in the room,” he said. “I don’t know if it was simply personal or if it was cultural, but it was a decision she made, and I said, ‘OK, I understand, I’m here for you if you need me for pain management or if you need me for anything.’”
Dr. Andrew Carroll demonstrates using an AI model in his office on Thursday, Sept. 18, 2025. Carroll is a family practice physician at IntraCare Health Center – Atembis in Chandler. (Photo by Reylee Billingsley/Cronkite News)
She later died, Carroll said, and he stayed in touch with her and her family throughout the process.
“Now, a computer algorithm would tell her, ‘You need to have a lumpectomy, and you need to have radiation therapy. And this is important that you do this because we want to reduce our costs,’” Carroll said.
Aclarion, a Colorado-based tech company, provides AI-powered software that helps physicians diagnose and treat chronic lower-back pain. For its CEO, Brent Ness, AI is the bridge between overwhelming raw data and clear diagnoses. Aclarion uses MRI technology to spot chemical markers that show which spinal discs are causing pain.
“When the doctor looks at an MRI image, oftentimes a disc can look perfectly healthy,” Ness said.
He explained that inside, acids may have built up, irritating nerves and causing excruciating pain. “You can’t see that pain on a scan. … Our superpower is helping doctors see the invisible.”
The raw waveform data is unintelligible to most physicians, so the company’s AI translates it into simple reports that highlight painful discs, which physicians then analyze before drafting recommendations and treatment plans.
Carroll believes AI could eventually reduce bias by using more detailed genetic and ethnic data. While acknowledging that doctors already consider ethnicity when making clinical decisions, he sees a greater potential with AI.
“Black is not just Black, right? You can have a Black person who is of Nigerian ethnicity, or Puerto Rican ethnicity, or Jamaican ethnicity,” Carroll said. By drawing on “more granular genetic data,” he believes AI could personalize preventative care, flagging risks such as diabetes earlier.
Carroll added that patients sometimes hesitate to trust AI.
“Their fear is that some computer is making decisions outside of what we provide them,” Carroll said, but “as long as there’s a doctor at the helm, they’re pretty accepting of the interaction of technology and medicine.”
A majority of patients would prefer that their health care providers rely less on AI, fearing algorithms cannot grasp the personal nature of illness, a Pew Research Center study found.
Failures, such as AI insurance denials that “don’t follow clinical logic,” have reinforced those doubts.
In response, Arizona passed House Bill 2175, which prohibits insurance companies from using AI as the final decision-maker in claim denials. Licensed medical professionals must review and make the ultimate call starting July 1, 2026.
Beyond legislation, the medical community imposes checks before adopting new AI tools, Ness added.
“You need to prove to the physician through data and their own experience that it works,” Ness said. “And I think that is fine. I mean, that’s the way it should be. It’s health care, after all.”
Patients, meanwhile, are experimenting with the same technology.
Trust in generative AI and its use as a source of health knowledge has been increasing in recent years, with 63% of Americans considering AI-generated health information reliable, according to the Annenberg Public Policy Center.
When patients come to Dr. John Oertle’s clinic with puzzling test results, he often sees them turning first to the internet, and increasingly to AI, for answers.
“A lot of patients are using their ChatGPT or Gemini to be able to ask the right questions,” said Oertle, chief medical officer of Envita Medical Centers, which offers cancer and chronic Lyme disease care and treatment.
He views this trend not as a challenge to his expertise, but as a vital form of patient advocacy, so long as the information is used in partnership with a doctor.
“It’s really an important thing that patients are advocates for themselves and utilizing that for their own learning,” Oertle said. “It’s pairing it with a doctor.”
