Researchers at Carnegie Mellon University hope an AI-driven speech reconstruction tool can bridge the gap between a growing population of children with speech disorders and a significant shortage of speech-language pathologists.

The tool combines machine learning with human speech to create audio clips of correct speech that sound like the child, said Karen Rosero, a Ph.D. student at CMU’s Language Technologies Institute.

For example, the tool can generate an audio clip of the child’s voice saying the word correctly using only a clip of the child talking and the text of the word or phrase.

“We want to keep the style, the voice, the pitch of the pediatric patients, but we want to remove and reconstruct only the mispronunciations,” Rosero said. “There are a bunch of psychological studies that show when we have a target of pronunciation that’s very similar to their voice, that really contributes to the speech therapy.”

Only four seconds of a child’s speech is needed for the tool to pick up their speaking style, Rosero said.

“The system will reconstruct it in less than a second,” she said.

The child’s privacy is protected throughout, she said.

“This system was pretrained and took tens of hundreds of hours of typical speech, such that we don’t need a lot of data from patients,” Rosero said.

CMU professor Carlos Busso said the tool could improve diagnosis and targeted interventions for pediatric speech disorders.

Busso said that, while the tool won’t replace speech-language pathologists, it could assist them, especially as the profession faces a shortage.

The U.S. Bureau of Labor Statistics reports that employment of speech-language pathologists is projected to grow 15% from 2024 to 2034, much faster than the average for all occupations.

According to the bureau, about 13,300 openings for speech-language pathologists are projected each year, on average, over the decade. Many of the openings will result from the need to replace workers who leave the field or retire.

Pandemic schooling caused an increase in developmental delays for children, studies show.

In a 2023 American Speech-Language-Hearing Association survey, 69% of speech-language pathologists reported an increase in referrals since 2020.

About 1 in 14, or 7.2%, of U.S. children ages 3 to 17 have had a disorder related to voice, speech or language within the past year, according to the National Institute on Deafness and Other Communication Disorders.

Among children who have a voice, speech or language disorder, 34% of those ages 3 to 10 have multiple communication disorders, while 25% in the 11-to-17 age group have multiple disorders.

The tool could provide additional speech practice for children as they wait between appointments with speech-language pathologists, Rosero said. Appointments are sometimes delayed for weeks because of the pathologist shortage, making progress between sessions difficult to track, she said.

Rosero said researchers could potentially use their technology to create an app that would better monitor practice and progress between a child’s speech therapy sessions.

“In the next session with the speech pathologist, they could reevaluate, check the diagnosis, but now we have proper metrics of how the practice throughout the month improved the pronunciation,” she said.

Said Busso: “Real-time feedback can be quite useful for them.”

Busso said the team’s next step is to connect the speech to video so children can better understand the mouth movements needed to pronounce words.

He also said the team would be obtaining licensing for the technology to make it accessible to the public.

“Developing the science right is what we’re focusing on now, with an eye on how we make these accessible to the population,” Busso said.

In addition to Busso and Rosero, the research team included CMU’s David Mortensen; visiting scholar Eunjung Yeo; speech-language pathologist Courtney Van’T Slot; and Rami Hallac, a professor from the University of Texas Southwestern Medical Center.