Can chatbots help choose the right doctor or hospital?

ChatGPT and Bard cannot provide reliable medical or treatment advice, but can they help patients select the right doctor or hospital? Surprisingly, both chatbots occasionally suggest individual physicians and hospitals, along with detailed provider information. However, the accuracy and usefulness of this information are difficult to assess. As I highlighted in a recent Forbes.com column, chatbots like Microsoft and OpenAI’s ChatGPT and Google’s Bard blend truth with “truthiness,” a term coined by comedian Stephen Colbert to describe information that feels true without evidence to support it.

For example, Bard confidently provided me with statistics on the number of knee replacement surgeries conducted by major Chicago hospitals in 2021, along with their infection rates and the national average. It even named the Chicago surgeon with the highest number of knee surgeries and provided their infection rate. When I inquired about heart bypass surgery, Bard furnished both the mortality rate for certain local hospitals and a national average.

While Bard occasionally cited sources like the American Hospital Association, the Centers for Medicare & Medicaid Services, and the Joint Commission, it also referred to its own knowledge. Aware that generative AI can produce false information, I fact-checked the data with these organizations, but none of them had the data attributed to them.

With chatbots, the wording of the query matters. When I asked Bard about the surgeon performing the most knee replacements in Chicago, it provided one name. However, when I broadened the question to encompass the “Chicago area,” Bard listed seven highly skilled and experienced surgeons with a history of successful outcomes and compassionate care.

Although Bard included cautionary language in its responses, such as acknowledging the risks associated with surgery, it still unequivocally recommended scheduling a consultation with one of the seven surgeons for knee replacement. On the other hand, ChatGPT did not make direct recommendations but did provide a list of four top knee replacement surgeons based on their expertise and patient outcomes.

The methods used by Bard and ChatGPT to generate these answers remain unknown. We are uncertain why they provide one, four, or seven names. To assess the validity of the physician recommendations, I conducted a basic test by asking Bard to recommend good restaurants in the Chicago area. The responses included reputable establishments, ranging from a Michelin Guide three-star restaurant in Lincoln Park to a gastropub renowned for its beer selection in the West Loop. Alternative phrasing yielded a list of nine restaurants, including four from the initial list, along with three well-regarded local pizza chains. ChatGPT produced ten restaurant names in response to the first query and eight in response to the second, with some overlap with Bard’s suggestions.

Ideally, these questionable chatbot responses would be addressed by surgeons and hospitals promptly disclosing accurate, standardized data on procedure volumes and complication rates. Until that transparency arrives (which may take a while), I believe it is essential that physicians advocate for collaborative efforts involving organizations like the American Medical Association and the American Hospital Association, with the inclusion of patient groups, to develop responsible solutions to this problem.

In the meantime, the public is left to wonder whether the doctors listed by chatbots as heart specialists are the equivalent of a satisfying pizza or of chefs who, with three Michelin stars, are among the best in the world.

Michael L. Millenson is president, Health Quality Advisors, LLC and can be reached on his self-titled site, Michael L. Millenson.