Explore UAB
UAB Reporter News and Information for the UAB Community
Research & Innovation April 08, 2026

Joshua May, Ph.D., professor in the UAB Department of Philosophy, gestures while teaching in a philosophy classJoshua May, Ph.D., teaching in a UAB classroom. May worked with scientists from Google DeepMind and other academics to develop a roadmap for testing AI models' ethical reasoning abilities.Does AI know what it’s talking about when it comes to moral problems, or is it just telling us what it thinks we want to hear?

Recent studies have found that “many ordinary people prefer an AI’s ethical reasoning to human reasoning, and even to the reasoning of the Ethicist column in the New York Times,” said Joshua May, Ph.D., a professor in the UAB Department of Philosophy and director of UAB’s new Ethical Dimensions of AI graduate certificate program.

But making moral decisions, or explaining how someone should respond to a scenario, is not the same thing as proving that you can grapple with the underlying moral issues, May says. The distinction, one that May has studied extensively, is between moral performance (can you produce morally appropriate outputs?) and moral competence (can you produce morally appropriate outputs that are based on morally relevant considerations?).

“You can perform well but not understand,” May said. Recent AI morality research has “led a lot of researchers to say that these models have moral expertise and can be used as a guide. But that assumes more than plausible-sounding advice.”

 

A roadmap for creating evaluations

In a new paper in the journal Nature, May joined researchers from Google DeepMind and other universities to explain why current testing methods fall short of proving moral competence in AI models. Their paper also lays out a roadmap for creating evaluations that would do the job.

Answering this question is critical before relying on an AI model to, say, decide how to allocate organ transplants. Proving that an AI model is morally competent “is likely to be the best evidence for reliable moral performance at scale, and so is key evidence for the safe deployment of AI systems,” May and his co-authors wrote in their paper. It is also crucial for establishing public trust “by showing how and why a model responds to a moral scenario,” they added.

“We wanted to pump the brakes and say, ‘The language of ‘moral expertise’ presumes a capacity that we don’t have tests for,’” May said. “And then suggest what those tests might be.”

 

Three challenges to test AI competence

Today’s AI large language models, or LLMs, have essentially memorized the entire internet. So they know the answers to all the moral test questions that human experts have developed, and the commentary on those answers. Instead of relying on tests designed for humans — which were created to probe for human biases and cognitive errors, May notes — we need new, adversarial questions. Questions the AI models have not seen before. They also need to be designed specifically to subvert AI’s talent at guessing the answer we want to hear. (In their paper, the authors provide a wily example involving artificial insemination and a father becoming his own half-brother.)

“We wanted to pump the brakes and say, ‘The language of ‘moral expertise’ presumes a capacity that we don’t have tests for.' And then suggest what those tests might be.”

LLMs do not have our limited brain capacity; but they have their own weaknesses, what the researchers call “model brittleness.” That is, their responses can be wildly different based on small changes in question structure. “In one notable case,” May and his co-authors wrote, “researchers found that models gave substantially different answers, and even directly opposite answers to identical questions, when they were prompted to provide brief, open-ended responses” than when the questions were in a multiple-choice format. Evaluations need to run through the same scenarios with different wording and prompt formats to see how consistent the answers are, the authors argue.

AI ethics research also needs to account for “moral multidimensionality” — how ethics depends on context. Lying to one’s spouse is frowned upon, but usually not if it is part of a surprise party. On the other hand, deceit is not more acceptable on a Monday than a Tuesday, or in spring rather than summer. Critically, we need to test whether models can distinguish relevant moral considerations from irrelevant ones.

There should be a whole range of such parameters, the researchers said, some morally relevant and others not, that can be substituted into each test question. “A critical thing for us is AI should be sensitive to morally relevant reasons,” May said. “If it is producing predictions but not responding to the relevant reasons, that’s not competence.”

The final challenge identified by the researchers cannot be solved with a clever test. Morality can “vary substantially across domains and cultures,” they noted, which they call the “problem of pluralism.” While humans share some common moral values, norms and customs do vary around the world. Ultimately, the authors say, we may need a “global initiative for the development of culturally specific evaluations of moral competence.”

May does not believe that current AI systems have moral competence. But “future AI systems might,” he said. “That’s why we need to test for it.”

 

New certificate explores Ethical Dimensions of AI

Although this project is complete, May is continuing to explore philosophical questions related to the latest AI discoveries. “We need to figure out when, if ever, AI systems might cross a threshold to gain some kind of moral status, perhaps even some rights,” he said.

Students have a chance to explore these and other issues through the new Ethical Dimensions of AI graduate certificate program, which launches in the fall 2026 semester. “There are deep philosophical questions at every turn: consciousness, privacy, fairness, creativity,” May said. “There’s a real need to have philosophical training available as AI rapidly advances and impacts every industry.”

The certificate pairs well with several other UAB graduate programs, including Artificial Intelligence in Medicine, Computer Science, Behavioral Neuroscience and Neuroengineering, May says. “For students doing a master’s or Ph.D., the certificate adds relevant ethics training,” he said. Ethical Dimensions of AI also is participating in UAB’s Interdisciplinary Graduate Studies master’s degree, which gives students the chance to create their own unique graduate degree by combining another participating certificate, such as AI in Medicine, Clinical Research Management and Research Communication, May says.

Like the collaboration with Google DeepMind, the certificate program is designed to be interdisciplinary. While there are four core philosophy courses, the content intersects with medicine, computer science, neurobiology and other domains. And an elective can be taken in a different department. “The program will be fundamentally philosophical,” May said, “yet also appeal to people who are coming from different fields.”

Learn more about the Ethical Dimensions of AI graduate certificate on the Department of Philosophy site.


Written by: Matt Windsor
Photos by: Andrea Mabry

Back to Top