An AI Empathy Test: Which Model Understands Us Best?

Every two years, a few long-time college friends and I have a week-long reunion that’s as joyous as it is challenging to plan. Deciding on dates alone is hard enough, and that’s just the beginning.

Despite having tested a number of group travel planning apps over the years, I’ve yet to find one that’s up to the task. This persistent struggle inspired me to do a quick experiment to see if any of the latest large language models (LLMs) could genuinely understand and empathize with my travel planning woes.

Oh no, not another onboarding quiz.

Armed with firsthand frustrations, I set out to test five of the leading LLMs. The goal was simple: find out which AI could most effectively mirror the pain points I was experiencing and offer solutions to address them. This experiment wasn’t just about finding the perfect app; it was about discovering which LLM could best demonstrate that it understands user needs and frustrations.

And to state the obvious, this is by no means a formal scientific study. It’s certainly sprinkled with my personal biases, but the findings are interesting nonetheless.

Experiment setup

I included these five LLMs, all on the free tier with no custom settings: ChatGPT 3.5, Perplexity, Claude, Gemini, and Copilot.

I used a chain of two prompts. The first asked for the creation of the user journey:

Map out the user journey for a professional woman searching the iOS App Store for an app that enables her and her college friends to coordinate travel itineraries and plan their upcoming group holiday in Europe.

The second prompt was sent once the response to the first was complete, and it asked for the output to be formatted with specific column titles for easy comparison:

You're a senior user experience designer at a top product design company who excels at creating simple and delightful experiences for users.

A user needs help finding and downloading apps from the iOS App Store. They are in their 30-40, and interested in group travel planning that'll make it easier to coordinate their busy schedules and make the most out of the time they have together. They are looking for recommendations that meet these needs and most of all, easy to use. 

For this task, you are asked to focus on increasing downloads, user satisfaction, and user engagement. Thoroughly document the steps and user experience to find, download, and use an app on an iPhone phone. Be mindful of how the user is feeling and what they are thinking at each step. Don't stick to what exists today and give fun, exciting opportunities for how to improve at each step. Provide creative ideas that will delight users! Have fun!

Format these steps into a table with columns for Journey Phases, Actions, Pain Points, Mindset, Opportunities, How. Bold the Journey Phases and make the Pain points succinct. The How column describes in detail how to accomplish the opportunities that are identified
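
For anyone curious about reproducing this two-prompt chain outside the chat interfaces, here is a minimal, hypothetical sketch using the OpenAI Python client. The actual experiment was run by pasting the prompts into each model’s free web UI, so the client, model name, and API setup below are assumptions for illustration only.

  # Hypothetical sketch: chaining the two prompts with the OpenAI Python client.
  # The real experiment pasted the prompts into each model's free web interface,
  # so the client, model name, and API key setup here are all assumptions.
  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  journey_prompt = "Map out the user journey for a professional woman..."  # first prompt, abridged
  table_prompt = "You're a senior user experience designer at a top..."    # second prompt, abridged

  # Turn 1: ask for the user journey.
  messages = [{"role": "user", "content": journey_prompt}]
  first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

  # Turn 2: keep the first answer in context, then ask for the formatted table.
  messages.append({"role": "assistant", "content": first.choices[0].message.content})
  messages.append({"role": "user", "content": table_prompt})
  second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)

  print(second.choices[0].message.content)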

User Journey Insights

User journey responses

I graded responses using these four criteria on a scale of 1 (worst) to 5 (best).

  • Clarity: Were the steps easy to understand and in a logical order?
  • Detail: Did each step capture a sufficient level of detail?
  • Realism: How well did it reflect actual user behaviors and challenges?
  • User Engagement: How well did it maintain interest from start to finish?

Copilot excelled across all criteria, delivering a user journey that was clear and detailed and that realistically mirrored the actual challenges and engagement needs of users. Claude, in contrast, struggled on all fronts.

UX Design Recommendations

UX recommendation responses

For the UX recommendations, I again used a scale from 1 to 5 to rate:

  • Innovativeness: How original and creative were the proposed solutions?
  • User-Focus: How well did the solutions address pain points?
  • Feasibility: How practical were the suggestions?
  • Scope: Did it cover all aspects of the journey/UX?

The winner? Copilot stood out again for its clear, logical steps that guided users seamlessly through each phase, reflecting a realistic and engaging journey closely aligned with actual user behaviors and needs.

After scoring the results myself, I shared all the responses with ChatGPT 4 for a second opinion. Interestingly, ChatGPT 4 also recognized Copilot as the standout choice, aligning with my initial findings. This casual validation from a newer, more advanced model added a fun twist: it seems that even AI recognizes when AI ‘gets’ empathy.

The winner…for today

This casual experiment, informed by my own personal search for a group travel planning app, highlights just one way of using AI to help solve real-world challenges, biases included.

I think one of the key takeaways here is the ‘garbage in, garbage out’ principle: our results are only as good as the inputs we provide. Crafting precise prompts is the essential skill to keep honing. In this case, Copilot was the clear winner, but as our inputs improve and LLMs evolve, the results could well be different next time.
