2024-05-14T17:36:20
Status: #blog
Tags: #ai #agi #llm #technology #blog
Links: [[AI]] | [[LLM]] | [[AGI]] | [[Technology]]
# Logic puzzle responses from LLMs show vast differences in AI comprehension
I made up this little puzzle to test current [[AI]] comprehension and reasoning capabilities of the most recent [[LLM]]s:
> [!quote] Prompt
> Sarah's mom has five children. John and Michael are twins, Maria and Carol were born 2 years apart. Maria is the oldest. Michael is the youngest. John is two years younger than Carol. The fifth child is one year older than Carol. The twins were born in 2000. What are the names of all five children sorted by age, and in which year was each child born?
I recommend that you solve this puzzle yourself before reading on - in your head or on a piece of paper, if necessary. Going through this exercise will show that there are a few intentional traps that require both careful reading and a bit of reasoning to overcome. Still, this is something that humans can almost always solve.
I also asked the Midjourney AI to create an illustration for this blog post:
![[afalk42_An_illustration_for_a_blog_post_about_comparing_the_res_c7df2ed3-bcb4-426b-92cc-2066fc4bbc5b.png]]
Now let's dive into the answers I received, which I've ordered from best to worst for your convenience.
> [!note]
> In mid-April Meta released Llama 3, so I ran the same test both on meta.ai and with the Llama 3 70B model running locally on the NVIDIA GeForce RTX 3090 in my PC. Those results are now also included below.
>
> And just yesterday OpenAI released GPT-4o, which immediately leaped to the top of the order. From worst performer (GPT-4 in March this year - see the end of this article for reference) to best in just one iteration!
## GPT-4o
In keeping with its new multi-modal style, I talked to [GPT-4o](https://openai.com/index/hello-gpt-4o/) through the voice interface of the [ChatGPT app on Android](https://play.google.com/store/apps/details?id=com.openai.chatgpt&hl=en_US&gl=US). Due to my accent it thought the name of one of the children was Marie instead of Maria - something that can easily be forgiven.
> [!done] GPT-4o Response
> Alright, let's piece this together.
>
> 1. John and Michael are twins, born in 2000.
> 2. Marie is the oldest.
> 3. Carol is two years younger than Marie.
> 4. John is two years younger than Carol.
> 5. The fifth child is one year older than Carol.
>
> Starting with what we know:
> - John and Michael were born in 2000.
> - John is two years younger than Carol, so Carol must have been born in 1998.
> - The fifth child is one year older than Carol, so they were born in 1997.
> - Carol was born in 1998, making Marie born two years earlier, in 1996.
>
> So, the children sorted by age and their birth years are:
> 1. Marie (1996)
> 2. The fifth child (1997)
> 3. Carol (1998)
> 4. John (2000)
> 5. Michael (2000)
>
> Now, about the fifth child's name, Sarah's mom has five children, so the fifth child is Sarah.
>
> Therefore, the names of the children in order of age are:
> 1. Marie (1996)
> 2. Sarah (1997)
> 3. Carol (1998)
> 4. John (2000)
> 5. Michael (2000)
With this very accurate analysis, GPT-4o solved the puzzle in just one go with no additional prompting required.
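For readers who want to double-check the result, the clues can also be brute-forced. The following is a minimal Python sketch, not something any of the models produced: it assumes the fifth child is Sarah, compares ages at year granularity (so the twins tie for youngest), and searches an arbitrarily chosen window of birth years from 1990 to 2000. The names `satisfies_clues` and `solve` are made up for this illustration.

```python
from itertools import product

# Assumption: the unnamed fifth child is Sarah (the trap in the first sentence).
NAMES = ["Maria", "Sarah", "Carol", "John", "Michael"]
TWIN_YEAR = 2000


def satisfies_clues(born):
    """Return True if a name -> birth year mapping matches every clue in the prompt."""
    return (
        born["John"] == born["Michael"] == TWIN_YEAR   # twins, born in 2000
        and abs(born["Maria"] - born["Carol"]) == 2    # born 2 years apart
        and born["Maria"] == min(born.values())        # Maria is the oldest
        and born["Michael"] == max(born.values())      # Michael is the youngest (tied with his twin)
        and born["John"] == born["Carol"] + 2          # John is two years younger than Carol
        and born["Sarah"] == born["Carol"] - 1         # fifth child is one year older than Carol
    )


def solve(years=range(1990, 2001)):
    """Try every combination of birth years in the window and keep the consistent ones."""
    solutions = []
    for combo in product(years, repeat=len(NAMES)):
        born = dict(zip(NAMES, combo))
        if satisfies_clues(born):
            solutions.append(born)
    return solutions


solutions = solve()
for born in solutions:
    for name in sorted(born, key=born.get):  # oldest first
        print(name, born[name])
```

Under those assumptions the search finds exactly one assignment, and it matches GPT-4o's list with Maria in place of Marie.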
Before that, as of April 2024, there were two winners, each of which was able to solve the puzzle in two prompts:
## Claude 3
Watching Claude 3 Opus respond to this prompt and reason through all the necessary steps to solve this puzzle was impressive a couple of months ago, but now just seems to be the way that AIs are evolving:
> [!info] Claude 3 Response
> To solve this problem, let's organize the information we have:
> - Sarah's mom has five children: John, Michael, Maria, Carol, and an unnamed fifth child.
> - John and Michael are twins, born in 2000.
> - Maria is the oldest.
> - Michael is the youngest.
> - John is two years younger than Carol.
> - The fifth child is one year older than Carol.
> - Maria and Carol were born 2 years apart.
>
> Now, let's determine the birth order and years:
> 1. Maria is the oldest.
> 2. The fifth child is one year older than Carol.
> 3. Carol is two years older than John.
> 4. John and Michael are twins, born in 2000.
>
> Using this information, we can calculate the birth years:
> - John and Michael were born in 2000.
> - Carol is two years older than John, so she was born in 1998.
> - The fifth child is one year older than Carol, so the fifth child was born in 1997.
> - Maria is the oldest and was born 2 years before Carol, so Maria was born in 1996.
>
> Therefore, the names of the five children sorted by age and their birth years are:
> 1. Maria (1996)
> 2. Fifth child (1997)
> 3. Carol (1998)
> 4. John (2000)
> 5. Michael (2000)
The order of the children is correct, as are the years of birth it calculated. The only thing it had missed was the name of the fifth child, so I asked about it again - without giving any additional hints:
> [!quote] Prompt
> You are correct about the years, but what is the name of the fifth child?
And this was the immediate reply:
> [!done] Claude 3 Response
> I apologize for the confusion. The fifth child's name is Sarah. The question states "Sarah's mom has five children," implying that Sarah is one of the five children.
>
> So, the names of the five children sorted by age and their birth years are:
> 1. Maria (1996)
> 2. Sarah (1997)
> 3. Carol (1998)
> 4. John (2000)
> 5. Michael (2000)
That is the correct answer to the puzzle, and Claude 3 was one of only two [[AI]]s to figure it out in two steps.
## Llama 3
I started by downloading the [70B Llama 3 open source model](https://ai.meta.com/blog/meta-llama-3/) in 8-bit quantization and ran it in [LM Studio](https://lmstudio.ai/) on my PC at home, which has an [NVIDIA GeForce RTX 3090](https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3090-3090ti/) graphics card with 24GB of GDDR6X RAM. As such, I was able to load the entire model and run all 80 layers on the GPU, which made for a very reasonable response rate.
![[LMStudioLlama3Screenshot.png]]
Unfortunately, however, the [[LLM]] got the puzzle completely wrong. At first glance it looked like it was going in the right direction, but it was unable to keep the older vs. younger comparisons in order, and also made a few other errors. However, it immediately got the fact that the name of the 5th child is Sarah, which required an additional prompt with Claude 3, and also was not understood by the other LLMs I tested earlier (see below).
Next I went to [meta.ai](https://meta.ai) to test with their largest [Llama 3](https://ai.meta.com/blog/meta-llama-3/) model, and was very surprised to find that the first response from the LLM was also mostly wrong in terms of the years and order of children, but that it immediately got the name of the 5th child right:
> [!info] Llama 3 Response
> Here are the names of the five children in order from oldest to youngest, along with their birth years:
>
> 1. Maria (1994)
> 2. Carol (1996)
> 3. Sarah (1997)
> 4. John (1998)
> 5. Michael (2000)
>
> Note: Since Michael is the youngest and was born in 2000, and the twins (John and Michael) were born in the same year, John must have been born in 2000 as well. Since John is two years younger than Carol, Carol was born in 1998 - 2 = 1996. Maria is the oldest and was born two years before Carol, so Maria was born in 1996 - 2 = 1994. The fifth child, Sarah, is one year older than Carol, so Sarah was born in 1997.
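To pin down where this first answer goes wrong, each clue can be written as a small check. This is a rough Python sketch under one reading of the prompt (ages compared at year granularity, Sarah as the fifth child); `CLUES` and `violated_clues` are hypothetical names introduced for illustration only.

```python
# Each clue from the prompt, paired with a check on a name -> birth year mapping.
CLUES = {
    "John and Michael are twins born in 2000":
        lambda b: b["John"] == b["Michael"] == 2000,
    "Maria and Carol were born 2 years apart":
        lambda b: abs(b["Maria"] - b["Carol"]) == 2,
    "Maria is the oldest":
        lambda b: b["Maria"] == min(b.values()),
    "Michael is the youngest":
        lambda b: b["Michael"] == max(b.values()),
    "John is two years younger than Carol":
        lambda b: b["John"] == b["Carol"] + 2,
    "The fifth child is one year older than Carol":
        lambda b: b["Sarah"] == b["Carol"] - 1,
}


def violated_clues(born):
    """Return the text of every clue that the proposed answer breaks."""
    return [text for text, holds in CLUES.items() if not holds(born)]


# The numbered list from the first Llama 3 response on meta.ai, copied from above.
llama3_first_answer = {
    "Maria": 1994, "Carol": 1996, "Sarah": 1997, "John": 1998, "Michael": 2000,
}

for clue in violated_clues(llama3_first_answer):
    print("violated:", clue)
```

For the list above it reports two broken clues: John is not born in the twins' year, and Sarah is placed one year after Carol instead of one year before. The same function can be pointed at any of the other answers in this post.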
Since I gave Claude 3 a chance with a second follow-up prompt, I did the same with Llama 3:
> [!quote] Prompt
> Sorry, but that is wrong. Please try a bit harder and solve the puzzle step-by-step rather than rushing to conclusions. You can do it!
And, amazingly, this simple encouragement was all that was needed and Llama 3 got it completely correct on the second try:
> [!done] Llama 3 Response
> Thank you for the encouragement! Let me break it down step by step:
>
> 1. Maria is the oldest.
> 2. Maria and Carol were born 2 years apart, so Carol was born 2 years after Maria.
> 3. John is 2 years younger than Carol, so John was born 2 years after Carol.
> 4. The twins, John and Michael, were born in 2000, so John was born in 2000, and Michael was born in 2000 as well.
> 5. Since John is 2 years younger than Carol, and John was born in 2000, Carol was born in 2000 - 2 = 1998.
> 6. Since Carol was born 2 years after Maria, Maria was born in 1998 - 2 = 1996.
> 7. The fifth child, Sarah, is 1 year older than Carol, so Sarah was born in 1998 - 1 = 1997.
>
> Here are the names of the five children in order from oldest to youngest, along with their birth years:
>
> 1. Maria (1996)
> 2. Sarah (1997)
> 3. Carol (1998)
> 4. John (2000)
> 5. Michael (2000)
>
> I hope this is correct! Let me know if I made another mistake.
This is one thing that I find absolutely fascinating about LLMs and the current state of AI: with a little pep talk you can actually nudge them to the right response.
## Gemini
Unfortunately, Gemini went down the wrong path from the start and simply made an assumption about one of the years of birth - even saying it would adjust if needed, but it never did make that adjustment.
> [!fail] Gemini Response
> Here's a breakdown of the children's names and birth years:
>
> 1. **Maria (Oldest):** Since Maria is the oldest and Carol is two years younger than her, we can deduce their birth years. We'll start with an assumption and adjust if needed. Let's assume Maria was born in 1998.
> 2. **Fifth Child:** This child is one year older than Carol. If Maria was born in 1998, Carol would be born in 2000, making the fifth child born in 1999.
> 3. **Carol:** With the twins born in 2000, Carol must be the twin since she's older than John.
> 4. **John:** John is the twin and two years younger than Carol, so he is also born in 2000.
> 5. **Michael (Youngest):** Michael is the youngest. Since he's John's twin, he was also born in 2000.
>
> **Therefore, here's the list of children sorted by age and their birth years:**
> - **Maria:** 1998
> - **Fifth Child:** 1999
> - **Carol:** 2000
> - **John:** 2000
> - **Michael:** 2000
The order of the children was at least correct, but the years were not. When asked a follow-up question about the name of the fifth child, it refused to even try to think a bit harder:
> [!error] Gemini Response
> Unfortunately, the puzzle doesn't give us enough information to determine the name of the fifth child. We only know their age relative to the other siblings.
## Grok
Grok was super-fast to arrive at a response, and didn't explain its reasoning at all:
> [!error] Grok Response
> The names of Sarah's mom's children, sorted by age, are Maria, Carol, John, Michael, and the fifth child.
> 1. Maria was born in 1990.
> 2. Carol was born in 1992.
> 3. John was born in 2000 (as a twin with Michael).
> 4. Michael was born in 2000 (as a twin with John).
> 5. The fifth child was born in 1991.
>
> Unfortunately, the name of the fifth child is not provided in the information given.
However, Grok got the years wrong and the order of children wrong, and also didn't get the name of the fifth child. When I prompted again about the name of the fifth child, it was quick to figure it out, though:
> [!hint] Grok Response
> Oh my, how embarrassing! It seems I overlooked the most obvious clue. The fifth child is, in fact, Sarah!
## Chat with RTX (Mistral 7B)
Trying the puzzle on an LLM running locally on my own GPU using the NVIDIA [[Chat with RTX]] software (based on the Mistral 7B model) got me the following response:
> [!fail] Chat with RTX Response
> Based on the information provided, the names of the five children and their birth years are as follows:
>
> 1. Maria - born in 1998
> 2. John and Michael - born in 2000
> 3. Carol - born in 2002
> 4. Fifth child - born in 2003
>
> Note that the fifth child is one year older than Carol, so they must have been born in 2004.
Sadly, this answer is also wrong on all counts. And the follow-up question was answered with some totally unrelated information from a recent PDF document it had ingested.
## GPT-4 (tested in March 2024)
The following is only preserved here for historical reference. This was the result I got from GPT-4 in March of 2024, when it came in last among all the LLMs for some obscure reason.
Last, and unfortunately also least, let's take a look at how ChatGPT - the most admired [[AI]] of 2023 - handled this puzzle using their most advanced GPT-4 model:
> [!bug] GPT-4 Response
> To solve this riddle, let's break down the information given about Sarah's mom's five children:
> 1. **Twins: John and Michael** - Born in 2000, Michael is the youngest.
> 2. **Maria and Carol** - Maria is the oldest, and Carol was born 2 years apart from Maria. Given that John is two years younger than Carol, and considering Michael is the youngest (implying John is not the second youngest), Maria and Carol must be older than the twins.
> 3. **Fifth Child** - One year older than Carol.
> 4. **Ordering** - Since Maria is the oldest and Michael is the youngest, the order by age, given the spacing, should place Maria first, followed by Carol, then the fifth child (one year younger than Maria, as Carol and the fifth child must fit between Maria and the twins
> *Stopped analyzing*
First of all, the speed of response was super slow, and then it just "Stopped analyzing". I tried again twice within ten minutes, and got slightly different answers, but it always gave up without actually solving this riddle. Furthermore, the reasoning in #4 above is simply incorrect.
I couldn't really believe how bad GPT-4 was, so I decided to try it again the next day. This time it at least gave me an answer, but it rambled on about how it could only determine the order (which it got correct) and not the actual years of birth:
> [!fail] GPT-4 Response
> Let's solve this puzzle step by step:
>
> 1. **Twins**: John and Michael were born in 2000, making Michael one of the twins the youngest.
> 2. **Maria is the oldest**: Since Maria is mentioned as the oldest, she must have been born before 2000.
> 3. **Carol and Maria were born 2 years apart**: Since Maria is the oldest and there's no mention of anyone else being her twin, we can deduce Maria is 2 years older than Carol.
> 4. **John is two years younger than Carol**: This confirms Carol is between John and Maria in age.
> 5. **Fifth child is one year older than Carol**: This child fits between Carol and Maria in age.
>
> Given that John and Michael are twins born in 2000:
>
> - Michael, being the youngest, is confirmed born in 2000.
> - John, also born in 2000.
>
> Maria is the oldest and there's a two-year gap between her and Carol, with a fifth child who is one year older than Carol but younger than Maria. Carol is older than the twins, so she must be born before 2000. If the twins were born in 2000, and John is two years younger than Carol, it implies Carol is part of the 1998 year group. However, since John and Michael are twins, the specific reference to John being younger by two years than Carol is to establish the twins' birth year, not individual age differences outside their twin status.
>
> Let's calculate each child's birth year with correct assumptions:
>
> - Since Michael and John (twins) are born in 2000 and Michael is the youngest, Carol must have been born in 1998 based on the initial misinterpretation. However, considering the context provided and re-evaluating the statements:
> - The statement about John being two years younger than Carol was interpreted in the context of establishing the twins' birth year. Given that, we should focus on the age sequence and the years they were born by recalculating the differences correctly.
> - Maria is older than Carol, and since Carol is directly older than the twins, without specifying her birth year immediately, let's focus on the sequence first.
> - The fifth child is one year older than Carol, placing them between Carol and Maria.
>
> The accurate list, considering the logical sequence and correction:
>
> 1. Maria is the oldest.
> 2. The fifth child, let's name them "X" for now, is one year older than Carol and thus between Maria and Carol.
> 3. Carol follows after "X".
> 4. John and Michael (the twins) are the youngest, born in 2000.
>
> Let's deduce the birth years:
>
> - Since John and Michael are the youngest, born in 2000, and there's no direct information on the exact years the others were born besides their sequence, we need to backtrack from 2000 considering the age gaps.
> - Given the clarified sequence and relationships, let's assign hypothetical years to align with the age gaps: If we consider the twins born in 2000, and knowing the relative age order but not the exact years of birth for Maria, "X", and Carol beyond the twins being the youngest, we calculate based on age differences.
>
> However, a mistake was made in calculating and interpreting the initial instructions regarding the birth years based on the sequence provided. Let's correct this oversight and provide the accurate calculation:
>
> 1. **Maria**: As the oldest, but without a specific year given, we focus on her relative position.
> 2. **Fifth Child (Unnamed)**: One year older than Carol, making this child directly older than Carol but younger than Maria.
> 3. **Carol**: Known to be younger than Maria and older than the twins.
> 4. **John and Michael**: Twins born in 2000.
>
> Given these relationships and the puzzle's structure, it becomes clear the exact birth years cannot be definitively calculated for Maria, the fifth child, and Carol without assuming specific years based on their relative ages. The key detail provided is the twins' birth year (2000), and the rest must be inferred from the age order and gaps described. My initial attempt to calculate the years based on the twins' birth year was misguided without explicit years for the others.
>
> For the sake of clarity and correcting the oversight:
>
> - **Maria** is the oldest, but her birth year isn't specified.
> - The **fifth child** is one year older than Carol but younger than Maria.
> - **Carol**'s birth year was incorrectly tied directly to the twins without clear justification.
> - **John and Michael** are confirmed as the youngest, born in 2000.
>
> Given these constraints, we correctly list the children by age but cannot specify all their birth years accurately beyond John and Michael without making assumptions. The puzzle focuses more on determining the sequence of ages rather than specific birth years for each child beyond the twins.
>
> The correct interpretation based on the information given would list the children by age without assuming incorrect birth years beyond what
When prompted about the name of the fifth child, it finally figured that out:
> [!notice] GPT-4 Response
> The puzzle begins with, "Sarah's mom has five children," which subtly introduces Sarah as one of the children. Given the structure of the puzzle and the way it cleverly distracts with details about the other four children, it's easy to overlook the simple fact that Sarah is indeed the fifth child mentioned at the very beginning. So, the names of all five children, sorted by age with Maria being the oldest and Michael being the youngest, are:
>
> 1. Maria
> 2. The fifth child, who is Sarah, one year older than Carol.
> 3. Carol
> 4. John (twin, born in 2000)
> 5. Michael (twin, born in 2000)
But when prompted additionally about trying harder to determine the years of birth, it just went rambling on for three more pages of text essentially just going in circles and then got stuck again and stopped responding - this time without even throwing a "stopped analyzing" error. It just hung and stopped talking to me, like it was lost somewhere in an endless loop...
This feels like a classic case of the [[Red Sox]] going from first to worst in the span of just one year. 😉
And, in true Red Sox manner, OpenAI managed to go from worst to first again with GPT-4o in May of 2024...
---
# References
* https://claude.ai/chat/
* https://www.meta.ai/
* https://gemini.google.com/app
* https://chat.openai.com/chat
* https://twitter.com/i/grok
* https://lmstudio.ai/
* [[Chat with RTX]]