Richard Goh, a digital strategy and innovation practitioner, uses Anthropic Claude 2.0 to tackle some of the most challenging PSLE mathematics questions, exploring what a large language model can and cannot do as an AI tutor.
Highlights
- Who is Richard Goh?
- Exploring the Role of AI in Education: A Case Study with Claude 2.0
Who is Richard Goh?
Richard Goh is a professional with expertise in Business Transformation, Digital Strategy, and Innovation Management. He has a proven track record in driving strategic initiatives and managing complex transformation projects. Additionally, Richard Goh serves as an AWS Generative AI Ambassador, showcasing his knowledge and experience in artificial intelligence.
This article walks through tough PSLE Mathematics questions posed to Anthropic Claude 2.0, highlighting Richard Goh’s insights and observations.
Exploring the Role of AI in Education: A Case Study with Claude 2.0
AI and Education
I’ve recently been working on some demonstrations for education using AI. Education is one of the new frontiers for AI, specifically generative AI. AI in education is a much-discussed topic, and pedagogy and education professionals are still debating many aspects of it. I am not one of them, but having been a Primary School Leaving Examination (PSLE) parent before, I had always thought that an AI tutor for mathematics would have been a great idea.
Richard Goh: The Potential of AI in Mathematics Education
Mathematics is a subject of logic: to be good at it, one needs not only to understand the underlying logical thinking but also to practice reasoning through problems. Often, it requires internalizing an idea very similar to chain of thought (CoT), the technique we use when prompting a large language model (LLM). I was curious whether I could get it to work on some of the toughest mathematics questions from the PSLE, as ranked by netizens. Setting the context for the model is a key step in getting an LLM to respond according to your required scenario.
Context Setting for Claude 2.0
I instructed Claude 2.0 to discuss mathematics with students using the Socratic method, which involves asking students questions that lead them to derive the answer themselves. To keep the conversation on topic, I requested that it refuse to answer questions not related to mathematics. Instead of directly providing solutions or answers, we want the model to provide guidance that leads to the answers; this is part of learning, not getting answers but learning how to get to them. There are instructions for it to go step by step, explaining what is needed to reach each answer. Of course, we always ask it to be encouraging and motivating.
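Below is a minimal sketch of what this kind of context setting could look like in code, assuming the legacy Anthropic completions API that Claude 2.0 shipped with. The tutor instructions and the ask_tutor helper are illustrative assumptions, not the author’s actual prompt.

```python
# A minimal sketch of the context setting described above, assuming the
# legacy Anthropic completions API used with Claude 2.0. The tutor
# instructions below are illustrative, not the author's actual prompt.
from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TUTOR_CONTEXT = (
    "You are a Socratic mathematics tutor for primary school students. "
    "Never give the final answer directly. Ask one guiding question at a "
    "time and explain each step needed to reach the answer. Politely "
    "refuse any question that is not about mathematics. Always be "
    "encouraging and motivating."
)

def ask_tutor(student_message: str) -> str:
    """Send one student turn to Claude 2.0 with the tutor context prepended."""
    completion = client.completions.create(
        model="claude-2.0",
        max_tokens_to_sample=500,
        prompt=f"{HUMAN_PROMPT} {TUTOR_CONTEXT}\n\nStudent: {student_message}{AI_PROMPT}",
    )
    return completion.completion
```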
Testing with 2017 Tough PSLE Question
In 2017, one PSLE question ruffled some parents, and the students taking it found it tricky. It involved working out the number of rolls of ribbon someone called Jess needed. Claude v2 understood the initial context instructions and broke the problem down into step-by-step questions for the student to work through.
Assessing a Student’s Workings and Answers
Following Claude’s guiding questions, I worked out the answer as 9. Claude v2 agreed and gave me excellent, encouraging statements. But was it the right answer?
Claude v2 Admitting It Made a Mistake
The issue with this PSLE question was that the calculation method was logically correct, but if the ribbon comes in rolls, the 80 cm remaining on each roll after cutting 22 pieces of 110 cm cannot be used to form another 110 cm piece. Claude admitted the mistake and explained how that information leads to the correct answer of 10. There was another flaw in its explanation: it said 100 cm was left over, when 80 cm would have been correct.
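To make the two lines of reasoning concrete, here is a quick check in Python. The question figures (200 pieces of 110 cm ribbon, rolls of 25 m) come from the widely circulated version of the question and are consistent with the article’s numbers: 22 pieces per roll with 80 cm left over implies a 22 × 110 + 80 = 2500 cm roll.

```python
# A quick check of the two approaches to the 2017 ribbon question.
import math

PIECE_CM = 110       # each ribbon piece is 110 cm
ROLL_CM = 2500       # each roll is 25 m
PIECES_NEEDED = 200  # per the widely circulated version of the question

# Naive approach: divide total ribbon length by roll length.
naive_rolls = math.ceil(PIECES_NEEDED * PIECE_CM / ROLL_CM)

# Common-sense approach: each roll only yields whole 110 cm pieces;
# the 80 cm remainder on each roll cannot be joined into another piece.
pieces_per_roll = ROLL_CM // PIECE_CM    # 22 pieces per roll
leftover_per_roll = ROLL_CM % PIECE_CM   # 80 cm wasted per roll
correct_rolls = math.ceil(PIECES_NEEDED / pieces_per_roll)

print(naive_rolls)    # 9  -- the answer Claude first accepted
print(correct_rolls)  # 10 -- the intended answer
```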
The Flaws of Large Language Models in Logical Tasks
Here, some key flaws of large language models come to bear. Mathematics is a logical task, and predicting the most probable answer, the fundamental technique behind LLMs, isn’t going to give you the right answer every time, as evidenced by the mistake about the leftover ribbon. More importantly, the LLM alone may not be able to navigate the additional logical “common sense” of the real world: if one wants a continuous 110 cm ribbon, the leftover 80 cm on each roll cannot fulfill that. Let’s take a step back here. Isn’t this exactly the complaint some parents and students had about the trickiness of this question? Let’s try a different “common sense” logic question.
2015 Tough PSLE Math Question
In 2015, parents were up in arms over a question that tested students’ common sense and deduction skills. It was a multiple-choice question with four options, asking for the likely weight of eight $1 Singapore coins. Claude asked the student to take it step by step, starting with the first guiding question.
Claude Started by Using the Actual Weights of Coins
I went cheeky and told Claude I had no idea. It replied that, based on an online search, a $1 Singapore coin weighs approximately 7.5 grams. This is curious phrasing, as we know Claude is not connected to any internet search. But under exam conditions like the PSLE, I wouldn’t be asking a search engine anyway, so I prompted it: what if I don’t have access to the internet? Can I derive the answer from the choices? At this point, it started making some interesting derived explanations and asked me to provide a guess.
Claude Explaining the Solution
I gave it a quick answer, and Claude responded with a full explanation and the answer. It was a bit too quick to jump to the answer; I would have liked it to prompt me with further questions so I could get there myself.
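For illustration, the estimation step could be sketched as below. The answer options here are assumed placeholders, since the article does not list the actual choices; only the 7.5-gram figure comes from Claude’s own explanation.

```python
# A sketch of the estimation logic Claude walked through. The options
# are hypothetical; the article does not list the actual choices.
HYPOTHETICAL_OPTIONS_G = [6, 60, 600, 6000]  # assumed multiple-choice options
COIN_WEIGHT_G = 7.5  # the approximate $1 coin weight Claude cited

estimate = 8 * COIN_WEIGHT_G  # eight coins: 60 g

# Pick the option closest to the estimate.
best = min(HYPOTHETICAL_OPTIONS_G, key=lambda option: abs(option - estimate))
print(best)  # 60
```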
- 2021 Coins and Value of Money PSLE Question: While we are on the topic of coins, in 2021 there was a PSLE question on coins that got parents and students alike into a frenzy. Let’s see how Claude handles it. The question had two parts; I split them and tested the first. Claude broke the facts of the question down into bullets and then added step-by-step guidance on how to solve it, though its approach still didn’t seem quite right.
- Claude Trying to Solve Using the Mass of Coins: So I said I wasn’t sure. It gave me a more detailed explanation based on the premise that one knows the mass of the coins. This was very similar to the previous question, so I decided to prompt it using only the facts within the question.
- Claude Using Information It Has About the Weight of Coins: With an initial prompt saying I didn’t know the mass, Claude supplied that information itself and calculated the solution. But this isn’t how we want it to work: it needed to answer based only on the information given in the question.
- Claude Providing the Answer and Explanation to Solving the 2021 Tough PSLE Math Question: With the right prompting, that is, saying we need to derive the answer without knowing the weights, Claude managed to provide a full explanation of the solution: Helen had more money because she had 40 more 50-cent coins (see the quick check after this list). Notice the small mistake here again, where Claude says 50-cent coins are worth 20 cents more each than 20-cent coins, instead of 30 cents more each. It doesn’t affect the final answer, but it is still an area of weakness for large language models when it comes to arithmetic.
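As a quick check of the arithmetic: in the widely circulated version of the question, Helen and Ivan had the same total number of coins, so Helen’s extra 40 fifty-cent coins are matched by Ivan’s extra 40 twenty-cent coins; the article itself only states the 40-coin difference.

```python
# Each of Helen's 40 extra 50-cent coins replaces a 20-cent coin on
# Ivan's side, so each swap is worth 30 cents more, not 20.
extra_coins = 40
per_coin_gap = 0.50 - 0.20  # 30 cents per coin

print(f"Helen has ${extra_coins * per_coin_gap:.2f} more")  # $12.00 more
```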
The Capabilities and Limitations of Claude 2.0
Based on the tests above, Claude v2 has some capability in answering these tough PSLE questions. Still, it suffers from some of the weaknesses of large language models, such as the accuracy of calculations, and it requires the right prompting to actually solve the problem as intended. This is apparent when it misses common-sense logic, such as the leftover remainders on separate rolls not being able to form a whole ribbon. It also needs adjustment, through prompting, of how it approaches solving a question. There is still some work to do before using it directly as an AI tutor; some engineering and prompt design is required on top of the raw model itself.