The Mistakenator project was my initial, and, as it turns out, terminal effort to create an app to help students with algebra. I started it to help my own students, of course, but teaching faculty have very little spare time, so I left academia in 2016. Here is a link to a stripped-down version of the original:
https://www.wolframcloud.com/obj/fc9155e9-75d1-4286-8352-0a5531c699aa
If it doesn't work, I just have to reset it since there is no input validation or error handling. I have removed the student record saving and reporting features that I tried to implement in the Wolfram tech stack, albeit somewhat ineptly. At the end I include various files explaining the motivation and implementation of the project.
One might well ask, "Why have you dusted off an old project from 2016 that you stopped working on and that you could not get any VC funding for?" The answer should be obvious: LLMs! The Mistakenator was hard-coded with rules, but I suggested, even back then, that a future system would learn the rules that students were following -- both correct and incorrect -- by analyzing student work data. If you think about it, a student following the steps to solve a problem, and making up new math rules along the way, isn't really all that different from an LLM creatively hallucinating! We will go through some examples now.
Example: (2 + 4x)(3 + 4x)
Suppose a student needs to expand (2 + 4x)(3 + 4x). Let's ignore the burning issue of whether or not this is a worthwhile task and just assume that it needs to get done. One of the most common wrong answers is 6 + 16x^2: the student has made the infamous freshman algebra mistake of multiplying only the first terms and the last terms (2*3 + 4x*4x) while skipping the cross terms. Here is the output from The Mistakenator. I didn't know how to make it look nice, and I was more concerned with which rules were being used.
The incorrect rule has been deduced, and a correct solution has been presented as well. No ML/AI magic: all reasonable sequences of steps, both right and wrong, have been hard-coded.
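To make the hard-coded-rules idea concrete, here is a minimal sketch in Python with SymPy. The original was written in the Wolfram Language, so this is an illustration of the approach, not the actual implementation, and the rule names are my own labels. Each rule, correct or incorrect, produces the answer a student following it would write down, and the diagnoser simply checks which rule reproduces the student's answer.

```python
# A minimal sketch (Python/SymPy, not the original Wolfram Language code)
# of the hard-coded-rules idea: generate the answer each known rule --
# right or wrong -- would produce, then see which rule matches the
# student's answer.
from sympy import symbols, expand, simplify

x = symbols("x")

# Each "rule" maps the two binomial factors to the answer a student
# following that rule would write down. Rule names are my own labels.
RULES = {
    "correct (full distribution)":
        lambda p, q: expand(p * q),
    "freshman mistake (first*first + last*last)":
        lambda p, q: (p.coeff(x, 0) * q.coeff(x, 0)
                      + p.coeff(x, 1) * q.coeff(x, 1) * x**2),
}

def diagnose(p, q, student_answer):
    """Return the names of all rules whose output matches the student's answer."""
    return [name for name, rule in RULES.items()
            if simplify(rule(p, q) - student_answer) == 0]

# The example above: expanding (2 + 4x)(3 + 4x), student writes 6 + 16x^2.
print(diagnose(2 + 4*x, 3 + 4*x, 6 + 16*x**2))
# -> ['freshman mistake (first*first + last*last)']
```

The real rule base covered many more step sequences than these two, but the matching idea is the same.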
Here is the output from ChatGPT-3.5. ChatGPT has presumably seen this problem, or something very similar to it, before and provides the correct calculation.
There was no mistake, of course, and ChatGPT just does the problem correctly again without any attempt to tell us what we might have done wrong. Let's switch to GPT-4.
Nailed it! That is as good an explanation as you're going to get. Notice that I didn't request "use Wolfram," so I believe this is just raw GPT-4. I am encouraged, but this is such a common algebra error that GPT-4 must have encountered it many times during training. Let's look at another example.
Example: (4 + x)(3 + 4x)
Let's consider another common error, though a less frequent one. Suppose a student needs to expand (4 + x)(3 + 4x). Some students will distribute the (4 + x) over the 3 correctly but forget to distribute it onto the 4x, instead just adding the 4x, which yields 12 + 3x + 4x = 12 + 7x. This mistake can occur from the left or the right, of course.
The Mistakenator has recognized that one distribution was done correctly and one was ignored; since the ignored term was merely added, 1/3 credit has been assigned. It's debatable how partial credit should be assigned, but we won't address that here.
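Continuing the sketch from the first example, this mistake can be encoded the same way, now paired with a credit weight. The 1/3 weighting comes from the Mistakenator's output above; the encoding and the rule names are my own guesses at how such rules might be expressed, and the student answer 12 + 7x is the one implied by the description of the mistake.

```python
# Continuing the SymPy sketch: two hypothetical rules for this mistake,
# one from the left and one from the right, each paired with the partial
# credit assigned. (The 1/3 weight is from the text; the rule encoding
# and names are my own.)
from sympy import symbols, expand, simplify, Rational

x = symbols("x")

# (credit, rule) pairs: distribute over the constant term only,
# then *add* the remaining x-term instead of multiplying by it.
PARTIAL_RULES = {
    "distributed over 3 only, added the 4x (from the left)":
        (Rational(1, 3),
         lambda p, q: expand(p * q.coeff(x, 0)) + q.coeff(x, 1) * x),
    "distributed over 4 only, added the x (from the right)":
        (Rational(1, 3),
         lambda p, q: expand(q * p.coeff(x, 0)) + p.coeff(x, 1) * x),
}

def grade(p, q, student_answer):
    """Return the first matching rule name and its credit, or (None, 0)."""
    for name, (credit, rule) in PARTIAL_RULES.items():
        if simplify(rule(p, q) - student_answer) == 0:
            return name, credit
    return None, 0

# The example above: (4 + x)(3 + 4x), student writes 12 + 7x.
print(grade(4 + x, 3 + 4*x, 12 + 7*x))
# -> ('distributed over 3 only, added the 4x (from the left)', 1/3)
```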
Let's see how GPT-3.5 handles this. GPT-3.5 is a real pushover: polite and deferential to a fault! Let's see if GPT-4 has a backbone!
Well, first off, GPT-4 does not pretend I am right, and I must concede that this is a pretty good answer. It's not quite as specific as I would like, but it pretty much hits the nail on the head. Just for the fun of it, let's engage the Wolfram plugin for a "power boost." I just like saying stuff like that.
Well, this is certainly correct. The equation is not an identity, as we can see by solving and graphing it. Not what I was going for, but this would be good for a slightly more advanced algebra student to understand.
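For anyone who wants to reproduce the plugin's point, here is a quick SymPy check, again assuming the mistaken answer 12 + 7x implied above: the equation holds at only two isolated values of x, so it cannot be an identity.

```python
# Verify that (4 + x)(3 + 4x) = 12 + 7x is not an identity:
# the two sides agree only at isolated values of x.
from sympy import symbols, Eq, expand, solve

x = symbols("x")
equation = Eq(expand((4 + x) * (3 + 4 * x)), 12 + 7 * x)
print(solve(equation, x))  # -> [-3, 0]: true at just two points, not for all x
```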
Conclusions
Of course, LLMs have been designed to work with human language, but math... not so much. Our success in getting meaningful explanations for the above examples was probably due to the relative simplicity of the problems at hand.
Once LLAMMs (large language and math models -- I made this term up) have matured, students will finally have tools that won't just show them the answer, or even just a complete solution: they will be able to ask the system to tell them where they made their mistake!
If you're curious to see what it takes to create a rule-based, non-machine-learning solution for finding student binomial-expansion mistakes, I have included some files below.
One file below that is particularly noteworthy involves my efforts at one-shot and few-shot prompting of gpt-3.5-turbo using the Python API with DataCamp-hosted Jupyter notebooks -- a very nice environment. Unfortunately, I was not able to use gpt-4, so my few-shot prompting efforts were quite futile. I suspect gpt-4 would have had no problem being appropriately "nudged." I suspect this -- though I am not making any direct comparison -- because the text-bison@001 foundation model on Google Cloud's Vertex AI platform required only one shot, with all parameters driven down to their most deterministic values. See the pic below.
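For reference, here is the general shape of the few-shot setup I was attempting, rewritten against the current OpenAI Python client; my notebooks used an older version of the API, and the prompt wording here is illustrative rather than my exact prompts.

```python
# A sketch of the few-shot setup, written against the current OpenAI
# Python client (openai>=1.0). The example wording is illustrative,
# not my original prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

few_shot_messages = [
    {"role": "system",
     "content": "You diagnose the specific rule a student misapplied "
                "when expanding a product of binomials."},
    # One worked example ("one shot") of the diagnosis we want:
    {"role": "user",
     "content": "Problem: expand (2 + 4x)(3 + 4x). Student answer: 6 + 16x^2."},
    {"role": "assistant",
     "content": "The student multiplied the first terms (2*3) and the last "
                "terms (4x*4x) but skipped the cross terms. Correct answer: "
                "16x^2 + 20x + 6."},
    # The new case we actually want diagnosed:
    {"role": "user",
     "content": "Problem: expand (4 + x)(3 + 4x). Student answer: 12 + 7x."},
]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=few_shot_messages,
    temperature=0,  # most deterministic setting, as with text-bison@001
)
print(response.choices[0].message.content)
```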
Incidentally, among the files below, you can ignore my efforts to create a hit song, "It's time for the Mistakenator," unless, of course, you like that sort of thing. It was going to be for my advertising campaign.