An exploration of using calculation generation to improve GPT-4’s capabilities for numeric reasoning.
GPT-4 is weak at calculating with numbers compared with its other reasoning capabilities. It makes mistakes that lead to problematic user experiences when a text involves basic numeric problems. We describe a simple, general technique to address this: adding a step that generates calculation code. We apply it to some widely reported real-world failures of GPT-4-based AI, present some evaluation, and discuss related issues.
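The calculation step can be illustrated with a minimal sketch. It assumes the model has been prompted to return two things: answer text with a `{result}` placeholder, and a plain arithmetic expression for the calculation. No model is called here, and the function names `evaluate_calculation` and `answer_with_calculation` are hypothetical. The point is that the host program does the arithmetic instead of trusting the model's mental math. The expression is evaluated by walking Python's `ast` with an allow-list of operators rather than by calling `eval`, because the expression is untrusted model output.

```python
import ast
import operator

# Allow-listed arithmetic operators; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def evaluate_calculation(expr: str) -> float:
    """Safely evaluate an arithmetic expression emitted by the model (hypothetical helper)."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Numeric literals only; bools are excluded even though they subclass int.
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expr, mode="eval"))


def answer_with_calculation(template: str, expr: str) -> str:
    """Fill the model's answer template with the computed result (hypothetical helper)."""
    return template.format(result=evaluate_calculation(expr))


if __name__ == "__main__":
    # Hypothetical model output for: "3 tickets at $17.50 each plus a $12 fee?"
    template = "The total cost is ${result}."
    calculation = "3 * 17.5 + 12"
    print(answer_with_calculation(template, calculation))
```

Running this prints `The total cost is $64.5.` A real pipeline would also need to handle formatting (for example, currency rounding) and expressions the evaluator rejects. This sketch simply raises `ValueError` in those cases.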