A new study tested ChatGPT's ability to evaluate the truth of hundreds of scientific hypotheses, revealing significant limitations in the AI's reasoning capabilities. The chatbot achieved approximately 80% accuracy on surface-level assessments but showed concerning inconsistencies in its responses.

The research highlighted a critical flaw in AI reasoning: once the researchers corrected for random guessing, ChatGPT's performance dropped significantly, suggesting only modest genuine reasoning ability. This points to the model relying more on pattern matching than on real scientific understanding.
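The article does not specify how the researchers adjusted for guessing, but a standard way to correct a raw accuracy score on a binary (true/false) task is to measure how much of the headroom above the chance baseline the model actually achieves. As a hedged illustration (the 80% figure is from the article; the 50% baseline is an assumption for a balanced binary task):

```python
def chance_corrected(accuracy: float, baseline: float = 0.5) -> float:
    """Fraction of the above-chance headroom a classifier achieves.

    On a balanced true/false task, random guessing yields ~50% accuracy,
    so raw accuracy overstates real discriminative ability. A score of 0
    means no better than guessing; 1 means perfect.
    """
    return (accuracy - baseline) / (1.0 - baseline)

# Illustrative only: 80% raw accuracy on a balanced binary task
# corresponds to resolving just 60% of the headroom above guessing.
score = chance_corrected(0.80)
print(round(score, 2))  # prints 0.6
```

This kind of correction is why an impressive-sounding raw accuracy can translate into a much more modest estimate of actual reasoning ability.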