AI Models Don't Think Like "Us"
Four Puzzles Apple Says Prove AI Doesn't Think While ChatGPT Becomes the 5th Most Visited Website on the Planet
The internet exploded this week over new research questioning whether AI models actually "think" or merely simulate reasoning. The debate it sparked reveals that our desperate need to anthropomorphize technology has clouded our judgment about what these systems do.
The Great Thinking Divide
Two camps have emerged in this debate. One side insists AI will never truly think. The other claims these systems already outperform most humans at reasoning tasks.
To me, both sides miss the point.
SimilarWeb data shows that ChatGPT has secured its 5th place position among global websites, accounting for 1.28% of total web traffic. This isn't just adoption. It's integration into how millions of people process information daily.

But usage doesn't equal thinking.
The research paper The Illusion of Thinking (Shojaee et al., 2025) from Apple provides crucial clarity. When models face increasingly complex problems, they exhibit three distinct phases.
Low complexity: Standard models often outperform "thinking" models while using fewer tokens. The extra reasoning creates inefficiency.
Medium complexity: Thinking models show clear advantages as problems require more sophisticated approaches.
High complexity: Both approaches collapse completely.
This pattern echoes human reasoning. Much like the heuristics research of Daniel Kahneman and Amos Tversky, and the dual-process theory Kahneman later popularized, it suggests that reasoning has natural limits and optimal applications. Just as humans use System 1 (fast, automatic) thinking for simple tasks and System 2 (slow, deliberate) thinking for more complex problems, AI models show similar efficiency patterns. The key insight is that more reasoning isn't always better; it depends on matching cognitive effort to the problem's actual difficulty.

This chart is a little annoying, and many have pointed out why. First, the puzzles scale to sizes that all but guarantee failure.
The title completely misrepresents the findings. This isn't about whether AI thinks; it's about how reasoning effort scales.
The human parallel is perfect: our effort also peaks on medium-hard problems, then we mentally check out when things get impossible. Ever notice how you spend 30 minutes on a tough sudoku but give up immediately on an expert-level one?
The real insight: both humans and AI hit cognitive "sweet spots" where effort peaks, then drops off dramatically. It's like having different mental gears for different problem types.
This suggests reasoning inconsistency isn't a bug to fix but a fundamental feature of how intelligence, biological or artificial, allocates computational resources.
What the Data Shows
The research team at Apple tested reasoning models across four controllable puzzle environments: the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These puzzles let them precisely manipulate complexity while maintaining logical consistency.
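To make "controllable complexity" concrete, here is a minimal sketch of my own (not the paper's test harness) using the Tower of Hanoi: the rules never change, while a single parameter, the number of disks, dictates how long the optimal solution must be.

```python
# Minimal sketch: Tower of Hanoi as a complexity dial. The rules stay fixed,
# but each extra disk doubles the optimal solution length (2**n - 1 moves).
# This is an illustrative example, not the test harness from the paper.

def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move sequence for n disks as (disk, from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # move n-1 disks out of the way
        + [(n, source, target)]                      # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # stack the n-1 disks back on top
    )

for n in (3, 5, 10, 15, 20):
    print(f"{n:>2} disks -> {len(hanoi_moves(n)):>9,} optimal moves")
# 3 disks ->         7 optimal moves
# 20 disks -> 1,048,575 optimal moves (same rules, wildly different scale)
```

The other three puzzles expose similar size knobs, which is what lets complexity scale while the underlying logic stays constant.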
Again, this is a very clever approach, but the results don’t tackle “thinking”.
At low complexity, models often find correct solutions early but continue exploring incorrect alternatives. Anyone with an indecisive friend recognizes this "overthinking" pattern immediately.
At medium complexity, correct solutions emerge only after extensive exploration of wrong paths.
At high complexity, models fail to generate any correct solutions despite having a massive computational budget. That budget, however, is spent in different ways and may still be inadequate for the sheer size of the solutions the harder puzzles require; a rough illustration follows.
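As a back-of-the-envelope sketch of that size problem (my numbers, not the paper's: the tokens-per-move figure and the output budget below are assumptions chosen only for illustration):

```python
# Rough estimate of how a complete Tower of Hanoi answer compares to a fixed
# output budget. TOKENS_PER_MOVE and OUTPUT_BUDGET are assumed values meant to
# show order of magnitude, not figures from the paper.
TOKENS_PER_MOVE = 10
OUTPUT_BUDGET = 64_000

for n in (10, 12, 15, 20):
    moves = 2**n - 1                  # optimal solution length
    tokens = moves * TOKENS_PER_MOVE  # tokens needed to write it all out
    verdict = "fits" if tokens <= OUTPUT_BUDGET else "exceeds budget"
    print(f"{n} disks: {moves:>9,} moves ~ {tokens:>10,} tokens -> {verdict}")
# 10 disks:     1,023 moves ~     10,230 tokens -> fits
# 15 disks:    32,767 moves ~    327,670 tokens -> exceeds budget
# 20 disks: 1,048,575 moves ~ 10,485,750 tokens -> exceeds budget
```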
How is that different from how we think? Ask a person to solve a task beyond a certain complexity, and they will not be able to answer without reaching for external tools like a calculator, a book, or other people. This paper outlines exactly the wrong way to look at this.
The most counterintuitive finding is that reasoning models reduce their effort as problems become harder, despite operating well below their generation limits.
This suggests computational scaling limitations that mirror human cognitive patterns.
Reframing Our Approach
We need to stop asking whether AI "thinks" and start asking better questions.
What specific cognitive tasks do these systems perform well? The data show clear advantages in medium-complexity reasoning scenarios where systematic exploration is helpful.
Where do they fail predictably? High-complexity planning tasks consistently break down, regardless of the computational resources allocated.
How can we design better human-AI collaboration? Understanding these performance regimes lets us assign tasks more effectively.
What does this teach us about human reasoning? The parallels between AI and human reasoning failures offer insights into cognitive limitations we all share.
Consider this perspective shift. Instead of anthropomorphizing these systems, we could view them as sophisticated tools that amplify specific types of human reasoning while failing at others.
The Anthropomorphization Trap
We're making it too easy for people to forget we're discussing machines. Mathematical models that mimic thought processes aren't thinking beings.

Humans are Arguing about AI Thinking.
This isn't mere semantics. The language we use shapes policy, investment, and public understanding.
Alex Karp's recent warning about societal upheavals highlights why precision matters. In a Fortune interview (Karp, 2025), the Palantir CEO urgently cautioned that the unregulated introduction of AI poses a risk to social stability, particularly through the elimination of traditional entry-level jobs. When we anthropomorphize AI capabilities, we create unrealistic expectations and miss actual limitations that could cause real harm.
The same thing happens when people treat their dogs like humans. South Korea sold more strollers for dogs than for human children this year. That's not love for pets. That's a projection of human needs onto non-human entities.

There’s a lot of this in my town. (With dogs, not AI).
We risk a similar projection with AI technology.
The Knowledge Revolution
The human race faces questions as significant as any in our history. The advent of the Internet gave us access to content. Now we're stumbling into access to knowledge itself.
Andrej Karpathy recently argued in a commencement speech that AI will eventually be able to do everything we can do, and do it even better. His reasoning: the human brain is nothing more than an organic computer. We will either recreate that organic computer at some point or create a silicon-based computer that's even more powerful.
You can't argue with the fact that ChatGPT recalls facts far better than any human memory, with mild hallucination problems that continue to decline. But recall isn't reasoning. Processing isn't pondering.
This shift requires rethinking what it means to think.
The distinction matters for everything from education policy to workforce planning.
The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn and relearn.
This moment demands exactly that type of cognitive flexibility. It reminds me of Adam Grant's excellent book Think Again (Grant, 2021). If you haven't read it, I highly recommend it. Grant demonstrates just how crucial this process of unlearning and rethinking will be as we confront these challenges.

Don’t forget to breathe.
We must unlearn our assumptions about what constitutes thinking. We must learn new frameworks for evaluating artificial reasoning. We must relearn how to collaborate with systems that process information differently from biological brains.
The debate isn't really about whether AI thinks.
It's about whether humans can think clearly enough about AI to navigate the next steps.
The Apple research (Shojaee et al., 2025) shows these systems have clear strengths and predictable weaknesses. They excel at medium-complexity reasoning tasks but struggle with both simple efficiency and complex planning.
That's not thinking. That's processing with particular characteristics.
Understanding those characteristics helps us build better tools, set appropriate expectations, and avoid both the hype and the fear that cloud rational decision-making.
The future depends not on creating thinking machines, but on thinking more clearly about the machines we're creating.