Forbes contributors publish independent expert analyses and insights. I write about the economics of AI. What looks like intelligence in AI models may just be memorization. A closer look at benchmarks ...
Nous Research's open-source Nomos 1, a model with just 30 billion parameters, scored 87 out of 120 on the notoriously difficult Putnam math competition, ranking second among 4,000 human contestants.
Large language models (LLMs), artificial intelligence (AI) systems that can process and generate text in various languages, ...
At the heart of this breakthrough is AlphaProof, a formal reasoning AI model developed by Google DeepMind. The system has demonstrated an ...
In 2025, large language models moved beyond benchmarks to efficiency, reliability, and integration, reshaping how AI is ...
Google DeepMind, Google LLC’s artificial intelligence research unit, today unveiled two new AI models capable of the advanced mathematical reasoning needed to solve complex math problems, which ...
Phi-4 will compete with other small models such as GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku.
The floodgates have opened for building AI reasoning models on the cheap. Researchers at Stanford and the University of Washington have developed a model that performs comparably to OpenAI o1 and ...
In a new paper, OpenAI proposes a framework for analyzing AI systems' chain-of-thought reasoning to understand how, when, and why they misbehave.
DeepSeek, a Chinese company, has introduced its DeepSeek R1 model, attracting attention for its potential to rival OpenAI’s latest offerings. Reportedly outperforming OpenAI’s o1-preview in benchmarks ...