Here’s how it works. Microsoft has released benchmarks showing Phi-4 outperforming even large language models such as Gemini 1.5 Pro on math competition problems. Small language models ...
On math competition benchmarks, Phi-4 has outperformed heavyweights such as Claude 3.5 Sonnet, GPT-4o, and Google Gemini 1.5 Pro. Microsoft has been able to achieve ...