Here’s how it works. Microsoft has released a set of benchmarks showing Phi-4 outperforming even large language models like Gemini Pro 1.5 on math competition problems. Small language models ...
When benchmarked on math competition problems, Phi-4 has beaten heavyweights such as Claude Sonnet 3.5, GPT-4o, and Google Gemini Pro 1.5. Microsoft has been able to achieve ...