Meta-analysis of reasoning LLM evaluation, benchmarks, and experimental data from the OpenThoughts3 project.
Jun 4, 2025