In some ways, data and its quality can seem strange to people used to assessing the quality of software. There’s often no observable behaviour to check and little in the way of structure to help you ...
Imagine starting your day with a quick, digestible summary of the most important tech conversations happening on Hacker News.
The result is Humanity’s Last Exam (HLE). The dramatically titled test consists of 2,500 questions, crowdsourced from more than 1,000 ...
Claude Opus 4.6 expands to a 1-million-token context window and retrieves information with a 76% success rate, which helps with large code reviews.
This tracker is no longer being maintained. Numbers and graphics on this page will continue to update automatically but may become out of date as public health agencies wind down reporting of various ...