Anthropic launches Claude 4.5, a powerful AI model that outperforms GPT-5 in coding, aiming to dominate the enterprise ...
Like ACP, AP2 is an open-source protocol designed to let AI agents securely complete purchases. But while ACP emphasizes keeping merchants in control using their existing processors, AP2 focuses on ...
MMLU-Pro holds steady at 85.0, AIME 2025 slightly improves to 89.3, while GPQA-Diamond dips from 80.7 to 79.9. Coding and agent benchmarks tell a similar story, with Codeforces ratings rising from ...
Will the application of AI reduce staff in pursuit of efficiency, or can we design systems that preserve human dignity, agency and shared meaning? This is the tension driving cognitive migration. But ...
Microsoft unveils new AI agents in GitHub Copilot and Azure Migrate that automate legacy code modernization, helping ...
Yet, here comes another model family worth consideration: Meituan, a Chinese food delivery and e-commerce app, attracted the ...
Perplexity AI launches comprehensive search API giving developers access to hundreds of billions of web pages, challenging Google's dominance in search infrastructure.
According to the company, Liquid Nanos deliver performance that rivals far larger models on specialized, agentic workflows such as multilingual data extraction, translation, retrieval-augmented (RAG) ...
Meta released an agentic testing environment, Agents Research Environment, and a new benchmark called Gaia2 to measure agents' real-world adaptability.
The Parallel-R1 framework uses reinforcement learning to teach models how to explore multiple reasoning paths at once, leading to more robust and accurate problem-solving.
ChatGPT Pulse is OpenAI's experiment in creating more autonomous, ambient agents for ChatGPT Pro subscribers on mobile.
Agent Payment Protocol, a new open-source standard from Google and 60 other payment players, aims to make transactions made ...