
Last Updated on February 10, 2026
The question of whether artificial intelligence can truly perform professional consulting work has moved from academic curiosity to practical urgency. Businesses, prospects, and even consulting firms themselves are experimenting with autonomous AI agents in hopes they might eventually replace or augment human consultants. But recent research from Mercor, a research and expert-marketplace startup, suggests that today’s AI agents are nowhere near ready to take over the core analytical and decision-making work that drives consulting value.
A New Way of Evaluating AI in a Professional Consulting Context
Mercor’s new benchmark, called APEX-Agents, was intentionally built to fill a gap left by traditional AI evaluations. Instead of testing isolated question-and-answer ability, APEX-Agents simulates real professional workflows – long-horizon, multi-step tasks that professionals actually face when delivering client work. The researchers involved in Mercor’s project surveyed hundreds of experts from top firms, then constructed data-rich “work environments” with documents, spreadsheets, emails, and tool interactions that an agent would need to navigate to complete meaningful tasks.
In one simulated consulting world, for instance, an AI agent might be tasked with navigating a file system filled with client reports, analyzing consumption patterns, computing penetration metrics using spreadsheets, and then drafting a concise summary of insights. These are not multiple-choice quizzes; they are complex assignments that a professional would expect to spend hours digesting and executing.
The Results Are Out
When Mercor tested a range of leading AI agents on these scenarios, the results were sobering. Even the best-performing models completed fewer than 25 percent of tasks correctly on the first try (Pass@1). According to Mercor’s Pass@1 leaderboard, models like Google Gemini 3 Flash and OpenAI GPT-5.2 performed best, but their success rates hovered near 24 and 23 percent, respectively. Others lagged behind.

What do these numbers mean in practice?
A Pass@1 score reflects whether an agent can finish a complex task correctly on its first attempt. That is, to pass an APEX-Agents task, the output must meet all the rubric criteria defined by consulting experts. A score in the low 20s suggests that even state-of-the-art systems fail the vast majority of real professional tasks.
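To make the scoring concrete, here is a minimal sketch of what all-or-nothing rubric grading looks like in principle. The criterion functions and task names below are illustrative assumptions, not Mercor’s actual evaluation harness (which is released separately):

    # Minimal sketch of all-or-nothing rubric scoring behind a Pass@1 number.
    # Criterion functions and task data are hypothetical illustrations,
    # not Mercor's released evaluation code.

    def task_passes(output: str, rubric: list) -> bool:
        """A task passes only if every rubric criterion is satisfied."""
        return all(criterion(output) for criterion in rubric)

    def pass_at_1(first_attempts: dict, rubrics: dict) -> float:
        """Share of tasks whose first attempt meets the full rubric."""
        passed = sum(task_passes(out, rubrics[tid])
                     for tid, out in first_attempts.items())
        return passed / len(first_attempts)

    # Hypothetical rubric: the summary must use the right metric and stay concise.
    rubric = [
        lambda out: "penetration" in out.lower(),  # mentions the required metric
        lambda out: len(out.split()) <= 200,       # respects the length constraint
    ]

    first_attempts = {
        "consumer-goods-01": "Penetration rose four points, led by the value segment...",
        "consumer-goods-02": "Revenue grew 12% year over year...",  # misses the metric
    }
    print(pass_at_1(first_attempts, {tid: rubric for tid in first_attempts}))  # 0.5

The all-or-nothing aggregation is what makes the metric harsh: an output that satisfies most, but not all, of the rubric criteria still counts as a failure.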
Mercor’s benchmark also finds that even when agents are allowed multiple tries, performance remains sub-par. With eight attempts on each task, the best agent’s success rate rises only to about 40 percent. Even with repeated retries, agents lack the reliability many business leaders assume they already have.
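The gap between the Pass@1 and Pass@8 figures is itself revealing. A quick back-of-the-envelope calculation (a sketch assuming each attempt succeeds or fails independently, which real agent runs plainly do not) shows how much higher eight retries would score if failures were merely random:

    # Back-of-the-envelope check: what would 8 independent attempts yield?
    # Assumes each attempt succeeds i.i.d. with probability p -- an assumption
    # made for illustration, not a property of APEX-Agents tasks.

    p = 0.24                       # roughly the best observed Pass@1
    k = 8                          # number of attempts
    pass_at_k_iid = 1 - (1 - p) ** k
    print(f"{pass_at_k_iid:.0%}")  # ~89%, far above the ~40% Mercor reports

Under independence, eight tries at 24 percent should clear roughly 89 percent of tasks; that the best agent reaches only about 40 percent suggests its failures are systematic rather than random – tasks it cannot complete on the first attempt, it typically cannot complete on retry either.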
This is not because generative AI lacks intelligence, nor because it cannot produce formatted slides or drafts of analytical text. It is because consulting thinking is fundamentally an end-to-end process: it involves defining the right questions, prioritizing where to look for information, navigating fragmented knowledge across domains, and synthesizing coherent recommendations that hold up in an ambiguous context.
AI Models Still Fall Short
Mercor’s research highlights precisely where agents fall short. They struggle to manage ambiguity, find the relevant facts buried across multiple files and formats, and maintain context over prolonged sequences of decisions – exactly the challenges consultants face every day.
Put simply: today’s AI can assist analysis, speed up drafts, and improve routine tasks, but it does not yet replicate the core cognitive work of consulting professionals.
For StrategyCase readers, this has implications on two fronts. First, it should temper over-optimistic narratives that AI will soon make consultants obsolete. The ability to execute complex, multi-step client tasks reliably and coherently remains a distinctly human strength. Second, for candidates preparing for consulting interviews, it underscores why strong reasoning, judgment, and prioritization skills still matter far more than polished outputs alone. Mercor intentionally modeled its tasks around real consulting workflows because that is ultimately what employers care about – not just clean text or slides, but decisions that flow from structured thinking. McKinsey has already introduced an AI interview that evaluates how candidates perform when working collaboratively with their in-house AI.
The APEX-Agents benchmark is also an open-source contribution, giving researchers and developers a high-fidelity yardstick for improving agent architectures and training methods. Mercor has released the entire dataset, evaluation infrastructure, gold outputs, and rubrics under a Creative Commons license, inviting the AI community to iterate on these tasks.
Challenges Ahead, for AI and Humans
In the broader discussion about automation and knowledge work, Mercor’s results remind us of a fundamental truth: AI today excels at parts of a job; humans still excel at the job itself. Consultants integrate information across domains, wrestle with ambiguity, make judgment calls that blend experience with data, and deliver coherence at scale. Benchmarks like APEX-Agents show that while AI will increasingly augment human capability, the leap to full autonomy in professional consulting remains a substantial challenge.
In other words, AI will change how consulting work is executed, but not necessarily replace the role of the consultant in shaping decisions.
For full details, see Mercor’s APEX-Agents summary (https://www.mercor.com/blog/introducing-apex-agents) and the underlying arXiv research paper (https://arxiv.org/abs/2601.14242), which rigorously document the benchmark design, evaluation rubrics, and model results.


