KushoAI Unveils APIEval-20 to Benchmark AI Agents in API Testing


(MENAFN- PR Newswire)
    Authentication Failures Drive 34% of API Outages, New Benchmark Aims to Address Gap
    APIEval-20 Built for APIs That Change Faster Than They're Documented
    Records 100+ downloads in its first week of public release

SAN FRANCISCO, April 8, 2026 /PRNewswire/ -- KushoAI, an AI-native platform for API testing and software reliability, has introduced APIEval-20, an open benchmark designed to evaluate how effectively AI agents can identify functional bugs in APIs, using only a request schema and a sample payload, with no access to source code or documentation.

The launch comes as API reliability remains a growing concern. Analysis of more than 1.4 million AI-driven test executions across 2,616 organizations indicates that authentication failures alone contribute to 34% of API outages, while 41% of APIs experience undocumented schema changes within a month. Despite this, most existing evaluation methods fail to capture whether AI tools can systematically detect such issues.

Instead of replicating ideal testing environments, APIEval-20 intentionally introduces constraints that mirror real-world conditions: incomplete context, evolving schemas, and hidden dependencies. These constraints push AI agents to operate more like human QA engineers than automated validators.

Abhishek Saikia, Co-Founder & CEO of KushoAI, said, "The conversation around AI in testing has largely been about automation. What's been missing is accountability, a way to measure whether these systems actually work. APIEval-20 brings that accountability into the equation."

Saikia added, "The conversation we expected was about the benchmark itself. What we actually heard from engineers in week one was that they had been sitting with this question for months and had no way to answer it. That validation matters more to us than the download number."

The benchmark includes 20 scenarios spanning domains such as payments, authentication, e-commerce, scheduling, user management, notifications, and search. Each environment is seeded with 3 to 8 bugs, ranging from straightforward validation issues to deeper logic flaws that require multi-step analysis.

Measuring What Actually Matters

APIEval-20 introduces a scoring model aligned with real-world priorities:

    Bug detection (70%) to capture practical effectiveness
    Coverage (20%) to assess breadth of testing
    Efficiency (10%) to evaluate resource usage
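As an illustration of how the three weights might combine into a single figure, here is a minimal Python sketch. Only the 70/20/10 weights come from the release. The function name `composite_score`, the assumption that each component is already normalized to a value between 0 and 1, and the example agent's numbers are hypothetical, and the benchmark's actual scoring procedure may differ.

```python
from typing import Mapping

# Weights as stated in the release. How each component is measured and
# normalized is NOT stated there; this sketch assumes values in [0, 1].
WEIGHTS = {"bug_detection": 0.70, "coverage": 0.20, "efficiency": 0.10}


def composite_score(components: Mapping[str, float]) -> float:
    """Return the weighted sum of the three component scores."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical agent: detects 80% of seeded bugs, covers half of the
# API surface, and scores 0.6 on efficiency.
example = composite_score(
    {"bug_detection": 0.8, "coverage": 0.5, "efficiency": 0.6}
)
print(round(example, 2))
```

Under these assumptions, bug detection dominates the result: an agent that finds no bugs cannot score above 0.30, however broad or efficient its testing is.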

Benchmark Report: Dataset: huggingface/datasets/kusho-ai/api-eval-20

About KushoAI

KushoAI is an AI-native API testing and software reliability platform. It is used by 30,000+ engineers across 6,000+ organizations and is backed by Antler and Blume Ventures. Visit kusho or contact [email protected].

SOURCE KushoAI
