API quality at scale.
Checkout verification across 50–100 merchant sites. How bespoke Newman automation, dual-environment regression, and a structured API test library let a small team validate thousands of executions per quarter for the agentic-commerce infrastructure layer — and gracefully scale down as the platform matured.
Test cases created
Peak quarter executions
Sites covered
Defect closure rate
Quality engineering for the agentic-commerce checkout layer.
Rye is the universal agentic checkout infrastructure that solves the core unsolved problem in AI-driven commerce: AI agents can browse and recommend products, but consistently fail at checkout because merchant fraud-detection systems block automated transactions. Rye's Universal Checkout API breaks through this “checkout wall” — provide a product URL and payment token, get back true landed costs and a completed order.
Testing that checkout layer is unlike conventional web QA. Validation must happen across 50–100 live merchant sites in parallel — each with its own page structures, auth flows, fraud thresholds, and rate limits — spanning an API surface of checkout, pricing, tax, shipping, variants, and payments. We built Rye's QA capability from the ground up: bespoke Newman automation that verifies checkout reliability across every site, every cycle, without triggering fraud detection or burning the team out on manual regression.
Testing the checkout layer of the agentic web.
Testing Rye's checkout infrastructure is unlike conventional web application QA. The product sits at the intersection of AI automation, payment processing, and live merchant websites — each introducing its own layer of complexity. The brief: build bespoke automation that verifies checkout reliability across 50–100 live merchant sites every cycle, without triggering fraud detection or burning the team out on manual regression.
Scale: 50–100 live merchant sites
Rye's checkout must work reliably across 50–100 merchant websites, each with unique page structures, auth flows, fraud thresholds, and checkout behaviours. Manual regression across this many sites per cycle is unsustainable.
Fraud detection sensitivity
Testing automates checkout on live sites, so runs must be carefully managed to avoid triggering fraud detection or rate-limiting. Rate limits on Production and Staging add constraints on how frequently scenarios can be validated.
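The source does not say how runs were paced, so the following is only a minimal sketch of one possible approach: visit every site once per pass so the same merchant is never hit back to back, and space consecutive executions by a fixed gap. The function name, the site and scenario labels, and the 10-second gap are all hypothetical.

```python
def paced_plan(sites, scenarios, gap_seconds):
    """Return (start_offset_seconds, (site, scenario)) pairs.

    Jobs are ordered scenario by scenario, so each site is visited once
    per pass and repeat visits to a site are len(sites) * gap_seconds apart.
    """
    jobs = [(site, scenario) for scenario in scenarios for site in sites]
    return [(index * gap_seconds, job) for index, job in enumerate(jobs)]


# Hypothetical labels and gap, for illustration only.
plan = paced_plan(["site-a", "site-b", "site-c"], ["scenario-1", "scenario-2"], 10)
for offset, (site, scenario) in plan:
    print(f"t+{offset:>3}s  {site}  {scenario}")
```

The gap would need tuning against whatever rate limits actually apply in each environment.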
Dual-environment validation
Every regression cycle requires parallel validation across Production and Staging: 98 sites × 3 scenarios × 2 environments = 588 executions per run. This demands a disciplined, automated execution model.
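The per-run arithmetic can be checked by enumerating the execution matrix directly. This is a small sketch; the site, scenario, and environment labels are placeholders, since the source gives only the counts.

```python
from itertools import product

# Placeholder labels: the source states the counts, not the names.
SITES = [f"site-{i:02d}" for i in range(1, 99)]          # 98 merchant sites
SCENARIOS = ["scenario-1", "scenario-2", "scenario-3"]   # 3 scenarios per site
ENVIRONMENTS = ["production", "staging"]                 # 2 environments


def build_run_matrix(sites, scenarios, environments):
    """Return one (site, scenario, environment) tuple per planned execution."""
    return list(product(sites, scenarios, environments))


matrix = build_run_matrix(SITES, SCENARIOS, ENVIRONMENTS)
print(len(matrix))  # 98 * 3 * 2 = 588
```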
API coverage breadth
Rye's API surface spans checkout, pricing, tax, shipping, variants, payment, and auth. Meaningful coverage from scratch, across POST and GET, required systematic test design across all core payment and commerce flows from quarter one.
API design, then Newman automation, then graceful scale-down.
API test case design: 500+ cases
Built a comprehensive library from scratch covering checkout, pricing, tax, shipping, variant handling, payment processing, and authentication. Both POST and GET coverage, with heavy emphasis on the checkout POST flows critical to Rye's core reliability.
Newman automation for multi-site regression
Automated checkout verification using Newman to execute regression across 50–100 sites systematically, with each cycle covering 98 sites across 3 scenarios in both Production and Staging: 588 executions per run, with no manual effort.
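As a rough illustration of how such a cycle might be driven, the sketch below assembles one Newman CLI invocation per environment. The flags are standard Newman options; every file name, the 500 ms request delay, and the idea of feeding sites through an iteration-data file are assumptions, not details from the engagement.

```python
import shlex

# Hypothetical file names; the real collection and environment files are not
# described in the source.
COLLECTION = "checkout-regression.postman_collection.json"
SITE_DATA = "sites.json"  # assumed: one row of variables per merchant site
ENVIRONMENT_FILES = {
    "production": "production.postman_environment.json",
    "staging": "staging.postman_environment.json",
}


def newman_command(environment_name, delay_ms=500):
    """Build the argument list for one Newman run against one environment."""
    return [
        "newman", "run", COLLECTION,
        "--environment", ENVIRONMENT_FILES[environment_name],
        "--iteration-data", SITE_DATA,      # Newman reads one data row per iteration
        "--delay-request", str(delay_ms),   # pause between requests, in milliseconds
        "--reporters", "cli,json",
        "--reporter-json-export", f"results-{environment_name}.json",
    ]


commands = [newman_command(name) for name in ENVIRONMENT_FILES]
for command in commands:
    print(shlex.join(command))
```

Each argument list could be passed to `subprocess.run` on a machine with Newman installed; printing the commands first makes the plan easy to review before anything touches a live site.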
Dual-environment checkout verification
Every cycle executed across Production and Staging in parallel, validating that checkout behaviour, pricing accuracy, and API responses stayed consistent, and catching environment-specific discrepancies before they could affect production.
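One simple way to surface environment-specific discrepancies is to compare the two result sets field by field. The sketch below assumes results have already been reduced to a dictionary keyed by (site, scenario); the field names and sample values are invented for illustration.

```python
def find_discrepancies(production, staging, fields=("status", "total_cents")):
    """Return (key, field, production_value, staging_value) for every mismatch.

    Only keys present in both result sets are compared.
    """
    mismatches = []
    for key in sorted(production.keys() & staging.keys()):
        for field in fields:
            prod_value = production[key].get(field)
            stage_value = staging[key].get(field)
            if prod_value != stage_value:
                mismatches.append((key, field, prod_value, stage_value))
    return mismatches


# Invented sample results, keyed by (site, scenario).
production_results = {
    ("site-01", "scenario-1"): {"status": 200, "total_cents": 4210},
    ("site-02", "scenario-1"): {"status": 200, "total_cents": 1999},
}
staging_results = {
    ("site-01", "scenario-1"): {"status": 200, "total_cents": 4210},
    ("site-02", "scenario-1"): {"status": 200, "total_cents": 2099},
}

for mismatch in find_discrepancies(production_results, staging_results):
    print(mismatch)
```

Holding prices as integer cents avoids spurious mismatches from floating-point rounding.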
Automation Feature Matrix: 115 features
Maintained a structured matrix tracking automation readiness, coverage status, and prioritisation, giving engineering full visibility into which flows were automated, which were pending, and where manual coverage was still required.
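A matrix like this can be kept as plain structured data and summarised on demand. The sketch below is illustrative only: the three status values mirror the categories named above, but the feature rows, priorities, and field names are hypothetical and do not reflect the real 115-feature matrix.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRow:
    name: str
    status: str    # "automated", "pending", or "manual"
    priority: int  # 1 = highest


def coverage_summary(rows):
    """Count features per status and compute the automated share."""
    counts = Counter(row.status for row in rows)
    automated_share = counts["automated"] / len(rows) if rows else 0.0
    return {"counts": dict(counts), "automated_share": automated_share}


# Hypothetical rows for illustration; statuses are not taken from the source.
rows = [
    FeatureRow("checkout", "automated", 1),
    FeatureRow("pricing", "automated", 1),
    FeatureRow("tax", "pending", 2),
    FeatureRow("shipping", "manual", 3),
]

summary = coverage_summary(rows)
print(summary)

# Features still awaiting automation, highest priority first.
backlog = sorted((r for r in rows if r.status != "automated"), key=lambda r: r.priority)
print([r.name for r in backlog])
```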
Checkout monitoring and verification
As the engagement moved to on-demand, the team maintained daily and weekly checkout verification: sharing structured results, identifying pricing discrepancies, and raising defects with clear repro evidence for rapid resolution.
New feature and tool testing
Beyond regression, tested new product capabilities as they launched, including the ChatGPT Rye tool integration, where 16 bugs were identified and 13 retested and resolved, extending QE coverage into Rye's expanding agentic surface.
Three phases. One product, maturing.
The engagement evolved naturally as Rye's product matured — from foundation-building, to regression-intensive scaling, to on-demand maintenance as the platform stabilised.
What the work delivered.
Newman automation built from scratch
Covering 50–100 merchant sites per regression cycle across Production and Staging: 588 executions per run, eliminating the manual overhead of large-scale multi-site validation.
10,108 executions in a peak quarter
Across 98 sites, 3 scenarios, and 2 environments: the scale of coverage that automated regression enables within the allocated weekly hours.
500+ API test cases built
Covering Rye's full checkout surface (pricing, tax, shipping, variants, payment, and auth) across both POST and GET.
Product stability confirmed by QA data
Defect volume dropped from 8 in the foundation phase to just 2 in a recent period, with a 100% closure rate across the engagement.
Feature Matrix maintained across 115 features
A clear, up-to-date view of automation coverage, readiness, and prioritisation at all times.
New product areas validated on launch
The ChatGPT Rye tool integration was tested with 16 bugs identified and 13 resolved, extending QE coverage into Rye's agentic commerce suite.
The stack.
The Rye engagement shows what QE looks like for cutting-edge API infrastructure. Verifying checkout reliability across 50–100 live merchant sites at scale required building bespoke Newman automation from scratch — not just writing test cases. The result was a QA practice that could validate thousands of executions per quarter, confirm product stability with data, and scale down gracefully as the platform matured — without ever compromising coverage quality.