From flaky to flawless.

Rebuilding a Cypress pipeline from the ground up. How a Cypress suite that was more than 80% flaky and took 12–15 minutes per run was rebuilt into a sub-5-minute pipeline with on-demand execution, a custom failed-test rerun, and 100+ scripts, plus a bespoke LLM evaluation tool for AI chatbot accuracy testing.

Cypress + GitLab CI · Flaky Test Eradication · LLM Evaluation
80%+ → 0%

High/med flakiness eliminated

70%+

Execution time reduction

100+

Automation scripts

<5 min

CI pipeline duration

Overview

Quality engineering for golf's leading revenue management platform.

Noteefy is golf's leading demand and revenue management platform, trusted by over 800 courses nationwide including 80 of the top 200 public courses and 9 of the top 12 multi-course operators. Its product suite — Waitlist, Confirm, AI Pro Shop Assistant, and Lead Management — helps course and resort operators automatically fill cancelled tee times, reduce no-shows, and deliver a better booking experience.

The platform handles high-throughput, real-time automation across multiple chatbot integrations and AI-powered features, making quality engineering critical to product reliability and customer trust. We rebuilt the Cypress automation suite from the ground up — stabilising a fragile pipeline, engineering new CI capabilities, and delivering bespoke tooling that turned a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.

Project at a Glance
Client: Noteefy
Engagement: Automation Engineering
Automation: Cypress (JavaScript)
Environments: GitLab CI/CD
Industry: Golf Tech / SaaS
Project Type: Automation Engineering
Onboarded: November 2025
Automation Framework: Cypress (JavaScript)
AI Coding Tool: Cursor AI
Code Review: Greptile
Repo + CI/CD: GitLab

The Challenge

A CI pipeline nobody trusted.

When the team joined the project, the Cypress automation suite was in a fragile state. The CI pipeline was slow, unreliable, and difficult to debug — making continuous delivery more painful than productive. The brief: stabilise the suite, engineer new pipeline capabilities, and turn a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.

01

Pervasive test flakiness

Over 80% of the Cypress suite was flaky on arrival, including high and medium priority failures. Tests failed intermittently without code changes, making it impossible to trust CI signals or distinguish real bugs from infrastructure noise.

02

No on-demand execution

All Cypress tests ran only as part of deployment-triggered GitLab pipelines. The QA team had no way to trigger a run independently; any manual validation required a code push, creating friction for every quality check.

03

No failed-test rerun mechanism

When tests failed, there was no way to rerun only the failures. The entire suite had to be re-executed, wasting time and making it slow to confirm whether a fix resolved a specific failure.

04

Slow CI execution (12–15 minutes)

The full Cypress CI job took 12–15 minutes per run. With frequent commits and pipeline-triggered execution, this stacked up quickly across multiple runs per day.

What We Did

Stabilise, then engineer, then expand.

01

Flaky test eradication

A systematic audit of all 70+ Cypress scripts identified the root causes: back-to-back deployments leaving the server unresponsive, data collisions in parallel runs, and fragile locators. By mid-February 2026, all high and medium priority flakiness had been eliminated.
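Two of these root causes, data collisions and fragile locators, lend themselves to small helpers. The sketch below is illustrative only and is not taken from the project: the function names, the `example.test` domain, and the `data-cy` attribute convention are assumptions. `CI_PIPELINE_ID` and `CI_NODE_INDEX` are GitLab's predefined CI variables (the latter is set only for jobs that use `parallel`).

```javascript
// Hypothetical helpers; names and the email domain are illustrative.

// Builds a namespace that differs per pipeline and per parallel job,
// so two jobs never create records with the same identifiers.
function buildRunNamespace(env) {
  const pipeline = env.CI_PIPELINE_ID || 'local';
  const node = env.CI_NODE_INDEX || '1';
  return `p${pipeline}-n${node}`;
}

let emailCounter = 0;

// Returns an address that no other job, and no earlier call, will produce.
function uniqueTestEmail(prefix, env) {
  emailCounter += 1;
  return `${prefix}+${buildRunNamespace(env)}-${emailCounter}@example.test`;
}

// Locators tied to a dedicated test attribute survive styling and layout
// changes better than CSS classes or DOM position.
function byTestId(id) {
  return `[data-cy="${id}"]`;
}

console.log(uniqueTestEmail('golfer', { CI_PIPELINE_ID: '4821', CI_NODE_INDEX: '3' }));
console.log(byTestId('waitlist-submit'));
```

In a spec, a helper like this would be called with `process.env` (or values forwarded through Cypress env configuration), and the selector passed to `cy.get()`.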

02

On-demand GitLab pipeline

A configurable manual trigger was built directly into the GitLab CI/CD pipeline, allowing the QA team to execute the full Cypress suite on demand against any environment or branch without needing a deployment event.
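How the trigger was configured is not described here. One plausible shape is a manually started pipeline whose variables are read by a small launcher script and turned into Cypress CLI arguments. In the sketch below, the variable names `TARGET_ENV` and `SPEC_PATTERN` and the URLs are hypothetical; `CI_PIPELINE_SOURCE` is a GitLab predefined variable, and `--config` and `--spec` are standard `cypress run` flags.

```javascript
// Hypothetical mapping of environment names to base URLs.
const BASE_URLS = new Map([
  ['staging', 'https://staging.example.test'],
  ['qa', 'https://qa.example.test'],
]);

// Turns pipeline variables into the argument list for `npx cypress run`.
function buildCypressArgs(env) {
  const target = env.TARGET_ENV || 'staging';
  const baseUrl = BASE_URLS.get(target);
  if (!baseUrl) {
    throw new Error(`Unknown TARGET_ENV "${target}"`);
  }
  const args = ['cypress', 'run', '--config', `baseUrl=${baseUrl}`];
  if (env.SPEC_PATTERN) {
    // Optional: restrict the run to a subset of specs.
    args.push('--spec', env.SPEC_PATTERN);
  }
  return args;
}

// GitLab sets CI_PIPELINE_SOURCE to "web" for pipelines started from the UI.
function isOnDemandRun(env) {
  return env.CI_PIPELINE_SOURCE === 'web';
}

const example = { TARGET_ENV: 'qa', SPEC_PATTERN: 'cypress/e2e/waitlist/**/*.cy.js' };
console.log(buildCypressArgs(example).join(' '));
```

Keeping the argument building in a pure function means the same launcher can serve both deployment-triggered and on-demand runs, with only the variables differing.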

03

Custom failed-test rerun script

Because the client had no Cypress Business plan subscription, a custom rerun script was written to extract and re-execute only the failed tests from a prior GitLab run, directly from the GitLab environment. One-click rerun, immediate results.
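The script itself is not shown in this write-up. As a minimal sketch of the idea, assume the prior run saved a results object shaped like the one the Cypress Module API (`cypress.run()`) resolves with, for example as a job artifact. The function names below are hypothetical, and how the real script retrieves the earlier results is an assumption.

```javascript
// Assumed input shape (subset of the Cypress Module API result):
// { runs: [{ spec: { relative: 'cypress/e2e/x.cy.js' }, stats: { failures: 1 } }] }

// Returns the relative paths of spec files that had at least one failure.
function failedSpecs(results) {
  return (results.runs || [])
    .filter((run) => run.stats && run.stats.failures > 0)
    .map((run) => run.spec.relative);
}

// Builds the rerun command, or returns null when there is nothing to rerun.
// `--spec` accepts a comma-separated list of spec files.
function rerunArgs(results) {
  const specs = failedSpecs(results);
  return specs.length === 0 ? null : ['cypress', 'run', '--spec', specs.join(',')];
}

const previousRun = {
  runs: [
    { spec: { relative: 'cypress/e2e/waitlist.cy.js' }, stats: { failures: 0 } },
    { spec: { relative: 'cypress/e2e/confirm.cy.js' }, stats: { failures: 2 } },
  ],
};
console.log(rerunArgs(previousRun));
```

This reruns at spec-file granularity; rerunning individual test cases inside a spec would need extra filtering that is outside this sketch.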

04

Execution time reduction

Pipeline count was increased, and Gmail session logins were made reusable through shared session storage, sharply reducing per-test auth overhead. The full CI job went from 12–15 minutes to under 5 minutes.
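The write-up does not say which mechanism backed the shared session storage. One common way to get this effect in Cypress is `cy.session()`, which caches cookies and storage after the first login and restores them afterwards. The sketch below is an assumption about the approach, not the team's code: `loginOnce`, `performLogin`, and `isStillLoggedIn` are hypothetical names.

```javascript
// `cy` is passed in so the sketch can be read and exercised outside a Cypress
// run; in a real support file the global `cy` would be used directly.
function loginOnce(cy, user, performLogin, isStillLoggedIn) {
  cy.session(
    ['login', user.email], // cache key: one cached session per account
    () => performLogin(user), // runs only when no valid cached session exists
    {
      validate: isStillLoggedIn, // a failing check discards the cache and logs in again
      cacheAcrossSpecs: true, // reuse the session in later spec files on the same machine
    }
  );
}

// Typical use inside a spec (commented out because it needs a Cypress runtime):
// beforeEach(() => {
//   loginOnce(cy, { email: 'qa@example.test' }, doLogin, checkLoggedIn);
// });
```

With this pattern the expensive login flow runs once per cached account rather than once per test, which is where the per-test auth saving comes from.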

05

Pipeline failure monitoring and triage

A dedicated Slack channel received all CI pipeline failure notifications in real time. The team monitored, classified, and reported on every failure, distinguishing genuine test failures from environment noise.
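The classification described here was done by the team. A first-pass automated filter could pre-sort failures before a human looks at them, along the lines of the sketch below. The noise patterns and function names are hypothetical, and a real list would come from the team's triage history. The `{ text }` object is the minimal payload accepted by a Slack incoming webhook.

```javascript
// Hypothetical signatures of environment noise (network errors, gateway
// responses, page-load failures) as opposed to assertion failures.
const ENVIRONMENT_PATTERNS = [
  /ECONNREFUSED|ECONNRESET|ETIMEDOUT/,
  /\b(502|503|504)\b/,
  /cy\.visit\(\) failed/,
];

// Labels a failure message as 'environment' or 'test'.
function classifyFailure(errorMessage) {
  return ENVIRONMENT_PATTERNS.some((pattern) => pattern.test(errorMessage))
    ? 'environment'
    : 'test';
}

// Builds a one-line summary in the shape a Slack incoming webhook accepts.
function buildSlackSummary(failures) {
  const counts = { environment: 0, test: 0 };
  for (const failure of failures) {
    counts[classifyFailure(failure.error)] += 1;
  }
  return {
    text: `Cypress pipeline: ${counts.test} test failure(s), ${counts.environment} environment failure(s)`,
  };
}

console.log(
  buildSlackSummary([
    { error: 'AssertionError: expected 3 tee times but found 2' },
    { error: 'connect ECONNREFUSED 10.0.0.5:443' },
  ])
);
```

A pattern list like this only narrows the queue; ambiguous failures still need a person to confirm the label.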

06

Automation coverage expansion to 100+

New Cypress scripts were added continuously for new features (AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard), growing the suite from ~70 to 100+ scripts. Cursor AI accelerated authoring; Greptile reviewed code.

Engagement Journey

Three phases. One pipeline, rebuilt.

The engagement progressed methodically — stabilising the existing suite first, then engineering new pipeline capabilities, and finally expanding coverage and introducing AI tooling to raise overall quality velocity.

Phase: Stabilise
Focus: Flaky test eradication across 70+ scripts; root-cause audit of intermittent failures.
Automation State: Existing suite
Coverage Scope: High/med priority flows

Phase: Engineer
Focus: On-demand triggers, custom failed-test rerun, and execution-time reduction to under 5 minutes.
Automation State: New CI capabilities
Coverage Scope: Full pipeline + GitLab

Phase: Expand
Focus: Coverage grown to 100+ scripts; testBerry LLM evaluation tool delivered for chatbot accuracy.
Automation State: AI-accelerated
Coverage Scope: Full product + AI surface

Results & Impact

What the work delivered.

All high/medium priority flaky tests eliminated

From 80%+ flakiness on arrival to zero high/medium priority failures by mid-February 2026: a trustworthy CI signal for the first time.

CI pipeline cut from 12–15 minutes to under 5

A 70%+ reduction achieved through parallelisation, increased pipeline count, and shared Gmail session storage.

On-demand test execution delivered in GitLab

The QA team can now trigger the full Cypress suite against any environment or branch independently, without waiting for a deployment event.

Custom failed-test rerun mechanism

Built without a Cypress Business subscription: a bespoke GitLab-native script for one-click rerun of only the failing tests, saving hours of unnecessary re-execution.

Automation coverage grew from ~70 to 100+ scripts

Covering AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard, and Player Groups modules.

testBerry LLM evaluation tool delivered

Structured A/B testing of AI chatbot accuracy with ground truth management, Slack integration, and a full run history dashboard.
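testBerry's internals are not described in this write-up. To make the idea of A/B accuracy testing against ground truth concrete, the sketch below scores two chatbot variants on the same question set. Everything in it is an assumption for illustration: the data shapes, the function names, and especially the exact-match scoring, which is far cruder than what an LLM evaluation tool would normally need.

```javascript
// Normalises whitespace and case so trivial formatting differences do not count as errors.
function normalise(text) {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Fraction of ground-truth questions that a variant answered correctly.
// Missing answers count as wrong.
function accuracy(groundTruth, answers) {
  const ids = Object.keys(groundTruth);
  if (ids.length === 0) return 0;
  const correct = ids.filter(
    (id) => answers[id] !== undefined && normalise(answers[id]) === normalise(groundTruth[id])
  ).length;
  return correct / ids.length;
}

// Scores both variants against the same ground truth and names the better one.
function compareVariants(groundTruth, variantA, variantB) {
  const a = accuracy(groundTruth, variantA);
  const b = accuracy(groundTruth, variantB);
  return { a, b, winner: a === b ? 'tie' : a > b ? 'A' : 'B' };
}

const groundTruth = { q1: '7:30 AM', q2: 'Yes' };
const promptA = { q1: '7:30 am', q2: 'No' };
const promptB = { q1: ' 7:30  AM', q2: 'yes' };
console.log(compareVariants(groundTruth, promptA, promptB));
```

In practice the comparison step would likely be replaced by semantic or rubric-based scoring; the structure (shared ground truth, per-variant score, recorded winner) is the part this sketch is meant to show.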

Tools & Technology

The stack.

Automation: Cypress (JS) - full end-to-end test suite
AI Development: Cursor AI · Greptile - accelerated authoring and code review
CI/CD + Repo: GitLab · Cypress Cloud
Communication: Slack - real-time pipeline failure alerts and result summaries
LLM Evaluation: testBerry - bespoke AI chatbot accuracy evaluation platform

The Takeaway

The Noteefy engagement is a case study in pipeline rehabilitation. By addressing flakiness at its root, engineering custom tooling that worked within the client's existing subscription constraints, and delivering a purpose-built LLM evaluation platform on top, the team transformed a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.