From flaky to flawless.
Rebuilding a Cypress pipeline from the ground up. How an 80%-flaky, 12–15-minute Cypress suite was rebuilt into a sub-5-minute pipeline with on-demand execution, a custom failed-test rerun, and 100+ scripts, plus a bespoke LLM evaluation tool for testing AI chatbot accuracy.
All high/medium priority flakiness eliminated
70%+ execution time reduction
100+ automation scripts
CI pipeline duration under 5 minutes
Quality engineering for golf's leading revenue management platform.
Noteefy is golf's leading demand and revenue management platform, trusted by over 800 courses nationwide including 80 of the top 200 public courses and 9 of the top 12 multi-course operators. Its product suite — Waitlist, Confirm, AI Pro Shop Assistant, and Lead Management — helps course and resort operators automatically fill cancelled tee times, reduce no-shows, and deliver a better booking experience.
The platform handles high-throughput, real-time automation across multiple chatbot integrations and AI-powered features, which makes quality engineering critical to product reliability and customer trust. We rebuilt the Cypress automation suite from the ground up: stabilising a fragile pipeline, engineering new CI capabilities, and delivering bespoke tooling, including an LLM evaluation tool for the AI chatbot.
A CI pipeline nobody trusted.
When the team joined the project, the Cypress automation suite was in a fragile state. The CI pipeline was slow, unreliable, and difficult to debug, which made continuous delivery more painful than productive. The brief: stabilise the suite, engineer the missing pipeline capabilities, and make CI both fast and trustworthy.
Pervasive test flakiness
Over 80% of the Cypress suite was flaky on arrival, including high and medium priority failures. Tests failed intermittently without code changes, making it impossible to trust CI signals or distinguish real bugs from infrastructure noise.
No on-demand execution
All Cypress tests ran only as part of deployment-triggered GitLab pipelines. The QA team had no way to trigger a run independently; any manual validation required a code push, creating friction for every quality check.
No failed-test rerun mechanism
When tests failed, there was no way to rerun only the failures. The entire suite had to be re-executed, wasting time and making it slow to confirm whether a fix resolved a specific failure.
Slow CI execution (12–15 minutes)
The full Cypress CI job took 12–15 minutes per run. With frequent commits and pipeline-triggered execution, this stacked up quickly across multiple runs per day.
Stabilise, then engineer, then expand.
Flaky test eradication
A systematic audit of all 70+ Cypress scripts identified the root causes of flakiness: back-to-back deployments leaving the server unresponsive, data collisions between parallel runs, and fragile locators. By mid-February 2026, all high and medium priority flakiness had been eliminated.
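One common remedy for data collisions between parallel runs is to namespace every piece of test data by CI job. The TypeScript sketch below illustrates that idea and is not the project's code: `uniqueEmail`, `runNamespace`, the `qa` prefix and the `example.test` domain are hypothetical, while `CI_JOB_ID` and `CI_NODE_INDEX` are GitLab predefined variables (the latter is set when a job uses `parallel:`).

```typescript
// Hypothetical helpers: namespace test data per CI job and parallel node
// so concurrent runs never create or mutate the same records.
type Env = Record<string, string | undefined>;

// CI_JOB_ID and CI_NODE_INDEX are GitLab predefined variables; the
// fallbacks let the helper run outside CI as well.
function runNamespace(env: Env): string {
  const job = env["CI_JOB_ID"] ?? "local";
  const node = env["CI_NODE_INDEX"] ?? "1";
  return `${job}-${node}`;
}

// Per-process counter keeps names unique and deterministic within a run.
let counter = 0;

function uniqueEmail(prefix: string, env: Env): string {
  counter += 1;
  // Plus-addressing and the example.test domain are illustrative only.
  return `${prefix}+${runNamespace(env)}-${counter}@example.test`;
}
```

A counter rather than a timestamp or random suffix keeps generated names reproducible within a run, which makes a failing spec easier to debug. Inside Cypress specs the CI values would need to be forwarded, for example as `CYPRESS_`-prefixed environment variables, which Cypress exposes through `Cypress.env()`.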
On-demand GitLab pipeline
A configurable manual trigger was built directly into the GitLab CI/CD pipeline, allowing the QA team to execute the full Cypress suite on demand against any environment or branch without needing a deployment event.
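Besides the "Run pipeline" form in the GitLab UI, an on-demand run can be started through GitLab's pipelines API (`POST /projects/:id/pipeline`, which takes a `ref` and a list of `variables`). The TypeScript sketch below only builds that request. The helper name and the variable names `CYPRESS_RUN_MODE` and `TARGET_ENV` are assumptions, not the project's configuration.

```typescript
// Sketch: build a request for GitLab's "create a new pipeline" endpoint.
// CYPRESS_RUN_MODE and TARGET_ENV are hypothetical pipeline variables.
interface PipelineVariable {
  key: string;
  value: string;
}

interface PipelineRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildPipelineRequest(
  gitlabUrl: string,
  projectId: number | string,
  ref: string,
  targetEnv: string,
  token: string,
): PipelineRequest {
  const variables: PipelineVariable[] = [
    { key: "CYPRESS_RUN_MODE", value: "on_demand" },
    { key: "TARGET_ENV", value: targetEnv },
  ];
  // Namespaced project paths such as "group/project" must be URL-encoded.
  const id = encodeURIComponent(String(projectId));
  return {
    url: `${gitlabUrl}/api/v4/projects/${id}/pipeline`,
    method: "POST",
    headers: { "PRIVATE-TOKEN": token, "Content-Type": "application/json" },
    body: JSON.stringify({ ref, variables }),
  };
}
```

In Node 18+ the request could be sent with `fetch(req.url, req)`. Pipelines created this way report `CI_PIPELINE_SOURCE` as `api` (UI-triggered ones report `web`), so the Cypress job's `rules:` can key on that value to run only on demand.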
Custom failed-test rerun script
Because the client had no Cypress Business plan subscription, the team wrote a custom rerun script that extracts the failed tests from a prior GitLab run and re-executes only those, directly from the GitLab environment: one-click rerun, immediate results.
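The core of a rerun script like this is small: read the previous run's results, collect the spec files that contain a failed test, and pass them back to `cypress run --spec`, which accepts a comma-separated list. The sketch below assumes the prior results were saved as JSON in the shape returned by Cypress's Module API (`cypress.run()`). The interfaces are trimmed, hypothetical mirrors of that shape, and the function names are illustrative.

```typescript
// Trimmed, assumed mirrors of the cypress.run() result shape.
interface TestResult {
  title: string[];
  state: string; // e.g. "passed" | "failed" | "pending" | "skipped"
}

interface RunResult {
  spec: { relative: string };
  tests: TestResult[];
}

// Collect each spec file with at least one failed test,
// de-duplicated and in first-seen order.
function failedSpecs(runs: RunResult[]): string[] {
  const specs = new Set<string>();
  for (const run of runs) {
    if (run.tests.some((t) => t.state === "failed")) {
      specs.add(run.spec.relative);
    }
  }
  return Array.from(specs);
}

// Build the rerun command, or null when nothing failed.
function rerunCommand(specs: string[]): string | null {
  if (specs.length === 0) return null;
  return `npx cypress run --spec "${specs.join(",")}"`;
}
```

This works at spec-file granularity, so passing tests in the same file run again. Narrowing the rerun to individual test titles would need a filtering plugin such as `@cypress/grep`.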
Execution time reduction
The pipeline count was increased, and Gmail session logins were made reusable through shared session storage, sharply reducing per-test authentication overhead. The full CI job went from 12–15 minutes to under 5 minutes.
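Reusable logins come down to authenticating once per identity and reusing the resulting session. Cypress provides this natively through `cy.session(id, setup, { cacheAcrossSpecs: true })`. The framework-agnostic TypeScript sketch below only illustrates the caching pattern; the `Session` shape and the `login` callback are hypothetical stand-ins for the real Gmail flow.

```typescript
// Hypothetical session shape standing in for real auth state.
interface Session {
  user: string;
  cookies: Record<string, string>;
}

class SessionCache {
  private readonly sessions = new Map<string, Session>();
  // Exposed so callers (and tests) can see how many real logins happened.
  loginCalls = 0;

  // Return the cached session for `user`, running `login` only on a miss.
  getOrCreate(user: string, login: (user: string) => Session): Session {
    const cached = this.sessions.get(user);
    if (cached) return cached;
    this.loginCalls += 1;
    const session = login(user);
    this.sessions.set(user, session);
    return session;
  }
}
```

An in-memory map only lives as long as one process. Sharing sessions across separate CI jobs would also require persisting them somewhere every job can read.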
Pipeline failure monitoring and triage
A dedicated Slack channel received all CI pipeline failure notifications in real time. The team monitored, classified, and reported on every failure, distinguishing genuine test failures from environment noise.
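Part of this triage can be pre-sorted before anyone reads the alert. The sketch below tags a failure as likely environment noise when its error text matches known infrastructure patterns, then formats a Slack incoming-webhook body (a JSON object with a `text` field). The pattern list, labels and function names are illustrative assumptions, not the team's actual rules. A heuristic like this can pre-sort alerts but cannot replace human classification.

```typescript
type Triage = "environment" | "genuine";

// Illustrative infrastructure-failure patterns only.
const ENV_PATTERNS: RegExp[] = [
  /ECONNREFUSED|ECONNRESET|ETIMEDOUT/,
  /\b50[234]\b/, // 502 / 503 / 504, e.g. from a server mid-deployment
  /cy\.visit\(\) failed trying to load/,
];

function classify(errorMessage: string): Triage {
  return ENV_PATTERNS.some((p) => p.test(errorMessage)) ? "environment" : "genuine";
}

// Slack incoming webhooks accept a JSON body with a `text` field.
function slackPayload(spec: string, errorMessage: string, pipelineUrl: string): string {
  const label = classify(errorMessage) === "environment" ? "Likely environment noise" : "Genuine failure";
  return JSON.stringify({ text: `${label}: ${spec}\n${errorMessage}\n${pipelineUrl}` });
}
```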
Automation coverage expansion to 100+
New Cypress scripts were continuously added for new features (AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard), growing the suite from ~70 to 100+ scripts. Cursor AI accelerated authoring, and Greptile handled code review.
Three phases. One pipeline, rebuilt.
The engagement progressed methodically — stabilising the existing suite first, then engineering new pipeline capabilities, and finally expanding coverage and introducing AI tooling to raise overall quality velocity.
What the work delivered.
All high/medium priority flaky tests eliminated
From 80%+ flakiness on arrival to zero high/medium priority failures by mid-February 2026: a trustworthy CI signal for the first time.
CI pipeline cut from 12–15 minutes to under 5
A 70%+ reduction achieved through parallelisation, increased pipeline count, and shared Gmail session storage.
On-demand test execution delivered in GitLab
The QA team can now trigger the full Cypress suite against any environment or branch independently, without waiting for a deployment event.
Custom failed-test rerun mechanism
Built without a Cypress Business subscription: a bespoke, GitLab-native script that reruns only the failing tests in one click, saving hours of unnecessary re-execution.
Automation coverage grew from ~70 to 100+ scripts
Covering AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard, and Player Groups modules.
testBerry LLM evaluation tool delivered
Structured A/B testing of AI chatbot accuracy with ground truth management, Slack integration, and a full run history dashboard.
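At its simplest, A/B accuracy testing against ground truth means scoring each variant's answers against expected answers and comparing the two accuracies. The TypeScript sketch below uses normalised exact-match scoring purely as an illustrative assumption. It says nothing about how testBerry actually scores answers, and real chatbot evaluation often needs semantic or LLM-as-judge scoring. The Slack integration and run history are not shown.

```typescript
interface GroundTruth {
  question: string;
  expected: string;
}

type Answers = Record<string, string>; // question -> chatbot answer

// Normalise case, whitespace and trailing punctuation before comparing.
function normalise(s: string): string {
  return s.trim().toLowerCase().replace(/\s+/g, " ").replace(/[.!?]+$/, "");
}

// Fraction of ground-truth questions a variant answered correctly.
function accuracy(truth: GroundTruth[], answers: Answers): number {
  if (truth.length === 0) return 0;
  const correct = truth.filter(
    (t) => normalise(answers[t.question] ?? "") === normalise(t.expected),
  ).length;
  return correct / truth.length;
}

// Compare two variants on the same ground-truth set.
function compare(truth: GroundTruth[], a: Answers, b: Answers) {
  const accA = accuracy(truth, a);
  const accB = accuracy(truth, b);
  return { accA, accB, winner: accA === accB ? "tie" : accA > accB ? "A" : "B" };
}
```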
The stack.
The Noteefy engagement is a case study in pipeline rehabilitation. By addressing flakiness at its root, engineering custom tooling that worked within the client's existing subscription constraints, and delivering a purpose-built LLM evaluation platform on top — the team transformed a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.