From flaky to flawless.

Rebuilding a Cypress pipeline from the ground up.

How an 80%-flaky, 12-minute Cypress suite was rebuilt into a sub-5-minute pipeline with on-demand execution, a custom failed-test rerun, and 100+ scripts, plus a bespoke LLM evaluation tool for AI chatbot accuracy testing.

Cypress + GitLab CI · Flaky Test Eradication · LLM Evaluation

80%+ to 0%

High/med flakiness eliminated

70%+

Execution time reduction

100+

Automation scripts

<5 min

CI pipeline duration

Overview

Quality engineering for golf's leading revenue management platform.

Noteefy is golf's leading demand and revenue management platform, trusted by over 800 courses nationwide including 80 of the top 200 public courses and 9 of the top 12 multi-course operators. Its product suite — Waitlist, Confirm, AI Pro Shop Assistant, and Lead Management — helps course and resort operators automatically fill cancelled tee times, reduce no-shows, and deliver a better booking experience.

The platform handles high-throughput, real-time automation across multiple chatbot integrations and AI-powered features, making quality engineering critical to product reliability and customer trust. We rebuilt the Cypress automation suite from the ground up — stabilising a fragile pipeline, engineering new CI capabilities, and delivering bespoke tooling that turned a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.

Project at a Glance

Client

Noteefy

Website

noteefy.com

Engagement

Automation Engineering

Automation

Cypress (JavaScript)

Environments

GitLab CI/CD

Industry

Golf Tech / SaaS

Project Type

Automation Engineering

Onboarded

November 2025

Automation Framework

Cypress (JavaScript)

AI Coding Tool

Cursor AI

Code Review

Greptile

Repo + CI/CD

GitLab

The Challenge

A CI pipeline nobody trusted.

When the team joined the project, the Cypress automation suite was in a fragile state. The CI pipeline was slow, unreliable, and difficult to debug, which made continuous delivery more painful than productive. The brief: stabilise the suite, engineer new pipeline capabilities, and give the team a fast CI signal it could trust.

Pervasive test flakiness

Over 80% of the Cypress suite was flaky on arrival, including high and medium priority failures. Tests failed intermittently without code changes, making it impossible to trust CI signals or distinguish real bugs from infrastructure noise.

No on-demand execution

All Cypress tests ran only as part of deployment-triggered GitLab pipelines. The QA team had no way to trigger a run independently: any manual validation required a code push, creating friction for every quality check.

No failed-test rerun mechanism

When tests failed, there was no way to rerun only the failures. The entire suite had to be re-executed, wasting time and making it slow to confirm whether a fix resolved a specific failure.

Slow CI execution (12–15 minutes)

The full Cypress CI job took 12–15 minutes per run. With frequent commits and pipeline-triggered execution, this stacked up quickly across multiple runs per day.

What We Did

Stabilise, then engineer, then expand.

Flaky test eradication

A systematic audit of all 70+ Cypress scripts identified three root causes: back-to-back deployments leaving the server unresponsive, data collisions in parallel runs, and fragile locators. By mid-February 2026, all high and medium priority flakiness had been eliminated.
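The case study does not show the fixes themselves, so the following is only a minimal sketch of one common mitigation for each root cause it names. The helper names (`waitUntilHealthy`, `uniqueName`, `byTestId`), their defaults, and the use of a `data-cy` attribute are assumptions for illustration, not the team's actual code.

```javascript
// Hypothetical helpers; names and defaults are illustrative only.

// Root cause 1: server unresponsive right after a deployment.
// Poll an injected async health check until it returns true or attempts run out.
async function waitUntilHealthy(check, { attempts = 10, delayMs = 3000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      if (await check()) return attempt; // number of tries it took
    } catch (err) {
      // A thrown error (for example, connection refused) means "not ready yet".
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Server not healthy after ${attempts} attempts`);
}

// Root cause 2: data collisions between parallel runs.
// Build a record name that is unique per parallel job and per call.
let sequence = 0;
function uniqueName(prefix, jobIndex = '0') {
  sequence += 1;
  return `${prefix}-job${jobIndex}-${Date.now()}-${sequence}`;
}

// Root cause 3: fragile locators.
// Select by a dedicated test attribute instead of CSS classes or text.
function byTestId(id) {
  return `[data-cy="${id}"]`;
}
```

In a spec, the selector helper would be used as `cy.get(byTestId('save'))`, and the health check would run once before the suite starts rather than inside each test.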

On-demand GitLab pipeline

A configurable manual trigger was built directly into the GitLab CI/CD pipeline, allowing the QA team to execute the full Cypress suite on demand against any environment or branch without needing a deployment event.
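One way such a trigger can be made configurable is to read CI variables when the Cypress job starts. The sketch below assumes the run is started from GitLab's "Run pipeline" form, where `CI_PIPELINE_SOURCE` is `web` and `CI_COMMIT_REF_NAME` holds the chosen branch (both are predefined GitLab variables). `TARGET_ENV`, `SPEC_PATTERN`, the function name, and the URLs are hypothetical.

```javascript
// Hypothetical environment map; the real URLs are not given in the case study.
const BASE_URLS = {
  staging: 'https://staging.example.com',
  production: 'https://app.example.com',
};

// Decide what the Cypress job should run, based on CI variables.
// TARGET_ENV and SPEC_PATTERN are assumed custom variables that a tester
// fills in on the "Run pipeline" form.
function resolveRunConfig(env) {
  const target = env.TARGET_ENV || 'staging';
  const baseUrl = BASE_URLS[target];
  if (!baseUrl) {
    throw new Error(`Unknown TARGET_ENV "${target}"`);
  }
  return {
    onDemand: env.CI_PIPELINE_SOURCE === 'web', // started manually from the UI
    baseUrl,
    spec: env.SPEC_PATTERN || 'cypress/e2e/**/*.cy.js',
    branch: env.CI_COMMIT_REF_NAME || 'unknown',
  };
}
```

The result could feed `baseUrl` and `specPattern` in `cypress.config.js` via `resolveRunConfig(process.env)`. On the GitLab side, the job would also need a rule that lets it run for manually started pipelines; the case study does not say how that rule was written.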

Custom failed-test rerun script

Because the project had no Cypress Business plan subscription, a custom rerun script was written to extract and re-execute only the failed tests from a prior GitLab run, directly from the GitLab environment. One-click rerun, immediate results.
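The script itself is not shown in the case study. As a rough sketch of the core step, the code below takes a saved results object from an earlier run and builds the arguments for a rerun limited to the spec files that failed. It assumes the results follow the shape returned by the Cypress Module API (`runs[].spec.relative`, `runs[].stats.failures`); the function names are hypothetical, and how the earlier job's results are fetched from GitLab is left out.

```javascript
// Collect the spec files that had at least one failing test.
// `results` is assumed to be the parsed JSON saved by the earlier job.
function failedSpecs(results) {
  const runs = Array.isArray(results && results.runs) ? results.runs : [];
  return runs
    .filter((run) => run.stats && run.stats.failures > 0)
    .map((run) => run.spec.relative);
}

// Build the command arguments for a rerun, or null when nothing failed.
// `cypress run --spec` accepts a comma-separated list of spec files.
function rerunArgs(results) {
  const specs = failedSpecs(results);
  if (specs.length === 0) return null;
  return ['cypress', 'run', '--spec', specs.join(',')];
}
```

This sketch reruns whole spec files, which is coarser than rerunning individual failed tests; the original script may well work at a finer grain.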

Execution time reduction

Pipeline count was increased and Gmail session logins were made reusable through shared session storage, dramatically reducing per-test auth overhead. The full CI job went from 12–15 minutes to under 5 minutes.
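The case study does not say how either change was implemented, so this is a sketch under two assumptions: that "increased pipeline count" means splitting spec files across GitLab `parallel` jobs, and that session reuse relies on Cypress's `cy.session` command. `specsForJob`, `loginOnce`, and `performLogin` are hypothetical names.

```javascript
// Pick the spec files for one parallel job. GitLab sets CI_NODE_INDEX
// (1-based) and CI_NODE_TOTAL when a job uses the `parallel` keyword.
function specsForJob(allSpecs, nodeIndex, nodeTotal) {
  const sorted = [...allSpecs].sort(); // same order on every runner
  return sorted.filter((_, i) => i % nodeTotal === nodeIndex - 1);
}

// Log in once per user and let Cypress restore the cached session afterwards.
// `cy` is passed in so the helper can be exercised with a stub outside Cypress;
// `performLogin` is a hypothetical function that drives the real login steps.
function loginOnce(cy, user, performLogin) {
  cy.session(['gmail', user.email], () => performLogin(user), {
    cacheAcrossSpecs: true, // reuse the session in later spec files of the same run
  });
}
```

Round-robin splitting keeps the example short but ignores how long each spec takes, so a real setup might balance jobs by recorded duration instead.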

Pipeline failure monitoring and triage

A dedicated Slack channel received all CI pipeline failure notifications in real time. The team monitored, classified, and reported on every failure, distinguishing genuine test failures from environment noise.
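The triage described here was done by the team. As an illustration of what a first automated pass could look like, the sketch below labels a failure as environment noise or a test failure from its error message, then builds a message body for a Slack incoming webhook (which accepts JSON with a `text` field). The patterns, function names, and message format are all assumptions.

```javascript
// Hypothetical patterns suggesting the environment, not the product, failed.
const ENVIRONMENT_PATTERNS = [
  /ECONNREFUSED/i,
  /ETIMEDOUT/i,
  /socket hang up/i,
  /\b50[234]\b/, // HTTP 502, 503, 504
];

// Label a failure from its error message. Anything that matches no
// environment pattern is treated as a possible genuine test failure.
function classifyFailure(errorMessage) {
  const isEnvironment = ENVIRONMENT_PATTERNS.some((pattern) => pattern.test(errorMessage));
  return isEnvironment ? 'environment' : 'test';
}

// Build the JSON body for a Slack incoming webhook.
function slackPayload({ spec, errorMessage, pipelineUrl }) {
  const kind = classifyFailure(errorMessage);
  return {
    text: `[${kind}] ${spec} failed: ${errorMessage}\nPipeline: ${pipelineUrl}`,
  };
}
```

A label like this only narrows down where to look first; a person still has to confirm whether a failure is a real defect.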

Automation coverage expansion to 100+

New Cypress scripts were added continuously for new features (AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard), growing the suite from ~70 to 100+ scripts. Cursor AI accelerated authoring; Greptile reviewed the code.

Engagement Journey

Three phases. One pipeline, rebuilt.

The engagement progressed methodically — stabilising the existing suite first, then engineering new pipeline capabilities, and finally expanding coverage and introducing AI tooling to raise overall quality velocity.

| Phase | Focus | Automation State | Coverage Scope |
| --- | --- | --- | --- |
| Stabilise | Flaky test eradication across 70+ scripts; root-cause audit of intermittent failures. | Existing suite | High/med priority flows |
| Engineer | On-demand triggers, custom failed-test rerun, and execution-time reduction to under 5 minutes. | New CI capabilities | Full pipeline + GitLab |
| Expand | Coverage grown to 100+ scripts; testBerry LLM evaluation tool delivered for chatbot accuracy. | AI-accelerated | Full product + AI surface |

Results & Impact

What the work delivered.

✓

All high/medium priority flaky tests eliminated

From 80%+ flakiness on arrival to zero high/medium priority failures by mid-February 2026: a trustworthy CI signal for the first time.

✓

CI pipeline cut from 12–15 minutes to under 5

A 70%+ reduction achieved through parallelisation, increased pipeline count, and shared Gmail session storage.

✓

On-demand test execution delivered in GitLab

The QA team can now trigger the full Cypress suite against any environment or branch independently, without waiting for a deployment event.

✓

Custom failed-test rerun mechanism

Built without a Cypress Business subscription: a bespoke GitLab-native script for one-click rerun of only the failing tests, saving hours of unnecessary re-execution.

✓

Automation coverage grew from ~70 to 100+ scripts

Covering AI Assistant, Lead Management, Analytics, Booking Engine, Admin Dashboard, and Player Groups modules.

✓

testBerry LLM evaluation tool delivered

Structured A/B testing of AI chatbot accuracy with ground truth management, Slack integration, and a full run history dashboard.
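testBerry's internals are not described, so the following is only a toy sketch of the general idea behind A/B accuracy testing against ground truth: score two chatbot variants on the same questions and compare. The ground-truth shape, the "contains every required fact" rule, and all function names are assumptions, not testBerry's method.

```javascript
// Normalise text so case and punctuation differences do not count as mismatches.
function normalise(text) {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, ' ').replace(/\s+/g, ' ').trim();
}

// An answer passes when it contains every required fact for its question.
// Ground truth shape (hypothetical): { questionId: ['required fact', ...] }
function isCorrect(answer, requiredFacts) {
  const haystack = normalise(answer);
  return requiredFacts.every((fact) => haystack.includes(normalise(fact)));
}

// Fraction of ground-truth questions a variant answered correctly.
function accuracy(answers, groundTruth) {
  const ids = Object.keys(groundTruth);
  if (ids.length === 0) return 0;
  const correct = ids.filter((id) => isCorrect(answers[id] || '', groundTruth[id])).length;
  return correct / ids.length;
}

// Compare two chatbot variants on the same ground truth.
function compareVariants(answersA, answersB, groundTruth) {
  const a = accuracy(answersA, groundTruth);
  const b = accuracy(answersB, groundTruth);
  return { a, b, winner: a === b ? 'tie' : a > b ? 'A' : 'B' };
}
```

Substring matching is deliberately simple and has obvious blind spots (an answer mentioning "19 holes" would satisfy a required fact of "9 holes"), so a real evaluator would need a stricter or model-assisted comparison.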

Tools & Technology

The stack.

Automation

Cypress (JS): full end-to-end test suite

AI Development

Cursor AI · Greptile: accelerated authoring and code review

CI/CD + Repo

GitLab · Cypress Cloud

Communication

Slack: real-time pipeline failure alerts and result summaries

LLM Evaluation

testBerry: bespoke AI chatbot accuracy evaluation platform

The Takeaway

The Noteefy engagement is a case study in pipeline rehabilitation. By addressing flakiness at its root, engineering custom tooling that worked within the client's existing subscription constraints, and delivering a purpose-built LLM evaluation platform on top, the team transformed a slow, unreliable CI environment into a fast, trustworthy foundation for continuous quality delivery.