
How AI agents fix CI pipeline failures (v1 architecture)

A technical walkthrough of Stitch v1: log parsing, regex-based classification, scoped model context, and single-branch fix delivery. Written against the v1 design, kept for context on how our thinking evolved.

Historical post. Describes the v1 architecture, including the weighted regex classifier and the CI-triggered single-branch model. Most of what this post describes has been deliberately removed in v2; see Deleting our regex classifier and Local-first CI: the shift for what replaced it and why.

When a CI pipeline fails, someone on the team has to stop what they are doing, read the logs, figure out what broke, fix it, push, and wait for the pipeline to pass. This cycle repeats multiple times per day in most engineering teams.

AI agents can automate this entire loop. Here is how Stitch approaches it.

Step 1: Detect the failure

Stitch runs as a downstream job in your CI pipeline. When an upstream job fails, Stitch activates. It does not poll, and it does not rely on webhooks: it is triggered by the CI system itself, which means no added latency and no external dependencies.
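As a minimal sketch, a downstream job can decide whether to act by reading failure status the CI system exposes to it. The `CI_UPSTREAM_STATUS` variable below is hypothetical; real CI systems expose upstream job status under different names.

```python
import os

def should_activate() -> bool:
    """Activate only when an upstream job in this pipeline failed.

    CI_UPSTREAM_STATUS is a hypothetical variable standing in for
    whatever status information your CI system provides.
    """
    return os.environ.get("CI_UPSTREAM_STATUS", "success") == "failed"

# Simulate a failed upstream job.
os.environ["CI_UPSTREAM_STATUS"] = "failed"
print(should_activate())  # True
```

Because the CI system only schedules this job when the pipeline runs, there is no external service to poll and nothing to keep alive between failures.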

Step 2: Read and parse the logs

CI logs are messy. They contain ANSI color codes, timestamps, progress bars, and thousands of lines of output. Stitch strips the noise and extracts the relevant error signatures.
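The noise-stripping step can be sketched with two regular expressions, one for ANSI escape sequences and one for leading ISO-style timestamps. These patterns are illustrative, not the exact ones Stitch used.

```python
import re

# Matches ANSI escape sequences such as \x1b[31m (red) and \x1b[0m (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Matches a leading ISO-8601-style timestamp, e.g. 2024-05-01T12:03:44Z.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T[\d:.]+Z?\s*")

def clean_line(line: str) -> str:
    """Strip color codes and timestamps, leaving the error text."""
    line = ANSI_ESCAPE.sub("", line)
    return TIMESTAMP.sub("", line)

raw = "2024-05-01T12:03:44Z \x1b[31mERROR\x1b[0m: module not found"
print(clean_line(raw))  # ERROR: module not found
```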

It groups errors by type: lint failures, type check errors, test failures, dependency issues, build errors. Each category has a different diagnosis strategy.
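The v1 classifier mentioned above was regex-based. A simplified sketch of the idea, with a much smaller pattern set than the real weighted classifier used, looks like this:

```python
import re

# Hypothetical patterns per category; the real v1 classifier
# used a larger, weighted set.
CATEGORIES = [
    ("test_failure", re.compile(r"FAILED|AssertionError")),
    ("type_error", re.compile(r"error TS\d+|mypy: error")),
    ("lint", re.compile(r"\b[EW]\d{3}\b|eslint")),
    ("dependency", re.compile(r"ModuleNotFoundError|Could not resolve")),
    ("build", re.compile(r"compilation failed|build error", re.IGNORECASE)),
]

def classify(lines):
    """Score each category by pattern hits; return the best match."""
    scores = {name: 0 for name, _ in CATEGORIES}
    for line in lines:
        for name, pattern in CATEGORIES:
            if pattern.search(line):
                scores[name] += 1
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

log = ["FAILED tests/test_api.py::test_login", "AssertionError: 401 != 200"]
print(classify(log))  # test_failure
```

Routing on the winning category is what lets each error type get its own diagnosis strategy downstream.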

Step 3: Diagnose the root cause

This is where the AI model earns its keep. Stitch sends the cleaned error context to a language model along with the relevant source files. The model does not see the entire repository. It sees only the files referenced in the error trace, plus their immediate dependencies.

This scoped context is intentional. It prevents hallucination from irrelevant code and keeps token usage predictable.
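Building that scoped context can be sketched as: extract file paths from the error trace, then widen the set by one hop through a precomputed import graph. The `dependency_map` structure here is an assumption for illustration.

```python
import re

# Matches file references in Python-style tracebacks.
TRACE_FILE = re.compile(r'File "([^"]+)", line \d+')

def scoped_files(traceback_text: str, dependency_map: dict) -> set:
    """Collect files named in the trace plus their direct dependencies.

    dependency_map is a hypothetical precomputed import graph:
    {file: [files it directly imports]}.
    """
    files = set(TRACE_FILE.findall(traceback_text))
    for f in list(files):
        files.update(dependency_map.get(f, []))
    return files

trace = 'File "app/views.py", line 10, in render\nImportError: ...'
deps = {"app/views.py": ["app/models.py"]}
print(sorted(scoped_files(trace, deps)))  # ['app/models.py', 'app/views.py']
```

Capping context at one dependency hop is what keeps token usage predictable: the set grows with the size of the trace, not with the size of the repository.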

Step 4: Generate and validate the fix

The model produces a patch. Stitch applies it locally and runs the failing checks again. If the checks pass, the fix is committed and pushed. If they fail, Stitch reports the diagnosis and attempted fix so a human can pick up with full context.

This validation step is critical. An AI-generated fix that is not tested is just a suggestion. Stitch treats every fix as a hypothesis that must be verified.
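The verify-then-decide loop reduces to re-running the originally failing command and branching on its exit code. This is a sketch; the commit, push, and reporting steps are elided.

```python
import subprocess
import sys

def validate_fix(check_cmd: list) -> bool:
    """Re-run the originally failing checks after applying a patch.

    check_cmd is whatever command failed in CI,
    e.g. ["pytest", "tests/test_api.py"]. Returns True if it now passes.
    """
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    return result.returncode == 0

# Stand-in for a real check command: a trivially passing script.
if validate_fix([sys.executable, "-c", "pass"]):
    print("checks pass: commit and push the fix")
else:
    print("checks still fail: report diagnosis for a human")
```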

Step 5: Group and deduplicate

When multiple jobs fail for the same root cause, Stitch groups them. If five test files fail because of a missing import, Stitch fixes the import once and reports the fix for all five jobs. No duplicate commits, no redundant work.
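The grouping step is essentially a bucket-by-signature operation. A sketch, assuming each failure has already been reduced to a `(job_name, error_signature)` pair by the parsing stage:

```python
from collections import defaultdict

def group_failures(failures):
    """Group failing jobs that share the same root-cause signature.

    failures: list of (job_name, error_signature) pairs. The signature
    here is a simplified stand-in for what log parsing produces.
    """
    groups = defaultdict(list)
    for job, signature in failures:
        groups[signature].append(job)
    return dict(groups)

failures = [
    ("test-auth", "ModuleNotFoundError: utils"),
    ("test-api", "ModuleNotFoundError: utils"),
    ("lint", "E501 line too long"),
]
print(group_failures(failures))
# {'ModuleNotFoundError: utils': ['test-auth', 'test-api'],
#  'E501 line too long': ['lint']}
```

One fix per group, reported against every job in that group, is what avoids duplicate commits.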

Guardrails

Stitch operates within strict boundaries.

The result

Teams using Stitch report fewer context switches for trivial CI failures. The fix-push-wait cycle for common issues like lint errors, missing imports, and type mismatches drops from 15 minutes to under 2 minutes. More importantly, developers stay focused on the work that matters.

CI pipelines will always break. The question is whether a human needs to be in the loop for every single failure.
