Blitzy Fixed and Enhanced Claude's C Compiler
Mar 19, 2026 • Murph Vandervelde • 5 min read

370 Engineering Hours in 4 Days. 226,000 LoC. 1,260 Tests. Here's How.
Our first two Blitzy Open Source Initiative posts focused on Great Refactor projects in dnsmasq and curl. For this post, we turned to Claude's C Compiler (CCC), which has garnered a great deal of attention. We set Blitzy on a mission to fix CCC's existing issues and enhance its functionality in a single run.
Claude's Very Public Compiler Release
On February 6, 2026, Anthropic announced that 16 Claude Opus 4.6 agents had built a C compiler from scratch in 14 days. A run of this length is quite impressive. Claude Code is an amazing platform for accelerating developer velocity on smaller software engineering tasks, and Blitzy's internal team uses it heavily. The compiler project also illustrates the gap between research and production standards. A senior Anthropic researcher still had to act as full-time conductor, logging over 2,000 interactive turns with the Claude agents and writing many of the test cases.
The project generated enormous excitement: a working compiler, written primarily by agents, capable of compiling the Linux kernel. Headlines called it "a new era of autonomous coding". The GitHub repo racked up thousands of stars overnight, but when engineers tried to use it, they were met with a variety of unexpected problems.
The very first issue opened on the GitHub repo was titled, "Hello world does not compile." The compiler couldn't find its own system headers. John Regehr, one of the world's foremost experts on compiler correctness, ran his fuzzing tools against CCC and found 14 miscompilations out of 101 Csmith programs and 5 out of 101 YARPGen programs. His conclusion: "From the point of view of people working with production compilers, CCC isn't useful."
Deep Codebase Understanding: Tech Spec
Before modifying a single line of Rust, Blitzy ingested and analyzed the entire Claude C Compiler codebase and created a detailed technical specification of 351 source files totaling 186,696 lines across 17 C header files and 6 shell stubs.
The requirements mapped the full scope of what needed to be fixed across 9 audit areas including:
- 13 active bugs causing incorrect codegen or blocking real-world compilation, from ARM assembler CASPAL instruction encoding failures to macro parameter prefix-matching substitution errors.
- C11 language feature gaps with incomplete _Atomic support, missing VLAs, absent _Generic selection, and a _Pragma operator explicitly skipped in the source code.
- Zero external Rust crate dependencies, meaning every fix had to be implemented from scratch using only the Rust standard library.
- Single-tier optimization pipeline where every -O level ran the same passes, confirming what benchmarkers had already discovered: optimization flags did nothing.
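To make the macro parameter bug class concrete, here is a minimal sketch of whole-identifier substitution, written with only the Rust standard library. It assumes one plausible reading of "prefix-matching": a parameter named x wrongly matching the start of a longer identifier such as x_max. The function name and signature are hypothetical and are not taken from CCC's source, and the sketch ignores string literals, stringification, and token pasting.

```rust
/// Replace each macro parameter in `body` with its argument, comparing whole
/// identifiers only. Hypothetical helper for illustration, not CCC's API.
/// Assumes `params` and `args` have the same length.
fn substitute_params(body: &str, params: &[&str], args: &[&str]) -> String {
    debug_assert_eq!(params.len(), args.len());
    let mut out = String::new();
    let mut chars = body.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_ascii_alphabetic() || c == '_' {
            // Consume the full identifier before comparing it to any parameter,
            // so `x` can never match just the front of `x_max`.
            let mut end = start;
            while let Some(&(i, ch)) = chars.peek() {
                if ch.is_ascii_alphanumeric() || ch == '_' {
                    end = i + ch.len_utf8();
                    chars.next();
                } else {
                    break;
                }
            }
            let ident = &body[start..end];
            match params.iter().position(|p| *p == ident) {
                Some(idx) => out.push_str(args[idx]),
                None => out.push_str(ident),
            }
        } else if c.is_ascii_digit() {
            // Copy a number such as 0xa verbatim so its letters are never
            // mistaken for an identifier.
            while let Some(&(_, ch)) = chars.peek() {
                if ch.is_ascii_alphanumeric() || ch == '_' || ch == '.' {
                    out.push(ch);
                    chars.next();
                } else {
                    break;
                }
            }
        } else {
            // Punctuation and whitespace pass through unchanged.
            out.push(c);
            chars.next();
        }
    }
    out
}
```

With params ["x"] and args ["1"], the body "x + x_max" becomes "1 + x_max". A naive textual replace of x would also rewrite the x characters inside x_max, which is the kind of silent miscompilation a dedicated regression test should catch.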
The analysis also surfaced the structural issues identified from the outside: optimization toward test-suite passage rather than general correctness, silent acceptance of invalid C programs that GCC would reject, and register spilling patterns that inflated binary sizes by 2.7–3x compared to GCC output.
What Blitzy Built
Blitzy generated the entire run from this prompt. See the pull request and the project guide that recap the work. The PR includes detailed descriptions for all relevant commits to walk through the changes and decisions made.
Here are the numbers:
- 92% of the entire project (370 engineering hours) completed autonomously in 4 days by parallelizing development work
- 10,785 autonomous agent turns (communication points between Blitzy agents, akin to human and agent interaction in iterative tools like Claude Code) compared to Claude Code's 2,000 human/agent turns
- 39,455 net new lines of code expanding the codebase from 186,696 to 226,151 lines, with 527 new files and 134 modified files
- 1,260 tests (753 unit, 507 integration) with a 100% pass rate across all four architectures. Zero compilation errors.
- All 13 active bugs resolved with dedicated regression tests for each, stabilizing the compilation baseline
- Full C11 language conformance including _Atomic with architecture-specific codegen across all 4 backends, _Generic, _Static_assert, _Alignas, _Noreturn, VLAs, _Complex Annex G fixes, restrict qualifier propagation, and inline linkage semantics
- 6-tier optimization pipeline replacing the single-tier system: -O0 through -Oz with distinct pass configurations, a new 1,893-line loop unrolling pass at -O3, and configurable inlining thresholds
- Tail call optimization extended from x86-64-only to all 4 architectures
- 1,883-line linker script parser supporting SECTIONS, MEMORY, ENTRY, PROVIDE, and KEEP directives
- 62 ABI compliance tests cross-linking CCC caller and GCC callee across all 4 architectures
The architectural decisions matter as much as the volume.
The original CCC ran every optimization pass at every level. Passing -O3 did nothing different from -O0. The binaries were byte-identical. That's why independent benchmarks showed CCC producing executables 737–158,000x slower than GCC on complex operations.
Blitzy replaced this with six distinct tiers, each with a documented pass configuration. The difference is not cosmetic. The tiers change what the compiler actually does with your code.
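As a rough illustration of what "distinct pass configurations" can look like, the sketch below maps each optimization flag to its own pass list and inlining threshold. Everything in it is an assumption made for illustration: the post names only -O0, -O3, and -Oz, so the remaining tier names follow common GCC and Clang usage, and the pass names and threshold values are invented rather than taken from the PR. The only detail drawn from the post is that loop unrolling runs at -O3.

```rust
/// Optimization tiers. Only -O0, -O3, and -Oz are named in the post; the other
/// three follow common GCC/Clang naming and are an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OptLevel { O0, O1, O2, O3, Os, Oz }

/// Hypothetical pass identifiers, not CCC's real pass names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Pass { ConstFold, DeadCodeElim, Inline, LoopUnroll }

/// What the compiler runs at one tier.
struct PassConfig {
    passes: Vec<Pass>,
    /// Illustrative unit: maximum callee size that may be inlined.
    inline_threshold: u32,
}

/// Parse a command-line flag into a tier; unknown flags are rejected.
fn parse_opt_flag(flag: &str) -> Option<OptLevel> {
    match flag {
        "-O0" => Some(OptLevel::O0),
        "-O1" => Some(OptLevel::O1),
        "-O2" => Some(OptLevel::O2),
        "-O3" => Some(OptLevel::O3),
        "-Os" => Some(OptLevel::Os),
        "-Oz" => Some(OptLevel::Oz),
        _ => None,
    }
}

/// Each tier gets its own configuration instead of sharing one pass list.
/// The threshold numbers are placeholders.
fn pass_config(level: OptLevel) -> PassConfig {
    use Pass::*;
    match level {
        OptLevel::O0 => PassConfig { passes: vec![], inline_threshold: 0 },
        OptLevel::O1 => PassConfig {
            passes: vec![ConstFold, DeadCodeElim],
            inline_threshold: 0,
        },
        OptLevel::O2 => PassConfig {
            passes: vec![ConstFold, DeadCodeElim, Inline],
            inline_threshold: 40,
        },
        OptLevel::O3 => PassConfig {
            passes: vec![ConstFold, DeadCodeElim, Inline, LoopUnroll],
            inline_threshold: 120,
        },
        OptLevel::Os => PassConfig {
            passes: vec![ConstFold, DeadCodeElim, Inline],
            inline_threshold: 15,
        },
        OptLevel::Oz => PassConfig {
            passes: vec![ConstFold, DeadCodeElim],
            inline_threshold: 0,
        },
    }
}
```

The useful property of this shape is that a single-tier regression becomes easy to test for: if two tiers ever return the same pass list, a unit test comparing pass_config outputs can flag it.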
Performance Improvements
After Blitzy's work, CCC's benchmarking profile shifted significantly. Compilation speed is 1.4–4.5x faster than GCC across all optimization levels. The gap widens at -O3, where GCC's heavier optimizer passes cost more time. Binary size is consistently 10% smaller than GCC. Sequential integer codegen is production-quality: FNV-1a hashing achieves full parity with GCC at -O3, proving the scalar code generation pipeline is solid. Loop-depth-aware spill weight in the register allocator reduces the stack frame bloat that benchmarkers had flagged as a major issue. The compiler frontend and scalar codegen are now solid.
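For readers unfamiliar with loop-depth-aware spill weights, the idea can be sketched in a few lines. This is a generic version of a common heuristic, not the code from the PR: each use or definition of a virtual register counts for 10 raised to the loop nesting depth, and the allocator prefers to spill the register with the lowest total. The factor of 10, the depth cap, and all names are assumptions.

```rust
/// Hypothetical cap so very deep loop nests cannot overflow the weight.
const MAX_WEIGHTED_DEPTH: u32 = 6;

/// Weight of one virtual register. `use_depths` holds the loop nesting depth
/// (0 = outside any loop) of every instruction that reads or writes it.
/// Each occurrence contributes 10^depth, so hot inner-loop values get
/// far larger weights than values touched only in straight-line code.
fn spill_weight(use_depths: &[u32]) -> u64 {
    use_depths
        .iter()
        .map(|&depth| 10u64.pow(depth.min(MAX_WEIGHTED_DEPTH)))
        .sum()
}

/// Given (virtual register id, weight) pairs, return the register that is
/// cheapest to spill, or None if there are no candidates.
fn pick_spill_candidate(candidates: &[(u32, u64)]) -> Option<u32> {
    candidates
        .iter()
        .min_by_key(|&&(_, weight)| weight)
        .map(|&(vreg, _)| vreg)
}
```

Under this weighting, a value used three times outside any loop (weight 3) is spilled before a value used once inside a doubly nested loop (weight 100), which keeps loads and stores out of the hottest code.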
Remaining Work
With 92% completion executed by Blitzy, the remaining 8% is well-understood. Auto-vectorization is absent, and floating-point optimization shows zero scaling from -O0 to -O3, leaving Blitzy's CCC 3–12x slower on data-parallel workloads like matrix multiply and float arithmetic. Blitzy understands the gaps and has provided a clear plan to human engineers to work with copilots (perhaps Claude Code) to finalize the build.
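The split between the two workload types is easier to see side by side. The sketch below is in Rust purely for illustration (the benchmarks in question are C programs) and shows the two loop shapes: FNV-1a, where every iteration depends on the previous hash value, and a scale-and-add loop, where every element is independent. The first shape gains little from vectorization, which is one plausible reason a compiler without an auto-vectorizer can still match GCC on it; the second is exactly what an auto-vectorizer would turn into SIMD instructions.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: xor in one byte, then multiply. Each step needs the previous
/// `hash`, so the loop is a serial chain of scalar integer operations.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// y[i] += a * x[i]. No iteration depends on another, so a vectorizing
/// compiler can process several elements per instruction.
fn scale_add(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += a * *xi;
    }
}
```

Matrix multiply is built from inner loops of the second kind, so closing the remaining gap likely depends more on vectorization and floating-point passes than on further scalar tuning.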
We have tested this across 1,260 different tests, but have not tested for every edge case. If you find any gaps, thank you for helping improve this project. Please report your findings to [email protected], and we can show you how to refine our PR in Blitzy. Review how to use Blitzy in our dev docs.
What's Next
CCC was the most high-profile AI coding demonstration of the year. Claude Code is a revolutionary addition to the arsenal of developer tools globally, but the results of building a large codebase autonomously revealed that, under real-world scrutiny, assembling known techniques and passing test suites is not the same thing as building production software.
Blitzy's fixes and enhancements signal what is possible with autonomous software development. The platform completed 370 engineering hours of autonomous work on this project. Every bug fix, C11 feature, optimization tier, and ABI test got the same level of attention. Claude Code, designed to accelerate developer productivity, is not equipped with the level of system understanding and agentic orchestration that Blitzy possesses.
Claude C Compiler was our third open source project, demonstrating Blitzy's technical maturity and our complementary nature to Claude Code. Blitzy is the only platform that generates code that can stand up to the scrutiny of real-world edge cases in production.
To stay current on Blitzy's Open Source Enhancement Initiative, subscribe to our newsletter and follow us on LinkedIn.

