Adventures in Vibe Coding
Through two experiments in “vibe coding,” we discover that AI tools like Claude can produce impressive results but require human guidance, structure, and patience to be truly effective.
TABLE OF CONTENTS
- When AI Builds Your App Its Own Way
- Five Lessons from My Second Adventure in Vibe Coding (Coming soon!)
- Vibe Coding the Forecasting Model Comparison Tool
- Implementing Claude's Refactoring
- Five Lessons Learned
When AI Builds Your App Its Own Way
A few months back, we wrote a blog post about vibe coding. In that post, we talked about how I rewrote a simple bot to automate tennis court reservations on our HOA app. While the exercise succeeded in solving the problem, I noticed a few concerning behaviors in Claude's approach to design.
Since then, I have read several articles about senior engineers abandoning AI coding assistants. I even read about a company that fired a software engineer because he was generating most of its code with AI. I have also spoken with some friends who use AI-generated code in their apps very successfully, and I continue to read examples of people enhancing their development experience with AI.
Reflecting on the diverse experiences of professionals in the field, I found myself pondering: what accounts for this stark contrast? Why does person A (a CTO) seamlessly generate 75% of his code with AI, while person B (a senior engineer) has abandoned AI coding tools altogether?
I decided to explore the challenges I encountered with AI coding tools by rewriting an app I wrote years ago. The difference is that this app is larger and requires more thought in its design and code organization.
First, I want to revisit the concerning patterns I saw when using Claude for the first time.
Pattern 1: One file and one humongous class
While working on the bot, I only provided business requirements in my prompts to Claude. I did not offer any guidance on how to organize the code; I just kept asking for features and fixes as I tested the app. But my vibe coding exercise was not a black-box experience: I could see the code Claude was generating, and I even made some changes directly to it later in the process.
I noticed that, aside from a separate config file, Claude was putting all the code in one class inside a single Python file. At no point did it refactor or suggest refactoring the app. It combined many different responsibilities into one class. Since we were working on a simple bot, I wondered whether the project was simply too small to expose the issue I foresaw. A larger app might make it apparent.
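To make the pattern concrete, here is a hypothetical sketch of the shape the bot's code took. The class and method names are mine, invented for illustration; they are not Claude's actual output:

```python
# Hypothetical illustration of the anti-pattern: one class that mixes
# configuration, browser automation, reservation logic, scheduling,
# and notifications, all in a single Python file.
class CourtReservationBot:
    def load_config(self): ...              # configuration parsing
    def login(self): ...                    # browser automation
    def find_open_slot(self): ...           # reservation logic
    def book_court(self): ...               # reservation logic
    def wait_for_booking_window(self): ...  # scheduling
    def notify_user(self): ...              # notifications
```

Each of those method groups would normally live in its own module or class, which is exactly the kind of refactoring Claude never proposed.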
Pattern 2: The Insanity Loop - Lying about fixing bugs
I used the word “lying,” but in reality it looked less like a lie and more like a typical bug. As Claude made progress with the bot, we started by addressing larger defects and eventually moved on to minor bugs. While dealing with the minor bugs, I noticed a curious pattern. Claude told me the fix was done, but when I tested it, the same issue persisted. I told Claude the problem was still present. Claude apologized and tried again. It told me once again that the issue was fixed, but that wasn't the case when I tested it. Claude failed three times before giving up and starting over to regenerate the whole app from scratch.
Is Claude lying? Is this an example of a hallucination? I decided to watch closely as Claude tried to fix another bug. One of the advantages is that you can see Claude writing the code. What I saw was the following:
- First, Claude would implement the fix; it was usually a few lines of code that needed to change.
- Second, Claude would erase that new code and restore the previous code.
- Third, Claude would tell me it had fixed the issue and hand me the “new” version of the code to try.
As I challenged Claude each time, it would repeat the same steps twice more before giving up and regenerating the whole app. From the outside, it looks as if Anthropic implemented a guardrail that stops Claude after three failed attempts. I called this one the insanity loop.
Understanding these behaviors and their potential limitations is crucial as we tackle a more complex application. It's not just about what AI can do, but also about what it can't do.
The Second Application
Many years ago, before the arrival of ChatGPT and GenAI, I attended an AI conference. Most of what was discussed there is what we would today call traditional AI. One of the presenters worked at one of the largest investment banks; I do not remember which one, but I do remember his presentation. He compared traditional statistical analysis techniques with AI/ML techniques for forecasting stock market movements. His data showed that AI/ML was consistently better, but not by a significant margin. He demonstrated his application by charting the Dow Jones from the beginning of the 21st century, including the considerable decline during the Great Recession.
It was a great way to learn about AI/ML: you can take stock market data, or similar publicly available data, compare different models, and adjust their parameters to see whether you can improve the results. I built my own app to do something very similar.
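To make the idea concrete, here is a minimal sketch of that kind of comparison, assuming the yfinance, statsmodels, and scikit-learn packages are available. The ticker, date range, ARIMA order, and lag count are illustrative choices of mine, not values from the presenter's system or from my app:

```python
# A minimal sketch comparing a classical forecasting model with an ML one
# on public stock data. Ticker, dates, and hyperparameters are illustrative.
import numpy as np
import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.ensemble import RandomForestRegressor

# Daily Dow Jones closing prices (hypothetical date range).
close = yf.download("^DJI", start="2015-01-01", end="2020-01-01")["Close"].dropna()
prices = np.asarray(close).ravel()

# Hold out the last 30 trading days for evaluation.
horizon = 30
train, test = prices[:-horizon], prices[-horizon:]

# Classical baseline: ARIMA(1,1,1) forecasting the whole hold-out window.
arima = ARIMA(train, order=(1, 1, 1)).fit()
arima_forecast = arima.forecast(steps=horizon)

# ML alternative: a random forest on lagged prices, rolled forward one
# step at a time, feeding each prediction back in as the newest lag.
lags = 5
X = np.array([train[i:i + lags] for i in range(len(train) - lags)])
y = train[lags:]
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

history = list(train[-lags:])
rf_forecast = []
for _ in range(horizon):
    pred = forest.predict([history[-lags:]])[0]
    rf_forecast.append(pred)
    history.append(pred)

def rmse(actual, predicted):
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

print(f"ARIMA RMSE:         {rmse(test, arima_forecast):.2f}")
print(f"Random forest RMSE: {rmse(test, rf_forecast):.2f}")
```

A fairer comparison would use walk-forward validation and returns rather than raw prices, but even this toy version shows how easily you can line up a classical model against an ML one on public data.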
In part 2 of this post, we'll discuss how I built the forecasting model app with Claude.
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver comprehensive testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Testaify is in managed roll-out. Request more information to see when you can bring Testaify into your testing process.