Adventures in Vibe Coding
Through two experiments in “vibe coding,” we discover that AI tools like Claude can produce impressive results but require human guidance, structure, and patience to be truly effective.
TABLE OF CONTENTS
- When AI Builds Your App Its Own Way
- Five Lessons from My Second Adventure in Vibe Coding (Coming soon!)
- Vibe Coding the Forecasting Model Comparison Tool
- Implementing Claude's Refactoring
- Five Lessons Learned
When AI Builds Your App Its Own Way
A few months back, we wrote a blog post about vibe coding. In that post, we talked about how I rewrote a simple bot to automate tennis court reservations on our HOA app. While the exercise succeeded in solving the problem, I noticed a few concerning behaviors in Claude's approach to design.
Since then, I have read several articles about senior engineers abandoning AI coding assistants. I even read about a company that fired a software engineer because he was generating most of its code with AI. I have also spoken with some friends who use AI-generated code in their apps very successfully, and I continue to read examples of people enhancing their development experience with AI.
Reflecting on the diverse experiences of professionals in the field, I found myself pondering: what accounts for this stark contrast? Why does person A (a CTO) seamlessly generate 75% of his code with AI, while person B (a senior engineer) has abandoned AI coding tools altogether?
I decided to explore the challenges I encountered with AI coding tools by rewriting an app I wrote years ago. The difference is that this app is larger and requires more thought in its design and code organization.
First, I want to revisit the concerning patterns I saw when using Claude for the first time.
Pattern 1: One file and one humongous class
While working on the bot, I only provided business requirements in my prompts to Claude. I did not offer any guidance on how to organize the code; I just kept asking for features and fixes as I tested the app. But my vibe coding exercise was not a black-box experience: I could see the code Claude was generating, and I even made some changes directly to it later in the process.
I noticed that, aside from a separate config file, Claude was putting all the code in one class inside a single Python file. At no point did it refactor or suggest refactoring the app. It combined many different responsibilities into one class. Since we were working on a simple bot, I wondered whether the project was simply too small to expose the issue I foresaw. A larger app might make it apparent.
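To make the pattern concrete, here is a hypothetical sketch of the shape the bot's code took. The class and method names are mine, invented for illustration; they are not Claude's actual output:

```python
# Hypothetical illustration of the anti-pattern: one class that mixes
# configuration, browser automation, reservation logic, scheduling,
# and notifications, all in a single Python file.
class CourtReservationBot:
    def load_config(self): ...              # configuration parsing
    def login(self): ...                    # browser automation
    def find_open_slot(self): ...           # reservation logic
    def book_court(self): ...               # reservation logic
    def wait_for_booking_window(self): ...  # scheduling
    def notify_user(self): ...              # notifications
```

Each of those method groups would normally live in its own module or class, which is exactly the kind of refactoring Claude never proposed.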
Pattern 2: The Insanity Loop - Lying about fixing bugs
I used the word “lying,” but in reality it looked less like a lie and more like a typical bug. As Claude made progress with the bot, we started by addressing larger defects and eventually moved on to minor bugs. While dealing with the minor bugs, I noticed a curious pattern. Claude told me the fix was done, but when I tested it, the same issue persisted. I told Claude the problem was still present. Claude apologized and tried again. It told me once again that the issue was fixed, but that wasn't the case when I tested it. Claude failed three times before giving up and starting over to regenerate the whole app from scratch.
Is Claude lying? Is this an example of a hallucination? I decided to watch closely as Claude tried to fix another bug. One of the advantages is that you can see Claude writing the code. What I saw was the following:
- First, Claude would implement the fix; it was usually a few lines of code that needed to change.
- Second, Claude would erase that new code and restore the previous code.
- Third, Claude would tell me it had fixed the issue and hand me the “new” version of the code to try.
As I challenged Claude each time, it would repeat the same steps twice more before giving up and regenerating the whole app. From the outside, it looks as if Anthropic implemented a guardrail that stops Claude after three failed attempts. I called this one the insanity loop.
Understanding these behaviors and their potential limitations is crucial as we tackle a more complex application. It's not just about what AI can do, but also about what it can't do.
The Second Application
Many years ago, before the arrival of ChatGPT and GenAI, I attended an AI conference. Most of what was discussed there is what we would today call traditional AI. One of the presenters worked at one of the largest investment banks; I do not remember which one, but I do remember his presentation. He compared traditional statistical analysis techniques with AI/ML techniques for forecasting stock market movements. His data showed that AI/ML was consistently better, but not by a significant margin. He demonstrated his application by charting the Dow Jones from the beginning of the 21st century, including the considerable decline during the Great Recession.
It was a great way to learn about AI/ML: you can take stock market data, or similar publicly available data, compare different models, and adjust their parameters to see whether you can improve the results. I built my own app to do something very similar.
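To make the idea concrete, here is a minimal sketch of that kind of comparison, assuming the yfinance, statsmodels, and scikit-learn packages are available. The ticker, date range, ARIMA order, and lag count are illustrative choices of mine, not values from the presenter's system or from my app:

```python
# A minimal sketch comparing a classical forecasting model with an ML one
# on public stock data. Ticker, dates, and hyperparameters are illustrative.
import numpy as np
import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA
from sklearn.ensemble import RandomForestRegressor

# Daily Dow Jones closing prices (hypothetical date range).
close = yf.download("^DJI", start="2015-01-01", end="2020-01-01")["Close"].dropna()
prices = np.asarray(close).ravel()

# Hold out the last 30 trading days for evaluation.
horizon = 30
train, test = prices[:-horizon], prices[-horizon:]

# Classical baseline: ARIMA(1,1,1) forecasting the whole hold-out window.
arima = ARIMA(train, order=(1, 1, 1)).fit()
arima_forecast = arima.forecast(steps=horizon)

# ML alternative: a random forest on lagged prices, rolled forward one
# step at a time, feeding each prediction back in as the newest lag.
lags = 5
X = np.array([train[i:i + lags] for i in range(len(train) - lags)])
y = train[lags:]
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

history = list(train[-lags:])
rf_forecast = []
for _ in range(horizon):
    pred = forest.predict([history[-lags:]])[0]
    rf_forecast.append(pred)
    history.append(pred)

def rmse(actual, predicted):
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

print(f"ARIMA RMSE:         {rmse(test, arima_forecast):.2f}")
print(f"Random forest RMSE: {rmse(test, rf_forecast):.2f}")
```

A fairer comparison would use walk-forward validation and returns rather than raw prices, but even this toy version shows how easily you can line up a classical model against an ML one on public data.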
In part 2 of this post, we'll discuss how I built the forecasting model app with Claude.
About the Author
Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver comprehensive testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.
Take the Next Step
Testaify is in managed roll-out. Request more information to see when you can bring Testaify into your testing process.