Vibe Coding lets GenAI do the coding while you supply the prompts and feedback.
Jul 28, 2025 · 6 min read

The Marvel that is GenAI & the Emergence of Vibe Coding

Follow along as I build a real-world app with vibe coding, netting a 5x productivity boost, surprising accuracy, and renewed excitement!

I Used Vibe Coding to Rebuild a Bot to Schedule Saturday Morning Tennis 

I usually write about testing. On occasion, I expand into a tangentially related topic, and that is the case with this blog post. Do not worry, though: I also discuss testing.

The New Trend: Vibe Coding

Recently, our engineering team got into a debate on Slack about vibe coding. As you know, software engineers rarely express their opinions; they tend to be unassuming individuals. Yeah, right. Of course, the debate continued, on and off, for two days. I must admit I was the one who caused the controversy. More on that later.

First, what is vibe coding? In early February 2025, Andrej Karpathy posted the following on Twitter:

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good.

He goes on to explain how he lets the LLM keep writing and refining the code without his review. If you are unfamiliar with Karpathy, he led the AI team at Tesla and was part of the founding team at OpenAI.

In essence, during vibe coding you tell the LLM what you want and guide it through the process: you test the app it creates and let it handle all the code changes. Some purists insist that you should never look at the code, but that is not what Karpathy does all the time. In the replies to his post, he wrote the following:

The amount of LLM assist you receive is clearly some kind of a slider. All the way on the left you have programming as it existed ~3 years ago. All the way on the right you have vibe coding. Even vibe coding hasn't reached its final form yet. I'm still doing way too much.

My Journey to Vibe Coding

I have been a VP of Engineering and a CTO, and I have been writing code since I was 14, in various programming languages. Most people assume I enjoy coding. The truth is that I find it rather tedious: a means to an end, a way to bring a product to life. I only truly enjoy coding when I'm solving a problem, not when I'm building the scaffolding needed to create a product.

I'm an avid tennis player, hitting the court three to five times a week. Every Saturday at 8 am, I played doubles with friends. A few years ago, our HOA switched to an app-based court reservation system—a step up from calling the office—but with only two courts, competition skyrocketed, especially on Saturday mornings. Reservations opened at midnight three days prior, sparking a weekly race for a slot.

As an engineer, I developed a Python/Selenium bot to automate the reservation process. After a week of tinkering and test runs, it was fast enough to beat everyone else. With the bot securing our Saturday slot consistently, others eventually stopped trying.

I recently got a new laptop and thought this was a good time to try out vibe coding. As readers of my blog may know, I have written about the capabilities of LLMs in software testing. Lately, I have been following the work of people such as Henrik Kniberg, who has been posting about how much the newer models, mainly from Anthropic, have increased his coding productivity by an incredible factor.

I thought it would be interesting to rewrite the bot and use it as a test to see how well GenAI handles coding.

The New Booking Bot

I decided not to spend any money on my test. I used Anthropic’s Claude Sonnet 4 on the free plan and copied and pasted the code into VSCode to test it. I started with a single prompt that laid out the primary use case path and some requirements, such as when to execute, all configurable through a configuration file. I required the code to be in Python.
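
I won't reproduce the full prompt here, but to give a sense of the "configurable" requirement, here is a minimal sketch of the kind of configuration I had in mind. The file layout, key names, and URL are my illustration for this post, not what Claude actually generated:

```python
# Hypothetical config.ini for the bot (illustrative, not Claude's output):
#
#   [reservation]
#   court_url = https://example-hoa.com/reservations
#   slot_time = 08:00
#   open_offset_days = 3
#   run_at = 00:00
import configparser

# Assumes a config.ini like the one sketched above sits next to the script.
config = configparser.ConfigParser()
config.read("config.ini")

court_url = config["reservation"]["court_url"]
slot_time = config["reservation"]["slot_time"]
run_at = config["reservation"]["run_at"]
print(f"Booking the {slot_time} slot at {court_url}, firing at {run_at}")
```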

Claude blew me away with the first version. The class definition, the functions, and all the scaffolding to handle error paths and logging were in place, and I had not written a single line of code. In seconds, a running application was in front of me. Claude also provided all the instructions to ensure I installed the correct packages.
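
To make that concrete, the generated program had roughly this shape. This is a heavily abridged sketch from memory; the class name, URL, and details are hypothetical stand-ins, not Claude's actual output:

```python
# Abridged sketch of the generated structure: a class, logging, and error
# handling around the main flow. Names and the URL are hypothetical.
import logging

from selenium import webdriver
from selenium.common.exceptions import WebDriverException

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("booking_bot")


class BookingBot:
    def __init__(self, court_url: str):
        self.court_url = court_url
        self.driver = webdriver.Chrome()

    def run(self) -> bool:
        try:
            logger.info("Opening reservation page %s", self.court_url)
            self.driver.get(self.court_url)
            # ... log in, pick the date, click the slot, confirm ...
            logger.info("Reservation flow completed")
            return True
        except WebDriverException:
            logger.exception("Booking attempt failed")
            return False
        finally:
            self.driver.quit()


if __name__ == "__main__":
    BookingBot("https://example-hoa.com/reservations").run()
```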

I started running the application, and we hit an error. I provided the error to Claude, and it created a new version. The fix worked on the first try. I was trying hard not to look at the code, just copying and pasting the latest version. Claude kept telling me what it was doing every step of the way.

After we resolved a couple of technical issues, the problems in my instructions became apparent. I had not done any analysis or looked at my previous bot; I simply wrote what I thought were the steps that define the main success scenario for my use case, and I missed a few. As soon as I provided the missing details, Claude regenerated the code and immediately fixed the problem.

Eventually, we ran into the typical issues you face with UI test automation: it is fragile, as everyone knows. I gave Claude more information about what was happening, and once again it resolved the first issue on the first try.
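
For readers who have not fought this battle: the usual fix for this kind of fragility is to replace direct element lookups with explicit waits. The URL and selector below are hypothetical, but the pattern is the standard Selenium one:

```python
# Waiting for an element instead of assuming it has rendered. The URL and
# CSS selector are hypothetical; WebDriverWait itself is standard Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://example-hoa.com/reservations")

# Fragile: raises NoSuchElementException if the slot has not rendered yet.
# slot = driver.find_element(By.CSS_SELECTOR, "button.slot-8am")

# Sturdier: poll until the element is clickable, for up to 10 seconds.
slot = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button.slot-8am"))
)
slot.click()
```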

I had only one functional issue left to resolve to complete the main path, plus the need for performance improvements. It was at this point that Claude stopped accepting my prompts and told me I had reached the limit for a single chat session.

It was late, so I decided to go to sleep and finish the bot the next day. The next day, while working, I could not stop thinking about the bot and what I would do next. I couldn't wait for the workday to end so I could have dinner and get back to it. I was excited in a way I have not been in a long time.

Finally, I got back to the bot and started a new Claude session. I explained that I needed a fix for a program Claude had written in a previous session and described the issue, and Claude generated the code to add to the program. At this point, I decided to examine the code so I could apply Claude's solution myself. It worked on the first try.

My vibe coding attempt produced a fully functional bot. The use case's main path and many of the alternative paths (mostly error paths) were in place. The only problem left was performance: it was too slow. Because the code was so easy to read, I was able to dig in, make a few enhancements, and increase the execution speed by a factor of 10; in other words, I cut the execution time by more than 90%. Can I make it even better? Yes, I can, but I need to find time to work on it.
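
Most of the gain came from replacing defensive fixed sleeps with short polling waits (the pattern shown earlier), but browser-level tweaks help too. Here is an illustration of that second category; these are real Chrome options, though the gains vary by site, and they are not my exact changes:

```python
# Illustrative browser-level speedups: run headless and skip image downloads.
# Real Chrome flags, but how much they help depends on the site.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # no visible browser window to paint
options.add_argument("--blink-settings=imagesEnabled=false")  # skip images

driver = webdriver.Chrome(options=options)
driver.get("https://example-hoa.com/reservations")
```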

Sharing this experience started the recent debate on our Slack channel. Some engineers are entirely against it, while others are in favor of it.

Four Findings From My Vibe Coding Experience

The following are my findings from this simple experiment.

First, I went through my records to estimate the productivity improvement. The calculation is not perfect, as I do not have precise hours for each day of the original, manual build, but I estimate the improvement at around 5x: I completed the work five times faster than the first time.

Second, Claude Sonnet 4 writes code better than all the junior engineers I have worked with. It writes better code than a significant number of mid-career engineers I know, and even better than a few so-called senior engineers. The code can become somewhat verbose at times as it attempts to cover all possible error scenarios, but that is preferable to missing those paths, as many engineers do. One question I could not answer was about performance tuning: can it improve the code based on the logs captured during execution? I will have to run another experiment for that.

Third, Claude makes mistakes too. It forgot to add logging to one of the functions, a minor miss that I discovered while trying to improve the application's performance. For the most part, the logs gave me the information I needed, except in that section of the code.

Fourth, being a multidisciplinary person, an agile team of one, is invaluable. I have led product management, UX, software, QA, and Operations engineering teams. I can wear any of those hats at any time. I can provide a well-written, step-by-step use case main success scenario. I can review and enhance the code. But more importantly, I can test the app by applying different testing techniques. In this case, I focused on use case testing as I vibe-coded the application.

Finally, it is incredibly refreshing and exciting to be able to focus on intention instead of dealing with all the minutiae involved in coding a new application. These are exciting times!

Now, the controversial statement: someone posted the following comment on Karpathy’s tweet:

It's amazing to see someone as smart as you embrace AI on this level. And yet, I have seen a few takes from devs who think it's a badge of honor to write everything themselves and not use AI, unaware someone using AI will replace them first and soon.

Let's see what happens when I post this on our Slack channel. 😄

About the Author

Testaify founder and COO Rafael E. Santos is a Stevie Award winner whose decades-long career includes strategic technology and product leadership roles. Rafael's goal for Testaify is to deliver Continuous Comprehensive Testing through Testaify's AI-first platform, which will change testing forever. Before Testaify, Rafael held executive positions at organizations like Ultimate Software and Trimble eBuilder.

Take the Next Step

Join the waitlist to be among the first to know when you can bring Testaify into your testing process.