Here’s How We’re Using AI Tools and What We’re Learning

LLMs have become commonplace in custom software development and, therefore, in our interactions with clients. Everyone is experimenting with AI tools to see whether they can speed up development while decreasing effort and cost. It’s exciting to explore what’s possible. And, like many folks, we’ve been running lots of experiments with how we interact with clients and help them. Specifically, we wanted to optimize the speed and effectiveness of all the feedback loops we have with a client.

Typically, feedback loops happen at different phases of development. There’s an initial feedback loop where we collect requirements and return to the client with an estimate of the cost in terms of time and money. Then there’s a design feedback loop, where we go back and forth on how things should look and how users should interact. And then there’s the actual development of the feature. This last one gives us feedback in the form of user impressions and various application and product metrics.

We’ve tried to use AI tools to augment some of these feedback loops. These results are early, based on a handful of clients, and the actual tools change all the time. But the early impressions are already interesting, and I think they point to some deeper truths about software development.

Generating Specs

Pain points are often related to capturing and implementing each requirement. For example, often the client knows their core business well, but not the technical constraints of their infrastructure or expected operating environment. The cost and timeline can change greatly depending on whether we’re operating completely on-prem or in the cloud, for example.

What if the product owner generates an initial document about how the feature or product should work? Then we can meet and discuss it. This is interesting because the LLM will often infer many elements of the architecture and software stack. The product owner can correct or augment the spec with verbal feedback to the LLM as they might with a consultant, but without holding a meeting and at their own pace. This is great if the product owner wants to fill gaps in their understanding of what’s possible. And, if some requirement needs further research, they can do that research. They can send that document to us, and we can use that as the starting point of our development kickoff.

This is great for extracting the full vision and motivation behind a product feature. That shared context allows us to start the discussion in a productive manner immediately. And, to some degree, this has definitely been useful.

Downside

The main problem with this is that it anchors the conversation to a specific solution. Often, the generated document fits the requirements for an idealized version of an idea, but it doesn’t address the requirements of the real world. A benefit of working with experts isn’t just getting the answers, but also finding out the questions you should ask. Your team size, your schedule, and your budget all drive how the product is designed and architected.

Even in the era of AI, these all imply constraints on languages, libraries, infrastructure, and tools. And, when factoring in all of these implied constraints, the product could be entirely different. If the generated spec isn’t seen as a rough draft, it can be tough to keep all options open. And the more you work on this draft without input from the team, the more weight it carries in our minds.

Generating Mockups and Prototypes

Another major issue is deciding how something should look and function. One way to do this is to sit down and do some sketching together, and then turn those sketches into designs for developers to work off of. Developers generate the code and do their best to infer anything the designs don’t capture (like transitions, animations, omitted modals, etc.).

It’s a lot of fun to give a customer access to a tool like V0. As they describe what they want, they can immediately try to use it. This is a great way for a product owner to refine an idea without spending much time or money. And, coming to your team with something they can interact with is useful for getting higher-quality feedback from them.

But, just like generating a spec document, this also suffers from anchoring and endowment effects. In addition, many site generators create sites ranging from off-the-wall to bland. Just like with all AI art, there is something “not quite right” about them that’s hard to put into words. The AI nails the common language of modern UI/UX design, but the result isn’t necessarily as intuitive to a human user. So this can’t replace design work, but it can inform it and prevent a lot of back-and-forth. Complex or novel interactions are rarely solved this way, because they have such a huge “feel” component to them.

For something fairly simple with analogs in other applications, this is actually perfect: say, a very simple search interface. But for something unique, this is often more of an ideation tool, and a lot of hard creative and engineering work remains.

Generating Code

Many changes aren’t big enough to require much scoping or design. For example, anyone can make a text or button color change as long as they have an AI tool to generate the code. There need to be some limits on what can be changed this way, and a process for review. But all of these seem like solvable problems. And, if all goes well, you could have a situation where experts work on the changes that need experts, while the small changes don’t have to wait to be implemented.

The Bottleneck

Many of the problems I mentioned earlier sneak in here, too. In addition, it turns out even small text copy changes can affect layout and styling. For example, imagine a large headline on a website. It may look okay when it’s five words long, but not so great when it’s 20 words long. Small styling tweaks can also be jarring, for example, by introducing a color that isn’t in the site’s palette. In practice, the simple things turn out to be fairly complex. And all these small details add up to the overall quality of the final product.

When changes are easy to make, a lot are made. And, at first, this feels pretty cool and futuristic. But then the bottleneck for everything turns out to be the review process. You might have 10 or more tiny pull requests open in GitHub, each with their own discussion. This can be frustrating for the customer and the engineering team.

But, let’s suppose we remove the review process altogether and trust the AI implicitly (not recommended). The biggest issue with making a lot of changes quickly is the loss of understanding. One of the best things I’ve ever read on this is an essay called “Programming as Theory Building” by Peter Naur. Professor Naur was a brilliant scientist, software engineer, and a pioneer in many technologies we use today.

In that essay, he argues that the code is itself just one aspect of the software. He says the other part of the software is the theory of how the software works, which is in the minds of the team. Writing code is not merely text production. And, when a team member leaves a software development team, it’s like part of the software itself is lost.

Throwing Away the Theory

I think this describes a lot of what happens when you generate software without careful review. You get the output, but you throw away the theory. The theory lives briefly in your AI tool, but then that context is gone. With each change, you lose a bit more of the theory of what’s going on, until you’re left with a piece of software that no one has any idea how to change or maintain. Given the incredible rate at which vast and complex features can be implemented, this is an easy spot to find yourself in.

But, given all that, I wouldn’t say you should never create a lot of LLM-generated code; under some circumstances it makes sense. It’s certainly a fast and cheap way to get something done, which in the world of business counts for a lot. But we don’t yet have any well-tested software development methodologies for applying LLM-generated code well in the long run. And, honestly, even with careful review, much of the theory in the code will go out the window. In my experience, engineers do not learn as well by reviewing as by doing.

Exploring with AI

I would encourage everyone to experiment with these and any other techniques to collaborate more effectively with their team. It’s definitely science worth doing!

There once was a great philosopher named Ludwig Wittgenstein, and he said something like: “What we cannot speak about we must pass over in silence.” By this, he meant that many aspects of reality simply cannot be put into words (it’s heady stuff, go read it!). We have to keep in mind that today’s AIs are built on language, or some sort of tokenization, and there are likely limits to how well our problems can be solved by predicting the next token in a sequence. If you can truly understand the limits of these tools and find ways to minimize their negative impact, they can be a very powerful driver of productivity. And that understanding has to come from experimentation, so go on and try.
