Build an RFP Finder with Spec-Driven Development and SpecKit

At Atomic, we’ve always depended on a healthy pipeline of interesting client work. As managing partner, one of my ongoing responsibilities is to help find those opportunities for our custom software development team.

Historically, one area we’ve underused is U.S. government RFPs. It probably comes as no surprise that the government issues thousands of RFPs, including everything from lawn care to janitorial services to highly complex software systems. Buried in that noise are projects where Atomic could be a great fit.

Earlier this year, I decided to tackle that problem.

Rather than manually comb through RFP portals, I wanted an internal tool that could automatically pull in RFPs, filter out everything unrelated to software, and then score the remaining opportunities for how well they fit Atomic. My background is in programming, but these days I don’t have the time or desire to hand-code an entire system from scratch.

So I tried something new: building this internal tool using spec-driven development with GitHub’s spec-kit and Cursor as my IDE.

In this post, I’ll walk through what I built, how I used SpecKit, and the biggest lessons I took away from putting spec-driven development to work on a real, non-trivial project.

  • Project goal: Automatically find promising software-related government RFPs from a high-volume feed.
  • Approach: Spec-driven development with SpecKit and AI coding agents inside Cursor.
  • Bottom line: I’m bullish on spec-driven development. Today, it’s already good enough for internal tools and prototypes, as long as an experienced human stays firmly in the loop.

Why This Was a Good Fit for Spec-Driven Development

The core problem looked like this:

Data source: A paid API from rfpmart.com that returns a firehose of government RFPs across every imaginable domain.

Need: Filter that firehose down to the tiny fraction that represents software projects we’d realistically want to pursue.

Constraints:

  • This is an internal tool.
  • I wanted a simple, boring tech stack.
  • I didn’t have time to write and maintain a full modern web app.

SpecKit offered an appealing promise: describe what you want in natural language, define a clear specification, and let AI agents handle the bulk of the implementation. The hope was it would allow me to operate at the level of architecture, constraints, and review.

I started by writing a project constitution: this is an internal tool, keep the tech stack simple, avoid bleeding-edge frameworks, and prioritize idempotency on the backend since we’d be repeatedly syncing data from the RFPMart API into a database.
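
My constitution boiled down to a handful of plain-language rules. A paraphrased sketch (not the verbatim file) looked something like this:

```markdown
# Project Constitution

- This is an internal tool; optimize for simplicity, not scale.
- Keep the tech stack simple and boring; avoid bleeding-edge frameworks.
- All backend sync jobs against the RFPMart API must be idempotent:
  re-running the same sync must never duplicate or corrupt data.
```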

SpecKit suggested a sensible stack: React on the frontend, a backend deployed on Vercel, and Supabase/Postgres for data—a perfectly reasonable starting point. But right away, I ran into my first important lesson.

Takeaway 1: You Can Move Very Fast, But Only If You Stay Defensive.

One of the most striking parts of this experiment was how quickly I could move. Within about a week, I had deployed a working system to production, one that probably would have taken me several weeks (at least) to build on my own, especially since I’m not exactly a strong React programmer.

SpecKit and the agents helped me:

  • Set up the project scaffolding.
  • Wire up the RFPMart API integration.
  • Build basic views with sortable tables.
  • Add support for fetching and storing large batches of RFPs.
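
The batch sync was only safe to re-run because the constitution demanded idempotency. In production this was a Prisma upsert keyed on the RFP's external id; here is a minimal sketch of the pattern with an in-memory Map standing in for the database (field names are hypothetical):

```typescript
interface RfpRecord {
  externalId: string;   // RFPMart's id: the idempotency key
  title: string;
  updatedAt: string;
}

// In-memory stand-in for the Postgres table.
const db = new Map<string, RfpRecord>();

// Upsert by external id: re-running the same batch leaves the store
// unchanged, so a crashed or repeated sync never duplicates rows.
function syncBatch(batch: RfpRecord[]): void {
  for (const rfp of batch) {
    db.set(rfp.externalId, rfp); // insert or overwrite by key
  }
}
```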

However, this speed only felt safe because I approached everything the agents produced with a dose of skepticism. Maintaining discipline and reviewing everything they output was essential. Here are two examples.

Versioning details were often wrong.

SpecKit recommended sensible technologies, but it was off on concrete version numbers for things like Node and React. That’s the kind of mistake an experienced developer will catch quickly (“Is that really the version of Node we want?”), but it could easily trip up someone newer to the stack.

Local changes could create subtle regressions.

Even though I had shared components, the agents would sometimes make page-specific tweaks rather than properly generalizing a reusable component, introducing inconsistency and the occasional breakage elsewhere.

The lesson for me was simple: Spec-driven development gives you leverage, not absolution. You still have to read, review, and reason about what’s being generated.

To mitigate the risks of ending up with a spaghetti mess, I took the following approach for each spec:

  1. Wrote a clear, high-level specification of the feature I wanted.
  2. Used SpecKit’s task tooling to break it into phases.
  3. Instructed the agents to:
    • Implement one phase at a time.
    • Add automated tests and integration tests at the end of each phase.
    • Stop for my review before moving on.
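
The standing instruction I gave the agents for each phase looked roughly like this (paraphrased, not the exact prompt):

```markdown
For each phase in the task list:
1. Implement only the tasks in the current phase.
2. Add automated tests and integration tests covering the new behavior.
3. Stop and wait for my review before starting the next phase.
```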

This cadence felt like working at a higher level of abstraction, as product designer, technical lead, and engineering manager all at once, without ever typing application code myself. But it only worked because I treated every agent-generated change as something that needed a real human code review.

Takeaway 2: Specs Keep Agents on the Rails—but You Still Own the Architecture.

One of the promises of spec-driven development is that the specification itself becomes the main control surface: you describe what you want, and the agents fill in the details.

That promise largely held up.

For this project, my specs evolved roughly along these lines:

  • Initial spec: Pull RFPs from the RFPMart API, display them in sortable tables, and provide a way to score each RFP for suitability.
  • Filtering spec: Apply a non-AI, keyword-based filter to automatically discard ~98% of RFPs that are obviously unrelated to software.
  • Scoring spec: For the remaining 2–3%, send RFP details to OpenAI or Anthropic along with a prompt describing Atomic and score each opportunity along six different axes (fit, budget, timeline, strategic value, etc.).
  • Content processing spec: Automatically download attachments (PDFs, documents, ZIP files), extract text, and send the relevant content along with the prompt to the LLMs for better scoring.
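
The filtering spec's keyword pre-filter can be sketched like this; the keyword lists below are illustrative stand-ins, not the tuned lists the tool actually uses:

```typescript
// Hypothetical keyword lists; the real ones were tuned against live data.
const SOFTWARE_KEYWORDS = [
  "software", "application", "web portal", "api", "database",
  "mobile app", "system integration",
];

const EXCLUDE_KEYWORDS = [
  "janitorial", "lawn care", "landscaping", "hvac", "paving",
];

interface Rfp {
  title: string;
  description: string;
}

// Returns true when an RFP looks software-related and should move on
// to the (more expensive) LLM scoring stage.
function passesKeywordFilter(rfp: Rfp): boolean {
  const text = `${rfp.title} ${rfp.description}`.toLowerCase();
  if (EXCLUDE_KEYWORDS.some((kw) => text.includes(kw))) return false;
  return SOFTWARE_KEYWORDS.some((kw) => text.includes(kw));
}
```

Because this stage is cheap and deterministic, it can run over the entire firehose before any LLM calls happen.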

SpecKit did a good job following these specifications. The agents were particularly strong at:

  • Generating the glue code to integrate APIs.
  • Building out React views that reflected the data model.
  • Adding visual and UX polish, like loading states and async popups when tasks completed. These nice-to-haves would never have made it into an internal tool in the past.
  • Implementing the keyword filters and LLM scoring pipeline.
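
Stripped of the LLM call itself, the scoring pipeline comes down to parsing the model's JSON reply and collapsing the axes into one number. A minimal sketch, assuming a response shape and weights I made up (the post names only fit, budget, timeline, and strategic value; the other two axes here are placeholders):

```typescript
interface AxisScores {
  fit: number;            // each axis scored 0-10 by the LLM
  budget: number;
  timeline: number;
  strategicValue: number;
  complexity: number;     // placeholder axis name
  competition: number;    // placeholder axis name
}

// Hypothetical weights; they sum to 1.0.
const WEIGHTS: Record<keyof AxisScores, number> = {
  fit: 0.3,
  budget: 0.2,
  timeline: 0.15,
  strategicValue: 0.15,
  complexity: 0.1,
  competition: 0.1,
};

// Parse the model's JSON reply and collapse the six axes into one
// weighted overall score on the same 0-10 scale.
function overallScore(llmJson: string): number {
  const scores = JSON.parse(llmJson) as AxisScores;
  let total = 0;
  for (const axis of Object.keys(WEIGHTS) as (keyof AxisScores)[]) {
    total += WEIGHTS[axis] * Math.min(10, Math.max(0, scores[axis]));
  }
  return Math.round(total * 10) / 10;
}
```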

Where things got shaky was in the emergent architecture:

  • The agents were happy to follow my instructions literally, for better or worse.
  • They would sometimes implement very similar logic in slightly different ways on two screens, instead of refactoring to a shared component.

That’s not a criticism of SpecKit so much as a reminder of what’s still my job. The spec defines what gets built. The architect defines how it should hang together.

In practice, that meant I spent a non-trivial amount of time nudging the agents toward shared components and cleaner boundaries, asking them to refactor duplicated logic after the fact, and tightening the specs when I saw architectural drift.

If you treat the tools as junior developers who are good at following instructions but bad at long-term architectural stewardship, you’ll have the right mental model.

Takeaway 3: Data, Migrations, and Production Reality Still Require Human Judgment.

The place where the cracks really started to show was around data modeling and deployments.

As the project evolved, the schema naturally changed: new fields to better represent RFP metadata, structural changes to support scoring along multiple dimensions, and adjustments to support new filtering and sorting behaviors.

What I expected (or at least hoped) was that the agents would:

  • Recognize when a schema change required a data migration.
  • Proactively generate migration scripts.
  • Help me reason about how those changes would affect existing production data.

What actually happened was:

  • The agents happily modified the schema using Prisma on the fly.
  • Initially, they did not automatically suggest or create data migration scripts.
  • Once we had real data in production, they were not proactive about preserving or transforming that data safely.

Because I’ve screwed this up in the past and learned from it, I recognized that changing the schema with no way of handling migrations wasn’t workable. I prompted the agents to add migrations, and I modified the constitution and technical plan so the agents would always consider migrations when changing the schema.
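
Paraphrased, the rule I added to the constitution was along these lines (the Prisma commands are the standard ones, but the wording is illustrative):

```markdown
- Never change the Prisma schema in isolation. Every schema change must:
  1. Include a generated migration (`npx prisma migrate dev --name <change>`),
  2. State how existing production rows will be preserved or transformed,
  3. Be reviewed before `npx prisma migrate deploy` runs against production.
```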

Similarly, when we hit issues in the production environment, the agents were helpful for a while but eventually ran out of creative ideas. Debugging real-world systems—especially when logs, network behavior, and third-party APIs get involved—still benefits enormously from human intuition.

This experience reinforced a key belief for me: Spec-driven development does not remove the need for operational awareness. If anything, it makes it more important for someone to own the data lifecycle, deployment pipeline, and production health.

Takeaway 4: Where Spec-Driven Development Shines (and Where It Doesn’t).

Despite the limitations, I came away from this experiment genuinely impressed. For this project, spec-driven development with SpecKit felt especially strong in a few areas:

  • Internal tools and prototypes: I was able to build an acceptable-quality internal system, with a reasonable UI and a solid backend, in a fraction of the time it would normally take.
  • Teams with uneven engineering strength: A group with strong domain knowledge but limited full-stack depth could use this approach to get a working tool off the ground, as long as they have at least one experienced engineer overseeing things.
  • Higher-level thinking for experienced developers: I never wrote a line of application code in the repo, but I spent a lot of time refining specs, reviewing diffs, and steering architecture. It really did feel like operating at a higher abstraction layer.

On the other hand, I’d be more cautious using this approach for:

  • Complex, long-lived production systems where data correctness, migrations, and uptime are critical.
  • Highly regulated domains where traceability and formal verification matter.
  • Teams without any experienced engineering oversight, where it would be too easy to accept whatever the agents generate as “probably fine.”

I see spec-driven development today as a powerful multiplier for the right kinds of work, not a wholesale replacement for thoughtful software engineering.

Cautiously Optimistic About the Future

Overall, I’m bullish on spec-driven development and tools like SpecKit. For this RFP finder project, I ended up with a working internal system that automatically pulls, filters, and scores government RFPs. I’d consider the code quality comparable to what a human team might produce for an internal tool, largely because I insisted on testing, linting, and review discipline. I also got a much clearer sense of how AI coding agents can fit into a professional development workflow.

At the same time, this experiment reinforced a few non-negotiables for me:

  • You still need human experience to catch versioning issues, enforce architecture, design migrations, and debug production.
  • You still need process discipline around specs, tests, and reviews.
  • You still need to think in systems, not just in prompts.

I’m confident that in a few years (or maybe much sooner) most teams will work this way: interacting with coding agents via natural language, grounded in clear specifications, and focusing more of their energy on design and stewardship than on line-by-line implementation.

I’m excited about that future, but I’m also cautious. The tools are powerful, and they can help you move fast—but you’re still responsible for where you’re going and what you ship. I’d love to hear how you’re using spec-driven development (or tools like SpecKit) in your own work. Where has it helped you, and where have you run into the limits?
