The thousand token rule

I set out to write a course about AI development techniques. Somewhere around chapter three, I realized I was writing a course about software engineering best practices.

Everything in it predates large language models by decades. You already know what good software development looks like. Requirements documents that specify behavior before the code exists. Design reviews where you sketch interfaces and data flow before implementation. Test-driven development where tests define success criteria. Code reviews where someone reads your work carefully and catches the subtle bugs you missed.

These practices have always worked. We just couldn't afford them.

The practices that work

The evidence is overwhelming. IBM's research on defect costs shows the pattern clearly: fixing a bug during requirements costs 1x, during implementation costs 6.5x, and in production costs 100x. The Standish Group's CHAOS reports consistently show that projects with clear requirements succeed at dramatically higher rates. Microsoft's analysis of their development process found that code review catches 60% of defects before they reach testing.

The problem has never been that these practices don't work. The problem is that they take time, and that time has always been expensive.

The economic reality

Writing a requirements document for a new feature takes somewhere between two and six hours, depending on complexity. You need to describe what the system should do, what inputs it accepts, what outputs it produces, how it handles errors, and what it explicitly doesn't do. You need to think through edge cases and document them. You need to write clearly enough that someone else can understand your intent months later. This is skilled work that requires concentration.

Design review means preparing materials. You sketch component relationships, write prose explaining your approach, identify dependencies and potential failure modes. For a moderately complex feature, this preparation takes three to five hours. The review meeting itself takes another hour or two. If the review identifies issues, you revise and repeat.

Test-driven development means writing tests before you can see any visible progress. You write a test, watch it fail (because the implementation doesn't exist yet), then write just enough code to make it pass. For a feature with a dozen functions and several edge cases per function, writing tests first adds four to eight hours to the schedule. The tests provide value, but that value materializes later. Right now they're just time spent not shipping.

Code review means someone reads your pull request carefully, runs the code locally, thinks about edge cases, and writes thoughtful comments. For 500 lines of new code, a thorough review takes 30 to 45 minutes. Multiply this across every pull request in a project and the hours accumulate quickly.

Why these estimates matter

The specific numbers vary by team and project, but the order of magnitude holds. Requirements and design are measured in hours. Tests are measured in hours. Review is measured in tens of minutes per change. These costs add up to days of work per feature, and days of work translate directly to schedule pressure.

The choice under pressure

Here's what happens when a deadline approaches. You need to ship a feature. The choice is between spending three hours writing a requirements document or starting to code immediately and making progress you can show. The requirements document might prevent problems later, but "later" is abstract and "now" is concrete. You start coding.

The same pattern repeats at every phase. Design review might catch architectural problems, but it requires preparing materials and scheduling meetings. Skip it and you can implement today. Tests might prevent regressions, but writing them before implementation means no visible progress for hours. Write the implementation first and the feature looks done sooner. Code review might catch bugs, but you can merge now and fix issues if they come up.

These decisions are individually rational. Each time you skip a practice, you save time right now. The cost comes later, and later might never arrive. Maybe the edge case you didn't think through won't occur in production. Maybe the architectural problem won't matter at your scale. Maybe you won't introduce any bugs.

Of course, later does arrive. The edge case triggers and you spend six hours debugging it, not the 20 minutes it would have taken to handle it during requirements. The architectural problem does matter and you spend three days refactoring, not the four hours design review would have taken. The bug makes it to production and causes user-visible failures that take 15 hours to diagnose and fix, not the 45 minutes code review would have taken.

The math works out clearly in retrospect. But in the moment, under deadline pressure, the comparison is "three hours now" versus "maybe some time later." Teams skip the practices.

The inversion

AI changes the time cost of these practices by roughly an order of magnitude.

Writing a requirements document used to take three to six hours because you had to do all the mechanical work yourself. Structure the document, write clear prose, maintain internal consistency, format everything readably. With AI assistance, you provide the decisions and the AI handles the mechanical work. You say "this endpoint accepts user IDs and returns their profile data, rejecting invalid IDs with a 400 error," and the AI generates a properly structured requirements section. You refine it. The AI incorporates your changes. What took three hours now takes 30 minutes.
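
To make that concrete, here is a minimal sketch of what that single requirement line already pins down, assuming a Flask service. The route, the ID format, and the profile store are invented for illustration; notice that "invalid ID" still leaves a decision (malformed versus unknown) that a good requirements document would spell out.

```python
# Hypothetical sketch (Flask assumed): one precise requirement already
# constrains the route, the success payload, and the failure mode.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Stand-in profile store; a real service would query a database.
PROFILES = {"u1001": {"name": "Ada Lovelace", "email": "ada@example.com"}}

@app.route("/users/<user_id>/profile")
def get_profile(user_id):
    # "Rejects invalid IDs with a 400 error": "invalid" is read here as
    # malformed, an assumption the requirement itself should nail down.
    if not (user_id.startswith("u") and user_id[1:].isdigit()):
        abort(400, description="malformed user ID")
    profile = PROFILES.get(user_id)
    if profile is None:
        abort(404, description="no such user")  # unknown vs. malformed: another decision
    return jsonify(profile)
```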

Design review preparation used to take four hours because you had to draw diagrams, write explanatory prose, and think through implications. With AI, you describe the components and their relationships in rough terms, and the AI generates a structured technical design. You critique it, point out what it missed, and the AI revises. Two hours of back and forth produces what previously required a full afternoon.

Test planning used to mean sitting with a notepad, systematically walking through functions and identifying edge cases. With AI, you describe a function and ask "what edge cases should I test?" The AI enumerates them: null inputs, empty collections, boundary values, concurrent access, resource exhaustion. You add the cases it missed, remove the ones that don't apply, and you have a test plan in 20 minutes instead of 90.
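
As a rough illustration of what that 20-minute plan turns into, here is a sketch using pytest. The function under test and its contract are invented for this example; only the edge-case categories come from the list above (concurrency and resource exhaustion are left out to keep it short).

```python
# Hypothetical sketch: turning an approved edge-case list into tests.
# parse_quantity and its contract exist only for this illustration.
import pytest

def parse_quantity(value):
    """Parse a quantity string into a non-negative int; None/empty are invalid."""
    if value is None or value == "":
        raise ValueError("missing quantity")
    n = int(value)  # raises ValueError for non-numeric input
    if n < 0:
        raise ValueError("quantity must be non-negative")
    return n

# Edge cases from the approved plan: null input, empty string,
# negative and non-numeric values.
@pytest.mark.parametrize("bad", [None, "", "-1", "abc"])
def test_rejects_invalid_input(bad):
    with pytest.raises(ValueError):
        parse_quantity(bad)

# Boundary values from the approved plan.
@pytest.mark.parametrize("raw, expected", [("0", 0), ("1", 1), ("999999", 999999)])
def test_accepts_boundary_values(raw, expected):
    assert parse_quantity(raw) == expected
```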

The pattern holds across practices. AI doesn't remove the need for your judgment and decisions. It removes the tedious mechanical work that made exercising that judgment expensive.

The skill still matters

AI doesn't replace software engineering skill. You still need to know what makes a good requirement, recognize an architectural problem, or identify an important edge case. But the skill that used to require hours of mechanical work to apply can now be applied in minutes. The bottleneck shifts from execution to decision making.

The new economics

Let's make this concrete with actual numbers. You're building a user authentication feature. Here's the time investment under the traditional approach versus the AI-assisted approach.

Traditional approach:

  • Requirements: 4 hours (writing, revising, documenting edge cases)
  • Technical design: 5 hours (research libraries, design session, diagram creation, write-up)
  • Test planning: 2 hours (enumerate test cases, document them)
  • Design review: 2 hours (prepare materials, conduct review)
  • Implementation: 8 hours (write code)
  • Testing: 3 hours (write tests, debug failures)
  • Code review: 1 hour (reviewer time)
  • Total: 25 hours

AI-assisted approach:

  • Requirements: 30 minutes (iterative drafting with AI)
  • Technical design: 1 hour (AI-generated options, you choose and refine)
  • Test planning: 20 minutes (AI enumerates cases, you approve)
  • Design review: 1 hour (reviewing AI-generated materials, discussion)
  • Implementation: 2 hours (AI generates code matching spec, you review)
  • Testing: 1 hour (AI writes tests from test plan, you verify)
  • Code review: 30 minutes (narrower review since spec was tight)
  • Total: roughly 6.5 hours (tallied just below)
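
If you want to sanity-check the totals and the headline ratio, a trivial tally of the two lists:

```python
# Tally of the two estimates above, in hours (20 minutes ≈ 0.33 h).
traditional = [4, 5, 2, 2, 8, 3, 1]
ai_assisted = [0.5, 1, 1/3, 1, 2, 1, 0.5]

print(sum(traditional))                               # 25 hours
print(round(sum(ai_assisted), 1))                     # about 6.3 hours
print(round(sum(ai_assisted) / sum(traditional), 2))  # about 0.25, a quarter of the time
```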

The feature ships in a quarter of the time, and the time saved comes primarily from the practices that teams used to skip. Requirements, design, test planning: these compress dramatically because AI handles the mechanical work. Implementation also compresses, but less so, because implementation was already somewhat mechanical.

The critical insight is that the practices that provide the most leverage (early-phase work that prevents expensive late-phase problems) are now cheap enough to actually do. The 10-to-1 return on investment was always there. Now the initial investment is low enough that teams can afford to make it.

What this enables

When requirements cost 30 minutes instead of four hours, you can iterate on them. Write a draft, critique it with the AI, refine it, stress-test it by asking "what happens when..." questions, refine it again. Five iterations on a requirements document that started at 600 tokens come to about 9,000 tokens total. At current frontier model pricing (roughly $3 per million input tokens, $15 per million output tokens), that's about $0.15. The same feature implemented as 500 lines of code is roughly 3,000 tokens, and if you iterate five times, you're looking at 15,000 input tokens and 15,000 output tokens, roughly $0.30. But code iterations happen after you've locked in architectural decisions, and changing those decisions in code is where the 6.5x multiplier kicks in.
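
For anyone who wants to check that arithmetic, here is a back-of-envelope sketch using the pricing quoted above. The split between input and output tokens is an assumption; the token counts are the same rough figures as in the paragraph.

```python
# Back-of-envelope check of the costs above, at roughly $3 per million
# input tokens and $15 per million output tokens.
INPUT_RATE = 3 / 1_000_000    # dollars per input token
OUTPUT_RATE = 15 / 1_000_000  # dollars per output token

def cost(input_tokens, output_tokens):
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# ~9,000 tokens of requirements drafting, treated here as mostly output.
print(f"requirements iterations: ${cost(0, 9_000):.2f}")      # prints roughly $0.14
# ~15,000 input and ~15,000 output tokens of code iteration.
print(f"code iterations: ${cost(15_000, 15_000):.2f}")        # prints about $0.27
```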

When test planning costs 20 minutes instead of two hours, you can be systematic. Ask the AI to enumerate edge cases for every function. Review its suggestions, add what it missed, remove what doesn't apply. You end up with comprehensive test coverage not because you spent hours on it, but because you spent minutes directing an AI that doesn't get bored or distracted.

When design review preparation costs an hour instead of five, you can have design reviews for medium-sized features, not just major architectural changes. Catch problems when they're still cheap to fix.

The practices work. They always worked. Now they're cheap enough to actually use.

The unchanged parts

AI doesn't change what makes a good requirement or a good design. It doesn't change the fact that finding bugs in production costs 100x more than finding them during requirements. It doesn't change the value of systematic thinking or the importance of edge cases.

What changes is the time cost of applying good judgment. The judgment itself still comes from you. You still decide what the system should do, what architectural approach makes sense, which edge cases matter, and whether the code is correct. AI compresses the work of expressing those decisions and makes iteration cheap enough that you can refine them until they're right.

This matters because software development has always been bottlenecked on thinking clearly about problems. The tedious work of documenting that thinking, preparing materials for review, and writing tests that verify your understanding has been expensive enough that teams skipped it. When you remove that friction, teams can actually practice the discipline that produces reliable software.

The rest of this course teaches you how to do that systematically. Requirements that constrain implementation. Test planning that forces you to think through failure modes before they occur. Design artifacts that let AI generate correct code on the first try. Verification that catches problems machines can detect, freeing you to focus on what they can't.

Good practices, finally achievable at low cost.