"Tested" is a word that hides a measurement problem. In a status meeting it means a column in a spreadsheet, ticked. In a system that has to last five years across half a dozen services and three locales, it means a regression suite that runs at every step toward release and would have caught yesterday's bug before it landed. The gap between those two meanings is where most of the cost of software lives - and it's also where AI just shifted the ground hard.
What testing actually costs
First, the budget reality. The numbers below are widely cited industry ranges (ISTQB, IEEE, decades of project surveys) and they don’t shift much by tech stack.
- New software development: testing typically eats 25% to 40% of total project effort. Higher in regulated or safety-critical domains.
- Software being actively developed (post-MVP, still adding features): each sprint spends 30% to 50% of its capacity on testing, and the share grows with the surface area. Every new feature has to coexist with everything that already shipped, in every locale the product runs in.
- Software in maintenance: testing dominates. 50% to 70% of maintenance effort is verification work. The ROI target here is straightforward: every euro spent on automated regression saves multiples in production-incident cost. A bug caught in test typically costs 10x to 100x less than the same bug caught in production.
What changes is what those percentages buy. The same 30% sprint budget can buy a one-shot manual test pass that’s forgotten the next morning, or a regression suite that compounds for five years. That is the choice.
The asset, not the artifact
Software doesn’t last because it was built well once. It lasts because it stays shippable, month after month, year after year, without breaking what was already paid for. The thing that protects that compound investment is regression: a battery of automated checks that runs at every step toward release - the PR check, the integration build, the staging promotion, the pre-prod gate - and that says, with confidence, “you didn’t just break the invoice export, the SSO login, or the PDF rendering for French-language customers”.
A test that ran once, was filled into a Jira X-Ray ticket marked “Passed”, and was never run again is not an asset. It is a documentation artifact. The cost is real - someone spent a day producing it. The compounding value is zero. The next time anyone touches that area of the code, a human has to redo the work, or, more often, doesn’t, and ships the change anyway.
The only kind of testing that protects the investment is the kind that keeps running without you.
Did anyone test in the target language at the end?
The other thing that doesn’t survive contact with a “tested” stamp is locality. European software ships into a fragmented market. A Swiss app has German, French, and Italian-speaking customers. Number formats split between CHF 1'000.50 (apostrophe thousands separator) and 1.000,50 (comma decimals). Date strings like “01/06/2026” can be valid in two locales and mean different days. Tax codes, postal-code regexes, the difference between “Mwst.” and “MwSt.”, the difference between “courriel” and “e-mail” - none of this surfaces in a test plan executed by people who don’t read the customer’s letters.
A tester who passes every functional flow can still ship software that reads broken to the customer. By the time someone flags it, the fix has compounded into a hotfix release, an apology email, and (if the customer is regulated) a compliance question.
Local knowledge is a testing asset. Most test budgets don’t price it that way.
What AI just changed
For a long time the math behind “outsource testing to a big offshore team” looked stable. Manual exploratory testing scaled linearly with people, and the lever was hiring more testers.
The math has flipped. A senior engineer who knows the product, the locale, and the regulatory frame can now sit with an LLM-assisted exploration tool, walk through a feature once, and have that walkthrough captured into a deterministic, repeatable end-to-end test that joins the regression suite the same afternoon. What used to be a week of test-script writing is now an hour. What used to be a one-shot manual pass becomes an asset that keeps running, on every promotion stage, in every language the product ships in.
We have seen verification cycles compress from multi-week regression passes to suites that finish in tens of minutes at every release-bound step. Teams that used to ship monthly now ship weekly, or per-feature, without losing coverage. The 10x to 100x is in the time-from-change-to-verified-shippable, not in raw release count - and that is exactly the metric the people accountable for the system care about.
This is not a “use ChatGPT to write tests” claim. The interesting part isn’t a single tool. It is that the same senior person who can read the customer’s language, judge whether the new feature is the right thing to build, and design the regression strategy can now also produce in a few hours a test asset that used to require a remote team of ten. The leverage went somewhere unexpected.
The urgency: dev speed without test speed is loss of control
This is the part most organizations are still not pricing in. AI is accelerating development teams everywhere. The same engineer who used to ship one feature a week is now refactoring three modules and prototyping a fourth in the same week. That throughput compounds every quarter.
If the testing function does not speed up at the same rate, the outcome is not “faster shipping”. It is loss of control. More changes flow through the same narrow verification gate, the gate becomes the bottleneck or starts being skipped, and production quietly turns into the test environment. The first symptoms are subtle - a few more rollbacks, an unexplained rise in customer-reported defects, a regression that escapes for two sprints before anyone notices. The later symptoms are not subtle. They are an audit finding, a regulatory letter, or a paying customer telling you they no longer trust the software they built their operation on.
Organizations that pair an AI-accelerated dev team with a 2018-shaped test process are, usually without realising it, dialling up risk every sprint. The only stable answer is that test speed has to grow at least as fast as dev speed. That is the part that makes regression automation strategic now, not tactical.
Why the local / hybrid model out-delivers now
The offshore-testing model was a procurement choice, not a quality one. It made sense when test labour was the bottleneck and headcount was the lever. The contract structure, billed by tester-hours, matched the work shape.
That shape is outdated for the new economics. A team billed by the hour has no built-in reason to compress two-thirds of the manual work into automation. Filling Jira X-Ray tickets with “tested” generates a billable hour but does not secure the long-term stability of a digital product. Producing an automated, semi-autonomous regression suite would shrink the invoice. Most teams in that contract therefore keep producing the artifact, not the asset - and they cannot, on the side, also be the people who notice that the German invoice header says “MwSt.” where it should say “Mwst.”, because they don’t read the language the customer reads.
A small local or hybrid team optimized for test automation works the other way around. The faster the regression suite catches defects unattended, the sooner the team is free to ship the next features and run a few exploratory manual tests to surface the real issues. Someone on the team understands the customer’s business context - including its language - and spots when the date format is rendering the wrong day. The leverage from AI compounds. Resources naturally align with delivered value and the lasting value of the product, not with how many tester-hours got logged this sprint.
If this is the shape of test process you want, that is the brief of our QA & Test Automation service.
Where the testing budget actually has to go
Testing isn't a phase you finish. It is a 30% to 70% line item that runs for the life of the software, and the only question worth asking is whether your spend is buying an artifact or an asset.
Regression that runs unattended at every step toward release - and that someone on the team can read in the target language - is what protects the product's value over five years. Everything else is documentation.
AI is accelerating development teams faster than most testing functions can keep up. If your test process hasn't compressed in step with your dev throughput, you are losing control even if the dashboards still look green. Pick a team whose contract is aligned with the software lasting, not with the hours logged this sprint.
