Senior Director of Million-Dollar Regexes – O’Reilly

| The following article originally appeared on Medium and is republished here with the author’s permission. |
Don’t get me wrong, I’m up all night using these tools.
However, I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.
I joked that this was going to be a “million-dollar regular expression.”
Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: a million messages a day at ~50K characters each works out to around 12.5 billion tokens daily, or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
That’s a win. You “saved” $5.77 million a year…
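The back-of-the-envelope numbers above can be reproduced like this. The per-million-token prices below are assumptions inferred from the article’s totals (roughly $1.25/M tokens for GPT-5 input and $0.05/M for Nano), not quoted rates, and the usual ~4-characters-per-token rule of thumb is assumed:

```python
# Rough daily-cost estimate for LLM-based SSN redaction.
# Assumptions: ~4 chars/token; prices are illustrative, not official.
DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
TOKENS_PER_DAY = DOCS_PER_DAY * CHARS_PER_DOC / 4  # ~12.5 billion tokens

def daily_cost(price_per_million_tokens: float) -> float:
    """Dollars per day at a given input-token price."""
    return TOKENS_PER_DAY / 1_000_000 * price_per_million_tokens

gpt5 = daily_cost(1.25)  # assumed GPT-5 input price
nano = daily_cost(0.05)  # assumed GPT-5 Nano input price
print(f"GPT-5:     ${gpt5:,.0f}/day, ${gpt5 * 365 / 1e6:.1f}M/year")
print(f"GPT-5 Nano: ${nano:,.0f}/day, ${nano * 365:,.0f}/year")
```

At those assumed rates this lands on roughly $15,600/day (about $5.7M/year) for GPT-5 and about $625/day ($230K/year) for Nano, matching the figures above.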
How about running this code for a million documents a day? How much would this cost:
import re; s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)
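Spelled out with a sample input (the sample text is mine; the pattern matches three, two, and four digits separated by optional dashes or spaces, anchored at word boundaries):

```python
import re

# SSN-shaped strings: 3 digits, 2 digits, 4 digits,
# optionally separated by a dash or a space.
SSN_RE = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

text = "Applicant SSN: 123-45-6789, spouse 987 65 4321."
print(SSN_RE.sub("[REDACTED]", text))
# Applicant SSN: [REDACTED], spouse [REDACTED].
```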
A plain old EC2 instance could handle this… A single EC2 instance (something like an m1.small at 30 bucks a month) could churn through the same workload with a regex and cost you a few hundred dollars a year.
Which means that, in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that. Can you fix it?”
From $15,000/day to $0.96/day: I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is far more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.