Senior Director of Million-Dollar Regexes – O’Reilly



The following article originally appeared on Medium and is being republished here with the author’s permission.

Don’t get me wrong, I’m up all night using these tools.

But I also sense we’re heading for an expensive hangover. The other day, a colleague told me about a new proposal to route a million documents a day through a system that identifies and removes Social Security numbers.

I joked that this was going to be a “million-dollar regular expression.”

Run the math on the “naïve” implementation with full GPT-5 and it’s eye-watering: a million messages a day at ~50K characters each works out to around 12.5 billion tokens daily (at roughly four characters per token), or $15,000 a day at current pricing. That’s nearly $6 million a year to check for Social Security numbers. Even if you migrate to GPT-5 Nano, you still spend about $230,000 a year.
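To make that arithmetic easy to check, here is a back-of-the-envelope sketch. It assumes roughly four characters per token (the ratio implied by 50K characters per document producing 12.5 billion tokens a day) and assumed input list prices of $1.25 per million tokens for GPT-5 and $0.05 per million for GPT-5 Nano. Output tokens are ignored, and the prices are a snapshot that may have changed.

```python
# Back-of-the-envelope LLM cost estimate for scanning documents for SSNs.
# ASSUMPTIONS: ~4 characters per token; input-only pricing in USD per
# million tokens (output tokens ignored). Check current pricing before reuse.

DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
CHARS_PER_TOKEN = 4  # rough rule of thumb, assumed

# Assumed input prices, USD per 1M tokens (a point-in-time snapshot).
PRICE_PER_M_TOKENS = {
    "gpt-5": 1.25,
    "gpt-5-nano": 0.05,
}


def tokens_per_day() -> int:
    """Total input tokens sent to the model each day."""
    return DOCS_PER_DAY * CHARS_PER_DOC // CHARS_PER_TOKEN


def daily_cost(model: str) -> float:
    """Daily input cost in USD for the given model."""
    return tokens_per_day() / 1_000_000 * PRICE_PER_M_TOKENS[model]


if __name__ == "__main__":
    print(f"tokens/day: {tokens_per_day():,}")
    for model in PRICE_PER_M_TOKENS:
        day = daily_cost(model)
        print(f"{model:>11}: ${day:,.0f}/day, ${day * 365:,.0f}/year")
```

Running it prints 12,500,000,000 tokens a day, about $15,625/day ($5,703,125/year) for GPT-5 and $625/day ($228,125/year) for GPT-5 Nano, in line with the rounded figures above.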

That’s a win. You “saved” $5.77 million a year…

How about running this code for a million documents a day? How much would this cost:

import re; s = re.sub(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b", "[REDACTED]", s)
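For anyone who wants to try the one-liner, here is a self-contained version wrapped in a small function. The name `redact_ssns` and the sample strings are illustrative. The pattern only checks shape (three, two, and four digits, each group optionally separated by a dash or space), so it also redacts any standalone nine-digit number and does not validate real SSN rules.

```python
import re

# Same pattern as the one-liner: 3 digits, optional dash/space, 2 digits,
# optional dash/space, 4 digits, bounded by word boundaries.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")


def redact_ssns(text: str) -> str:
    """Replace anything shaped like a US Social Security number."""
    return SSN_PATTERN.sub("[REDACTED]", text)


if __name__ == "__main__":
    samples = [
        "My SSN is 123-45-6789.",
        "Also written as 123 45 6789 or 123456789.",
        "Call 555-1234 for support.",  # phone-like, left alone
    ]
    for s in samples:
        print(redact_ssns(s))
```

The first two samples come back with every SSN-shaped value replaced by `[REDACTED]`; the seven-digit phone number does not fit the pattern and is left untouched.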

A plain old EC2 instance could handle this… A single EC2 instance, something like an m1.small at 30 bucks a month, could churn through the same workload with a regex and cost you a few hundred dollars a year.
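Whether one small instance keeps up is easy to sanity-check: a million documents a day averages only about 11.6 documents per second. The sketch below times the same regex on a synthetic 50K-character document and compares the measured rate with that requirement. The synthetic text, helper names, and iteration count are assumptions for illustration, and the result depends entirely on the hardware it runs on; an old instance type like m1.small will be slower than a modern laptop.

```python
import re
import time

SSN_PATTERN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

DOCS_PER_DAY = 1_000_000
CHARS_PER_DOC = 50_000
SECONDS_PER_DAY = 86_400


def make_synthetic_doc(n_chars: int = CHARS_PER_DOC) -> str:
    """Build n_chars of filler text with SSN-shaped values sprinkled in.

    Hypothetical test data only; real documents will differ.
    """
    chunk = "Lorem ipsum dolor sit amet, customer id 123-45-6789. "
    repeats = n_chars // len(chunk) + 1
    return (chunk * repeats)[:n_chars]


def required_docs_per_second() -> float:
    """Average rate needed to process DOCS_PER_DAY documents."""
    return DOCS_PER_DAY / SECONDS_PER_DAY


def measured_docs_per_second(iterations: int = 200) -> float:
    """Time regex redaction of the synthetic document on this machine."""
    doc = make_synthetic_doc()
    start = time.perf_counter()
    for _ in range(iterations):
        SSN_PATTERN.sub("[REDACTED]", doc)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    need = required_docs_per_second()
    have = measured_docs_per_second()
    print(f"required: {need:.1f} docs/s, measured: {have:,.0f} docs/s")
    print(f"headroom: {have / need:,.0f}x on this machine (single thread)")
```

The loop runs on a single thread, so the headroom it reports is a conservative figure for one core; run it on the instance you actually plan to use before trusting the number.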

Which means that in practice, companies will be calling people like me in a year saying, “We’re burning a million dollars to do something that should cost a fraction of that. Can you fix it?”

From $15,000/day to $0.96/day: I do think we’re about to see a lot of companies realize that a thinking model connected to an MCP server is far more expensive than just paying someone to write a bash script. Starting now, you’ll be able to make a career out of un-LLM-ifying applications.
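The $0.96/day figure works out to about $0.04 per instance-hour. The sketch below reproduces the comparison from the two daily figures in the text; the 730-hour billing month and round-the-clock uptime are assumptions.

```python
# Compare the two daily figures from the text.
# ASSUMPTIONS: the instance runs 24/7; a billing month is 730 hours.

LLM_DAILY_USD = 15_000.0  # full GPT-5, rounded, from the text
EC2_DAILY_USD = 0.96      # single small EC2 instance, from the text
HOURS_PER_DAY = 24
HOURS_PER_MONTH = 730     # assumed billing convention
DAYS_PER_YEAR = 365


def summarize() -> dict[str, float]:
    """Derive hourly, monthly, and yearly EC2 cost plus the cost ratio."""
    hourly = EC2_DAILY_USD / HOURS_PER_DAY
    return {
        "ec2_hourly": hourly,
        "ec2_monthly": hourly * HOURS_PER_MONTH,
        "ec2_yearly": EC2_DAILY_USD * DAYS_PER_YEAR,
        "cost_ratio": LLM_DAILY_USD / EC2_DAILY_USD,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name:>11}: {value:,.2f}")
```

That comes to roughly $29 a month and $350 a year for the instance, and a cost ratio of 15,625 to 1 against the GPT-5 figure.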
