There are some books that serve as dividing lines in the technology world. I cannot count all the times I’ve sent some inquisitive kid out to read TCP/IP Illustrated Volume 1. The people who understand IP networks have read it. The rest may have some amount of knowledge, but they’ve not explored all the nooks and crannies, so they periodically get caught out.
Mastering Regular Expressions is required reading for anyone seeking expert status on any Unix derived OS. Even if you are not destined to be some sort of programmer, system administration or any other duty with command line access will be MUCH smoother if you understand this ubiquitous text matching tool.
Attention Conservation Notice:
Only of use to those who are on a highly technical path, either as developers or system administrators. The rest of you should avert your eyes …
The Basics:
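There are some simple pattern matching expressions one would encounter even using DOS back in the day. Strictly speaking these are wildcards (globs) rather than true regular expressions, but the idea is the same.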
dir *.txt
dir whatever.*
The first lists all text files, the second lists every file type associated with a given name. Now here’s an artifact from earlier this evening:
find . -type f -name "*.php" -exec perl -pi -e "s/'(.*?)'/\1/g" {} \;
This string is meant to find all *.php files in a directory tree and then apply a bit of Perl code that finds '$variable' and replaces it with $variable. This is a minor issue in a disaster recovery process … that happened … about 5,800 times … across many, many files.
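For comparison, here is a minimal Python sketch of the same find-and-replace idea. It is not what I ran that night. The names `QUOTED_VAR`, `unquote_variables`, and `fix_tree` are made up for illustration, the pattern is deliberately narrower than the Perl one (it only touches quoted things that look like a variable name), and it assumes the files are UTF-8.

```python
import re
from pathlib import Path

# Hypothetical pattern: a single-quoted PHP-style variable such as '$name'.
# Narrower than the Perl one-liner, which unquotes anything between single quotes.
QUOTED_VAR = re.compile(r"'(\$[A-Za-z_][A-Za-z0-9_]*)'")


def unquote_variables(text: str) -> str:
    """Replace '$name' with $name, leaving other quoted strings alone."""
    return QUOTED_VAR.sub(r"\1", text)


def fix_tree(root: str) -> int:
    """Rewrite every *.php file under root in place; return the number of files changed."""
    changed = 0
    for path in Path(root).rglob("*.php"):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        updated = unquote_variables(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling `unquote_variables("echo '$user';")` gives back `echo $user;`, while `'hello'` is left untouched. Try it on a copy of the tree first, since `fix_tree` edits files in place just like `perl -pi` does.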
Now I read that book in … 1996 or 1997, when I first started moving away from Pascal and instead programming in Perl. Then I bought another copy and read it again in 2012, when I switched from Perl to Python. Now it’s 2025 and for a slightly creepy reason I’m being forced to install a smoldering pile of osprey poo known as Visual Studio Code and then debug PHP(!) Having never programmed in anything other than a language that starts with “P”, I had been congratulating myself for having avoided this web developer’s language.
But here we are, thirteen years after my last change of life, and it’s game on …
Complex Characters:
The reason that regular expressions are so complex is the metacharacters.
I am going to propose that the acid test for any AI is whether it can write functional regex, because it’s a non-trivial skill, and requires understanding incredibly dense strings containing these metacharacters.
#/\.^$*[]()|\&\n
The letters and numbers do what you’d expect, they’re interpreted literally. But these special characters are modifiers for pattern matching.
The ^ in a pattern means start of a line, unless you escape it with a backslash - \^
The $ in a pattern means end of line, unless you “escape” it with a backslash - \$
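A quick way to convince yourself of the anchor versus literal distinction is to poke at it in Python's `re` module. This is only a sketch; the sample string and variable names are invented for the demonstration.

```python
import re

line = "total: $5 ^ up"

# Unescaped, ^ and $ are anchors: they match positions, not characters.
starts_with_total = re.search(r"^total", line) is not None  # True
ends_with_up = re.search(r"up$", line) is not None          # True
starts_with_up = re.search(r"^up", line) is not None        # False

# Escaped with a backslash, they match the literal characters.
literal_dollar = re.search(r"\$5", line).group()   # "$5"
literal_caret_at = re.search(r"\^", line).start()  # 10, the index of "^"

# re.escape adds the backslashes for you when the text comes from elsewhere.
escaped = re.escape("^$")  # the four characters \^\$
```

Raw strings (the `r"..."` prefix) keep Python from interpreting the backslashes before the regex engine gets to see them.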
The evil trio of period, asterisk, and question mark are ASCII gremlins. A period matches any single character. A period followed by an asterisk (.*) greedily matches all following characters, while .*? matches following characters too, but in a non-greedy fashion. The .* will run to the end of the line and back off only as far as the rest of the pattern requires, while .*? will run just until the next portion of the regex would match.
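The greedy versus non-greedy difference is easiest to see side by side. A small sketch in Python, using an invented sample string with two quoted words:

```python
import re

text = "say 'one' and 'two' now"

# Greedy: .* grabs as much as it can, so the match runs from the
# first quote all the way to the last quote on the line.
greedy = re.findall(r"'.*'", text)   # ["'one' and 'two'"]

# Non-greedy: .*? stops at the first closing quote it can reach,
# so each quoted word is matched separately.
lazy = re.findall(r"'.*?'", text)    # ["'one'", "'two'"]
```

This is why the Perl one-liner above uses `.*?` between the quotes: the greedy form would swallow everything from the first quote on a line to the last.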
Being presented with the need to produce a regex that matches strings containing literally interpreted metacharacters has proven a bridge too far for ChatGPT. It simply cannot handle the notion that a single quote is anything other than a delimiter, when it was the thing I needed to match.
I just asked. It failed. I chided it to apply the correct rules to its regex. It failed. I gave it simple examples of proper use of single/double quotes and escaped metacharacters. It failed. I had wanted it to produce a one liner for sed, the Unix stream editor. It failed. I gave it the options of writing commands for both awk and perl. It failed.
Conclusion:
I have found ChatGPT to be an excellent simulacrum of a frisky college sophomore intern that can be dispatched to read man pages on tools I seldom use, then produce working examples. My find-fu and rsync-fu have improved a great deal since I began using it.
I mostly know what I want from Python, but there are a LOT of libraries and API wrappers out there. Being able to quickly get some example invocations I can test has been a great accelerator. I go up and down in terms of mental energy, it’s a lingering part of my long term battle with Lyme and the aftereffects. ChatGPT isn’t always right, but it’s generally close, and I can forge ahead on things where I’d previously have given up.
I’ve found LLM hallucinations pretty comical when I’m just poking around for fun. I know quite a bit about inter-war cruiser designs. You ask an LLM and you’ll get an essay full of plausible sounding statements about century old warship designs that are just utterly laughable. It IS entirely possible to find ships from that era with their B or C turret lower than the A turret, but that feature is archaic. So when ChatGPT comes back with the Iowa class battleship(!) as an example of a cruiser, then assures me that their second turret mounted lower than the first(!) in order to improve their ability to hit targets at close range(!) … LOLWUT?!?!?!
Tonight’s experience … I’m writing this at 0549 in an all night restaurant after nine hours of intense troubleshooting … was extremely frustrating. I spent a good ninety minutes trying to get a workable find/replace regex, and finally resorted to some slash and burn shell scripting and explicitly coding in Python to hunt down all the special cases. Demanding quick, accurate answers on regex is going to be my go-to for any LLM evaluation.
Perhaps my poor attitude on all this has to do with endless hours of smelling delicious gluten laden donuts that I am forbidden to touch. BART started again about three hours ago, so I’ll be able to wobble home and score some rabbit food … assuming we actually get this silly thing running again.
Coda:
The little Orange Pi came with me again this trip, although I had no use case for it. I plugged it into Pinky’s ethernet dongle, found the IP by examining the Mac’s arp table, and I could ssh into it. The WiFi I could reach was a hotspot, which demanded various bits of personal information before it would pass traffic. I was just starting to set up tinyproxy, so I could originate web traffic from the Pi, as we slipped into the deep debugging waters.
I could not get internet sharing on Pinky to behave. Possible culprits include the Docker install, TailScale, and the fact that my Mac runs in journalist’s lockdown mode. I really want this to end up with clean access from Pinky to the Pi, and then a fail closed VPN. I need to script something that 1) preserves the default gateway, 2) turns off the DHCP client, 3) installs a route for whatever VPN concentrator I want to use, and 4) stands up a VPN connection.
There are not enough hours in the day to do all that I think I should …