Clear back on June 2nd, MIOS: Iran’s PressTV appeared, a month after we heard it was in the pipeline at the Washington Post, and two months after MIOS: Doin’ A FISA teased that we had something big.
The first real-world effect showed up on August 8th, 2024, in MIOS: PressTV First Blood, wherein an Iranian agent, who also happens to be some sort of journalist, got pinched on arrival at Heathrow.
There was another story in the queue at the Washington Post, but it has been nixed. I think this happened over concerns about the incoming Trump administration, although it’s not clear precisely why.
Yesterday I was asked to do some accessibility work on the PressTV data and some new reporter is going to pick up the story …
Attention Conservation Notice:
Just a little tease. The lede tells you the important bits; the gritty meta-details on what was done to the data are all that follow.
Process:
Like so much of what I’ve done, starting roughly with the Russian invasion of Ukraine, yesterday’s work was all towards the purpose of turning a terrible muddle into something a journalist can quickly and easily evaluate.
Instead of the original file names, which likely had lots of Persian script, Black Reward renamed the files using their 64-character sha256 checksums. This was a kindness done for westerners trying to examine the leak, and it helped people initially. Over the long haul it served to make the content opaque, until I got after it with awk/grep/sed and placed the content into a Disinfodrome instance of Open Semantic Search.
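For readers who have not met checksum-based names, here is a minimal Python sketch of how a file ends up with a 64-character name. It is an illustration only: how Black Reward actually did the renaming is not known, and the function name sha256_name is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_name(path: Path) -> str:
    """Return '<64 hex chars><lowercased suffix>' for the file at path.

    Hypothetical helper: it only illustrates checksum-based naming and is
    not a reconstruction of whatever tooling produced the leaked names.
    """
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so a large PDF never has to fit in memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() + path.suffix.lower()
```

The name depends only on the bytes in the file, so two copies of the same document always collide on the same name, and nothing about the subject matter survives in it.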
The original files of interest, almost 1,300 of them, looked like this. I replaced some characters with dashes or underscores, so that I can show you guys this without stepping on the reporter’s scoop.
So I did a variety of things to make the files less impenetrable:
Extracted text from each file using pdftotext from Poppler.
Extracted date stamps from the text files.
Used awk, grep, sed, and MicroEmacs to get names from the text files.
Used awk to combine date stamps and the individuals mentioned into new file names.
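The first two steps can be sketched in Python instead of the awk/grep/sed actually used. This is a rough sketch under stated assumptions: pdftotext from Poppler is on the PATH, and the date stamps appear in the text as YYYY-MM-DD or YYYY/MM/DD. The real documents may well use another date format, so the pattern is a placeholder. The helper names pdf_to_text and first_date are hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Assumed date shape: 2024-12-18 or 2024/12/18. Adjust for the real documents.
DATE_RE = re.compile(r"(?<!\d)(\d{4})[-/](\d{1,2})[-/](\d{1,2})(?!\d)")


def pdf_to_text(pdf: Path) -> str:
    """Run Poppler's pdftotext; '-' as the output file sends text to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True, text=True, encoding="utf-8", check=True,
    )
    return result.stdout


def first_date(text: str) -> Optional[str]:
    """Return the first plausible date in text as YYYY-MM-DD, or None."""
    for year, month, day in DATE_RE.findall(text):
        m, d = int(month), int(day)
        # Skip matches that cannot be calendar dates (e.g. month 13).
        if 1 <= m <= 12 and 1 <= d <= 31:
            return f"{year}-{m:02d}-{d:02d}"
    return None
```

Calling first_date(pdf_to_text(some_pdf)) would give one date stamp per file. Pulling out names is left out on purpose, since that step leaned on hand editing as much as on pattern matching.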
With each file I started with a name like this:
01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b.pdf
And finished with two files named like this:
2024-12-18-NealRauhauser-01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b.pdf
NealRauhauser-2024-12-18-01ba4719c80b6fe911b091a7c05124b64eeece964e09c058ef8f9805daca546b.pdf
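The naming scheme above is simple enough to express as one small function. A minimal Python sketch, assuming the date and the name have already been extracted; renamed_pair is a hypothetical helper, not the awk that was actually run.

```python
from pathlib import Path
from typing import Tuple


def renamed_pair(original: str, date: str, person: str) -> Tuple[str, str]:
    """Return (date-first name, person-first name) for one checksum-named file.

    The checksum is kept at the end of both names, so every renamed file can
    still be traced back to the file in the original leak.
    """
    p = Path(original)
    checksum, suffix = p.stem, p.suffix
    by_date = f"{date}-{person}-{checksum}{suffix}"
    by_name = f"{person}-{date}-{checksum}{suffix}"
    return by_date, by_name
```

Putting the date first in one copy and the name first in the other means a plain alphabetical directory listing is already sorted the way a reader wants it.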
These files were placed into a Dropbox with two subfolders named bydate and byname. This facilitates finding all information for a given name, or seeing all events in a specific time period. I finished the work off by creating one single large zip file of the whole folder.
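The folder layout and the final zip can be sketched the same way. This Python sketch works on a local folder rather than Dropbox, and the helper names stage_copies and zip_folder are hypothetical. The bydate and byname folder names come from the description above.

```python
import shutil
from pathlib import Path


def stage_copies(src: Path, by_date: str, by_name: str, root: Path) -> None:
    """Copy src into root/bydate and root/byname under its two new names."""
    for folder, name in (("bydate", by_date), ("byname", by_name)):
        target_dir = root / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target_dir / name)


def zip_folder(root: Path, archive_base: Path) -> Path:
    """Zip everything under root and return the path of the written .zip.

    archive_base is the archive path without the '.zip' extension. Keep it
    outside root so the archive does not try to include itself.
    """
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=str(root)))
```

Every document is stored twice, which doubles the size of the bundle. That is the price of letting someone browse by person or by time period without running any tools at all.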
Qualitatively examining those files would be no big deal for me. After almost forty years of Unix experience, my keyboard sounds like an old-fashioned teletype when I’m stringing together Unix tool chains to handle stuff. A reporter would have to laboriously open 1,300 files, taking notes as they go … and in a world of deadlines, a complex problem like this would fall by the wayside. Presenting the files in the aforementioned fashion makes reporting on them a much less forbidding process.
Conclusion:
Since this is going to lead to a published article, I better bring up Scientology’s Dullest Tool - there’s a mentally ill former journalist named Ron Brynaert who has been obsessing about me since 2011. Any time it becomes known that I’ve talked to a reporter, he turns up, waving around his “executive editor” title from Raw Story, and sniveling about my “hoaxing”.
The only vaguely journalistic person I’ve ever hoaxed is … Ron Brynaert. I finally lost my patience with his cyberstalking me - it took just 72 hours to goad him into a course of action that cost him his sealed settlement with Raw Story, as well as piling on another $500,000 judgment.
But no matter how high-pitched his keening becomes, it won’t matter a lick. The provenance of the PressTV leak is very well understood at this point. And I’m glad to see things are once again moving in this area.