Around the same time as Telegram Founder Pavel Durov Arrested, one of the usual suspects showed up with a find in the Offshore Leaks material. I’ve been on the trail of GraphRAG for Disinfodrome, and despite my preference for ArangoDB, it looks like the first workable solutions will involve Neo4j.
This potential find in Offshore Leaks looks a lot like the work that led to MIOS: Iran’s PressTV. The ICIJ makes the data available as Neo4j version 4 and version 5 exports. There’s a nice GitHub repo on how to handle it.
Disinfodrome started life as a cluster of Open Semantic Search instances. It has since sprouted a Datasette capability. There is also an OpenLDAP implementation that has languished for a couple of months now. When complete, it will be a directory of names, emails, and phone numbers, along with which leaks exposed them.
I have been dealing with a maze of a docker-compose.yml for the last few days, and while it doesn’t include Neo4j, I might as well do this stuff in parallel.
Neo4j has some oddities in terms of network usage: it has an HTTPS web service interface, but there’s a separate port for the Bolt protocol (7687 by default), and I recall desktop access needing that. Cloudflare won’t proxy that one, so this will be a Tailscale thing once it’s running.
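A quick way to see which of those listeners is actually reachable from a given machine is a plain TCP probe. The sketch below is only that, a sketch: it assumes Neo4j's default ports (7474 for HTTP, 7473 for HTTPS, 7687 for Bolt), which any real deployment may remap, and the function names are made up for illustration. It only checks that something accepts a connection; it does not speak Bolt or authenticate.

```python
import socket

# Neo4j's default listener ports. Assumption: the deployment has not remapped them.
NEO4J_DEFAULT_PORTS = {"http": 7474, "https": 7473, "bolt": 7687}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_neo4j(host: str, ports: dict = NEO4J_DEFAULT_PORTS) -> dict:
    """Probe each named listener and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_neo4j("100.x.y.z")` with the box's Tailscale address (a placeholder here) returns a dict of listener name to True/False. If the web interface answers through one path but `bolt` comes back False, that points at the proxy or firewall rather than at Neo4j itself.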
If you want to poke around in Offshore Leaks, don’t wait for me; Neo4j Desktop should be sufficient for that.