As I hinted at in Retiring Almost Everything, I’ve been looking into how I can use artificial intelligence. A couple months ago this would have elicited a shrug from me, but I got some assistance and there’s a prototype coming together.
Origins:
First, an engineer’s eye view of what I’ve done in the past. If you’ve ever looked at the Neal Rauhauser Figshare you know that I used to have a social media activity tracking practice at large scale. This involved making content searchable using Elasticsearch, then later ArangoDB. I often used Gephi for data visualization, and I started playing with Graphistry about the time that business ran out of steam. The need is still there, but the Twitter API went from free to a $500k/month minimum, which excluded civil society organizations, and then the wave of bots began.
Disinfodrome is a document search service that uses Open Semantic Search (OSS). I’ve said much less about it, but I’ve spent time with Paliscope and YOSE. I had a go at moving from OSS to MayanEDMS, but it’s less capable of the sort of free form digging that’s done with the document collections I curate. I managed to come up with a repeatable install for OSS, which has not been updated in quite a while, and that’s OK since my offerings are hidden behind Cloudflare Access.
Possible Future:
AI is an absolute madhouse; new companies, products, and alliances are announced every week. I got some tips from a friend, and they’ve led to something that both works on my new laptop and looks like something that people who use Open Semantic Search could put to work.
This may not be the best solution, but it’s one I can get running and show within a couple of days. The basic layout is simple: the LLM is akin to a dictionary/thesaurus/grammar checker, and a vector database of your documents provides the knowledge, so the LLM doesn’t just make up stuff that sounds good.
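To make that layout concrete, here is a minimal sketch of the retrieval half: passages go into a vector store, the question is embedded the same way, and the closest passages get stuffed into the prompt so the model answers from your documents instead of improvising. The bag-of-words "embedding" here is a toy stand-in; a real deployment would use a learned embedding model, and all the passage text below is invented for illustration.

```python
# Toy retrieval-augmented generation sketch: retrieve passages by cosine
# similarity, then build a grounded prompt for the LLM. The embedding is a
# bag-of-words Counter, a stand-in for a real embedding model.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: sparse bag-of-words vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Stuff the retrieved passages into the prompt so answers stay grounded."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Invented example passages standing in for a real document collection.
passages = [
    "Open Semantic Search indexes document collections for free form digging.",
    "Gephi and Graphistry are tools for graph visualization.",
    "Elasticsearch makes social media content searchable at scale.",
]
print(build_prompt("What makes content searchable?", passages))
```

The point of the sketch is the division of labor: the vector store decides *what the model gets to read*, and the LLM only phrases an answer from it.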
The software challenges are:
Finding the right LLM.
Picking a workable vector database.
Vectorizing a mountain of PDFs.
Restoring ArangoDB/Elasticsearch and integrating them.
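On the third challenge, vectorizing a mountain of PDFs mostly comes down to a pipeline: extract the text from each PDF (with something like pdftotext or PyMuPDF, not shown here), split it into overlapping chunks sized for the embedding model, and keep each chunk tied to its source file so answers can point back to the original document. A minimal sketch of the chunking step, with chunk sizes and file names that are purely illustrative assumptions:

```python
# Sketch of a PDF vectorization pipeline's chunking stage. Text extraction
# and the actual embedding call are out of scope; this just turns extracted
# text into overlapping, source-tagged chunks ready for an embedding model.

def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows (sizes are assumptions)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def vectorize_corpus(docs: dict[str, str]) -> list[dict]:
    """Build (source, chunk) records to hand to an embedding model."""
    records = []
    for path, text in docs.items():
        for n, chunk in enumerate(chunk_text(text)):
            records.append({"source": path, "chunk_id": n, "text": chunk})
    return records

# Fake 500-word "extracted PDF" standing in for real pdftotext output.
docs = {"report.pdf": " ".join(f"word{i}" for i in range(500))}
records = vectorize_corpus(docs)
print(len(records), records[0]["source"])
```

The overlap between chunks is there so a sentence split across a boundary still appears whole in at least one chunk; the `source` and `chunk_id` fields are what let a later Elasticsearch or ArangoDB layer join an answer back to the document it came from.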
And then there’s the matter of capital cost. I’ve solved the development system problem with the new laptop, but one of my elderly rackmounts must give way to a newer system. The minimum is a Dell PowerEdge R730XD like this one. Its Xeons are the very first processors with AVX2 support, which matters for running LLM inference on the CPU, and I have to stick to this model because I own a dozen high capacity 3.5” NAS drives. Shifting to 2.5” disks would cost a fortune.
And the other piece that’s needed is an Nvidia RTX 40x0 GPU. The GTX 10x0 cards I have now are enough for some experiments, but that’s about it. The sweet spot is $500 for a 16GB RTX 4060. The first generation RTX 30x0 cards look attractive … until you talk to someone who has used both types. They’re a penny-wise, pound-foolish sort of thing.
Conclusion:
Longtime readers are aware that I am slowly standing back up after more than a decade of illness. Getting an R730 in a rack with an RTX 4060 is going to cost around $850 - $1,000, and going from a 1U to 2U machine will take my monthly cost from $100 to $200.
And that’s not gonna happen without an assist from you guys.
As I’ve said before, those who need the conflict skills I’m attempting to transfer with this Substack are going to be irregulars, often in countries with dramatically lower median income than the U.S. If I made the good stuff paid, I would lose the audience who need these things the most. But I really do need an assist in the form of some of you signing up as founders. If each of the eighty-five of you who subscribe came up with $10, that would just cover the cost of this move.