Artificial Intelligence vs. Search Engines

Things are going to change and I don't mean incrementally.

Much like LSM-265, a lot of things tied to search engines, which de facto means Google, have been made obsolete this year. We haven’t hit the keel-breaking explosive steam bubble phase yet, but the fan of bubble trails from various AI initiatives looks like they’re all going to hit.

These are my own meandering observations as I simultaneously climb a couple of learning curves in this area.

Attention Conservation Notice:

I am just gonna ramble on a bit about the macro changes I see coming in the search field and all of the stuff related to it.

Public Search:

I’ve spent much of the last two years using DuckDuckGo first, then switching to Google if I can’t get what I need that way. About two months ago I made the switch to Microsoft’s Bing and it’s just as bad as I expected - some good results, but anything Linux-related immediately turns it into a Microsoft marketing tool. Which is not worth a fuck IMHO.

Google’s results have been skidding, too. Where a few keywords used to get the job done, now you need to maneuver a bit to get what you want. And if there’s the slightest chance you might be in the market for some gadget based on the search … I’ve started reflexively hitting page down before I even look at the responses.

Perversely, sites that exist purely to push product seem to be having a terrible time getting quality results in front of you. Someone gave me a BestBuy gift card and the only things I wanted, which were various types of storage, were just a bridge too far for their system. Amazon is faring a bit better, but there too there are signs of trouble.

So here I sit, Rushing Into Semrush and hoping a reporter notices the opening described in MIOS: Russpublicans. Despite the oncoming storm I think this is going to be a good investment - everybody will face an epochal change, and I’m good in that sort of environment.

Private Search:

Open Semantic Search has been a companion for me the last five years, an irascible private document search engine I’ve managed to make supportable by dint of cleverness and stubbornness in equal measure. The faceted search it offers is a tolerable tool for people facing enormous piles of public documents.

A couple months ago I noted in Artificial Intelligence For Disinfodrome that the platform needed some attention, and that to do it I’d need both new software and likely some new hardware. As things are today, there’s a 10TB drive sitting here that needs to make its way to my hosting facility and a pair of 24TB drives are making their way through purchasing.

The tool that made the most sense in this environment was Dify, which offers a Retrieval Augmented Generation pipeline. I got started on this, but the incessant barking from NAFO’s Canine Intelligence Agency became too strident to ignore, and it turns out they ARE on to something, so the load average on the OSS box has looked like this for the last week straight.

While the system has been churning on that pile of documents I’ve been looking at all the things, and I discovered there’s something even better than RAG - GraphRAG.

Retrieval Augmented Generation combines an LLM with a vector database representing the content of a cache of private documents; GraphRAG employs a knowledge graph instead. LLM responses in general are limited by the volume of tokens used for the context - the size of the working memory associated with any task. Responses get more accurate when you use a private cache of quality documents to extend the LLM. And if Named Entity Recognition has been applied to your document cache to create a graph of concepts, and a graph database is employed to model their connections, then the context stops being just the text around the keywords. Instead, the material used for generation will be conceptually connected to the keywords, rather than limited to merely the words in proximity to them.
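The difference can be sketched in a few lines of toy Python - the data structures and entity names here are made-up stand-ins for a vector index and a knowledge graph, not any real RAG framework’s API:

```python
# Toy contrast: plain RAG retrieves only chunks that match the query terms;
# GraphRAG-style retrieval also follows knowledge-graph edges to material
# that never mentions the query terms at all. All data is illustrative.

# Plain RAG stand-in: chunks keyed by the terms they actually contain.
chunks_by_term = {
    "troll farm": ["Doc A: troll farm operations in 2016"],
    "IRA": ["Doc B: Internet Research Agency structure"],
}

# GraphRAG stand-ins: entity graph (edges from NER) plus chunks per entity.
entity_graph = {
    "troll farm": {"IRA"},      # troll farm --related--> IRA
    "IRA": {"Prigozhin"},       # IRA --related--> Prigozhin
    "Prigozhin": set(),
}
chunks_by_entity = {
    "troll farm": ["Doc A: troll farm operations in 2016"],
    "IRA": ["Doc B: Internet Research Agency structure"],
    "Prigozhin": ["Doc C: financing of the operation"],
}

def plain_rag(query_terms):
    """Retrieve only chunks directly matching the query terms."""
    out = []
    for term in query_terms:
        out.extend(chunks_by_term.get(term, []))
    return out

def graph_rag(query_terms, hops=2):
    """Expand query entities through the graph, then gather their chunks."""
    frontier = set(query_terms)
    seen = set(frontier)
    for _ in range(hops):
        frontier = {n for e in frontier
                    for n in entity_graph.get(e, set())} - seen
        seen |= frontier
    out = []
    for entity in sorted(seen):
        out.extend(chunks_by_entity.get(entity, []))
    return out
```

A query for "troll farm" alone returns one document under plain retrieval, but the two-hop graph expansion pulls in the IRA and Prigozhin material as well - context the keyword match could never reach.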

RAG is dramatically more accurate than a plain ol’ LLM, and GraphRAG will be another order of magnitude improvement.

Considerations:

All of the examples I can find involve using Neo4j. I’ve never cared for it, and when it came time for me to build a Twitter indexer I used ArangoDB instead. Am I really going to have to learn another graph database? Longtime readers will recall that I typically put the exploding head gif from the movie Scanners in whenever I’m confronted with ANOTHER unavoidable opportunity to learn yet another complex system.
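The gulf between the two shows up even in a trivial traversal - the same one-to-two-hop walk over a hypothetical entity graph (collection and relationship names here are invented for illustration) looks nothing alike in Neo4j’s Cypher and ArangoDB’s AQL:

```cypher
// Neo4j Cypher: follow RELATED_TO edges one or two hops out
MATCH (a:Entity {name: 'troll farm'})-[:RELATED_TO*1..2]->(b:Entity)
RETURN DISTINCT b.name
```

```aql
// ArangoDB AQL: same traversal, phrased as a FOR loop over an edge collection
FOR v IN 1..2 OUTBOUND "entities/troll_farm" related_to
  RETURN DISTINCT v.name
```

Cypher is pattern-matching; AQL is iteration over named edge collections. Nothing about queries, drivers, or data modeling carries over directly, which is why every GraphRAG tutorial being Neo4j-shaped is a real cost.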

I had a go at converting Open Semantic Search’s knowledge graph from Neo4j to ArangoDB. This is the first I’ve ever mentioned that in public, so you can guess how well the process went for me.

Conclusion:

Reality is not doing so well, overall.

Remember Regarding Those Kremlin Trolls from 2016, which contained my first mention of The Menace of Unreality? Remember Reality Under Siege way back in September of 2023, where I agonized a bit about AI?

One of the links my browser opens to is Misinformation research is buckling under GOP legal attacks, which was published four days after I wrote Reality Under Siege.

Twitter lost its role as the global commons the day the Musk purchase was finalized about a month after that, and the termination of easy API access a year ago disempowered every researcher who cared to look at the platform.

Today the entire industry that exists to adjust Google search results stands to get the same treatment that excess warships did in the late 1940s. I just hope we can get through this without the whole thing going Skynet on us.

Netwar Irregulars Bulletin v2.0