I have been looking for new sources and I really like what I’ve seen from Asianometry. This video covers the history from Nvidia’s first Graphics Processing Unit, the GeForce 256, way back in 1999, through their current offerings. You can watch the video, which is quite accessible for those who aren’t deep into this topic, and I’ll try to provide some additional context as well.
Primary Motivation:
Two weeks ago I posted Artificial Intelligence For Disinfodrome, which included a fundraiser, and you guys really came through, with about $700 in donations and $650 in small investigative jobs. I haven’t hit go on any new hardware yet, because with each day there are new revelations.
There’s a working group among my associates now, a chat room for those of us applying AI to various problems, most of which center around Retrieval Augmented Generation (RAG). We’ve all got mountains of documents and limited resources for understanding them. The goal is embedding these document caches into a vector database that’s suitable for use with a Large Language Model.
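For a rough sense of what “embedding a document cache” means in practice, here is a minimal Python sketch using the sentence-transformers library, with plain NumPy dot products standing in for a real vector database. The model name and the sample documents are placeholders for illustration, not our actual setup.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Placeholder documents; real use would walk a directory of extracted text.
docs = [
    "Meeting minutes from the planning committee.",
    "Financial disclosure filings for fiscal year 2021.",
    "Deposition transcript from the witness interview.",
]

# A small, CPU-friendly embedding model (an assumption, not a recommendation).
model = SentenceTransformer("all-MiniLM-L6-v2")

# Each document becomes a fixed-length vector; normalized so dot product = cosine similarity.
doc_vecs = model.encode(docs, normalize_embeddings=True)

# A question gets embedded the same way, then matched against the cache.
query_vec = model.encode(["Who filed financial disclosures?"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T
print(docs[int(np.argmax(scores))])  # the best match is what gets handed to the LLM
```

A vector database does the same similarity search, just at scale and with indexing, which is why the GPU question matters once the document piles get large.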
The delay has to do with optimizing the solution. Adding a GPU to one of our existing servers means replacing a 1U (1.75” tall) system with a 2U (3.5” tall) machine. This doubles the monthly cost from $100 to $200. HOWEVER … that $100/month would pay for a lot of time on a potent remote system. I haven’t said a lot about it, but my 2012 vintage HP Z420 workstation has been showing signs of mainboard trouble for more than a year. I pulled its twin out of storage for Qubes experiments, then put it back when I moved.
At the moment, I think the plan is to retire my desktop, replacing it with a Dell Precision 7740 with an Nvidia RTX 5000A. The Dell Precision 7730 I got in January has the CPU I need, but the Nvidia Quadro P4200 GPU is the equivalent of an Nvidia GTX 1080, which is just a tiny bit better than the GTX 1060 in my workstation now. The 7730 will resume its Qubes duties when this happens.
Here’s a bit more on remote GPUs and their economics. I just started following Matt Williams, and his content is good: bite-sized expositions on precisely the problems I’m trying to solve.
Translation To Plain English:
I’m trading in a twelve core Xeon E5-2695v2 with a Passmark of 13264 for a six core Xeon E-2176M with a Passmark of just 10878. This is still a move forward, because the 6GB GTX 1060 GPU with a Passmark of 10084 gets replaced by a 16GB RTX 5000A with a Passmark of 14832. The key changes here are the tremendous boost in working memory for the GPU and gaining dedicated tensor cores, which the GTX 1060 lacks entirely. Passmark is a one dimensional, general purpose benchmark that fails to capture the nuance of the situation.
Around the turn of the century CPUs started offering more than one core, and hyperthreading makes each core appear as two in terms of compute capability. The twelve cores in my workstation, with a single core score of 1636, feel much faster than the laptop’s six cores at 2470. My systems are typically doing many things at once, so the hit in aggregate performance makes sense when you consider the workload.
Intel CPUs have SIMD (Single Instruction Multiple Data) support: each core has AVX (Advanced Vector Extensions). This is a broader sort of parallelism than the cores themselves; AVX can perform the same instruction on eight single precision (32 bit) floating point numbers at once, and AVX2 extends the same wide operations to integers. AI is based on lots of matrix math, which is why hardware with these extensions is described as an array or vector processor. A vector is a one dimensional array; a matrix is two dimensional, a bit like the rows and columns of a spreadsheet.
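NumPy isn’t AVX assembly, but its compiled loops dispatch to SSE/AVX where the build and CPU support them, so a one-line vectorized operation is a reasonable stand-in for “the same instruction applied to eight floats at once.” A minimal sketch:

```python
import numpy as np

# Eight single precision floats: one 256-bit AVX register's worth (8 x 32 bits).
a = np.arange(8, dtype=np.float32)
b = np.full(8, 2.0, dtype=np.float32)

# One call, not a Python loop; NumPy's compiled kernels use SIMD instructions
# (SSE/AVX) where available instead of processing one element at a time.
c = a * b + 1.0
print(c)  # [ 1.  3.  5.  7.  9. 11. 13. 15.]
```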
The GPU takes this parallelism even further. Dedicated tensor cores arrived in consumer desktop cards with the RTX 20x0 PCIe products; the implication here being the GTX 10x0 GPUs do not even HAVE this feature. I’ve spent some time digging on eBay. The least expensive 8GB RTX 3000 machines are over $600, while best of breed 16GB RTX 5000 equipped machines are just $900.
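For a sense of what the tensor cores actually do, here is a short PyTorch timing sketch, assuming a machine with CUDA-capable PyTorch installed. On an RTX-class GPU, cuBLAS routes the half precision matrix multiply to the tensor cores; on a GTX 10x0 the same call falls back to the ordinary CUDA cores.

```python
import time
import torch

# Assumes PyTorch built with CUDA and an Nvidia GPU present.
n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

torch.cuda.synchronize()   # make sure setup work is finished before timing
t0 = time.time()
c = a @ b                  # FP16 matmul: tensor cores on Turing and newer, CUDA cores otherwise
torch.cuda.synchronize()   # wait for the GPU to actually finish
print(f"{n}x{n} FP16 matmul: {time.time() - t0:.4f} s")
```

Running the same script on both machines is the honest way to check the speedup claims below, rather than leaning on any single benchmark number.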
The Passmark GPU rating is a one dimensional, consumer/gaming oriented number. The thorough comparisons of 3rd and 4th generation tensor cores are focused on the high end datacenter products. The academic/hobbyist AI world uses desktop machines in order to be able to deploy multiple GPUs in the same chassis. So I could do a lot of hand waving here, or I can just summarize:
The 304 tensor cores in a Dell Precision 7740 RTX5000 GPU should be around 100x faster than the software only solution in the Dell Precision 7730 P4200 GPU I have today.
Conclusion:
As usual, I’m exploring optimization in a weird corner of the overall user’s envelope. As usual, I’m trying to solve more than one problem simultaneously. As usual, I’ve found a product meant to do one thing, but I’m going to use it in a completely different fashion than the designers envisioned.
I bet about five of you read this far. Thanks for the assist in getting the gear I need to do these things.