Introduction
In our new office at Pattern, I have 42U of server rack space to play with, so I want to get an AMD EPYC server for some long-running bioinformatics tasks. EPYC Genoa looks like the sweet spot for price to performance, but which of the 24 SKUs is the best for typical bioinformatics workloads? Obviously, more cores and more frequency is more better, but are there additional factors to consider?
Specifically, I’m interested in comparing the 9654 and 9684X CPUs. Both are 96-core, 192-thread monsters that can boost up to 3.7 GHz, but the 9684X has over a gigabyte of L3 cache, three times that of the 9654. That’s AMD 3D V-Cache, which became famous through its use in gaming desktop CPUs and has now made its way to the server market. 3D V-Cache is also supposed to help certain productivity workloads, but there aren’t many benchmarks that cover bioinformatics specifically. The only mention I could find was this post on Mark Ziemann’s blog.
In this post, I benchmark a few common bioinformatics tools with the AMD 7950X3D processor, which has both 3D V-Cache and normal cores. In the end, I’m surprised to find little to no increase in performance when running on the 3D V-Cache cores, at least for the algorithms I tested.
Methods
- Processor: AMD Ryzen 9 7950X3D: 16-core / 32-thread. 2 × 8‑core Core Complex Dies (CCDs). One CCD has 3D V-Cache. 128 MB L3 cache total, split 96/32 across the different CCDs.
- BIOS setup: For the V-Cache test, I disabled the non‑V-Cache CCD in BIOS. The reverse was done for the non V-Cache test.
- Operating system: Ubuntu 22.04.
- Other hardware: 2TB M.2 SSD, 96GB RAM. Memory overclocking and Precision Boost Overdrive (PBO) were disabled for this test.
- The V-Cache CCD boosts up to ~4.8 GHz under load, but the non V-Cache CCD can reach ~5.8 GHz. To control for frequency, I ran a third test locking the non‑V-Cache CCD at 4.8 GHz via the `cpupower` command.
- Bioinformatics tools: We do a lot of short and long-read alignment, so I used `minimap2`, `STAR`, and a full run of the nf-core/RNAseq pipeline, all with real-world data from one sample.
- Measurement: Wall time for the completion of the single command or entire pipeline. Average of 3 replicates reported for each test. The replicates of each test were surprisingly tight, within a few seconds of each other. The same datasets and commands were used for each test. As many non-essential background processes as possible were closed during the test.
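The measurement procedure described above (wall time of a whole command, averaged over 3 replicates) can be sketched roughly as follows. This is a minimal illustration, not the script used for these benchmarks: the function name `time_command` is hypothetical, and the command shown is a harmless placeholder standing in for a real `minimap2`, `STAR`, or pipeline invocation.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, replicates=3):
    """Run `cmd` `replicates` times; return the wall time of each run in seconds."""
    wall_times = []
    for _ in range(replicates):
        start = time.perf_counter()
        # check=True raises if the command fails, so a failed run is never timed as a success.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        wall_times.append(time.perf_counter() - start)
    return wall_times


# Placeholder command (assumption): replace with the real tool invocation.
placeholder_cmd = [sys.executable, "-c", "pass"]

runs = time_command(placeholder_cmd)
print(f"mean wall time: {statistics.mean(runs):.2f} s")
print(f"spread (max - min): {max(runs) - min(runs):.2f} s")
```

Reporting the spread alongside the mean is a cheap way to confirm that replicates are as tight as expected before comparing configurations.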
Results
Processor section | STAR (s) | minimap2 (s) | nf-core/RNAseq (m) |
---|---|---|---|
V-Cache CCD 4.8 GHz | 368 | 493 | 60.1 |
Non V-Cache CCD 5.8 GHz | 354 | 427 | 54.2 |
Non V-Cache CCD 4.8 GHz | 384 | 469 | 63.2 |
V-Cache improvement compared to 5.8 GHz | -3.8% | -15.5% | -10.9% |
V-Cache improvement compared to 4.8 GHz | 4.2% | -5.1% | 4.9% |
These results were quite interesting. With frequency matched at 4.8 GHz, the V-Cache CCD was about 4-5% faster for STAR and the nf-core/RNAseq pipeline, but about 5% slower for minimap2. When the non V-Cache CCD was allowed to boost to its full 5.8 GHz, it won every test.
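For readers who want to recompute the improvement rows, here is a small sketch. The table does not state the formula, so this assumes improvement is defined as the runtime saved by the V-Cache CCD as a percentage of the baseline runtime; the names `RESULTS`, `improvement_pct`, and `compare` are illustrative. Because the inputs are the rounded means from the table, a recomputed cell can differ slightly from the published one: the STAR versus 5.8 GHz cell comes out near -4.0% rather than -3.8%, which may reflect rounding of the means.

```python
# Mean wall times copied from the results table
# (STAR and minimap2 in seconds, nf-core/RNAseq in minutes).
RESULTS = {
    "V-Cache CCD 4.8 GHz": {"STAR": 368, "minimap2": 493, "nf-core/RNAseq": 60.1},
    "Non V-Cache CCD 5.8 GHz": {"STAR": 354, "minimap2": 427, "nf-core/RNAseq": 54.2},
    "Non V-Cache CCD 4.8 GHz": {"STAR": 384, "minimap2": 469, "nf-core/RNAseq": 63.2},
}


def improvement_pct(baseline, vcache):
    """Assumed definition: runtime saved by V-Cache as a percentage of the baseline."""
    return (baseline - vcache) / baseline * 100


def compare(baseline_name, vcache_name="V-Cache CCD 4.8 GHz"):
    """Return the per-tool improvement (in %, one decimal) of V-Cache over a baseline."""
    vcache = RESULTS[vcache_name]
    return {
        tool: round(improvement_pct(baseline_time, vcache[tool]), 1)
        for tool, baseline_time in RESULTS[baseline_name].items()
    }


for name in ("Non V-Cache CCD 5.8 GHz", "Non V-Cache CCD 4.8 GHz"):
    print(name, compare(name))
```

A positive value means the V-Cache CCD finished sooner than the baseline; a negative value means it was slower.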
Conclusions
For alignment-based bioinformatics tasks, a processor with 3D V-Cache may gain a small, task-dependent improvement in runtime. These improvements were completely negated by a higher-frequency processor. This is nowhere near the halving of runtime seen with computational fluid dynamics and other workloads.
Buying the more expensive and higher-powered EPYC 9684X likely isn’t worth it for my use case. I need to learn more about how these algorithms take advantage of different CPU cache levels in order to attempt to explain these results. Additional investigation with AMD μProf might be helpful.
These results are significantly more modest than what was reported at Genome Spot, although that post looked at Intel processors.
Limitations
These results could differ for other bioinformatics tasks, like variant calling. Additionally, I attempted to simulate the performance difference of two separate server processors by using different CCDs on my desktop processor. This method could give different results than separate server CPUs.