In direct continuation of yesterday's article about easily causing segmentation faults on AMD Zen CPUs, I have carried out another battery of tests over 24 hours and have more to report today on how trivially Ryzen CPUs can be made to hit segmentation faults and, in some cases, system lock-ups.
If you didn't read yesterday's article, be sure to do so for more background information, my initial steps for causing segmentation faults on Ryzen, and my testing of some theories about the faults. So far I haven't been able to find a workaround that completely avoids the problem. This article is mostly about further attempts to nail down the issue, to no avail, as well as showing other areas where these segmentation faults can be reproduced on Ryzen.
For this article, all of my testing was done using the Phoronix Test Suite's stress-run functionality. As explained in yesterday's article, the stress-run command within the Phoronix Test Suite has been used by enterprise customers for stress testing / burn-ins of hardware and checking for stability. Rather than benchmarking for performance, stress-run executes multiple test profiles in parallel to fully load the system with whatever workloads you would like. Running PTS_CONCURRENT_TEST_RUNS=4 TOTAL_LOOP_TIME=60 phoronix-test-suite stress-run build-linux-kernel build-php build-apache build-imagemagick will have the Phoronix Test Suite continually running four different benchmarks simultaneously for a period of 60 minutes. As soon as one test finishes, another is fired up. The stress-run algorithm randomly picks which tests from your set to run, but it does look at the test profiles: if the tests stress multiple subsystems, it tries to keep all of those subsystems under load at all times. The Phoronix Test Suite's stress-run functionality isn't advertised as much as its other features, but it is very useful for loading up a system with plenty of real-world workloads concurrently. This has been my main means of reproducing the Ryzen bug.
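The invocation described above can be wrapped in a small script so the concurrency, duration, and test list are easy to change. This is only a sketch: the two environment variables, the stress-run subcommand, and the test profile names come straight from the command in this article, while DRY_RUN is a hypothetical switch added for this example (it is not a Phoronix Test Suite option) so the script prints the command by default instead of loading the machine.

```shell
#!/bin/sh
# Sketch: assemble the stress-run command line quoted in the article.
# DRY_RUN is a hypothetical variable for this example only, not a PTS option.

# Values taken from the article's command.
PTS_CONCURRENT_TEST_RUNS=4   # how many tests run at the same time
TOTAL_LOOP_TIME=60           # total stress duration, in minutes
TESTS="build-linux-kernel build-php build-apache build-imagemagick"

stress_cmd="PTS_CONCURRENT_TEST_RUNS=$PTS_CONCURRENT_TEST_RUNS TOTAL_LOOP_TIME=$TOTAL_LOOP_TIME phoronix-test-suite stress-run $TESTS"

if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only show what would be executed.
    echo "$stress_cmd"
else
    export PTS_CONCURRENT_TEST_RUNS TOTAL_LOOP_TIME
    # $TESTS is intentionally unquoted so each profile is its own argument.
    phoronix-test-suite stress-run $TESTS
fi
```

Running the script with DRY_RUN=0 set in the environment starts the actual one-hour stress run, assuming the Phoronix Test Suite and the listed test profiles are installed.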
Many of my tests were done on the Ryzen 7 1800X box with an MSI X370 XPOWER TITANIUM GAMING motherboard, Linux 4.13, and Corsair DDR4-3200 memory. But I also ran the Phoronix Test Suite stress-run command on my other main Ryzen testing box: the Ryzen 7 1700 (stock speeds, always) with an MSI B350 TOMAHAWK motherboard, 2 x 8GB Corsair DDR4-3000MHz memory, a 120GB Samsung 840 SATA 3.0 SSD, and a Radeon HD 7750.
Sure enough, with this completely different system, with all different hardware components, segmentation faults were happening. For all the tests in this article I ran the stress-run process for one hour. In that hour of stressing the Ryzen 7 1700 with parallel compilation workloads, 53 segmentation faults were recorded, the first about two minutes after the system booted.
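The article does not say how the segmentation faults were tallied. One possible way, assuming they show up in the kernel log as the usual x86 "segfault at" messages, is to count the matching lines. The count_segfaults helper below is a hypothetical name for this sketch, and the kernel ring buffer can wrap during a long run, so the count is a lower bound at best.

```shell
#!/bin/sh
# count_segfaults (hypothetical helper): count log lines on stdin that
# report a user-space segmentation fault. Assumes the kernel is logging
# unhandled SIGSEGVs in its usual "name[pid]: segfault at ..." form.
count_segfaults() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when there are none, so "|| true" keeps the function's status at 0.
    grep -c 'segfault at' || true
}

# Example usage (reading the kernel log may require root):
#   dmesg | count_segfaults
```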