As others have pointed out, it depends. The most I would need to do is identify the faulty stick and pull it. That’s pretty trivial to do and I would be back up and running within the hour.
The only things that can’t fail without taking the system down are the motherboard and the power-handling equipment. The power supply is redundant, but the output of the two supplies has to be combined before it reaches the system components, so there’s a small PCB that connects to both power supplies and distributes power to everything in the system.
Of course the motherboard connects everything together.
My GPU is also unique in the system, but I have less powerful spares lying around that I’ve stopped using, so I can literally just grab a different GPU and be back up and running at nearly the same capacity, minus some gaming graphics performance…
On top of all that, I upgraded to my current Dell Precision from a different Dell Precision. It’s an older model and definitely showing its age, but it still works, so it’s available to me in a pinch.
Basically, I have backups for my backups when it comes to hardware. I have a handful of spares for my SSD array disks too…
Worst case, I hook up my Framework laptop as a stand-in for my desktop until a replacement part ships to me.
Are you talking ECC RAM? Because normal RAM sticks tend to fail in ways like “after 2 years of use, this random page of memory now reports the 17th bit of every word as 1, no matter what value you actually write to it”. If you have faulty RAM like that, RAID won’t save you. Copy a 1 GB file from ~/a to ~/b: it’s read into RAM, 100 bits get flipped, the RAID system takes what’s in RAM and commits it to disk, and congrats: your RAID array will reliably preserve that corrupted file for 100 years. Worse is when the bad bits aren’t file data but internal filesystem structures, and then you just lose half your directory tree.
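To make the “RAID won’t save you” point concrete, here’s a toy Python model (not real RAID or real RAM behavior, just an illustration under assumed parameters): a 32-bit word size, bit 17 counted from 0, and a few hypothetical “bad” words whose bit 17 is stuck at 1. The data passes through the faulty RAM first, and a simple two-copy mirror then faithfully writes whatever RAM hands it.

```python
import hashlib

WORD_BITS = 32   # assumed word size for this toy model
STUCK_BIT = 17   # the "17th bit" from the comment, counted from 0 here (assumption)


def faulty_ram_roundtrip(data: bytes, bad_words: set[int]) -> bytes:
    """Pass data through a toy RAM where the listed words have one bit stuck at 1."""
    out = bytearray(data)
    word_bytes = WORD_BITS // 8
    for w in bad_words:
        start = w * word_bytes
        if start + word_bytes > len(out):
            continue  # bad word lies outside this buffer
        word = int.from_bytes(out[start:start + word_bytes], "little")
        word |= 1 << STUCK_BIT  # stuck-at-1: reads back as 1 whatever was written
        out[start:start + word_bytes] = word.to_bytes(word_bytes, "little")
    return bytes(out)


def raid1_write(data: bytes) -> list[bytes]:
    """Toy mirror: write two identical copies of whatever is in RAM."""
    return [bytes(data), bytes(data)]


# All-zero 4 KiB buffer as a stand-in for the file, so every stuck bit shows up.
source = bytes(4096)
in_ram = faulty_ram_roundtrip(source, bad_words={3, 100, 700})
disk_a, disk_b = raid1_write(in_ram)

print(disk_a == disk_b)  # True: the mirror is perfectly consistent
print(disk_a == source)  # False: but both copies hold the corrupted data
print(hashlib.sha256(source).digest() == hashlib.sha256(disk_a).digest())  # False
```

The mirror agrees with itself, so nothing at the RAID layer looks wrong; only a checksum taken before the data went through the bad RAM (or ECC catching the error in RAM) would flag it.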
So yeah, get ECC RAM. Or if you don’t for some reason, then when your bulletproof storage system starts reporting filesystem-level corruption 2 years from now, remember this comment and run memtest before spending hundreds of dollars replacing all the drives, motherboard, etc.
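For a rough idea of how a memory tester finds a stuck bit, here’s a toy “walking ones” check in Python. This is a hypothetical sketch against a simulated RAM model (the `ToyRAM` class and its stuck-bit map are made up for illustration); real memtest runs several different test patterns directly on physical memory, and this is not its actual code.

```python
WORD_BITS = 32  # assumed word size for the toy model


class ToyRAM:
    """Hypothetical RAM model: some words have one bit stuck at 1."""

    def __init__(self, n_words: int, stuck: dict[int, int]):
        self.cells = [0] * n_words
        self.stuck = stuck  # word address -> stuck bit position

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & ((1 << WORD_BITS) - 1)

    def read(self, addr: int) -> int:
        value = self.cells[addr]
        if addr in self.stuck:
            value |= 1 << self.stuck[addr]  # the faulty bit always reads as 1
        return value


def walking_ones_test(ram: ToyRAM) -> list[tuple[int, int, int]]:
    """Write a single set bit to each position of every word and verify the readback."""
    errors = []
    for addr in range(len(ram.cells)):
        for bit in range(WORD_BITS):
            pattern = 1 << bit
            ram.write(addr, pattern)
            got = ram.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))  # (address, expected, actual)
    return errors


ram = ToyRAM(n_words=1024, stuck={42: 17})
errors = walking_ones_test(ram)
bad_addrs = sorted({addr for addr, _, _ in errors})
print(bad_addrs)    # [42]
print(len(errors))  # 31: every pattern except the one that sets bit 17 itself
```

A stuck-at-1 bit only shows up when you write a 0 to it, which is why the test cycles through patterns rather than writing one value and reading it back.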
My system is a Dell Precision Rack 7910 with dual Xeon E5 processors (I forget the exact model off the top of my head) and yes, 64 GB of ECC memory. DDR4, IIRC.
I’m telling you, the system is rock solid. I got it refurbished, so it was put through its paces before I got it. Any defective parts will have already failed and been replaced.
Backups of backups.