• MystikIncarnate@lemmy.ca
      15 days ago

      As others have pointed out, it depends. The most I would need to do is identify the faulty stick and pull it. That’s pretty trivial to do and I would be back up and running within the hour.

      The only things that can’t fail without taking the system down are the motherboard and the power-handling equipment. The power supplies are redundant, but their output has to be combined before it reaches the system components, so there’s a small PCB that connects to both power supplies and feeds everything in the system.

      Of course the motherboard connects everything together.

      My GPU is also unique in the system, but I have less powerful spares lying around that I have stopped using, so I can literally just grab a different GPU and be back up and running at nearly the same capacity, minus the gaming graphics performance that’s lost…

      On top of all of that, I upgraded to my current Dell Precision from a different Dell Precision. It’s an older model and definitely showing its age, but it still works. So that’s available to me in a pinch.

      Basically, I have backups for my backups when it comes to hardware. I have a handful of spares for my SSD array disks too…

      Worst case, I hook up my Framework laptop as a stand-in for my desktop until a replacement part ships to me.

      Backups of backups.

      • colin@lemmy.uninsane.org
        15 days ago

        are you talking ECC RAM? because normal RAM sticks tend to fail in ways like “after 2 years of use this random page of memory is now reporting the 17th bit of every word as 1, no matter what value you actually write to it”. if you have faulty RAM like that, RAID won’t save you; copy a 1 GB file from ~/a to ~/b: it’s read into RAM, 100 bits are flipped, the RAID system takes what’s in RAM and commits it to disk, congrats: your RAID array will reliably preserve that corrupted file for 100 years. worse is when the bad bits aren’t file data but internal filesystem structures, and then you just lose half your directory tree.
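        a rough sketch of that failure mode, as a pure simulation (it does not touch real memory, and it is not how memtest itself is implemented). the `FaultyMemory` class, the 32-bit word size, and the helper names are all made up for illustration, and the fault is applied to every word rather than to one page to keep it short:

```python
# Simulation only: models a stuck-at-1 bit, does not test real hardware.
WORD_BITS = 32  # assumed word size for the illustration
WORD_MASK = (1 << WORD_BITS) - 1


class FaultyMemory:
    """Hypothetical word-addressed memory where one bit can be stuck at 1."""

    def __init__(self, size_words, stuck_bit=None):
        self.cells = [0] * size_words
        # Mask with the stuck bit set, or 0 for a healthy module.
        self.stuck_mask = 0 if stuck_bit is None else (1 << stuck_bit)

    def write(self, addr, value):
        self.cells[addr] = value & WORD_MASK

    def read(self, addr):
        # A stuck-at-1 bit reads back as 1 no matter what was written.
        return self.cells[addr] | self.stuck_mask


def find_stuck_bits(mem, size_words):
    """Write all-zeros and all-ones; return (stuck_at_1, stuck_at_0) bit masks."""
    stuck_at_1 = 0
    stuck_at_0 = 0
    for addr in range(size_words):
        mem.write(addr, 0)
        stuck_at_1 |= mem.read(addr)  # any 1 here was never written
        mem.write(addr, WORD_MASK)
        stuck_at_0 |= ~mem.read(addr) & WORD_MASK  # any 0 here was lost
    return stuck_at_1, stuck_at_0


def copy_through(mem, words):
    """Pass data through the memory, the way a file copy passes through RAM."""
    for addr, word in enumerate(words):
        mem.write(addr, word)
    return [mem.read(addr) for addr in range(len(words))]


if __name__ == "__main__":
    bad = FaultyMemory(8, stuck_bit=16)  # the "17th bit", counting from 1
    print(find_stuck_bits(bad, 8))
    print(copy_through(bad, [0, 1, WORD_MASK]))
```

        the point of `copy_through` is that whatever sits downstream (RAID, the filesystem) only ever sees the already-corrupted words, so it has nothing to compare them against.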

        so yeah, get ECC RAM. or if you don’t for some reason, then when your bulletproof storage system starts reporting filesystem-level corruption 2 years from now, remember this comment and run memtest before spending hundreds of dollars replacing all the drives, motherboard, etc.

        • MystikIncarnate@lemmy.ca
          15 days ago

          My system is a Dell Precision Rack 7910, with dual Xeon E5 processors (I forget the exact model off the top of my head) and yes, 64 GB of ECC memory. DDR4, IIRC.

          I’m telling you, the system is rock solid. I got it refurbished, so it was put through its paces before I got it. Any defective parts will have already failed and been replaced.

    • zeca@lemmy.ml
      16 days ago

      My experience of a RAM stick failing is that my system would crash whenever RAM usage exceeded the capacity of the first stick. I had 2x 8 GB sticks, and whenever usage got above 8 GB, the system would crash. Took me a while to figure out what was wrong.

      • unexposedhazard@discuss.tchncs.de
        16 days ago

        That kinda sounds like you didn’t seat your RAM correctly. If it’s configured in dual channel, then it should utilize both sticks equally, afaik. I might be totally wrong here, though.
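        To illustrate the difference, here is a toy model of the two address layouts. It is only a sketch: the 64-byte interleave granularity and the function names are assumptions for the example, and real memory controllers map addresses in more complicated, platform-specific ways.

```python
GIB = 1024 ** 3
STICK_BYTES = 8 * GIB   # two 8 GB sticks, as in the parent comment
INTERLEAVE_BYTES = 64   # assumed granularity; real controllers vary


def stick_linear(addr):
    """Non-interleaved layout: stick 0 fills completely before stick 1 is used."""
    return addr // STICK_BYTES


def stick_interleaved(addr):
    """Interleaved layout: consecutive 64-byte blocks alternate between sticks."""
    return (addr // INTERLEAVE_BYTES) % 2


def sticks_used(mapping, addresses):
    """Return the set of stick indices that the given addresses land on."""
    return {mapping(addr) for addr in addresses}


if __name__ == "__main__":
    first_kib = range(0, 1024, INTERLEAVE_BYTES)
    print(sticks_used(stick_linear, first_kib))
    print(sticks_used(stick_interleaved, first_kib))
```

        In this model, a crash that only shows up past 8 GB of usage matches the linear layout, where the second stick stays untouched until the first is full. With interleaving, even the first kilobyte lands on both sticks.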

        • zeca@lemmy.ml
          16 days ago

          You might be right about the dual channel thing. It’s been a while, so I may be remembering wrong.