Waccoon

SSD woes: bit rot?

I recently needed to upgrade the hard drive on my primary workstation, so I was doing some research into new drives and the latest reliability rankings.  During this time, I saw a YouTube video about the longevity of SSDs and hard drives.  I've known for a while that unlike a hard drive, an SSD will lose its data if powered off for too long.  Typically people agree the limit is "a few years", though there's no hard rule about this.

It's been about a year and a half since I powered up my old Windows XP legacy system, which uses a Samsung 840 Evo.  I ran a short SMART test and it passed just fine, no surprises there, but then I did a surface test.

15 MB per second.

I was shocked.  According to my toolkit, Hard Disk Sentinel, most of the data areas of the drive were super slow, but empty areas were getting the full, consistent 250 MB/s as they should.  I immediately got the impression that the SSD was suffering from weak cells, i.e. "bit rot", so any cells that haven't been written in a while weren't reading correctly.
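
For anyone who wants a rough read-speed map without a dedicated tool, one option is to time sequential reads chunk by chunk.  The Python sketch below is only an illustration and is not how HD Sentinel works internally: `scan_read_speed`, `slow_regions`, the 8 MB chunk size, and the 50 MB/s threshold are all made-up names and values.  Keep in mind that the OS file cache can make recently read data look much faster than the drive really is.

```python
import time


def scan_read_speed(path, chunk_size=8 * 1024 * 1024):
    """Read `path` sequentially; return a list of (offset, bytes_read, seconds)."""
    results = []
    # buffering=0 avoids Python-level buffering (the OS cache still applies).
    with open(path, "rb", buffering=0) as f:
        offset = 0
        while True:
            start = time.perf_counter()
            data = f.read(chunk_size)
            elapsed = time.perf_counter() - start
            if not data:
                break
            results.append((offset, len(data), elapsed))
            offset += len(data)
    return results


def slow_regions(results, threshold_mb_s=50.0):
    """Return (offset, MB/s) for every chunk that read slower than the threshold."""
    slow = []
    for offset, nbytes, seconds in results:
        # Treat an unmeasurably short read as infinitely fast.
        mb_s = nbytes / (1024 * 1024) / seconds if seconds > 0 else float("inf")
        if mb_s < threshold_mb_s:
            slow.append((offset, mb_s))
    return slow
```

Running `slow_regions(scan_read_speed(some_big_old_file))` on a large file that hasn't been touched in years should list the offsets that read slowly, if the drive has the problem described above.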

I did multiple surface tests back-to-back, and was under the assumption the drive would realize there is a problem and fix it automatically.  That made no difference.  Over the course of a few days of tests, performance never improved at all in any data areas.  Samsung Magician (their SSD test/performance suite) supports a number of performance tests, optimizations, and forced TRIM operations... but nothing worked.  I decided to force the issue by doing a manual refresh.

Conveniently, the WinXP version of defrag gives a visual representation of where fragmentation is located on the drive, so you can compare where the used data areas are located and to where they are moved.  Sure enough, after I did a full defrag with compaction, the performance of the drive in the "moved" areas was right back to full speed, but most of the data at the beginning of the drive was still super slow.  I then told HD Sentinel to do a full surface refresh, which reads and writes-back the entire "surface" of the drive.  The whole drive, from start to finish, is now running at full speed.
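
The idea behind a refresh, reading data and writing the same bytes back, can be sketched in a few lines of Python.  This is a minimal illustration on a single file, assuming nothing about how HD Sentinel actually does it; `refresh_in_place` and the 1 MB block size are hypothetical.  The filesystem and the SSD controller decide where the rewritten data physically lands, so treat this as a sketch of the concept rather than a guaranteed fix, and have a backup before trying it on anything important.

```python
import os


def refresh_in_place(path, block_size=1024 * 1024):
    """Read every block of `path` and write the same bytes back; return bytes rewritten."""
    total = 0
    with open(path, "r+b") as f:
        while True:
            pos = f.tell()
            block = f.read(block_size)
            if not block:
                break
            f.seek(pos)      # go back to where this block started
            f.write(block)   # write the identical bytes back
            total += len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the writes to the drive
    return total
```

The file contents are unchanged afterwards; only the act of rewriting matters here.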

Before: 840 Evo initial surface test
After: 840 Evo, after full surface refresh

Now for the kicker: I decided to repeat the performance test on the primary SSD of my workstation, which is about 9 years old, but is powered on every day.  The numbers were not quite so bad, but I absolutely saw the same performance degradation across old data areas that are probably read regularly but never rewritten.  I did a defrag, and sure enough, the numbers improved in certain areas of the drive that were compacted.  I was wondering why my machine was feeling sluggish over the last year, given that I still run Win7 and never install updates or any of that stuff.

Given the age of the SSD, it's clear the drive is performing some kind of maintenance to keep cells from degrading, but even when powered on regularly, it's not perfect, so a manual refresh every few years might be a good idea.  According to the wear-leveling indicator, the drive is 94% healthy and shows 8.2 TB of lifetime writes, so there's plenty of headroom for full rewrites.
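
As a back-of-the-envelope check on that headroom, here is a small Python sketch.  It uses the two numbers above (94% health, 8.2 TB written) and makes two loud assumptions that are not from the drive: that the health indicator drops linearly with data written, and a hypothetical capacity of 250 GB, since the real capacity isn't stated here.  Write amplification is ignored, so the result is optimistic.

```python
def rewrite_headroom(health_fraction, written_tb, capacity_tb):
    """Estimate (implied_endurance_tb, remaining_tb, full_rewrites_left).

    Assumes the health indicator falls linearly with data written.
    """
    worn = 1.0 - health_fraction
    if worn <= 0:
        raise ValueError("no wear recorded yet, cannot extrapolate")
    endurance_tb = written_tb / worn          # total writes implied at 0% health
    remaining_tb = endurance_tb - written_tb  # writes left before 0% health
    return endurance_tb, remaining_tb, int(remaining_tb // capacity_tb)


# 94% health and 8.2 TB written are from the post; 0.25 TB capacity is an assumption.
endurance, remaining, rewrites = rewrite_headroom(0.94, 8.2, 0.25)
print(f"implied endurance: {endurance:.1f} TB")
print(f"remaining writes:  {remaining:.1f} TB")
print(f"full rewrites left (assumed 250 GB drive): {rewrites}")
```

Under those assumptions the implied endurance is roughly 137 TB, which leaves room for hundreds of full rewrites, so one refresh every few years barely registers.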

I'm a bit concerned about this.  I have a collection of old computers from the 80's and 90's that all still work perfectly from their original hard drives.  It's disappointing to think that keeping a modern computer powered off for 2-3 years might be enough to kill it and lose all your data.  No wonder I have handfuls of dead USB thumb drives, and my older PCs are suffering from corrupt BIOS images.  Anything based on flash memory probably needs to be refreshed every now and then or it will die.  It wouldn't surprise me if modern cars suffer the same issue (Tesla has already had problems with dying SSDs in their older cars).

On another note, I did a service on someone else's RAID array, where he had 22 drives and only 1 parity (don't get me started).  After addressing a failed drive, 3 other drives about to fail, and adding a second parity, I figured I'd look at the contact points of each hard drive for corrosion, which is a known issue with modern hard drives.  Practically every drive had massive corrosion, just like in this picture.  Needless to say, I spent quite some time cleaning drives with a plastic eraser and contact cleaner.  What a mess!  I also examined one of my new-old-stock hard drives, which has been in storage for 10 years and never used.  It is brand new, fully sealed, and had those moisture beads in the packaging and everything.  When I disassembled the drive, sure enough, the contact points were all corroded.  The drive worked, but it doesn't inspire confidence that any modern technology is really capable of surviving for more than 10 years, no matter how carefully it's been stored and how little it's been used.  All my Amigas look as new as when I bought them, with no corrosion anywhere in sight.  Hell, there isn't even any yellowing on the plastics.

Anyway, just food for thought.  If you found this interesting, in a few days I plan to give a bit of a summary regarding the history of Backing Out and why their world seems to be stuck in the early 1980's despite taking place thousands of years in the future.  Even decades ago, I knew that the preservation of technology and long-term archiving of information would be a major issue.  All of the human technology they "dug up" was effectively broken and corroded beyond understanding... except the stuff predating the 80's.  You can always count on low-density magnetic media, like floppy disks and reel tape, to be reliable.  8)
Added: 1 month, 3 weeks ago
 
Eviscerator
1 month, 3 weeks ago
Cassette Futurism is the True Futurism.
ThaPig
1 month, 3 weeks ago
After reading this, I pulled a floppy I had in a drawer for about 30 years and tested it. It still works.
( ᐢ (oo) ᐢ )
Waccoon
1 month, 3 weeks ago
Virtually every floppy disk in my collection still works perfectly.  Despite their reputation, they are in fact super reliable!

The only exceptions are the late-model disks made in the early 2000's.  Towards the end of the floppy era, quality went off a cliff, and almost everything made towards the end is total crap.  I've got a few boxes of NOS promotional floppies I got from a company clearing house, and right out of the box, about 25% are defective and can't be formatted.  If you're going to use a floppy disk, make sure it was manufactured in or before 1990.
foxboyprower
1 month, 3 weeks ago
It's pretty cool you have an SSD that old and can see it happening.
Waccoon
1 month, 3 weeks ago
Yeah, I jumped on the SSD bandwagon really early... and got seriously burned with those early models.  I still have nightmares about that OCZ drive that just froze my computer for minutes at a time.  The Corsair drive that replaced it was better, but not by much.

These Samsung drives are pretty good and holding up well, but the newer 870 Evo models apparently have some major firmware issues and will suddenly die for no reason.  Alas, the transition from hard drives to SSDs in this industry has not been very smooth!
foxboyprower
1 month, 3 weeks ago
I only learned about how they worked in my operating systems class.
FriskyWoods
1 month, 3 weeks ago
Well, fuck me! Does that mean all solid state memory? Like, SD cards and flash drives as well?
Waccoon
1 month, 3 weeks ago
Pretty much.  All flash media works on the same principle, but cell design does vary.

Some cell types are designed to store only one bit at a time (SLC), while others are designed to multiplex more than one bit per cell (MLC).  The SLC drives are expensive and can handle more lifetime writes before they wear out, but I'm not sure if they can retain their data longer.