Transcript DISK FAILURE - San Jose State University
Presenter: Namrata Buddhadev (104_224_13.4.1-13.4.4) Professor: Dr T Y Lin
Index
13.4 Disk Failures
13.4.1 Intermittent Failures
13.4.2 Checksums
13.4.3 Stable Storage
13.4.4 Error-Handling Capabilities of Stable Storage
Types of Errors
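Intermittent Error: An attempt to read or write a sector fails, but a repeated attempt may succeed.
Media Decay: One or more bits of a sector become permanently corrupted, so the sector can no longer be read correctly no matter how many times we try.
Write Failure: We attempt to write a sector but can neither write the new data successfully nor retrieve the previously written data, for example because power fails during the write.
Disk Crash: The entire disk suddenly becomes unreadable.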
Intermittent Failures
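An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller. The controller must therefore be able to tell whether a sector it has read is good or bad; if it is bad, the read can simply be retried. To check that a write was correct, the sector is read back after it is written, and the read operation tells us whether the sector is good or bad.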
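The retry-and-verify behavior described above can be sketched as a small Python simulation. This is a toy model, not a real disk driver: `FlakyDisk`, `read_with_retry`, and `write_and_verify` are hypothetical names, and the simulated disk fails a scripted number of reads per sector so that the behavior is reproducible. As in the text, it assumes that a read reports whether the sector came back good or bad.

```python
class FlakyDisk:
    """Hypothetical toy disk whose reads fail a scripted number of times."""

    def __init__(self, failures_before_success):
        # failures_before_success[sector] = number of bad reads to simulate
        self.data = {}
        self.pending_failures = dict(failures_before_success)

    def write(self, sector, value):
        self.data[sector] = value

    def read(self, sector):
        """Return (ok, value); ok is False when the read comes back bad."""
        remaining = self.pending_failures.get(sector, 0)
        if remaining > 0:
            self.pending_failures[sector] = remaining - 1
            return False, None
        return True, self.data.get(sector)


def read_with_retry(disk, sector, max_tries=5):
    """Retry a read until the disk reports a good sector or tries run out."""
    for _ in range(max_tries):
        ok, value = disk.read(sector)
        if ok:
            return value
    raise OSError(f"sector {sector} still bad after {max_tries} reads")


def write_and_verify(disk, sector, value, max_tries=5):
    """Write the sector, then read it back to confirm the write was correct."""
    for _ in range(max_tries):
        disk.write(sector, value)
        ok, read_back = disk.read(sector)
        if ok and read_back == value:
            return True
    return False


disk = FlakyDisk({7: 2})  # sector 7 fails its next two reads
disk.write(7, b"hello")
print(read_with_retry(disk, 7))  # b'hello' (the third read succeeds)
```

The retry limit of 5 is an arbitrary choice for the sketch; it only separates a transient failure from one that persists.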
Checksums
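Each sector has some additional bits, called the checksum, which are set depending on the values of the data bits stored in that sector. If the data bits and the checksum do not agree when the sector is read, the sector is known to be bad, so with a checksum it is unlikely that a bad sector is read without the error being noticed. The simplest checksum is a single parity bit:
If the data bits contain an odd number of 1’s, the parity bit is 1.
If the data bits contain an even number of 1’s, the parity bit is 0.
So the total number of 1’s, including the parity bit, is always even.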
Example:
1. Sequence 01101000 has three 1’s (odd), so the parity bit is 1 -> 011010001
2. Sequence 11101110 has six 1’s (even), so the parity bit is 0 -> 111011100
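A minimal Python sketch of the parity-bit rule, reproducing the two example sequences. The function names are illustrative, and bits are modeled as strings of '0'/'1' characters purely for readability.

```python
def parity_bit(bits: str) -> str:
    """Parity bit: '1' if the data has an odd number of 1's, else '0'."""
    return "1" if bits.count("1") % 2 == 1 else "0"


def with_parity(bits: str) -> str:
    """Append the parity bit so the total number of 1's is even."""
    return bits + parity_bit(bits)


def parity_ok(bits_with_parity: str) -> bool:
    """A stored sequence is consistent iff its total count of 1's is even."""
    return bits_with_parity.count("1") % 2 == 0


for seq in ["01101000", "11101110"]:
    print(seq, "->", with_parity(seq))
# 01101000 -> 011010001
# 11101110 -> 111011100
```

Flipping any single bit of a stored sequence, for example `111010001` instead of `011010001`, makes the count of 1’s odd, so `parity_ok` returns `False`.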
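Any one-bit error in reading or writing the bits and their parity bit results in a sequence of bits that has odd parity, so the error can be detected. (An error that flips an even number of bits goes unnoticed by a single parity bit.) Error detection can be improved by keeping several parity bits, for example eight: one for the first bit of every byte in the sector, one for the second bit of every byte, and so on up to the eighth bit. If a sector is badly corrupted, the probability is 50% that any one parity bit will detect the error, and the chance that none of the eight does so is only one in 2^8, or 1/256. In general, if n independent bits are used as the checksum, the probability of missing an error is only 1/2^n.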
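The eight-parity-bit scheme can be sketched by noting that XOR-ing all bytes of a sector together yields exactly one even-parity bit per bit position. This is a sketch under that observation; the helper names and the three-byte "sector" are hypothetical, and a real sector would be much larger. The last function only restates the 1/2^n arithmetic.

```python
from functools import reduce


def position_parity(sector: bytes) -> int:
    """Eight parity bits packed into one byte: bit i is the even-parity bit
    for bit position i across every byte (equivalently, XOR of all bytes)."""
    return reduce(lambda acc, b: acc ^ b, sector, 0)


def is_consistent(sector: bytes, stored_parity: int) -> bool:
    """True if the recomputed parity bits match the stored ones."""
    return position_parity(sector) == stored_parity


def miss_probability(n: int) -> float:
    """Chance that n independent parity bits all miss a massive random error,
    each missing it with probability 1/2."""
    return 0.5 ** n


sector = bytes([0b11101110, 0b01101000, 0b00000001])  # hypothetical tiny sector
stored = position_parity(sector)                       # 0b10000111
corrupted = bytes([sector[0] ^ 0b00000100]) + sector[1:]  # flip one bit
print(is_consistent(sector, stored), is_consistent(corrupted, stored))  # True False
print(miss_probability(8))  # 0.00390625, i.e. 1/256
```

A single-bit error anywhere in the sector flips exactly one of the eight checksum bits, so it is always caught; two errors in the same bit position of different bytes cancel out and go undetected.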
Stable Storage
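Stable storage is a policy for recovering from media decay and write failures. Without it, if we overwrite a file and the new data is not read correctly, we may lose both the old and the new data. Sectors are paired, and each pair represents one sector-contents X, with a left copy Xl and a right copy Xr.
Writing policy: write X into Xl and read it back, checking its parity (checksum) bits. If the value is bad, rewrite it; if it is still bad after several tries, assume Xl has suffered media decay and substitute a spare sector for Xl. Repeat until a good value is returned, then do the same for Xr.
Reading policy: read Xl; if its checksum still shows it is bad after several tries, read Xr instead.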
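The writing and reading policies can be sketched as a small Python simulation. Everything here is an assumption made for illustration: `Sector`, `StablePair`, the XOR checksum, the retry count of 3, and the way a decayed sector is modeled (it always stores a checksum that does not match its data).

```python
from functools import reduce


def checksum(data: bytes) -> int:
    """XOR of all bytes: one even-parity bit per bit position (assumed scheme)."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


class Sector:
    """Toy sector. A decayed sector models media decay: whatever is written
    to it reads back with a checksum that does not match the data."""

    def __init__(self, decayed=False):
        self.decayed = decayed
        self.data = b""
        self.stored_checksum = 0

    def write(self, data: bytes):
        self.data = data
        self.stored_checksum = checksum(data) ^ (1 if self.decayed else 0)

    def read(self):
        """Return (good, data); good is True when the checksum matches."""
        return checksum(self.data) == self.stored_checksum, self.data


class StablePair:
    """Logical sector X stored as a left copy Xl and a right copy Xr."""

    def __init__(self, left, right, spares, max_tries=3):
        self.copies = {"l": left, "r": right}
        self.spares = spares  # pool of unused spare Sector objects
        self.max_tries = max_tries

    def _write_copy(self, side, data):
        while True:
            for _ in range(self.max_tries):
                self.copies[side].write(data)
                good, _ = self.copies[side].read()
                if good:
                    return
            # Still bad after several tries: assume media decay and
            # substitute a spare sector for this copy, then try again.
            if not self.spares:
                raise OSError(f"no spare sector left for X{side}")
            self.copies[side] = self.spares.pop()

    def write(self, data: bytes):
        self._write_copy("l", data)  # finish Xl completely before touching Xr
        self._write_copy("r", data)

    def read(self) -> bytes:
        for side in ("l", "r"):  # prefer Xl, fall back to Xr
            for _ in range(self.max_tries):
                good, data = self.copies[side].read()
                if good:
                    return data
        raise OSError("X is unreadable: both Xl and Xr are bad")


pair = StablePair(Sector(decayed=True), Sector(), spares=[Sector()])
pair.write(b"payload")  # Xl is decayed, so a spare replaces it
print(pair.read(), len(pair.spares))  # b'payload' 0
```

Writing Xl to completion before touching Xr is what guarantees that at least one copy holds a good value at every moment.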
Error-Handling Capabilities of Stable Storage
Media Failures: If one of Xl and Xr fails, X can be read from the other. X is unreadable only if both copies fail, and the probability of that is very small.
Write Failure: Suppose a power outage interrupts the writing of X.
1. If the outage occurs while Xl is being written, Xr is still good, and the old value of X can be read from Xr.
2. If the outage occurs after Xl has been written, X can be read from Xl, which holds the new value, since Xr may or may not have the correct copy of X.
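The two write-failure cases can be checked with a tiny simulation. Each copy is modeled as a hypothetical `(data, checksum)` pair with an assumed XOR checksum, and an outage while writing Xl is modeled as a torn write whose checksum does not match. `read_x` and `write_x_with_outage` are illustrative names, not part of any real API.

```python
from functools import reduce


def checksum(data: bytes) -> int:
    """XOR of all bytes (assumed checksum scheme)."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def good(copy):
    """A copy is good when its stored checksum matches its data."""
    data, stored = copy
    return checksum(data) == stored


def read_x(xl, xr):
    """Stable-storage read: use Xl if its checksum is good, otherwise Xr."""
    if good(xl):
        return xl[0]
    if good(xr):
        return xr[0]
    raise OSError("X is unreadable: both copies are bad")


def write_x_with_outage(xl, xr, new, outage_at=None):
    """Write `new` to Xl, then Xr, with a simulated power outage.

    outage_at="during_xl": Xl is left torn (its checksum does not match)
    outage_at="after_xl":  Xl holds the new value, Xr still the old one
    outage_at=None:        both copies hold the new value
    """
    if outage_at == "during_xl":
        return (new, checksum(new) ^ 0xFF), xr  # torn write, Xr untouched
    xl = (new, checksum(new))
    if outage_at == "after_xl":
        return xl, xr
    return xl, (new, checksum(new))


old_copy = (b"old", checksum(b"old"))
for case in ("during_xl", "after_xl", None):
    print(case, read_x(*write_x_with_outage(old_copy, old_copy, b"new", case)))
# during_xl b'old'
# after_xl b'new'
# None b'new'
```

In the first case the reader falls back to Xr and sees the old value; in the second it trusts Xl and sees the new value, matching cases 1 and 2.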