DISK FAILURE - San Jose State University


Presenter: Namrata Buddhadev (104_224_13.4.1-13.4.4)
Professor: Dr T Y Lin

Index

13.4 Disk Failures
 13.4.1 Intermittent Failures
 13.4.2 Checksums
 13.4.3 Stable Storage
 13.4.4 Error-Handling Capabilities of Stable Storage

Types of Errors

 Intermittent Error: A read or write attempt is unsuccessful, but a repeated attempt may succeed.

 Media Decay: One or more bits become permanently corrupted.

 Write Failure: A write does not succeed, and the previously written data cannot be retrieved either.

 Disk Crash: Entire disk becomes unreadable.

Intermittent Failures

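 An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller.

 The controller therefore has to check whether the sector it read is good or bad.

 To check that a write is correct, a read of the written sector is performed.

 Whether a sector is good or bad is known from the status returned by the read operation.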
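The read-and-check idea above can be sketched in Python. This is only an illustration: `read_with_retry`, `make_flaky_sector`, and the retry limit of 5 are hypothetical names and values, and the simulated sector simply reports a bad status a fixed number of times before it reports a good one.

```python
from typing import Callable, Optional, Tuple

# A read returns (data, good): the sector contents and a good/bad status.
ReadResult = Tuple[bytes, bool]


def read_with_retry(read_sector: Callable[[], ReadResult],
                    max_attempts: int = 5) -> Tuple[Optional[bytes], int]:
    """Repeat the read until the status is good or the attempts run out.

    Returns (data, attempts_used); data is None if every attempt was bad.
    """
    for attempt in range(1, max_attempts + 1):
        data, good = read_sector()
        if good:
            return data, attempt
    return None, max_attempts


def make_flaky_sector(content: bytes, bad_reads: int) -> Callable[[], ReadResult]:
    """Simulated sector (an assumption for the demo): the first
    `bad_reads` reads report a bad status, later reads succeed."""
    remaining = [bad_reads]

    def read_sector() -> ReadResult:
        if remaining[0] > 0:
            remaining[0] -= 1
            return b"", False
        return content, True

    return read_sector


if __name__ == "__main__":
    print(read_with_retry(make_flaky_sector(b"payload", bad_reads=2)))
```

A sector that never returns a good status within the limit is reported as unreadable (`None`), which is the point at which a more permanent failure would be suspected.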

Checksums

 Each sector has some additional bits, called the checksum.

 The checksum bits are set depending on the values of the data bits stored in that sector.

 With a checksum, the probability of accepting a bad sector as good is small.

 A simple checksum is a single parity bit:

 If the data bits contain an odd number of 1's, the parity bit is 1.

 If the data bits contain an even number of 1's, the parity bit is 0.

 So the number of 1's among the data bits and the parity bit together is always even.

 Example:

 1. Sequence 01101000 -> odd number of 1's, parity bit 1 -> 011010001

 2. Sequence 11101110 -> even number of 1's, parity bit 0 -> 111011100
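The parity rule and the two examples can be checked with a short Python sketch. The function names are hypothetical, bit sequences are represented as strings of '0' and '1' for readability, and the second example's eight data bits are taken to be 11101110.

```python
def parity_bit(data_bits: str) -> str:
    """Return '1' if the data bits hold an odd number of 1's, else '0'."""
    return "1" if data_bits.count("1") % 2 == 1 else "0"


def add_parity(data_bits: str) -> str:
    """Append the parity bit so the total number of 1's is even."""
    return data_bits + parity_bit(data_bits)


def looks_good(stored_bits: str) -> bool:
    """A stored sequence passes the check if its number of 1's is even."""
    return stored_bits.count("1") % 2 == 0


if __name__ == "__main__":
    for data in ("01101000", "11101110"):
        stored = add_parity(data)
        print(data, "->", stored, "good" if looks_good(stored) else "bad")
```

Flipping any single bit of a stored sequence makes the count of 1's odd, so `looks_good` returns False for it.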

 Any one-bit error in reading or writing the data bits and their parity bit results in a sequence with an odd number of 1's, so the error can be detected.

 Error detection can be improved by keeping several parity bits, for example eight: one for the first bit of every byte, one for the second bit of every byte, and so on.

 The probability is 50% that any one parity bit will detect an error, so the chance that none of the eight does so is only 1 in 2^8, or 1/256.

 In the same way, if n independent bits are used as a checksum, the probability of missing an error is only 1/(2^n).
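A small Python sketch of one reading of the multi-bit scheme: eight parity bits, where parity bit i covers bit position i of every byte in the sector. Under even parity this is the XOR of all the bytes. The names `column_parity` and `miss_probability` are hypothetical, and the 1/(2^n) figure assumes the n parity bits behave independently.

```python
from fractions import Fraction


def column_parity(sector: bytes) -> int:
    """Eight parity bits packed into one byte: bit i is 1 exactly when
    an odd number of bytes in the sector have a 1 in bit position i."""
    parity = 0
    for byte in sector:
        parity ^= byte
    return parity


def miss_probability(n: int) -> Fraction:
    """Chance that none of n independent parity bits notices an error,
    assuming each bit detects it with probability 1/2."""
    return Fraction(1, 2 ** n)


if __name__ == "__main__":
    sector = bytes([0b01101000, 0b11101110])
    print(format(column_parity(sector), "08b"))
    print(miss_probability(8))
```

Storing `column_parity(sector)` alongside the sector and recomputing it on every read gives eight separate checks instead of one.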

Stable Storage

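 Stable storage lets us recover from disk failures such as media decay, and from cases in which we overwrite data and the new data cannot be read correctly.

 Sectors are paired, and each pair represents one sector-contents X, with left and right copies Xl and Xr.

 Each copy carries parity bits, so reading it tells us whether the copy is good or bad.

 To write X: write Xl and check its parity; if it is still bad after repeated attempts, substitute a spare sector for Xl. Then do the same for Xr.

 To read X: read Xl, and Xr if necessary, repeating until a good value is returned.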
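The paired-sector policy can be sketched as a toy Python model. Everything here is an assumption made for illustration: the `Sector` and `StablePair` classes, the `decayed` flag that stands in for a failed parity check, and the retry limit `MAX_TRIES = 3`.

```python
from typing import Optional, Tuple

MAX_TRIES = 3  # assumed retry limit before a copy is given up on


class Sector:
    """Toy sector. A decayed sector never stores or returns good data."""

    def __init__(self, decayed: bool = False) -> None:
        self.decayed = decayed
        self.data: Optional[bytes] = None

    def write(self, data: bytes) -> None:
        if not self.decayed:
            self.data = data

    def read(self) -> Tuple[Optional[bytes], bool]:
        good = (not self.decayed) and self.data is not None
        return (self.data if good else None), good


class StablePair:
    """One sector-contents X kept as a left copy (Xl) and a right copy (Xr)."""

    def __init__(self, left: Sector, right: Sector) -> None:
        self.left = left
        self.right = right
        self.spares_used = 0

    def _write_copy(self, sector: Sector, data: bytes) -> Sector:
        # Write, then read back to check the status of the copy.
        for _ in range(MAX_TRIES):
            sector.write(data)
            value, good = sector.read()
            if good and value == data:
                return sector
        # Still bad: treat it as a media failure and use a spare sector.
        spare = Sector()
        spare.write(data)
        self.spares_used += 1
        return spare

    def write(self, data: bytes) -> None:
        self.left = self._write_copy(self.left, data)    # Xl first
        self.right = self._write_copy(self.right, data)  # then Xr

    def read(self) -> Optional[bytes]:
        for sector in (self.left, self.right):           # Xl, then Xr
            for _ in range(MAX_TRIES):
                value, good = sector.read()
                if good:
                    return value
        return None  # both copies bad: X is unreadable


if __name__ == "__main__":
    pair = StablePair(Sector(decayed=True), Sector())
    pair.write(b"X")
    print(pair.read(), pair.spares_used)
```

Writing Xl completely before touching Xr is the design choice that matters: at any moment at least one of the two copies holds a complete value.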

Error Handling Capabilities of Stable Storage

 Media Failure: If one of Xl and Xr fails, X can be read from the other. Only if both fail is X unreadable, and the probability of that is very small.

 Write Failure: Suppose a power outage occurs during the writing of X.

 1. If it occurs while writing Xl, Xr remains good and X can be read from Xr.

 2. If it occurs after writing Xl, X can be read from Xl, since Xr may or may not have the correct copy of X.
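The two power-outage cases can be traced with a minimal Python sketch. The model is an assumption: each copy is a (data, good) tuple, an interrupted write leaves its copy marked bad, and the outage labels "during_left" and "during_right" are hypothetical names for cases 1 and 2.

```python
from typing import Dict, Optional, Tuple

Copy = Tuple[bytes, bool]  # (data, good status)


def write_with_outage(pair: Dict[str, Copy], new: bytes,
                      outage: Optional[str] = None) -> None:
    """Write Xl, then Xr. An outage leaves the copy being written bad."""
    if outage == "during_left":
        pair["left"] = (b"", False)   # Xl half-written; Xr untouched
        return
    pair["left"] = (new, True)
    if outage == "during_right":
        pair["right"] = (b"", False)  # Xl complete; Xr half-written
        return
    pair["right"] = (new, True)


def stable_read(pair: Dict[str, Copy]) -> Optional[bytes]:
    """Return the first good copy, trying Xl and then Xr."""
    for side in ("left", "right"):
        data, good = pair[side]
        if good:
            return data
    return None


if __name__ == "__main__":
    for outage in ("during_left", "during_right", None):
        pair = {"left": (b"old", True), "right": (b"old", True)}
        write_with_outage(pair, b"new", outage)
        print(outage, stable_read(pair))
```

In case 1 the read returns the old value of X from Xr; in case 2 it returns the new value from Xl. In neither case is X lost.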

Thank You