21CSC205P - Database Management Systems
UNIT-V
TOPICS
- Storage Structure
- Transaction control
- Concurrency control algorithms and Graph
- Issues in Concurrent execution
- Failures and Recovery algorithms
- Case Study: Demonstration of Entire project by applying all the concepts learned with minimum Front-End requirements, NoSQL Database, Document Oriented, Key Value pairs, Column Oriented
Physical Storage Media
Classification of Physical Storage Media
- Speed with which data can be accessed
- Cost per unit of data
- Reliability
  o data loss on power failure or system crash
  o physical failure of the storage device
- Storage can be differentiated into:
  o Volatile storage: loses contents when power is switched off
  o Non-volatile storage:
    ▪ Contents persist even when power is switched off.
    ▪ Includes secondary and tertiary storage, as well as battery-backed-up main memory.
Storage Device Hierarchy
3. Flash Memory
- Data survives power failure
- Data can be written at a location only once, but the location can be erased and written to again
  ▪ Can support only a limited number (10K – 1M) of write/erase cycles.
  ▪ Erasing of memory has to be done to an entire bank of memory
- Reads are roughly as fast as main memory
- But writes are slow (few microseconds), erase is slower
- Cost per unit of storage roughly similar to main memory
- Widely used in embedded devices such as digital cameras, phones, and USB keys
- Is a type of EEPROM (Electrically Erasable Programmable Read-Only Memory)
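The write-once, erase-by-bank behaviour described above can be illustrated with a small toy model. This is only a sketch: the class `FlashBank`, its method names, and the default cycle limit are hypothetical names chosen for illustration, and real flash controllers are far more involved.

```python
class FlashBank:
    """Toy model of one erasable bank of flash memory (illustrative only)."""

    def __init__(self, size, max_erase_cycles=10_000):
        self.cells = [None] * size            # None means "erased, writable"
        self.erase_count = 0
        self.max_erase_cycles = max_erase_cycles  # assumed limit, 10K to 1M in the notes

    def write(self, location, value):
        # A location can be written only once between erases.
        if self.cells[location] is not None:
            raise ValueError("location already written; erase the bank first")
        self.cells[location] = value

    def read(self, location):
        return self.cells[location]

    def erase(self):
        # Erasing applies to the entire bank and consumes one erase cycle.
        if self.erase_count >= self.max_erase_cycles:
            raise RuntimeError("bank has reached its erase-cycle limit")
        self.cells = [None] * len(self.cells)
        self.erase_count += 1
```

Overwriting a single location therefore requires erasing the whole bank first, which is why writes and erases are much more expensive than reads.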
4. Magnetic-disk Storage
- Data is stored on spinning disk, and read/written magnetically
- Primary medium for the long-term storage of data; typically stores entire database.
- Data must be moved from disk to main memory for access, and written back for storage
- Much slower access than main memory
- direct-access – possible to read data on disk in any order, unlike magnetic tape
- Capacities range up to roughly 400 GB currently
- Much larger capacity, and much lower cost per byte, than main memory or flash memory
- Capacity is growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2 years)
- Survives power failures and system crashes
- disk failure can destroy data, but is rare
6. Tape Storage
- Non-volatile, used primarily for backup (to recover from disk failure) and for archival data
- Sequential-access – much slower than disk
- Very high capacity (40 to 300 GB tapes available)
- Tape can be removed from the drive, so storage costs are much cheaper than disk, but drives are expensive
- Tape jukeboxes available for storing massive amounts of data
  ▪ hundreds of terabytes (1 terabyte = 10^12 bytes) to even multiple petabytes (1 petabyte = 10^15 bytes)
Storage Hierarchy (Cont.)
- Primary Storage: Fastest media but volatile (cache, main memory).
- Secondary Storage: next level in hierarchy, non-volatile, moderately fast access time
  o also called on-line storage
  o E.g. flash memory, magnetic disks
- Tertiary Storage: lowest level in hierarchy, non-volatile, slow access time
  o also called off-line storage
  o E.g. magnetic tape, optical storage
Magnetic Disks
Physical Characteristics of Disks
- Read-write head
- Positioned very close to the platter surface
- Reads or writes magnetically encoded information.
- Surface of platter divided into circular tracks
- Over 50K-100K tracks per platter on typical hard disks
- Each track is divided into sectors.
- A sector is the smallest unit of data that can be read or written.
- Sector size typically 512 bytes
- Typical sectors per track: 500 to 1000 (on inner tracks) to 1000 to 2000 (on outer tracks)
- To read/write a sector
- disk arm swings to position head on right track
- platter spins continually; data is read/written as sector passes under head
- Head-disk assemblies
- multiple disk platters on a single spindle (1 to 5 usually)
- one head per platter, mounted on a common arm.
- Cylinder i consists of the i-th track of all the platters
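The geometry above (heads, tracks, sectors, sector size) determines the raw capacity of a disk. The following is a rough sketch: the function name `disk_capacity_bytes` and the sample figures (4 heads, 80,000 tracks, an average of 1,000 sectors per track) are assumptions picked from within the ranges quoted above, not the specification of any real drive.

```python
def disk_capacity_bytes(heads, tracks_per_surface, avg_sectors_per_track,
                        sector_size=512):
    """Raw capacity = heads x tracks x sectors per track x bytes per sector.

    Uses an *average* sectors-per-track figure, since outer tracks hold
    more sectors than inner tracks.
    """
    return heads * tracks_per_surface * avg_sectors_per_track * sector_size


# Assumed example values, chosen from the ranges given in the notes.
capacity = disk_capacity_bytes(heads=4,
                               tracks_per_surface=80_000,
                               avg_sectors_per_track=1_000)
print(f"{capacity / 10**9:.2f} GB")  # prints "163.84 GB"
```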
Magnetic Disks (Cont.)
- Earlier generation disks were susceptible to head-crashes
- Surface of earlier generation disks had metal-oxide coatings which would disintegrate on head crash and damage all data on disk
- Current generation disks are less susceptible to such disastrous failures, although individual sectors may get corrupted
- Disk controller – interfaces between the computer system and the disk drive hardware.
- accepts high-level commands to read or write a sector
- initiates actions such as moving the disk arm to the right track and actually reading or writing the data
- Computes and attaches checksums to each sector to verify that data is read back correctly
  o If data is corrupted, with very high probability stored checksum won’t match recomputed checksum
- Ensures successful writing by reading back sector after writing it
- Performs remapping of bad sectors
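The checksum step performed by the controller can be sketched as follows. This is a simplified illustration, assuming CRC-32 (via Python's standard `zlib.crc32`) as the checksum; actual disk controllers use their own error-detecting and error-correcting codes in hardware. The names `write_sector`, `read_sector`, and `SectorCorruptedError` are hypothetical.

```python
import zlib


class SectorCorruptedError(Exception):
    """Raised when the stored checksum does not match the data read back."""


def write_sector(data: bytes):
    # Compute a checksum and store it alongside the sector data.
    return data, zlib.crc32(data)


def read_sector(stored):
    # Recompute the checksum on read and compare it with the stored one.
    data, stored_checksum = stored
    if zlib.crc32(data) != stored_checksum:
        raise SectorCorruptedError("stored checksum does not match recomputed checksum")
    return data
```

A controller that reads the sector back immediately after writing it would, in this model, simply call `read_sector` on the result of `write_sector`.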
Disk Subsystem (cont.)
- Disks usually connected directly to computer system
- In Storage Area Networks (SAN), a large number of disks are connected by a high-speed network to a number of servers
- In Network Attached Storage (NAS), networked storage provides a file system interface using a networked file system protocol, instead of providing a disk system interface
Performance Measures of Disks
- Access time – the time it takes from when a read or write request is issued to when data transfer begins. Consists of:
  o Seek time – time it takes to reposition the arm over the correct track.
    ▪ Average seek time is 1/2 the worst case seek time.
    ▪ Would be 1/3 if all tracks had the same number of sectors, and we ignore the time to start and stop arm movement
    ▪ 4 to 10 milliseconds on typical disks
  o Rotational latency – time it takes for the sector to be accessed to appear under the head.
    ▪ Average latency is 1/2 of the worst case latency.
    ▪ 4 to 11 milliseconds per full rotation on typical disks (5400 to 15000 r.p.m.)
- Data-transfer rate – the rate at which data can be retrieved from or stored to the disk.
- 25 to 100 MB per second max rate, lower for inner tracks
- Multiple disks may share a controller, so rate that controller can handle is also important
- E.g. SATA: 150 MB/sec, SATA-II 3Gb (300 MB/sec)
- Ultra 320 SCSI: 320 MB/s, SAS (3 to 6 Gb/sec)
- Fiber Channel (FC 2Gb or 4Gb): 256 to 512 MB/s
- Mean time to failure (MTTF) – the average time the disk is expected to run continuously without any failure.
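The timing measures above combine arithmetically. As a worked sketch: one full rotation takes 60,000 / rpm milliseconds (about 11.1 ms at 5400 r.p.m. and 4 ms at 15000 r.p.m.), the average rotational latency is half of that, and the average access time is the average seek time plus the average rotational latency. The function names and the sample values (4 ms seek, 50 MB/s transfer rate, 4096-byte request) are illustrative assumptions.

```python
def avg_rotational_latency_ms(rpm):
    # One full rotation is the worst case; the average is half of it.
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2


def avg_access_time_ms(avg_seek_ms, rpm):
    # Access time = seek time + rotational latency.
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


def transfer_time_ms(num_bytes, rate_mb_per_s):
    # Time to move num_bytes once the transfer has begun (1 MB = 10^6 bytes).
    return num_bytes / (rate_mb_per_s * 10**6) * 1000


# Assumed example: 15000 r.p.m. disk with a 4 ms average seek time.
print(avg_rotational_latency_ms(15_000))   # 2.0
print(avg_access_time_ms(4, 15_000))       # 6.0
print(transfer_time_ms(4096, 50))          # roughly 0.08 ms
```

With these numbers, positioning the head dominates: reading a single 4 KB block spends about 6 ms on access and well under 0.1 ms on the transfer itself.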
Redundant Array of Independent Disks (RAID)
- RAID is a technology that uses multiple physical disk drives to protect data from a single disk failure.
- The purpose of RAID is to ensure that, when a disk fails, a copy of the data remains available for immediate use.
- RAID levels define how a disk array is used.
RAID levels
- RAID 0
- RAID 1
- RAID 2
- RAID 3
- RAID 4
- RAID 5
- RAID 6
RAID 0
- RAID 0 consists of striping, with no mirroring or parity, and therefore no redundancy of data. It offers the best performance, but no fault tolerance.
- In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among disks.
- Blocks 1 and 2 together form a stripe.
- Each disk receives a block of data to write/read in parallel.
- Reliability: there is no duplication of data. Hence, a block once lost cannot be recovered.
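The block distribution used by striping can be sketched as a simple round-robin mapping. This is a minimal illustration, assuming blocks are assigned to disks in round-robin order (the usual RAID 0 layout); the function names `raid0_location` and `stripe_blocks` are hypothetical and block numbers in `raid0_location` are counted from 0.

```python
def raid0_location(block_number, num_disks):
    """Map a logical block number (from 0) to (disk index, stripe index)."""
    disk = block_number % num_disks       # blocks rotate across the disks
    stripe = block_number // num_disks    # each full rotation is one stripe
    return disk, stripe


def stripe_blocks(blocks, num_disks):
    """Distribute a list of blocks across num_disks disks, round-robin."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


# With two disks, blocks 1 and 2 form the first stripe, 3 and 4 the second.
print(stripe_blocks([1, 2, 3, 4], num_disks=2))  # [[1, 3], [2, 4]]
```

Because each block lives on exactly one disk, losing any single disk loses every block mapped to it, which is the reliability limitation noted above.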