
An introduction to DBMS (Database Management Systems), discussing their importance, evolution, and advantages. These notes cover essential terminologies, the relational model, normalization, indexing, and concurrency control, including schedules and types of schedules, and explain the concept of conflict serializability and its significance.
DBMS | Introduction, Evolution and Advantages
Important Terminologies

Database: A database is a collection of inter-related data which helps in efficient retrieval, insertion and deletion of data, and which organizes the data in the form of tables, views, schemas, reports etc. For example, a university database organizes the data about students, faculty, and admin staff, which helps in efficient retrieval, insertion and deletion of data from it.
Database Management System: The software used to manage the database is called a Database Management System (DBMS). For example, MySQL, Oracle, etc. are popular commercial DBMSs used in different applications. A DBMS allows users to perform the following tasks:
Data Definition: It helps in creation, modification, and removal of definitions that define the organization of data in the database.
Data Updation: It helps in insertion, modification and deletion of the actual data in the database.
Data Retrieval: It helps in retrieval of data from the database which can be used by applications for various purposes.
User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control and recovering information corrupted by unexpected failure.
The history of DBMS evolved in three primary phases: file systems, relational DBMS, and NoSQL databases, each discussed below.
Paradigm Shift from File System to DBMS

A File Management System is a DBMS that allows access to single files or tables at a time. In a File System, data is directly stored in a set of files. It contains flat files that have no relation to other files (when only one table is stored in a single file, the file is known as a flat file). The File System manages data using files on the hard disk. Users are allowed to create, delete, and update the files according to their requirements.

Let us consider the example of a file-based University Management System. Data of students is available to their respective Departments, Academics Section, Result Section, Accounts Section, Hostel Office, etc. Some of the data is common to all sections, like the Roll No, Name, Father's Name, Address and Phone number of students, but some data is available to a particular section only, like the Hostel allotment number, which belongs to the Hostel Office. Let us discuss the issues with this system:

Redundancy of data: Data is said to be redundant if the same data is copied at many places. If a student wants to change a phone number, it has to be updated in various sections. Similarly, old records must be deleted from all sections representing that student.
Inconsistency of data: Data is said to be inconsistent if multiple copies of the same data do not match each other. If a phone number is different in the Accounts Section and the Academics Section, it is inconsistent. Inconsistency may be caused by typing errors or by not updating all copies of the same data.
Difficult data access: A user should know the exact location of a file to access data, so the process is very cumbersome and tedious. Imagine how difficult it is to search for a student's hostel allotment number among 10,000 unsorted student records.
Unauthorized access: A File System may lead to unauthorized access to data. If a student gets access to a file containing his marks, he can change them in an unauthorized way.
No concurrent access: The access of the same data by multiple users at the same time is known as concurrency. A File System does not allow concurrency, as data can be accessed by only one user at a time.
No backup and recovery: A File System does not incorporate any backup and recovery of data if a file is lost or corrupted.

These are the main reasons for the shift from file systems to relational DBMS.
Relational DBMS

A relational database means the data is stored as well as retrieved in the form of relations (tables). The table below shows a relational database with only one relation, called STUDENT, which stores the ROLL_NO, NAME, ADDRESS, PHONE and AGE of students.
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI    9156768971  18

These are some important terminologies used in the context of relations, which we will learn later. The relational DBMS provides us with Structured Query Language (SQL), a standard database language that is used to create, maintain and retrieve the relational database. Following are some interesting facts about SQL.
SQL is case insensitive. But it is a recommended practice to use keywords (like SELECT, UPDATE, CREATE, etc.) in capital letters and to write user-defined names (like table names, column names, etc.) in small letters.
We can write comments in SQL using "--" (double hyphen) at the beginning of any line.
SQL is the query language for relational databases (explained below) like MySQL, Oracle, Sybase, SQL Server, PostgreSQL, etc. Non-relational databases (also called NoSQL databases) like MongoDB, DynamoDB, etc. do not use SQL.
Although there is an ISO standard for SQL, most of the implementations slightly vary in syntax. So we may encounter queries that work in SQL Server but do not work in MySQL.
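For instance, the following two statements are equivalent (a minimal sketch against the STUDENT relation shown above):

-- keywords in capitals, identifiers in small letters (recommended style)
SELECT name FROM student WHERE age = 18;
-- the same query entirely in lower case also works, since SQL is case insensitive
select name from student where age = 18;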
NoSQL

NoSQL, originally referring to "non-SQL" or "non-relational", is a database that provides a mechanism for storage and retrieval of data modelled in means other than the tabular relations used in relational databases. NoSQL databases are used in real-time web applications and big data, and their use is increasing over time. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages. A NoSQL database offers simplicity of design, simpler horizontal scaling to clusters of machines and finer control over availability. The data structures used by NoSQL databases are different from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve. Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables. The main types of NoSQL databases, with example systems in each category, are:

Document databases (e.g., MongoDB)
Key-value stores (e.g., DynamoDB, Redis)
Column-family stores (e.g., Cassandra, HBase)
Graph databases (e.g., Neo4j)
Advantages of DBMS

A DBMS helps in efficient organization of data in a database, which has the following advantages over a typical file system:

Minimized redundancy and data consistency: Data is normalized in a DBMS to minimize redundancy, which helps in keeping data consistent. For example, student information can be kept at one place in a DBMS and accessed by different users.
Simplified data access: A user needs only the name of the relation, not the exact location, to access data, so the process is very simple.
Multiple data views: Different views of the same data can be created to cater to the needs of different users. For example, faculty salary information can be hidden from the student view of the data but shown in the admin view.
Data security: Only authorized users are allowed to access the data in a DBMS. Also, data can be encrypted by the DBMS, which makes it secure.
Concurrent access to data: Data can be accessed concurrently by different users at the same time in a DBMS.
Backup and recovery mechanism: The DBMS backup and recovery mechanism helps to avoid data loss and data inconsistency in case of catastrophic failures.
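The "multiple data views" point maps directly onto SQL views (a minimal sketch; the FACULTY table and its columns are assumed for illustration):

-- expose only non-sensitive columns; the salary column stays hidden from students
CREATE VIEW faculty_public AS
SELECT faculty_name, department
FROM faculty;

Admin users keep querying the FACULTY table itself, while student accounts are granted access only to the view.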
DBMS | Architectures
Two Level Architecture

A two level architecture is similar to a basic client-server model. The application at the client end directly communicates with the database at the server side. APIs like ODBC and JDBC are used for this interaction. The server side is responsible for providing query processing and transaction management functionalities. On the client side, the user interfaces and application programs are run. The application on the client side establishes a connection with the server side in order to communicate with the DBMS.
An advantage of this type is that maintenance and understanding are easier, and it is compatible with existing systems. However, this model gives poor performance when there are a large number of users.
DBMS 3-tier Architecture

DBMS 3-tier architecture divides the complete system into three inter-related but independent modules:
Presentation or User Tier: This tier is presented to the end user, who knows nothing about the complex databases existing underneath. This tier can provide multiple views of the databases, generated by the application residing in the application tier.
Application Tier: This is the middle tier between the Database and the Presentation tier and acts as a connection between the end user and the database. It holds the application server along with the programs that access the database. It is the last layer that users are made aware of, and it provides a complete abstraction of the database layer.
Database Tier: This tier holds the primary database along with all the data and the query languages for processing. The relations of data with their constraints are also defined at this level.
Advantages:
Enhanced scalability, due to distributed deployment of application servers; individual connections need not be made between client and server.
Data integrity is maintained. Since there is a middle layer between client and server, data corruption can be avoided/removed.
Security is improved. This model prevents direct interaction of the client with the server, thereby reducing access to unauthorized data.
Disadvantages:
Increased complexity of implementation and communication. It becomes difficult for this sort of interaction to take place due to the presence of the middle layers.
Data Independence

Data independence means a change of data at one level should not affect another level. Two types of data independence are present in this architecture:
Physical Data Independence: Any change in the physical location of tables and indexes should not affect the conceptual level or external view of data. This data independence is easy to achieve and is implemented by most DBMSs.
Conceptual Data Independence: The data at the conceptual level schema and the external level schema must be independent. This means a change in the conceptual schema should not affect the external schema; e.g., adding or deleting attributes of a table should not affect the user's view of the table. This type of independence is more difficult to achieve than physical data independence, because changes in the conceptual schema are reflected in the user's view.

Phases of Database Design

Database design for a real-world application starts from capturing the requirements and ends with physical implementation using DBMS software, and consists of the following steps:
Conceptual Design: The requirements of the database are captured using a high-level conceptual data model. For example, the ER model is used for the conceptual design of a database.
Logical Design: Logical design represents data in the form of the relational model. The ER diagram produced in the conceptual design phase is used to convert the data into the relational model.
Physical Design: In physical design, the data in the relational model is implemented using a commercial DBMS like Oracle or DB2.
DBMS | ER - Model
ER Model is used to model the logical view of the system from the data perspective which consists of these components:
Entity, Entity Type, Entity Set
An Entity may be an object with a physical existence - a particular person, car, house, or employee - or it may be an object with a conceptual existence - a company, a job, or a university course.
An Entity is an object of Entity Type and set of all entities is called as Entity Set. e.g.; E1 is an entity having Entity Type Student and set of all students is called Entity Set. In ER diagram, Entity Type is represented as:
Attribute(s)

Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB, Age, Address and Mobile_No are the attributes which define the entity type Student. In an ER diagram, an attribute is represented by an oval.
The complete entity type Student with its attributes can be represented as:
Relationship Type and Relationship Set

A relationship type represents the association between entity types. For example, 'Enrolled in' is a relationship type that exists between the entity types Student and Course. In an ER diagram, a relationship type is represented by a diamond connected to the participating entity types with lines.
A set of relationships of the same type is known as a relationship set. The following relationship set depicts S1 is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Degree of a relationship set: The number of different entity sets participating in a relationship set is called the degree of the relationship set. A relationship in which the same entity set participates twice is unary (degree 1); a relationship between two different entity sets is binary (degree 2).
Unary examples:
Example 1: A person is married to a person.
Example 2: An employee manages an employee.
Binary examples:
Example 1: A student attends a course.
Example 2: A supplier supplies an item.
Example 3: A professor teaches a subject.
Cardinality

The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be of different types:
Example 2: A person can have only one Aadhar card and one Aadhar card can belong to only one person.
Example 3: Let us assume that a male can marry one female and a female can marry one male. So the relationship will be one to one.
Using Sets, it can be represented as:
Example 2: Many employees working for one department.
Example 3: Let us assume that a student can take only one course but one course can be taken by many students. So the cardinality will be n to 1.
Using Sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many students.
Example 2: Any number of customers can order any number of products.
Example 3: Let us assume that a student can take more than one course and one course can be taken by many students. So the relationship will be many to many.
Using sets, it can be represented as:
In this example, student S1 is enrolled in C1 and C3, and course C3 is taken by S1, S3 and S4. So it is a many-to-many relationship.
Participation Constraint: Participation Constraint is applied on the entity participating in the relationship set.
Example 1: If each student must attend a course, the participation of student will be total. Total participation is shown by double line in ER diagram.
Example 2: Each employee must join a department.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and Course Entity set having partial participation.
Using set, it can be represented as,
Every student in Student Entity set is participating in a relationship but there exists a course C4 which is not taking part in the relationship.
Weak Entity Type and Identifying Relationship

As discussed before, an entity type has a key attribute that uniquely identifies each entity in the entity set. But there exist some entity types for which a key attribute can't be defined. These are called Weak Entity types. A weak entity type is represented by a double rectangle. The participation of a weak entity type is always total. The relationship between a weak entity type and its identifying strong entity type is called an identifying relationship, and it is represented by a double diamond.
Example 1: A school might have multiple classes and each class might have multiple sections. A section cannot be identified uniquely on its own, and hence it does not have a primary key. A class, however, can be identified uniquely, and the combination of class and section is required to identify each section uniquely. Therefore the section is a weak entity, and it shares total participation with the class.
Example 2: A company may store the information of dependants (Parents, Children, Spouse) of an Employee. But the dependants cannot exist without the employee. So Dependant will be a weak entity type and Employee will be the identifying entity type for Dependant.
Example 3: In case of multiple hosts, the login id cannot become primary key as it is dependent upon the hosts for its identity.
DBMS | Relational Model
The Relational Model was proposed by E.F. Codd to model data in the form of relations or tables. After designing the conceptual model of a database using an ER diagram, we need to convert the conceptual model into the relational model, which can be implemented using any RDBMS language like Oracle SQL, MySQL, etc. So we will see what the Relational Model is.
What is Relational Model? Relational Model represents how data is stored in Relational Databases. A relational database stores data in the form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE shown in Table 1.
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI                18
Domain Constraints: These are attribute-level constraints. An attribute can only take values which lie inside its domain range. E.g., if a constraint AGE > 0 is applied on the STUDENT relation, inserting a negative value of AGE will result in failure.
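Such a domain constraint can be declared directly in SQL with CHECK (a minimal sketch of the STUDENT relation above; column types are assumed):

CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    VARCHAR(50),
    address VARCHAR(100),
    phone   CHAR(10),
    age     INTEGER CHECK (age > 0)  -- domain constraint on AGE
);

-- fails: the negative AGE violates the CHECK constraint
INSERT INTO student VALUES (5, 'MOHAN', 'DELHI', '9123456789', -2);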
Key Integrity: Every relation in the database should have at least one set of attributes that defines a tuple uniquely. Such a set of attributes is called a key; e.g., ROLL_NO in STUDENT is a key. No two students can have the same roll number. So a key has two properties: it should be unique for all tuples, and it can't have NULL values.
Referential Integrity: When one attribute of a relation can only take values from another attribute of the same relation or of another relation, it is called referential integrity. Let us suppose we have two relations:
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE  BRANCH_CODE
1        RAM     DELHI    9455123451  18   CS
2        RAMESH  GURGAON  9652431543  18   CS
3        SUJIT   ROHTAK   9156253131  20   ECE
4        SURESH  DELHI                18   IT

BRANCH
BRANCH_CODE  BRANCH_NAME
CS           COMPUTER SCIENCE
IT           INFORMATION TECHNOLOGY
ECE          ELECTRONICS AND COMMUNICATION ENGINEERING
CV           CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take values which are present in BRANCH_CODE of BRANCH; this is called a referential integrity constraint. The relation which references the other relation is called the REFERENCING RELATION (STUDENT in this case), and the relation to which other relations refer is called the REFERENCED RELATION (BRANCH in this case).

ANOMALIES

An anomaly is an irregularity, or something which deviates from the expected or normal state. When designing databases, we identify three types of anomalies: insert, update and delete.
Insertion anomaly in the referencing relation: We can't insert a row in the REFERENCING RELATION if the referencing attribute's value is not present in the referenced attribute. E.g., inserting a student with BRANCH_CODE 'ME' in the STUDENT relation will result in an error, because 'ME' is not present in BRANCH_CODE of BRANCH.
Deletion/updation anomaly in the referenced relation: We can't delete or update a row of the REFERENCED RELATION if the value of the REFERENCED ATTRIBUTE is used by a REFERENCING ATTRIBUTE. E.g., if we try to delete the tuple from BRANCH having BRANCH_CODE 'CS', it will result in an error, because 'CS' is referenced by BRANCH_CODE of STUDENT; but if we try to delete the row from BRANCH with BRANCH_CODE 'CV', it will be deleted, as that value has not been used by the referencing relation. This can be handled by the following methods:
ON DELETE CASCADE: It will delete the tuples from the REFERENCING RELATION if a value used by the REFERENCING ATTRIBUTE is deleted from the REFERENCED RELATION. E.g., if we delete a row from BRANCH with BRANCH_CODE 'CS', the rows in the STUDENT relation with BRANCH_CODE 'CS' (ROLL_NO 1 and 2 in this case) will be deleted.
ON UPDATE CASCADE: It will update the REFERENCING ATTRIBUTE in the REFERENCING RELATION if an attribute value used by the REFERENCING ATTRIBUTE is updated in the REFERENCED RELATION. E.g., if we update the row of BRANCH with BRANCH_CODE 'CS' to 'CSE', the rows in the STUDENT relation with BRANCH_CODE 'CS' (ROLL_NO 1 and 2 in this case) will be updated to BRANCH_CODE 'CSE'.
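These referential actions are declared on the foreign key when the tables are created (a sketch of the STUDENT/BRANCH pair above; column types are assumed):

CREATE TABLE branch (
    branch_code VARCHAR(5) PRIMARY KEY,
    branch_name VARCHAR(60)
);

CREATE TABLE student (
    roll_no     INTEGER PRIMARY KEY,
    name        VARCHAR(50),
    branch_code VARCHAR(5),
    FOREIGN KEY (branch_code) REFERENCES branch (branch_code)
        ON DELETE CASCADE   -- deleting branch 'CS' also deletes its students
        ON UPDATE CASCADE   -- renaming 'CS' to 'CSE' updates the matching students
);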
SUPER KEYS: Any set of attributes that allows us to identify unique rows (tuples) in a given relation is known as a superkey. Out of these superkeys, a minimal one (from which no attribute can be removed without losing uniqueness) can be used as a primary key; such keys are known as candidate keys. If a combination of two or more attributes is used as the primary key, we call it a composite key.
DBMS | Keys in Relational Model
Below is a Student relation to explain various types of keys.
Candidate Key: The minimal set of attributes which can uniquely identify a tuple is known as a candidate key, e.g., Enroll No, Aadhaar No and PAN No. The value of a candidate key is unique and non-null for every tuple. There can be more than one candidate key in a relation; for example, Aadhaar No and PAN No are both candidate keys for the relation Student.
Note - In SQL Server, a unique constraint on a nullable column allows the value NULL in that column only once.
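A small sketch of this behaviour (SQL Server semantics; other systems such as PostgreSQL allow multiple NULLs in a unique column):

CREATE TABLE student (
    enroll_no INTEGER PRIMARY KEY,
    pan_no    CHAR(10) UNIQUE      -- nullable column with a unique constraint
);

INSERT INTO student VALUES (1, NULL);  -- accepted
INSERT INTO student VALUES (2, NULL);  -- rejected by SQL Server: a NULL is already present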
Primary Key: There can be more than one candidate key in a relation, out of which one, the most suitable, must be chosen as the primary key. For example, Aadhaar No as well as PAN No are candidate keys for the relation Student, but Enroll No seems the most appropriate to be chosen as the primary key (only one out of the many candidate keys).
Alternate Key: A candidate key other than the primary key is called an alternate key. For example, Enroll No, Aadhaar No and PAN No are candidate keys for the relation Student; except for Enroll No, all the others will be alternate keys.
Super Key: The set of attributes that can uniquely identify a tuple is known as Super Key. For Example, (Enrollment No, Student Name) etc. Adding zero or more attributes to candidate key generates super key. A candidate key is a superkey but vice versa is not true.
Before proceeding further, let's understand the rules of Armstrong's Axioms, which we will use to verify whether an attribute set is a candidate key or not:
Axiom of Reflexivity: if Y is a subset of X, then X -> Y.
Axiom of Augmentation: if X -> Y, then XZ -> YZ for any attribute set Z.
Axiom of Transitivity: if X -> Y and Y -> Z, then X -> Z.
Let's understand the concept of keys using a few examples. Question 1: Let R1 be a relation with attributes A, B, C, and D. Find the candidate key given the following derivations: A -> B, A -> C, C -> D. Solution: We know that A derives B, A derives C and C derives D. Since A derives C and C derives D, A also derives D. Now let's check whether A is a candidate key. To do this we need to find the closure of A: the set of attributes that A derives directly or indirectly using the rules of Armstrong's Axioms. A -> A [Axiom of Reflexivity]. A -> D [Axiom of Transitivity]. So A+ = [ABCD]. Since the closure of A contains all the attributes of the table, A is a candidate key.
Question 2: Let R2 be a relation with attributes A, B, C, and D. Find the candidate key given the following derivation: AB -> CD. Solution: AB -> AB by the Axiom of Reflexivity, and AB -> CD is given, so AB+ = [ABCD]. Therefore, AB is a candidate key.
Question 3: Let R3 be a relation with attributes A, B, C, and D. Find the candidate key given the following derivations: B -> AC, C -> D. Solution: B -> ABC, using the Axiom of Reflexivity, and B -> D using the Axiom of Transitivity, so B+ = [ABCD]. Therefore, B is a candidate key.
To generate the superkeys of a relation, we need to find all the subsets of its attribute set. For example, for R1(A, B, C, D) the possible subsets range from {A}, {B}, {C}, {D}, {AB} up to {ABCD}. Then, applying the Armstrong axioms to every set, we can find the number of superkeys.
Foreign Key: This is a field in a table which uniquely identifies a row of another table. The field points to the primary key of the other table, creating a kind of link between the tables. The key states the relationship of one attribute (column) of one table to the corresponding column of the other table, as if one is a parent and the other is the child. The PARENT is the table which holds the primary key, and the CHILD is the table which holds the referential key. Hence every entry in the PARENT table is unique, whereas the same entry might repeat in the child table; but that entry has to be present in the PARENT table (primary key). Let's look at the table below:
Here the Order table has Customer_ID as the foreign key, which establishes a relation between the Order and Customer tables. Customer_ID is the primary key of the Customer table. This makes data access easy and reliable. Suppose one needs to find the details of the customer who ordered an item of Rs. 2000 on 15-06-2019. Then we just need to get the Customer_ID from the Order table and then look up the Customer table to get all the required details.
One way of creating a foreign key using SQL is:

CREATE TABLE orders (    -- "ORDER" is a reserved word in SQL, so the table is named orders
    order_id    INTEGER PRIMARY KEY,
    order_date  DATE,
    cost        INTEGER,
    customer_id INTEGER REFERENCES customer (customer_id)
);
This method uses the REFERENCES keyword to create a link between the tables. Customer_ID is the primary key of the Customer table; in the Order table the same Customer_ID value may appear in many rows. The referenced key can be any unique key, not only the primary key.
Let's understand the concept of referential integrity. Referential integrity is a relational database concept which states that table relationships must always be consistent. In other words, any foreign key value must agree with the primary key that it references. Thus, any change to a primary key value must be propagated to all foreign keys that reference it, or not be allowed at all.
For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record are also deleted; this is called a cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it are modified accordingly; this is called a cascading update.
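Continuing the Order/Customer sketch above, referential integrity shows up concretely (the customer table's columns and all values here are hypothetical):

-- fails: no row with customer_id 999 exists yet in the parent table
INSERT INTO orders VALUES (101, DATE '2019-06-15', 2000, 999);

-- succeeds once the parent row is in place
INSERT INTO customer (customer_id, name) VALUES (999, 'RAM');
INSERT INTO orders VALUES (101, DATE '2019-06-15', 2000, 999);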
Database Normalization
Objectives of Good Database Design:
Anomalies: There are different types of anomalies which can occur in referencing and referenced relation:
In the above table, if someone tries to rename the Subject_Name "Database Management System" to "DBMS" and the system crashes partway through, the table is left inconsistent. Similarly, a badly formed update query can also lead to redundancy issues.
Database Normalization | Functional Dependencies
Database normalization is the process of organizing the attributes of the database to reduce or eliminate data redundancy (having the same data but at different places).
Problems because of data redundancy: Data redundancy unnecessarily increases the size of the database as the same data is repeated in many places. Inconsistency problems also arise during insert, delete and update operations.
Functional Dependency Functional Dependency is a constraint between two sets of attributes in a relation from a database. A functional dependency is denoted by an arrow (→). If an attribute A functionally determines B or B is functionally dependent on A, then it is written as A → B. For example, employee_id → name means employee_id functionally determines the name of the employee. As another example in a time table database, {student_id, time} → {lecture_room}, student ID and time determine the lecture room where the student should be. Let's look at the tables below:
In this table we can say that: Enroll_No → Name Enroll_No → Address Enroll_No → Phone No
In this table we can say that: (Subject_Id, Student_Id) → Marks (Subject_Id, Student_Id) → Marks, Grades
Examples: For the following table, which of the following functional dependencies hold True.
Solution: The first dependency holds true, as for every value of A there is a unique value of B. The second dependency is false, as there is no unique value of A for every value of B.
What does functionally dependent mean? A functional dependency A → B means that for all instances of a particular value of A, there is the same value of B. Let's do an exercise on functional dependency: for the following table, determine which of the dependencies are true and which are false.
Solution:
For example, in the table below A → B is true, but B → A is not true, as there are different values of A for B = 3.

A  B
1  3
2  3
4  0
1  3
4  0
Why do we study functional dependency? We study functional dependencies so that we can carve a few attributes out into another table, reducing data redundancy to a minimum. Let's look at the following table:
In this table, let's find the functional dependency between two attributes. As we can see, each Dept_Id determines a unique Dept_Name, so we can say that Dept_Id → Dept_Name. Therefore we can carve this table up and create another one out of it.
Trivial Functional Dependency: X → Y is trivial only when Y is a subset of X. Examples: ABC → AB, ABC → A, ABC → ABC.
Non-Trivial Functional Dependency: X → Y is a non-trivial functional dependency when Y is not a subset of X.
X → Y is called completely non-trivial when X ∩ Y is empty. Examples: Id → Name, Name → DOB.
Semi Non-Trivial Functional Dependency: X → Y is called semi non-trivial when X ∩ Y is not empty. Examples: AB → BC, AD → DC.
DBMS | Normal Forms
Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion and updating anomalies, so normal forms are used to eliminate or reduce redundancy in database tables.
Let's understand the concept of 1NF using this exercise:
As we can see, the above table contains multi-valued attributes, which violate the principle of first normal form (1NF), and hence it must be reduced. There are various methods of doing this:
Method 1: This involves the creation of multiple columns, distributing the values across the new attributes. We can distribute the multi-valued attribute over new columns named Mob No 1, Mob No 2, Mob No 3, etc., depending on the number of values, until the table is reduced to single-valued attributes.
But this method has problems: various fields need to be left blank, resulting in wasted space.
Method 2: This method involves the creation of multiple rows instead of columns, copying the non-repeated attribute values for each repeated attribute value.
In this table we have made a copy of the values of the repeated attributes, keeping other attributes the same. This saves our space but introduces another problem of repetition of Customer_ID, which must remain unique.
Method 3: This method involves the creation of another table and shifting the repeated attributes to that table and linking it with the previous table by using any type of ID.
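Method 3 is the usual fix in SQL: the repeated attribute moves to its own table, linked back by an ID (a sketch; table and column names are assumed):

CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        VARCHAR(50)
);

-- one row per (customer, mobile number) pair; Customer_ID stays unique in customer
CREATE TABLE customer_phone (
    customer_id INTEGER REFERENCES customer (customer_id),
    mob_no      CHAR(10),
    PRIMARY KEY (customer_id, mob_no)
);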
Example 1 - Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2.
Example 2 -

ID  Name  Courses
1   A     c1, c2
2   E     c3
3   M     c2, c3
In the above table, Course is a multi-valued attribute so it is not in 1NF.
The table below is in 1NF, as there is no multi-valued attribute:

ID  Name  Course
1   A     c1
1   A     c2
2   E     c3
3   M     c2
3   M     c3
In the above table, the candidate key is the combination (User_ID, Course_ID). The non-prime attribute is Course_Fee. Since this non-prime attribute depends only partially on the candidate key, the table is not in 2NF. To convert it into 2NF, one needs to split the table into two and create a common link between them, as we have done with the above table after splitting it in two, keeping Course_ID as the common key.
Example:
Partial Dependency - If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.
Example 1 - In relation STUDENT_COURSE given in Table 3, FD set: {COURSE_NO->COURSE_NAME} Candidate Key: {STUD_NO, COURSE_NO}
In FD COURSE_NO->COURSE_NAME, COURSE_NO (proper subset of candidate key) is determining COURSE_NAME (non-prime attribute). Hence, it is partial dependency and relation is not in second normal form. To convert it to second normal form, we will decompose the relation STUDENT_COURSE (STUD_NO, COURSE_NO, COURSE_NAME) as : STUDENT_COURSE (STUD_NO, COURSE_NO) COURSE (COURSE_NO, COURSE_NAME)
Note - This decomposition will be lossless join decomposition as well as dependency preserving.
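The decomposition can be written directly in SQL (a sketch; column types are assumed):

CREATE TABLE student_course (
    stud_no   INTEGER,
    course_no INTEGER,
    PRIMARY KEY (stud_no, course_no)
);

CREATE TABLE course (
    course_no   INTEGER PRIMARY KEY,
    course_name VARCHAR(60)  -- COURSE_NAME now depends on the whole key of its table
);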
Example 2 - Consider the following functional dependencies in relation R(A, B, C, D):
AB -> C [A and B together determine C]
BC -> D [B and C together determine D]
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., no proper subset of AB determines a non-prime attribute.
Let's understand these rules using the following table:
Checking the first condition of 3NF (the relation must be in 2NF): The candidate key for the table is Stud_No. It has the following dependencies on the other attributes: Stud_No → (Stud_Name, Stud_State, Stud_Country) and Stud_State → Stud_Country. Since the only candidate key is a single attribute, no proper subset of it exists, so there can be no partial dependency. Hence the condition of 2NF holds.
Checking the second condition of 3NF: We can see that Stud_State → Stud_Country holds, which means that one non-prime attribute determines another non-prime attribute. This violates the second rule of 3NF.
The problem can be fixed by splitting the table into two: T1: Stud_No, Stud_Name, Stud_State; T2: Stud_State, Stud_Country. Now no non-prime attribute determines another non-prime attribute, so the condition of 3NF holds.
Let's see another table and try to understand the concept of 3NF:
Let's first find the dependencies: (Exam_Name, Exam_Year) → (Topper_Name, Topper_DOB) and Topper_Name → Topper_DOB.
We can very well see that since there are no repetitions in the columns, 1NF holds. Also, 2NF holds, as no non-prime attribute is partially dependent on the candidate key. But the second dependency violates the 3NF.
To solve this we need to split the table into two as we did in an earlier case: T1: Exam_Name, Exam_Year, Topper_Name T2: Topper_Name, Topper_DOB
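As before, the 3NF split maps to two tables in SQL (a sketch; types are assumed):

CREATE TABLE exam_topper (
    exam_name   VARCHAR(40),
    exam_year   INTEGER,
    topper_name VARCHAR(50),
    PRIMARY KEY (exam_name, exam_year)
);

CREATE TABLE topper (
    topper_name VARCHAR(50) PRIMARY KEY,
    topper_dob  DATE  -- DOB now depends only on Topper_Name
);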
Let's look at a few other definitions of 3NF. Def 2: This also defines two sets of rules, which are mostly similar to Def 1.
Def 3: It says that for every FD X → A, one of the following should be true:
X → A is a trivial FD (A is a subset of X), or
X is a superkey, or
A is a prime attribute (i.e., part of some candidate key).
Example:
Transitive dependency - If A->B and B->C are two FDs then A->C is called transitive dependency.
Example 1 - In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}; Candidate Key: {STUD_NO}
For this relation in Table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates the third normal form. To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as: STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE) and STATE_COUNTRY (STATE, COUNTRY).
Example 2 - Consider relation R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A. All possible candidate keys of this relation are {A, E, CD, BC}. All attributes appearing on the right sides of the functional dependencies are prime, so the relation is in 3NF.
Functional Dependencies:
(Stud_ID, Subject) → Prof_id
Prof_id → Subject
Candidate Keys: (Stud_ID, Subject) and (Stud_ID, Prof_id)
The second functional dependency violates the rule of BCNF. Let's analyse whether the relation satisfies each normal form. 1NF: Since there are no repeated attributes, this holds true. 2NF: Since there are no non-prime attributes, this also holds true. 3NF: Since there are no non-prime attributes, this also holds true. BCNF: In the second dependency, Prof_id → Subject, Prof_id is not a superkey even though it determines the prime attribute Subject. Hence the relation fails to satisfy BCNF.
Now let's modify the table by adding another row to it:
Then break the table into a BCNF compliance database.
Although this decomposition satisfies the conditions of BCNF, it is not dependency preserving: after the split, the first functional dependency, (Stud_ID, Subject) → Prof_id, can no longer be enforced within a single table.
Example 1 - Find the highest normal form of a relation R(A,B,C,D,E) with FD set as - BC->D, AC->BE, B->E
Step 1. As we can see, (AC)+ = {A, C, B, E, D}, but no proper subset of AC can determine all attributes of the relation, so AC is a candidate key. A and C can't be derived from any other attribute of the relation, so there is only one candidate key, {AC}.
Step 2. The prime attributes are those which are part of a candidate key ({A, C} in this example); the others ({B, D, E} in this example) are non-prime.
Step 3. The relation R is in first normal form, as a relational DBMS does not allow multi-valued or composite attributes.
Step 4. The relation is in second normal form, because BC -> D is in 2NF (BC is not a proper subset of the candidate key AC), AC -> BE is in 2NF (AC is the candidate key) and B -> E is in 2NF (B is not a proper subset of the candidate key AC).
Step 5. The relation is not in third normal form: in BC -> D neither is BC a superkey nor is D a prime attribute, and in B -> E neither is B a superkey nor is E a prime attribute; to satisfy 3NF, either the LHS of an FD should be a superkey or the RHS should be a prime attribute. So the highest normal form of the relation is second normal form.
Key Points -
Exercise 1: Find the highest normal form of R(A, B, C, D, E) under the following functional dependencies: ABC -> D, CD -> AE.
Important Points for solving above type of question.
Candidate keys in a given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE: CD is not a superkey, so this dependency is not in BCNF. So R is not in BCNF.
3NF: We don't need to check ABC -> D, as it already satisfies BCNF. Let us consider CD -> AE. Since E is not a prime attribute, the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper subset of a candidate key and it determines E, which is a non-prime attribute. So the given relation is also not in 2NF. Hence the highest normal form is 1NF.
Indexing in Databases
Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed. It is a data structure technique which is used to quickly locate and access the data in a database.
Indexes are created using a few database columns. The first column is the search key, which contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly. (Note: the data itself may or may not be stored in sorted order.) The second column is the data reference, or pointer, which contains a set of pointers holding the address of the disk block where that particular key value can be found.
Indexing has various attributes:
Access types: the kind of access supported, such as value-based search, range access, etc.
Access time: the time needed to find a particular data element or set of elements.
Insertion time: the time taken to find the appropriate space and insert new data.
Deletion time: the time taken to find and delete an item, as well as to update the index structure.
Space overhead: the additional space required by the index.
Let's look at an example:
The DBMS uses the hard disk to store all the records of the above database. As we know, hard disk access is very slow, so searching for anything in such a huge database can cause performance problems. Moreover, searching for a repeated item consumes even more time, as it requires examining every item in every block. Suppose there are 100 rows in each block; searching the database for a customer id then takes too much time, because the hard disk does not store the data in any particular order. One solution to this problem is to arrange the indexes in sorted order, so that any looked-up item can be found using binary search. This creation of an order in which to store the indexes is called clustered indexing.
This sort of indexing is called clustered indexing, and it can be applied to any attribute; the ordering attribute need not be a key. When the primary key is used to order the data in a heap, it is called primary indexing.
Although we have solved one problem of time cost, another problem arises when there are multiple values that are needed to be fetched using an id. The clustered indexing allows the physical arrangement of the data only on one key, therefore multiple fetching will raise a problem. To solve this we have another method of indexing called the non-clustered indexing or secondary indexing.
To make the understanding clearer, let's take the example of a book: the contents page, on which all the chapter numbers are listed in order, is organised like a clustered index, while the glossary-like index at the end of the book, which gives references to words across various chapters along with page numbers, is arranged in a non-clustered manner.
In general, there are two types of file organization mechanisms which are followed by the indexing methods to store the data:
Dense Index: For every search key value in the data file, there is an index record. This record contains the search key and also a reference to the first data record with that search key value.
Sparse Index: An index record appears only for a few items in the data file. Each item points to a block as shown. To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for. We start at the record pointed to by that index record and proceed along the pointers in the file (that is, sequentially) until we find the desired record.
There are primarily three methods of indexing: Clustered Indexing, Non-Clustered or Secondary Indexing, and Multilevel Indexing.
Let's look at this online retailing database, where each Order_id is stored along with Order_date, Cost and Customer_ID.
The working is easy to understand: in clustered indexing, the DBMS maintains an extra index file. It has entries of the form (Search_key, Pointer), storing a reference to the disk block containing the given key. Here we have assumed that Order_id is the primary key on which the indexing is done, so the Search_key consists of the Order_id. The index file can store the data as an ordered index file or a hashing index file, as shown earlier. Ordered indexing can be done in either dense or sparse format, as we learnt earlier: dense indexing saves search time, and sparse indexing saves index file size. Let's look at another case, where clustered indexing is done on Customer_ID instead of Order_id.
Although the same Customer_ID can appear under multiple keys and disk blocks, there is only one entry for it in the index file.
For example, students studying in each semester are grouped together: 1st semester students, 2nd semester students, 3rd semester students, etc. each form a group.
Clustered index sorted according to first name (search key)
Primary Indexing: This is a type of clustered indexing wherein the data is sorted according to the search key and the primary key of the database table is used to create the index. It is the default format of indexing, and it induces sequential file organization. As primary keys are unique and stored in sorted order, the performance of the search operation is quite efficient.
Here we did the indexing according to the Customer_ID. In this, the index le has multiple blocks to point to the multiple Customer_ID. The non-clustered index can only be arranged in a dense manner and not in a sparse manner. We have used an ordered form of indexing to understand this example. A secondary index is created only on the most frequently searched attribute.
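In SQL, the secondary index of this example would be created explicitly (a sketch using the common CREATE INDEX syntax, with the table and column names assumed from the earlier example):

-- the primary key (Order_id) typically carries the clustered index;
-- a secondary, non-clustered index is added on the frequently searched column
CREATE INDEX idx_orders_customer ON orders (customer_id);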
DBMS | B+ Tree Introduction
In order to implement dynamic multilevel indexing, the B+ tree is generally employed. The drawback of the B-tree used for indexing is that it stores the data pointer (a pointer to the disk file block containing the key value) along with that key value in the nodes of the B-tree. This technique greatly reduces the number of entries that can be packed into a node, thereby increasing the number of levels in the B-tree and hence the search time for a record.
The B+ tree eliminates this drawback by storing data pointers only at the leaf nodes of the tree. Thus, the structure of the leaf nodes of a B+ tree is quite different from the structure of its internal nodes. Since data pointers are present only at the leaf nodes, the leaf nodes must store all the key values along with their corresponding data pointers to the disk file blocks in order to access them. Moreover, the leaf nodes are linked together, providing ordered access to the records. The leaf nodes therefore form the first level of the index, with the internal nodes forming the other levels of a multilevel index. Some of the key values of the leaf nodes also appear in the internal nodes, simply to act as a medium to control the search for a record.
From the above discussion it is apparent that a B+ tree, unlike a B-tree has two orders, 'a' and 'b', one for the internal nodes and the other for the external (or leaf) nodes.
The structure of the internal nodes of a B+ tree of order 'a' is as follows:
Diagram-I
The structure of the leaf nodes of a B+ tree of order 'b' is as follows:
Diagram-II
Using the P pointer it is possible to traverse all the leaf nodes, just like a linked list, thereby achieving ordered access to the records stored on disk.
A Diagram of B+ Tree -
Advantage - A B+ tree with 'l' levels can store more entries in its internal nodes than a B-tree having the same 'l' levels. This brings a significant improvement in the search time for any given key: having fewer levels, together with the presence of the P pointers, makes the B+ tree quick and efficient at accessing records from disk.
DBMS | Concurrency Control - Introduction and ACID Properties
Concurrency Control deals with the interleaved execution of more than one transaction. In the next article, we will see what serializability is and how to find whether a schedule is serializable or not.

What is a Transaction? A set of logically related operations is known as a transaction. The main operations of a transaction are:
Read(A): The read operation Read(A) or R(A) reads the value of A from the database and stores it in a buffer in main memory.
Write(A): The write operation Write(A) or W(A) writes the value back to the database from the buffer.
Let us take a debit transaction on an account, which consists of the following operations:
1. R(A)
2. A = A - 1000
3. W(A)
Assume A's value before the start of the transaction is 5000. The first operation reads the value of A from the database and stores it in a buffer. The second operation decreases it by 1000, so the buffer contains 4000. The third operation writes the buffer value to the database, so A's final value is 4000.
But it may also happen that the transaction fails after executing some of its operations. The failure can be because of hardware or software problems, power failure, etc. For example, if the debit transaction discussed above fails after executing operation 2, the value of A will remain 5000 in the database, which is not acceptable to the bank. To avoid this, the database has two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by the transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all its operations successfully, all the changes made by it are undone.
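In SQL the debit transaction looks like this (a sketch; the account table, its columns and the exact transaction syntax vary slightly across systems):

BEGIN TRANSACTION;

UPDATE account
SET    balance = balance - 1000   -- R(A), subtract 1000 in the buffer, W(A)
WHERE  account_no = 'A';

COMMIT;   -- make the change permanent
-- issuing ROLLBACK instead of COMMIT would undo every change of the transaction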
Properties of a Transaction

Atomicity: By this, we mean that either the entire transaction takes place at once or it doesn't happen at all. There is no midway: transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations:
Abort: If a transaction aborts, changes made to the database are not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the 'all or nothing' rule. Consider the following transaction T, consisting of T1 and T2: transfer of 100 from account X to account Y.
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

Consistency: This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database. Referring to the example above, the total amount before and after the transaction must be maintained. Total before T occurs = 500 + 200 = 700. Total after T occurs = 400 + 300 = 700. Therefore, the database is consistent. Inconsistency occurs in case T1 completes but T2 fails; as a result, T is incomplete.

Isolation: This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state. Transactions occur independently, without interference. Changes occurring in a particular transaction are not visible to any other transaction until the change has been written to memory or committed. This property ensures that concurrent execution of transactions results in a state equivalent to one achieved had they been executed serially in some order. Let X = 500, Y = 500, and consider two transactions T and T''.
Suppose T has been executed till Read(Y) and then T'' starts. As a result, interleaving of operations takes place, due to which T'' reads the correct value of X but an incorrect value of Y, and the sum computed by T'' (X + Y = 50,000 + 500 = 50,500) is not consistent with the sum at the end of transaction T (X + Y = 50,000 + 450 = 50,450). This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in isolation, and changes should be visible only after they have been made to the main memory.

Durability: This property ensures that once the transaction has completed execution, its updates and modifications to the database are stored on and written to disk, and they persist even if a system failure occurs. These updates become permanent and are stored in non-volatile memory. The effects of the transaction are thus never lost.
The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database, in such a way that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and whose updates are durably stored.
DBMS | Concurrency Control | Schedule and Types of Schedules
What is a Schedule? A schedule is a series of operations from one or more transactions. Schedules are used to resolve conflicts between transactions.
A schedule, as the name suggests, is a process of lining up the transactions and executing them one by one. When multiple transactions run concurrently and the order of operations needs to be set so that the operations do not overlap each other, scheduling is brought into play and the transactions are timed accordingly.
Types of Schedules: Let's discuss the various types of schedules.
Example: Consider the following schedule involving two transactions T1 and T2,

T1      T2
R(A)
W(A)
R(B)
W(B)
        R(A)
        R(B)

where R(A) denotes that a read operation is performed on the data item 'A'. This is a serial schedule, since the transactions are performed serially in the order T1 -> T2.
The Non-Serial Schedule can be divided further into Serializable and Non-Serializable.
a. Serializable: Serializability is used to maintain the consistency of the database. It is mainly needed for non-serial schedules, to verify whether the scheduling will lead to any inconsistency or not. A serial schedule, on the other hand, does not need a serializability check, because it starts a transaction only after the previous transaction is complete. A non-serial schedule of n transactions is said to be serializable only when it is equivalent to some serial schedule of those n transactions. Since concurrency is allowed in this case, multiple transactions can execute concurrently, and a serializable schedule helps in improving both resource utilization and CPU throughput. Serializability is of two types: conflict serializability and view serializability. For conflict serializability (taken up in detail later), two operations are said to be conflicting if:
They belong to different transactions
They operate on the same data item
At least one of them is a write operation
b. Non-Serializable: The non-serializable schedule is divided into two types, Recoverable and Non-recoverable Schedule.
W2(x), W1(y), C1, C2;
The given schedule follows the order Ti -> Tj => C1 -> C2. Transaction T1 is executed before T2, hence there is no chance of a conflict occurring. R1(x) appears before W1(x), and transaction T1 commits before T2, i.e., the transaction that performed the first update on data item x commits first; hence the given schedule is recoverable.
Example 2: Consider the following schedule involving two transactions T1 and T2.

T1      T2
R(A)
W(A)
        W(A)
        R(A)
commit
        commit

This is a recoverable schedule, since T1 commits before T2, which makes the value read by T2 correct.
There can be three types of recoverable schedule:
a. Cascading Schedule: When there is a failure in one transaction and this leads to the rolling back or aborting of other dependent transactions, such scheduling is referred to as cascading rollback or cascading abort. Example:
b. Cascadeless Schedule: Also called "avoids cascading aborts/rollbacks" (ACA). Schedules in which transactions read values only after all the transactions whose changes they are going to read have committed are called cascadeless schedules. This avoids a single transaction abort leading to a series of transaction rollbacks. The strategy to prevent cascading aborts is to disallow a transaction from reading uncommitted changes of another transaction in the same schedule.
In other words, if some transaction Tj wants to read a value updated or written by some other transaction Ti, then Tj must read it only after Ti commits.
Example: Consider the following schedule involving two transactions T1 and T2.

T1      T2
R(A)
W(A)
        W(A)
commit
        R(A)
        commit

This schedule is cascadeless, since the updated value of A is read by T2 only after the updating transaction, i.e. T1, commits.
Example: Consider the following schedule involving two transactions T1 and T2.

T1      T2
R(A)
W(A)
        R(A)
        W(A)
abort
        abort

This is a recoverable schedule, but it does not avoid cascading aborts. It can be seen that if T1 aborts, T2 has to be aborted too in order to maintain the correctness of the schedule, as T2 has already read the uncommitted value written by T1.
c. Strict Schedule: A schedule is strict if, for any two transactions Ti and Tj, whenever a write operation of Ti precedes a conflicting operation of Tj (either read or write), the commit or abort event of Ti also precedes that conflicting operation of Tj. In other words, Tj can read or write a value updated or written by Ti only after Ti commits or aborts.
Example: Consider the following schedule involving two transactions T1 and T2.

T1            T2
R(A)
              R(A)
W(A)
commit
              W(A)
              R(A)
              commit

This is a strict schedule, since T2 reads and writes A, which is written by T1, only after the commit of T1.
Note - It can be seen that strict schedules are a subset of cascadeless schedules, which are in turn a subset of recoverable schedules.
DBMS | Conflict Serializability
Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. Conflicting operations: two operations are said to be conflicting if all of the following conditions are satisfied: they belong to different transactions, they operate on the same data item, and at least one of them is a write operation. Example:
The pair (R1(A), W2(A)) is conflicting because the operations belong to two different transactions, act on the same data item A, and one of them is a write operation. Similarly, the pairs (W1(A), W2(A)) and (W1(A), R2(A)) are also conflicting. On the other hand, the pair (R1(A), W2(B)) is non-conflicting because the operations act on different data items. Similarly, (W1(A), W2(B)) is non-conflicting. Consider the following schedule:
S1: R1(A), W1(A), R2(A), W2(A), R1(B), W1(B), R2(B), W2(B)
If Oi and Oj are two operations in a transaction and Oi < Oj (Oi is executed before Oj), the same order will be followed in the schedule as well. Using this property, we can extract the two transactions of schedule S1 as:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)
Possible serial schedules are T1 -> T2 or T2 -> T1.
-> Swapping R1(B) in S1 past the non-conflicting operations W2(A) and R2(A) (different data items in both pairs), the schedule becomes
S11: R1(A), W1(A), R1(B), R2(A), W2(A), W1(B), R2(B), W2(B)
-> Similarly, swapping W1(B) in S11 past the non-conflicting operations W2(A) and R2(A), the schedule becomes
S12: R1(A), W1(A), R1(B), W1(B), R2(A), W2(A), R2(B), W2(B)
S12 is a serial schedule in which all operations of T1 are performed before any operation of T2 starts. Since S1 has been transformed into the serial schedule S12 by swapping non-conflicting operations, S1 is conflict serializable.
Let us take another schedule:
S2: R2(A), W2(A), R1(A), W1(A), R1(B), W1(B), R2(B), W2(B)

The two transactions are:
T1: R1(A), W1(A), R1(B), W1(B)
T2: R2(A), W2(A), R2(B), W2(B)

Possible serial schedules are T1 -> T2 or T2 -> T1.
To reach the serial schedule T1 -> T2, the operations R1(A) and W1(A) would have to be moved before R2(A) and W2(A), but those pairs conflict (same data item A, at least one write). To reach T2 -> T1, the operations R2(B) and W2(B) would have to be moved before R1(B) and W1(B), which conflict on B in the same way. Since neither serial order can be reached by swapping non-conflicting operations, S2 is not conflict serializable.
Conflict Equivalent: Two schedules are said to be conflict equivalent when one can be transformed into the other by swapping non-conflicting operations. In the example discussed above, S11 is conflict equivalent to S1 (S1 can be converted to S11 by swapping non-conflicting operations); similarly, S11 is conflict equivalent to S12, and so on.
Note: A schedule that is conflict serializable is always conflict equivalent to one of the serial schedules. Schedule S1 above (which is conflict serializable) is conflict equivalent to the serial schedule T1 -> T2.
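This swap argument can be mechanised with a precedence graph: add an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable exactly when the graph is acyclic. The sketch below is an assumed illustration (the (txn, action, item) encoding and the function name are ours, not from the article), applied to S1 and S2:

    # Sketch of a conflict-serializability test via the precedence graph.
    def conflict_serializable(schedule):
        graph = {}
        for i, (t1, a1, x1) in enumerate(schedule):
            for (t2, a2, x2) in schedule[i + 1:]:
                if t1 != t2 and x1 == x2 and 'W' in (a1, a2):
                    graph.setdefault(t1, set()).add(t2)  # Ti precedes Tj
        visited, on_stack = set(), set()
        def has_cycle(u):                    # depth-first cycle detection
            visited.add(u); on_stack.add(u)
            for v in graph.get(u, ()):
                if v in on_stack or (v not in visited and has_cycle(v)):
                    return True
            on_stack.discard(u)
            return False
        return not any(has_cycle(u) for u in list(graph) if u not in visited)

    S1 = [(1,'R','A'),(1,'W','A'),(2,'R','A'),(2,'W','A'),
          (1,'R','B'),(1,'W','B'),(2,'R','B'),(2,'W','B')]
    S2 = [(2,'R','A'),(2,'W','A'),(1,'R','A'),(1,'W','A'),
          (1,'R','B'),(1,'W','B'),(2,'R','B'),(2,'W','B')]
    print(conflict_serializable(S1))  # True : only the edge T1 -> T2
    print(conflict_serializable(S2))  # False: T2 -> T1 on A, T1 -> T2 on B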
Question: Consider the following schedules involving two transactions. Which one of the following statements is true? [GATE 2007]
S1: R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X)
S2: R1(X) R2(X) R2(Y) W2(Y) R1(Y) W1(X)
(A) Both S1 and S2 are conflict serializable
(B) Only S1 is conflict serializable
(C) Only S2 is conflict serializable
(D) None
Solution: The two transactions of the given schedules are:
T1: R1(X) R1(Y) W1(X)
T2: R2(X) R2(Y) W2(Y)
Let us first check the serializability of S1:
S1: R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X)
To convert it to a serial schedule, we have to swap non-conflicting operations so that S1 becomes equivalent to the serial schedule T1 -> T2 or T2 -> T1. For T1 -> T2 we would have to swap R2(X) and W1(X), but they are conflicting; for T2 -> T1 we would likewise have to swap the conflicting pair R1(Y) and W2(Y). So S1 cannot be converted to a serial schedule.
Now, let us check the serializability of S2:
S2: R1(X) R2(X) R2(Y) W2(Y) R1(Y) W1(X)
Swapping the non-conflicting operations R1(X) and R2(X) of S2, we get
S2': R2(X) R1(X) R2(Y) W2(Y) R1(Y) W1(X)
Again, swapping the non-conflicting operations R1(X) and R2(Y) of S2', we get
S2'': R2(X) R2(Y) R1(X) W2(Y) R1(Y) W1(X)
Again, swapping the non-conflicting operations R1(X) and W2(Y) of S2'', we get
S2''': R2(X) R2(Y) W2(Y) R1(X) R1(Y) W1(X)
which is equivalent to a serial schedule T2->T1.
So, the correct option is (C): only S2 is conflict serializable.
View Serializability
Concurrent execution is allowed in a DBMS so that processing is not slowed down and resources can be shifted when and where necessary. But concurrency also creates a problem of resource allocation: if an ongoing transaction is in progress and interleaving occurs in between, the data may become inconsistent. To maintain consistency, the interleaving must be tracked and checked. Isolation is one factor that must be kept in mind: it signifies whether the individual transactions behave as if they ran in isolation, even when concurrency and interleaving of processes are allowed. Let's look at the transactions T1 and T2 below.
The impact of the interleaved schedule must be as if T1 executed before T2 (T1 -> T2) or T2 executed before T1 (T2 -> T1); the execution must be equivalent to one of these conditions. This is known as serializability, and it is used to verify whether an interleaved schedule is valid or not. It is of two types: Conflict Serializability and View Serializability.
Let's talk about View Serializability. Considering the above transactions, to check for view serializability we need to form both serial sequences, T1 -> T2 and T2 -> T1, and check whether the original sequence is view equivalent to either of them. Let's generate T1 -> T2 and check whether it is view equivalent to the table:

T1: Read(X); X = X - 10; Write(X); Read(Y); Y = Y + 10; Write(Y)
T2: Read(Y); Y = Y - 10; Write(Y); Read(Z); Z = Z + 10; Write(Z)
To check for view serializability, we need to check these three conditions for every data item X, Y and Z:
Initial Read: The initial read of every data item must be done by the same transaction in both schedules. Let's check for T1 -> T2: for X this is valid, as in both cases X is read first by T1. For Y this condition fails: in the serial case T1 reads Y first, but in the original table Y is first read by T2. So view equivalence fails here, and no further conditions need to be checked for T1 -> T2.
Let's check for T2 -> T1:

T2: Read(Y); Y = Y - 10; Write(Y); Read(Z); Z = Z + 10; Write(Z)
T1: Read(X); X = X - 10; Write(X); Read(Y); Y = Y + 10; Write(Y)
For Y this is valid, as in both cases Y is read first by T2. We don't need to check X and Z, as they are not shared by both transactions.
Updated Read: When a transaction reads a value written by another transaction, the same reads-from relationship must hold in both schedules. We don't need to check X and Z, as they are not shared. In the case of Y under T2 -> T1, Y is written by T2 first and then read by T1 in both schedules, hence the condition is valid.
Final Write: The last write on each data item must be performed by the same transaction in both schedules. Checking the interleaved schedule against the serial schedule T2 -> T1, the final write of Y is done by T1 in both cases.
Therefore, we can conclude that the schedule is view serializable, being view equivalent to the serial order T2 -> T1.
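The three checks can also be run mechanically in a single pass. Since the original interleaving appears only in a figure, the sketch below assumes one plausible interleaving consistent with the discussion (T2 reads Y before T1 does); the encoding and helper name are ours, and each transaction is assumed to read each item at most once:

    # view_info records, for each read, which writer it sees (None =
    # initial value) and, for each item, the final writer. Two schedules
    # over the same transactions are view equivalent iff both match.
    def view_info(schedule):
        last_writer, reads = {}, set()
        for t, a, x in schedule:
            if a == 'R':
                reads.add((t, x, last_writer.get(x)))  # initial/updated read
            else:
                last_writer[x] = t                     # candidate final write
        return reads, last_writer

    T1 = [(1,'R','X'),(1,'W','X'),(1,'R','Y'),(1,'W','Y')]
    T2 = [(2,'R','Y'),(2,'W','Y'),(2,'R','Z'),(2,'W','Z')]
    # Assumed interleaving: T2 reads Y first, as described in the text.
    S = [T2[0], T1[0], T1[1], T2[1], T1[2], T1[3], T2[2], T2[3]]
    print(view_info(S) == view_info(T1 + T2))  # False: initial read of Y differs
    print(view_info(S) == view_info(T2 + T1))  # True : view equivalent to T2 -> T1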
DBMS | Types of Recoverable Schedules
Generally, there are three types of recoverable schedule, given as follows: recoverable, cascadeless and strict.
Let's discuss each one of these in detail.
1. Recoverable Schedule: Example:
S1: R1(x), W1(x), R2(x), R1(y), R2(y), W2(x), W1(y), C1, C2;

The schedule follows the order Ti -> Tj => C1 -> C2. Here T2 reads x (R2(x)) after T1's write W1(x), and T1 commits (C1) before T2 commits (C2); the transaction that performed the first update on data item x commits first, hence the given schedule is recoverable.
Let's see an example of an unrecoverable schedule to make the concept clearer:
S2: R1(x), R2(x), R1(z), R3(x), R3(y), W1(x), W3(y), R2(y), W2(z), W2(y), C1, C2, C3;

Here T2 reads y (R2(y)) after T3's uncommitted write W3(y), yet T2 commits before T3 (C2 before C3). If T3 failed after C2, the already-committed T2 could not be rolled back, so the given schedule is unrecoverable. If the schedule instead followed Ti -> Tj => C3 -> C2, it would become a recoverable schedule.
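This commit-order test can be sketched in code: collect, for each transaction, the set of transactions it read from, then require each reader to commit after all of its sources. The encoding ('R'/'W'/'C' events) and names below are assumptions for illustration, not from the article:

    # Recoverability check (sketch). Events are (txn, action, item);
    # action 'C' marks a commit (item ignored). Assumes every
    # transaction eventually commits.
    def recoverable(schedule):
        last_writer, reads_from, commit_order = {}, {}, []
        for t, a, x in schedule:
            if a == 'W':
                last_writer[x] = t
            elif a == 'R':
                src = last_writer.get(x)
                if src is not None and src != t:
                    reads_from.setdefault(t, set()).add(src)  # t read from src
            else:                                             # 'C' = commit
                commit_order.append(t)
        pos = {t: i for i, t in enumerate(commit_order)}
        # Every reader must commit after every transaction it read from.
        return all(pos[src] < pos[t]
                   for t, srcs in reads_from.items() for src in srcs)

    S1 = [(1,'R','x'),(1,'W','x'),(2,'R','x'),(1,'R','y'),(2,'R','y'),
          (2,'W','x'),(1,'W','y'),(1,'C',None),(2,'C',None)]
    S2 = [(1,'R','x'),(2,'R','x'),(1,'R','z'),(3,'R','x'),(3,'R','y'),
          (1,'W','x'),(3,'W','y'),(2,'R','y'),(2,'W','z'),(2,'W','y'),
          (1,'C',None),(2,'C',None),(3,'C',None)]
    print(recoverable(S1))  # True : T2 reads x from T1 and C1 precedes C2
    print(recoverable(S2))  # False: T2 reads y from T3 but C2 precedes C3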
Note: A committed transaction should never be rolled back. If a transaction reads a value written by an uncommitted transaction and then commits, the schedule can end up in an inconsistent, unrecoverable state; this is called the Dirty Read problem.
2. Cascadeless Schedule: Example:
S3: R1(x), R2(z), R3(x), R1(z), R2(y), R3(y), W1(x), C1, W2(z), W3(y), W2(y), C3, C2;

In this schedule, the only conflict on y is the overwrite pair W3(y), W2(y); no transaction reads another transaction's uncommitted write, therefore the given schedule is cascadeless.
Special Case: a committed transaction may later be desired to abort. As long as all the transactions read only committed data, the schedule is still cascadeless.
Cascading Abort: a cascading abort is also a rollback. If transaction T1 aborts after T2 has read data written by the still-uncommitted T1, then T2 must be rolled back as well; hence it is a cascading rollback.
3. Strict Schedule: Example:
S4: R1(x), R2(x), R1(z), R3(x), R3(y), W1(x), C1, W3(y), C3, R2(y), W2(z), W2(y), C2;

In this schedule, no read-write or write-write conflict arises before the writing transaction commits, hence it is a strict schedule.
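The cascadeless and strict conditions can be tested the same way as recoverability: a read must see only committed data (cascadeless), and no read or write may touch an item while another transaction holds an uncommitted write on it (strict). The sketch below is an assumed illustration using the same event encoding, applied to S3 and S4 above:

    # Cascadeless and strict checks (sketch); events are (txn, action, item).
    def cascadeless(schedule):
        last_writer, done = {}, set()
        for t, a, x in schedule:
            if a == 'W':
                last_writer[x] = t
            elif a == 'R':
                src = last_writer.get(x)
                if src is not None and src != t and src not in done:
                    return False        # read of an uncommitted write
            else:
                done.add(t)             # 'C' = commit
        return True

    def strict(schedule):
        last_writer, done = {}, set()
        for t, a, x in schedule:
            src = last_writer.get(x) if a in ('R', 'W') else None
            if src is not None and src != t and src not in done:
                return False            # conflicts with an uncommitted write
            if a == 'W':
                last_writer[x] = t
            elif a == 'C':
                done.add(t)
        return True

    S3 = [(1,'R','x'),(2,'R','z'),(3,'R','x'),(1,'R','z'),(2,'R','y'),
          (3,'R','y'),(1,'W','x'),(1,'C',None),(2,'W','z'),(3,'W','y'),
          (2,'W','y'),(3,'C',None),(2,'C',None)]
    S4 = [(1,'R','x'),(2,'R','x'),(1,'R','z'),(3,'R','x'),(3,'R','y'),
          (1,'W','x'),(1,'C',None),(3,'W','y'),(3,'C',None),(2,'R','y'),
          (2,'W','z'),(2,'W','y'),(2,'C',None)]
    print(cascadeless(S3), strict(S3))  # True False: W2(y) overwrites uncommitted W3(y)
    print(cascadeless(S4), strict(S4))  # True True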
Correlation between strict, cascadeless and recoverable schedules: strict schedules are a subset of cascadeless schedules, which are in turn a subset of recoverable schedules.
Rollback and its types in DBMS
What is a Rollback?
A rollback refers to an operation in which the database is brought back to some previous state, in order to recover lost data or to overcome an unexpected situation. A rollback not only restores the database to a stable position but also helps to maintain its integrity.
A transaction may not execute completely due to hardware failure, system crash or software issues. In that case, we have to roll back the failed transaction. But some other transaction may also have used values produced by the failed transaction. So we have to roll back those transactions as well.
Types of Rollback
Dirty Read Problem and How to avoid it in DBMS
What is a Dirty Read Problem?
When a transaction reads data from an uncommitted write in another transaction, it is known as a Dirty Read. If that writing transaction fails, or updates the written data again, the value that was read becomes invalid; this causes the Dirty Read Problem.
In other words, it is called a dirty read because there is always a chance that the uncommitted transaction will roll back later. Thus, the uncommitted transaction might make other transactions read a value that never actually existed. This leads to inconsistency of the database.
For example, let's say transaction 1 updates a row and leaves it uncommitted; meanwhile, transaction 2 reads the updated row. If transaction 1 rolls back the change, transaction 2 will have read data that is considered never to have existed.
Note that there is no dirty read problem if a transaction reads from another transaction that has already committed; in that case, no rollback is required.
What does the Dirty Read Problem result in?
Dirty Read Problem, in DBMS, results in the occurrence of Cascading Rollbacks.
Cascading Rollback: If, in a schedule, the failure of one transaction causes several other dependent transactions to roll back or abort, then such a schedule is called a cascading rollback, cascading abort or cascading schedule. It leads to the wastage of CPU time. These cascading rollbacks occur because of dirty reads.
For example, transaction T1 writes an uncommitted value of x that is read by transaction T2; T2 in turn writes an uncommitted value of x that is read by transaction T3. Suppose at this point T1 fails. T1 must be rolled back; since T2 is dependent on T1, T2 must be rolled back; and since T3 is dependent on T2, T3 must be rolled back too (a cascading dirty read problem).

This phenomenon, in which a single transaction failure leads to a series of transaction rollbacks, is called a cascading rollback.
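The chain of forced rollbacks can be computed from the reads-from dependencies: starting from the failed transaction, transitively abort every transaction that read its uncommitted data. A minimal assumed sketch (the mapping and names are ours):

    # dependents maps a writer to the transactions that read its
    # uncommitted data; aborting the failed transaction cascades
    # transitively through this map.
    def cascade_abort(failed, dependents):
        to_abort, stack = set(), [failed]
        while stack:
            t = stack.pop()
            if t not in to_abort:
                to_abort.add(t)
                stack.extend(dependents.get(t, ()))
        return to_abort

    # T2 read uncommitted data of T1, and T3 read uncommitted data of T2:
    print(cascade_abort('T1', {'T1': ['T2'], 'T2': ['T3']}))
    # {'T1', 'T2', 'T3'} - all three must be rolled back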
How to avoid Dirty Read Problem?
Dirty Read Problem can be avoided with the help of Cascadeless Schedule.
Cascadeless Schedule: This schedule avoids all possible Dirty Read Problems.
In a cascadeless schedule, if a transaction is going to perform a read operation on a value, it has to wait until the transaction performing the write on that value commits. This ensures there can be no dirty read, and therefore no cascading rollback, which would be inefficient.
In other words, if some transaction Tj wants to read a value updated or written by some other transaction Ti, then Tj must read it only after Ti commits.
Note that a cascadeless schedule allows only committed read operations; however, it does allow uncommitted write operations.
Also note that cascadeless schedules are always recoverable, but not all recoverable schedules are cascadeless.
DBMS | Concurrency Control Protocols
Introduction to Concurrency Control: In a present-day multi-user system, more than one transaction may need access to the database at the same time, which can cause confusion and conflict between the operations in progress. Conflicts can be kept in check during read operations, but the situation worsens when a single data item in memory must be written by multiple transactions. This is where concurrency control comes into play; it is an important procedure for the smooth flow of operations during database transactions. It allows multiple operations to execute simultaneously without giving rise to any deadlock situation, and in doing so protects the data integrity of every user's database. Example: suppose an online product is to be bought by two different persons, but only one unit is available. If both of them begin the buying process at the same time, this may lead to a conflict: in the absence of concurrency control, both might end up buying the single product, which would be a fault. Concurrency control avoids this condition by allowing both users to access the database throughout the ordering process, but letting only the customer who completes the transaction first make the purchase.
Various protocols in concurrency control:
Lock-Based Protocols
Two Phase Locking
Timestamp-Based Protocols
In Two Phase Locking, every transaction passes through two phases:
Growing Phase: new locks on data items may be acquired, but none can be released.
Shrinking Phase: existing locks may be released, but no new locks can be acquired.
Although the protocol offers serializability, it doesn't avoid the deadlock situation. This can be further divided into a few types:
a. Strict 2PL: This method does not release a lock after every use; it holds all locks until the whole process is over and releases them in one go. b. Centralized 2PL: This mechanism has a single lock manager for the management of the entire process. c. Primary copy 2PL: In this type, copies of the data are distributed to various sites, but each data item is controlled by a single primary lock manager; once the main copy has been updated, the change is reflected in all the associated copies. d. Distributed 2PL: In this type, all the sites are provided with lock managers to manage the locks on the data at those sites. It resembles primary copy 2PL when the lock managers remain unchanged, though its communication cost is higher.
The main idea of this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Deadlocks can be avoided, since no transaction ever waits. One disadvantage of this mechanism is that it may lead to starvation if the same transaction is continuously aborted and restarted.
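For concreteness, the basic timestamp-ordering rule can be sketched as follows: each data item keeps the largest read and write timestamps seen so far, and an operation from a transaction older than a conflicting operation already recorded is rejected (that transaction would then be aborted and restarted). This is an assumed minimal sketch of the textbook rule, not code from the article:

    # Basic timestamp ordering (sketch). Each transaction has a fixed
    # timestamp ts; each item tracks the largest read/write timestamps.
    class Item:
        def __init__(self):
            self.rts = 0   # largest timestamp that has read this item
            self.wts = 0   # largest timestamp that has written this item

    def read(ts, item):
        if ts < item.wts:            # a younger transaction already wrote
            return False             # reject: abort and restart the reader
        item.rts = max(item.rts, ts)
        return True

    def write(ts, item):
        if ts < item.rts or ts < item.wts:   # a younger txn already read/wrote
            return False                     # reject: abort and restart
        item.wts = ts
        return True

    x = Item()
    print(write(2, x))  # True : T2 writes x, so WTS(x) = 2
    print(read(1, x))   # False: older T1 may not read T2's newer value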
The advantages of using concurrency control protocols are:
It helps to maintain serializability.
It preserves data integrity.
It fixes read-write issues, if any.
It gives concurrent transactions a smoother workflow.
Two Phase Locking Protocol
Now, recalling where we last left off, there are two types of locks available: Shared S(a) and Exclusive X(a). Implementing this lock system without any restrictions gives us the simple lock-based protocol (or binary locking), but it has its own disadvantage: it does not guarantee serializability. A schedule may follow the preceding rules and still be non-serializable.
To guarantee serializability, we must follow some additional protocol concerning the positioning of locking and unlocking operations in every transaction. This is where the concept of Two-Phase Locking (2-PL) comes into the picture; 2-PL ensures serializability. Now, let's dig deep!
Let's get familiar with the two common locks used and some terminology followed in this protocol. Shared Lock (S): also known as a read-only lock. As the name suggests, it can be shared between transactions, because while holding this lock a transaction does not have permission to update data on the data item. An S-lock is requested using the lock-S instruction. Exclusive Lock (X): with this lock, the data item can be both read and written. It is exclusive, and cannot be held simultaneously on the same data item by more than one transaction. An X-lock is requested using the lock-X instruction.
A transaction is said to follow the Two Phase Locking protocol if all its locking and unlocking operations can be divided into the two phases above.
LOCK POINT: the point at which the growing phase ends, i.e., when the transaction takes the final lock it needs to carry on its work. Note - If lock conversion is allowed, then upgrading a lock (from S(a) to X(a)) is allowed only in the growing phase, and downgrading a lock (from X(a) to S(a)) must be done in the shrinking phase. Let's see a transaction implementing 2-PL.
This is just a skeleton transaction showing how unlocking and locking work with 2-PL. For transaction T1, the growing phase runs up to the lock point of T1, followed by its shrinking phase, and likewise for transaction T2.
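Since the schedule itself is shown in a figure, the assumed sketch below demonstrates the two-phase discipline directly: a small wrapper (our own construction, not from the article) that refuses any lock request once the transaction has released a lock, i.e. once its lock point has passed:

    # Sketch of the 2-PL discipline: once any lock is released, the
    # shrinking phase has begun and no new lock may be acquired.
    class TwoPhaseTxn:
        def __init__(self):
            self.held, self.shrinking = set(), False
        def lock(self, item):          # allowed only in the growing phase
            if self.shrinking:
                raise RuntimeError("2-PL violation: lock after unlock")
            self.held.add(item)
        def unlock(self, item):        # starts/continues the shrinking phase
            self.shrinking = True      # the lock point has passed
            self.held.discard(item)

    t1 = TwoPhaseTxn()
    t1.lock('A'); t1.lock('B')   # growing phase; lock point is at lock('B')
    t1.unlock('A')               # shrinking phase begins
    t1.unlock('B')
    # t1.lock('C') would now raise: forbidden under 2-PL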
This schedule also takes care of conflict serializability, as the transactions do not conflict with each other; the lock mechanism and the lock point avoid the conflict. Since the schedule is conflict serializable, it automatically ensures view serializability as well. In the above example, it is conflict equivalent to the serial order T2 -> T1. To check this, let's draw the precedence graph: since there is no cycle involved, serializability is ensured.
Problems with Basic 2-PL: Deadlock in 2-PL - Consider this simple example with two transactions T1 and T2. Schedule:
T1: Lock-X(A)
T2: Lock-X(B)
T1: Lock-X(B)
T2: Lock-X(A)

Here T1's Lock-X(B) has to wait for T2, and T2's Lock-X(A) has to wait for T1; neither can proceed.
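This deadlock shows up as a cycle in the wait-for graph (T1 waits for T2 and T2 waits for T1). A minimal assumed sketch of the detection step (representation and names are ours):

    # Wait-for graph: an edge u -> v means u waits for a lock held by v;
    # a cycle in this graph means deadlock.
    def has_deadlock(wait_for):
        visited, on_stack = set(), set()
        def dfs(u):
            visited.add(u); on_stack.add(u)
            for v in wait_for.get(u, ()):
                if v in on_stack or (v not in visited and dfs(v)):
                    return True
            on_stack.discard(u)
            return False
        return any(dfs(u) for u in list(wait_for) if u not in visited)

    # T1 holds A and waits for B; T2 holds B and waits for A:
    print(has_deadlock({'T1': ['T2'], 'T2': ['T1']}))  # True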
Recoverability in 2-PL - In the example below, a dirty read occurs between Write(A) of T1 and Read(A) of T2, leading to irrecoverability.
Hence this must be avoided, and thus basic 2PL fails to resolve these issues.
Conservative, Strict and Rigorous 2PL
Now that we are familiar with Two Phase Locking (2-PL), the basic rules that ensure serializability, and the problems with 2-PL (cascading aborts and deadlocks), we turn to the enhancements made to 2-PL that try to make the protocol nearly error-free. Briefly, we allow some modifications to 2-PL to improve it. There are three categories:
1. Conservative 2-PL: Also known as Static 2-PL, this protocol requires the transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set. If any of the predeclared items cannot be locked, the transaction does not lock any of them; instead, it waits until all the items are available for locking. Conservative 2-PL is deadlock free, but it does not ensure strict schedules. It is also difficult to use in practice, because of the need to predeclare the read-set and the write-set, which is not possible in many situations. In practice, the most popular variation of 2-PL is Strict 2-PL.
2. Strict 2-PL: This requires that, in addition to the locking being two-phase, all Exclusive (X) locks held by the transaction not be released until after the transaction commits. Following Strict 2-PL ensures that our schedule is recoverable and cascadeless. Hence it gives us freedom from the cascading aborts that were still possible in basic 2-PL, and moreover guarantees strict schedules; but deadlocks are still possible!
3. Rigorous 2-PL: This requires that, in addition to the locking being two-phase, all Exclusive (X) and Shared (S) locks held by the transaction not be released until after the transaction commits. Following Rigorous 2-PL likewise ensures that our schedule is recoverable and cascadeless, frees us from cascading aborts, and guarantees strict schedules; but deadlocks are still possible!
Note the difference between Strict 2-PL and Rigorous 2-PL: Rigorous is more restrictive, as it requires both Exclusive and Shared locks to be held until after the transaction commits, and this is what makes Rigorous 2-PL easier to implement.
The Venn diagram below shows the classification of schedules that are rigorous and strict. The universe represents the schedules that can be produced under 2-PL. As the diagram suggests, and as can be concluded logically, if a schedule is rigorous then it is also strict. Another way to think of it: one restriction on a schedule makes it strict; adding another restriction to the list makes it rigorous.
Image - Venn diagram showing categories of schedules under 2-PL