Download Database Management Systems (DBMS) Synopsis: A Comprehensive Guide to Database Architectur and more Schemes and Mind Maps Database Management Systems (DBMS) in PDF only on Docsity!
DATABASE MANAGEMENT SYSTEMS (DBMS) SYNOPSIS
Unit-1: Database System Architecture and Data Models:
Data Abstraction, Data Independence, Data Definition Language (DDL), Data Manipulation Language (DML), Entity-relationship model, network model, relational and object oriented data models, integrity constraints, data manipulation operations.
Unit-2: Relational Query Languages and Relational Database Design:
Relational algebra, Tuple and domain relational calculus, SQL3, DDL and DML constructs, Open source and Commercial DBMS - MYSQL, ORACLE, DB2, SQL server.
Unit-3: Query Processing and Optimization and Storage Strategies:
Evaluation of relational algebra expressions, Query equivalence, Join strategies, Query optimization algorithms, Indices, B-trees, hashing.
Unit-4: Transaction Processing and Database Security:
Concurrency control, ACID property, Serializability of scheduling, Locking and timestamp based schedulers, Multi-version and optimistic Concurrency Control schemes, Database recovery Authentication, Authorization and access control.
Unit-5: SQL and PL/SQL Concepts:
Basics of SQL, DDL,DML,DCL, structure – creation, alteration, defining constraints – Primary key, foreign key, unique, not null, check, IN operator, aggregate functions, Built-in functions – numeric, date, string functions, set operations, sub-queries, correlated sub-queries, join, Exist, Any, All , view and its types., transaction control commands
---------------------------
Unit-1: Database System Architecture and Data Models:
Data Abstraction, Data Independence, Data Definition Language (DDL), Data Manipulation Language (DML), Entity-relationship model, network model, relational and object oriented data models, integrity constraints, data manipulation operations.
- A database management system (or DBMS) is essentially nothing more than a computerized data-keeping system. Users of the system are given facilities to perform several kinds of operations on such a system for either manipulation of the data in the database or the management of the database structure itself.
- The DBMS manages incoming data, organizes it, and provides ways for the data to be modified or extracted by users or other programs. Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, RDBMS, dBASE, Clipper, and FoxPro.
- FORMS allow you to both add data to tables and view data that already exists. REPORTS present data from tables and also from QUERIES, which then search for and analyze data within these same tables.
Database System Applications, Purpose of Database Systems, Railway Reservation System; Library Management System; Banking; Education Sector; Credit card exchanges; Social Media Sites; Pinterest, Quora, Facebook, Twitter, and Linked in; Broadcast communications; Account; Online Shopping; Human Resource Management; Manufacturing; Airline Reservation System; etc.
- A representation of DBMS design- It helps to design, develop, implement, and maintain the database management system.
- One Tier Architecture (Single Tier Architecture)
- Two Tier Architecture
- Three Tier Architecture
- The following features are desirable in a database system used in transaction processing systems: Good data placement: The database should be designed to access patterns of data from many simultaneous users.
- Short transactions: Short transactions enables quick processing. This avoids concurrency and paces the systems.
- Real-time backup: Backup should be scheduled between low times of activity to prevent lag of the server.
- High normalization: This lowers redundant information to increase the speed and improve concurrency, this also improves backups.
- Physical Data Independence: This is defined as the ability to modify the physical schema of the database without the modification causing any changes in the logical/conceptual or view/external level.
- Examples of Physical Data Independence:
- Changing from one data structure to another.
- Making use of new storage technology, such as a hard drive or magnetic tapes
- Change the location of the database from one drive to another.
- Changing the database's file organization.
- Logical Data Independence
- Logical data independence is the ability to modify logical schema without causing any unwanted modifications to the external schema or the application programs to be rewritten.
- Logical data is database data, which means it stores information about how data is managed within the database. Logical data independence is a method that makes sure that if we make modifications to the table format, the data should not be affected.
- Examples of Logical Data Independence:
- Without rewriting current application scripts, you can add, modify, or delete a new attribute, entity, or relationship.
- To divide an existing record into two or more records.
- Merging two records into a single one.
- A DATA DEFINITION LANGUAGE (DDL) is a computer language used to create and modify the structure of database objects in a database. These database objects include views, schemas, tables, indexes, etc. aka DATA DESCRIPTION LANGUAGE
- It is considered to be a subset of SQL (STRUCTURED QUERY LANGUAGE).
- Common examples of DDL statements include CREATE, ALTER, and DROP.
- A DATA MANIPULATION LANGUAGE (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language.
DDL- Data definition language, DCL= Data control language, DML = Data manipulation language, TCL- transaction control language, DQL – Data query language
- ER model (Entity-Relationship model) It is a high-level data model. This model is used to define the data elements and relationship for a specified system.
- Four types of relationships exist in relational database design:
- ONE TO ONE - where one table record relates to another record in another table
- ONE TO MANY - where one table record relates to multiple records in another table
- MANY TO ONE - where more than one table record relates to another table record
- MANY TO MANY - where multiple records relate to more than one record in another table
weak entity setsThe entity sets which do not have sufficient attributes to form a primary key are known as weak entity sets and the entity sets which have a primary key are known as strong entity sets. As the weak entities do not have any primary key, they cannot be identified on their own, so they depend on some other entity (known as owner entity). The weak entities have total participation constraint (existence dependency) in its identifying relationship with owner identity.
- A network model of DBMS is a way of organizing data in which multiple many-to-many relationships between records are represented using a graph-like structure. In this model, records are represented as nodes in the graph, and relationships between records are
represented as edges connecting the nodes. A network model of DBMS can handle complex data structures and support multiple paths to access the same data. A network model of DBMS was formalized by the Database Task group in the 1960s https://www.geeksforgeeks.org/network-model-in-dbms/ and was one of the most popular models before the introduction of the relational model.
- Some of the advantages of a network model of DBMS are:
- • It is simple and easy to design like the hierarchical model.
- • It can represent 1:1, 1:M, and M:N relationships among entities.
- • It can avoid data redundancy problems by supporting multiple links to the same record.
- • It can provide fast and efficient data access by using pointers or direct links.
- Some of the disadvantages of a network model of DBMS are:
- • It is difficult to maintain and modify the database schema as any change in the structure affects all the related records.
- • It requires complex programming and navigation logic to traverse the database graph.
- • It is not compatible with standard query languages like SQL.
- Some examples of network model of DBMS are:
- • IDMS (Integrated Database Management System) by CA Technologies
- • RDM (Relational Data Model) by Raima Inc.
- • TurboIMAGE by Hewlett-Packard
- What is semi-structured data in database?
- Semi-structured data refers to data that is not captured or formatted in conventional ways. Semi-structured data does not follow the format of a tabular data model or relational databases because it does not have a fixed schema.
- An object database is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases which are table-oriented.
- Object oriented data models are a way of organizing data in which both data and their relationships are contained in a single structure called an object. An object is an abstraction of a real-world entity that has attributes (properties) and methods (behaviors). Object oriented data models are used to represent complex and diverse data types, such as images, audio, video, and other multimedia. Object oriented data models also support features such as
Polymorphism is the ability of an object to take on many forms. The most common use of polymorphism in OOP occurs when a parent class reference is used to refer to a child class object. Object-oriented programming -ability to process objects differently depending on their class or data type.
- Object-Oriented Database featuresMost contain the following features:
- Query Language - Language to find objects and retrieve data from the database.
- Transparent Persistence-Ability to use an object-oriented programming language for data manipulation.
- ACID Transactions- Atomicity, Consistency, integrity and durability transactions guarantee all transactions are complete without conflicting changes.
- Database Caching-Creates a partial replica of the database. Allows access to a database from program memory instead of a disk.
- Disaster recovery - in case of application or system failure.
- EncapsulationThe provision of an interface for a piece of software or hardware to allow or simplify access for the user.
- We can create a fully encapsulated class in Java by making all the data members of the class private. Now we can use setter and getter methods to set and get the data in it.
- Inheritance is an important pillar of OOP(Object-Oriented Programming). It is the mechanism in java by which one class is allowed to inherit the features(fields and methods) of another class.
- Super Class: The class whose features are inherited is known as superclass(or a base class or a parent class).
- Sub Class: The class that inherits the other class is known as a subclass(or a derived class, extended class, or child class). The subclass can add its own fields and methods in addition to the superclass fields and methods.
- Reusability: Inheritance supports the concept of “reusability”,
- Some prominent ODBMSMongoDB-NoSQL- data objects are stored as documents
- Oracle, Microsoft SQL, IBM DB2 and PostGreSQL support objects
- LISP (List programming), Small Talk
- Applications of ODBMSEngineering design, CAD, STAAD, VEDA, 3D
- Architecture, building information modeling
- Astronomy
- Real time systems for manufacturing, traffic management, ATC etc.
- Telecommunications,
- Molecular sciences
- Hi performance, calculations and fast results
- Integrity constraintsIntegrity constraints are rules that help to maintain the accuracy and consistency of data in a database. They can be used to enforce business rules or to ensure that data is entered correctly. There are different types of integrity constraints in DBMS, such as:
- • Domain constraints: These specify the valid values for an attribute or a column in a table. For example, if the domain of an attribute is integer, then only integer values can be inserted into that column.
- • Key constraints: These ensure that each row in a table is uniquely identified by a primary key or a candidate key. A primary key is a minimal set of attributes that can uniquely identify a row, and a candidate key is any other set of attributes that can also serve as a primary key For example, if the primary key of a table is employee ID, then no two rows can have the same employee ID value
- Integrity constraints are important for maintaining the quality and reliability of data in a database. They can prevent data anomalies, such as insertion, deletion, and update anomalies, that can cause inconsistency and redundancy in the database. They can also
Unit-2: Relational Query Languages and Relational Database Design:
Relational algebra, Tuple and domain relational calculus, SQL3, DDL and DML constructs, Open source and Commercial DBMS - MYSQL, ORACLE, DB2, SQL server.
Relational Algebra is procedural query language, which takes Relation as input and generate relation as output. Relational algebra mainly provides theoretical foundation for relational databases and SQL.
- Operators in Relational Algebra
- Projection (π) Projection is used to project required column data from a relation.
By Default projection removes duplicate data.
Selection
- (σ) Selection is used to select required tuples of the relations.
- for the above relation σ (c>3)R will select the tuples which have c more than 3.
- Note: selection operator only selects the required tuples but does not display them. For displaying, data projection operator is used.
- For the above selected tuples, to display we need to use projection also.
- Union (U) Union operation in relational algebra is same as union operation in set theory, only constraint is for union of two relation both relation must have same set of Attributes.
- Set Difference (-) Set Difference in relational algebra is same set difference operation as in set theory with the constraint that both relation should have same set of attributes.
- Rename (ρ) Rename is a unary operation used for renaming attributes of a relation. ρ (a/b)R will rename the attribute ‘b’ of relation by ‘a’.
- Cross Product (X)
- Cross product between two relations let say A and B, so cross product between A X B will results all the attributes of A followed by each attribute of B. Each record of A will pairs with every record of B.
Natural Join ( ⋈ ) Natural join is a binary operator. Natural join between two or more relations will result set of all combination of tuples where they have equal common attribute. Conditional join works similar to natural join. In natural join, by default condition is equal between common attribute while in conditional join we can specify the any condition such as greater than, less than, not equalSelect(σ)The SELECT operation is used for selecting a subset of the tuples according to a given selection condition
Projection(π)The projection eliminates all attributes of the input relation but those mentioned in the projection list.
Union Operation( ∪ )UNION is symbolized by symbol. It includes all tuples that are in tables A or in B.
Set Difference(-)– Symbol denotes it. The result of A – B, is a relation which includes all tuples that are in A but not in B.
- Intersection(∩) defines a relation consisting of a set of all tuple that are in both A and B.
- Cartesian Product(X) operation is helpful to merge columns from two relations.
- Inner join, includes only those tuples that satisfy the matching criteria.
- Theta Join(θ)The general case of JOIN operation is called a Theta join. It is denoted by symbol θ.
- EQUI Join-When a theta join uses only equivalence condition, it becomes a equi join.
- Natural Join( ⋈ ) can only be performed if there is a common attribute (column) between the relations.
- Left Outer Join(=|X|)In the left outer join, operation allows keeping all tuple in the left relation.
- Right Outer join(|x|=)In the right outer join, operation allows keeping all tuple in the right relation.
- Full Outer Join(=|X|=)In a full outer join, all tuples from both relations are included in the result irrespective of the matching condition.
- Insertion.
- insert into course (course_id, title, dept_name, credits)
A ⋈ A.column 2 < B.column 2 (B)
column 1 column 2
2 2
- EQUI Join is done when a Theta join uses only the equivalence condition. EQUI join is the most difficult operation to implement efficiently in an RDBMS, and one reason why RDBMS have essential performance problems.
- For example:
- A ⋈ A.column 2 = B.column 2 (B)
- Natural Join ( ⋈ )
- Natural Join does not utilize any of the comparison operators. In this type of join, the attributes should have the same name and domain. In Natural Join, there should be at least one common attribute between two relations.
- It performs selection forming equality on those attributes which appear in both relations and eliminates the duplicate attributes.
- An Outer Join doesn’t require each record in the two join tables to have a matching record. In this type of join, the table retains each record even if no other matching record exists.
- Three types of Outer Joins are:
- Left Outer Join
- Right Outer Join
- Full Outer Join
- Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC) Tuple Relational Calculus (TRC) and Domain Relational Calculus (DRC) are two non-procedural query languages used in relational database management systems (RDBMS) to retrieve data from tables. TRC is used to select tuples from a relation based on a condition, while DRC is used to select individual values from a relation based on a condition.
- In TRC, the variables represent the tuples from specified relations, while in DRC, the variables represent the value drawn from a specified domain. TRC uses tuple variables to represent tuples, while DRC uses individual value variables.
- Here are some examples of how to use TRC and DRC: TRC: {T | EMPLOYEE (T) AND T.DEPT_ID = 10} selects all the tuples of employee names who work for Department 10.
- DRC: { | < EMPLOYEE > DEPT_ID = 10 } selects EMP_ID and EMP_NAME of employees who work for department 10.
- SQL3 is the name of the next version of the ANSI/ISO SQL standard, which is expected to introduce new features and enhancements to the SQL language. SQL3 is also known as SQL:2023, as it is planned to be published in 2023. Some of the main features of SQL3 are:
- Object-relational extensions: SQL3 will support object-oriented concepts, such as user-defined types, inheritance, polymorphism, and methods. This will allow SQL to handle complex and diverse data types, such as multimedia, spatial, and temporal data
- Persistent stored modules: SQL3 will support the creation and execution of stored procedures and functions, which are blocks of SQL code that can be invoked from within the database or from external applications. This will improve the modularity, reusability, and security of SQL code
- Triggers: SQL3 will support the definition and activation of triggers, which are special types of stored procedures that automatically execute when certain events occur in the database, such as insert, update, or delete operations. This will enable SQL to perform complex actions and enforce business rules based on data changeshttps://www.w3schools.com/sql/default.asp.
- Dynamic SQL: SQL3 will support the generation and execution of SQL statements at run time, based on user input or program logic. This will allow SQL to adapt to different situations and data sources dynamically
- Recursive queries: SQL3 will support the use of recursive queries, which are queries that refer to themselves or to other queries within the same statement. This will allow SQL to handle hierarchical and network data structures more efficiently
- SQL3 is not yet widely implemented by database systems, but some of its features are already supported by some popular databases, such as SQLite, MySQL, SQL Server, Oracle, PostgreSQL
- • Pipelining: This method evaluates several operations simultaneously in a pipeline, where the output of one operation is passed directly to the next operation without storing it in a temporary file. This method reduces the disk I/O cost and improves the performance, but it requires more memory and synchronization
- Evaluation of relational algebra expressions-2Heuristic optimization: This method applies some rules or heuristics to transform a relational algebra expression into an equivalent one that has a lower cost. For example, some heuristics are: push selections down as far as possible, push projections down as far as possible, combine selections and projections into one operation, etc
- • Cost-based optimization: This method estimates the cost of different evaluation plans for a relational algebra expression and chooses the one with the lowest cost. The cost can be measured by various factors, such as the number of disk accesses, CPU time, memory usage, etc. To estimate the cost, this method uses statistics and information about the relations, such as the size, cardinality, distribution, indexes, etc
- Query equivalenceQuery equivalence and join strategies are two important concepts in query processing and optimization, which are the steps of transforming and executing a query in a database system.
- Query equivalence means that two queries produce the same result for any database instance, regardless of the syntax or the order of operations. For example, the queries SELECT *** FROM R WHERE A = 10 and SELECT * FROM (SELECT * FROM R WHERE A = 10) are equivalent,** as they both return the same set of records from relation R that have A equal to 10. Query equivalence can be used to rewrite a query into a simpler or more efficient form, using some rules or heuristics, such as pushing selections and projections down, combining operations, etc
- Join strategies are the methods of implementing the join operation, which combines records from two or more relations based on a common attribute. There are different types of join strategies, such as nested-loop join, single-loop join, hash join, merge join, etc. Each join strategy has its own advantages and disadvantages, depending on the size, distribution, and indexing of the relations involved. The choice of the best join strategy for a query depends on various factors, such as the cost of disk I/O, CPU time, memory usage, etc
- Query optimization algorithms are methods of finding the most efficient way to execute a query in a database system. Query optimization algorithms can be classified into two main categories: heuristic-based and cost-based
- Heuristic-based algorithms use some rules or guidelines to transform a query into an equivalent form that may be evaluated more efficiently. For example, some heuristics are: push selections and projections down as far as possible, combine operations, use indexes, etc.
Heuristic-based algorithms are simple and fast, but they may not always find the optimal solution.
- Cost-based algorithms use statistics and information about the database to estimate the cost of different execution plans for a query and choose the one with the lowest cost. The cost can be measured by various factors, such as the number of disk accesses, CPU time, memory usage, etc. Cost-based algorithms are more accurate and flexible, but they require more computation and maintenance.
- • Exhaustive search: This algorithm generates all possible execution plans for a query and selects the one with the lowest cost. This algorithm is guaranteed to find the optimal solution, but it is very expensive and impractical for large queries
- • Dynamic programming: This algorithm uses a bottom-up approach to divide a query into subqueries and find the optimal plan for each subquery. This algorithm reduces the search space by reusing the optimal plans for subqueries, but it still has a high complexity and memory requirement
- • Greedy algorithm: This algorithm uses a top-down approach to select one operation at a time for a query based on some heuristic or cost function. This algorithm is fast and simple, but it may get stuck in a local optimum and miss the global optimum
- • Genetic algorithm: This algorithm uses an evolutionary approach to generate and improve a population of execution plans for a query based on some fitness function. This algorithm can explore a large and diverse search space, but it may take a long time to converge to a good solution
- Indices in DBMS are data structures that help to improve the performance of a database by reducing the number of disk accesses required to process a query. Indices are created on one or more columns of a table and store the values of those columns in a sorted order along with pointers to the corresponding records in the table. Indices can speed up data retrieval, especially for queries that involve selection, projection, or join operations
- There are different types of indices in DBMS, such as:
- • Primary index: This is an index created on the primary key of a table, which is a unique identifier for each record. A primary index can be dense or sparse, depending on whether it contains an entry for every record or only for some records in the table
- • Secondary index: This is an index created on a non-primary key column or a combination of columns that may not be unique for each record. A secondary index is always dense and can be used to access records based on other attributes than the primary key
- Clustered index: This is an index that determines the physical order of records in a table. A table can have only one clustered index, which can be based on the primary key or any other