Data, Files and Databases, Lecture notes of Database Management Systems (DBMS)

An overview of data, files, and databases. It explains the concept of data and files, and how files are used to store data. It also introduces databases and their components, along with basic terms such as entity, attribute, domain, and instance, and explains why large collections of data are better maintained in a database than in a conventional file system.

Typology: Lecture notes

2022/2023

Available from 10/30/2023

jivan-chelekar

Chapter 0

Data, Files and Databases

Topics To Learn

0.0 Overview

0.1 Introduction

0.2 Operations on file

0.3 Introduction to Database

0.0 Overview

In this chapter we will study the concepts of data and files, and then the different operations that we can perform on a file. Section 0.1 covers data and files. Data is a collection of facts, such as values or measurements. It can be numbers, words, measurements, observations or even just descriptions of things.

After understanding data we will study the concept of files. A file is a container for data. We use files to keep our data safe and secure. For example, in an office we keep all office documents (office data) in different files, and each file is given a particular name. Similarly, on a computer we use files to keep our data. Operations on files are covered in section 0.2.

After understanding data and files we will study databases. A database is also a type of data storage, like a file, but it has more facilities and features than a regular file, which help us to manage the data easily. Just as the market offers different types of file folders for keeping paper documents, databases are available in different structures and models to make retrieving the data easier.

0.1 Introduction

Data: In general, data is information in raw or unorganized form (such as letters, numbers or symbols) that represents conditions, ideas, or objects.

In short, data is a collection of facts, figures and statistics related to an object. Data can be processed to create useful information.

Hence the manipulated and processed form of data is called information, which is more meaningful than data. Data is the input to processing and information is the output of this processing, as shown below.

Data --> [Processing] --> Information

(sample values shown in the figure: Mumbai, 90)

For example: Suppose you are going to a shop to purchase some items. Before you go, you make a list of the items you want to purchase. This item list contains the name and quantity of each item (i.e. data about the items) you are purchasing.

So here the name and quantity of each item are nothing but the data about that item.
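The step from data to information can be sketched in a few lines of Python. This is only an illustration: the quantities are borrowed from the chapter's item list with units dropped, and the function name `summarize` and the threshold of 10 are assumptions made for the example.

```python
# Raw data: (name, quantity) pairs as they might appear on the item list.
# Units (kg, ltr) are dropped to keep the sketch simple.
item_list = [("Sugar", 5), ("Wheat", 10), ("Oil", 5), ("Rice", 10), ("Soap", 4)]


def summarize(items, bulk_threshold=10):
    """Process raw data into information: how many items, and which are bulk buys."""
    bulk_items = [name for name, qty in items if qty >= bulk_threshold]
    return {"item_count": len(items), "bulk_items": bulk_items}


info = summarize(item_list)
print(info)
```

The list `item_list` is the input (data); the dictionary `info` is the output (information), because it answers a question the raw list does not answer directly.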

Data can exist in a variety of forms such as numbers, text or symbols on pieces of

paper, as bits and bytes stored in electronic memory, or as facts stored in a person's

mind.

• In computing, data are symbols or signals that are input to the computer, stored in the computer, and processed by the computer for output as usable information.

• For example, 011000110011… is binary data as represented in a computer.

• Data is often distinguished from programs. A program is a set of instructions that performs a particular task on the computer.

For example: we write a C program to calculate the sum of two numbers. This program contains a set of C language instructions. We do not call those instructions data, but the numbers we provide to the program to calculate the sum are data.

In this sense, data is everything that is not program code (the instructions of programs).

From the above discussion we can differentiate data, information and programs as follows:

Data: raw information.

Information: processed data.

Programs: set of instructions

Files: We use a file to save data. In general we keep our documents (papers on which something is written) in a file folder where they are safe and secure. For example, after making a list of the items to purchase, you may keep that list in your pocket or in an envelope where it will be safe. In the same manner, we store computer data in a file (i.e. a computer file) where that data will be safe and secure.

Item List

Item     Quantity
Sugar    5 kg
Wheat    10 kg
Oil      5 ltr
Rice     10 kg
Soap     4 qty

E.g. the name and quantity of the items on the item list are data about the items to be purchased. We use a file to save our data.

We will see this term in more detail in further chapters.

Now let us continue with Update Operation….

Update operations, on the other hand, change the file by modifying records, deleting records and inserting new records. That is, an update operation actually modifies the contents of the file.

The operations for locating, modifying, deleting and inserting records vary from system to system, but some operations are used in most systems; they are given below:

Find (Locate): The goal of this operation is to locate the record or records that satisfy the search criteria. A block that contains the records is transferred to main memory and the records are searched. The first record that matches the search criteria is located, and the search continues for other records until the end of the file is reached.

Read: In a read operation the contents of the record are copied from memory to a program variable or work area. In some cases this command also advances the pointer to the next record in the result set.

Read Next: Searches for the next record that matches the selection condition and, when it is found, copies the contents of the record to the program variable or work area.

Modify: Also known as update. This command modifies the field values of the current record and then writes the modified record back to disk.

Insert: Inserts a new record into the file.

Delete: Deletes the current record and updates the file on disk to reflect the deletion.
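The operations listed above can be sketched as a small Python class. This is a minimal illustration, not how any particular file system or DBMS implements them: the class name `RecordFile`, the one-JSON-object-per-line file layout and the "current record" index are all assumptions made for the example, and block transfer between disk and memory is not modelled.

```python
import json
import os
import tempfile


class RecordFile:
    """Hypothetical record file: one JSON object per line plus a current-record pointer."""

    def __init__(self, path):
        self.path = path
        self.records = []
        self.current = -1  # index of the current record, -1 when there is none
        if os.path.exists(path):
            with open(path) as f:
                self.records = [json.loads(line) for line in f if line.strip()]

    def _flush(self):
        # Write every record back to disk after an update operation.
        with open(self.path, "w") as f:
            for rec in self.records:
                f.write(json.dumps(rec) + "\n")

    def find(self, predicate, start=0):
        # Find: locate the first record at or after `start` that matches.
        for i in range(start, len(self.records)):
            if predicate(self.records[i]):
                self.current = i
                return True
        return False

    def read(self):
        # Read: copy the current record into a program variable.
        return dict(self.records[self.current])

    def read_next(self, predicate):
        # Read Next: search onward from the record after the current one.
        if self.find(predicate, self.current + 1):
            return self.read()
        return None

    def modify(self, **fields):
        # Modify: change field values of the current record, then write back.
        self.records[self.current].update(fields)
        self._flush()

    def insert(self, record):
        # Insert: add a new record to the file.
        self.records.append(dict(record))
        self._flush()

    def delete(self):
        # Delete: remove the current record and update the file on disk.
        del self.records[self.current]
        self.current = -1
        self._flush()


path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
items = RecordFile(path)
for name, qty in [("Sugar", 5), ("Wheat", 10), ("Rice", 10)]:
    items.insert({"name": name, "qty": qty})

items.find(lambda r: r["qty"] == 10)
first = items.read()
second = items.read_next(lambda r: r["qty"] == 10)
items.modify(qty=12)                        # modifies the record found by read_next
items.find(lambda r: r["name"] == "Sugar")
items.delete()
reloaded = RecordFile(path).records         # what is actually on disk now
```

Find, Read and Read Next are retrieval operations and leave the file unchanged; Modify, Insert and Delete are update operations, which is why only they call `_flush`.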

0.3 Introduction to Database

Before we look at what a Database Management System is, we will see what a database is and what it contains.

0.3.1 Definition of Database:

• We use different kinds of files on a computer, such as Word or Excel files to store text data, or BMP files to store image data. Similarly, a DBMS provides another type of file, known as a database file, to store our data. The difference between ordinary text files and a database file (or simply, "database") is that the database offers far more built-in support for organizing and querying the data.

• A database is an application that manages data and allows fast storage and retrieval of that data. A database can organize, store, and retrieve large amounts of data easily.

• A DBMS (database management system) is a piece of software designed to make the different database operations easier.

• By storing data in a database rather than in files, we can use the features of the database to manage the data in a robust and efficient manner.

• For example, consider a college that holds a large collection of data (say, 30 GB) about its students, such as each student's name, class, division, roll number, marks, fees and so on. Several users access this data concurrently. Hence any question a user asks about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g. fees) must be restricted.

• Maintaining such a large amount of data in a conventional file system is a time-consuming and difficult job, and we probably do not have 30 GB of main memory to hold all the data.

• Hence we can deal with this data management problem by storing the data in a database.

• There are different types of databases, but the most popular is the relational database, which stores data in tables where each row of a table holds the same sort of information. In the early 1970s Ted Codd, an IBM researcher, devised the relational model and the first rules of normalization (his well-known 12 rules for relational systems were published later, in 1985). Normalization governs how the data is stored and how different tables relate to one another; we will see it in further chapters.

0.3.2 Components of Database:

• A database consists of four elements, as shown in Figure 0.1 below.

Figure 0.1 Components of Database

Data Item

• Each attribute of an entity is represented in storage by a data item.

• For example, suppose there is a "customer account" database that contains the attributes Cust_name, Cust_id, Cust_add and Balance. Then there is one data item for Cust_name, another data item for Cust_id, and so on.

• A data item is assigned a name so that it can be referred to in storage, retrieval and processing.

Relationships represent a correspondence between the various data elements / entities. In short, they represent the relations among the different entities.

Constraints are predicates that define correct database states. In other words, constraints within a database are rules that control the values allowed in columns and also enforce integrity between columns and tables.

For example, consider the column constraint "CHECK". A CHECK constraint on a column limits the values allowed for that column.
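A CHECK constraint can be tried out with SQLite through Python's standard `sqlite3` module. The sketch below is illustrative only: the table `student` and the permitted range of 0 to 100 for `stud_marks` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The CHECK constraint limits stud_marks to the range 0..100 (assumed range).
conn.execute(
    "CREATE TABLE student ("
    " stud_name TEXT NOT NULL,"
    " stud_marks INTEGER CHECK (stud_marks BETWEEN 0 AND 100))"
)

conn.execute("INSERT INTO student VALUES ('Jyoti', 85)")  # allowed value

try:
    conn.execute("INSERT INTO student VALUES ('Sarika', 150)")  # outside the range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refused the row

rows = conn.execute("SELECT stud_name, stud_marks FROM student").fetchall()
```

The rule is stated once, in the table definition, and the DBMS enforces it for every program that writes to the table.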

etc. So these are the domains for the attributes stud_marks and stud_division respectively.

Instance:

• An instance of an entity is represented by a set of specific values for each of its attributes at a particular point in time.

• For example, suppose there is a furniture shop with a variety of furniture. Here furniture is the entity, and the attributes of the furniture entity could be Furniture_type, item_color, item_price, etc.

• The attributes could be the same for all kinds of furniture in the shop, but the values of the attributes differ from instance to instance.

• Thus we have (chair, black, 2000 Rupees) as one instance and (bed, brown, 10000 Rupees) as another.

• These two cases are two instances of the entity "furniture".

Record / Tuple:

• The representation in storage of each instance of an entity is commonly called a record. It is also called a tuple.

• In simple words, a row in a relation represents a record.

• For example, consider the relation "Customer Details" below, where customer is the entity and cust_name, cust_id, cust_add and balance are the attributes. (Jyoti, A123, Nashik, 50000) is one record.

Relation "Customer Details"

Cust_name   Cust_id   Cust_add   Balance      <- attributes
Jyoti       A123      Nashik     50000        <- tuple / record
Sarika      B125      Pune       45000
Manisha     C222      Nashik     30000

In the above table / relation:

Entity = Customer

Attributes = cust_name, cust_id, cust_add, balance.

Domains = cust_name (Jyoti, Sarika, Manisha), cust_id (A123, B125, C222), etc.

Record / tuple = (Jyoti, A123, Nashik, 50000).

Instance = (Jyoti, A123, Nashik, 50000) is one instance at a particular time T1. Suppose Jyoti then withdraws 10000 Rupees from her account; the instance at a later time T2 will be (Jyoti, A123, Nashik, 40000).
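The "Customer Details" relation and the change of instance between T1 and T2 can be reproduced with SQLite from Python. This is a sketch under assumptions: the table name `customer_details`, the column types and the helper `instance_of` are chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_details ("
    " cust_name TEXT, cust_id TEXT PRIMARY KEY, cust_add TEXT, balance INTEGER)"
)
# Each tuple below becomes one record (row) of the relation.
conn.executemany(
    "INSERT INTO customer_details VALUES (?, ?, ?, ?)",
    [
        ("Jyoti", "A123", "Nashik", 50000),
        ("Sarika", "B125", "Pune", 45000),
        ("Manisha", "C222", "Nashik", 30000),
    ],
)


def instance_of(cust_id):
    """Return the current attribute values (the instance) for one customer."""
    return conn.execute(
        "SELECT * FROM customer_details WHERE cust_id = ?", (cust_id,)
    ).fetchone()


instance_t1 = instance_of("A123")  # state at time T1

# Jyoti withdraws 10000 Rupees.
conn.execute(
    "UPDATE customer_details SET balance = balance - 10000 WHERE cust_id = 'A123'"
)
instance_t2 = instance_of("A123")  # state at time T2
```

The attributes stay the same between T1 and T2; only the value held by the balance attribute changes.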

Summary:

• Data are the binary computer representation of stored logical entities.

• A file contains data that is needed for information processing.

• We use files to save this data.

• A file is a collection of bytes stored as an individual entity.

• There are mainly two types of operations on a file:

  1. Retrieval Operation

  2. Update Operation

• Definition of Database: A database is an application that manages data and allows fast storage and retrieval of that data.

• A database consists of four elements:

  1. Data item

  2. Relationships

  3. Constraints

  4. Schema

Objective Type Questions:

  1. In a computer, data are stored in ….

a. Binary format b. Decimal format

c. Character format d. Hexadecimal format

  2. Which of the following is not a file operation?

a. Open b. Close c. Delete d. Hide

  3. Which of the following is a false statement?

I. A database is an application that manages data and allows fast storage and retrieval of that data.

II. Constraints are predicates that define correct database states.

III. In a database, domain refers to the description of an attribute's allowed values.

IV. Schema describes the organization of data and relationships within the database.

a. I b. I and III c. I and II d. None of the above

  4. A database consists of four elements; these are…

a. Entity, attribute, domain, relation

b. Data item, Relationship, Constraint, Schema

c. Schema, domain, Attributes, Constraints

d. All of the above.

  5. Which of the following can be defined as an entity?

a. Person b. Bank c. Shop d. All of the above

1.1 Definition of DBMS

As we have studied, a database can contain a huge amount of data. In short, it is storage for data. If a database contains a large amount of data, there must be something to manage it. Just as someone in every household manages all the house-related work, a computer needs software that handles all database-related work. That software is the Database Management System (DBMS).

A Database Management System (DBMS) is a collection of programs (in other words, software) that enables you to store, modify and extract information from a database. A DBMS is software designed to assist in maintaining and utilizing large collections of data in a database.

1.2 WHY DBMS?

The question is: why should we prefer a DBMS over a conventional file system?

Below are some reasons for using a DBMS:

• A DBMS stores data in the database in an organized manner, i.e. it stores the data in tables so that it is easy to maintain and search.

• A DBMS helps to control the organization, storage, management, and retrieval of data in a database.

• The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data. That is why a DBMS can be regarded both as system software and as application software.

Apart from these reasons, a DBMS provides the following five services, which a conventional file system does not provide.

1. Transaction Management:

• A transaction is a sequence of database operations that represents a logical unit of work (deleting a record, updating a record, modifying a set of records, etc.).

• The DBMS manages these transactions. When the DBMS performs a "commit", the changes made by the transaction become permanent. If you do not want to make the changes permanent, you can roll back the transaction and the database remains in its original state. You will study transaction management in detail in a later chapter.
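Commit and rollback can be observed with SQLite through Python's `sqlite3` module. The sketch is illustrative: the `account` table, its values and the helper `balance` are assumptions made for the example, and other DBMSs expose the same idea through their own interfaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (cust_id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A123', 50000)")
conn.commit()  # make the starting state permanent


def balance():
    """Read the current balance of the example account."""
    return conn.execute(
        "SELECT balance FROM account WHERE cust_id = 'A123'"
    ).fetchone()[0]


# First attempt: change the data, then decide not to keep the change.
conn.execute("UPDATE account SET balance = balance - 10000 WHERE cust_id = 'A123'")
conn.rollback()  # the database returns to its state at the last commit
after_rollback = balance()

# Second attempt: change the data and make the change permanent.
conn.execute("UPDATE account SET balance = balance - 10000 WHERE cust_id = 'A123'")
conn.commit()
after_commit = balance()
```

After the rollback the balance is unchanged; only the committed update is kept.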

2. Concurrency Control:

• In a database, a number of processes or actions (such as deleting a record, updating a record, or inserting a new record) may execute simultaneously, i.e. concurrently. This concurrent execution of actions must be managed.

• The DBMS provides a concurrency control mechanism that coordinates the actions of the processes manipulating the database.

3. Recovery Management:

• A simple meaning of recovery is a return to a normal, stable condition.

• The recovery mechanism in a DBMS ensures that the database is returned to a consistent state after a transaction fails or aborts, i.e. the database should never be left in an inconsistent state.

• Inconsistency means that two files may contain different data about the same entity. For example, consider a "student admission" database with attributes stud_name and stud_address, and another database, "student account", with attributes stud_id and stud_address. If the address of a student is changed in the "student admission" database, it must also be changed in the "student account" database. If it is changed in "student admission" but not in "student account", the data about the student becomes inconsistent. The recovery management service of the DBMS maintains consistency in the database.

4. Security Management:

• The security mechanism of a DBMS ensures that only authorized users are given access to the data in the database.

5. Language Interface:

• To handle data in a database, a user has to use some kind of language that the database understands.

• That is why a DBMS provides language support, which is used to define and manipulate the data in the database.

• Basically, a DBMS uses two languages for interaction between the user and the database: DDL and DML.

• Data structures are created with data definition language (DDL) commands.

• Data is manipulated with data manipulation language (DML) commands.

We will see these languages in detail in further topics.

1.3 Difference between file system and DBMS

Now let us see some differences between a file system and a DBMS.

1) Complex relationships among data can be represented in a DBMS, but not in a file system.

2) Recovery and backup can be done in a DBMS, but not in a file system.

3) Multiple user interfaces are allowed in a DBMS.

4) Changes made in one place in the database are reflected in all related tables, which is not possible in a file system.

5) Data redundancy is reduced in a DBMS.

6) Unauthorized access can easily be restricted in a DBMS, so data in databases is more secure than data in files.

7) A DBMS enforces integrity constraints.

8) A DBMS provides storage structures for efficient query processing.

1.5 Applications of DBMS

Databases are widely used all around the world in different sectors, such as:

1. Banking: For customer information, accounts, loans and banking transactions.

2. Airlines: For reservations and schedule information.

3. Universities: For maintaining student information, course registrations, grades, etc.

4. Credit card transactions: For purchases on credit cards and generation of monthly statements.

5. Telecommunications: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards and storing information about the communication networks.

6. Finance: For storing information about holdings, sales and purchases of financial instruments such as stocks and bonds.

7. Sales: For maintaining customer, product and purchase information.

8. Manufacturing: For management of the supply chain and for tracking production of items in factories, inventories of items in warehouses/stores and orders for items.

9. Human Resources: For information about employees, salaries, payroll taxes and benefits, and for generation of paychecks.

10. Web-based services: For collecting web users' feedback and responses, resource sharing, online shopping, etc.

1.6 Abstraction levels (Three level architecture)

Data Abstraction:

In day-to-day life we often need to hide specific information from people for their convenience.

For example, suppose you are interested in purchasing a new car. The car manufacturer hides some details from you (such as where the car was manufactured, how many engineers designed it, or who built it) to spare you unnecessary complexity. Similarly, a DBMS hides certain details of the database from different users.

So abstraction means hiding specific details of the data from different database users.

• A major aim of a DBMS is to provide users with an abstract view of data.

• Data abstraction hides certain details of how the data are stored and maintained in the database.

• Most database users are not computer trained, so developers hide the complexity through several levels of abstraction to simplify users' interaction with the system.

The three levels of data abstraction are:

1) Physical or Internal Level (Physical Schema)

2) Logical or Conceptual Level (Conceptual Schema)

3) View or External Level (External Schema)

Figure 1.1 DBMS Levels of abstraction

1) Physical or Internal Level (Physical Schema):

• This is the lowest level of abstraction, which describes how the data are actually stored.

• It also describes complex low-level data structures in detail.

The physical schema (physical level) specifies additional storage details for data in the database. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes.

The process of arriving at a good physical schema is called physical database design.

2) Logical or Conceptual Level (Conceptual Schema):

• Describes what data are stored in the database and what relationships exist among those data.

• Describes the entire database in terms of relatively simple structures.

The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database.
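The three levels named above can be loosely illustrated with SQLite from Python. This is a rough sketch under assumptions, not a definition of the levels: the table `student`, the index name and the view `student_public` are invented for the example, and treating an index as a physical-level choice and a view as an external-level schema is a common simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: which relations exist and which attributes they have.
conn.execute(
    "CREATE TABLE student ("
    " stud_id INTEGER PRIMARY KEY, stud_name TEXT, stud_division TEXT, fees INTEGER)"
)

# Physical level: a storage and access-path decision that leaves the relation unchanged.
conn.execute("CREATE INDEX idx_student_division ON student (stud_division)")

# External (view) level: one group of users sees the data without the fees column.
conn.execute(
    "CREATE VIEW student_public AS "
    "SELECT stud_id, stud_name, stud_division FROM student"
)

conn.execute("INSERT INTO student VALUES (1, 'Jyoti', 'A', 20000)")

cursor = conn.execute("SELECT * FROM student_public")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()
```

A user who queries `student_public` never sees the fees, and does not need to know whether the index exists.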

user interface. They perform all operations by using simple commands (menus and

buttons) provided in the user interface.

Example: The data entry operator in an office is responsible for entering records in the database. He performs this task by using menus, buttons, etc. He does not need to know anything about the database or the DBMS. He interacts with the database through the application program.

(4) Sophisticated Users:

Sophisticated users are users who are familiar with the structure of the database and the facilities of the DBMS. Such users can use a query language such as SQL to perform the required operations on databases. Some sophisticated users can also write application programs.

(5) Database Administrator:

The database administrator is responsible for managing the whole database system. He designs, creates and maintains the database. He manages the users who can access the database and controls integrity issues. He also monitors the performance of the system and makes changes to the system as and when required.

1.8 DDL and DML:

DDL: Data Definition Language

• The Data Definition Language (DDL) is used to create and destroy databases and database objects. Database administrators primarily use these commands during the setup and removal phases of a database project.

• DDL statements are used to define the database structure or schema.

Some DDL commands are given below:

CREATE - creates objects in the database

ALTER - alters the structure of the database

DROP - deletes objects from the database

TRUNCATE - removes all records from a table and releases the space allocated for those records

COMMENT - adds comments to the data dictionary

RENAME - renames an object
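Some of these commands can be tried with SQLite through Python's `sqlite3` module. The sketch is illustrative: the table names `student` and `pupil` are assumptions, and SQLite spells renaming as `ALTER TABLE ... RENAME TO` and has no TRUNCATE or COMMENT statement, so only CREATE, ALTER, renaming and DROP are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables currently defined in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


# CREATE: define a new table.
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT)")

# ALTER: change the structure of an existing table.
conn.execute("ALTER TABLE student ADD COLUMN stud_division TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]

# RENAME (written as ALTER TABLE ... RENAME TO in SQLite).
conn.execute("ALTER TABLE student RENAME TO pupil")
after_rename = table_names()

# DROP: remove the table definition together with its data.
conn.execute("DROP TABLE pupil")
after_drop = table_names()
```

None of these statements inserts or reads a single record; they only change the structure of the database.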

DML: Data Manipulation Language

• DML is used to change the data in the database tables. The best-known DML instructions are insert, update and delete.

• DML statements are used for managing data within schema objects, i.e. the data within the database.

Some examples:

SELECT - retrieves data from a database

INSERT - inserts data into a table

UPDATE - updates existing data within a table

DELETE - deletes records from a table (all records if no condition is given); the space for the records remains allocated

CALL - calls a PL/SQL or Java subprogram.
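The four basic DML commands can be sketched with SQLite from Python. The table `item` and its rows are assumptions borrowed loosely from the chapter's item list; CALL is not shown because it depends on stored subprograms, which SQLite does not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT PRIMARY KEY, qty INTEGER)")

# INSERT: add new rows.
conn.executemany(
    "INSERT INTO item VALUES (?, ?)", [("Sugar", 5), ("Wheat", 10), ("Soap", 4)]
)

# UPDATE: change existing data.
conn.execute("UPDATE item SET qty = 6 WHERE name = 'Sugar'")

# DELETE with a condition removes only the matching rows.
conn.execute("DELETE FROM item WHERE name = 'Soap'")

# SELECT: retrieve data.
remaining = conn.execute("SELECT name, qty FROM item ORDER BY name").fetchall()

# DELETE without a condition removes every row; the table itself remains.
conn.execute("DELETE FROM item")
count_after = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

Unlike DROP, the final DELETE leaves the table definition in place, so new rows can still be inserted afterwards.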

 There are two main types of DML :

  1. Procedural DML
  2. Nonprocedural DML

Procedural DML:

A low-level or procedural DML allows the user (i.e. the programmer) to specify what data is needed and how to obtain it. This type of DML typically retrieves individual records from the database and processes each one separately.

Programmers use the low-level DML.

An example of a procedural language is relational algebra.

Nonprocedural DML:

A high-level or non-procedural DML allows the user to specify what data is required without specifying how it is to be obtained. Many DBMSs allow high-level DML statements either to be entered interactively from a terminal or to be embedded in a general-purpose programming language.

End users use a high-level DML to specify their requests to the DBMS. Usually a single statement is given to the DBMS to retrieve or update multiple records. The DBMS translates the DML statement into a procedure that manipulates the set of records. Examples of non-procedural DMLs are SQL and QBE (Query-By-Example), which are used by relational database systems. These languages are easier to learn and use. The part of a non-procedural DML that deals with data retrieval from the database is known as a query language.
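The difference between the two styles can be sketched in Python with SQLite. This is only an analogy under assumptions: the record-at-a-time loop stands in for the procedural style (it is not relational algebra), and the `customer` table and its values are taken from the "Customer Details" example for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_name TEXT, cust_add TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Jyoti", "Nashik", 50000), ("Sarika", "Pune", 45000), ("Manisha", "Nashik", 30000)],
)

# Procedural style: the program spells out HOW to get the answer,
# fetching records one at a time and processing each separately.
procedural_total = 0
for cust_name, cust_add, balance in conn.execute("SELECT * FROM customer"):
    if cust_add == "Nashik":
        procedural_total += balance

# Non-procedural style: one statement says WHAT is wanted;
# the DBMS decides how to compute it.
declarative_total = conn.execute(
    "SELECT SUM(balance) FROM customer WHERE cust_add = 'Nashik'"
).fetchone()[0]
```

Both approaches give the same total, but in the second one the choice of access path is left entirely to the DBMS.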

1.9 STRUCTURE OF DBMS

DBMS (Database Management System) acts as an interface between the user

and the database. The user requests the DBMS to perform various operations

(insert, delete, update and retrieval) on the database. The components of DBMS

perform these requested operations on the database and provide necessary data to

the users. The various components of DBMS are shown below:

The main functions of the Data Manager are:

  • It converts the operations in users' queries, which arrive either directly from application programs or from the Query Processor (the combination of DML compiler and query optimizer), from the user's logical view to the physical file system.
  • It controls access to the DBMS information that is stored on disk.
  • It also handles the buffers in main memory.
  • It also enforces constraints to maintain the consistency and integrity of the data.
  • It also synchronizes the simultaneous operations performed by concurrent users.
  • It also controls the backup and recovery operations.

4. Data Dictionary

The data dictionary is a repository of descriptions of the data in the database. It contains information about:

  • Data: names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
  • Relationships between database transactions and the data items they reference, which is useful in determining which transactions are affected when certain data definitions are changed.
  • Constraints on data, i.e. the range of values permitted.
  • Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
  • Access authorization: the description of database users, their responsibilities and their access rights.
  • Usage statistics, such as the frequency of queries and transactions.

The data dictionary is used to control data integrity, database operation and accuracy. It may be regarded as an important part of the DBMS.
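SQLite keeps a small catalog of this kind, which can be queried from Python. The sketch below is illustrative: the `student` table and the helper `describe` are assumptions, and SQLite's catalog (`sqlite_master` plus `PRAGMA table_info`) covers only part of what a full data dictionary holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " stud_id INTEGER PRIMARY KEY,"
    " stud_name TEXT NOT NULL,"
    " stud_marks INTEGER CHECK (stud_marks BETWEEN 0 AND 100))"
)
conn.executemany(
    "INSERT INTO student (stud_name, stud_marks) VALUES (?, ?)",
    [("Jyoti", 85), ("Sarika", 72)],
)


def describe(table):
    """Collect dictionary-style facts about one table (trusted table names only)."""
    # Column names and declared types come from the catalog, not from the rows.
    columns = [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]
    row_count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # sqlite_master stores the original CREATE statement, including constraints.
    definition = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()[0]
    return {"columns": columns, "row_count": row_count, "definition": definition}


entry = describe("student")
```

Everything in `entry` describes the data (names, types, constraints, row count) rather than being student data itself.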

Importance of Data Dictionary:

The data dictionary is necessary in databases for the following reasons:

  • It improves the DBA's control over the information system and users' understanding of how to use the system.
  • It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
  • It helps in searching the views on the database and the definitions of those views.
  • It provides great assistance in producing a report of which data elements (i.e. data values) are used in all the programs.
  • It promotes data independence, i.e. application programs are not affected by additions to or modifications of structures in the database.

5. Data Files - These contain the data portion of the database.

6. Compiled DML - The DML compiler converts high-level queries into low-level file access commands, known as compiled DML.

7. End Users - They have already been discussed in a previous section.

1.10 Metadata:

Metadata is loosely defined as data about data.

For example, a web page may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.

• Metadata is defined as data providing information about one or more aspects of the data, such as:

1 ) Means of creation of the data

2 ) Purpose of the data

3 ) Time and date of creation

4 ) Creator or author of data

5 ) Placement on a computer network where the data was created

6 ) Standards used

For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.

• Metadata can itself be stored and managed in a database, often called a registry or repository. However, it may be impossible to identify metadata just by looking at it, because a user cannot tell from a value alone whether it is metadata or ordinary data.
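File-system metadata gives a quick, concrete case of "data about data". The sketch below uses Python's standard library; the file name `item_list.txt` and its contents are assumptions made for the example.

```python
import datetime
import os
import tempfile

# Create a small data file; its contents are the data.
path = os.path.join(tempfile.mkdtemp(), "item_list.txt")
with open(path, "wb") as f:
    f.write(b"Sugar 5 kg\nWheat 10 kg\n")

# The operating system keeps metadata about the file, separate from its contents.
stat = os.stat(path)
metadata = {
    "name": os.path.basename(path),
    "size_bytes": stat.st_size,
    "modified": datetime.datetime.fromtimestamp(stat.st_mtime).isoformat(),
}
print(metadata)
```

The size and modification time say nothing about sugar or wheat; they describe the file that holds that data.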

Summary:

• A DBMS is a collection of programs that enables you to store, modify and extract information from a database.

A DBMS provides the following five services:

  1. Transaction Management

  2. Concurrency Control

  3. Recovery Management

  4. Security Management

  5. Language Interface

Applications of DBMS:

The following are the applications of DBMS:

  • Banking
  • Airlines
  • Universities
  • Credit card transactions
  • Telecommunications
  • Finance
  • Sales
  • Manufacturing
  • Human Resources
  • Web-based services