Topics To Learn
0.0 Overview
0.1 Introduction
0.2 Operations on file
0.3 Introduction to Database
0.0 Overview
In this chapter we will study the concepts of data and files, and then the different
operations that we can perform on a file. In section 0.1 you will study data
and files. Data is a collection of facts, such as values or measurements. It can be
numbers, words, measurements, observations or even just descriptions of things.
After understanding data we will study the concept of files. A file is a container for
data. We use files to keep our data safe and secure. For example, in an office we
keep all office information documents (office data) in different files, and each file
is given a particular name. Similarly, on a computer we use files to keep our data.
We will see these terms in section 0.2.
After understanding data and files we will study databases. A database is also a
type of data storage like a file, but it has more facilities and features than
a regular file, which help us to manage the data easily. Just as the market offers
different types of file folders for keeping paper documents, databases are
available in different structures and models that make it easy to
retrieve the data.
0.1 Introduction
Data: In general, we define data as information in raw or unorganized form
(such as alphabets, numbers or symbols) that represents conditions, ideas, or
objects.
In short, data is a collection of facts, figures and statistics related to an
object. Data can be processed to create useful information.
Hence we can say that the manipulated and processed form of data is called information,
which is more meaningful than data. Data is used as input for processing, and
information is the output of this processing, as shown below.
Data → Processing → Information
For example: suppose you are going to a shop to purchase some items. Before
you go, you make a list of the items you want to purchase. This list
contains the name and quantity of each item (i.e. data about the items) you are purchasing.
So here the name and quantity of the items are nothing but data about those items.
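The data → processing → information idea can be sketched in a few lines of Python. The item names and quantities below are hypothetical values standing in for the shopping list; the "processing" here is simply totalling the quantities, which turns raw facts into a more meaningful summary.

```python
# Hypothetical raw data: (item name, quantity) pairs from a shopping list.
shopping_list = [("sugar", 2), ("rice", 5), ("milk", 3)]


def total_units(items: list[tuple[str, int]]) -> int:
    """Processing step: add up the quantities."""
    return sum(quantity for _, quantity in items)


def summarize(items: list[tuple[str, int]]) -> str:
    """Turn the raw data into information a person can act on."""
    return f"{len(items)} items, {total_units(items)} units in total"


print(summarize(shopping_list))  # 3 items, 10 units in total
```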
Data can exist in a variety of forms such as numbers, text or symbols on pieces of
paper, as bits and bytes stored in electronic memory, or as facts stored in a person's
mind.
In computing, data are symbols or signals that are input to the computer,
stored in the computer, and processed by the computer to produce usable
information.
For example:
011000110011… is data represented in binary form inside a computer.
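To see how ordinary text becomes binary data, the short Python sketch below prints the bit pattern of each byte of a string. It assumes UTF-8 encoding; for plain English letters this gives the same bytes as ASCII.

```python
def to_binary(text: str) -> str:
    """Return the 8-bit pattern of every byte of text (UTF-8), space-separated."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(to_binary("Hi"))  # 01001000 01101001
```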
Data is often distinguished from programs in a computer. We define a program
as a set of instructions that performs a particular task on the computer.
For example: we write a C program for calculating the sum of two numbers. This
program contains a set of C language instructions, so we cannot call those
instructions data. However, whatever numbers we provide to that program to
add are ‘data’.
In this sense, data is everything that is not program code (instructions of
programs).
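The same distinction, sketched in Python rather than C for brevity: the function body is the program (a fixed set of instructions), while the numbers passed to it are the data. The function name `add` and the values are only illustrative.

```python
def add(a: int, b: int) -> int:
    # These instructions are the program: they stay the same on every run.
    return a + b


# The values supplied to the program are the data; they can change from run to run.
first, second = 12, 30
print(add(first, second))  # 42
```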
Hence, from the above discussion, we can differentiate data, information and programs
as follows:
Data: raw information.
Information: processed data.
Programs: set of instructions
Files: We use files to save data. In general, we keep our documents (papers on
which something is written) in a file folder where they are safe and secure.
For example, after making a list of the items you want to purchase, you may keep that list in
your pocket or in an envelope where it is safe. In the same manner, we store
computer data in a file (i.e. a computer file) where that data will be safe and secure.
E.g. The name and quantity of the items on the item list are data about the items to be purchased.
We will see this term in more detail in later chapters.
Now let us continue with the update operation.
The update operation, on the other hand, changes the file by modifying
records, deleting records and inserting new records. That is, in an update operation
we actually modify the contents of the file.
The operations for locating, modifying, deleting and inserting records
vary from system to system, but some operations are used in
most systems, and they are given below:
Find (Locate): The goal of this operation is to locate the record or records that
satisfy the search criteria. A block that contains the records is transferred into
main memory and the records in it are searched. The first record that matches the
search criteria is located, and the search continues for other matching records until
the end of the file is reached.
Read: In the read operation, the contents of the record are copied from memory into
a program variable or work area. In some cases this command also advances the
pointer to the next record in the result set.
Read Next: Searches for the next record that matches the selection condition and,
when it is found, copies the contents of the record into the program variable or work
area.
Modify: Also known as update. This command modifies the field values (data)
of the current record and then writes the modified record back to the disk.
Insert: Inserts a new record to the file.
Delete: Deletes the current record and updates the file on the disk to reflect the
deletion.
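The record operations above can be modelled with a toy in-memory "file" of records. This is a minimal Python sketch for illustration only: the class `RecordFile`, its method names and the dictionary-based records are hypothetical, and a real file system would transfer blocks between disk and main memory (and write modified records back to disk) rather than keep a Python list.

```python
from typing import Callable, Optional

Record = dict[str, object]


class RecordFile:
    """Toy in-memory model of a file of records (hypothetical, for illustration)."""

    def __init__(self, records: list[Record]) -> None:
        self.records = [dict(r) for r in records]
        self.current = -1  # index of the current record; -1 means none yet
        self.condition: Callable[[Record], bool] = lambda record: True

    def _advance(self) -> bool:
        # Scan forward from the current position to the next matching record.
        for i in range(self.current + 1, len(self.records)):
            if self.condition(self.records[i]):
                self.current = i
                return True
        self.current = len(self.records)  # reached the end of file
        return False

    def _require_current(self) -> None:
        if not 0 <= self.current < len(self.records):
            raise LookupError("there is no current record")

    def find(self, condition: Callable[[Record], bool]) -> bool:
        """Find (Locate): make the first record satisfying condition current."""
        self.condition = condition
        self.current = -1
        return self._advance()

    def read(self) -> Optional[Record]:
        """Read: copy the current record into a program variable."""
        if 0 <= self.current < len(self.records):
            return dict(self.records[self.current])
        return None

    def read_next(self) -> Optional[Record]:
        """Read Next: locate the next matching record and copy it."""
        return self.read() if self._advance() else None

    def modify(self, **changes: object) -> None:
        """Modify: change field values of the current record in place."""
        self._require_current()
        self.records[self.current].update(changes)

    def insert(self, record: Record) -> None:
        """Insert: add a new record at the end of the file."""
        self.records.append(dict(record))

    def delete(self) -> None:
        """Delete: remove the current record; scanning resumes after it."""
        self._require_current()
        del self.records[self.current]
        self.current -= 1


customers = RecordFile([
    {"cust_name": "Jyoti", "cust_add": "Nashik"},
    {"cust_name": "Sarika", "cust_add": "Pune"},
    {"cust_name": "Manisha", "cust_add": "Nashik"},
])
customers.find(lambda r: r["cust_add"] == "Nashik")
print(customers.read())       # {'cust_name': 'Jyoti', 'cust_add': 'Nashik'}
print(customers.read_next())  # {'cust_name': 'Manisha', 'cust_add': 'Nashik'}
```

`read` returns a copy, mirroring the idea that the record is copied into a program variable; changing that copy does not change the file until `modify` is called.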
0.3 Introduction to Database
Before we look at what a Database Management System is, we will
see what a database is and what it contains.
0.3.1 Definition of Data Base:
We use different kinds of files on a computer, such as Word or Excel files to store
text data, or BMP files to store image data. Similarly, a DBMS provides
us another type of file, known as a database file, to store our data. The difference
between those ordinary files and a database file (or
simply, a “database”) is that the database is more intelligent: it keeps track of how
its data is structured, so the data can be searched and updated efficiently.
A database is an application that manages data and allows fast storage and
retrieval of that data. A database can organize, store, and retrieve large amounts
of data easily.
And a DBMS (database management system) is a piece of software that is
designed to make the different database operations easier.
By storing data in a database, rather than in files, we can use the features of
database to manage the data in a robust and efficient manner.
For example, consider a college that holds a large collection of data (say, 30
GB) of student information such as each student's name, class,
division, roll number, marks, fees and so on. Several users access this
data concurrently. Hence any question that a user asks about the data must
be answered quickly, changes made to the data by different users must be
applied consistently, and access to certain parts of the data (e.g. fees) must be
restricted.
Maintaining such a large amount of data in a conventional file system is a
time-consuming and difficult job, and we probably do not have 30 GB of main memory to
hold all the data.
Hence we can deal with this data management problem by storing the data in a
database.
There are different types of database, but the most popular is the relational
database, which stores data in tables where each row in a table holds the same
sort of information. In 1970, Edgar F. (Ted) Codd, an IBM researcher, proposed the
relational model, and in the early 1970s he introduced the rules of normalization.
These rules govern how data is stored and how different tables are related, which we
will see in later chapters. (Codd's well-known "12 rules" for relational database
systems came later, in 1985.)
0.3.2 Components of Database:
A database consists of four elements, as shown in Figure 0.1 below.
Figure 0.1 Components of Database
Data Item
Each attribute of an entity is represented in storage by a data item.
For example: Suppose there is a database of “customer account”. And this
database contains attributes Cust_name, Cust_id, Cust_add, Balance.
Hence there is a data item for Cust_name, another data item for Cust_id etc.
A data item is assigned a name so that it can be referred to in storage, retrieval and
processing.
Relationships
Relationships represent a correspondence between the various data elements /
entities. In short, they represent the relations among different entities.
Constraints
Constraints are predicates that define correct database states.
In other words, the constraints within a database are rules
that control the values allowed in columns and also enforce integrity between
columns and tables.
For example, consider the column constraint (i.e. a constraint on a column) “CHECK”.
A “CHECK” constraint on a column limits the values allowed in that
column.
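A CHECK constraint can be tried out with Python's built-in `sqlite3` module. The table `student` and the 0 to 100 range for `stud_marks` are hypothetical choices for this sketch; the point is that the DBMS itself rejects a value that violates the constraint, without any checking code in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute(
    "CREATE TABLE student ("
    " stud_name TEXT NOT NULL,"
    " stud_marks INTEGER CHECK (stud_marks BETWEEN 0 AND 100))"
)

conn.execute("INSERT INTO student VALUES ('Jyoti', 85)")  # satisfies the CHECK

rejected = False
try:
    conn.execute("INSERT INTO student VALUES ('Sarika', 140)")  # outside 0..100
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected by the DBMS:", err)

print(conn.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```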
etc. So these are the domains for the attributes stud_marks and stud_division,
respectively.
Instance:
An instance of an entity is represented by a set of specific values for each of the
attributes at a particular point in time.
For example, suppose there is a furniture shop that sells a variety of
furniture. Here furniture is the entity, and the attributes of the furniture entity
could be Furniture_type, item_color, item_price etc.
The attributes are the same for every kind of furniture in the shop, but the values
of the attributes differ from one instance to another.
Thus we have chair, black, 2000 Rupees as one instance and bed, brown,
10000 Rupees as another instance.
These two cases represent the attributes of two instances of the entity
“furniture”.
Record/Tuple:
The representation in storage of each instance of an entity is commonly
called a record. It is also called a tuple.
In simple words, a row in a relation represents a record.
For example, consider the database/relation “Customer Details” below,
where customer is the entity and cust_name, cust_id, cust_add and balance are
attributes, and
(Jyoti, A123, Nashik, 50000) is one record.
Relation “Customer Details”
(The column headings are the attributes; each row is a tuple/record.)
Cust_name   Cust_id   Cust_add   Balance
Jyoti       A123      Nashik     50000
Sarika      B125      Pune       45000
Manisha     C222      Nashik     30000
In the above table / relation:
Entity = Customer
Attributes = cust_name, cust_id, cust_add, balance.
Domains = cust_name (Jyoti, Sarika, Manisha), cust_id (A123, B125, C222) etc.
Record / tuple = Jyoti, A123, Nashik, 50000.
Instance = (Jyoti, A123, Nashik, 50000) is one instance at a particular time
T1. Now suppose Jyoti withdraws 10000 Rupees from her account at a later time
T2; then the instance at time T2 will be (Jyoti, A123, Nashik, 40000).
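The relation above and the change from T1 to T2 can be reproduced with Python's built-in `sqlite3` module. The table name `customer_details` and the in-memory database are assumptions of this sketch; each `SELECT` returns one tuple (record), and the `UPDATE` produces the new instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_details ("
    " cust_name TEXT, cust_id TEXT PRIMARY KEY, cust_add TEXT, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_details VALUES (?, ?, ?, ?)",
    [
        ("Jyoti", "A123", "Nashik", 50000),
        ("Sarika", "B125", "Pune", 45000),
        ("Manisha", "C222", "Nashik", 30000),
    ],
)


def record(cust_id: str) -> tuple:
    """Return the tuple (row) currently stored for one customer."""
    return conn.execute(
        "SELECT * FROM customer_details WHERE cust_id = ?", (cust_id,)
    ).fetchone()


at_t1 = record("A123")  # ('Jyoti', 'A123', 'Nashik', 50000)
conn.execute(
    "UPDATE customer_details SET balance = balance - 10000 WHERE cust_id = 'A123'"
)
at_t2 = record("A123")  # ('Jyoti', 'A123', 'Nashik', 40000)
print(at_t1, at_t2)
```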
Summary:
Data are the binary computer representation of stored logical entities.
A file contains data that is needed for information processing.
We use files to save this data.
A file is a collection of bytes stored as an individual entity.
There are mainly two types of operations on a file:
Retrieval Operation
Update Operation
Definition of Data Base: A database is an application that manages data and
allows fast storage and retrieval of that data.
A database consists of four elements:
1.Data item
2.Relationships
3.Constraints
4.Schema
Objective Type Questions:
a. Binary Format b. Decimal format
c. Character Format d. Hexadecimal Format
a. Open b. Close c. Delete d. Hide
I. A database is an application that manages data and allows fast storage and
retrieval of that data.
II. Constraints are predicates that define correct database states.
III. In a database, domain refers to the description of an attribute's allowed values.
IV. Schema describes the organization of data and the relationships within the database.
a. I b. I and III c. I and II d. None of the above
a. Entity, attribute, domain, relation
b. Data item, Relationship, Constraint, Schema
c. Schema, domain, Attributes, Constraints
d. All of the above.
a. Person b. Bank c. Shop d. All of the above
1.1 Definition of DBMS
As we have studied, a database can contain a huge amount of data; in short, it is
storage for data. If a database contains a large amount of data, then there should be
something to manage or handle it. Just as a mother manages all the household
work, a computer also needs software, or a mechanism, that handles all
database-related work. Here we are going to study that mechanism, which is
nothing but the Database Management System (DBMS).
A Database Management System (DBMS) is a collection of programs (or we
can say software) that enables you to store, modify and extract information
from a database. A DBMS is software designed to assist in maintaining and utilizing
large collections of data in a database.
The question is: why should we prefer a DBMS over a conventional file
system?
Below are some reasons that give us a clear idea of why to use a DBMS:
A Database Management System stores data in the database in an appropriate
manner, i.e. it stores the data in a table format so that it is easy to
maintain and search the data.
A DBMS helps to control the organization, storage, management, and retrieval of
data in a database.
The DBMS accepts requests for data from the application program and instructs
the operating system to transfer the appropriate data.
Because it works between application programs and the operating system, a DBMS
can be regarded both as system software and as application software.
Apart from these reasons, a DBMS provides the following five services, which a
conventional file system does not provide.
1. Transaction Management:
A transaction is a sequence of database operations that represents a logical unit
of work (delete a record, update a record, modify a set of records, etc.).
The DBMS manages these transactions. When the DBMS does a “commit”, the
changes made by the transaction are made permanent. If you don’t want to
make the changes permanent, you can roll back the transaction and the database
remains in its original state. You will study transaction management
in detail in a later chapter.
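A minimal sketch of commit and rollback, using Python's built-in `sqlite3` module. The `account` table and the `transfer` function are hypothetical; the debit and the credit form one logical unit of work, so either both are committed or the rollback undoes both. (In its default mode, `sqlite3` opens a transaction implicitly before an `UPDATE`.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A123", 50000), ("B125", 45000)])
conn.commit()  # make the initial rows permanent


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Debit src and credit dst as one transaction: all or nothing."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE acc_id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient balance")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_id = ?", (amount, dst)
        )
        conn.commit()  # changes become permanent
    except Exception:
        conn.rollback()  # undo every change made since the last commit
        raise


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT acc_id, balance FROM account ORDER BY acc_id"))


transfer(conn, "A123", "B125", 10000)
print(balances(conn))  # {'A123': 40000, 'B125': 55000}
```

Calling `transfer(conn, "A123", "B125", 100000)` would raise `ValueError`, and because of the rollback both balances stay as they were.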
2. Concurrency Control:
In a database, a number of processes or actions (such as deleting a record, updating a
record, inserting a new record, etc.) may execute simultaneously, i.e. concurrently. This
simultaneous or concurrent execution of actions must be managed
somehow.
The DBMS provides a concurrency control mechanism by coordinating the actions of
the database manipulation processes.
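Real DBMSs coordinate concurrent transactions with techniques such as locking or multiversioning. The Python sketch below only illustrates the basic idea with a single lock: the hypothetical `Account` class makes the read-modify-write of a shared balance one indivisible step, so concurrent deposits cannot overwrite each other.

```python
import threading


class Account:
    """Hypothetical shared data item updated by several concurrent workers."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount: int) -> None:
        # Holding the lock makes read-modify-write one indivisible step,
        # so no deposit is lost when threads interleave.
        with self._lock:
            current = self.balance
            self.balance = current + amount


def run_deposits(account: Account, count: int) -> None:
    for _ in range(count):
        account.deposit(1)


account = Account(0)
workers = [threading.Thread(target=run_deposits, args=(account, 1000)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(account.balance)  # 4000
```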
3. Recovery Management:
A simple meaning of recovery is to return to a normal/stable condition.
The recovery mechanism in a DBMS ensures that the database is returned to a
consistent state after a transaction fails or aborts, i.e. the database should not be
left in an inconsistent state.
Inconsistency means that two files may contain different data about the same
entity. For example, consider a “student Admission” database with
attributes stud_name and stud_address, and another database “student
account” with attributes stud_id and stud_address. Now if the address of a
student is changed in the “student Admission” database, it must also be changed in
the “student account” database. Otherwise, it may be changed in
the “student Admission” database but not in the “student account” database, and in
this case the data of the student becomes inconsistent. The recovery
management service of the DBMS maintains consistency in the database.
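The admission/account example can be sketched with `sqlite3`, whose connection object can be used as a context manager that commits when the block succeeds and rolls back when it raises. The table names, the `stud_id` key in both tables and the simulated failure are assumptions for illustration; real recovery managers also use logs to survive system crashes, which this sketch does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admission (stud_id TEXT PRIMARY KEY, stud_address TEXT)")
conn.execute("CREATE TABLE account (stud_id TEXT PRIMARY KEY, stud_address TEXT)")
with conn:  # commits both inserts together
    conn.execute("INSERT INTO admission VALUES ('S1', 'Nashik')")
    conn.execute("INSERT INTO account VALUES ('S1', 'Nashik')")


def change_address(stud_id: str, new_address: str, fail_midway: bool = False) -> None:
    """Update the address in both tables, or in neither."""
    with conn:  # rolls back automatically if anything below raises
        conn.execute(
            "UPDATE admission SET stud_address = ? WHERE stud_id = ?",
            (new_address, stud_id),
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE account SET stud_address = ? WHERE stud_id = ?",
            (new_address, stud_id),
        )


def addresses(stud_id: str) -> tuple:
    """Return the address stored in each table for one student."""
    (a,) = conn.execute(
        "SELECT stud_address FROM admission WHERE stud_id = ?", (stud_id,)
    ).fetchone()
    (b,) = conn.execute(
        "SELECT stud_address FROM account WHERE stud_id = ?", (stud_id,)
    ).fetchone()
    return a, b


change_address("S1", "Pune")
try:
    change_address("S1", "Mumbai", fail_midway=True)
except RuntimeError:
    pass
print(addresses("S1"))  # ('Pune', 'Pune')
```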
4. Security Management:
Security mechanism of DBMS ensures that only authorized users are given
access to the data in database.
5. Language Interface:
To handle data in a database, a user has to use some kind of language that
is understandable to the database.
That is why a DBMS provides language support, which is used to define and
manipulate the data in the database.
Basically, there are two languages used by a DBMS to interact between the user and
the database: DDL and DML.
The data structures are created by data definition language (DDL)
commands.
Data manipulation is done using data manipulation language (DML)
commands.
We will see these languages in detail in later topics.
1.3 Difference between file system and DBMS
Now let us see some differences between a file system and a DBMS.
1 ) Representing complex relationships among data is possible in a DBMS but not in a
file system.
2 ) Recovery and backup can be done in a DBMS but not in a file system.
3 ) Multiple user interfaces are provided by a DBMS.
4 ) A change made once in the database is reflected in all the tables that use that
data, which is not possible in a file system.
5 ) Data redundancy is reduced (controlled) in a DBMS.
6 ) Unauthorized access can easily be restricted in a DBMS, and therefore data in
databases is more secure than data in files.
7 ) A DBMS provides storage structures for efficient query processing.
1.5 Applications of DBMS
Databases are widely used all around the world in different sectors, such as:
1. Banking : For customer information, accounts, loans and banking transactions.
2. Airlines : For reservations and schedule information.
3. Universities : For maintaining student information, course registrations and
grades etc.
4. Credit card transactions : For purchases on credit cards and generation of
monthly statements.
5. Telecommunications : For keeping records of calls made, generating monthly
bills, maintaining balances on prepaid calling cards and storing information about
the communication networks.
6. Finance : For storing information about holdings, sales and purchases of financial
instruments such as stocks and bonds.
7. Sales : For maintaining customer, product and purchase information.
8. Manufacturing : For management of the supply chain and for tracking production
of items in factories, inventories of items in warehouses/stores and orders for
items.
9. Human Resources : For information about employees, salaries, payroll taxes
and benefits, and for generation of paychecks.
10. Web based services : For collecting web users' feedback and responses, resource
sharing, online shopping etc.
1.6 Abstraction levels (Three level architecture)
Data Abstraction:
In our day-to-day life we often need to hide specific information from
different people for their convenience.
For example: suppose you are interested in purchasing a new car. The car
manufacturing company hides some details from you (such as where the car is
manufactured, how many engineers designed this car, or who
manufactured the car etc.) to remove complexity from your mind. Similarly, a DBMS
hides certain data in the database from different users.
So abstraction means hiding specific data from different database users.
A major aim of a DBMS is to provide users with an abstract view of data.
Data abstraction hides certain details of how the data are stored and maintained
in the database.
Since most database users are not computer trained, developers hide complexity
through several levels of abstraction to simplify users' interaction with the system.
Three levels of data abstraction are:
1 ) Physical or Internal Level (Physical Schema):
2 ) Logical or Conceptual Level (Conceptual Schema):
3 ) View or External Level (External schema):
Figure 1.1 DBMS Levels of abstraction
1) Physical or Internal Level (Physical Schema):
This is the lowest level of abstraction, which describes how data are actually
stored.
It also describes complex low-level data structures in detail.
The physical schema (physical level) specifies additional storage details for
data in database. Essentially, the physical schema summarizes how the relations
described in the conceptual schema are actually stored on secondary storage
devices such as disks and tapes.
The process of arriving at a good physical schema is called physical
database design.
2) Logical or Conceptual Level (Conceptual Schema):
Describes what data are stored in the database and what relationships exist among
those data.
Describes the entire database in terms of relatively simple structures.
The conceptual schema (sometimes called the logical schema) describes the
stored data in terms of the data model of the DBMS. In a relational DBMS, the
conceptual schema describes all relations that are stored in the database.
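The difference between the conceptual level and the external (view) level can be sketched with an SQL view in Python's `sqlite3` module. The `student` table, its `fees` column and the view name `student_public` are hypothetical. The physical level is handled by SQLite itself, which lays the rows out internally in B-tree pages; neither the table nor the view exposes that detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual (logical) level: the complete structure of the relation.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY, stud_name TEXT, stud_class TEXT, fees INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Jyoti', 'FY', 25000)")

# External (view) level: one user group's view, with the fees column hidden.
conn.execute(
    "CREATE VIEW student_public AS SELECT roll_no, stud_name, stud_class FROM student"
)

cursor = conn.execute("SELECT * FROM student_public")
columns = [description[0] for description in cursor.description]
print(columns)            # ['roll_no', 'stud_name', 'stud_class']
print(cursor.fetchall())  # [(1, 'Jyoti', 'FY')]
```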
user interface. They perform all operations by using simple commands (menus and
buttons) provided in the user interface.
Example: The data entry operator in an office is responsible for entering
records in the database. He performs this task by using menus and buttons etc. He
does not know anything about database or DBMS. He interacts with the database
through the application program.
(4) Sophisticated Users:
Sophisticated users are the users who are familiar with the structure of database
and facilities of DBMS. Such users can use a query language such as SQL to
perform the required operations on databases. Some sophisticated users can also
write application programs.
(5) Database Administrator:
The database administrator is responsible for managing the whole database
system. He designs, creates and maintains the database. He manages the users who
can access the database and controls integrity issues. He also monitors the
performance of the system and makes changes to the system as and when required.
1.8 DDL and DML:
DDL: Data Definition Language
The Data Definition Language (DDL) is used to create and destroy databases
and database objects. Database administrators primarily use these
commands during the setup and removal phases of a database project.
DDL statements are used to define the database structure or schema.
Some DDL commands are:
CREATE - to create objects in the database
ALTER - alters the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table; the space allocated for the records
is also released
COMMENT - add comments to the data dictionary
RENAME - rename an object
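Several of these DDL commands can be tried with Python's `sqlite3` module. This is a sketch under SQLite's dialect: SQLite has no `TRUNCATE` or `COMMENT` statements, and renaming is done with `ALTER TABLE ... RENAME TO` rather than a separate `RENAME`. The table and column names are hypothetical; the helper functions read SQLite's own catalog to show the effect of each statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names() -> list[str]:
    """Read table names from SQLite's catalog table, sqlite_master."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def column_names(table: str) -> list[str]:
    """PRAGMA table_info returns one row per column; index 1 is the column name."""
    # PRAGMA does not accept bound parameters, so `table` must be a trusted name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT)")  # CREATE
conn.execute("ALTER TABLE customer ADD COLUMN balance INTEGER")                   # ALTER
conn.execute("ALTER TABLE customer RENAME TO customer_details")                  # rename
print(table_names())                     # ['customer_details']
print(column_names("customer_details"))  # ['cust_id', 'cust_name', 'balance']
conn.execute("DROP TABLE customer_details")                                      # DROP
print(table_names())                     # []
```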
DML: Data Manipulation Language
DML is used to change the data in the database tables. Its best-known instructions
are insert, update and delete.
DML statements are used for managing data within schema objects, i.e. data
within the database.
Some examples:
SELECT - retrieve data from a database
INSERT - insert data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all records if no condition is given); the
space for the records remains
CALL - call a PL/SQL or Java subprogram.
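The four most common DML statements can be seen together in a short `sqlite3` sketch. The `customer` table and its rows are hypothetical (they reuse the names from the Customer Details example); `CALL` is not shown because SQLite has no stored procedures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, balance INTEGER)"
)

# INSERT: add new rows
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("A123", "Jyoti", 50000), ("B125", "Sarika", 45000), ("C222", "Manisha", 30000)],
)
# UPDATE: change existing data
conn.execute("UPDATE customer SET balance = balance + 5000 WHERE cust_id = 'C222'")
# DELETE: remove the rows that match a condition
conn.execute("DELETE FROM customer WHERE cust_id = 'B125'")
# SELECT: retrieve data
rows = conn.execute("SELECT cust_name, balance FROM customer ORDER BY cust_id").fetchall()
print(rows)  # [('Jyoti', 50000), ('Manisha', 35000)]
```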
There are two main types of DML:
Procedural DML:
A low-level or procedural DML allows the user (i.e. the programmer) to specify
what data is needed and how to obtain it. This type of DML typically retrieves
individual records from the database and processes each one separately.
Programmers use the low-level DML.
An example of a procedural DML is “Relational Algebra”.
Nonprocedural DML:
A high-level or non-procedural DML allows the user to specify what data is
required without specifying how it is to be obtained. Many DBMSs allow high-level
DML statements either to be entered interactively from a terminal or to be
embedded in a general-purpose programming language.
The end-users use a high-level DML to specify their requests to DBMS to
retrieve data. Usually a single statement is given to the DBMS to retrieve or update
multiple records. The DBMS translates a DML statement into a procedure that
manipulates the set of records. The examples of non-procedural DMLs are SQL and
QBE (Query-By-Example) that are used by relational database systems. These
languages are easier to learn and use. The part of a non-procedural DML, which is
related to data retrieval from database, is known as query language.
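The contrast can be seen in a small Python sketch. In the first function the program fetches every record and decides, one record at a time, which ones to keep (the "how"); in the second a single SQL statement states only "what" is wanted and SQLite chooses how to find it. The `customer` table and its contents are hypothetical, and the loop is an analogy for record-at-a-time access rather than an example of relational algebra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_name TEXT, cust_add TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Jyoti", "Nashik"), ("Sarika", "Pune"), ("Manisha", "Nashik")],
)


def nashik_customers_procedural() -> list[str]:
    """Record-at-a-time: fetch each row and test it in the program."""
    names = []
    cursor = conn.execute("SELECT cust_name, cust_add FROM customer ORDER BY rowid")
    for name, address in cursor:  # visit the records one by one
        if address == "Nashik":
            names.append(name)
    return names


def nashik_customers_declarative() -> list[str]:
    """Set-at-a-time: one statement says what is wanted, not how to get it."""
    rows = conn.execute(
        "SELECT cust_name FROM customer WHERE cust_add = 'Nashik' ORDER BY rowid"
    )
    return [name for (name,) in rows]


print(nashik_customers_procedural())   # ['Jyoti', 'Manisha']
print(nashik_customers_declarative())  # ['Jyoti', 'Manisha']
```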
DBMS (Database Management System) acts as an interface between the user
and the database. The user requests the DBMS to perform various operations
(insert, delete, update and retrieval) on the database. The components of DBMS
perform these requested operations on the database and provide necessary data to
the users. The various components of DBMS are shown below:
The Main Functions Of Data Manager Are:
combination of DML Compiler and Query optimizer which is known as Query
Processor from user's logical view to physical file system.
users.
Data Dictionary is a repository of description of data in the database. It contains
information about
and number of rows in each table.
which is useful in determining which transactions are affected when certain data
definitions are changed.
access paths, files and record sizes.
and their access rights.
The data dictionary is used to control data integrity, database operation and
accuracy. It may be used as an important part of the DBMS.
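SQLite keeps its own small data dictionary, and it can be queried like any other data. In this sketch (the `customer` table and the index name are hypothetical), `sqlite_master` lists the objects in the database and `PRAGMA table_info` describes each column; SQLite's internal automatic indexes, whose names start with `sqlite_`, are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, balance INTEGER)"
)
conn.execute("CREATE INDEX idx_customer_name ON customer (cust_name)")

# SQLite's catalog: one row per table, index, view or trigger.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(catalog)  # [('table', 'customer'), ('index', 'idx_customer_name')]

# Column descriptions: PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)  # [('cust_id', 'TEXT'), ('cust_name', 'TEXT'), ('balance', 'INTEGER')]
```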
Importance of Data Dictionary -
Data Dictionary is necessary in the databases due to following reasons:
understanding of use of the system.
the result of every design phase and design decisions.
values) are used in all the programs.
the database application programs are not affected.
5. Data Files - It contains the data portion of the database.
6. Compiled DML - The DML compiler converts the high-level queries into low-level
file access commands known as compiled DML.
7. End Users - They are already discussed in previous section.
1.10 Metadata:
Metadata is loosely defined as data about data.
For example : a web page may include metadata specifying what language it's
written in, what tools were used to create it, and where to go for more on the
subject, allowing browsers to automatically improve the experience of users.
Metadata is defined as data providing information about one or more aspects of
the data, such as:
1 ) Means of creation of the data
2 ) Purpose of the data
3 ) Time and date of creation
4 ) Creator or author of data
5 ) Placement on a computer network where the data was created
6 ) Standards used
For example, a digital image may include metadata that describes how large
the picture is, the color depth, the image resolution, when the image was
created, and other data. A text document's metadata may contain information
about how long the document is, who the author is, when the document was
written, and a short summary of the document.
Metadata is data about data.
As such, metadata can be stored and managed in a database, often called a
registry or repository. However, it is impossible to identify metadata just by
looking at it because a user would not know when data is metadata or just data.
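Operating systems keep metadata for every file. The Python sketch below writes one hypothetical customer record to a temporary file and then reads the file's metadata (name, size, modification time) with `os.stat`; these values are data about the file's contents, not the contents themselves. The helper name `file_metadata` is an assumption of this sketch.

```python
import os
import tempfile
import time


def file_metadata(path: str) -> dict[str, object]:
    """Collect a few facts about a file, not its contents."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified": time.ctime(info.st_mtime),
    }


# Write some data (one customer record) to a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as handle:
    handle.write(b"Jyoti,A123,Nashik,50000\n")
    path = handle.name

metadata = file_metadata(path)
print(metadata["size_bytes"])  # 24
os.remove(path)
```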
Summary:
DBMS is a collection of programs that enables you to store, modify and extract
information from a database.
A DBMS provides the following five services:
1 ) Transaction Management:
2 ) Concurrency Control:
3 ) Recovery Management:
4 ) Security Management:
5 ) Language Interface:
Applications of DBMS:
The following are the applications of DBMS: