









This document provides a great collection of various ddbms terms with examples.
Typology: Lecture notes
AMMARAH MAQBOOL CS Department, GC Women University, Faisalabad
Introduction:
This is an advanced course that builds on one you should already have studied, “Database Management Systems”. It extends the concepts learnt there, and the applications to which you will apply its concepts and techniques are more advanced and complex by nature. Distributed Database Management Systems (DDBMS) use the concepts of:
The key is to identify the environments in which distributed databases should be used. In some situations a distributed database may not prove fruitful: the implementation may develop drawbacks or become inefficient.
As a computer or database expert, your basic assignment will always be the same: you are given a system for which you have to design and develop a solution or a database system. You will have several options available, and the interesting thing is that every one of them would work. The challenge lies in selecting the most feasible solution. For this, the merits and demerits of every approach have to be analyzed. If this evaluation is not done properly at the beginning, it is very difficult to make adjustments in the later stages. To be more precise, a Distributed Database System (DDBS) will be one of many options available to you for any system that you are asked to develop. It is your job to analyze whether the environment needs a DDBS solution or some other one. A wrong decision in this regard may introduce inefficiency rather than any advantage.
Technological Treatment:
Different design approaches to DDBS will be implemented using prevailing DDBMSs such as SQL Server and Oracle. That will give you an idea of what a real DDBS looks like.
Theoretical Aspects:
We will discuss the theoretical aspects related to the DDBS. Studying these issues will help you administer a DDBS on the one hand and, on the other, will help you in further studies or research in the DDBS. The database management systems available today do most of the administration automatically, but it is important for the database designer to know the background procedures so that the overall efficiency of the distributed database management system may be enhanced.
Recommended Books:
Database Approach:
To remove the defects of file processing systems, the database approach was adopted, which eliminated the interdependency of the program and the data. Changes can be made easily, whether they relate to the programs or to the data itself. A database is a shared collection of logically related data.
Distributed Computing Systems:
Distributed Computing System can be defined as “A system consisting of a number of autonomous processing elements that are connected through a computer network and that cooperate in performing their assigned task”.
Three things are important here:
Distributed Computing:
We can elaborate the concept of Distributed Computing with the following example: a computer has different components, for example RAM, hard disk and processor, working together to perform a single task. So is this an example of Distributed Computing?
The answer is no.
This is because, according to the definition, a number of autonomous processing elements connected through a network must be involved. The components of a single computer do not meet that condition, so the activities distributed among them are not an example of distributed computing.
The second thing is that what is being distributed? A few examples are given below:
All these things can be divided to make our system run efficiently.
Classification of Distributed Computing Systems:
Following factors are to be addressed:
1) Degree of Coupling:
Here we have to see how closely the systems are connected. We can measure this closeness in terms of the messages exchanged between two systems and the duration for which they work independently.
Note: If two systems are connected through a network, the coupling may be weak; if they share some hardware, the coupling is strong.
2) Interconnection Structure:
We have to see how the two systems are connected to each other. The connection can be point-to-point, or the systems can share a common channel, etc.
3) Interdependence:
The interdependency does not depend on the architecture alone; it also depends on the task and on how that task is distributed.
4) Not Totally Independent:
Why Distributed Computing Systems:
Some organization structures are suitable for Distributed Computing
Distributed Computing Alerts:
Note: Multiple options are available for designing distributed systems.
The candidate applications for a DDBS have the following two main characteristics:
1) Large number of users
2) Users are physically spread across a large geographical area
Following are some of the database applications that are strong candidates for a DDBS.
Banking Applications: Take the example of any Pakistani bank. A bank has a large number of customers, and its branches are spread across Pakistan (many banks also have branches around the world, which makes their candidature even stronger). In modern banking, customers do not use their accounts only from within their own branch; they also access them from outside it, for example from ATMs and branches spread across the city or country. Every time a user operates his account from anywhere in the country or the world, his account data is accessed.
Air ticketing: We now have the facility to book a seat on any airline from any location to any destination, e.g. we can book a return ticket from Lahore to Karachi and from Karachi to Lahore at the airline’s Lahore office. This system, too, has a large number of users spread across a large area. Whenever a booking is made, the flight data is accessed.
Business at multiple locations: A company may have offices at multiple locations, or different units such as production, warehouses and sales operating from different locations. Each site stores its data locally; however, the units need to access each other’s data, and data from all the sites is required for global access.
DDBMS: A software system that permits the management of a distributed database and makes the distribution transparent to the users.
Just as we need a DBMS for a centralized or client-server database, we need a DDBMS for a DDBS. A DDBMS behaves like a normal DBMS on the local site; the additional facility it provides is the creation and maintenance of global access, where data across multiple sites is accessed against a single query. The approach that most of the current commercial DBMS vendors (like Oracle, SQL Server, DB2, Sybase) have adopted is to provide different versions for different situations. If the user needs a desktop database for use on a single computer, a smaller version is available that does not support remote access or data distribution. For a client-server database there is another version, and for the DDBS environment the Enterprise Edition of the DBMS is provided, which supports distributing data among multiple sites, establishing links between these sites, and finally joining or combining data from multiple sites against a single query.
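The idea of global access against a single query can be sketched in a few lines of Python. This is only an illustration, not how any commercial DDBMS is implemented: each "site" is an in-memory SQLite database, and the site names, the account table and the global_query helper are all assumptions made up for the example.

```python
import sqlite3

def make_site(rows):
    """Create one in-memory database standing in for a single site."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, owner TEXT, balance REAL)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?, ?)", rows)
    return conn

# Hypothetical sites and data, for illustration only.
sites = {
    "lahore": make_site([(1, "Ali", 500.0), (2, "Sara", 1200.0)]),
    "karachi": make_site([(3, "Bilal", 300.0)]),
}

def global_query(sql, params=()):
    """Run the same query at every site and combine the results.

    The caller never names a site; the layer that knows about the
    sites is hidden behind this one function.
    """
    results = []
    for conn in sites.values():
        results.extend(conn.execute(sql, params).fetchall())
    return results

# A single query, answered from the data of both sites.
rich = global_query("SELECT owner FROM account WHERE balance > ?", (400,))
```

The user of global_query sees one logical account table. A real DDBMS would also have to decide which sites actually need to be contacted, which this sketch skips by asking all of them.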
A collection of independent databases on non-networked computers: In this environment the data at multiple computers is related, but the computers are not linked, so whenever data has to be accessed from multiple computers we need some other means of transferring data between them.
Summary: In today’s lecture we discussed the definition of the DDBS, the common applications where a DDBS can be applied, and the reasons why a DDBS is recommended for them. It is extremely important to have a clear idea of what precisely a DDBS is if we want to implement one properly.
Distributed files:
A collection of files stored on different computers of a network is not a DDBS. Why? Storing files on several computers is not enough for a DDBS, as the data should also be logically related.
Note: The data in a DDBS is logically related, has a common structure among files, and is accessed via the same interface.
Multiprocessor system: Multiple processors that share some common memory.
RAM sharing: tight coupling.
Hard disk sharing: loose coupling.
Systems simply connected: share nothing.
Centralized C/S System
Data management is carried out on a single centralized system, but the data is accessed from different machines (clients). All machines are connected with each other through a communication link (network). This is a very common architecture. Its major characteristic is that data storage and management are mainly done on the server. As the diagram on the next page shows, the data is associated with a single site, the server, and the rest of the machines access data from it.
There are two basic reasons for the DDBS environment. To understand them better, we need to look at the alternative to a DDBS, which is the centralized database or client-server environment. Take the example of our bank database. If it is centralized, the database is stored at a single place. Suppose that in Pakistan the bank selects a geographically central place, say Multan. The database is then stored in Multan, and whenever users from all over Pakistan want to use their accounts, the data is accessed from the central database in Multan. In a distributed environment, the bank can instead pick two, three, four or more places, and each database stores the data of its surrounding area. The load is now distributed over multiple places. With this scenario in mind, let us discuss the reasons for a DDBS:
Reduce telecom cost: With a centralized database, all the users from all over the country will access it remotely and therefore there would be an increased telecommunication cost. However, if it is distributed then for all users the data is closer and it is local access most of the time. So it reduces the overall telecommunication cost.
Reduce the risk of telecom failures: With a centralized database, all work throughout the country depends on the link with the central site. If the link fails for any reason, work at all the places stops, so a failure at a single site causes damage everywhere. If the database is distributed, on the other hand, a failure at a single site disturbs only the users of that site, and the remaining sites keep working normally. Moreover, one form of data distribution is replication, where the data is duplicated at multiple places. This form of data distribution further reduces the cost of a telecommunication failure.
Schema contains:
What has to be shown to the global user.
How we are going to set data for a thing on each site.
The type of the data stored on each site.
How we are going to merge the data present on different sites.
Note: Global users are attached to the Distributed DBMS layer.
If we adopt a DDBS as the solution for a particular application, what features are we going to get?
Transparency: A transparent system hides the implementation details from the user. There are different types of transparency, but at this point the focus is on distribution transparency, which means that the global user has no idea that the data being provided to him is actually coming from multiple sites; rather, he gets the feeling that the data is coming from the machine he is using. It is a very useful feature, as it saves the global user from the complexity and details of distributed access.
Data Independence: A major advantage of the database approach is data independence. The program and the data do not depend on each other, i.e. we can change the program with very little or no change to the data, and vice versa.
In a 3-layer architecture, changes at a lower level have little or no effect on the higher levels.
Logical data independence: If we change the conceptual schema there is little or no effect on the External level.
Physical data independence: If we change the physical or lower level then there is little or no effect on the conceptual level.
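Logical data independence can be illustrated with a view. The following is a minimal sketch using Python's built-in sqlite3 module; the customer tables and the customer_view view are hypothetical names chosen for the example. The application query is written once against the view (the external level) and keeps working after the conceptual schema underneath it is restructured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original conceptual schema: one table, plus an external view for applications.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Ali', 'Lahore'), (2, 'Sara', 'Multan');
    CREATE VIEW customer_view AS SELECT name, city FROM customer;
""")

# The application only ever uses this query against the view.
APP_QUERY = "SELECT name, city FROM customer_view ORDER BY name"
before = conn.execute(APP_QUERY).fetchall()

# Conceptual schema change: the single table is split into two tables.
# Only the view definition is adjusted; APP_QUERY is left untouched.
conn.executescript("""
    DROP VIEW customer_view;
    CREATE TABLE customer_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_city (id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO customer_name SELECT id, name FROM customer;
    INSERT INTO customer_city SELECT id, city FROM customer;
    DROP TABLE customer;
    CREATE VIEW customer_view AS
        SELECT n.name, c.city
        FROM customer_name n JOIN customer_city c ON n.id = c.id;
""")

after = conn.execute(APP_QUERY).fetchall()
```

Here before and after hold the same rows, even though the tables behind the view have changed completely.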
Network transparency: This is another form of transparency. The user is unaware of even the existence of the network that frees him from the problems and complexities of network.
Replication transparency: Replication and fragmentation are the two ways to implement a DDBS. In replication the same data is stored at multiple sites, e.g. in the case of a bank, every branch holds the data of every other branch. Replication increases the availability of data and reduces the risk of telecom failure. The DDBS hides the replication from the end user; the advantage is that the user simply gets the benefits of the system without needing to know or understand the technical details.
Summary: In today’s lecture we continued the discussion on distributed systems. We discussed the setups that resemble a DDBS, and there we studied distributed file systems and multiprocessor systems. In the latter type, we have share-everything and share-nothing systems. We then discussed the centralized C/S system, which is also a very popular architecture for databases. Then we saw different reasons to have a DDBS and the situations it suits, compared it with its alternative, and studied why a DDBS is useful for certain types of applications. Finally, we saw what advantages we get if we adopt a DDBS solution.
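A toy version of replication transparency is sketched below in Python. It assumes a simple "write to all live copies, read from any live copy" policy, which is only one of several possible policies; the ReplicatedTable class and the site names are hypothetical.

```python
class ReplicatedTable:
    """Toy replicated key-value table: write to all live copies, read from any."""

    def __init__(self, site_names):
        # One full copy of the data per site.
        self.copies = {name: {} for name in site_names}
        self.down = set()  # names of sites that are currently unreachable

    def write(self, key, value):
        # Write-all: every live copy receives the update.
        for name, copy in self.copies.items():
            if name not in self.down:
                copy[key] = value

    def read(self, key):
        # Read-any: the first live copy answers; the caller never names a site.
        for name, copy in self.copies.items():
            if name not in self.down:
                return copy[key]
        raise RuntimeError("no replica available")

accounts = ReplicatedTable(["lahore", "multan", "karachi"])
accounts.write("acc-1", 500)
accounts.down.add("lahore")       # simulate a site or link failure
balance = accounts.read("acc-1")  # still answered, by another replica
```

The caller uses read and write without knowing how many copies exist or which one answered. Note that a copy that is down misses any writes made in the meantime, so it has to be brought up to date before it serves reads again.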
Fragmentation transparency: A file or a table is broken down into smaller parts called fragments, and those fragments are stored at different locations. Fragmentation will be discussed in detail in later lectures. Briefly, however, a table can be fragmented horizontally (row-wise) or vertically (column-wise), so we have two major types of fragmentation, horizontal and vertical. Different fragments of a table are placed at different locations. The basic objective of fragmentation and placement at different sites is to maximize local access and reduce remote access, since the latter causes cost and delay. Fragmentation transparency means that a user should not know that the database is fragmented; the concept of fragmentation should be kept hidden from the user. Note: The DBA designs the architecture of the fragments, whereas once implemented it is managed by the DDBMS.
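The two types of fragmentation can be sketched in Python as follows. This is only an illustration on an in-memory list of rows; the customers table, its columns and both helper functions are assumptions made for the example, not part of any DDBMS.

```python
# Hypothetical CUSTOMER table; each row is a dict keyed by column name.
customers = [
    {"id": 1, "name": "Ali", "city": "Lahore", "balance": 500},
    {"id": 2, "name": "Sara", "city": "Karachi", "balance": 1200},
    {"id": 3, "name": "Bilal", "city": "Lahore", "balance": 300},
]

def horizontal_fragments(rows, column):
    """Split rows into fragments by the value of one column (row-wise)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments

def vertical_fragment(rows, columns, key="id"):
    """Keep only some columns (column-wise); the key is repeated in every fragment."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]

by_city = horizontal_fragments(customers, "city")        # one fragment per city
names = vertical_fragment(customers, ["name", "city"])   # id, name, city
money = vertical_fragment(customers, ["balance"])        # id, balance

# Reconstruction: union for horizontal fragments, join on the key for vertical ones.
from_horizontal = sorted(
    (row for fragment in by_city.values() for row in fragment),
    key=lambda row: row["id"],
)
money_by_id = {row["id"]: row for row in money}
from_vertical = [{**n, **money_by_id[n["id"]]} for n in names]
```

Both reconstructions give back the original table, which is what allows the fragmentation to stay hidden from the user. The key column is repeated in every vertical fragment precisely so that this join is possible.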
Responsibility of transparency:- Transparency is very much desirable since it hides all the technical details from the users, and makes the use/access very easy. However, providing transparency involves cost, the cost that has
Reliability of DDBS: Reliability through distributed transaction: The distributed nature of a DDBS environment reduces the chance of a single point of failure, which increases the reliability of the entire system. The entire system does not go down with the failure of a single site, as is the case with centralized database systems. The users of the site that goes down will of course suffer, but not the entire system.
Concurrency issues: Concurrent access means access to data by multiple users at the same time. Concurrency issues arise even in simple (client-server) databases, but they become more critical in a DDBS. In particular, with replication the same data is duplicated at multiple sites, and the consistency of data across these sites is a serious issue that needs extra care.
Performance Improvement: The DDBS provides improved performance; the major factors causing this improved performance are data localization and query parallelism.
Data Localization: One of the basic principles of data distribution in a DDBS is that the data should reside at the closest site where it is most frequently accessed. This reduces the communication cost and the delay. However, the DDBS also involves the remote accesses as well and in that case delay is unavoidable, but through maximized data localization we get overall improved performance.
Query Parallelism: This is the second major factor behind the improved performance of a DDBS. Since the DDBS involves multiple systems, a query can in certain situations be executed in parallel, which improves performance. There are two types of query parallelism: inter-query and intra-query parallelism. The former means that multiple queries can be executed at the same time, whereas the latter means that the same query is split across multiple sites and the split components are executed in parallel, which increases throughput.
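Intra-query parallelism can be sketched in Python as follows. The sketch assumes that a "total balance" query can be split into one sub-query per site whose partial results are then added up; threads stand in for the sites, and the site names and balances are made-up illustration data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of an ACCOUNT table, one list of balances per site.
site_balances = {
    "lahore": [500, 300, 250],
    "karachi": [1200, 800],
    "multan": [150],
}

def local_total(balances):
    """The sub-query a single site would run on its own fragment."""
    return sum(balances)

def total_balance(fragments):
    """Intra-query parallelism: one query, split into per-site sub-queries."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        # Each sub-query runs in its own worker, as it would at its own site.
        partials = list(pool.map(local_total, fragments.values()))
    # The partial results are combined into the answer to the original query.
    return sum(partials)

grand_total = total_balance(site_balances)
```

Inter-query parallelism would be the simpler case in which several unrelated queries are submitted to the pool at the same time.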
Complicating factors There are certain aspects that complicate a DDBS environment. Following are some of those factors.
Selection of the Copy: In the case of replication, the selection of the right copy is a complicating factor. As the same data resides at multiple places, which particular site should be accessed for a particular query is an important question to resolve. One simple solution is to decide on the basis of distance or load. However, the same question arises in a different situation, when a particular site goes down. In this case the queries that were originally routed to this particular site now have to be re-routed. Thus the selection of the appropriate copy is an issue that needs extra attention in a DDBS environment.
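The "distance or load" rule, including re-routing when a site goes down, can be sketched in Python as follows. The Replica class, the choice of load before distance, and all the figures are assumptions for illustration; a real DDBMS may weigh these factors differently.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    site: str
    distance_km: float  # distance from the user issuing the query
    load: int           # number of queries currently being served
    up: bool = True     # False once the site has gone down

def choose_copy(replicas):
    """Pick the live replica with the lowest load, breaking ties by distance."""
    live = [r for r in replicas if r.up]
    if not live:
        raise RuntimeError("no live copy of the data")
    return min(live, key=lambda r: (r.load, r.distance_km))

# Hypothetical copies of the same data at three sites.
replicas = [
    Replica("lahore", distance_km=5, load=2),
    Replica("multan", distance_km=340, load=2),
    Replica("karachi", distance_km=1200, load=7),
]

first_choice = choose_copy(replicas).site   # the nearest of the least-loaded sites
replicas[0].up = False                      # that site goes down
fallback = choose_copy(replicas).site       # the query is re-routed
```

With these figures the query first goes to Lahore and is re-routed to Multan after the failure.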
Failure recovery: Likewise, in case of replication the synchronization of copies after failure has to be dealt with carefully.
Complexity: Since the data is stored at multiple sites and has to be managed the overall system is more complex as compared to a centralized database system.
Cost: A DDBS involves more cost, as hardware and trained manpower have to be deployed at multiple sites.
Distribution of Control: The access to the data should be allowed carefully. Rights to access data should be well defined for local sites.
The Problem Areas: The following areas in DDBS still need more work and are considered problem areas.
Database design: All the issues of a centralized database system apply to a DDBS, but it introduces an additional aspect related to data placement, that is, where our sites should be located.
Query processing: Problems arise in queries executed at multiple sites, e.g. what should be done when data from one site is not collected.
Other critical issues include Concurrency Control, Operating System and Heterogeneity. These issues will be discussed in the later lectures. The diagram shows the interlink between these problem areas.
The diagram shows that DDBS design lies at the heart of all the issues. It is linked with most of them, like Directory Management, Reliability etc. This means that the overall performance of a DDBS depends mainly on the database design. If the design is done efficiently, then most of the other issues will be handled efficiently as well.
Summary: This lecture continues the discussion on different forms of transparencies including fragmentation transparency. Then the issue of the responsibility for providing the transparency is discussed. Three different components may be considered as the transparency providers, however, practically all three components are used to provide different forms of transparencies and to provide the end-user a user-friendly environment to work with. After this, different issues that