Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

CS186 Midterm 2 Exam: Introduction to Database Systems - Fall 1998, Exams of Introduction to Database Management Systems

Gujarat Ayurved University Introduction to Database Management Systems

The midterm exam for the introduction to database systems course offered at the university of california, berkeley in fall 1998. The exam consists of four problems, each with multiple questions, covering topics such as cost estimation, query processing, and query optimization. Students are expected to read the exam carefully, plan their time management, and answer all parts of each question.

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shalin_p01ic 🇮🇳

4

(7)

86 documents

1 / 5

This page cannot be seen from the preview

Don't miss anything!

CS186

Fall 1998

Midterm 2

J.Hellerstein

Midterm Exam: Introduction to Database Systems

This exam has four problems, for a total of 70 points. There are also 10 points of extra credit available. Each

problem is made up of multiple questions. You should read through the exam quickly and plan your

time-management accordingly. Before beginning to answer a question, be sure to read it carefully and to

answer all parts of every question!

Cost Estimation1.

Cage/Fish & Associates hires you to help keep track of their clients and cases. Theyre currently stored as 2

relations (primary keys in bold)

Clients(name, id, birthday)

-5000 tuples, 50 bytes/tuple (2 tuples/page)

Cases(caseid, clientid, lawyer, courtdate)

-10,000 tuples, 2000 bytes/tuple, each attribute of the tupe is 500 bytes

long. (2 tuples/page)

Some additionaly notes on this schema:

A page holds 4000 bytes• caseid has values ranging from 0 to 10,000• clientid has values ranging from 50 to 5050, and is a foreign key to Clients.• Unless stated otherwise, a B+Tree index is available on each of the primary keys.•

[2 points] Consider the following two queries:a.

SELECT* FROM Clients C WHERE C.birthday=4/25/77i. SELECT* FROM Cases S WHERE S.lawyer=Georgia Thomasii.

Assume that the two queries return the same number of qualifying tuples. If youre building indexes on

the selected attributed to speed up the queries, for which is a clustered index( as opposed to

unclustered) more important, and why? Be sure to state any assumptions that you make.

[4 points] Estimate the number of tuples that would pass the selection Cases.casesid = Cases.clientid.a.

What assumptions are you making in this approximation?

[9 points] Assume that you have two indexes on Cases. The first is a clustered B+tree of height 2 on

(lawyer, caseid), containing 201 pages (note that key compression makes this possible, even though

b.

CS186

1

Partial preview of the text

Download CS186 Midterm 2 Exam: Introduction to Database Systems - Fall 1998 and more Exams Introduction to Database Management Systems in PDF only on Docsity!

CS

Fall 1998

Midterm 2

J.Hellerstein

Midterm Exam: Introduction to Database Systems

This exam has four problems, for a total of 70 points. There are also 10 points of extra credit available. Each problem is made up of multiple questions. You should read through the exam quickly and plan your time-management accordingly. Before beginning to answer a question, be sure to read it carefully and to answer all parts of every question!

1. Cost Estimation

Cage/Fish & Associates hires you to help keep track of their clients and cases. Theyre currently stored as 2 relations (primary keys in bold )

Clients(name, id , birthday)

-5000 tuples, 50 bytes/tuple (2 tuples/page)

Cases( caseid , clientid, lawyer, courtdate)

-10,000 tuples, 2000 bytes/tuple, each attribute of the tupe is 500 bytes long. (2 tuples/page)

Some additionaly notes on this schema:

A page holds 4000 bytes
caseid has values ranging from 0 to 10,
clientid has values ranging from 50 to 5050, and is a foreign key to Clients.
Unless stated otherwise, a B+Tree index is available on each of the primary keys.

a. [2 points] Consider the following two queries:

i.SELECT* FROM Clients C WHERE C.birthday=4/25/ ii.SELECT* FROM Cases S WHERE S.lawyer=Georgia Thomas

Assume that the two queries return the same number of qualifying tuples. If youre building indexes on the selected attributed to speed up the queries, for which is a clustered index( as opposed to unclustered) more important, and why? Be sure to state any assumptions that you make.

a.[4 points] Estimate the number of tuples that would pass the selection Cases.casesid = Cases.clientid.

What assumptions are you making in this approximation?

[9 points] Assume that you have two indexes on Cases. The first is a clustered B+tree of height 2 on (lawyer, caseid), containing 201 pages (note that key compression makes this possible, even though

b.

the keys themselves are 500 bytes long!). The second is an unclustered B+tree of height 2 on (caseid) containing 101 pages. Consider the query:

SELECT lawyer

FROM Cases

WHERE caseid > 5000

i.Using the clustered B+-tree, what is the cost of running this query? ii.Using the unclustered B+tree, what is the cost of running this query? What would be the cost if you used the unclustered tree but sorted the rids of the matching tuples before fetching the tuples from the heap file? Assume that the rids fit in memory

iii.

2) Query Processing

Consider a system containing three join algorithms: (I) Block Nested Loops, (ii) Sort Merge and (iii) Hash Join.

[3 points] For each of the join algorithms in the system, discuss how its performance is affected by skew in the data values of the join columns.

a.

[3 points] For each of the join algorithms in the system, discuss how its performance is affected by making a major change in the size of only one input relation.

b.

[9 points] Two relations R and S have sizes (in #pages) of M=500 and N=100, respectively. You are to perform an equijoin of R and S. Given B=12 buffers, how many I/Os would be required for each of the join algorithms in the system?

c.

3. Query Optimization

[6 points] Assume you have a simple PC database system that contains only two join algorithms: simple nested-loops join, and index nested loops join. You have a database with the following schema (primary keys in bold ):

a.

RF_Exists() that can give an accurate reduction factor for an EXISTS clause. Given this information and what you know from class, can you describe how to choose the best plan for the query?

4. SQL

A bookmaker (also known as a "bookie") is a person who takes bets on various sprting events. He determines odds on the outcome of these events, and collects and payrs off the bets made by others through him.

Bob is a bookie that doesnt really understand anything about sports or gambling; hence, he has a large customer base. His customer base is growing too large for him to handle his operation manually. Since he knows a little about databases, he decides to use a database to keep track of customer bets on sprting events,

Hoping eventually he will be able to bring his operation online.

He decides on the following schema for his initial database (primary keys are in bold , referential integrity to be enforced on all cross-table references):

Customers( c_id : integer, name: string, loc: string, favorite_limb: string)

This table contains the customers, their names, where they live, and their favorite limbs(if they cant pay their bets, Bob takes their favorite limb).

Teams( team_id : integer, name: string, league: string, home: string)

This table holds the teams Bob allows his customers to bet on. It contains name of the team, the league to which the team belongs, i.g. "NFL" or "NCAA", and the home location of the team.

Games( game_id : integer, fav_team: integer, underdog_team: integer, date:, DATE, odds: float)

This table holds the games Bob allows his customers to bet on. It contains a unique game_id, the team_id of the team from the Teams table favored to win (fav_team), the team_id of the opponent in the Teams table (underdog_team), the date of the game, the odds for the game.

Bets( c_id : integer, game_id: integer, pick: integer, betted_amount: float)

This table contains the bets made by customers from the Customers table on specific games from the Games table. Each records the customers bet of an amount (betted_amount) on a team they pick from the Teams table (pick).

The betting rules work as follows:

Customers pay Bob the betted_amount for each bet. If a customoer bets on a game, and the team they betted for wins, and that team is the favorite team, then Bob gives the customer 2 times the betted_amount.

If the customer bets on a game, and the team they betted for wins, and that team is the underdog, then Bob gives the customer betted_amount + odds*betted_amount.

If the customer bets on a game, and the team they betted for loses, Bob keeps the betted_amount.

Note

: the key of Bets is (c_id, game_id), so a customer can only bet once per game.

Assume there are no nulls in any tables of the database.

You need to help Bob figure out some of the SQL issues. Feel free to use views if they help you clarify your thinking (or Bobs)!

a.[4 points] Write the SQL DLL for the Games table. [4 points] Write an SQL query to find the names of teams that were picked to win in more than 50 bets.

b.

[5 points] Suppose that all the underdogs win. Write an SQL query to give they list of customers who lost all their bets. The list should contain customers names, locations, how much they owe, and their favorite limbs.

c.

[7 points] Write an SQL integrity constraint that assures no team in the NFL league plays more than once a day.

d.

CS186 Midterm 2 Exam: Introduction to Database Systems - Fall 1998, Exams of Introduction to Database Management Systems

Related documents

Partial preview of the text

Download CS186 Midterm 2 Exam: Introduction to Database Systems - Fall 1998 and more Exams Introduction to Database Management Systems in PDF only on Docsity!

CS

4. SQL