






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Main points of this past exam are: Relational Query, Translator, Software Test Engineer, Equivalent, Changing Exactly, Entire Nodes, Fetched, Query Optimization, Selectivity, Individual Term
Typology: Exams
1 / 12
This page cannot be seen from the preview
Don't miss anything!
UNIVERSITY OF CALIFORNIA Department of EECS, Computer Science Division CS186 Hellerstein/Olston Fall 2006 Midterm Exam Midterm Exam: Introduction to Database Systems This exam has four problems and one extra credit question, worth different amounts of points each. Each problem is made up of multiple questions. You should read through the exam quickly and plan your time- management accordingly. Before beginning to answer a problem, be sure to read it carefully and to answer all parts of every problem! Good luck!
1. Relational Query Languages [20 points] Recently, Bob the Builder coded up a Relational Query Language Translator (RQLT) and passed it along to his lesser-known colleague, Ted the Software Test Engineer, for testing. Fortunately for Ted, the RQLT is simple and only operates on a single schema: Tool(tid, brand, cost) Jobsite(location, compensation, task) Toolbox(tbid, location) – location is a foreign key to Jobsite Holds(tbid, tid) – tbid is a foreign key to Toolbox, tid is a foreign key to Tool. a. (5 points) Ted sets the RQLT to translate the following SQL query to Relational Algebra: SELECT T.tid FROM Tool T, Holds H, Toolbox B, Jobsite J WHERE T.tid = H.tid AND H.tbid = B.tbid AND B.location = J.location AND J.task = ’Plumbing’ Which of the following would be an equivalent Relational Algebra query? 1. π tid ( Tool Holds Toolbox σ task = ‘Plumbing’ (Jobsite)) 2. π tid (σ task = ‘Plumbing’ ( Tool ( Holds Toolbox ) Jobsite)) 3. σ task = ‘Plumbing’ (π tid ( Tool ) Holds Toolbox Jobsite) 4. 1 and 2 5. 1, 2, and 3 6. None of the above YOUR ANSWER HERE (1a): _________________ You must write your answers on the exam. Two pages of extra answer space have been provided at the back in case you run out of space while answering. If you run out of space, be sure to make a “forward reference” to the page number where your answer continues. Do not tear pages off of your exam! Class Account: ________________________________
b. (10 points) Ted switches the RQLT to translate Relational Algebra into Relational Calculus and passes in: π tbid ( Toolbox ) – π tbid ( ( π tbid ( Toolbox ) × π brand ( Tool ) )
3. Query Optimization [32 points] Consider the following schema: Guitars (gid, brand, price) Players (pid, name, age) LastPlayed (gid, pid, date) And the following query: SELECT P.name FROM Guitars G, Played P, LastPlayed L WHERE G.gid = L.gid AND P.pid = L.pid AND P.age < 25 AND G.brand = ‘Gibson’ AND G.price > 3000; In the schema, Guitars.gid is the primary key of Guitars, Players.pid is the primary key of Players, and LastPlayed.gid and LastPlayed.pid are foreign keys referencing the primary keys of Guitars and Players (respectively). Assume that the data is evenly distributed, and that the following properties hold: Players.age ranges from 10 to 85 Guitars.brand has 10 distinct values Guitars.price ranges from 1,000 to 5, Guitars.gid has 1,000 distinct values Players.pid has 1,000 distinct values a. (5 points) Compute the selectivity for each individual term in the WHERE clause. You must fill in each blank below! G.gid = L.gid : _________________________________ P.pid = L.pid : __________________________________ P.age < 25 : __________________________________ G.brand = ‘Gibson’ : __________________________________ G.price > 3000 : __________________________________
b. (5 points) According to the System R query optimizer that we studied, circle all the following join orders that would be considered: c. (12 points) Now consider the following. There is a B+ tree index on Guitars.gid – it is unclustered and it uses Alternative 2 to represent data entries. Assume that, on average, it takes 3 I/Os to retrieve a given data entry in a leaf of the index, and that the following properties hold: Guitars 40 bytes/tuple, 100 tuples/page, 10 pages Players 80 bytes/tuple, 50 tuples/page, 20 pages Lastplayed 20 bytes/tuple, 200 tuples/page, 100 pages Assuming no buffering occurs , fill in the three blanks in the diagram below.
Guitars (B+tree on gid)
Players (file scan) L.gid = G.gid (index nested loops) LastPlayed (file scan) P.pid = L.pid I/Os for this subtree:
__ Total I/Os:
(page-oriented nested loops)
4. Query Execution [25 points] You have been hired to advise a major hot-dog vending franchise called “Dunce Dog” to tune their database server installation. They run a commercial DBMS called Tentacle version 9y. One of the startup parameters in Tentacle is called query_space; it tells Tentacle how many disk-blocks worth of RAM it should allocate to use for sorting and hash joins. Note that memory used for query_space is separate from the Tentacle buffer pool. Dunce Dog is having problems running one of their monthly reporting queries. It uses the following tables: Store(sid, location, owner) ItemsForSale(iid , name, description, cost, price) DetailedSales(receiptno, iid, sid) The query is: SELECT S.sid, S.location, SUM(I.price – I.cost) AS PROFIT FROM Stores S, ItemsForSale I, DetailedSales D WHERE S.sid = D.sid and D.iid = I.iid GROUP BY S.sid, S.location You determine that the query plan that Tentacle chooses is:
a) (5 points) Tentacle performs aggressive projection – that is, it discards any attributes that it does not need as soon as possible. For each edge in the query plan, write down the smallest list of attributes that needs to be retained (i.e. not discarded) on the corresponding dotted line. b) (5 points) Assume that the result of the scan and projection of ItemsForSale fits in 1000 pages, and the result of the scan and projection of DetailedSales fits in 10,000 pages. Approximately how many blocks of query_space should Tentacle need at minimum to perform the sort-merge join of ItemsForSale and DetailedSales in two passes? Feel free to round up or down by as many as 2 blocks in any equations that you use, but if you do so, show your equations! c) (5 points) Assume that the result of the first join fits in 2,500 pages, and the result of the scan of Stores fits in 400 pages. Approximating and showing work as in part (b), estimate the number of blocks of query_space that Tentacle should need to perform the second join in the plan in two passes.
EXTRA CREDIT: The Disk Whisperer (10 points) It’s exhausting being a disk drive, constantly seeking, scanning, transferring… Since you are a sensitive Berkeley person, you are able to talk to your disk drive and it confides in you about how tough life is. Recently, your disk drive told you that it’s really tired of doing sort-merge joins. You offered to let it do hash joins all the time, even when the optimizer used to choose sort- merge join. Your disk responded, “Who cares – same stuff, different order.” Explain why your disk drive said that.