













This is a complete assignment related to data mining and data warehouse
Typology: Assignments
According to Bill Inmon's definition, "A data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management's decisions."
i. Subject-oriented data: In a data warehouse, data is stored by subject, not by application.
ii. Integrated data: Data in a data warehouse comes from several operational systems. Inconsistencies are removed, and the source data is transformed and integrated before it is stored.
iii. Time-variant data: A data warehouse has to contain historical data, not just current values.
iv. Non-volatile data: Data is not updated or deleted from the data warehouse in real time.
Components of data warehouse:
Q. Define multi-dimensional data model. Explain different types of schemas used in data warehouse with example. (10 marks) Dimensional Modelling is a design technique used in data warehousing where data is organized into fact and dimension tables to optimize query performance, ease of use, and to support business processes. It often employs star or snowflake schema designs in database architecture. A multi-dimensional data model is a representation of data that allows for viewing and analysing information from multiple perspectives or dimensions simultaneously. Different types of schemas used in data warehouse: Schema is a logical description of the entire database. It includes the name and description of records of all record types including all associated data-items and aggregates.
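As an illustration of the fact and dimension tables described above, here is a minimal sketch of a star schema built with Python's standard `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are all hypothetical and chosen only for this example.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema: one fact table, two dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            month   INTEGER
        );
        -- The fact table holds measures plus a key into each dimension.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            units_sold INTEGER,
            revenue    REAL
        );
        """
    )
    # Hypothetical sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Bread", "Bakery"), (2, "Butter", "Dairy")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2024, 1), (2, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 10, 20.0), (1, 2, 5, 10.0), (2, 1, 3, 9.0)],
    )
    return conn


def revenue_by_category(conn):
    """Aggregate the revenue measure along the product-category dimension."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON f.product_id = p.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)
```

In a snowflake schema, `category` would typically move out of `dim_product` into its own table, so the same query would need one more join.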
Q. Define data mining. Explain data mining issues and applications. (10 marks) Data mining is the process of analyzing large datasets to discover patterns, trends, and relationships that can provide valuable insights or predictions. Issues in data mining:
Q. What is data mining. Explain data mining tasks/functionalities.(10 marks)
Q. Explain multidimensional data model with example. The multidimensional data model is a method for organizing data in a database so that its contents are well arranged and assembled. The main concepts of the multidimensional data model are:
Q. What is data mining? Explain various techniques of data mining. (10 marks) Data mining is the process of discovering patterns, correlations, trends, and useful information from large sets of data using techniques from fields such as statistics, machine learning, and database systems. Various techniques used for data mining:
Q. Explain Apriori algorithm with example.(10 marks)
1. Purpose: The Apriori algorithm is used to find frequent itemsets in a dataset and then derive association rules from them.
2. Principle: The algorithm is based on the Apriori property: every subset of a frequent itemset must also be frequent. Equivalently, an itemset that contains an infrequent subset cannot itself be frequent.
3. Initialization: Start by counting the occurrences (support) of each item in the dataset and collecting all items that meet a minimum support threshold.
4. Iteration: For each subsequent step, generate new candidate itemsets by combining the frequent itemsets from the previous step.
5. Pruning (trimming): Before counting the frequency of these new candidate itemsets, prune those that have any subset that is not frequent, using the Apriori property.
6. Counting: Count the support of the remaining candidate itemsets.
7. Threshold check: Keep those itemsets that meet the minimum support threshold.
8. Repetition: Repeat steps 4-7 until no more frequent itemsets can be generated.
9. Association rules: From the list of frequent itemsets, derive association rules that have a confidence level above a given threshold.
10. Example: Suppose that in a supermarket dataset, bread (B) is bought 100 times, and bread and butter (B&B) are bought together 50 times. The support for B is 100/total transactions and the support for B&B is 50/total transactions. If our threshold is 0.01 (1% of total transactions) and both support values are above this, both itemsets are considered frequent. The confidence of the rule bread → butter is support(B&B) / support(B) = 50/100 = 0.5, or 50%. If this is above our confidence threshold, the association rule is accepted.
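The steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation: the function names `apriori` and `confidence` are hypothetical, support is expressed as a fraction of transactions, and the database is rescanned for every candidate.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Initialization: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Iteration: join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Counting and threshold check.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # Repetition: move on to larger itemsets.
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent = support(both) / support(antecedent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

For example, with the made-up transactions `[{"bread", "butter"}, {"bread", "butter"}, {"bread", "milk"}, {"bread"}, {"milk"}]` and `min_support=0.4`, the pair bread and butter is frequent (support 2/5), and `confidence(frequent, {"bread"}, {"butter"})` is (2/5) / (4/5) = 0.5, the same 50% as in the supermarket example.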
Q. Classification by decision tree. Classification by Decision Tree :
Q. Explain Bayesian classification with example. Example : Situation: You receive an email in your inbox. Goal: Determine whether the email is spam or not using Bayesian classification. Prior Knowledge:
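A minimal sketch of how prior knowledge feeds into Bayes' theorem for the spam example. The probabilities used here are hypothetical, since none are given in the text, and `posterior_spam` is an illustrative name: it computes P(spam | word) = P(word | spam) · P(spam) / P(word) for a single observed word.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Posterior probability that an email is spam, given that it contains a word."""
    prior_ham = 1.0 - prior_spam
    # Total probability of seeing the word in any email (the evidence term).
    evidence = p_word_given_spam * prior_spam + p_word_given_ham * prior_ham
    return p_word_given_spam * prior_spam / evidence


# Hypothetical values: 40% of all mail is spam; the word appears in 60% of
# spam emails and in 10% of non-spam emails.
p = posterior_spam(prior_spam=0.4, p_word_given_spam=0.6, p_word_given_ham=0.1)
label = "spam" if p > 0.5 else "not spam"
```

With these assumed numbers the posterior is 0.24 / (0.24 + 0.06) = 0.8, so the email would be classified as spam. A naive Bayes classifier extends this by multiplying the likelihoods of several words, under the assumption that they are conditionally independent given the class.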
Q. Clustering methods.
Q. Hierarchical clustering.