computer science and programming, Essays (university) of Computer Science

Typology: Essays (university)

2019/2020

Uploaded on 10/06/2020

ishwar-panchariya

Q1. Explain the kind of knowledge to be mined? (10 Marks)
Answer:

Data mining is not specific to one type of media or data; it should be applicable to any kind of information repository. However, the algorithms and approaches may differ when applied to different types of data.

The challenges presented by different types of data vary significantly. Data mining is being put into use and studied for many kinds of repositories: databases (relational, object-relational, and object-oriented), data warehouses, transactional databases, unstructured and semi-structured repositories such as the World Wide Web, advanced databases such as spatial, multimedia, time-series, and textual databases, and even flat files.

Here are some examples.

  1. Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level. Flat files are simple data files in text or binary format with a structure known by the data mining algorithm to be applied. The data in these files can be transactions, time-series data, scientific measurements, etc.
  2. Relational Databases: A relational database consists of a set of tables containing either values of entity attributes or values of attributes from entity relationships. Tables have columns and rows, where columns represent attributes and rows represent tuples. A tuple in a relational table corresponds to either an object or a relationship between objects and is identified by a set of attribute values representing a unique key. For example, in a video store database, relations such as Customer, Items, and Borrow represent the business activity; these relations are a subset of what could be the store's complete database.
  3. Data Warehouses: A data warehouse is a repository of data collected from multiple data sources and stored under a unified schema so that it can be used as a whole. A data warehouse makes it possible to analyze data from different sources under the same roof.
  4. Transaction Databases: A transaction database is a set of records representing transactions, each with a time stamp, an identifier, and a set of items. The transaction files may also be accompanied by descriptive data for the items. For example, in the video store, the rentals table represents the transaction database: each record is a rental contract with a customer identifier, a date, and the list of items rented (e.g., video tapes, games, VCRs). Since relational databases do not allow nested tables (i.e., a set as an attribute value), transactions are usually stored in flat files or in two normalized transaction tables, one for the transactions and one for the transaction items. One typical data mining analysis on such data is market basket analysis, or association rule mining, in which associations between items occurring together or in sequence are studied.
  5. Multimedia Databases: Multimedia databases include video, images, audio, and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system. Multimedia is characterized by its high dimensionality, which makes data mining even more challenging. Data mining from multimedia repositories may require computer vision, computer graphics, image interpretation, and natural language processing methodologies.
  6. Spatial Databases: Spatial databases are databases that, in addition to usual data, store geographical information such as maps and global or regional positioning. Such spatial databases present new challenges to data mining algorithms.
  7. Time-Series Databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes calls for challenging real-time analysis. Data mining in such databases commonly includes the study of trends and correlations between the evolutions of different variables, as well as the prediction of trends and movements of the variables over time.
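The transaction-database item above describes storing transactions in two normalized tables (one for transactions, one for transaction items) and running market basket analysis on them. The following Python sketch illustrates that layout with the standard-library sqlite3 module, rebuilds one item set per transaction, and computes support and confidence for two-item rules. The table names, columns, and rental data are hypothetical, and the brute-force pair counting is a simplified stand-in for a real association-rule algorithm such as Apriori.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical video-store data: one table for transactions, one for transaction items
# (the normalized two-table layout). All names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rentals (trans_id INTEGER PRIMARY KEY, customer_id INTEGER, rental_date TEXT);
    CREATE TABLE rental_items (trans_id INTEGER REFERENCES rentals(trans_id), item TEXT);
""")
conn.executemany("INSERT INTO rentals VALUES (?, ?, ?)", [
    (1, 101, "2020-06-01"), (2, 102, "2020-06-01"),
    (3, 101, "2020-06-02"), (4, 103, "2020-06-03"),
])
conn.executemany("INSERT INTO rental_items VALUES (?, ?)", [
    (1, "tape_101"), (1, "vcr"),
    (2, "tape_101"), (2, "game_7"),
    (3, "tape_101"), (3, "vcr"),
    (4, "game_7"),
])

def load_baskets(conn):
    """Rebuild one item set ("basket") per transaction by joining the two tables."""
    baskets = {}
    rows = conn.execute(
        "SELECT r.trans_id, i.item FROM rentals AS r "
        "JOIN rental_items AS i ON r.trans_id = i.trans_id ORDER BY r.trans_id"
    )
    for trans_id, item in rows:
        baskets.setdefault(trans_id, set()).add(item)
    return list(baskets.values())

def pair_rules(baskets, min_support=0.5):
    """Brute-force rules A => B between two items whose support meets min_support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        # confidence(A => B) = count(A and B) / count(A)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

for a, b, support, confidence in pair_rules(load_baskets(conn)):
    print(f"{a} => {b}: support={support:.2f}, confidence={confidence:.2f}")
```

The join step is what the normalized layout costs: because a relational row cannot hold a set of items, each basket has to be reassembled before item co-occurrences can be counted.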

For example, a user may define big spenders as customers who purchase items that cost $100 or more on average, and budget spenders as customers who purchase items that cost less than $100 on average. The mining of discriminant descriptions for customers from each of these categories can be specified in DMQL as:

    mine comparison as purchaseGroups
    for bigSpenders where avg(I.price) ≥ $100
    versus budgetSpenders where avg(I.price) < $100
    analyze count
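To make the meaning of this comparison query concrete, the Python sketch below forms the same target class (bigSpenders, average item price of $100 or more) and contrast class (budgetSpenders, average below $100), then counts customers per value of a descriptive attribute in each class. It only illustrates what the query asks for, not how a DMQL engine evaluates it; the purchase data and the age_group attribute are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical purchases: (customer_id, item price in dollars).
purchases = [
    (1, 120.0), (1, 95.0),
    (2, 30.0), (2, 45.0),
    (3, 250.0),
    (4, 60.0), (4, 110.0),
]
# Hypothetical descriptive attribute used to contrast the two classes.
age_group = {1: "30-39", 2: "20-29", 3: "40-49", 4: "20-29"}

def split_spenders(purchases, threshold=100.0):
    """Return (big_spenders, budget_spenders) using each customer's average item price."""
    prices = defaultdict(list)
    for customer, price in purchases:
        prices[customer].append(price)
    big, budget = set(), set()
    for customer, values in prices.items():
        average = sum(values) / len(values)
        # "$100 or more on average" -> target class; below $100 -> contrast class
        (big if average >= threshold else budget).add(customer)
    return big, budget

def compare(big, budget, attribute):
    """Count customers per attribute value in the target and the contrast class."""
    return (Counter(attribute[c] for c in big),
            Counter(attribute[c] for c in budget))

big, budget = split_spenders(purchases)
big_counts, budget_counts = compare(big, budget, age_group)
print("bigSpenders:", dict(big_counts))
print("budgetSpenders:", dict(budget_counts))
```

Comparing the two count distributions side by side is the raw material for a discriminant description: attribute values that are frequent in one class and rare in the other are the ones that distinguish the groups.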

  1. Association: The syntax for association is:

    mine associations [as {pattern_name}]
    {matching {metapattern}}

For example:

    mine associations as buyingHabits
    matching P(X:customer, W) ∧ Q(X, Y) ⇒ buys(X, Z)

where X is the key of the customer relation; P and Q are predicate variables; and W, Y, and Z are object variables.
  2. Classification: The syntax for classification is:

    mine classification [as pattern_name]
    analyze classifying_attribute_or_dimension

For example, to mine patterns that classify customer credit rating, where the classes are determined by the attribute credit_rating, the query is:

    mine classification as classifyCustomerCreditRating
    analyze credit_rating
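The DMQL statement only names the classifying attribute (credit_rating); it does not fix the learning algorithm. As one possible illustration, the sketch below learns a classifier for credit_rating with OneR, a simple one-attribute rule learner chosen here for brevity and not prescribed by DMQL or by this text. The customer records and the income and age attributes are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical Customer tuples; credit_rating is the classifying attribute.
customers = [
    {"income": "high", "age": "senior", "credit_rating": "excellent"},
    {"income": "high", "age": "youth", "credit_rating": "excellent"},
    {"income": "medium", "age": "youth", "credit_rating": "fair"},
    {"income": "low", "age": "senior", "credit_rating": "fair"},
    {"income": "low", "age": "youth", "credit_rating": "fair"},
    {"income": "medium", "age": "senior", "credit_rating": "excellent"},
    {"income": "medium", "age": "youth", "credit_rating": "fair"},
]

def one_rule(rows, target):
    """OneR: choose the attribute whose value -> majority-class rule makes the fewest errors."""
    best = None
    for attr in rows[0]:
        if attr == target:
            continue
        # Class counts for each value of this attribute.
        by_value = defaultdict(Counter)
        for row in rows:
            by_value[row[attr]][row[target]] += 1
        # Predict the majority class for each attribute value.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Errors = rows whose class differs from the predicted majority class.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

attribute, rule, errors = one_rule(customers, "credit_rating")
print(attribute, rule, errors)
```

OneR is only a baseline; a real classification task would typically use a richer model such as a decision tree, but the input is the same: tuples plus the attribute named in the analyze clause.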

  3. Prediction: The syntax for prediction is:

    mine prediction [as pattern_name]
    analyze prediction_attribute_or_dimension
    {set {attribute_or_dimension_i = value_i}}

Q3. Explain the syntax for specifying the kind of knowledge to be mined? (10 Marks)
Answer:

A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives. These primitives allow the user to communicate interactively with the data mining system during discovery, either to direct the mining process or to examine the findings from different angles or depths. The data mining primitives specify the following:
  1. Task-relevant data: This specifies the portions of the database or the set of data in which the user is interested. This includes the database attributes or data warehouse dimensions of interest, referred to as the relevant attributes or dimensions.
  2. The kind of knowledge to be mined: This specifies the data mining functions to be performed, such as characterization, discrimination, association or correlation analysis, classification, prediction, clustering, outlier analysis, or evolution analysis.
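The two primitives above can be pictured as separate fields of a query object. The Python sketch below is a hypothetical representation (the class, field, and method names are invented, not part of DMQL or any real system): the task-relevant data is a relation, a list of relevant attributes, and an optional row condition, and the kind of knowledge is one of the functions listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

class KnowledgeKind(Enum):
    """Data mining functions listed under 'the kind of knowledge to be mined'."""
    CHARACTERIZATION = "characterization"
    DISCRIMINATION = "discrimination"
    ASSOCIATION = "association"  # association or correlation analysis
    CLASSIFICATION = "classification"
    PREDICTION = "prediction"
    CLUSTERING = "clustering"
    OUTLIER_ANALYSIS = "outlier analysis"
    EVOLUTION_ANALYSIS = "evolution analysis"

@dataclass
class MiningTask:
    """Hypothetical container for two task primitives; names are illustrative only."""
    relation: str                   # task-relevant data: which relation/table
    relevant_attributes: List[str]  # task-relevant data: attributes of interest
    kind: KnowledgeKind             # kind of knowledge to be mined
    condition: Callable[[Dict], bool] = lambda row: True  # which portion of the data

    def task_relevant_rows(self, rows):
        """Keep rows satisfying the condition, projected onto the relevant attributes."""
        return [
            {attr: row[attr] for attr in self.relevant_attributes}
            for row in rows
            if self.condition(row)
        ]

task = MiningTask("Customer", ["age", "income"], KnowledgeKind.CLASSIFICATION,
                  condition=lambda row: row["age"] >= 18)
print(task.task_relevant_rows([
    {"name": "A", "age": 30, "income": "high"},
    {"name": "B", "age": 16, "income": "low"},
]))
```

Keeping the primitives as independent fields mirrors the idea in the text that the user can change one of them (for example, the kind of knowledge) while leaving the task-relevant data unchanged.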