computer science and programming, Essays (university) of Computer Science

Sant Baba Bhag Singh University Computer Science

computer science and programming

Typology: Essays (university)

2019/2020

Uploaded on 10/06/2020

ishwar-panchariya 🇮🇳


Q1. Explain the kind of knowledge to be mined? (10 Marks)

Answer:

• Data mining is not specific to one type of media or data. Data mining should be applicable to any kind of information repository. Algorithms and approaches may differ when applied to different types of data.

• The challenges presented by different types of data vary significantly. Data mining is being put into use and studied for databases, including relational databases, object-relational databases and object-oriented databases, data warehouses, transactional databases, unstructured and semi-structured repositories such as the World Wide Web, advanced databases such as spatial databases, multimedia databases, time-series databases and textual databases, and even flat files.

• Here are some examples.

1. Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level. Flat files are simple data files in text or binary format with a structure known by the data mining algorithm to be applied. The data in these files can be transactions, time-series data, scientific measurements, etc.

2. Relational Databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships. Tables have columns and rows, where columns represent attributes and rows represent tuples. A tuple in a relational table corresponds to either an object or a relationship between objects and is identified by a set of attribute values representing a unique key. For example, the relations Customer, Items, and Borrow represent the business activity of a video store. These relations are a subset of what could be a database for the video store.

3. Data Warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources and intended to be used as a whole under the same schema. A data warehouse gives the option to analyze data from different sources under the same roof.


4. Transaction Databases: A transaction database is a set of records representing transactions, each with a time stamp, an identifier and a set of items. Associated with the transaction files could also be descriptive data for the items. For example, in the case of the video store, the rentals table represents the transaction database. Each record is a rental contract with a customer identifier, a date, and the list of items rented (i.e. video tapes, games, VCR, etc.). Since relational databases do not allow nested tables (i.e. a set as attribute value), transactions are usually stored in flat files or in two normalized transaction tables, one for the transactions and one for the transaction items. One typical data mining analysis on such data is market basket analysis, or association rule mining, in which associations between items occurring together or in sequence are studied.

5. Multimedia Databases: Multimedia databases include video, images, audio and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system. Multimedia is characterized by its high dimensionality, which makes data mining even more challenging. Data mining from multimedia repositories may require computer vision, computer graphics, image interpretation, and natural language processing methodologies.

6. Spatial Databases: Spatial databases are databases that, in addition to usual data, store geographical information like maps, and global or regional positioning. Such spatial databases present new challenges to data mining algorithms.

7. Time-Series Databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes creates the need for challenging real-time analysis. Data mining in such databases commonly includes the study of trends and correlations between the evolution of different variables, as well as the prediction of trends and movements of the variables in time.
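The two normalized transaction tables and the market basket analysis described for transaction databases can be sketched in a few lines of Python. This is only an illustration under assumptions: the table contents, the item names, and the helper functions `baskets` and `pair_support` are invented for the video store example and do not come from any particular system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical normalized tables for the video store example.
# transactions: one row per rental (transaction id, customer id, date).
transactions = [
    ("T1", "C1", "2020-01-03"),
    ("T2", "C2", "2020-01-03"),
    ("T3", "C1", "2020-01-05"),
]
# transaction_items: one row per item rented in a transaction.
transaction_items = [
    ("T1", "VCR"), ("T1", "video tape"),
    ("T2", "video tape"), ("T2", "game"),
    ("T3", "VCR"), ("T3", "video tape"), ("T3", "game"),
]

def baskets(items):
    """Group the item rows back into one set of items per transaction."""
    grouped = {}
    for tid, item in items:
        grouped.setdefault(tid, set()).add(item)
    return grouped

def pair_support(items):
    """Count, for each unordered item pair, the transactions containing both."""
    counts = Counter()
    for basket in baskets(items).values():
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

support = pair_support(transaction_items)
```

Pairs with a high count are candidates for association rules. A fuller analysis would also compare each count with how often the individual items occur on their own.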

For example, a user may define big spenders as customers who purchase items that cost $100 or more on average, and budget spenders as customers who purchase items that cost less than $100 on average. The mining of discriminant descriptions for customers from each of these categories can be specified in DMQL as:

    mine comparison as purchaseGroups
    for bigSpenders where avg(I.price) ≥ $100
    versus budgetSpenders where avg(I.price) < $100
    analyze count
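What this comparison computes can be approximated outside DMQL. The sketch below assumes a hypothetical list of purchase records and a made-up function `purchase_groups`. It splits customers at the $100 average mentioned above and counts each class, which is roughly what `analyze count` asks for.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, item price in dollars).
purchases = [
    ("ann", 250.0), ("ann", 120.0),
    ("bob", 40.0), ("bob", 90.0),
    ("cho", 100.0),
    ("dev", 20.0), ("dev", 150.0),
]

def purchase_groups(rows, threshold=100.0):
    """Split customers into two classes by average item price."""
    prices = defaultdict(list)
    for customer, price in rows:
        prices[customer].append(price)
    groups = {"bigSpenders": [], "budgetSpenders": []}
    for customer, values in sorted(prices.items()):
        avg = sum(values) / len(values)
        key = "bigSpenders" if avg >= threshold else "budgetSpenders"
        groups[key].append(customer)
    return groups

groups = purchase_groups(purchases)
# Number of customers in each class, as a stand-in for "analyze count".
counts = {name: len(members) for name, members in groups.items()}
```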

Association

The syntax for association is:

    mine associations [ as {pattern_name} ] { matching {metapattern} }

For example:

    mine associations as buyingHabits
    matching P(X:customer,W) ^ Q(X,Y) ⇒ buys(X,Z)

where X is a key of the customer relation; P and Q are predicate variables; and W, Y, and Z are object variables.
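One way to read the metapattern is as a template: P and Q are bound to attributes of the customer, and W, Y, and Z are bound to values found in the data. The following sketch assumes a tiny made-up customer relation and a hypothetical function `match_metapattern`. The support count and confidence it reports are the usual association rule measures, added here as an assumption since the query itself does not name them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer relation; X ranges over its keys.
customers = {
    "C1": {"age": "30s", "income": "high"},
    "C2": {"age": "30s", "income": "high"},
    "C3": {"age": "20s", "income": "low"},
    "C4": {"age": "30s", "income": "low"},
}
# Hypothetical buys(X, Z) facts.
buys = {("C1", "VCR"), ("C2", "VCR"), ("C3", "game"), ("C4", "game")}

def match_metapattern(customers, buys):
    """Instantiate P(X, W) ^ Q(X, Y) => buys(X, Z) and score each instance.

    P and Q are bound to two different customer attributes; W, Y and Z are
    bound to the values observed in the data. Returns a dict mapping
    (P, W, Q, Y, Z) to (support count, confidence).
    """
    body = Counter()  # customers satisfying P(X, W) ^ Q(X, Y)
    both = Counter()  # customers who also satisfy buys(X, Z)
    for key, attrs in customers.items():
        for p, q in combinations(sorted(attrs), 2):
            lhs = (p, attrs[p], q, attrs[q])
            body[lhs] += 1
            for cust, item in buys:
                if cust == key:
                    both[lhs + (item,)] += 1
    return {rule: (n, n / body[rule[:4]]) for rule, n in both.items()}

rules = match_metapattern(customers, buys)
```

In this toy data, both customers in their 30s with a high income bought a VCR, so that instance of the metapattern has a support count of 2 and a confidence of 1.0.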
Classification

The syntax for classification is:

    mine classification [as pattern_name]
    analyze classifying_attribute_or_dimension

For example, to mine patterns classifying customer credit rating, where the classes are determined by the attribute credit_rating, the task can be specified as:

    mine classification as classifyCustomerCreditRating
    analyze credit_rating
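The query above only names the classifying attribute and says nothing about which classifier is used. As a stand-in, the sketch below learns a very simple majority rule on one attribute. The training rows, the `income` attribute, and the function `fit_majority_rule` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training rows from a customer relation.
customers = [
    {"income": "high", "credit_rating": "excellent"},
    {"income": "high", "credit_rating": "excellent"},
    {"income": "high", "credit_rating": "fair"},
    {"income": "low", "credit_rating": "fair"},
    {"income": "low", "credit_rating": "fair"},
]

def fit_majority_rule(rows, attribute, target="credit_rating"):
    """Learn, for each value of `attribute`, the most frequent class label."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attribute]][row[target]] += 1
    return {value: labels.most_common(1)[0][0]
            for value, labels in by_value.items()}

model = fit_majority_rule(customers, "income")
```

Here `credit_rating` plays the role of the classifying attribute, and the learned mapping from income level to rating is the mined classification pattern.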

Prediction

The syntax for prediction is:

    mine prediction [as pattern_name]
    analyze prediction_attribute_or_dimension
    {set {attribute_or_dimension_i= value_i}}

Q3. Explain the syntax for specifying the kind of knowledge to be mined? (10 Marks)

Answer:

A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives. These primitives allow the user to interactively communicate with the data mining system during discovery in order to direct the mining process, or examine the findings from different angles or depths. The data mining primitives specify the following:
Task-relevant Data: This specifies the portions of the database or the set of data in which the user is interested. This includes the database attributes or data warehouse dimensions of interest, referred to as the relevant attributes or dimensions.
The kind of knowledge to be mined: This specifies the data mining functions to be performed, such as characterization, discrimination, association or correlation analysis, classification, prediction, clustering, outlier analysis, or evolution analysis.
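The two primitives described above can be pictured as fields of a query object. The sketch below is a hypothetical representation, not part of DMQL or of any real data mining system. The class `MiningQuery` and its field names are invented, and the set of allowed kinds simply repeats the functions listed above.

```python
from dataclasses import dataclass, field

# The mining functions listed above, used to validate a query.
KNOWLEDGE_KINDS = {
    "characterization", "discrimination", "association",
    "correlation analysis", "classification", "prediction",
    "clustering", "outlier analysis", "evolution analysis",
}

@dataclass
class MiningQuery:
    """Hypothetical container for two task primitives of a mining query."""
    relation: str                                   # task-relevant data source
    attributes: list = field(default_factory=list)  # relevant attributes or dimensions
    kind: str = "characterization"                  # kind of knowledge to be mined

    def __post_init__(self):
        # Reject mining functions that are not in the list above.
        if self.kind not in KNOWLEDGE_KINDS:
            raise ValueError(f"unknown kind of knowledge: {self.kind!r}")

query = MiningQuery("customer", ["income", "credit_rating"], "classification")
```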
