An Introduction to Data Mining

Data mining is a result of the natural evolution of information technology.

Efficient methods for on-line transaction processing (OLTP), where a query is viewed as a read-only transaction, have contributed substantially to the evolution and wide acceptance of relational technology as a major tool for efficient storage, retrieval, and management of large amounts of data.

Data warehouse technology includes data cleaning, data integration, and on-line analytical processing (OLAP), that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation as well as the ability to view information from different angles.
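A minimal sketch of the aggregation idea and of viewing the same data "from different angles": a handful of hypothetical sales records are summed along one dimension at a time. The fields `region`, `quarter`, and `amount` and all figures are invented for illustration; real OLAP tools work over far larger data and usually precompute such aggregates.

```python
from collections import defaultdict

# Hypothetical fact records: (region, quarter, amount)
sales = [
    ("North", "Q1", 120.0),
    ("North", "Q2", 80.0),
    ("South", "Q1", 200.0),
    ("South", "Q2", 50.0),
]

def aggregate(records, key_index):
    """Sum the amount column grouped by one dimension (a simple roll-up)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

# The same data viewed from two different angles, plus the grand total.
by_region = aggregate(sales, 0)          # {'North': 200.0, 'South': 250.0}
by_quarter = aggregate(sales, 1)         # {'Q1': 320.0, 'Q2': 130.0}
grand_total = sum(r[2] for r in sales)   # 450.0
print(by_region, by_quarter, grand_total)
```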

What Is Data Mining?

 

[Figure: data mining]

Components of data mining

[Figure: components of data mining]

What is a Relational Database?

A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes and stores a set of tuples. Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
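As a small illustration, the sketch below uses Python's built-in `sqlite3` module to create one named table whose rows are tuples identified by a unique key. The table name `student`, its attributes, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database holding one uniquely named table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT NOT NULL,"
    " percentage REAL)"
)

# Each inserted row is one tuple described by its attribute values.
conn.executemany(
    "INSERT INTO student (student_id, name, percentage) VALUES (?, ?, ?)",
    [(1, "Asha", 82.5), (2, "Ravi", 71.0)],
)
row = conn.execute(
    "SELECT name, percentage FROM student WHERE student_id = ?", (1,)
).fetchone()
print(row)  # ('Asha', 82.5)

# The key must be unique, so a second tuple with key 1 is rejected.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO student VALUES (1, 'Duplicate', 50.0)")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```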


What is a Data Warehouse?

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
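The "unified schema" part can be sketched as mapping each source's records onto one common set of attributes. The two sources, their field names, and the conversion of a score out of 1000 into a percentage are all hypothetical; real warehouse loading also involves cleaning and deduplication, which this sketch leaves out.

```python
# Two hypothetical sources describing students with different field names and units.
admissions_office = [{"roll_no": "S1", "full_name": "Asha", "marks_pct": 82.5}]
exam_cell = [{"id": "S2", "student": "Ravi", "score_out_of_1000": 710}]

def from_admissions(rec):
    # Rename source-specific fields to the unified schema.
    return {"student_id": rec["roll_no"], "name": rec["full_name"],
            "percentage": rec["marks_pct"]}

def from_exam_cell(rec):
    # Rename fields and convert the score so units agree across sources.
    return {"student_id": rec["id"], "name": rec["student"],
            "percentage": rec["score_out_of_1000"] / 10}

# Integrated records, all sharing one schema.
warehouse = ([from_admissions(r) for r in admissions_office]
             + [from_exam_cell(r) for r in exam_cell])
print(warehouse)
```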


What is a Transactional Database?

In general, a transactional database consists of a file where each record represents a transaction.
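A minimal sketch, assuming each record holds a transaction ID and the set of items involved (the IDs and items are hypothetical). A typical question for such data is how many transactions contain a given group of items:

```python
# A hypothetical transactional "file": one record per transaction,
# holding a transaction ID and the set of items involved.
transactions = [
    ("T100", {"pen", "notebook", "ink"}),
    ("T200", {"pen", "ink"}),
    ("T300", {"notebook"}),
]

def count_containing(items, data):
    """Count the transactions whose item set includes every item in `items`."""
    wanted = set(items)
    return sum(1 for _tid, basket in data if wanted <= basket)

print(count_containing({"pen", "ink"}, transactions))  # 2
```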


Some Advanced Data and Information Systems and Advanced Applications

Object-Relational Databases: Object-relational databases are constructed based on an object-relational data model.

Temporal Databases, Sequence Databases, and Time-Series Databases: A temporal database typically stores relational data that include time-related attributes.

Spatial Databases and Spatiotemporal Databases: Spatial databases contain spatial-related information. Examples include geographic (map) databases.

Text Databases and Multimedia Databases: Text databases are databases that contain word descriptions for objects.

Heterogeneous Databases or Legacy Databases: A heterogeneous database consists of a set of interconnected, autonomous component databases.

Data Streams: In a data stream, data flow in and out of an observation platform (or window) dynamically.
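The "window" idea for data streams can be sketched with a fixed-size buffer in which each new value pushes out the oldest one. The class name `SlidingWindow` and the choice of a running average as the summary are assumptions for illustration only.

```python
from collections import deque

class SlidingWindow:
    """Hypothetical helper that keeps only the most recent `size` stream values."""

    def __init__(self, size):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=size)

    def add(self, value):
        self.values.append(value)
        # Summary over the current window only: the average.
        return sum(self.values) / len(self.values)

window = SlidingWindow(size=3)
averages = [window.add(x) for x in [10, 20, 30, 40, 50]]
print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```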


What is Data characterization?

It is a summarization of the general characteristics or features of a target class of data.

Example:
Data characterization. A data mining system should be able to produce a description summarizing the characteristics of students who have obtained more than 75% in every semester. The result could be a general profile of such students.
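A minimal sketch of this example, assuming hypothetical student records with per-semester percentages and two descriptive attributes (`branch`, `attendance`) chosen only for illustration. It selects the target class and summarizes it into a simple profile:

```python
from collections import Counter
from statistics import mean

# Hypothetical records: semester percentages plus two descriptive attributes.
students = [
    {"name": "Asha", "semesters": [78, 81, 90], "branch": "CS", "attendance": 95},
    {"name": "Ravi", "semesters": [70, 85, 88], "branch": "EE", "attendance": 80},
    {"name": "Meera", "semesters": [76, 79, 84], "branch": "CS", "attendance": 91},
]

# Target class: students with more than 75% in every semester.
target = [s for s in students if all(p > 75 for p in s["semesters"])]

# General profile of the target class.
branches = Counter(s["branch"] for s in target)
profile = {
    "count": len(target),
    "most_common_branch": branches.most_common(1)[0][0],
    "mean_attendance": mean(s["attendance"] for s in target),
}
print(profile)
```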


What is Data discrimination?

It is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

Example:
Data discrimination. A data mining system should be able to compare two groups of colleges, such as colleges whose students regularly achieve distinction (80%) results and colleges that rarely reach that mark.
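A rough sketch of discrimination for this example: the colleges are split into a target class and a contrasting class, and the mean of each descriptive feature is compared between them. The records, the feature names, and the cut-offs (distinction rate of at least 0.5 versus below 0.1) are hypothetical.

```python
from statistics import mean

# Hypothetical records: share of students with distinction (80% or more)
# and two descriptive features to compare.
colleges = [
    {"name": "A", "distinction_rate": 0.62, "students_per_teacher": 12, "library_hours": 14},
    {"name": "B", "distinction_rate": 0.55, "students_per_teacher": 15, "library_hours": 12},
    {"name": "C", "distinction_rate": 0.05, "students_per_teacher": 30, "library_hours": 6},
    {"name": "D", "distinction_rate": 0.08, "students_per_teacher": 26, "library_hours": 8},
]

# Target class and contrasting class (cut-offs are assumptions).
target = [c for c in colleges if c["distinction_rate"] >= 0.5]
contrast = [c for c in colleges if c["distinction_rate"] < 0.1]

def compare(feature):
    """Mean of one feature for (target class, contrasting class)."""
    return mean(c[feature] for c in target), mean(c[feature] for c in contrast)

for feature in ("students_per_teacher", "library_hours"):
    print(feature, compare(feature))
```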


Explain Classification in the process of Data Mining

Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).

A classification model can be represented in various forms, such as

1) IF-THEN rules,

student(class, "undergraduate") AND concentration(level, "high") ==> class A

student(class, "undergraduate") AND concentration(level, "low") ==> class B

student(class, "post graduate") ==> class C

2) a decision tree,

[Figure: decision tree]

3) a neural network.

[Figure: neural network]
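The IF-THEN rules above can be sketched as a Python function that tests the rules in order and returns the first class whose conditions hold. The function and parameter names are assumptions; the same tests could also be arranged as a decision tree with the student's class at the root and the concentration level below it.

```python
def classify(student_class, concentration_level):
    """Apply the three IF-THEN rules in order; return None if no rule fires."""
    if student_class == "undergraduate" and concentration_level == "high":
        return "class A"
    if student_class == "undergraduate" and concentration_level == "low":
        return "class B"
    if student_class == "post graduate":
        return "class C"
    return None  # no rule covers this object

print(classify("undergraduate", "high"))  # class A
print(classify("post graduate", "low"))   # class C
```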


What is Cluster Analysis?

Clustering analyzes data objects without consulting a known class label. Clustering can also facilitate taxonomy formation.

Example:
Cluster analysis. Cluster analysis can be performed on a group of students to find groups of students with similar IQ scores.
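The text does not name a clustering method; as one common choice, the sketch below runs plain k-means on one-dimensional IQ scores. The scores, the number of clusters, and the starting centers are hypothetical, and k-means results in general depend on the starting centers.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

iq_scores = [95, 98, 101, 118, 121, 124, 136, 140]  # hypothetical scores
centers, groups = kmeans_1d(iq_scores, centers=[90, 120, 150])
print(groups)   # [[95, 98, 101], [118, 121, 124], [136, 140]]
print(centers)  # [98.0, 121.0, 138.0]
```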


What is Outlier Mining?

A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. The analysis of outlier data is referred to as outlier mining.

Example:
Outlier analysis. Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase, or the purchase frequency.
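One simple way to implement the credit card check, sketched here as an assumption rather than a prescribed method, is to flag a charge that lies more than a few standard deviations above the mean of the account's earlier charges. The function name, the threshold of 3, and the charge amounts are hypothetical.

```python
from statistics import mean, stdev

def is_unusual_charge(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    above the mean of this account's earlier charges."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two earlier charges
    if sigma == 0:
        return amount > mu  # constant history: anything larger stands out
    return (amount - mu) / sigma > threshold

past_charges = [42.0, 55.5, 38.0, 61.0, 47.5]  # hypothetical account history
print(is_unusual_charge(past_charges, 58.0))    # False
print(is_unusual_charge(past_charges, 2500.0))  # True
```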


What is Evolution Analysis?

Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
Example:
Evolution analysis. The results of a college over the last several years would give an idea of the quality of the graduates it produces.
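Evolution analysis can be as simple as fitting a straight-line trend to yearly results. The sketch below computes a least-squares slope (average change per year) for hypothetical pass percentages; a positive slope suggests improving results over the period.

```python
def linear_trend(years, values):
    """Least-squares slope: average change in `values` per year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var

years = [2018, 2019, 2020, 2021, 2022]
pass_rate = [71.0, 73.0, 72.0, 76.0, 78.0]  # hypothetical results
slope = linear_trend(years, pass_rate)
print(round(slope, 2))  # 1.7
```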

An interesting pattern is one that is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, and novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.


Classification of Data Mining Systems

Data mining systems can be classified according to:

  1. The kinds of databases mined
  2. The kinds of knowledge mined
  3. The kinds of techniques utilized
  4. The applications adapted


What are Data Mining Task Primitives?

A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives.

Example:
To mine classification rules for promising first-year students who score 80% or more:
    use database DMTcollege db
    use hierarchy location hierarchy for T.year, score hierarchy for S.percentage
    mine classification as promising students
    in relevance to S.percentage, T.year
    from student S, college T
    where S.item ID = T.item ID and S.percentage ≥ 80 and T.year = 1
    display as rules
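The mining step (`mine classification as promising students`) has no direct SQL equivalent, but the task-relevant data the query selects, students joined with college records and filtered to first-year students with 80% or more, can be sketched in SQL. The table layouts and rows below are hypothetical and are run through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the `student S` and `college T` relations.
conn.execute("CREATE TABLE student (item_id INTEGER PRIMARY KEY, name TEXT, percentage REAL)")
conn.execute("CREATE TABLE college (item_id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", 86.0), (2, "Ravi", 74.0), (3, "Meera", 80.0)])
conn.executemany("INSERT INTO college VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])

# Task-relevant data: join the two relations and apply the selection conditions.
rows = conn.execute(
    """
    SELECT S.name, S.percentage, T.year
    FROM student AS S JOIN college AS T ON S.item_id = T.item_id
    WHERE S.percentage >= 80 AND T.year = 1
    """
).fetchall()
print(rows)  # [('Asha', 86.0, 1)]
```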


Integration of a Data Mining System with a Database or Data Warehouse System

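When a data mining system is integrated with a database (DB) or data warehouse (DW) system, possible integration schemes include no coupling, loose coupling, semi-tight coupling, and tight coupling.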

Some issues we encounter in Data Mining

 

The performance of a data mining system is assessed in terms of the following issues: