An Introduction to Data Mining
Data mining is a result of the natural evolution of information technology.
Efficient methods for on-line transaction processing (OLTP), where a query is viewed as a read-only transaction, have contributed substantially to the evolution and wide acceptance of relational technology as a major tool for efficient storage, retrieval, and management of large amounts of data.
Data warehouse technology includes data cleaning, data integration, and on-line analytical processing (OLAP), that is, analysis techniques with functionalities such as summarization, consolidation, and aggregation as well as the ability to view information from different angles.
What Is Data Mining?

Components of data mining

What is a Relational Database?
A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes and stores a set of tuples. Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.
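As a minimal sketch of these terms, the snippet below uses Python's built-in sqlite3 module to create one table, with attributes as columns, tuples as rows, and a primary key identifying each tuple. The table name, column names, and sample data are illustrative assumptions, not part of the original notes.

```python
import sqlite3

# In-memory database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"  # unique key identifying each tuple
    " name TEXT,"                       # attribute
    " percentage REAL)"                 # attribute
)
conn.executemany(
    "INSERT INTO student (student_id, name, percentage) VALUES (?, ?, ?)",
    [(1, "Asha", 82.5), (2, "Ravi", 67.0), (3, "Meena", 91.0)],
)

# Each returned row is one tuple, described by its attribute values.
rows = conn.execute(
    "SELECT student_id, name, percentage FROM student ORDER BY student_id"
).fetchall()
for row in rows:
    print(row)
```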
What is a Data Warehouse?
A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site.
What is a Transactional Database?
In general, a transactional database consists of a file where each record represents a transaction.
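One way to picture such a file is as a list of records, each holding a transaction ID and the items in that transaction. The sketch below assumes made-up transaction IDs and items, and counts how often each item appears across the records.

```python
from collections import Counter

# Each record represents one transaction (hypothetical sample data).
transactions = [
    {"trans_id": "T100", "items": ["pen", "notebook", "eraser"]},
    {"trans_id": "T200", "items": ["pen", "ruler"]},
    {"trans_id": "T300", "items": ["notebook", "pen"]},
]

# Count in how many transactions each item occurs.
item_counts = Counter(item for t in transactions for item in t["items"])
for item, count in item_counts.most_common():
    print(item, count)
```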
Some Advanced Data and Information Systems and Advanced Applications
Object-Relational Databases: Object-relational databases are constructed based on an object-relational data model.
Temporal Databases, Sequence Databases, and Time-Series Databases: A temporal database typically stores relational data that include time-related attributes.
Spatial Databases and Spatiotemporal Databases: Spatial databases contain spatial-related information. Examples include geographic (map) databases.
Text Databases and Multimedia Databases: Text databases are databases that contain word descriptions for objects.
Heterogeneous Databases or Legacy Databases: A heterogeneous database consists of a set of interconnected, autonomous component databases.
Data Streams: Data streams are data that flow in and out of an observation platform (or window) dynamically.
What is Data Characterization?
It is a summarization of the general characteristics or features of a target class of data.
Example:
Data characterization. A data mining system should be able to produce a description summarizing the characteristics of students who have obtained more than 75% in every semester. The result could be a general profile of such students.
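The characterization example can be sketched in a few lines of Python: select the target class (students above 75% in every semester) and summarize its general features. The student records, the attribute names, and the choice of summary statistics are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical student records; "semesters" holds the percentage per semester.
students = [
    {"name": "A", "branch": "CSE", "attendance": 92, "semesters": [78, 81, 80]},
    {"name": "B", "branch": "ECE", "attendance": 70, "semesters": [60, 77, 72]},
    {"name": "C", "branch": "CSE", "attendance": 88, "semesters": [90, 85, 79]},
    {"name": "D", "branch": "MECH", "attendance": 95, "semesters": [76, 76, 88]},
]

# Target class: more than 75% in every semester.
target = [s for s in students if all(p > 75 for p in s["semesters"])]

# General profile of the target class.
profile = {
    "count": len(target),
    "mean_attendance": mean([s["attendance"] for s in target]),
    "most_common_branch": Counter(s["branch"] for s in target).most_common(1)[0][0],
}
print(profile)
```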
What is Data Discrimination?
It is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.
Example:
Data discrimination. A data mining system should be able to compare two groups of colleges, such as colleges whose results reach 80% distinction and colleges that rarely reach that mark.
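A rough sketch of this comparison: split the colleges into a target class and a contrasting class, then compare the average of each feature side by side. The college records and the feature names (students_per_faculty, library_hours) are invented for illustration.

```python
from statistics import mean

# Hypothetical college records.
colleges = [
    {"name": "C1", "distinction_pct": 85, "students_per_faculty": 12, "library_hours": 60},
    {"name": "C2", "distinction_pct": 82, "students_per_faculty": 14, "library_hours": 56},
    {"name": "C3", "distinction_pct": 55, "students_per_faculty": 25, "library_hours": 40},
    {"name": "C4", "distinction_pct": 60, "students_per_faculty": 21, "library_hours": 44},
]
FEATURES = ["students_per_faculty", "library_hours"]


def summarize(group):
    """Return the mean of each feature over a group of colleges."""
    return {f: mean([c[f] for c in group]) for f in FEATURES}


# Target class: colleges reaching 80% distinction; contrasting class: the rest.
target = [c for c in colleges if c["distinction_pct"] >= 80]
contrast = [c for c in colleges if c["distinction_pct"] < 80]

target_summary = summarize(target)
contrast_summary = summarize(contrast)
for f in FEATURES:
    print(f, "target:", target_summary[f], "contrast:", contrast_summary[f])
```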
Explain Classification in the process of Data Mining
Classification is the process of finding a model (or function) that describes and distinguishes data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data (i.e., data objects whose class label is known).
A classification model can be represented in various forms, such as
1) IF-THEN rules,
student(class, "undergraduate") AND concentration(level, "high") ==> class A
student(class, "undergraduate") AND concentration(level, "low") ==> class B
student(class, "post graduate") ==> class C
2) a decision tree,

3) neural network.
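The IF-THEN rules listed under form 1) can be written directly as a small Python function. This is only a hand-coded sketch of applying an already derived model to objects whose class label is unknown; in practice the rules would be learned from training data. The function name and the "unknown" fallback are assumptions.

```python
def classify(student_class: str, concentration: str) -> str:
    """Apply the three IF-THEN rules and return the predicted class label."""
    if student_class == "undergraduate" and concentration == "high":
        return "A"
    if student_class == "undergraduate" and concentration == "low":
        return "B"
    if student_class == "post graduate":
        return "C"
    return "unknown"  # no rule covers this object (assumed fallback)


print(classify("undergraduate", "high"))
print(classify("post graduate", "low"))
```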

What is Cluster Analysis?
Clustering analyzes data objects without consulting a known class label. Clustering can also facilitate taxonomy formation.
Example
Cluster analysis. Cluster analysis can be performed on a group of students to find students with similar IQ.
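As a sketch of grouping without class labels, the function below runs a very small two-cluster k-means on one-dimensional IQ scores. The choice of k-means, the fixed two clusters, the starting centroids (the minimum and maximum score), and the sample scores are all assumptions made for illustration.

```python
def two_means(values, iterations=10):
    """Split numeric values into two clusters with a minimal 1-D k-means."""
    centroids = [min(values), max(values)]  # simple deterministic start
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer centroid.
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return clusters


iq_scores = [85, 118, 88, 125, 90, 120]  # hypothetical scores
low_group, high_group = two_means(iq_scores)
print(sorted(low_group), sorted(high_group))
```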
What is Outlier Mining?
A database may contain data objects that do not comply with the general behavior or model of the data. These data objects are outliers. The analysis of outlier data is referred to as outlier mining.
Example
Outlier analysis. Outlier analysis may uncover fraudulent usage of credit cards by detecting purchases of extremely large amounts for a given account number in comparison to regular charges incurred by the same account. Outlier values may also be detected with respect to the location and type of purchase, or the purchase frequency.
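The credit card example can be sketched by comparing each charge with the typical charge on the same account. The version below uses the median and the median absolute deviation (MAD) as the notion of "regular charges"; this particular test, the threshold of 5, and the sample amounts are assumptions, not a method prescribed by these notes.

```python
from statistics import median


def find_outliers(amounts, threshold=5.0):
    """Return amounts lying more than `threshold` MADs away from the median."""
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        return []  # all charges are (nearly) identical; nothing to flag
    return [a for a in amounts if abs(a - center) / mad > threshold]


# Hypothetical charges for one account number.
charges = [20, 35, 25, 30, 40, 28, 2500]
suspicious = find_outliers(charges)
print(suspicious)
```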
What is Evolution Analysis?
Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
Example:
Evolution analysis. A college's results over the last several years would give an idea of the quality of the graduates it produces.
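One simple way to model such a trend is to fit a straight line through the yearly results and read off its slope. The sketch below computes a least-squares slope by hand; the years and pass percentages are made-up sample data, and a linear trend is only one of many possible models.

```python
def trend_slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical results of one college over five years.
years = [2019, 2020, 2021, 2022, 2023]
pass_percentage = [62, 65, 64, 70, 74]

slope = trend_slope(years, pass_percentage)
print(f"Average change per year: {slope:+.1f} percentage points")
```

A positive slope suggests improving results over the period, a negative one a decline.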
An interesting pattern is one that is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, and novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.
Classification of Data Mining Systems
Data mining systems can be classified based on:
- The kinds of databases mined
- The kinds of knowledge mined
- The kinds of techniques utilized
- The applications adapted
What are Data Mining Task Primitives?
A data mining task can be specified in the form of a data mining query, which is input to the data mining system. A data mining query is defined in terms of data mining task primitives.
Example:
To get the list of names of first-year students scoring more than 80%:
1. use database DMTcollege_db
2. use hierarchy location_hierarchy for T.year, score_hierarchy for S.percentage
3. mine classification as promising_students
4. in relevance to S.percentage, T.year
5. from student S, college T
6. where S.item_ID = T.item_ID
7. and S.percentage ≥ 80 and T.year = 1
8. display as rules
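To make the idea of task primitives more concrete, the query above can be held in a small Python structure with one field per clause. This is only an illustrative sketch: the class name MiningQuery, its field names, and the render method are hypothetical, and the identifiers are written with underscores on the assumption that this is how they appear in the query.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container for the parts of a data mining query."""
    database: str
    knowledge_kind: str
    relevant_attributes: list
    sources: list
    condition: str
    display_as: str
    hierarchies: dict = field(default_factory=dict)  # attribute -> hierarchy

    def render(self) -> str:
        """Rebuild the query text, one clause per line."""
        lines = [f"use database {self.database}"]
        for attribute, hierarchy in self.hierarchies.items():
            lines.append(f"use hierarchy {hierarchy} for {attribute}")
        lines.append(f"mine {self.knowledge_kind}")
        lines.append("in relevance to " + ", ".join(self.relevant_attributes))
        lines.append("from " + ", ".join(self.sources))
        lines.append(f"where {self.condition}")
        lines.append(f"display as {self.display_as}")
        return "\n".join(lines)


query = MiningQuery(
    database="DMTcollege_db",
    knowledge_kind="classification as promising_students",
    relevant_attributes=["S.percentage", "T.year"],
    sources=["student S", "college T"],
    condition="S.item_ID = T.item_ID and S.percentage >= 80 and T.year = 1",
    display_as="rules",
    hierarchies={"T.year": "location_hierarchy", "S.percentage": "score_hierarchy"},
)
print(query.render())
```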
Integration of a Data Mining System with a Database or Data Warehouse System
When a data mining (DM) system is combined with database (DB) and data warehouse (DW) systems, possible integration schemes include:
- No coupling: No coupling means that a DM system will not utilize any function of a DB or DW system.
- Loose coupling: Loose coupling means that a DM system will use some facilities of a DB or DW system, fetching data from a data repository managed by these systems, performing data mining, and then storing the mining results either in a file or in a designated place in a database or data warehouse.
- Semitight coupling: Semitight coupling means that besides linking a DM system to a DB/DW system, efficient implementations of a few essential data mining primitives (identified by the analysis of frequently encountered data mining functions) can be provided in the DB/DW system.
- Tight coupling: Tight coupling means that a DM system is smoothly integrated into the DB/DW system.
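Loose coupling, as described in the list above, can be sketched with Python's built-in sqlite3 module: fetch data from the database, do the mining step outside it, and store the results back in a designated table. The table names, columns, sample rows, and the trivial "mining" rule are assumptions chosen only to show the three steps.

```python
import sqlite3

# Set up a small in-memory database standing in for the DB/DW system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, year INTEGER, percentage REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("Asha", 1, 82.5), ("Ravi", 1, 67.0), ("Meena", 2, 91.0), ("Kiran", 1, 88.0)],
)

# 1. Fetch data from the repository managed by the DB system.
rows = conn.execute("SELECT name, year, percentage FROM student").fetchall()

# 2. Perform the mining step outside the database (here, a trivial rule).
promising = [(name,) for name, year, pct in rows if year == 1 and pct >= 80]

# 3. Store the mining results in a designated place in the database.
conn.execute("CREATE TABLE promising_students (name TEXT)")
conn.executemany("INSERT INTO promising_students VALUES (?)", promising)
conn.commit()

stored = [
    r[0] for r in conn.execute("SELECT name FROM promising_students ORDER BY name")
]
print(stored)
```

In a tighter coupling, step 2 would run inside the DB/DW system itself instead of in separate application code.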
Some issues we encounter in Data Mining
Mining methodology and user interaction issues:
- Mining different kinds of knowledge in databases
- Interactive mining of knowledge at multiple levels of abstraction
- Incorporation of background knowledge
- Data mining query languages and ad hoc data mining
- Presentation and visualization of data mining results
- Handling noisy or incomplete data
- Pattern evaluation—the interestingness problem
The performance of a data mining system is measured on the following issues:
- Efficiency and scalability of data mining algorithms
- Parallel, distributed, and incremental mining algorithms
Issues relating to the diversity of database types:
- Handling of relational and complex types of data
- Mining information from heterogeneous databases and global information systems