DATA MINING: TOWARDS BUSINESS INTELLIGENCE
By
Ms. Vandana Sharma (Lecturer, Comp. Sc. (IMR))
Ms. Neeru Saxena (Lecturer, Comp. Sc. (IMR))
During the last several years, data mining techniques have
been used by companies to understand the demographics of
their customers and to provide them with personalized
interactions. Various data mining techniques have been
deployed to identify hidden trends and new opportunities
within the data. These techniques have been embedded into
software applications that run complex algorithms in order
to provide meaningful information. While end-user data mining
applications are available, they have not been extensively
deployed throughout organizations because they are often not
understood. One way to understand the capabilities of data
mining is to compare it to other business intelligence (BI)
technologies.
The revenue by customer could be totalled to answer another
question: "How much revenue was generated this year?" In
addition, other questions such as: "What customer generated
the most revenue for the company?" and "What customer
generated the least amount of revenue for the company?"
could also be answered. While the query result was useful
and addressed several questions, this BI technology will not
identify unusual patterns or reveal unusual relationships.
What the user requested was revenue by customer for the
current year and that is the information that was provided –
no more, no less.
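The kind of ad hoc query described above can be sketched in a few lines of Python. This is only an illustration: the customer names and revenue figures are invented, and the function name `revenue_by_customer` is a hypothetical helper, not part of any BI product.

```python
from collections import defaultdict

# Hypothetical order records as (customer, revenue) pairs.
# These values are invented for illustration only.
orders = [
    ("Acme", 1200.0),
    ("Birch", 450.0),
    ("Acme", 300.0),
    ("Cedar", 75.0),
]

def revenue_by_customer(rows):
    """Sum revenue per customer, much like a GROUP BY customer query."""
    totals = defaultdict(float)
    for customer, revenue in rows:
        totals[customer] += revenue
    return dict(totals)

by_customer = revenue_by_customer(orders)

# The follow-up questions are answered from the same result set.
total_revenue = sum(by_customer.values())
top_customer = max(by_customer, key=by_customer.get)
bottom_customer = min(by_customer, key=by_customer.get)
```

The query returns exactly what was asked for. Nothing in it looks for patterns the user did not request.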
Online Analytical Processing (OLAP)
OLAP applications provide users with the ability to manually
explore and analyze summary and detailed information. For
example, a user creates and performs an OLAP analysis that
answers the question, "What was the revenue for each quarter
of this year by geographic region and customer?" The results
from this analysis would contain geographic region, customer
name, revenue and quarters selected. Figure 2 is a
representation of the result set produced by the OLAP
analysis.
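As a rough sketch of what such an OLAP analysis does, the Python below sums revenue over a chosen set of dimensions and then rolls the result up to coarser levels. The fact rows, the `DIMENSIONS` tuple and the `aggregate` function are assumptions made for this example; real OLAP tools work against a cube or a warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, customer, quarter, revenue).
facts = [
    ("East", "Acme", "Q1", 500.0),
    ("East", "Acme", "Q2", 700.0),
    ("East", "Birch", "Q1", 200.0),
    ("West", "Cedar", "Q1", 75.0),
    ("West", "Cedar", "Q3", 125.0),
]

DIMENSIONS = ("region", "customer", "quarter")

def aggregate(rows, keep):
    """Sum revenue over the dimensions in `keep`, rolling up all others."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]  # revenue is the last field of each row
    return dict(totals)

# Detailed view: revenue per region, customer and quarter.
detail = aggregate(facts, ("region", "customer", "quarter"))
# Roll-ups: the user manually moves to coarser summaries.
by_region_quarter = aggregate(facts, ("region", "quarter"))
by_region = aggregate(facts, ("region",))
```

Each view is still chosen by the user; the tool summarizes, but the exploration is manual.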
Data Mining
Data mining can best be described as a BI technology that
uses various techniques
to extract comprehensible, hidden and useful information
from a population of data. Data mining makes it possible to
discover hidden trends and patterns in large amounts of
data. The output of a data mining exercise can take the form
of patterns, trends or rules that are implicit in the data.
There are various data mining techniques that can be
deployed, each serving a specific purpose and requiring a
different amount of user involvement.
Figure 3 displays the progression of data mining techniques
in the order of user involvement.
Neural networks are complex systems that provide predictive
modeling. They must be trained on example data, which takes
time, before they can approximate human-like judgment. This
data mining technique has been used to detect potentially
fraudulent credit card transactions.
Induction is a data mining technique that induces rules
inherent within the data. The rules are used to understand
the relationships that exist. A classic example is: when
people buy diapers, they also buy beer 50 percent of the
time.
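A rule such as the diapers-and-beer example is usually scored by its support and confidence. The sketch below computes both over a handful of invented shopping baskets; the data and the function names `support` and `confidence` are assumptions for illustration, and the baskets were chosen so that the rule holds half of the time.

```python
# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)

# Rule: diapers -> beer
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
rule_support = support({"diapers", "beer"}, baskets)
```

Here four baskets contain diapers and two of those also contain beer, so the rule's confidence is 0.5, matching the "50 percent of the time" phrasing of the example.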
Statistics is the basis of all data mining techniques and
requires individuals highly skilled in mathematics to build
the models and interpret the results.
Visualization displays the data in a graphical or
three-dimensional map, thereby allowing the user to identify
trends, patterns and relationships. Because the resulting
image provides another perspective on data relationships,
visualization is often incorporated in data mining
applications.
While OLAP and query languages are listed by the Gartner
Group as data mining techniques, identifying hidden trends
and relationships with them requires extensive user
involvement and is extremely time-consuming. Therefore,
using such techniques for this purpose is not
cost-effective.
Utilizing a data mining application, a user can ask, "What
are the
distinguishing characteristics of our credit customers who
pay on time?" The results from the data mining exercise
would then be used to create the condition statement of an
ad hoc query that identifies customer names and contact
information within the database for the purposes of cross-
selling additional services.
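The hand-off from a mining result to an ad hoc query can be sketched as follows. This is a deliberately simple stand-in for a real mining technique: it scans single attribute values and keeps the one with the highest on-time payment rate, then uses that value as the query condition. The customer records, field names and the functions `best_condition` and `ad_hoc_query` are all hypothetical.

```python
# Hypothetical customer records, invented for illustration.
customers = [
    {"name": "Acme", "contact": "acme@example.com", "segment": "retail",
     "autopay": True, "pays_on_time": True},
    {"name": "Birch", "contact": "birch@example.com", "segment": "wholesale",
     "autopay": True, "pays_on_time": True},
    {"name": "Cedar", "contact": "cedar@example.com", "segment": "retail",
     "autopay": False, "pays_on_time": False},
    {"name": "Dune", "contact": "dune@example.com", "segment": "wholesale",
     "autopay": False, "pays_on_time": True},
    {"name": "Elm", "contact": "elm@example.com", "segment": "retail",
     "autopay": True, "pays_on_time": True},
    {"name": "Fir", "contact": "fir@example.com", "segment": "retail",
     "autopay": False, "pays_on_time": False},
]

def best_condition(rows, attributes, target):
    """Return (attribute, value, rate) for the single attribute value with
    the highest rate of `target`; ties go to the value seen more often."""
    stats = {}
    for row in rows:
        for attr in attributes:
            key = (attr, row[attr])
            hits, count = stats.get(key, (0, 0))
            stats[key] = (hits + (1 if row[target] else 0), count + 1)
    (attr, value), (hits, count) = max(
        stats.items(), key=lambda kv: (kv[1][0] / kv[1][1], kv[1][1])
    )
    return attr, value, hits / count

def ad_hoc_query(rows, attr, value):
    """Select name and contact for rows matching the mined condition."""
    return [(r["name"], r["contact"]) for r in rows if r[attr] == value]

attr, value, rate = best_condition(customers, ["segment", "autopay"], "pays_on_time")
cross_sell_list = ad_hoc_query(customers, attr, value)
```

With this invented data the mined condition is `autopay == True`, and the query returns the three customers who meet it. A production system would use a proper induction or classification technique to find the condition.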
Ad hoc query applications scratch the surface of the value
that exists
within a database while OLAP provides users with greater
depth and understanding. However, data mining digs deeper
and provides users with knowledge through the discovery of
hidden trends and relationships. The combination of data
mining with an ad hoc query or OLAP application is extremely
powerful and provides users with knowledge about the data
that is analyzed and the ability to act upon the knowledge.
Figure 4 depicts the value and purpose of the BI
technologies addressed herein.
What is Master Data?
The enormous interest in master data management (MDM) that
has
appeared in the past couple of years has not yet generated a
great deal of methodological progress. Hopefully, as data
professionals, consultants, and vendors grapple with the
complex issues involved, the situation will improve. A
central problem, however, is that there is little agreement
about what master data is. It is usually defined by
examples, like product, customer, or account, as if to say
“I know it when I see it”. Alternatively, master data is
defined using generalities: that it is simply highly shared
data, or that it is data used by an application but not
produced by that application.
Definitions do matter. They tell us something fundamental
about what is being defined. In the case of master data,
there is a special need for a greater understanding because
MDM is still at an early level of maturity. For several
years, I have been using an approach to categorizing data
that provides a detailed definition of master data. I have
found this approach useful in that it can be practically
applied to master data management problems.
Is Data Just Data?
A fundamental question about data is whether it is
homogeneous. In other
words, are the boxes we see in a data model, or the tables
contained in a physical database, all the same in terms of
their properties, behaviors, and management needs as data?
The fact that we are even talking about master data
management indicates that there are qualitative differences
among entities (at the logical level) or tables (at the
physical level). There is, in fact, strong evidence that we
can categorize data within a taxonomy that recognizes the
different roles that data plays in the operational
transactions of the enterprise.