Data Mining in a Nutshell

In today’s business world, information about the customer is a necessity for any business trying to maximize its profits. A new and important tool for gaining this knowledge is data mining. Data mining is a set of automated procedures used to find previously unknown patterns and relationships in data. Once extracted, these patterns and relationships can be used to make valid predictions about customer behavior.
Data mining is generally used for four main tasks: (1) to improve the process of acquiring new customers and retaining existing ones; (2) to reduce fraud; (3) to identify and eliminate internal waste in operations; and (4) to chart unexplored areas of the Internet (Cavoukian). These tasks are easier to accomplish if appropriate data has been collected and stored in a data warehouse. According to Stanford University, "A Data Warehouse is a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated....This makes it much easier and more efficient to run queries over data that originally came from different sources." When data about an organization’s practices is easier to access, it becomes more economical to mine. “Without the pool of validated and scrubbed data that a data warehouse provides, the data mining process requires considerable additional effort to pre-process the data” (SAS Institute).
There are several different types of models and algorithms used to “mine” the data. These include, but are not limited to, neural networks, decision trees, rule induction, boosting, and genetic algorithms.

Neural networks are physical cellular systems which can acquire, store, and utilize experiential knowledge (Zurada). They offer a way to efficiently model large and complex problems. Decision trees are diagrams used for making decisions in business or computer programming; their branches represent choices with associated risks, costs, results, or probabilities. Rule induction is a way of deriving a set of rules to classify cases (Two Crows). These rules differ from those in a decision tree in that they are independent of one another. Boosting is a technique in which multiple random samples of the data are taken and a classification model is built for each sample (Two Crows). The genetic algorithm is a model of machine learning whose behavior is based on the processes of evolution in nature. Candidate solutions are represented by chromosomes, and a population of them goes through a process of evolution: the members of one generation compete to pass on their most favorable characteristics to the next. This process continues until a sufficiently good solution is found. Many of the models and algorithms used in data mining can be viewed as generalizations of the linear regression model.
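The genetic algorithm described above can be illustrated with a minimal sketch. It is not taken from any of the cited sources. The toy objective (counting the 1 bits in a chromosome), the function names fitness and evolve, and all parameter values are assumptions chosen only to make the idea concrete. Each chromosome is a list of bits, fitter chromosomes are more likely to be chosen as parents, and children inherit part of each parent with occasional random mutation.

```python
import random


def fitness(chromosome):
    # Toy objective (an assumption for illustration): count the 1 bits.
    return sum(chromosome)


def evolve(pop_size=30, length=20, generations=60, mutation_rate=0.02, seed=0):
    """Evolve bit-string chromosomes; return (best chromosome, best-fitness history)."""
    rng = random.Random(seed)
    # Start from a random population of chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    for _ in range(generations):
        # Elitism: the current best chromosome survives unchanged.
        next_generation = [best[:]]
        while len(next_generation) < pop_size:
            # Tournament selection: the fittest of three random members
            # wins the right to pass on its characteristics.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # One-point crossover: the child takes the head of one parent
            # and the tail of the other.
            cut = rng.randrange(1, length)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]
            next_generation.append(child)
        population = next_generation
        best = max(population, key=fitness)
        history.append(fitness(best))

    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best chromosome:", best)
    print("best fitness per generation:", history)
```

Because the best chromosome of each generation is carried forward unchanged, the best fitness never decreases from one generation to the next. In a real data mining setting the fitness function would instead score how well a candidate model or rule set fits the data.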
Data mining is used largely, if not entirely, for business purposes. The heaviest users of data mining include the banking, financial, and telecommunications industries (Two Crows).

A survey taken by Two Crows Corporation turned up these applications of data mining:
· Ad revenue forecasting
· Churn (turnover) management
· Claims processing
· Credit risk analysis
· Cross-marketing
· Customer profiling
· Customer retention
· Electronic commerce
· Exception reports
· Food-service menu analysis
· Fraud detection
· Government policy setting
· Hiring profiles
· Market basket analysis
· Medical management
· Member enrollment
· New product development
· Pharmaceutical research
· Process control
· Quality control
· Shelf management/store management
· Student recruiting and retention
· Targeted marketing
· Warranty analysis
Data mining will affect different industries in different ways. In the telecommunications industry, for example, service providers that want to retain or build market share and to expand or develop new products and services will have to make the adaptations that the industry and its pace-setting technology require.
“The most successful telecommunications companies will, of course, be the ones who can develop and market products and services that customers will buy,” says Julian Kulkarni, SAS Institute Europe’s Product Marketing Coordinator for telecommunications. “But high customer churn rates in telcom markets show that you cannot depend on customer loyalty. To thrive, companies must know their customers, their products, their own operations, and the competition better.”
The key to succeeding in this rapidly changing industry is to understand the customer, or the market that the customer represents. Through data mining, telecommunications companies can know what their customers have done in the past and what they