Mining Data

By Miriam Wasserman

December 1, 2000

SCENE 1: It's late November 1999. The Celtics are struggling with their second lineup. In a typical game, the team can be up by 14 points, only to lose the lead when the second unit comes in. It is time for Frank Vogel to come into play. Vogel, the Celtics' video coordinator, is in charge of running the game statistics through Advanced Scout, a data analysis package developed by IBM. Vogel's research confirms the coaches' observations: The second unit’s defense is holding up, but the offense is failing. More important, the statistics tell him that in situations where one of the star players, Paul Pierce or Antoine Walker, is moved to the second unit, there is no drop-off in the performance of the first unit, and the production of the second unit increases. His recommendations help the coaches formulate their new strategy.

SCENE 2: A patient has been having ulcer problems. He goes to the doctor and then buys the prescription she recommends at the local pharmacy. End of story? No. The record of the transaction — the drug bought, the location of the purchase, the value paid, and the name of the prescriber (none of which includes any of the patient’s identifying information) — goes to IMS America, one of the largest pharmaceutical market research companies in the world. The transaction is added to a database of over 1.5 billion prescriptions generated that year from over 33,000 retail pharmacies (and medical mail order) which matches the prescriptions to over 600,000 physicians. With these data, the company can track which physicians have changed their prescribing behavior, and pharmaceutical companies can fine-tune their ulcer-drug marketing campaign: which physicians should be visited by medical reps and which should just receive an informational package.

What do the two episodes have in common? Both involve the use of data-mining computer technology to search for patterns in data. In the case of the drugstore prescription, the software can examine doctors’ prescribing habits within particular therapeutic classes to determine the characteristics of doctors who tend to be “brand switchers” and of those who tend to be “loyal” to particular brands.

In the case of the Celtics, the statistical package allows Vogel to determine which specific lineups are most effective against another team’s lineup, and under what circumstances a player’s potential is maximized. Although data mining by itself is not going to get the Celtics to the playoffs, Vogel can, for instance, run queries to find whether Antoine Walker is more effective with Dana Barros or with Kenny Anderson, by matching all of Walker’s game minutes with each of the point guards. The coaches can observe examples of the findings on the game’s video, since the program lists exact times when the studied sequence was in play. “It definitely has found some trends we hadn’t recognized,” says Jay Wessel, director of technology for the Celtics.
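
The kind of query described above, matching all of one player's minutes with each of two point guards, can be sketched in a few lines. This is only an illustration, not how Advanced Scout works: the stint records, the minutes, and the point differentials below are invented, and the function name is hypothetical.

```python
# Hypothetical stints: (players on court, minutes played, point differential).
# All numbers are invented for illustration; they are not real game data.
STINTS = [
    ({"Walker", "Barros", "Pierce"}, 6.0, 4),
    ({"Walker", "Anderson", "Pierce"}, 8.0, -2),
    ({"Walker", "Barros"}, 4.0, 3),
    ({"Walker", "Anderson"}, 2.0, 1),
    ({"Pierce", "Anderson"}, 5.0, 2),
]


def pairing_summary(stints, player, partners):
    """Total minutes and point differential for `player` alongside each partner."""
    totals = {p: {"minutes": 0.0, "plus_minus": 0} for p in partners}
    for on_court, minutes, diff in stints:
        if player not in on_court:
            continue  # only count stints in which the studied player was on the floor
        for partner in partners:
            if partner in on_court:
                totals[partner]["minutes"] += minutes
                totals[partner]["plus_minus"] += diff
    return totals


summary = pairing_summary(STINTS, "Walker", ["Barros", "Anderson"])
for partner, stats in summary.items():
    print(partner, stats)
```

With equal minutes for both pairings, the point differential alone suggests which combination fared better in this made-up sample; a real analysis would also need enough minutes to rule out chance.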

One does not have to be an NBA player to have one’s actions analyzed by statistical software designed to detect patterns. In fact, these sorts of applications have increasingly come to be used in many more mundane transactions. As we go about our modern lives, we leave a trail of data behind. Supermarket purchases, bank transactions, credit card purchases, phone calls, retail catalog orders, and each click of the mouse can be recorded, stored, and analyzed. These data reveal information about who we are: our habits, our preferences, and what we are interested in. And, this information means money. That is, if companies can figure out what to do with it.

HOW IT WORKS
For the most part, what companies use data mining for is not new. Assistant coaches have had the responsibility of keeping an eye out for more effective team combinations just as pharmaceutical companies have been marketing their products to doctors for decades — and both have been analyzing data in order to make these decisions. But data-mining programs allow them to analyze greater amounts of data faster and potentially more efficiently. And, in some cases, the more widespread use of data-mining techniques has meant that data being collected for one purpose (or without the individual’s awareness) is being used for another purpose.

Present-day data-mining technology and its application to business developed almost as an afterthought. The field, which now has its own journal, an annual conference (this year’s is in Boston), and at least two regular newsletters, has been cobbled together from several other domains, including machine learning, statistics, and decision support.

According to data-mining experts Michael Berry and Gordon Linoff, in the early 1980s, researchers in machine learning — a subset of artificial intelligence focusing on writing software that allows computers to learn by example — began looking for commercial applications when funding for artificial intelligence research dried up. Statisticians, for their part, had been developing the theoretical underpinnings for predictive modeling, sampling methodologies, and experimental design.

At the same time — thanks to improvements in computers and data storage capacity, and the development of new technologies such as scanners — companies found themselves sitting on top of piles of data. The NBA had been collecting game statistics long before anyone thought of mining them, and credit card companies habitually recorded for billing purposes what, when, and where purchases were made. Similarly, supermarkets had introduced scanners and bar codes to eliminate the need to price items individually on the shelf and accelerate the checkout process. It was just a matter of time before the developers of data-mining technology realized that they could use these data to do such things as keep track of the combinations of products individual people bought, the time of day certain products were likely to be purchased, or the way people responded to special offers such as coupons.

Since then, firms that specialize in data-mining software have been developing a variety of techniques, depending on the particular problem and quality of the data. Clustering is one example of what Gordon Linoff calls “undirected” data mining, in which a program is designed to find possible associations and similarities in the data without any specific guidelines. In market basket analysis, for instance, the program indicates affinities among certain products that tend to be bought together, say a particular kind of golf ball with a particular type of club. The software can also be used to sort people into groups according to shared characteristics, whether they be demographic facts, known political attitudes, or past purchases (to name just a few). Thus is born the “soccer mom” or the “conservative retiree.” Using this so-called “psychographic” information can help companies to better target and tailor their products and marketing messages to particular groups.

The patterns discovered are then evaluated by a human analyst to decide whether the groupings give some useful information. Ideally, such a program would be able to pick out some previously unknown and perhaps even counterintuitive correlations that have high market value. But this is not always the case. Sometimes, the groupings can be so obvious that they offer no new insight; in other cases, the program can come up with correlations that are hard to capitalize on for marketing purposes and may even be spurious. (One widely cited example is that people who buy diapers also tend to buy beer.)

In the end, it is up to the business analysts to decide how to make use of the information — to place the golf clubs with the balls in the same rack or to offer a coupon for the balls with the purchase of the club, for instance.
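
A minimal sketch of the market basket idea described above: count how often each pair of products shows up in the same purchase, and report the share of baskets containing the pair. The baskets are invented for illustration, and real packages use more elaborate measures than this simple share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchases; each set is one customer's basket.
BASKETS = [
    {"golf balls", "golf club", "tees"},
    {"golf balls", "golf club"},
    {"diapers", "beer"},
    {"golf balls", "tees"},
    {"diapers", "beer", "golf balls"},
]


def pair_support(baskets):
    """Return, for each product pair, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always recorded in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(BASKETS)
for pair, share in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, share)
```

The program only surfaces the affinities; whether a given pair is useful, obvious, or spurious is still a judgment for the analyst.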

“Directed” methods of data mining are more widespread. In this case, the algorithms track patterns associated with very specific results: patterns associated with credit card fraud, for example. Nestor, Inc., a company based in Providence, Rhode Island, uses a technique called “neural networks” to answer just that question for their banking clients. The software learns to recognize customers’ card-use patterns, which allows it to automatically detect deviations that may represent fraudulent transactions. To develop the model, Nestor asks the client to provide transaction data accumulated over four months, including every known fraudulent transaction. First, the algorithms identify characteristics and patterns that are likely indicators of fraud. Then, the resulting model is fine-tuned by asking it to predict fraud in a new batch of data. Finally, these results are compared to the known instances of fraud so that the system can learn from the differences. “It is like a very small brain whose total knowledge is credit card fraud; you are repeatedly exposing the network to these patterns that are strengthening these connections,” says Bernard Chartier, director of modeling services for Nestor. A 60 to 70 percent detection rate is considered very good.

Once the model is in use, clients can load in new fraud data to refresh it. Still, Nestor recommends a complete retraining of client systems every 18 months. Says Nestor’s director of worldwide marketing, Tom Spillane, “Fraud is a transient behavior that is always changing.”
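
The learn, predict, and compare cycle described above can be illustrated with something far simpler than a neural network. The sketch below swaps in a plain statistical rule (flag any amount more than three standard deviations from a customer's usual spending); it is not Nestor's method, and the customers, amounts, and fraud labels are all invented.

```python
from statistics import mean, pstdev

# Hypothetical past purchases used to learn each customer's normal pattern.
HISTORY = {
    "cust_1": [20.0, 25.0, 30.0, 25.0],
    "cust_2": [100.0, 120.0, 80.0, 100.0],
}

# Hypothetical new batch: (customer, amount, known to be fraud?).
NEW_BATCH = [
    ("cust_1", 27.0, False),
    ("cust_1", 400.0, True),
    ("cust_2", 110.0, False),
    ("cust_2", 95.0, True),   # fraud that looks like normal spending
    ("cust_2", 900.0, True),
]


def learn_profiles(history):
    """Summarize each customer's usual spending as (mean, standard deviation)."""
    return {cust: (mean(amts), pstdev(amts)) for cust, amts in history.items()}


def is_suspicious(profiles, customer, amount, k=3.0):
    """Flag amounts that deviate more than k standard deviations from the mean."""
    mu, sigma = profiles[customer]
    return abs(amount - mu) > k * sigma


def detection_rate(profiles, batch):
    """Share of the known fraudulent transactions that the rule flags."""
    fraud = [(c, a) for c, a, known in batch if known]
    caught = sum(is_suspicious(profiles, c, a) for c, a in fraud)
    return caught / len(fraud)


profiles = learn_profiles(HISTORY)
rate = detection_rate(profiles, NEW_BATCH)
print(f"detected {rate:.0%} of known fraud")
```

Comparing the flags with the known cases shows what the rule misses (here, the fraudulent purchase that resembles ordinary spending), which is the information a learning system would use to adjust itself.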

UP CLOSE AND PERSONAL
Small businesses that don’t have a lot of clients don’t have a need for massive data mining. Their owners and operators often know the needs and tastes of their clients better than a computer could. But in larger firms, particularly in industries that naturally accumulate large amounts of detailed transaction data, such as firms in banking, insurance, telecommunications, catalog retail, utilities, and supermarkets, applications of data mining are increasingly widespread.

Perhaps the most common application of data mining — and one of the ones that has been around longest (since the 1950s) — is credit scoring, a statistical method used to predict the probability that a loan applicant or existing borrower will default or become delinquent. Credit scoring is now widely used for consumer lending, particularly with credit cards and mortgage loans, and is becoming more common in small business lending (see sidebar).

Companies also mine their customer data to try to figure out who their best customers are, and what products they are likely to buy. They then use that information to buy lists of potential customers with the identified characteristics or to pitch products and promotions to particular segments of their client pool. Many of the unsolicited phone calls, letters, and e-mails that enter our lives on a daily basis originate in this way.

The Vermont Country Store, a small family-owned business based in Manchester Center, Vermont, whose catalogs offer items that are now hard to find, such as Ovaltine and Olivetti manual typewriters, has used data mining to increase the effectiveness of their catalog mailings. Although he won’t say by how much, vice president of marketing Larry Shaw affirms that the company’s mailings have increased in profitability since it started applying data-mining techniques on its more established catalogs. Some good predictors of how valuable a customer is to the company turned out to be quite straightforward: how recently and how many times a customer has ordered, for instance. But others were more surprising. “[We found that] someone who buys products from different categories [e.g., food and housewares] is more profitable as a customer than someone who buys the same number of products from the same category,” says Shaw. The company has four different catalogs, and data mining has been successful in matching the best catalog for each customer.

Such targeted marketing is only a small part of customer relationship management (CRM), the latest trend in data mining. Mining their customer transaction data — often augmented with additional demographic information — companies calculate a “lifetime value” for each customer. Knowing the traits of their most valuable customers not only allows firms to try to acquire more customers with similar characteristics, but also lets them know which customers they want to spend money and effort on retaining and which ones they are willing to let go. It also flags “cross-sell” opportunities, telling companies which of their customers are most likely to purchase from other product lines. And, it helps single out lower-profit-segment customers who are likely candidates to be upgraded to a higher “platinum” or “gold” category. (Companies can then mail them offers for higher value products, expand their credit line, or entice them to join frequent-buyer rewards programs, for instance.)

In practice, this means that companies have to ensure that they can recognize the customer through any of their channels of contact, whether in a branch, on the web, or through the call center’s different departments (sales, customer service, and so on), so that they can treat the customer accordingly. For instance, according to Business Week, Sanwa Bank segments its customers into three categories, A, B, and C. When you call or e-mail the bank to ask it to waive the fee on a bounced check, the customer representative gets your score within seconds. If you are among the business’s most profitable customers (an A), the fee is waived without questions. If you are in the least profitable category, you are less likely to be forgiven. If you are somewhere in the middle, you will have to negotiate with the representative.
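
A three-tier scheme of this kind amounts to a lookup. In the sketch below, the profit cutoffs and the wording of the outcomes are invented for illustration; the article does not say how Sanwa Bank draws its lines.

```python
# Hypothetical annual-profit cutoffs (in dollars) separating the tiers.
A_CUTOFF = 500.0
C_CUTOFF = 50.0

# What the representative is prompted to do for each tier.
FEE_POLICY = {
    "A": "waive the fee",
    "B": "negotiate",
    "C": "fee likely stands",
}


def segment(annual_profit):
    """Assign a customer to tier A, B, or C by estimated annual profit."""
    if annual_profit >= A_CUTOFF:
        return "A"
    if annual_profit < C_CUTOFF:
        return "C"
    return "B"


def fee_waiver_decision(annual_profit):
    """Return the suggested response to a fee-waiver request."""
    return FEE_POLICY[segment(annual_profit)]


print(fee_waiver_decision(1200.0))
print(fee_waiver_decision(200.0))
print(fee_waiver_decision(10.0))
```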

Sanwa Bank offers just one example of what many other banks are doing. This type of segmentation can affect even how long you have to wait on the phone. In some firms, automated systems can recognize the customer’s category the moment the caller punches in an account number or any other identifying information. If you are in the most profitable category, you may be automatically jumped to the head of the queue, leaving less profitable customers on hold. Or, if you have been identified as a potential buyer of additional services, your call may be routed to a representative with experience in selling that product — regardless of the motive for your call.

Companies are understandably reluctant to disclose the effectiveness of their customer relationship management strategy. But, according to a recent Wall Street Journal article, Harrah’s Entertainment Inc., a Las Vegas-based casino business, has seen its revenues more than double. Using information gathered through electronic “frequent-gambler cards,” the casino learned that gamblers who spent a relatively modest $100 to $499 per trip, about 30 percent of gamblers, accounted for the majority of the casino’s profits. Armed with this information, the casino proceeded to experiment on how to increase this group’s loyalty at least cost, by testing different promotions on them. For instance, they found that an offer of $60 in chips got people to gamble much more than the more expensive promotional $125 package of a free room, two meals, and $30 in chips. And, the rate at which people responded to their mail offers more than doubled, from 3 to 8 percent.

Fraud detection, credit scoring, targeted marketing, and customer relationship management applications are now the most common applications of data mining. But, as the NBA example showed, they are not the only ones. Data-mining techniques are also being used to improve manufacturing processes, develop new drugs, and relate information in the human genome to particular diseases, among other things. New and novel applications are constantly appearing. The ability to mine files of text such as e-mails or news reports is one of the promising fields, according to Mark Brown, SAS global data mining program manager. For example, Nestor has already been approached by a client to develop a program that would assess the content of customer complaint calls and help predict which are likely to result in litigation.

THE INTERNET: DATA MINING GOES ON STEROIDS
But it is on the Internet where data-mining applications are creating the greatest stir. As Jeff Averick, data quality specialist with DiscJockey.com in Salem, Massachusetts, puts it, when it comes to the Internet, “this thing goes on steroids.” With cookies — tiny data files created on a user’s hard drive in response to a command from a website which allows that website to recognize the user’s computer every time they visit — and other technologies that can track customers’ every activity, the opportunities for customer profiling on the web are almost limitless. With so much information, the firm can seek to drive segmentation to a category of one — instead of dividing clients into, say, three categories based on lifetime value, companies can aim to personalize and customize their customer interactions and their marketing pitch to each individual. And all this can happen as quickly as the time it takes to click on to the next screen.

Since the Internet also makes it easier for customers to hop from business to business and shop for the lowest price, specialists argue that e-businesses have greater need for data mining. By giving personalized service, firms aim to gain their clients’ loyalty. So, a person who likes receiving tailored book recommendations from Amazon.com might be less likely to try other sites.

Much of the personalization on the web today uses relatively simple techniques, according to Stern Business School professor Foster Provost. One of the ways in which Amazon.com makes book recommendations is simply by identifying the most frequently purchased books by customers who also bought the book you are browsing. But more elaborate programs are evolving and spreading. Nestor has just launched a product that scores each click of the mouse on the probability that the behavior is going to result in a purchase. When different people enter a web site that uses this technology, they are shown different offers based on their clicking behavior. A browser-based tool will make recommendations of what to show the potential customer — based on the score — while he or she is on the site. “When you buy milk, there is a probability that you are going to buy cookies, so we are going to present you with the option to buy them,” says Chartier.
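
The simple recommendation technique Provost describes, listing the books most often purchased by customers who also bought the one being browsed, can be sketched as follows. The customers and titles are invented, and this is not Amazon.com's actual code.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of titles bought.
ORDERS = {
    "ann": {"Book A", "Book B", "Book C"},
    "bob": {"Book A", "Book B"},
    "cy": {"Book A", "Book D"},
    "dee": {"Book B", "Book C"},
}


def also_bought(orders, title, top_n=2):
    """Titles most often bought by customers who also bought `title`."""
    counts = Counter()
    for books in orders.values():
        if title in books:
            counts.update(books - {title})  # count every other title in that history
    # Most frequent first; ties broken alphabetically so the result is stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [book for book, _ in ranked[:top_n]]


recommendations = also_bought(ORDERS, "Book A")
print(recommendations)
```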

Or, just take a virtual stroll to www.sas.com, the web site of SAS Institute — one of the market leaders in data-mining software — as this author did in the process of her research for this article. A search for information on data mining eventually takes you to a page where you can register to download “white papers” on different topics. As you fill in the form, you notice that the category for data mining (which you just spent the last 10 minutes browsing) is already selected for you. (Of course, you do have the option to mark other categories offered on the page.) Not only that, but the next day when you turn on your computer, you have received an e-mail from a SAS employee who says they have noticed your interest and offers you their contact information in case you have any questions — or wish to place an order.

And this is just the beginning. In the future, “systems may be able to design experiments automatically and get results on the fly,” says Provost. Companies may develop learning systems that choose a segment of customers on whom they want to try a new scheme, get the instant results, learn from their behavior, and improve on the next try, all on an automated basis. Thus, the program would be running experiments like those conducted by Harrah’s Entertainment Inc. to determine which offer is going to get the best response at the lowest cost from specific customers — only most of the process would be automatic.
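
One step of such an automated experiment might look like the sketch below: mail each offer to a test group, record the responses, and keep the offer that draws the most response per dollar of its value. The $60 and $125 offer values come from the Harrah's example; the group sizes and response counts are invented, and the selection rule is an assumption made for illustration.

```python
# Offer values from the Harrah's example; mailing and response counts are invented.
TRIALS = {
    "chips_60": {"value": 60.0, "mailed": 1000, "responses": 80},
    "package_125": {"value": 125.0, "mailed": 1000, "responses": 90},
}


def response_rate(trial):
    """Fraction of mailed customers who responded to the offer."""
    return trial["responses"] / trial["mailed"]


def response_per_dollar(trial):
    """Response rate divided by the dollar value of the offer."""
    return response_rate(trial) / trial["value"]


def best_offer(trials):
    """Name of the offer with the highest response per dollar."""
    return max(trials, key=lambda name: response_per_dollar(trials[name]))


winner = best_offer(TRIALS)
print(winner)
```

An automated system would repeat this loop, replacing the losing offer with a new variant on the next round.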

MINEFIELDS FOR DATA MINERS
As promising as the field may be, data mining is not without its pitfalls. “The quality of the data can make or break the quality of the data mining,” says Jeff Averick. “You can have all the great algorithms and technology but if you can’t rally the data to the cause, the algorithms can lead you in the wrong direction.” Oftentimes, the data are a proxy for something else that is likely to be linked to a purchase decision. The address may be associated with wealth or income, for instance. But if the data are not a good proxy or are not sufficient (what if you live at that address as a nanny, butler, or gardener?), data mining can give false results, or you can misinterpret them. And this means that you would not only waste your time but might also end up taking counterproductive measures.

In order to mine their information, companies first have to integrate, extract, transform, and cleanse data to serve a purpose for which it was never intended. Handling the massive amounts of data, ensuring accuracy, and integrating data gathered from all different entry points is a time-intensive and costly endeavor — particularly for old-line companies with legacy systems from different parts of the business that have to be made to talk to one another.

Moreover, to get value out of the data mining, companies must be able to change their mode of operation and maintain the effort. In the case of supermarket loyalty cards, for instance, “the commitment has to be one that will endure because of the enormous amount of mailing and chronicling you have to do,” says Bernard Rogan, spokesman for Shaw’s Supermarkets, which recently acquired Star Supermarkets and their Star Advantage loyalty card program. If the company lets the loyalty program languish, customers might start wondering why all the information about their purchases is being collected. The company will then be in a bind, because loyalty programs are also hard to end. Customers who have been choosing to fly on a particular airline to accumulate a given number of miles will not appreciate it if the program is curtailed or changed before they reap the rewards.

But perhaps the most important challenge to the spread of data-mining applications is the growing concern over privacy. Unease about how private firms acquire and handle data has been on the rise, particularly since the early 1990s when public uproar forced Equifax and Lotus Development Corporation to cancel the sale of their Lotus Marketplace: Households — a series of disks containing the names, addresses, buying habits, and income information of about 120 million Americans.

Companies that use data mining for target marketing are often walking a tightrope between personalization and respect for privacy. The actions companies have taken to know their customers better and use this information have, in some cases, backfired. In its attempt to start a “friends and family” program in the United Kingdom, British Telecom mailed its customers a “five favourite calls” list with the most frequently dialed numbers in each account. According to the British magazine, The People, this resulted in a broken marriage when an unsuspecting wife realized she didn’t recognize the most frequently dialed number from their home. The errant husband told the publication that he was considering suing BT for having blown the whistle on his carefully concealed 20-year affair.

Going beyond the potential backlash of the market, privacy advocates and the Federal Trade Commission (FTC) have been pushing towards stricter rules such as those applied in European Union countries. At a minimum, the guidelines proposed by the FTC state that companies must disclose their information practices before collecting any personal information and that consumers should have a choice as to whether and how personal information may be used. Also, the FTC states that consumers should be able to view and contest the accuracy and completeness of the data collected about them.

But the implementation of these guidelines in data mining is not always straightforward and can be costly to companies. Firms would have to let customers know that they are using billing and account information (to name a couple of categories) for mining purposes. Yet, companies often don’t know specifically what they are going to do with this information until the data-mining process reveals patterns in the data. Moreover, providing customers with access to the data in an intelligible form can be costly and cumbersome. And it can raise the very privacy concerns it is designed to appease: How does a company guarantee that the person who requests to review and correct the information is really the person whose data was collected? And, if a company guarantees that it will not share the data with others, what happens to the data when a company is bought or goes into bankruptcy and has to sell its assets?

In a sense, technology is outrunning the ability of our legal system to handle the ethical and property issues that arise. As privacy expert Jason Catlett sees it, data mining is pushing the definition of privacy from individuals’ claims over determining what information about them is communicated to others to include determining what information is created by others.

Also, the technology that renders data useful as a source of information makes it more valuable as a commodity that can be sold. Defining which information is personal and owned exclusively by the individual and which can be owned by companies — as well as the guidelines for what can be done with the different types of information — remains a challenge for the future.

As the rules are laid out and the technology becomes more widespread, data mining could improve the efficiency with which companies cater to the preferences of individual customers, in the same way that it has been improving the efficiency with which loans are evaluated, fraud is detected, and NBA coaches formulate their strategies. Better targeting can reduce costs for companies and offer customers products they are more likely to buy, reducing the amount of junk mail. However, there can still be winners and losers: those who turn out to be in the least-profitable segment for a company will see their options reduced, as bank customers whose bounced-check fees are not forgiven can attest.

And the models are not fail-safe. For all the talk of prediction, companies cannot impel you to buy their products; they can only try to pitch their best offer. And in many ways, this is not so different from the corner grocer of yore who could greet you by name and tell you that the apples you like so much are especially juicy and ripe this week. And for you, just for you, he will cut you a special deal.

KEEPING SCORE WITH DATA MINING

Credit scoring has benefited both banks and consumers by reducing the time needed to approve loans and the costs of evaluating them. Using credit bureau data, credit scoring tries to isolate the effects of various applicant characteristics on loan delinquencies and defaults. The end result is a “score” that allows banks to grade applicants in terms of risk and determine a threshold below which they decide it is too risky to lend. Each lending institution does not have to build its own credit model; companies such as Fair, Isaac, and Co., Inc. of San Rafael, California, produce scores that smaller lenders can use.

But do all customers benefit? Like all criteria used for evaluations, credit scoring is open to the question of whether the score gives a fair and accurate reflection of the creditworthiness of the potential borrower. Some consider credit scoring a more objective process that helps ensure that the same underwriting standards are applied to all borrowers. By law, scores cannot consider a person’s ethnic group, religion, gender, marital status, or national origin. In determining an applicant’s score, Fair, Isaac, and Co., Inc. evaluates five main parts of people’s credit reports: payment history (i.e., late payments, bankruptcies), amount owed, length of credit history, new credit, and types of credit in use.
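
To make the idea of a score and a lending threshold concrete, here is a toy scorecard built on the five categories named above. The weights, the 0-to-1 ratings, and the cutoff are all invented for illustration; they are not Fair, Isaac's actual formula, which the article does not describe.

```python
# Hypothetical weights for the five categories (they sum to 100).
WEIGHTS = {
    "payment_history": 40,
    "amount_owed": 25,
    "length_of_history": 15,
    "new_credit": 10,
    "types_of_credit": 10,
}

# Hypothetical threshold below which the lender considers a loan too risky.
CUTOFF = 60


def score(applicant):
    """Weighted sum of category ratings, each rated from 0.0 (worst) to 1.0 (best)."""
    return sum(weight * applicant[name] for name, weight in WEIGHTS.items())


def decide(applicant, cutoff=CUTOFF):
    """Approve applicants whose score meets the cutoff; decline the rest."""
    return "approve" if score(applicant) >= cutoff else "decline"


strong = {name: 1.0 for name in WEIGHTS}
weak = {name: 0.5 for name in WEIGHTS}
print(score(strong), decide(strong))
print(score(weak), decide(weak))
```

Because the score depends only on the rated categories, two applicants with identical ratings always receive the same decision, which is the consistency argument made for scoring.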

Nonetheless, minorities have lower credit scores on average than white applicants. Scoring industry representatives say that this is because factors that affect a borrower’s ability to meet financial obligations, such as income, property, education, and employment, are not equally distributed by race or national origin in the United States.

Still, models can only be as accurate as the underlying data. Mistakes will affect the results. And how well the scores assess risk depends on whether history can accurately predict the future. If a large change occurs that has not been accounted for — say, in the cultural acceptance of bankruptcy — accuracy will drop. Moreover, applicants with no credit histories are excluded from scoring models. Thus, increasing reliance by lenders on scoring can create barriers to credit for these people.

Skeptics argue that in the case of scoring for small business loans, lending to low- and moderate-income areas may be limited by scoring as these areas tend to be underrepresented in the samples used to build the models. But one recent study by the Atlanta Fed found that — after accounting for community and bank characteristics — banks that used scoring did not lend significantly less in low- or moderate-income areas than in high-income areas.

HOW MUCH ARE YOU WILLING TO PAY?
Collecting and tracking large amounts of customer information can help companies better serve their clients. Aside from allowing them to offer a more personalized service, it can make it easier for businesses to ensure that the right products are available to the right people at the right time. But it can also help them determine how much each customer is willing to pay — and charge accordingly.

Charging customers differently is not new. Many businesses have found ways to charge customers different prices for essentially the same product. At any point in time, for instance, a large U.S. airline carrier is likely to have 20 or more fares available on a given route, according to economists Severin Borenstein and Nancy Rose. (And this variety of fares refers only to direct coach class travel in the largest U.S. domestic markets.)

In order to do this, companies must find ways to sort their customers according to their willingness to pay for the product, and they must seek ways to make customers reveal their preferences. In the case of the airline industry, a good example is the Saturday night stay requirement. By offering lower fares on flights including a Saturday night stay, airline companies can sort the business travelers — who are more likely to want to be back home by the weekend and who are willing to pay more in order to suit their business needs — from the vacationers who, given a high enough price, might decide to drive or not leave their hometown at all.

Through collecting customer information and applying data mining, companies can better figure out such preferences. They can more accurately target discounts or promotions to particular segments of their client pool — effectively changing the price. Or, they can better figure out ways in which they can tailor a product, say create a number of versions designed for each customer segment, in order to charge different rates for the same underlying good.

In a competitive market, one would expect that prices for the same good would converge. (The higher fares would not persist because rival companies could gain market share by charging lower fares.) Still, one can find different prices for the same good even on the Internet, where you might expect a smaller dispersion in prices because comparison shopping is easier and less costly. MIT professors Erik Brynjolfsson and Michael D. Smith found that Internet retailer prices differed by an average of 33 percent for books and 25 percent for compact disks.

Companies clearly gain when they manage to charge customers according to how much they are willing to pay. If you, as a customer, get passed over for discounts or get offered the higher-priced package, an improved ability on the part of businesses to charge different prices might not seem such a great innovation. But not all customers necessarily lose. If the ability to extract different prices means that consumers who might not have had access to the product under a single price scheme can now buy it, then those consumers will gain. Moreover, points out Stern Business School professor Foster Provost, as consumers increasingly realize the value of their information, companies may find that they have to start giving greater incentives in exchange for such information.