As previously noted, I have a Computerworld column coming out next week on data mining. The heart of the column is an enumeration of markets where data mining applications were having genuine success. Before I sat down to actually write the column, my list went something like this:
- There’s a large set of “early warning” apps where text mining is being deployed. Many of those same apps are addressed by data mining of tabular data too – antifraud, to start with, and also warranty tracking and indeed most of the rest.
- Data mining has been huge in CRM.
- The use of data mining in manufacturing to do failure analysis, improve quality, etc. is really on the rise. This goes at least somewhat beyond what one could reasonably pigeonhole as “early warning.”
- Data mining plays a big role in the life sciences, and is being applied to a broad range of other sciences as well.
- Data mining is a huge part of R&D at search engine and antispam vendors.
By the time I submitted the column, the list had morphed into:
- Customer offer targeting.
- Other CRM applications, often of text mining, such as reputation management or just sentiment tracking.
- National security, antifraud, and crime prevention.
- Purer portfolio/risk management applications.
- Defect tracking.
- Health care and scientific research.
For lots of examples and explanation of the categories, please see the column when available. (Theoretically that should be on the inauspicious date of September 11. In practice, it could be any time next week. I’ll post a link here when I know of one that works.)
While the latter version of the list may be slicker and more precise, which is why I went with it in the column, I think the former is more useful for a discussion of why those particular apps are the ones that get adopted. Simply put, data mining apps are concentrated at two extremes:
- Seeking “gold nuggets” of insight.
- Continuous process improvement.
What’s more, if I had to pick just one of those categories, I’d pick the second, continuous process improvement. The annals of BI are replete with examples of insights that just leapt out of reports and danced straight to the bottom line. But those stories are generally about reports and OLAP analyses, not full-blown statistical workups. Don’t get me wrong: I’m sure there are plenty of cases of data mining producing hugely valuable sudden insights. But, uh, I can’t think of any right now, at least not in the mainstream statistical analyses we usually think of when we hear “data mining.” (Perhaps some kindly product vendors will help me out with examples. If nothing else, there should be examples in the life sciences, forensics, product quality, etc. – i.e., in applications where there only ever was one single answer to discover in the first place.)
Where data mining does succeed all the time is in areas such as marketing efficiency improvement – mailing smarter, better targeting customer offers, and of course avoiding “bad guy” customers such as fraud or default risks in the first place. Text mining is something of an exception to that rule – but then, despite its name, it’s not clear that all of text mining should be classified as data mining anyway. Some of it is just “knowledge/fact/information extraction”, which generally is used to inform analytic technologies of some sort or other. But those technologies can be regular BI or text search or whatever, with data mining being just one candidate among the technologies that consume the extracted facts.