Data mining is hugely important, but it does have issues with accessibility. The traditional model of data mining goes something like this:
- Data is assembled in a data warehouse from transactional information, with all the effort and expense that requires. Maybe more data is even deliberately gathered. Or maybe the data is in large part acquired, at moderate cost, from third-party providers like credit bureaus.
- The database experts fire up long-running, expensive data extraction processes to select data for analysis. Often, special data warehousing technology is used just for that purpose.
- The statistical experts pound away at the data in their dungeons, torturing it until it reveals its secrets.
- The results are made available to business operating units, both as reports and in the form of executable models.
Each in its own way, KXEN and Verix (the imminent new name of the company now called Business Events) want to change all that.
KXEN believes they have found a one-size-fits-all set of data mining models and algorithms. This is not an SVM (Support Vector Machine), which they actually don’t offer any more, but rather something else from the fertile brain of SVM co-inventor Vladimir Vapnik, called Structural Risk Minimization. While the details have been published, they asked me not to write about them anyway, for some kind of security-through-obscurity competitive reason. So let’s just say that these are not just the linear models KXEN previously was, or seemed to be, stuck with. (For a small company with limited footprint, there sure is a lot of false information out there about how the whole thing works.)
A limited set of models lets one design a fairly simple user interface, especially when the models are good at helping one zoom through what otherwise can be annoying steps (like variable reduction, in which you choose which 80-90%+ of the data columns to disregard). Based on that relative simplicity, KXEN wants to let business users data mine directly, without being dependent on statistical specialists and their machinery. They position this as providing better results, because it allows rapid-cycle-time data exploration.
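For readers who haven’t done variable reduction by hand, here is a minimal sketch of the general idea. To be clear, this is not KXEN’s method (which I’m not describing); it is just the most naive version I can think of, ranking columns by absolute Pearson correlation with the target and keeping the top fraction. The function names, the column names, and the `keep_fraction` parameter are all made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def reduce_variables(columns, target, keep_fraction=0.2):
    """Keep the columns most correlated (in absolute value) with the target.

    keep_fraction=0.2 corresponds to disregarding 80% of the columns.
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    n_keep = max(1, round(len(columns) * keep_fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical toy data: five candidate columns, one target.
columns = {
    "spend": [1, 2, 3, 4, 5, 6],
    "visits": [2, 1, 4, 3, 6, 5],
    "noise_a": [3, 1, 2, 3, 1, 2],
    "constant": [7, 7, 7, 7, 7, 7],
    "noise_b": [1, 3, 2, 1, 3, 2],
}
target = [10, 20, 30, 40, 50, 60]

kept = reduce_variables(columns, target, keep_fraction=0.2)
print(kept)
```

Real variable reduction also has to worry about redundant columns, nonlinear relationships, and categorical data, which is a big part of why it’s annoying to do manually.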
They also have a pickier statistical point to make, which is that their model-building process is streamlined and automated enough that it’s realistic to build lots of parallel “local” models, e.g. for each store or region in a retail chain. By way of contrast, in traditional data mining one would normally have one model used for all localities, but perhaps with additional variables indicating which locality the model was currently referring to. KXEN confidently believes that its way is superior, but in a recent discussion didn’t actually provide me with much beyond hand-waving to back that claim up.
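To make the distinction concrete, here is a small sketch of the two approaches, under assumptions of my own: a one-variable linear model, two hypothetical stores, and invented numbers. The “local” version fits a separate line per store. The “traditional” version fits a single model with one shared slope and a per-store intercept, which is what you get from adding locality indicator variables to a linear regression. None of this reflects how KXEN actually builds models.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def group_by_locality(rows):
    """Split (locality, x, y) rows into per-locality x and y lists."""
    groups = defaultdict(lambda: ([], []))
    for locality, x, y in rows:
        groups[locality][0].append(x)
        groups[locality][1].append(y)
    return groups


def fit_local_models(rows):
    """One independent (intercept, slope) model per locality."""
    return {loc: fit_line(xs, ys) for loc, (xs, ys) in group_by_locality(rows).items()}


def fit_pooled_model(rows):
    """One model for all localities: shared slope, per-locality intercept.

    Equivalent to least squares with a locality indicator for each store.
    """
    sxx = sxy = 0.0
    means = {}
    for loc, (xs, ys) in group_by_locality(rows).items():
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        means[loc] = (mx, my)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercepts = {loc: my - slope * mx for loc, (mx, my) in means.items()}
    return slope, intercepts


# Hypothetical rows: (store, price, units sold). The stores respond oppositely to price.
rows = [
    ("store_a", 1, 10), ("store_a", 2, 8), ("store_a", 3, 6),
    ("store_b", 1, 5), ("store_b", 2, 6), ("store_b", 3, 7),
]

local_models = fit_local_models(rows)
pooled_slope, pooled_intercepts = fit_pooled_model(rows)
print(local_models)
print(pooled_slope, pooled_intercepts)
```

In this contrived data the local models recover each store’s own price response, while the pooled model averages them into a slope that describes neither store. Whether real retail data behaves this way often enough to matter is exactly the question the hand-waving didn’t answer.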
I don’t actually have a good feel for how well these pitches are being received by the market. KXEN’s biggest sales successes seem to be via partnerships with various other analytics players, and it’s tough to judge whether that’s due more to price or to embeddability or to the fundamental merits of their overall case.
Business Events, imminently to be renamed Verix, is a raw start-up with a story even more extreme than KXEN’s: Sophisticated analytic results just delivered on a SaaS basis, with no thinking required by the customer at all. Obviously, this can only make sense if the universe of possible results is rather limited, and indeed it is.
Verix’s approach assumes a classical star set-up: A single measure/fact table and a complexly hierarchical set of dimensions. Verix looks exhaustively at time series on the facts, pulls out all series that are showing anomalies in two or more dimensions at once, and isolates exactly the point in the dimension network where the anomaly is occurring. If sales of frobalizing widgets in Houston are off plan, it identifies whether this is really a Houston issue or a Texas one, and whether it’s a problem just for frobalizers or – gulp – for the entire widget category.
The company claims that some insights you get this way just wouldn’t have been found by conventional BI. E.g., if frobalizers are down in Houston but up in Dallas, and the analysis stopped at Texas, nobody (not even the Houston district manager?) would ever know of the great Houstonian frobalizer downturn.
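Here is a rough sketch of that kind of drill-down logic, simplified to a single dimension (geography) and a single product. This is my own illustration, not Verix’s algorithm; the hierarchy, the 10% threshold, the plan and actual numbers, and every function name are assumptions. The rule used is simply: starting from an anomalous leaf, climb to the parent only if every child of that parent is off plan in the same direction.

```python
# Hypothetical geography hierarchy: child -> parent (None marks the top).
PARENT = {"Houston": "Texas", "Dallas": "Texas", "Texas": None}


def children(node, parent_of):
    return [child for child, parent in parent_of.items() if parent == node]


def leaves_under(node, parent_of):
    """All leaf nodes at or below the given node."""
    kids = children(node, parent_of)
    if not kids:
        return [node]
    result = []
    for kid in kids:
        result.extend(leaves_under(kid, parent_of))
    return result


def node_deviation(node, parent_of, actual, plan):
    """Relative deviation from plan, rolled up over the node's leaves."""
    leaves = leaves_under(node, parent_of)
    total_actual = sum(actual[leaf] for leaf in leaves)
    total_plan = sum(plan[leaf] for leaf in leaves)
    return (total_actual - total_plan) / total_plan


def localize(leaf, parent_of, actual, plan, threshold=0.1):
    """Return the highest node at which the leaf's anomaly is shared by all siblings."""
    node = leaf
    sign = 1 if node_deviation(node, parent_of, actual, plan) > 0 else -1
    while parent_of[node] is not None:
        parent = parent_of[node]
        siblings = children(parent, parent_of)
        if all(sign * node_deviation(s, parent_of, actual, plan) > threshold
               for s in siblings):
            node = parent
        else:
            break
    return node


plan = {"Houston": 100, "Dallas": 100}

# Frobalizers down in Houston, up in Dallas: Texas as a whole looks fine.
offsetting = {"Houston": 70, "Dallas": 130}
texas_view = node_deviation("Texas", PARENT, offsetting, plan)
houston_only = localize("Houston", PARENT, offsetting, plan)

# Frobalizers down in both cities: the problem belongs to Texas.
statewide = {"Houston": 70, "Dallas": 75}
texas_wide = localize("Houston", PARENT, statewide, plan)

print(texas_view, houston_only, texas_wide)
```

In the first scenario the Texas-level number shows no deviation at all, which is the point of the Houston example: an analysis that stopped at the state level would see nothing. A real version would run the same search across the product dimension too, and would need a less simple-minded definition of “anomaly” than a fixed percentage off plan.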
The company sounds like they’re working on all the right things to generalize this model. Initial interest in what they have seems to be concentrated in the pharmaceutical and CPG (Consumer Packaged Goods) industries, although there are a couple of paying telecom customers as well. One thing pharma and CPG have in common is that a lot of your raw data comes from third parties, such as IMS, and so your sales data are visible to your competitors anyway. Given that, it’s easy to believe that the SaaS nature of the service isn’t causing a lot of customer discomfort.
And by the way, IT departments aren’t involved in the Verix buying process whatsoever.