It has been nearly two years since I got a briefing from KXEN (see my KXEN First Look here) so I was glad to get an update recently. KXEN was founded in 1998 and is a VC-funded, privately held company headquartered in San Francisco. Their revenue is split evenly, 45/45, between Europe and America, with Asia Pacific making up the remaining 10% and growing rapidly, especially in Japan. They have over 400 customers worldwide in telecoms, financial services and retail with a strong focus on customer experience management. Customers include Vodafone, Orange, US Cellular, Wells Fargo, HSBC, Barclays, Fidelity, Experian, Rogers, Cox, Discover and Lowes. Customers are also building a wide range of solutions beyond CRM lifecycle management, such as ad word bidding and electrical supply optimization. Their customers range in size from very large to much more modest.
From a solution perspective, telco is all about cross-sell, up-sell and churn (across lots of different kinds of products); financial services is focused mostly on direct marketing optimization, though with some fraud and risk models, especially outside the US; and retail is focused on classic CRM such as direct marketing, email campaigns and banner ads, but is also doing work around store performance and size and color optimization.
KXEN’s core focus is on cutting modeling time. KXEN often gets in the door because modeling teams are overwhelmed and looking for a way to build some of their models more quickly. They have a strong emphasis on the need to industrialize and create a “factory” mindset for the creation and deployment of models, something that I, of course, agree with (see this post on the need to industrialize analytics). Key partners include Teradata and MicroStrategy.
From a product perspective they see an ever increasing need for speed. The web and cloud computing are changing customer relationships – making it easier for new companies to come to market, easier for customers to find your competitors and churn, and so on. There is an explosion of channels combined with faster business and product cycles. Getting the right product to the right customer at the right time is harder than ever before. The amount of valuable customer data being collected is higher than ever (though in many cases only small amounts are actually being used) and the data must increasingly be applied in real time (with quarterly product catalogs planned months in advance being replaced by complex offers delivered on the web in real time, for instance). What all this does for companies that “get” analytics is generate a massive increase in the demand for analytic models. At the same time a whole new class of companies – smaller or without a predictive analytics team – have realized they need analytics and are trying to leapfrog to best in class fast. For these customers traditional approaches take too long – 6-8 weeks to build a model is too slow and too resource intensive.
KXEN customers talk about getting 7 or 10 times the productivity by focusing KXEN’s automation on critical tasks like data preparation. As the Rexer survey shows year after year, modelers spend most of their time on data preparation, modeling, deployment and management of models, with only 30% focused on the tasks where modelers can really add the most value – problem analysis and evaluating results. Given the need to get the best decision possible, modelers need to focus on these activities and this means eliminating what KXEN refers to as “the modeling bottleneck”.
For KXEN the critical thing is getting good models from people who are not PhD statisticians. They see companies trying to go from needing 10 models to, for example, 1,600 a year because they want to do regional, product and customer segment specific models. For these customers it is no longer an issue of how good a model an expert data miner might build, as there are simply too many models. It is about the reality of needing good-enough models at scale.
10 years ago, KXEN delivered its core data mining automation product to provide customers an order of magnitude improvement in time to produce a model. But the rate of increase in demand for models has continued to grow. KXEN’s new product capabilities are around Analytical Data Management and are aimed at addressing this massive expansion of demand. As a point of reference they talk about how BI solved a similar problem with exploding report demand – BI tools added a semantic layer that allowed a business user to create their own reports. KXEN has introduced a semantic layer for predictive modeling.
KXEN Analytical Data Management (ADM) lets an analyst define the metadata once, select a time-stamped population and then build an analytic data set (ADS) automatically. Previously each model would have required the construction of its own ADS. Now modelers can create a model of the business entity and a map to the data from which they can generate multiple ADS.
Then they can use the modeling automation to build models using these ADS. As demand for models rises, typically for more and more granular models, the payoff gets greater. Once you have a lot of models, of course, managing and refreshing them becomes an issue. The KXEN Modeling Factory (KMF) refreshes these ADS and models automatically, redeploying updated models into production while generating alerts for deviations or other problems.
Generally modelers start with an object model, implemented in a set of databases. ADM connects to a database and allows you to define analytical objects against this data. Each analytical object is associated with an Analytical Record (AR) that groups the attributes for that object – for customers, say, or accounts. Attributes are color-coded by domain and each attribute can be read from a specific table or calculated in a defined way. Typical ADS flattening is modeled into this and the analyst can define, in minutes, many calculated fields that might be relevant. Defining a field allows modelers to specify joins, filters, running totals over time and more. A single definition can result in multiple fields (for example, when a modeler asks for a total for the last 6 months, the tool will create 6 fields, for 1, 2, 3, 4, 5 and 6 months). All these field definitions are managed, versioned and so on.
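To make the "one definition, several fields" idea concrete, here is a minimal sketch in plain Python. It is not KXEN's implementation or API: the function names, the `total_last_Nm` field naming, and the assumption that each field is a cumulative trailing total (last 1 month, last 2 months, and so on, measured in calendar months before a reference date) are all mine.

```python
from datetime import date


def months_back(ref: date, n: int) -> date:
    """Return the first day of the month that is n months before ref's month."""
    idx = ref.year * 12 + (ref.month - 1) - n
    return date(idx // 12, idx % 12 + 1, 1)


def expand_trailing_totals(transactions, ref, max_months=6):
    """Expand one 'total over the last N months' definition into N fields.

    transactions: iterable of (date, amount) pairs for a single entity.
    ref: reference date; only transactions strictly before it are counted.
    Returns a dict with one entry per window: total_last_1m .. total_last_Nm.
    """
    transactions = list(transactions)
    fields = {}
    for n in range(1, max_months + 1):
        start = months_back(ref, n)
        # Assumed semantics: cumulative total over [start, ref).
        fields[f"total_last_{n}m"] = sum(
            amount for day, amount in transactions if start <= day < ref
        )
    return fields


txns = [(date(2010, 6, 15), 10.0), (date(2010, 4, 2), 5.0), (date(2010, 1, 20), 1.0)]
fields = expand_trailing_totals(txns, date(2010, 7, 1))
```

With these hypothetical transactions the single definition yields six fields, and the 3-month total picks up the April transaction while the 1-month and 2-month totals do not.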
Once the AR is developed, a user can create a time-stamped population, select a target to train the model and build models using the automated tools. Time stamps can be absolute or relative like 3 months after each person became a customer or 3 months before they churned. For instance, to build a model that predicts who will accept an offer the user needs to select the population who end up with the product and specify a timestamp 3 months earlier.
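A relative timestamp of this kind can be sketched as follows. This is only an illustration of the idea, not how ADM does it: `shift_months`, `build_population` and the customer IDs are hypothetical, and clamping to the last day of a shorter month is my own assumption about how such an offset might be resolved.

```python
import calendar
from datetime import date


def shift_months(d: date, months: int) -> date:
    """Shift a date by a whole number of months (negative = earlier).

    The day is clamped to the last valid day of the target month.
    """
    idx = d.year * 12 + (d.month - 1) + months
    year, month = idx // 12, idx % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def build_population(event_dates, offset_months):
    """Pair each entity with a timestamp relative to its own event date.

    event_dates: dict mapping customer id -> date of the reference event
    (for example, the date the customer took up the product).
    """
    return [
        (customer_id, shift_months(event_date, offset_months))
        for customer_id, event_date in sorted(event_dates.items())
    ]


# Customers who ended up with the product, observed 3 months earlier.
accepted = {"c2": date(2010, 3, 15), "c1": date(2010, 1, 10)}
population = build_population(accepted, -3)
```

Each customer gets their own observation date, so the attributes in the ADS are computed as of that date rather than as of a single fixed snapshot.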
This involves creating the third ADM element – timestamp populations. Timestamp populations involve navigating the database, linking tables and defining filters. This is a parallel object to go with the analytical record.
These definitions take advantage of the range of KXEN coders. The text coder can be used to extract meaning from unstructured text, using common text analytics features like stop word removal, stemming and so on. The event log can create event aggregates for multiple time periods for continuous variables, using the dates in the timestamp population. The sequence coder finds categorical groups and automatically puts rare categories into an “other” category if this makes sense. And the social network analysis tools automate network creation from data containing pairwise links of various kinds.
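As an illustration of the rare-category step, here is a small sketch. The briefing did not cover how KXEN decides when grouping "makes sense", so the simple minimum-count threshold below is my assumption, and `group_rare_categories` is a made-up name rather than a KXEN function.

```python
from collections import Counter


def group_rare_categories(values, min_count=2, other_label="other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


plans = ["a", "a", "b", "c", "a", "b"]
grouped = group_rare_categories(plans)
```

Here "c" appears only once, so it is folded into "other", while "a" and "b" are kept as they are.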
The user can specify a target to predict (which can be a calculated field) and the model is then built through the core KXEN tools using the ADS that is generated (and created in the database) based on the population definition. Modeling techniques include robust regression (automated routines that handle potentially thousands of input attributes), segmentation and clustering (optimized for semi-supervised or targeted clustering), time series (handling trends, periodicity and seasonality), and association rules.
Once the model is built, the user can review the predictive variables, the predictive power of the models and the robustness of the model. The model can be reviewed to see how variables contribute, how values distribute across these variables, how different values correlate to results, lift curves and so on.
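For readers less familiar with lift, a point on a lift curve can be computed as below. This is the standard definition (response rate in the top-scored slice divided by the overall response rate), written as a sketch with made-up scores and outcomes; it says nothing about how KXEN computes or displays its own curves.

```python
def lift_at(scores, outcomes, fraction):
    """Lift for the top `fraction` of records when ranked by score.

    scores: model scores, higher means more likely to respond.
    outcomes: 1 for a responder, 0 otherwise, aligned with scores.
    """
    ranked = [
        y for _, y in sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    ]
    base_rate = sum(ranked) / len(ranked)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:k]) / k
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
top_20_lift = lift_at(scores, outcomes, 0.2)
```

Evaluating `lift_at` over a range of fractions gives the points of the curve; at a fraction of 1.0 the lift is always 1.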
Models can be generated as Java, SQL code, SAS code, PMML and so on using KXEN Model Export (KMX). The generated SQL uses only the predictive variables and runs in-database. Simulation of examples is available (including through an API) and the Modeling Factory can manage refreshes.
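To show why exporting a model as SQL over only its predictive variables is attractive, here is a toy sketch that turns a linear scoring model into a single SELECT statement. It is not KMX and does not reproduce its output: the function, the table and column names, and the assumption of a plain linear model are all hypothetical.

```python
def linear_model_to_sql(intercept, coefficients, table, key="customer_id"):
    """Render a linear scoring model as a SQL SELECT statement.

    coefficients: dict mapping column name -> weight. Columns with a zero
    weight are dropped, so the query touches only the predictive variables.
    Column and table names are assumed to be trusted identifiers.
    """
    terms = [
        f"({weight!r} * {column})"
        for column, weight in coefficients.items()
        if weight != 0
    ]
    expression = " + ".join([repr(intercept)] + terms)
    return f"SELECT {key}, {expression} AS score FROM {table}"


sql = linear_model_to_sql(
    0.5, {"tenure": 0.25, "unused": 0.0, "spend": -2.0}, "customers"
)
```

Because the scoring logic is just an expression over existing columns, the database can score every row in place without moving data out to a separate modeling tool.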