I recently caught up with Alteryx, Inc., a company focused on democratizing Big Data, to talk about how to get better decision making and data management at scale in an era of Big Data. Alteryx was founded in 2010 (it grew out of a company called SRC) and has 130 employees in North America.
When it comes to exploiting Big Data, we are clearly not going to find enough data scientists. Alteryx's point of view is that there are lots of data analysts and that more could be done if these folks were empowered. As the Big Data market shifts from technical infrastructure to business outcomes, they see companies wanting to focus on predictive analytics, on making analytics more accessible and on delivering these analytics without requiring a lot of infrastructure.
Alteryx has about 300 enterprise customers across all industries, with a particular focus on telco, Managed Service Providers and B2C “storefront” companies in retail, restaurants and real estate.
The platform supports:
- Data integration.
Using a data workflow engine that brings in all relevant data regardless of format, the Alteryx platform can ingest structured data, log files, cloud data, social and other unstructured data, and device data without involving IT or data scientists.
Users can enrich this with data from external sources, e.g. household data, company data and location data from Alteryx partners such as the U.S. Census, Experian, D&B and TomTom.
This is then rapidly analyzed using common predictive analytics based on foundational R components.
- Create and share Apps in a private or public cloud.
Architecturally, the design time editor is an installed desktop product that can deploy the resulting designs to private or public clouds. At its heart this is a visually oriented data pipelining product with a wide range of database connectors as well as support for file-based sources. This pipelining environment allows joins, fuzzy matching, parsing and transformation. The designer can drop browse tools into the workflow to take a look at the actual records at various points once they start running data through. Large workflows can be managed and the relative amount of data in each branch is shown. Report nodes can also be dropped into the flow for more formal presentation.
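To make the pipelining idea concrete, here is a minimal sketch in plain Python of a parse step followed by a fuzzy-match join. It is not Alteryx's implementation or API: the function names (`normalize`, `fuzzy_join`), the sample records and the 0.8 threshold are all assumptions chosen for illustration, and the similarity measure is simply the one in Python's standard `difflib` module.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Parse/transform step: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def fuzzy_join(left, right, key, threshold=0.8):
    """Join two lists of dicts where the values of `key` are approximately equal.

    For each left row, the best-scoring right row is kept if its similarity
    reaches `threshold` (an illustrative default, not a recommended value).
    """
    joined = []
    for l_row in left:
        best_row, best_score = None, 0.0
        for r_row in right:
            score = SequenceMatcher(
                None, normalize(l_row[key]), normalize(r_row[key])
            ).ratio()
            if score > best_score:
                best_row, best_score = r_row, score
        if best_row is not None and best_score >= threshold:
            # Left-hand values win when both sides share a column name.
            joined.append({**best_row, **l_row, "match_score": round(best_score, 2)})
    return joined


# Hypothetical sample data standing in for two sources in a workflow.
customers = [
    {"company": "Acme Corp.", "region": "West"},
    {"company": "Globex Inc", "region": "East"},
]
firmographics = [
    {"company": "ACME Corp", "employees": 120},
    {"company": "Initech", "employees": 45},
]

result = fuzzy_join(customers, firmographics, key="company")
for row in result:
    # Rough analogue of dropping a browse tool into the flow to inspect records.
    print(row)
```

Only the Acme record finds a partner here, because the normalized names are identical once case and punctuation are removed; the other customer has no sufficiently similar counterpart and is dropped from the joined output.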
These flows can be packaged up and parameterized for deployment using a wizard interface. These apps can be made available publicly or privately within a company. They expose a simple UI based on the parameters defined and can have multiple tabs to manage these parameters. These apps can only be published in ways that match the data they need, and Alteryx provides mechanisms for customers to run Alteryx inside their firewall and then to push data to Amazon to support a more public app.
Apps can be made unpackable so that they can be edited or extended in the designer, or they can be included as a black box, password protected, etc. This protection can be nested, allowing different levels of unpacking for different potential users. Locking and transparency are handled separately, allowing transparent but locked usage, for instance.
These apps and workflows can result in visualizations, reports or just a set of data. They can be deployed as user interface-focused apps, but they also automatically expose a JSON interface that allows programmatic access to the same apps. The workflow of an App can also write output to an API, run external commands, etc. for more programmatic use cases. Apps can also write data back to the sources and use in-database analytics transformations in the pipeline (Hortonworks, Teradata and Aster are supported so far).
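The idea of one parameterized app serving both a UI and a JSON caller can be sketched as follows. This is a generic illustration in Python, not Alteryx's actual JSON interface: the parameter names (`region`, `min_sales`), the function names and the in-memory data are all hypothetical.

```python
import json


def store_filter_workflow(region: str, min_sales: float) -> dict:
    """Hypothetical workflow: filter a tiny in-memory dataset by the app parameters."""
    stores = [
        {"store": "A", "region": "West", "sales": 120.0},
        {"store": "B", "region": "West", "sales": 80.0},
        {"store": "C", "region": "East", "sales": 200.0},
    ]
    selected = [
        s for s in stores if s["region"] == region and s["sales"] >= min_sales
    ]
    return {"region": region, "stores": [s["store"] for s in selected]}


def handle_json_request(body: str) -> str:
    """Programmatic entry point: JSON parameters in, JSON result out.

    A UI form would collect the same parameters and call the same workflow.
    """
    params = json.loads(body)
    result = store_filter_workflow(
        region=params["region"],
        min_sales=float(params.get("min_sales", 0)),
    )
    return json.dumps(result)


response = handle_json_request('{"region": "West", "min_sales": 100}')
print(response)
```

The point of the design is that the workflow is defined once against its parameters, so the form-based app and the programmatic caller cannot drift apart.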
From a predictive analytics perspective, Alteryx supports a wide range of data preparation functions (imputing values, filtering, transformations, regular expressions, date-time handling, etc.) and has bundled R tools for various data investigation tasks (contingency tables, scatter plots, etc.) as well. Decision Trees, Random Forest, Linear and Logistic Regression, clustering and some time series functions are also included. All these R bundles are wrapped and managed to make them easy to configure and use, so they can be seamlessly included in a pipeline. Alteryx bundles more than 30 R-based Predictive tools that make it easy to drag and drop predictive functionality into an analytics model (see this blog from Dr. Dan Putler of Alteryx for more details).
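As a rough picture of what a prepare-then-model pipeline does, the sketch below imputes a missing value with the column mean and then fits a one-variable linear regression by ordinary least squares. It is written in plain Python rather than R and does not reproduce any of the bundled Alteryx tools; the function names and the sample figures are assumptions made up for the example.

```python
from statistics import mean


def impute_mean(values):
    """Data preparation: replace missing entries (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: one ad-spend value is missing.
ad_spend = [1.0, 2.0, None, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

spend_filled = impute_mean(ad_spend)           # preparation step
slope, intercept = fit_line(spend_filled, sales)  # modeling step
print(spend_filled, slope, intercept)
```

In a visual tool each of these functions would correspond to a node in the flow, with the output of the imputation node wired into the regression node.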
You can get more information on Alteryx here and Alteryx is one of the vendors in our Decision Management Systems Platform Technology Report.