This is what the future of data analytics should look like, says Infosys’ Data Analytics AVP

When you say self-service analytics, one normally thinks of tools like Tableau or Spotfire. These tools allow the user to slice and dice reports and dashboards, and drill down into them.

These are all forms of self-service descriptive analytics. However, the need of the hour is to make predictive analytics self-service as well. Business users should not have to reach out to statisticians every time they need small changes to a predictive model.

Now, how would that work? Imagine a business user who wants to cluster retail stores in the state of Florida using self-service analytics. She picks a store clustering model that was created for that retailer to cluster stores across the US.

However, since the retailer has multiple formats and she wants to run the model only for the supermarket format, she leaves out the Convenience, Express and Hyper formats. She also selects Florida as the state. Having done this, she can select the variables she wants to use for the clustering.

These variables form a superset; when the model runs, it selects the most relevant variables from this set based on a principal component analysis. She can also enter the number of clusters she wants, or a range for the number of clusters, in which case the tool selects the most suitable number within that range. The model obviously needs to have been written so that it can accept these inputs as parameters.
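The Florida scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any platform mentioned in this article: the `Store` record, the `cluster_stores` function and its parameter names are hypothetical, the principal component step is left out for brevity, and the silhouette score is assumed as the rule for picking the number of clusters within the user's range (the article does not say how a tool would choose).

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Store:
    store_id: str
    state: str
    store_format: str
    features: dict  # variable name -> numeric value (hypothetical schema)


def standardize(rows):
    """Z-score each column so variables with large units do not dominate."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initial centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:  # converged: labels match the final centers
            break
        centers = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a single-store cluster contributes 0 by convention
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def cluster_stores(stores, state, formats, variables, k_range):
    """Run the clustering for the scope, variables and cluster range the user chose."""
    scope = [s for s in stores if s.state == state and s.store_format in formats]
    points = standardize([[s.features[v] for v in variables] for s in scope])
    best = None
    for k in range(max(2, k_range[0]), k_range[1] + 1):
        if k >= len(points):
            break
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    if best is None:
        raise ValueError("not enough stores in scope for the requested cluster range")
    score, k, labels = best
    assignment = {s.store_id: lab for s, lab in zip(scope, labels)}
    return {"k": k, "silhouette": score, "assignment": assignment}
```

The business user only ever touches the arguments of `cluster_stores`, for example `cluster_stores(stores, "FL", {"supermarket"}, ["sales", "area"], (2, 4))`, where the variable names are again placeholders. Everything else stays as the statistician wrote it.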


While that sounds perfectly reasonable, it may not be very simple, for five major reasons:

1. If business users change scope or variables in a predictive model, what does that do to the accuracy of the model?

2. Since data behaviour drives the choice of algorithms in a model, how does that happen when a business user selects a new variable?

3. Advanced models need significant data science knowledge. How would such models operate in a self service mode?

4. Statistical models are hardcoded based on the business problem they are trying to address. How would a business user change scope or variables in such a case?

5. When a model is run and an output is generated, there are various measures that indicate the confidence and reliability of the results. How would a business user understand these measures?

These challenges are not without basis. On the one hand, they have led to self-service analytics platforms being utilised as analytics tools for citizen data scientists instead of business users. On the other hand, user-driven self-service analytics has been limited to simple, routine operational models that are run frequently and carry lower risk from minor losses of accuracy.

Analytics platforms today

A couple of years ago, when we spoke of analytics platforms, it was assumed we were talking about big data platforms with connections to analytical interfaces. Today many enterprises are already using analytics platforms that allow easy model building and easy application of data sets to models. SAS, IBM, KNIME, RapidMiner, Alteryx and H2O are all analytics platforms that enterprises use today to perform analytics without necessarily writing code directly or rewriting code for updated data sets. Most of these platforms are used by junior data scientists or, in some cases, citizen data scientists.

A large retailer with a very large number of stores has leveraged such platforms to use predictive analytics in site selection for new stores without creating new models every time there is a site selection exercise. A major low-cost airline has leveraged such platforms to conduct diagnostic analysis for its airplanes and has also made analytical models available to the right analysts, to be used repeatedly. Another major retailer has used such platforms to shorten the learning curve for its analysts in statistical analysis, especially in customer analytics.

A global auto major is using such platforms to reduce replication in its model building, so that statistical models, such as those for dealer network optimization, can be applied to new data sets without being rewritten, thus saving crucial analyst time. A major fast food chain is using such platforms to ensure consistency in its analysis of outlet locations, including competing and cannibalizing locations. This is achieved by reusing similar models across business regions and groups.

But very few enterprises have started to use the same analytics platforms to develop self-service analytics that business users can use. Irrespective of the challenges listed earlier, the way to effectively create business user self-service is to adopt the following:

1. A paradigm shift to realise that business users do a significant amount of analysis today; it is just that they do it using Excel. They are not mathematically challenged as a rule, and as businesses become more analysis-driven, this will be even more true.

2. Analytical code written in SAS, R or any other tool is, at the end of the day, software code. If written correctly, any software code can be parameterized to accept user input each time it is executed.

3. Even if allowing business users to run statistical models with modified inputs causes minor accuracy losses, these losses are minimal compared to the risk that business users will take decisions on gut feel rather than on analytics. Since self-service analysis will increase the amount of data-based decision making in the firm, minor accuracy losses will be acceptable.

4. Contextual help buttons on the platform not only ensure that business users can start to comprehend any statistical terms on the page, but also serve to train business users and elevate their analytical abilities.
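Points 2 and 4 can be combined in one small pattern: the model author publishes a sheet of approved inputs, each with plain-language help text, and every run is checked against it. The sketch below is only an illustration. The `Param` class, the `resolve` function and the example parameter names are hypothetical and do not come from SAS, R or any platform named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    allowed: tuple   # values the model author has approved for this input
    default: object  # used when the business user leaves the input blank
    help_text: str   # plain-language text behind a contextual help button


# Hypothetical parameter sheet for a store clustering model.
STORE_CLUSTERING_PARAMS = (
    Param("store_format", ("supermarket", "convenience", "hyper"), "supermarket",
          "Which store format to include in the run."),
    Param("max_clusters", tuple(range(2, 9)), 4,
          "Upper limit on the number of store groups the model may create."),
)


def resolve(params, user_input):
    """Merge user input with defaults, rejecting anything outside the approved set."""
    unknown = set(user_input) - {p.name for p in params}
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    resolved = {}
    for p in params:
        value = user_input.get(p.name, p.default)
        if value not in p.allowed:
            # Reuse the help text so the error also explains what the input means.
            raise ValueError(f"{p.name}={value!r} is not allowed. {p.help_text}")
        resolved[p.name] = value
    return resolved
```

Keeping the approved values next to the help text means the statistician decides how far a business user may move each input, which limits the accuracy loss discussed in point 3.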

We can see that, across industries, we have started to make analytics easier and more repeatable. Making it amenable to business user self-service is the logical end of this journey of making analytics scalable, and the best proof is that some enterprises are already getting there. The only hurdles are in our minds.

(The article is authored by Subhashis Nath, AVP - Senior Industry Principal, Data Analytics, Infosys)