“Hello, World!” — Automatically infer understandable and actionable classification models with the Fraunhofer IAIS RuleCreator
The rite of passage for any new programmer is to write a program that outputs the message “Hello, World!”, usually in one line of code. A similar rite of passage exists for data scientists, and the exercise nearly always involves analysing the Iris flower data set, which was collected in the 1930s and records measurements of three different species of Iris flowers. However, the typical data science “Hello, World!” exercise cannot be completed with just one line of code, as this example demonstrates with more than 50 lines of code.
At Inspirient, we often get asked “what exactly are you automating?” Our answer is that we reduce the 50 or so lines of code a data scientist would have written in possibly one hour of work down to a single click. We do this by using AI and related technologies to fully automate the technical data science decision-making process, in this specific case by employing the Fraunhofer IAIS RuleCreator for automated model inference [1].
Inferring a model from the observed data is at the very core of Machine Learning, as stated in MIT's ‘Introduction to Machine Learning’. In our initial data science example, the inference process comprises code lines 25 to 35. With the Fraunhofer IAIS RuleCreator, a model can be derived automatically, even from only the 150 rows of the Iris flower data set. Automated data import, exploratory analysis, model inference, and visualization take no more than 20 seconds for this data set.
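To make "inferring a model from observed data" concrete, here is a minimal sketch in plain Python. It is not the RuleCreator algorithm, which is not described in this article. It only searches for the single petal-length threshold that best separates one species from the others. The function name `infer_threshold` and the handful of sample measurements are assumptions chosen for illustration.

```python
# Illustrative only: a one-rule "decision stump" search, NOT the RuleCreator method.
# A few petal-length measurements (cm) with their species, used as toy input.
SAMPLES = [
    (1.4, "setosa"), (1.3, "setosa"), (1.5, "setosa"),
    (4.7, "versicolor"), (4.5, "versicolor"), (4.9, "versicolor"),
    (6.0, "virginica"), (5.1, "virginica"), (5.9, "virginica"),
]


def infer_threshold(samples, target):
    """Find t so that the rule 'value < t => target' is as accurate as possible."""
    values = sorted({value for value, _ in samples})
    # Candidate thresholds lie halfway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_acc = None, -1.0
    for t in candidates:
        # A sample is handled correctly if the rule fires exactly for the target class.
        correct = sum((value < t) == (label == target) for value, label in samples)
        accuracy = correct / len(samples)
        if accuracy > best_acc:
            best_t, best_acc = t, accuracy
    return best_t, best_acc


threshold, accuracy = infer_threshold(SAMPLES, "setosa")
print(f"IF petal length < {threshold} cm THEN setosa (accuracy {accuracy:.0%})")
```

A real inference engine searches over many attributes and rule combinations; this sketch covers one attribute and one class to show the principle.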
Rule-based models such as the one derived by the Fraunhofer IAIS RuleCreator are one of many techniques that can be used for building a classifier. The key advantage of such a rule-based classifier is that it is easy for everyone to understand, not only for data scientists. This is essential if these techniques are to serve as relevant tools for decision makers in business or society.
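The readability of a rule-based classifier can be illustrated with a short Python sketch. The two rules and their thresholds below are hand-written assumptions of the kind often quoted for the Iris data set. They are not output of the RuleCreator, and the names `RULES`, `DEFAULT`, and `classify` are hypothetical.

```python
# Illustrative only: hand-written rules, NOT the model produced by the RuleCreator.
# Each rule is (human-readable condition, test function, predicted species).
RULES = [
    ("petal length < 2.5 cm", lambda f: f["petal_length"] < 2.5, "setosa"),
    ("petal width < 1.75 cm", lambda f: f["petal_width"] < 1.75, "versicolor"),
]
DEFAULT = "virginica"  # prediction when no rule fires


def classify(flower):
    """Return the predicted species and the rule that led to it."""
    for text, condition, species in RULES:
        if condition(flower):
            return species, f"IF {text} THEN {species}"
    return DEFAULT, f"ELSE {DEFAULT}"


for flower in (
    {"petal_length": 1.4, "petal_width": 0.2},
    {"petal_length": 4.7, "petal_width": 1.4},
    {"petal_length": 6.0, "petal_width": 2.5},
):
    species, explanation = classify(flower)
    print(species, "because:", explanation)
```

Because every prediction comes with the rule that produced it, a non-specialist can check the reasoning directly, which is the advantage described above.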
Automated model inference has a wide variety of use cases, beyond the classification of scientific data. Businesses can, for example, use this technology to better understand the segmentation of the customer base, optimize their processes, or identify early indicators for desirable or undesirable behavior such as fraud.
The Inspirient Automated Analytics Engine automates the entire data analytics process end-to-end: from the assignment of input data, through pattern and outlier detection and the automated visualization of patterns, weak points and opportunities, to the automatic generation of textual explanations and the recognition of the underlying relationships and rules. Other analytics solutions rarely include these textual explanations and observations about the underlying data relations, both of which are critical for a deeper level of analysis and more actionable conclusions.
[1] Daniel Trabold and Henrik Grosskreutz. Parallel subgroup discovery on computing clusters – First results. BigData 2013, pp. 575-579.