Data determines the success of AI projects. A central quality criterion is its degree of maturity. Depending on that maturity, different AI methods can be used, which opens up different possibilities for companies: from simple data and structure analyses, to the creation of forecasts, to the complete automation of complex processes. A data strategy forms the basis for achieving a high level of maturity and thus for the success of AI projects.
In the past, AI systems worked primarily with hand-written, rule-based algorithms. Expert systems are an example of this. Such algorithms are deterministic, comparable to hard-wired solutions. Today, however, data-driven processing is at the forefront of AI. Methods such as neural networks, deep learning, and reinforcement learning can overcome many of the challenges and limitations of purely algorithmic approaches.
Data-driven processing first requires that data is available in sufficient quantity and quality. The higher the degree of data maturity, the greater the added value of AI systems. Data maturity typically increases in stages: data selection, data cleansing and improvement, data labelling, and finally data preparation for developing reinforcement learning models.
In labelling, data is assigned target values, such as a category for an image or a sentiment for a voice recording. Reinforcement learning aims to train intelligent agents using a reward system and to use them in complex decision-making situations. New, previously unrecorded decision situations can be generated automatically using Monte Carlo simulations and used to train advanced agents so that they make more reliable decisions.
Example AlphaGo
A classic example of using large amounts of data and modern AI learning methods such as reinforcement learning is Google's AlphaGo. It was the first system to beat professional players of the Chinese board game Go. Go is significantly more complex than chess, for example, because of the larger board and the higher number of possible moves. As a result, Google had to invest a great deal of time and money in developing the database. For the later AlphaGo Zero version, Google had Go agents play against each other repeatedly to improve their skills. The AlphaGo Zero agent outperformed the original AlphaGo agent by a wide margin.
For companies that want to improve their data maturity, the following approach is recommended from a cost perspective:
- Systematic collection of data.
- Purchase of pre-trained models or labelled datasets.
- Building up in-house resources.
This process should result in mature data, i.e., data that can be used to train AI models without any further manual work.
Three AI Methods At A Glance
“Learning from data” is, therefore, the task. Depending on the degree of data maturity, a company can use different AI methods: unsupervised learning with relatively immature data, supervised learning with labelled data, and reinforcement learning with a realistic evaluation system and a database enriched by Monte Carlo simulations.
Unsupervised learning aims to enrich data for analytical projects and identify data structures. Typical areas of application are the grouping of data, the reduction of dimensions, the identification of patterns, data compression, and methods of natural language processing.
The methods and algorithms used include:
- Cluster analysis, especially k-means, hierarchical methods, Kohonen self-organising maps, growing neural gas
- Principal Component Analysis
- Multidimensional scaling
- NLP methods, in particular TF-IDF (Term Frequency – Inverse Document Frequency), topic analysis
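To make the TF-IDF entry in the list above more concrete, here is a minimal sketch in plain Python. It assumes the simplest textbook weighting (relative term frequency multiplied by the logarithm of the inverse document frequency); libraries typically add smoothing and normalisation on top. The function name `tf_idf` and the three claim notes are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(N / number of documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical mini-corpus of claim notes, invented for illustration.
docs = [
    "water damage in kitchen",
    "storm damage to roof",
    "water damage in basement",
]
scores = tf_idf(docs)
```

A term that occurs in every document, such as "damage" here, receives a weight of zero, while a term that occurs in only one document, such as "kitchen", receives the highest weight in its document. No labels are needed, which is why this kind of analysis already works with relatively immature data.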
Supervised learning aims to create forecasts or to automatically recognise images, speech, and sentiment. Typical application areas are classification, regression, and time-series analyses.
Methods and algorithms used include:
- Linear/Logistic Regression, Decision Tree
- Neural Networks, Gradient Boosting, Random Forest
- Deep Learning, CNN (Convolutional Neural Network), LSTM (Long Short-Term Memory)
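As a small illustration of supervised learning with labelled data, the following sketch fits a one-variable linear regression by ordinary least squares and then uses it for a forecast. It is a minimal example in plain Python, not a recommendation for a particular library. The function names and the data (years under contract mapped to a target value) are invented for illustration.

```python
def fit_linear(xs, ys):
    """Training: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Forecasting: apply the fitted parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labelled data: each input x comes with a known target y.
years = [1, 2, 3, 4, 5]
targets = [3.0, 5.0, 7.0, 9.0, 11.0]  # generated from y = 2x + 1

model = fit_linear(years, targets)  # training
forecast = predict(model, 6)        # forecasting
```

Because the invented targets lie exactly on the line y = 2x + 1, the fit recovers a slope of 2 and an intercept of 1, and the forecast for x = 6 is 13. Real data is noisy, so the fitted line would only approximate the targets.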
Finally, reinforcement learning is about fully automating complex processes or courses of action. Typical tasks include classifying and evaluating situations and alternative courses of action, modelling rewards or penalties, and developing AI agents. During the application phase, the model can continue to learn by repeatedly making alternative decisions at random and evaluating their rewards.
Here, too, a wide variety of methods and algorithms are used, depending, among other things, on whether the decision space is discrete or continuous. Examples are:
- Monte Carlo simulation
- DQN (Deep Q-Network)
- SARSA (State Action Reward State Action)
- DDPG (Deep Deterministic Policy Gradient)
- On/Off Policy Algorithms
- Model-Based/-Free algorithms
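The on-policy/off-policy distinction in the list above can be illustrated with the tabular update rules behind SARSA and Q-learning (the latter is the basis of DQN). The following is a minimal sketch in plain Python. The function names, the two actions, and all numbers are assumptions made for illustration.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: assumes the best next action will be taken."""
    return reward + gamma * max(next_q_values.values())


def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: uses the action the agent actually chose next."""
    return reward + gamma * next_q_values[next_action]


def td_update(old_q, target, alpha):
    """Move the old estimate a fraction alpha towards the target."""
    return old_q + alpha * (target - old_q)


# Hypothetical situation: value estimates for the two actions in the next state.
next_q = {"left": 1.0, "right": 3.0}
reward, gamma, alpha, old_q = 1.0, 0.5, 0.5, 0.0

# Q-learning always looks at the best next action ("right").
q_off = td_update(old_q, q_learning_target(reward, next_q, gamma), alpha)
# SARSA looks at the action the agent really took next (here "left").
q_on = td_update(old_q, sarsa_target(reward, next_q, "left", gamma), alpha)
```

With these numbers, the Q-learning estimate moves to 1.25 and the SARSA estimate to 0.75. The two rules differ only in which next action they evaluate, and this is exactly what separates off-policy from on-policy learning.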
A critical difference between supervised learning and reinforcement learning is that supervised learning has two separate phases: training and forecasting. In reinforcement learning, by contrast, learning and forecasting run in parallel. The share of exploring new situations relative to exploiting what has already been learned (explore vs. exploit) is gradually reduced during training.
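The gradual shift from exploring to exploiting can be sketched with an epsilon-greedy agent on a simple multi-armed bandit. This is a minimal illustration in plain Python, not a full reinforcement learning setup. The function `run_bandit`, the decay schedule, and all parameter values are assumptions chosen for the example.

```python
import random


def run_bandit(true_means, steps, epsilon_start=1.0, decay=0.99,
               noise=1.0, seed=0):
    """Epsilon-greedy agent whose exploration rate shrinks every step."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # learned value of each action
    pulls = [0] * len(true_means)        # how often each action was tried
    epsilon = epsilon_start
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action with the best current estimate.
            arm = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], noise)
        pulls[arm] += 1
        # Incremental mean: learning happens while the agent is acting.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        epsilon *= decay  # explore less as training progresses
    return estimates, pulls, epsilon


estimates, pulls, final_epsilon = run_bandit([0.2, 0.5, 1.5], steps=500)
```

Learning and decision-making are interleaved in every step of the loop, so there is no separate forecasting phase. With a decay factor of 0.99, the exploration rate falls from 1.0 to below 0.01 after 500 steps, and the agent then almost always exploits what it has learned.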
Given the differences between the individual AI methods and the respective level of data maturity, a company must also consider one crucial aspect: the higher the level of maturity, the higher the costs. For unsupervised learning, data preparation is usually done by data engineers and data scientists and can be carried out fully automatically. The cost aspect plays only a subordinate role here. For supervised learning, the costs of data preparation depend on whether labelling can be carried out automatically or has to be done manually. They can increase significantly, especially when domain know-how is required for labelling, as with speech recognition models. Amazon and Co. have had to hire many linguists to develop their language assistants in recent years.
The data preparation for developing a reinforcement learning agent is usually very complex. On the one hand, domain know-how is required to develop a realistic evaluation system. On the other hand, alternative courses of action can often only be classified and evaluated manually or partially automatically. Both are cost drivers in data processing. An example of this is the cost of an autonomous driving system: the actual cost driver is not the development of the system itself but the preparation and provision of the data required for it.
Because of this cost aspect, new approaches to developing AI models have emerged. Examples are self-supervised learning and weak supervision. In principle, supervised learning models require manual and time-consuming data labelling, which is associated with high costs. Weak supervision addresses this challenge: unstructured or imprecise data is labelled automatically so that it can be used in supervised learning. The results are lower costs and faster model development.
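One common way to implement weak supervision is to combine several simple, imperfect rules (often called labelling functions) by majority vote. The following is a minimal sketch in plain Python under that assumption. The rules, the label names, and the example texts are invented for illustration, and real systems usually weight the rules instead of counting votes equally.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


def lf_mentions_water(text):
    return "water" if "water" in text.lower() else ABSTAIN


def lf_mentions_pipe(text):
    return "water" if "pipe" in text.lower() else ABSTAIN


def lf_mentions_storm(text):
    lowered = text.lower()
    return "storm" if "storm" in lowered or "wind" in lowered else ABSTAIN


# Hypothetical rule set for labelling damage reports.
LABELLING_FUNCTIONS = [lf_mentions_water, lf_mentions_pipe, lf_mentions_storm]


def weak_label(text, labelling_functions=LABELLING_FUNCTIONS):
    """Combine the noisy rule outputs by simple majority vote."""
    votes = [lf(text) for lf in labelling_functions]
    votes = [vote for vote in votes if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no rule fired, leave the example unlabelled
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels, too uncertain to decide
    return ranked[0][0]


reports = [
    "Burst pipe caused water damage",
    "Storm tore off roof tiles",
    "Car stolen from driveway",
]
labels = [weak_label(report) for report in reports]
```

In this sketch the first report is labelled "water", the second "storm", and the third stays unlabelled because no rule applies. The automatically labelled examples could then serve as training data for a supervised model, while unlabelled or tied cases can be routed to manual review.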
AI Usage Options Based On Data Maturity
But how exactly does the degree of data maturity relate to real AI application scenarios? A customer example from the insurance industry shows the differences.
An insurance company’s relevant customer data includes demographic data, contracts, contact history, claims reports, and settlements. At a low level of maturity, only operational data was systematically recorded, but not the contact history or customer feedback. There was no systematic evaluation of reports, pictures, and expert opinions. Even with the operational data alone, it was possible to develop forecast models, for example, for retention, but they were very imprecise due to the lack of data, especially on customer reactions.
For this customer, CGI developed a data strategy to systematically collect data from the different areas of the company and combine it for AI applications, for example, data on contact history, customer feedback, and internal regulatory efforts. In addition, projects were initiated to analyse damage reports and to evaluate reports and expert opinions using NLP methods. On this basis, efficient models for determining fraud probabilities and estimating the amount of damage could be developed.
Overall, AI methods can unlock the enormous potential of internal and external data sources. The degree of data maturity is of crucial importance here: without a suitable database, no AI implementation can succeed.