Data Gravity is a phenomenon that occurs as the volume of data stored by your company increases. Find out what the consequences are and how to deal with them.
Data-driven applications such as artificial intelligence, the Internet of Things and Big Data analytics are now used by companies in every industry to drive growth. However, to take advantage of these innovations, companies must also contend with the concept of “Data Gravity”.
The term Data Gravity was coined in 2010 by software engineer Dave McCrory. The idea is that data and applications are attracted to each other, the way physical objects are attracted to each other by gravity. It is, therefore, a metaphor intended to illustrate this phenomenon.
As data sets grow, they become increasingly difficult to move. Just as Earth’s gravity keeps us on the ground, large data sets tend to stay where they are. Other elements, such as applications and processing power, are in turn drawn toward the data.
Driven by technologies such as mobile devices and IoT, digital transformation is generating huge volumes of data within companies. Such volumes cannot be managed with a traditional approach.
Usually, data analysis platforms and applications run on their own software and hardware stacks. The data, on the other hand, is stored in direct-attached storage (DAS) systems. To be analyzed, this data must therefore be migrated to analytical platforms such as Splunk, Hadoop or TensorFlow.
However, this practice becomes impractical for massive volumes of data spread across different storage systems. Migrating data to analytical clusters then becomes complex, costly and risky. This is particularly the case if you want to run an analysis tool in the Cloud on data stored on-premises, or vice versa.
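To give a sense of why migration gets harder as volumes grow, here is a rough back-of-the-envelope sketch in Python. The function name `transfer_days`, the 1 Gbit/s link and the 70% efficiency factor are illustrative assumptions, not figures from any particular environment.

```python
def transfer_days(dataset_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate the days needed to copy dataset_tb terabytes over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention, retries).
    """
    bits_to_move = dataset_tb * 1e12 * 8            # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency    # usable bits per second
    seconds = bits_to_move / effective_bps
    return seconds / 86_400                          # seconds per day


if __name__ == "__main__":
    # Hypothetical data set sizes, all moved over the same 1 Gbit/s link.
    for size_tb in (1, 100, 1000):
        days = transfer_days(size_tb, link_gbps=1.0)
        print(f"{size_tb:>5} TB over 1 Gbit/s: about {days:.1f} days")
```

Because the estimate is linear in the data set size, a volume that is a thousand times larger takes a thousand times longer to move over the same link, which is the "gravity" effect in practice.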
Businesses must therefore take action to deal with data gravity. This means rethinking how IT architectures are designed.
The first step is to design the architecture around a scale-out network-attached storage (NAS) platform that allows data to be consolidated. The platform must integrate security, data protection and resilience features. Access to the data sets must be strictly controlled, and the data must remain available even in the event of a failure. In addition, all applications and data must benefit from this protection in a uniform manner. This is the advantage of keeping only one copy of the data in a consolidated system.
In addition, this dedicated data platform must be highly scalable. If your storage needs increase massively, it must be able to grow with them. However, it is important that costs do not increase as quickly as the volume of data.
This is why the platform must also allow storage optimization, both in terms of performance and capacity. To do this, it must be possible to choose between different storage tiers. The faster, more expensive tiers will be used for the most frequently accessed data, while the higher-capacity, cheaper tiers will be used for cold data. The system must be able to determine automatically which tier each piece of data should be stored on.
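The automatic placement described above (fast storage for frequently accessed data, cheaper storage for cold data) could be sketched as a simple age-based policy. This is only an illustration: the tier names, the idle-time thresholds and the `choose_tier` function are hypothetical, and real platforms typically use richer signals than the time since last access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_idle_days: float  # data idle longer than this belongs on a colder tier


# Ordered from fastest and most expensive to cheapest (hypothetical values).
TIERS = (
    Tier("hot-flash", 7),
    Tier("warm-disk", 90),
    Tier("cold-archive", float("inf")),
)


def choose_tier(idle_days: float, tiers=TIERS) -> Tier:
    """Return the first (fastest) tier whose idle threshold covers idle_days."""
    for tier in tiers:
        if idle_days <= tier.max_idle_days:
            return tier
    return tiers[-1]  # fallback: the coldest tier


if __name__ == "__main__":
    # Hypothetical files with the number of days since each was last accessed.
    for path, idle in [("sales_today.parquet", 1), ("q1_logs.tar", 45), ("2015_backup.img", 2000)]:
        print(f"{path}: {choose_tier(idle).name}")
```

Keeping the thresholds in one ordered table makes the policy easy to adjust as access patterns or storage prices change, without touching the placement logic.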
Finally, it is essential that your data platform is compatible with a wide variety of analytics and AI platforms and applications. It should support the ones you use today, but also the ones you will use tomorrow. An architecture that meets these criteria will allow you to address the problem of data gravity for your Big Data and AI solutions at scale.