When companies back major digitalization initiatives and invest in new technologies, it’s another way of saying that they want to be data driven in their transactional operations and in their business intelligence.
However, no matter how scintillating a new technology is, it will be only as good as the data that drives it. This is a main reason why data management has remained a top concern for CIOs over the past five years.
Are we winning or losing the data management battle?
In 2023, healthcare experts reported that “as much as 95% of hospital data goes unused,” and it is likely that high percentages of unused data plagued other industry sectors as well.
Also in 2023, only 16% of organizations surveyed believed that data had been successfully integrated into their business processes and that the data was actively being used for decision making.
Finally, there are the AI systems that everybody wants — yet, how soon will they get them if data is a problem?
“GenAI is NOT a pure data science problem. It is equally a DATA problem,” writes Chad Anderson, CEO at Gable.ai. “Data is fuel for the model, in the same way a healthy diet is fuel for an athlete. If garbage goes in, then garbage comes out.”
Most CIOs I talk with confirm this. They are unsure how much they can trust their data, and they understand that data preparation, integration and management are still works in progress.
Drafting a Data Battle Plan
For most organizations, achieving high-quality, fully integrated and trustworthy data is a battle. It therefore requires a battle plan.
A majority of companies find that they already have battle plans. Unfortunately, these plans tend to address data only on certain fronts of the battlefield. They lack an overall approach that can bring all data under universal, high-quality management. Here is what is typically already in place:
There are data purity, governance and security standards that are set forth as SLAs for data vendors.
There are ETL (extract-transform-load) rules and operations that IT defines whenever corporate data is moved from one data repository to another. These ensure that the data being moved is cleaned, prepared and formatted for the target repository before it is integrated into that repository.
There are programmed routines that edit and verify data throughout the day as workers use applications and databases.
In short, a lot is already being done to ensure that data is of high quality and can be used. Yet CIOs, IT staffers and end users still doubt that the data they use is high quality and trustworthy.
Why is this?
A Plan of Attack
Disparate data
In 2023, three out of four companies reported that internal collaboration was hindered because of data silos.
Individual pools of data in user departments create inconsistencies in the data and in the business decisions based on it. They also produce disparate forms of data that can’t be integrated into a common data repository without undergoing ETL.
The plot thickens when data is ingested from outside vendor sources that potentially represent data in alternate formats. This data must also be ETL’d.
Knocking down data silos is one way that companies can help achieve data unity. Another way is by automating all data intake processes with ETL so that data is normalized before it ever enters a data repository.
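As a rough illustration of normalizing data at intake, here is a minimal Python sketch. The field names, date formats and sample records are all hypothetical, and a real ETL pipeline would add far more rules; the point is only that records from different sources are mapped to one schema, and records that fail are set aside rather than loaded.

```python
from datetime import datetime

# Hypothetical date formats that different sources might use.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

# Hypothetical mapping from source-specific field names to a common schema.
FIELD_ALIASES = {
    "cust_id": "customer_id", "CustomerID": "customer_id",
    "order_dt": "order_date", "OrderDate": "order_date",
    "amt": "amount", "Amount": "amount",
}

def parse_date(value):
    """Try each known format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def normalize(record):
    """Transform one raw record into the common schema."""
    renamed = {FIELD_ALIASES[k]: v for k, v in record.items() if k in FIELD_ALIASES}
    amount_text = str(renamed["amount"]).replace("$", "").replace(",", "")
    return {
        "customer_id": str(renamed["customer_id"]).strip(),
        "order_date": parse_date(renamed["order_date"]),
        "amount": round(float(amount_text), 2),
    }

def intake(records):
    """Normalize every record; set aside failures instead of loading them."""
    loaded, rejected = [], []
    for record in records:
        try:
            loaded.append(normalize(record))
        except (KeyError, ValueError) as err:
            rejected.append((record, str(err)))
    return loaded, rejected

# Sample data from an internal department and an outside vendor (made up).
sales = [{"cust_id": " 1001 ", "order_dt": "03/15/2024", "amt": "$1,250.50"}]
vendor = [
    {"CustomerID": 1002, "OrderDate": "2024-03-16", "Amount": 99},
    {"CustomerID": 1003, "OrderDate": "sometime", "Amount": 10},
]

loaded, rejected = intake(sales + vendor)
```

In this sketch the two well-formed records end up in the same shape regardless of source, while the vendor record with an unparseable date lands in `rejected` for follow-up instead of entering the repository.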
Lack of data control
In 2024, data generation reached 361 billion emails sent daily, 16 million texts sent every minute, and 378.77 million terabytes of data created daily. Data is streaming into enterprises at enormous volumes and velocities, and not all of it is useful.
There are companies that are afraid to lose data because they think it could be useful “some day.” However, it’s also important to control the data flow by determining what you need to keep and what you don’t. For instance, in network communications, it’s not useful to retain everything in the stream, including handshakes and other chatter that devices exchange. Eliminating some of this metadata from the flow seems like a straightforward thing to do, but too many companies aren’t willing to do it.
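One way to picture this kind of flow control is a retention policy applied before anything is stored. The sketch below is a simplified illustration: the message types, device names and the policy itself are assumptions, not a description of any particular network product.

```python
from collections import Counter

# Hypothetical retention policy: the message types judged worth storing.
KEEP_TYPES = frozenset({"reading", "alert"})

def retain(messages, keep_types=KEEP_TYPES, dropped=None):
    """Yield messages whose type is in the policy; tally the rest in `dropped`."""
    for msg in messages:
        if msg.get("type") in keep_types:
            yield msg
        elif dropped is not None:
            dropped[msg.get("type", "unknown")] += 1

# A made-up slice of device traffic.
stream = [
    {"type": "handshake", "device": "sensor-7"},
    {"type": "reading", "device": "sensor-7", "value": 21.4},
    {"type": "keepalive", "device": "sensor-7"},
    {"type": "alert", "device": "sensor-9", "value": 88.0},
    {"type": "handshake", "device": "sensor-9"},
]

dropped = Counter()
kept = list(retain(stream, dropped=dropped))
```

Keeping a tally of what was discarded, rather than the discarded data itself, lets a company review the policy later without paying to store the noise.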
Organizing data
Approximately 80% of data in companies is now unstructured, meaning that this data comes in with no data key, metadata, etc., that would be needed to manage or access it in a meaningful way.
Getting unstructured data under control so it can be utilized by the enterprise is the number one data management challenge for most companies, because it takes time (human time, in most cases) to develop keys or tags for the data, in some cases transforming the data into structured data.
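To make the idea of developing keys or tags concrete, here is a small Python sketch that applies keyword rules to free text and produces a structured record. The tag names, patterns and the `INV-` invoice number format are hypothetical; in practice these rules are exactly the part that takes human time to define and refine.

```python
import re

# Hypothetical tag rules: tag name -> pattern that suggests the tag applies.
TAG_RULES = {
    "invoice": re.compile(r"\binvoice\b", re.IGNORECASE),
    "complaint": re.compile(r"\b(complain\w*|refund|damaged)\b", re.IGNORECASE),
    "contract": re.compile(r"\b(contract|agreement)\b", re.IGNORECASE),
}

# Assumed invoice number format, used as a lookup key when present.
INVOICE_NO = re.compile(r"\bINV-\d+\b")

def tag_document(doc_id, text):
    """Turn a piece of free text into a structured, searchable record."""
    tags = sorted(name for name, pattern in TAG_RULES.items() if pattern.search(text))
    match = INVOICE_NO.search(text)
    return {
        "doc_id": doc_id,
        "tags": tags,
        "invoice_number": match.group(0) if match else None,
        # Nothing matched: route the document to a person for manual tagging.
        "needs_review": not tags,
    }

tagged = tag_document(
    "email-001",
    "Customer wants a refund on invoice INV-20417; item arrived damaged.",
)
untagged = tag_document("email-002", "Lunch on Friday?")
```

Documents that no rule recognizes are flagged with `needs_review` rather than silently stored, which reflects the human effort the tagging step still requires.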
Without taking this first step toward organizing data, businesses will be unable to manage, mine or use the data they collect.
Security
IBM estimated the average cost of a data breach in 2024 at $4.88 million. If organizations are going to avoid data breaches, their governance and security policies and practices must be airtight and up to date, and security safeguards around data must be robust. This includes not only protecting internal data repositories but also ensuring that data coming from and going to third parties and the cloud is properly secured and, when in transit, preferably encrypted. Additionally, companies should budget for cyber and internal audits at least once a year, using outside firms to conduct them.
Conclusion
Data management is a foundational piece for digitalization, AI, automation, new system deployment and edge computing. There is virtually no part of the enterprise that data doesn’t touch.
This might be why CIOs and IT leaders wring their hands in frustration when they think about how they will get their arms around all of this data. However, amid that frustration, it’s also time to take stock of the steps that have already been taken to better manage data, whether that means rendering unstructured data usable, normalizing data so it can work with more than one system, or even knocking down a data silo or two.
What could now greatly benefit these companies is the orchestration of a complete data management plan. This plan will undoubtedly reveal holes in the data management battle lines that need to be filled, but it will also reveal those areas where true progress has been made.