
One primary loophole was that the data lake was built and maintained by a separate engineering or analytics team, which didn't understand the data as thoroughly as the source teams did. Typically, there were multiple copies or slightly modified versions of the same data floating around, along with accuracy and completeness issues. Every mistake in the data required multiple discussions and eventually led back to the source team for a fix. Any new column added to the source tables required tweaks to the workflows of multiple teams before the data finally reached the analytics teams. These gaps between source and analytics teams led to implementation delays and even data loss. Teams began having reservations about putting their data in a centralized data lake.
Data mesh architecture promised to solve these problems. A polar opposite approach to a data lake, a data mesh gives the source team ownership of the data and the responsibility for distributing it. Other teams access the data from the source system directly, rather than from a centralized data lake. The data mesh was designed to be everything that the data lake system wasn't. No separate workflows for migration. Fewer data sanity checks. Higher accuracy, less duplication of data, and faster turnaround time on data issues. Above all, because each dataset is maintained by the team that knows it best, the consumers of the data could be much more confident in its quality.
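As a rough illustration of that ownership model (the names, fields, and table are hypothetical, not tied to any particular platform), a source team might publish its dataset as a versioned data product that consumers query directly, instead of waiting for a copy to land in a central lake. A minimal sketch in Python:

from dataclasses import dataclass, field
from datetime import date

# Hypothetical data product published by the source (orders) team.
# Consumers read through this contract instead of a copy in a central lake.
@dataclass(frozen=True)
class OrdersDataProduct:
    name: str = "sales.orders"
    version: str = "1.2.0"
    owner: str = "orders-team@example.com"
    # Columns the source team commits to keeping stable for consumers.
    schema: dict = field(default_factory=lambda: {
        "order_id": "string",
        "customer_id": "string",
        "order_date": "date",
        "total_amount": "decimal(12,2)",
    })

def read_orders(product: OrdersDataProduct, since: date) -> str:
    """Build a query against the source team's published table,
    selecting only the columns guaranteed by the contract."""
    columns = ", ".join(product.schema)
    return (
        f"SELECT {columns} FROM {product.name} "
        f"WHERE order_date >= '{since.isoformat()}'"
    )

if __name__ == "__main__":
    print(read_orders(OrdersDataProduct(), date(2024, 1, 1)))

The point of the sketch is simply that the team closest to the data owns both the schema and the interface consumers read from, so quality questions land with the people best placed to answer them.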
Why users lost faith in data mesh
But the excitement around data mesh didn't last. Many users became frustrated. Beneath the surface, almost every bottleneck between data providers and data consumers resurfaced as an implementation challenge. The thing is, the data mesh approach isn't a one-time change, but a long-term commitment to designing data schemas in a particular way. Although every source team owns its dataset, it must maintain a schema that lets downstream systems read the data in place rather than replicate it. However, a general lack of training and leadership buy-in led to poor schema planning, which in turn led to multiple teams performing similar transformations on the same data, resulting in duplicated data and effort and higher compute costs.
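To make the long-term schema commitment concrete, here is a minimal sketch of the kind of backward-compatibility check a source team might run before publishing a schema change; the check, the column names, and the additive-only rule are illustrative assumptions, not a specific tool or standard:

# Hypothetical backward-compatibility check run before publishing a new
# schema version, so downstream consumers can keep reading the data in
# place instead of maintaining their own adjusted copies.
CURRENT_SCHEMA = {
    "order_id": "string",
    "customer_id": "string",
    "order_date": "date",
    "total_amount": "decimal(12,2)",
}

PROPOSED_SCHEMA = {
    **CURRENT_SCHEMA,
    "currency": "string",  # additive change: safe for existing consumers
}

def is_backward_compatible(current: dict, proposed: dict) -> bool:
    """A proposed schema is compatible if every existing column is still
    present with the same type; new columns may be appended."""
    return all(proposed.get(col) == typ for col, typ in current.items())

if __name__ == "__main__":
    assert is_backward_compatible(CURRENT_SCHEMA, PROPOSED_SCHEMA)
    print("Proposed schema is backward compatible; safe to publish.")

Without training and leadership backing for this kind of discipline, each consuming team ends up re-deriving its own version of the same dataset, which is exactly the duplication the mesh was supposed to eliminate.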

