What is Data Synchronization + How to Keep Data in Sync

What is Data Sync / Data Synchronization in app development?

Data Synchronization (Sync) is the process of establishing consistency and consolidation of data between different devices. It is fundamental to most IT solutions, especially in IoT and Mobile . Data Sync entails the continuous harmonization of data over time and typically is a complex, non-trivial process. Even corporates struggle with its implementation and had to roll back Data Sync solutions due to technical challenges.

The question Data Sync answers is

How do you keep data sets from two (or more) data stores / databases – separated by space and time – mirrored with one another as closely as possible, in the most efficient way?

Data Sync challenges include asynchrony, conflicts, slow bandwidth, flaky networks, third-party applications, and file systems that have different semantics.

Data Sync versus Data Replication in Databases

Data replication is the process of storing the same data in several locations to prevent data loss and improve data availability and accessibility. Typically, data replication means that all data is fully mirrored / backed up / replicated on another instance (device/server). This way, all data is stored at least twice. Replication typically works in one direction only (unidirectional); there is no additional logic to it and no possibility of conflicts.

In contrast, Data Sync typically relates to a subset of the data (selection) and works in two directions (bi-directional). This adds a layer of complexity, because now conflicts can arise. Of course, if you select all data for synchronisation into one direction, it will yield the same result as replication. However, replication cannot replace synchronization.

Why do you need to keep data in sync?

Think about it – if clocks were not in sync, everyone would live on a different time. While I can see an upside to this, it would result in many inefficiencies as you could not rely on schedules. When business data is not in sync (up-to-date everywhere), it harms the efficiency of the organization due to:

Isolated data silos
Conflicting data / information states
Duplicate data / double effort
Outdated information states / incorrect data

In the end, the members of such an organization would not be able to communicate and collaborate efficiently with each other. They would instead be spending a lot of time on unnecessary work and “conflict resolution”. On top, management would miss an accurate overview and data-driven insights to prioritize and steer the company. The underlying mechanism that keeps data up-to-date across devices is a technical process called data synchronization (Sync). And while we expect these processes to “just work”, someone needs to implement and maintain them, which is a non-trivial task.

Growing data masses and shifts in data privacy requirements call for sensible usage of network bandwidth and the cloud. Edge computing with selective data synchronization is an effective way to manage which data is sent to the cloud, and which data stays on the device. Keeping data on the edge and synchronizing selective data sets effectively, reduces the data volume that is transferred via the network and stored in the cloud. Accordingly, this means lower mobile networking and cloud costs. On top, it also enables higher data security and data privacy, because it makes it easy to store personal and private data with the user. When data stays with the user, data ownership is clear too.

Unidirectional Data Replication

Bidirectional Data Synchronization

Out-of-the-box Sync magic: Syncing is hard

Almost every Mobile or IoT application needs to sync data, so every developer is aware of the basic concept and challenges. This is why many experienced developers appreciate out-of-the-box solutions. While JSON / REST offers a great concept to transfer data, there is more to Data Sync than what it looks like at a glance. Of course, the complexity of Sync varies widely depending on the use case. For example, the amount of data, data changes, synchronous / asynchronous sync, and number of devices (connections), and what kind of client-server or peer-to-peer setup is needed, all affect the complexity.

iceburg-building-data-synchronization

What looks easy in practice hides a complex bit of coding and opens a can of worms for testing. For an application to work seamlessly across devices – independent of the network, which can be offline, flaky, or only occasionally connected – an app developer must anticipate and handle a host of local and network failures to ensure data consistency. Moreover, for devices with restricted memory, battery and/or CPU resources (i.e. Mobile and IoT devices), resource sensitivity is also essential. Data storage and synchronization solutions must be both effective / efficient, and sustainable.

How to Keep Data in Sync Without the Headache?

Thankfully, there are out-of-the-box data synchronization solutions available on the market, which solve data syncing for developers. They fall broadly into two categories: cloud-dependent data synchronization, and independent, “edge” data synchronization. Cloud-based solutions, like Firebase, require a connection to the internet to function. Data is sent to and requested from the cloud constantly. Edge solutions, like ObjectBox, also offer “Offline Sync”: Data is stored in an efficient on-device database, synchronization on and between edge devices can be done continually without an Internet connection, and Dat Sync with a cloud or a backend that is not located on premise occurs once the device(s) goes online. Below, we summarize the most popular market offerings for data synchronization (offline and cloud based):