Distributed Synchronization Under Data Churn
A growing number of applications maintain local copies of remote data sources in order to serve their users. Because these sources change over time, an application must continually synchronize its copies with them to provide reliable service. We focus on pull-based rather than push-based synchronization, because the pull-based approach does not require cooperation from the source and has been widely adopted by existing systems. The scalability of pull-based synchronization, however, comes at the cost of greater inconsistency in the copied content. We model this system under non-Poisson update and refresh processes and derive sample-path averages of several staleness-cost metrics, generalizing previous results, and we study the statistical properties of these metrics. Computing staleness requires knowledge of the inter-update distribution at the source, which can only be estimated through blind sampling, that is, periodically downloading the content and comparing it against the previous copy. We show that all previous estimation approaches are biased unless the observation rate tends to infinity or the update process is Poisson. To overcome these issues, we propose four new algorithms that achieve different levels of consistency, depending on the amount of temporal information revealed by the source and the capabilities of the download process. We then apply freshness analysis to P2P replication systems and extend our results to several more difficult scenarios: cascaded replication, cooperative caching, and redundant querying by clients. Surprisingly, we find that optimal cooperation involves just a single peer and that redundant querying can hurt the system's ability to handle load (i.e., it may reduce scalability).
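The abstract does not spell out the dissertation's model or its four algorithms, so the following is only a rough illustration under stated assumptions. It assumes the source updates according to a renewal process, the client refreshes every `delta` time units, and the copy counts as stale once the source has changed since the last download. It computes the sample-path fraction of time the copy is stale and contrasts two update-rate estimators built purely from blind sampling: the naive ratio of detected changes to elapsed time, and the Poisson-based correction -ln(1 - X/n)/delta, where X of n downloads differed from the previous copy. All names (`renewal_updates`, `stale_fraction`, `blind_samples`, `naive_rate`, `poisson_rate`) and the Pareto gap distribution are hypothetical choices for this sketch, not the author's methods.

```python
import bisect
import math
import random


def renewal_updates(horizon, draw_gap, rng):
    """Source update times on [0, horizon) for a renewal process with i.i.d. gaps."""
    times, t = [], draw_gap(rng)
    while t < horizon:
        times.append(t)
        t += draw_gap(rng)
    return times


def stale_fraction(updates, delta, horizon):
    """Sample-path fraction of [0, horizon) during which the local copy is stale.

    Hypothetical model: the copy is refreshed at 0, delta, 2*delta, ...; it
    turns stale at the first source update after a refresh and stays stale
    until the next refresh.
    """
    stale, start = 0.0, 0.0
    while start < horizon:
        end = min(start + delta, horizon)
        i = bisect.bisect_right(updates, start)  # first update strictly after this refresh
        if i < len(updates) and updates[i] < end:
            stale += end - updates[i]
        start = end
    return stale / horizon


def blind_samples(updates, delta, horizon):
    """Change flags seen by a client that can only compare consecutive downloads.

    Flag k is True if at least one update fell in (k*delta, (k+1)*delta];
    the client cannot tell how many updates occurred in that interval.
    """
    flags = []
    for k in range(int(horizon // delta)):
        lo, hi = k * delta, (k + 1) * delta
        flags.append(bisect.bisect_right(updates, hi) > bisect.bisect_right(updates, lo))
    return flags


def naive_rate(flags, delta):
    """Detected changes per unit time; undercounts when several updates share an interval."""
    return sum(flags) / (len(flags) * delta)


def poisson_rate(flags, delta):
    """Poisson-based correction -ln(1 - X/n) / delta; valid only for Poisson updates."""
    x, n = sum(flags), len(flags)
    if x == n:  # every download differed: the estimate diverges
        return math.inf
    return -math.log(1.0 - x / n) / delta


if __name__ == "__main__":
    rng = random.Random(1)
    horizon, delta = 100_000.0, 1.0
    gaps = {
        # both gap distributions have mean 1, i.e. a true update rate of 1
        "exponential (Poisson)": lambda r: r.expovariate(1.0),
        "Pareto, shape 3": lambda r: r.paretovariate(3.0) / 1.5,
    }
    for name, draw in gaps.items():
        ups = renewal_updates(horizon, draw, rng)
        flags = blind_samples(ups, delta, horizon)
        print(f"{name}: stale={stale_fraction(ups, delta, horizon):.3f} "
              f"naive={naive_rate(flags, delta):.3f} "
              f"poisson={poisson_rate(flags, delta):.3f}")
```

With exponential gaps (Poisson updates), the corrected estimator recovers the true rate of 1, while the naive one settles near 1 - e^-1 because several updates between two downloads look like a single change. With Pareto gaps of the same mean, the Poisson correction overshoots to roughly 1.9, and the stale fraction (about 0.48, versus about 0.37 under Poisson updates) shows that a Poisson model would understate staleness in this configuration. This is in line with the abstract's claim that such estimators are biased for non-Poisson updates at a finite sampling rate.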
Li, Xiaoyong (2016). Distributed Synchronization Under Data Churn. Doctoral dissertation, Texas A & M University. Available electronically from