Version vector

From Wikipedia, the free encyclopedia

A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality tracking among data replicas and are a basic mechanism for optimistic replication. In mathematical terms, the version vector generates a preorder that tracks the events that precede, and may therefore influence, later updates.

Version vectors maintain state identical to that in a vector clock, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:

  • Initially all vector counters are zero.
  • Each time a replica experiences a local update event, it increments its own counter in the vector by one.
  • Each time two replicas a and b synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters: . After synchronization, the two replicas have identical version vectors.

Pairs of replicas, a, b, can be compared by inspecting their version vectors and determined to be either: identical (), concurrent (), or ordered ( or ). The ordered relation is defined as: Vector if and only if every element of is less than or equal to its corresponding element in , and at least one of the elements is strictly less than. If neither or , but the vectors are not identical, then the two vectors must be concurrent.

Version vectors[1] or variants are used to track updates in many distributed file systems, such as Coda (file system) and Ficus, and are the main data structure behind optimistic replication.[2]

Other mechanisms[]

  • Hash Histories [3] avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guarantees.
  • Concise Version Vectors [4] allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
  • Version Stamps [5] allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
  • Interval Tree Clocks[6] generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
  • Bounded Version Vectors [7] allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.
  • Dotted Version Vectors [8] address scalability with a small set of servers mediating replica access by a large number of concurrent clients.

References[]

  1. ^ Douglas Parker, Gerald Popek, Gerard Rudisin, Allen Stoughton, Bruce Walker, Evelyn Walton, Johanna Chow, David Edwards, Stephen Kiser, and . Detection of mutual inconsistency in distributed systems. Transactions on Software Engineering. 1983
  2. ^ David Ratner, Peter Reiher, and Gerald Popek. Dynamic version vector maintenance. Technical Report CSD-970022, Department of Computer Science, University of California, Los Angeles, 1997
  3. ^ ByungHoon Kang, Robert Wilensky, and John Kubiatowicz. The Hash History Approach for Reconciling Mutual Inconsistency. ICDCS, pp. 670-677, IEEE Computer Society, 2003.
  4. ^ Dahlia Malkhi and Doug Terry. Concise Version Vectors in WinFS.Distributed Computing, Vol. 20, 2007.
  5. ^ Paulo Almeida, Carlos Baquero and Victor Fonte. Version Stamps: Decentralized Version Vectors. ICDCS, pp. 544-551, 2002.
  6. ^ Paulo Almeida, Carlos Baquero and Victor Fonte. Interval Tree Clocks. OPODIS, Lecture Notes in Computer Science, Vol. 5401, pp. 259-274, Springer, 2008.
  7. ^ José Almeida, Paulo Almeida and Carlos Baquero. Bounded Version Vectors. DISC: International Symposium on Distributed Computing, LNCS, 2004.
  8. ^ Nuno Preguiça, Carlos Baquero, Paulo Almeida, Victor Fonte and Ricardo Gonçalves. Brief Announcement: Efficient Causality Tracking in Distributed Storage Systems With Dotted Version Vectors. ACM PODC, pp. 335-336, 2012.

External links[]

Retrieved from ""