Location services
------------------

- Introduction

An overview of location services by way of a few examples: the services themselves, the primitives they provide, and hopefully from this an understanding of how these services work. I'm going to introduce some common traits and ground rules of these services, followed by an overview of Chord. Ching will examine the two protocols CAN and Tapestry.

- Common features

Common features between p2p "location services"; the examples are CAN, Tapestry, and Chord. Location services are the basis of p2p. They are built on a loose network of nodes with no predetermined structure, and these nodes can freely route to and from each other symmetrically. The specific location services we examine use overlay networks on top of the physical network.

These location services have to deal with several issues that dictate their limitations: scalability, resiliency, and decentralization.

Scalability: each node cannot know about every other node, or the service would not scale to a large number of nodes.

Resiliency: the service must be tolerant and degrade gracefully in the presence of nodes that appear and disappear arbitrarily. This is also availability: the system as a whole stays available despite constant change.

Decentralization: the service has no single point of failure; responsibility is shared across all participants.

Applications range from a mere lookup service, to a distributed hashing service, to distributed DNS, to locating distributed resources as in Napster or Gnutella. We're going to talk more about data services built on top of these primitives next time with CFS.

The common service offering is ipaddress = lookup(key). The simple extension of this is a lookup service that resolves a key to a value, where the value is stored at the node returned by the lookup.

Chord
-----

Chord's primitive, specifically: given a key, map the key onto a node.

[Chord's answers to the service requirements:]

Scalability: each Chord node only has to know about a small number of other nodes, O(log N) of the N nodes in the system.

Resiliency: Chord offers O(log N)-hop routing for efficiency, but performance degrades gracefully if its information is out of date, that is, if new nodes have appeared or nodes it knows about have disappeared.

Decentralization: all nodes are treated equally, in that each serves the same role as the next.

- Chord protocol

Chord, as we said above, assigns keys to nodes. Chord uses consistent hashing, which with high probability balances load across the nodes: each node receives roughly the same number of keys. Chord improves on consistent hashing in that each node only has to keep routing information for O(log N) nodes, as opposed to N. Nancy brought up that the log N statement does not make clear whether N is the maximum possible number of participants or the current number of participants. She continued that the analysis in the paper appears to be based on the current number of participants, so the algorithm is indeed adaptive.

Keys are assigned by arranging the nodes in a circular ring; keys and nodes reside in the same namespace. Key k is assigned to the first node in the network whose identifier is equal to or follows k. This node is known as the "successor of k" and is the answer to lookup(k); it is the node where you might store data if you were building higher-level services. The adaptability of the protocol is that when a node joins, only the keys that now map to it are moved to it, and when a node leaves, its keys move to its successor. Only O(K/N) keys move, since there are K keys to begin with and they are distributed evenly among the N nodes; removing a node likewise requires O(K/N) keys to be redistributed.
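As a concrete illustration (not from the paper or the discussion), here is a minimal sketch of the successor-of-k rule, assuming identifiers come from hashing names with SHA-1 truncated to a toy m-bit space; the names ident and successor are my own, not Chord API calls.

```python
import hashlib
from bisect import bisect_left

M = 8                  # identifier bits; a toy value, real Chord uses m = 160 (SHA-1)
RING = 2 ** M          # size of the identifier space

def ident(name: str) -> int:
    """Hash a node name or a key into the same m-bit identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def successor(key: int, node_ids: list[int]) -> int:
    """Return the first node identifier equal to or following key on the ring."""
    i = bisect_left(node_ids, key)       # node_ids must be sorted
    return node_ids[i % len(node_ids)]   # wrap around past the top of the ring

# Example: three nodes and one key, all mapped into the same namespace.
nodes = sorted(ident(n) for n in ["node-a", "node-b", "node-c"])
k = ident("some-block-name")
print(f"lookup({k}) -> node {successor(k, nodes)}")
```

Under this rule, adding or removing one node only reassigns the keys in the arc between that node and its predecessor, which is where the O(K/N) figure above comes from.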
P.4: the ring is arranged in this model by every node keeping a pointer to its successor, i.e., the next node on the ring.

- Simple key location

The paper first describes basic key location, where the query propagates around the ring via successor pointers until it reaches the first node whose identifier is equal to or larger than the key, i.e., the successor. This satisfies the lookup primitive, but it does not satisfy scalability, since a lookup takes O(N) hops, nor availability, since one failure can break the chain.

- Scalable key location [Finger Table]

There are m bits in the node identifier. Each node maintains a table of m entries, called the finger table; each entry is called a finger. The ith entry holds the node address of the successor of (n + 2^(i-1)) mod 2^m, for 1 <= i <= m, where n is the current node's identifier on the ring; all arithmetic is mod 2^m, so that you can get around the ring. The first finger is the successor, since n + 2^0 = n + 1. There is a great drawing of the fingers on P.5. It was also brought up that the number of entries could be reduced even below this bound by consolidating the table, omitting duplicate entries and modifying the protocol to walk down the finger table looking for the first entry greater than the value being looked up.

Algorithm: if the lookup target lies between n and its successor, the successor is the answer to the lookup. If not, find the last finger that precedes the target and ask that node to perform the lookup. Each step moves the search closer to the answer by at least half of the remaining distance between the node and the target (proof on p. 6). A sketch of this lookup appears after the joining discussion below.

The lookup can be implemented recursively or iteratively. Iterative is simpler and less reliant on other nodes; recursive can implement things like network locality (server selection) and caching. An iterative invocation has the querying client ask the result of its first lookup for its own answer to the lookup, and the client then proceeds to query that specific node itself. In a recursive invocation, the client embeds its address in the query, and the nodes forward the query along until the successor of the original key is found, at which time the answer is returned to the originator of the query. It was brought up that caching only really makes sense in a recursive invocation, and specifically one where the answer is propagated back along the lookup chain. It was also brought up that network-locality lookup is being added to the current Chord implementation: resolution in the finger table can choose any of the fingers between the successor and the relevant finger, based on metrics such as network latency. Nancy brought up that almost any information could be maintained in these tables as additional fields in each entry. Every node also keeps a predecessor pointer.

- Joining and leaving - interesting stuff

The requirement for Chord to handle lookups correctly is that at least the successor pointers must be up to date after joins and leaves. Chord uses a background stabilization protocol. n.join(n'): n asks n' to look up n's own identifier, and the node returned becomes n's successor. The key point is that join does not make anyone else aware of n; it only lets n do lookups. Periodically each node calls stabilize, which asks its successor who its predecessor is; if that predecessor is different and valid, that is, it falls between the node and its successor, then this new node becomes the node's successor. Stabilize is also how the successor finds out whether it needs to know about n: n calls n'.notify(n). n' then checks whether it has a predecessor; if not, n becomes its predecessor, and if it does, n' checks whether n falls in the range between its current predecessor and itself, in which case n is the new predecessor.
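As promised above, here is a minimal sketch of the scalable finger-table lookup. The Node class, the interval helper, and the toy ring builder are my own illustration, not the paper's pseudocode, and the sketch assumes an already-stable ring (fingers index from 0 here; the paper indexes from 1).

```python
M = 8                    # identifier bits; toy value, the real system uses 160 (SHA-1)
RING = 2 ** M

def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the clockwise ring interval (a, b], wrapping mod 2^m."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor: "Node" = self
        self.fingers: list["Node"] = []      # fingers[i] = successor((id + 2^i) mod 2^m)

    def closest_preceding_finger(self, key: int) -> "Node":
        # Walk the finger table from the far end, looking for the last node
        # that falls strictly between us and the key.
        for f in reversed(self.fingers):
            if in_interval(f.id, self.id, key) and f.id != key:
                return f
        return self

    def find_successor(self, key: int) -> "Node":
        # lookup(key): if key is between n and its successor, the successor is
        # the answer; otherwise hand off to the closest preceding finger, which
        # covers at least half of the remaining distance to the target.
        n = self
        while not in_interval(key, n.id, n.successor.id):
            n = n.closest_preceding_finger(key)
        return n.successor

def build_ring(ids: list[int]) -> dict[int, Node]:
    """Build a fully stabilized toy ring: correct successors and finger tables."""
    ids = sorted(ids)
    nodes = {i: Node(i) for i in ids}
    def succ_of(k: int) -> Node:
        return next((nodes[i] for i in ids if i >= k), nodes[ids[0]])
    for n in nodes.values():
        n.successor = succ_of((n.id + 1) % RING)
        n.fingers = [succ_of((n.id + 2 ** i) % RING) for i in range(M)]
    return nodes

ring = build_ring([1, 18, 64, 130, 200])
print(ring[1].find_successor(150).id)    # -> 200, the successor of key 150
```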
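And a second minimal sketch of the join/stabilize/notify exchange just described. This Node is deliberately simplified and separate from the one above: it keeps only id, successor, and predecessor, and it resolves lookups with the linear successor walk from the simple key location section rather than the finger table. RPC, failures, and fix_fingers are omitted, and the method names only loosely mirror the paper's pseudocode.

```python
def between_open(x: int, a: int, b: int) -> bool:
    """x strictly inside the clockwise ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)   # wrap; a == b means the whole ring

def between_half_open(x: int, a: int, b: int) -> bool:
    """x inside the clockwise ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor: "Node" = self        # a brand-new node is its own one-node ring
        self.predecessor: "Node | None" = None

    def find_successor(self, key: int) -> "Node":
        # Linear walk over successor pointers; the finger-table lookup sketched
        # above is the scalable replacement for this.
        n = self
        while not between_half_open(key, n.id, n.successor.id):
            n = n.successor
        return n.successor

    def join(self, n_prime: "Node") -> None:
        # n.join(n'): ask n' to look up n's own id; that node becomes n's successor.
        # Nobody else learns about n yet -- stabilization takes care of that.
        self.predecessor = None
        self.successor = n_prime.find_successor(self.id)

    def stabilize(self) -> None:
        # Periodic: ask the successor for its predecessor; if that node sits
        # between us and the successor, it is our new successor. Then notify.
        x = self.successor.predecessor
        if x is not None and between_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, n: "Node") -> None:
        # n thinks it may be our predecessor; accept it if we have none, or if
        # n falls between the current predecessor and us.
        if self.predecessor is None or between_open(n.id, self.predecessor.id, self.id):
            self.predecessor = n

# A node joins an existing one-node ring; a couple of stabilize rounds repair the pointers.
a, b = Node(10), Node(42)
b.join(a)
for _ in range(2):
    a.stabilize()
    b.stabilize()
print(a.successor.id, b.successor.id, a.predecessor.id, b.predecessor.id)   # 42 10 42 10
```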
As soon as the successor links are right, lookups will work and will find the new node; in the worst case they will be linear scans until the finger tables are fixed. Another periodic call, fix_fingers, simply does a lookup for each of a node's fingers in rotation. The last periodic call is check_predecessor: if the predecessor is not present ("removal"), it sets the predecessor link to null, which allows notify to work again. Removal is handled completely passively. [The proof in [24] et al. for simultaneous joins will be clear at some point.]

- Performance with joins

After a join, a lookup sees one of three things:
1. A regular lookup, O(log N): the fingers are current and stabilization has finished.
2. Successors are correct but fingers are not: the lookup is correct but slower, and it will get there. (Proof on p. 7.)
3. Successors are not yet correct in the affected region, but by definition the join has not happened until the successors in this region are right.

- Stability

The ring is stable if all successors and fingers are correct.

- Failure resiliency (p. 8)

Correctness relies on the successor pointers. If multiple nodes fail, a node could be left with a bad successor pointer if none of its fingers point to a node before the new successor. The protocol is modified to maintain a list of r successors. Stabilize keeps this list up to date by retrieving the successor's list, dropping its last element, and prepending the successor itself. If the successor fails, the node looks at the next entry in its list and gets an updated list from that node. Chord then tolerates r failures; with high probability a lookup is not affected, since the chance that all r successors fail is (1/2)^r in the case where half the nodes fail (p. 8).

- Notification is like failure

Voluntary departure (notification) is handled the same way as failure, but a node can do better by notifying its predecessor before it leaves.

- Conclusion

The important points, again:
- These services offer availability, scalability, and decentralization.
- They all offer a lookup primitive.
- Chord does this while maintaining only O(log N) information at each node.
- Chord performs lookups in O(log N) hops.

CAN
---

In CAN, the object IDs are coordinates in a $d$-torus. Each node is responsible for a region of the $d$-torus. Simple direct-path routing can be done in $(d/4)(n^{1/d})$ hops in expectation, and each node is expected to have $O(d)$ neighbors. When $d$ is set to $O(\log N)$, the asymptotic performance of CAN matches that of Chord and Tapestry; however, since $d$ is fixed, the $N$ here needs to be the largest possible number of nodes.

The major weakness is the fragmentation caused by node departures and failures. When a leaving node's region $A$ cannot be merged with another region to form a bigger region, region $A$ has to be taken over by some other node. Eventually, more and more nodes have to be responsible for multiple regions. The mechanism for handling failures is similar.

The strength of CAN is its simplicity: its lookup algorithm is simpler than those of most other existing peer-to-peer networks. The CAN paper also discussed and evaluated some optimizations, such as multiple realities and multiple hosts per region.
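To make the CAN routing cost concrete, here is a minimal sketch of greedy forwarding on the $d$-torus: each node hands the query to whichever neighbor's zone center is closest to the target point. This is my own illustration of the idea, not code from the CAN paper; zone splitting, joins, and failure handling are omitted, and the toy example just uses a regular 4x4 grid of equal zones with $d = 2$.

```python
D = 2           # dimensions of the torus; CAN works for any d
SIDE = 1.0      # the coordinate space is [0, SIDE)^d, wrapping in every dimension

def torus_dist(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    """Euclidean distance on the torus: each axis wraps around."""
    s = 0.0
    for a, b in zip(p, q):
        d = abs(a - b)
        d = min(d, SIDE - d)
        s += d * d
    return s ** 0.5

class CanNode:
    """A node owning one zone, represented here only by the zone's center point."""
    def __init__(self, center: tuple[float, ...]):
        self.center = center
        self.neighbors: list["CanNode"] = []   # nodes whose zones abut ours

    def route(self, target: tuple[float, ...]) -> "CanNode":
        """Greedy forwarding: repeatedly pass to the neighbor closest to target."""
        node = self
        while True:
            best = min(node.neighbors + [node],
                       key=lambda n: torus_dist(n.center, target))
            if best is node:          # no neighbor is closer: target is in our zone
                return node
            node = best

# Toy example: a 4x4 grid of equal zones on the 2-torus, each node linked to its
# four torus neighbors.
K = 4
grid = {(i, j): CanNode(((i + 0.5) / K, (j + 0.5) / K))
        for i in range(K) for j in range(K)}
for (i, j), node in grid.items():
    node.neighbors = [grid[((i + di) % K, (j + dj) % K)]
                      for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]]

dest = (0.8, 0.6)                        # a point hashed into the coordinate space
print(grid[(0, 0)].route(dest).center)   # -> (0.875, 0.625), the zone holding dest
```

With equal zones the grid is about $n^{1/d}$ zones on a side, the per-axis distance to a random target averages a quarter of the way around, and there are $d$ axes to fix, which is where the $(d/4)(n^{1/d})$ expected hop count quoted above comes from.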