Cooperative File System

- Introduction -
CFS: a peer-to-peer read-only storage system.
- Decentralized architecture.
- Built on top of Chord.
- Distributes file blocks, file metadata, and filesystem layers over CFS servers.

Claimed goals and results for CFS:
- Aggressive approach to load balance by spreading file blocks randomly over servers.
- Download performance analogous to FTP.
- Efficiency and fast recovery times after failure.
- Simple algorithms.

File system goals:
- Decentralized control
- Scalability
- Availability
- Load balance
- Persistence
- Quotas
- Efficiency

- Overview -
The core of CFS is DHash and Chord; the filesystem is implemented in a layered fashion:
  FS    - interprets blocks as files; presents a file system interface.
  DHash - reliably stores blocks; knows only about unstructured blocks.
  Chord - maintains routing tables and the like to find blocks; uses a 160-bit block identifier space.

DHash puts block management on top of Chord, providing caching primitives to CFS as well as replication to a small number of servers. DHash also lets every physical node control roughly how much data it stores by creating virtual servers. By nature of the hashing algorithm, each server in CFS holds roughly the same amount of content; if one wanted a server to serve more content than the average amount, that physical server might run multiple virtual servers.

- Related work -
Similar to PAST, but with block-grained storage as opposed to whole-file-grained.

- System structure -
Built for Linux and BSD variants, using a file system interface similar to SFSRO. Uses a traditional file system structure built from blocks. Diagram on p. 3. Maximum block size is on the order of 10 KB.

Publishing: A publisher inserts blocks into the system, naming each block with the hash of its content as its identifier, and signs the root block with a private key. Clients specify a filesystem by the public key that verifies the root block. This guarantees clients see a consistent file system, although a client may see an old version if the root block is stale. The publisher updates by replacing the root block in place. (A small sketch of this naming scheme follows the DHash notes below.)

Time semantics: All data is valid only for a finite interval; CFS discards expired data, so the publisher must continually ask for extensions. This forms the basis on which replication and caching can be constructed lazily. Professor Lynch points out that this architecture is also known as multi-reader, single-writer consistency.

- Chord layer -
Addressed in a previous talk. Roughly similar to the previous papers, except that it uses the notion of a predecessor: a Chord lookup returns the immediate predecessor of the successor, so that the client has free pick of the fastest of the replicas. One interesting tidbit is the server selection we talked about last time, which takes advantage of network locality. Among the candidate next hops, pick the one minimizing the cost (sketched below):

  C(n_i) = d_i + d_avg * H(n_i)
  H(n_i) = ones((n_i - id) >> (160 - log N))

where d_i is the measured latency to n_i, d_avg is the average latency of past RPCs, ones() counts set bits, and H(n_i) estimates the number of Chord hops remaining from n_i to the target id. Latencies are maintained passively when finger tables are updated.

- DHash layer -
The block layer.

Implementation: If a lookup fails at a server, in that the server is down, that information trickles back up the lookup chain to alert it that the server is down and to ask it for its next-best predecessor.

Replication: k copies per block, kept at the k servers after the block's successor in the ring, so if a server fails the lookup still lands near the successors. The key is that the protocol is modified so that when a lookup fails it tries the k servers after the expected one (see the sketch below). Because of the consistent hashing, replicas will not be physically near each other; a replicated block is likely on another part of the physical network.
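A minimal sketch of the content-hash naming from the publishing notes above, assuming SHA-1 for the 160-bit identifier space; `dhash.put` and the pre-signed root block are stand-ins, not CFS's actual interface:

```python
import hashlib

def block_id(data: bytes) -> int:
    # CFS names each data block by the SHA-1 hash of its content,
    # which doubles as the block's 160-bit Chord identifier
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def publish(dhash, blocks, signed_root: bytes, public_key: bytes):
    # insert every content-hashed block (dhash.put is a stand-in
    # for whatever insert interface the block store exposes)
    for data in blocks:
        dhash.put(block_id(data), data)
    # the root block is signed with the publisher's private key and
    # stored under a hash of the matching public key, so clients who
    # know the public key can both find and verify it
    dhash.put(block_id(public_key), signed_root)
```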
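The server-selection cost from the Chord-layer notes above, as a sketch; the function names are mine, and N (the total number of servers) is assumed to be a known estimate:

```python
import math

ID_BITS = 160

def ones(x: int) -> int:
    # population count: number of set bits
    return bin(x).count("1")

def hop_estimate(node_id: int, target_id: int, n_servers: int) -> int:
    # H(n_i): the high-order bits of the clockwise distance from the
    # target id to n_i approximate the number of Chord hops remaining
    dist = (node_id - target_id) % (1 << ID_BITS)
    return ones(dist >> (ID_BITS - int(math.log2(n_servers))))

def cost(latency: float, avg_latency: float,
         node_id: int, target_id: int, n_servers: int) -> float:
    # C(n_i) = d_i + d_avg * H(n_i): the latency to reach n_i plus the
    # expected latency of the hops still ahead; the lookup is forwarded
    # to the candidate minimizing this cost
    return latency + avg_latency * hop_estimate(node_id, target_id, n_servers)
```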
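And a sketch of the modified lookup from the replication notes: the successor plus the k servers after it hold copies, so a dead server just means moving on to the next one. The `chord.successors` and `server.get` interfaces are hypothetical:

```python
class BlockUnavailable(Exception):
    pass

def fetch_block(chord, block_id: int, k: int) -> bytes:
    # DHash keeps a block's replicas on the k servers immediately
    # following the block's successor on the ring; a lookup that finds
    # its target down retries at the next live successor
    for server in chord.successors(block_id, k + 1):
        try:
            return server.get(block_id)
        except ConnectionError:  # server down; failure trickles back
            continue
    raise BlockUnavailable(f"no live replica for {block_id:x}")
```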
Caching: DHash provides a caching layer. Each DHash node keeps a fixed percentage of its disk allocated for caching, with LRU replacement at each node, and a lookup examines the cache at each hop of the search path (a minimal LRU sketch appears at the end of these notes). Cached copies will likely be close to the successor node, since nodes far away are unlikely to be queried.

Difference between caching and replication: replicas are at predictable places, so the system can be sure they are always present; caches cannot be easily counted, and their number can fall to zero.

** Consistency **
CFS avoids most consistency problems because blocks are keyed by content hashes. Root blocks, however, are keyed by a public key, so a root block, i.e. an entire view, can become stale. The key point is that filesystem integrity remains.

- Load Balance -
Block-sized chunking and consistent hashing give a generally balanced distribution. Block-sized chunking plus Chord id hashing spreads a file across the entire ring, so a popular file distributes its load across a number of servers. Virtual servers allow a node to host more data; to offset the extra cost of network hops, a node's virtual servers can see each other's routing tables.

- Quotas -
Mitigates denial-of-service abuse via oversized inserts by imposing a per-IP-address quota.

- Updates and deletions -
CFS allows updates from publishers; a root block can only be updated with the signing key. Deletion is not explicit: since each block has a timeout, deletion is passive.

-----------------------------------------------------------------------

PAST

- Introduction -
Built on top of Pastry; storage lives in the nodes of the Pastry network. Files are distributed in their entirety, as opposed to chunked as in CFS. This is a substantial difference from CFS.

- Overview -
General services offered (sketched at the end of these notes):
  Insert(k copies) - immutable copies inserted into the system.
  Lookup
  Reclaim - does not guarantee deletion.

- Compare with CFS -
Similarities:
- Data stored on nodes.
- Built on overlay networks.
- Use hashes of content for placement in the overlay network.
- Caches in the query path to improve performance.
- Caches and replicas are treated differently in both systems.
- Use of content hashing and signing.
- Take advantage of physical network locality (in the enhanced version of CFS).

Differences:
- File-based, not block-based. *** BIGGEST DIFFERENCE ***
- Anyone can insert. Prof. Lynch pointed out that "anyone inserts" applies only to one's own files; multiple people are not inserting the same file, so PAST is still single-writer, multi-reader.
- The insert operation allows explicit specification of k replicas, whereas CFS uses a system-wide number and does not parameterize this in the insert operation as PAST does.
- Reclaim is more explicit; CFS lets passive time expiration handle reclaims.
- Not a filesystem as CFS is, but just an interface for the files themselves.
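Returning to the DHash caching notes above, a minimal per-node LRU cache sketch; the capacity handling and interfaces are assumptions, not CFS's:

```python
from collections import OrderedDict

class BlockCache:
    # each DHash node devotes a fixed slice of its disk to caching;
    # every lookup passing through the node checks this cache, and
    # entries are evicted least-recently-used
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks: OrderedDict[int, bytes] = OrderedDict()

    def get(self, block_id: int):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # mark as recently used
        return data

    def put(self, block_id: int, data: bytes):
        if block_id in self.blocks:
            self.used -= len(self.blocks.pop(block_id))
        self.blocks[block_id] = data
        self.used += len(data)
        while self.used > self.capacity:  # evict oldest entries
            _, old = self.blocks.popitem(last=False)
            self.used -= len(old)
```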
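Finally, the three PAST operations from the overview, sketched as an interface; the signatures are paraphrased from the notes, not PAST's exact API:

```python
from abc import ABC, abstractmethod

class Past(ABC):
    @abstractmethod
    def insert(self, name: str, data: bytes, k: int) -> int:
        """Store k immutable replicas of a whole file (not blocks, as in
        CFS); k is chosen per insert rather than system-wide. Returns
        the file's id."""

    @abstractmethod
    def lookup(self, file_id: int) -> bytes:
        """Return a copy of the file, ideally from a nearby replica."""

    @abstractmethod
    def reclaim(self, file_id: int) -> None:
        """Make the file's storage reclaimable; unlike a delete, this
        does not guarantee the file becomes unreachable."""
```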