Cooperative File System

- Introduction -
CFS: a peer-to-peer read-only storage system.
- Decentralized architecture.
- Built on top of Chord.
- Distributes file blocks, file metadata, and filesystem layers over CFS servers.

Claimed goals and results for CFS:
- Aggressive approach to load balance by spreading file blocks randomly over servers.
- Download performance analogous to FTP.
- Efficiency and fast recovery times after failure.
- Simple algorithms.

File system goals:
- Decentralized control
- Scalability
- Availability
- Load balance
- Persistence
- Quotas
- Efficiency

- Overview -
The core of CFS is DHash and Chord; the filesystem is implemented in a layered fashion:
  FS    - interprets blocks as files; presents a file system interface.
  DHash - reliably stores blocks; knows only about unstructured blocks.
  Chord - maintains routing tables and the like to find blocks; uses a 160-bit block identifier space.

DHash puts block management on top of Chord, providing caching primitives to CFS as well as replication to a small number of servers. DHash also lets every physical node control roughly how much data it stores by creating virtual servers. By nature of the hashing algorithm, each server in CFS holds roughly the same amount of content; if one wanted a server to serve more content than the average amount, that physical server might run multiple virtual servers.

- Related work -
Similar to PAST, but with block-grained storage as opposed to whole-file-grained.

- System structure -
Built for Linux and BSD variants, using a file system interface similar to SFSRO. Uses a traditional file system structure built from blocks. Diagram on p. 3. Maximum block size is on the order of 10 KB.

Publishing: A publisher inserts blocks into the system, naming each block with the hash of its content as its identifier, and signs the root block with a private key. Clients specify a filesystem by the public key that verifies the root block. This guarantees clients see a consistent file system, although a client may see an old version if the root block is stale. The publisher updates by replacing the root block in place. (A small sketch of this naming scheme follows the DHash notes below.)

Time semantics: All data is valid only for a finite interval; CFS discards expired data, so the publisher must continually ask for extensions. This forms the basis on which replication and caching can be constructed lazily. Professor Lynch points out that this architecture is also known as multi-reader, single-writer consistency.

- Chord layer -
Addressed in a previous talk. Roughly similar to the previous papers, except that it uses the notion of a predecessor: a Chord lookup returns the immediate predecessor of the successor, so that the client has free pick of the fastest of the replicas. One interesting tidbit is the server selection we talked about last time, which takes advantage of network locality. Among the candidate next hops, pick the one minimizing the cost (sketched below):

  C(n_i) = d_i + d_avg * H(n_i)
  H(n_i) = ones((n_i - id) >> (160 - log N))

where d_i is the measured latency to n_i, d_avg is the average latency of past RPCs, ones() counts set bits, and H(n_i) estimates the number of Chord hops remaining from n_i to the target id. Latencies are maintained passively when finger tables are updated.

- DHash layer -
The block layer.

Implementation: If a lookup fails at a server, in that the server is down, that information trickles back up the lookup chain to alert it that the server is down and to ask it for its next-best predecessor.

Replication: k copies per block, kept at the k servers after the block's successor in the ring, so if a server fails the lookup still lands near the successors. The key is that the protocol is modified so that when a lookup fails it tries the k servers after the expected one (see the sketch below). Because of the consistent hashing, replicas will not be physically near each other; a replicated block is likely on another part of the physical network.
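A minimal sketch of the content-hash naming from the publishing notes above, assuming SHA-1 for the 160-bit identifier space; `dhash.put` and the pre-signed root block are stand-ins, not CFS's actual interface:

```python
import hashlib

def block_id(data: bytes) -> int:
    # CFS names each data block by the SHA-1 hash of its content,
    # which doubles as the block's 160-bit Chord identifier
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def publish(dhash, blocks, signed_root: bytes, public_key: bytes):
    # insert every content-hashed block (dhash.put is a stand-in
    # for whatever insert interface the block store exposes)
    for data in blocks:
        dhash.put(block_id(data), data)
    # the root block is signed with the publisher's private key and
    # stored under a hash of the matching public key, so clients who
    # know the public key can both find and verify it
    dhash.put(block_id(public_key), signed_root)
```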
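The server-selection cost from the Chord-layer notes above, as a sketch; the function names are mine, and N (the total number of servers) is assumed to be a known estimate:

```python
import math

ID_BITS = 160

def ones(x: int) -> int:
    # population count: number of set bits
    return bin(x).count("1")

def hop_estimate(node_id: int, target_id: int, n_servers: int) -> int:
    # H(n_i): the high-order bits of the clockwise distance from the
    # target id to n_i approximate the number of Chord hops remaining
    dist = (node_id - target_id) % (1 << ID_BITS)
    return ones(dist >> (ID_BITS - int(math.log2(n_servers))))

def cost(latency: float, avg_latency: float,
         node_id: int, target_id: int, n_servers: int) -> float:
    # C(n_i) = d_i + d_avg * H(n_i): the latency to reach n_i plus the
    # expected latency of the hops still ahead; the lookup is forwarded
    # to the candidate minimizing this cost
    return latency + avg_latency * hop_estimate(node_id, target_id, n_servers)
```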
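And a sketch of the modified lookup from the replication notes: the successor plus the k servers after it hold copies, so a dead server just means moving on to the next one. The `chord.successors` and `server.get` interfaces are hypothetical:

```python
class BlockUnavailable(Exception):
    pass

def fetch_block(chord, block_id: int, k: int) -> bytes:
    # DHash keeps a block's replicas on the k servers immediately
    # following the block's successor on the ring; a lookup that finds
    # its target down retries at the next live successor
    for server in chord.successors(block_id, k + 1):
        try:
            return server.get(block_id)
        except ConnectionError:  # server down; failure trickles back
            continue
    raise BlockUnavailable(f"no live replica for {block_id:x}")
```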
Caching: DHash provides a caching layer. Each DHash node keeps a fixed percentage of its disk allocated for caching, with LRU replacement at each node, and a lookup examines the cache at each hop of the search path (a minimal LRU sketch appears at the end of these notes). Cached copies will likely be close to the successor node, since nodes far away are unlikely to be queried.

Difference between caching and replication: replicas are at predictable places, so the system can be sure they are always present; caches cannot be easily counted, and their number can fall to zero.

** Consistency **
CFS avoids most consistency problems because blocks are keyed by content hashes. Root blocks, however, are keyed by a public key, so a root block, i.e. an entire view, can become stale. The key point is that filesystem integrity remains.

- Load Balance -
Block-sized chunking and consistent hashing give a generally balanced distribution. Block-sized chunking plus Chord id hashing spreads a file across the entire ring, so a popular file distributes its load across a number of servers. Virtual servers allow a node to host more data; to offset the extra cost of network hops, a node's virtual servers can see each other's routing tables.

- Quotas -
Mitigates denial-of-service abuse via oversized inserts by imposing a per-IP-address quota.

- Updates and deletions -
CFS allows updates from publishers; a root block can only be updated with the signing key. Deletion is not explicit: since each block has a timeout, deletion is passive.

-----------------------------------------------------------------------

PAST

- Introduction -
Built on top of Pastry; storage lives in the nodes of the Pastry network. Files are distributed in their entirety, as opposed to chunked as in CFS. This is a substantial difference from CFS.

- Overview -
General services offered (sketched at the end of these notes):
  Insert(k copies) - immutable copies inserted into the system.
  Lookup
  Reclaim - does not guarantee deletion.

- Compare with CFS -
Similarities:
- Data stored on nodes.
- Built on overlay networks.
- Use hashes of content for placement in the overlay network.
- Caches in the query path to improve performance.
- Caches and replicas are treated differently in both systems.
- Use of content hashing and signing.
- Take advantage of physical network locality (in the enhanced version of CFS).

Differences:
- File-based, not block-based. *** BIGGEST DIFFERENCE ***
- Anyone can insert. Prof. Lynch pointed out that "anyone inserts" applies only to one's own files; multiple people are not inserting the same file, so PAST is still single-writer, multi-reader.
- The insert operation allows explicit specification of k replicas, whereas CFS uses a system-wide number and does not parameterize this in the insert operation as PAST does.
- Reclaim is more explicit; CFS lets passive time expiration handle reclaims.
- Not a filesystem as CFS is, but just an interface for the files themselves.
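Returning to the DHash caching notes above, a minimal per-node LRU cache sketch; the capacity handling and interfaces are assumptions, not CFS's:

```python
from collections import OrderedDict

class BlockCache:
    # each DHash node devotes a fixed slice of its disk to caching;
    # every lookup passing through the node checks this cache, and
    # entries are evicted least-recently-used
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks: OrderedDict[int, bytes] = OrderedDict()

    def get(self, block_id: int):
        data = self.blocks.get(block_id)
        if data is not None:
            self.blocks.move_to_end(block_id)  # mark as recently used
        return data

    def put(self, block_id: int, data: bytes):
        if block_id in self.blocks:
            self.used -= len(self.blocks.pop(block_id))
        self.blocks[block_id] = data
        self.used += len(data)
        while self.used > self.capacity:  # evict oldest entries
            _, old = self.blocks.popitem(last=False)
            self.used -= len(old)
```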
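Finally, the three PAST operations from the overview, sketched as an interface; the signatures are paraphrased from the notes, not PAST's exact API:

```python
from abc import ABC, abstractmethod

class Past(ABC):
    @abstractmethod
    def insert(self, name: str, data: bytes, k: int) -> int:
        """Store k immutable replicas of a whole file (not blocks, as in
        CFS); k is chosen per insert rather than system-wide. Returns
        the file's id."""

    @abstractmethod
    def lookup(self, file_id: int) -> bytes:
        """Return a copy of the file, ideally from a nearby replica."""

    @abstractmethod
    def reclaim(self, file_id: int) -> None:
        """Make the file's storage reclaimable; unlike a delete, this
        does not guarantee the file becomes unreachable."""
```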