DocDirCache
This document describes directory entry caching in mount_slash.
The code is implemented in slash2/mount_slash/dircache.c and
slash2/mount_slash/main.c.
There are two ways dirents enter the dircache:
- via READDIR reply from the MDS
- via per-dirent metadata operations such as LOOKUP, RENAME, etc.
Both types create entries in the global hash table (called the
namecache) that provides quick lookups on subsequent basename requests.
The READDIR path also caches the entire contents of the READDIR
reply buffer which gets directly returned to the user application that
called readdir(3).
Entries in the namecache hash table are hashed based on their parent directory's FID and the basename. The namecache is maintained through each namespace modification or access (e.g. LOOKUP, RENAME, UNLINK, CREATE, etc.) and in bulk from READDIR replies.
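As a rough illustration of hashing on the parent FID plus the basename, here is a minimal sketch. The names `struct nc_key`, `nc_hash()` and `nc_key_eq()` are hypothetical and do not come from dircache.c, and FNV-1a is only a stand-in for whatever hash function the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key: parent directory FID plus basename. */
struct nc_key {
	uint64_t	 pfid;		/* FID of the parent directory */
	const char	*name;		/* basename bytes */
	size_t		 namelen;	/* length of the basename */
};

/* FNV-1a over the FID bytes followed by the basename bytes. */
static uint64_t
nc_hash(const struct nc_key *k)
{
	uint64_t h = 14695981039346656037ULL;
	size_t i;

	assert(k->name != NULL || k->namelen == 0);
	for (i = 0; i < sizeof(k->pfid); i++) {
		h ^= (k->pfid >> (i * 8)) & 0xff;
		h *= 1099511628211ULL;
	}
	for (i = 0; i < k->namelen; i++) {
		h ^= (unsigned char)k->name[i];
		h *= 1099511628211ULL;
	}
	return (h);
}

/* Two keys match only if both the parent FID and the basename match. */
static int
nc_key_eq(const struct nc_key *a, const struct nc_key *b)
{
	return (a->pfid == b->pfid && a->namelen == b->namelen &&
	    memcmp(a->name, b->name, a->namelen) == 0);
}
```

Because the parent FID is part of the key, the same basename in two different directories maps to two distinct entries.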
The entries themselves are allocated from a pool, but the backing dirent structure is allocated differently depending on which method brought the entry into the cache:
- READDIR allocates the dirent buffers and points the dircache_ent to this memory.
- individual operations independently PSCALLOC(3) their own buffer to hold the pscfs_dirent.
mslfsop_readdir() is the routine that handles READDIRs from an
application via FUSE.
This routine scans each page in the dircache attached to the file ("FID
cache member handle" or just "fcmh") for a dircache_page that matches
the request, determined by the getdents(2) offset argument.
Upon reception of a READDIR reply, the dirents buffer from the reply is
processed and a sorted array is made so binary searches can be performed
to find a dirent with the given offset.
This is used to determine which dircache_page cached in the fcmh
contains the 'next' dirent the application is requesting.
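The sorted array and binary search described above can be sketched as follows. This is an illustration under assumptions, not the code in dircache.c: `struct ent`, `page_index()` and `page_find()` are hypothetical names, and the real dirent layout is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for one cached dirent. */
struct ent {
	uint64_t	 off;	/* getdents(2)-style offset of this dirent */
	const char	*name;	/* basename */
};

/* Order entries by offset without risking integer overflow. */
static int
ent_cmp_off(const void *a, const void *b)
{
	const struct ent *x = a, *y = b;

	return ((x->off > y->off) - (x->off < y->off));
}

/* Sort once, when the READDIR reply is processed. */
static void
page_index(struct ent *ents, size_t n)
{
	qsort(ents, n, sizeof(*ents), ent_cmp_off);
}

/* Binary search for the dirent at exactly `off`; NULL if absent. */
static const struct ent *
page_find(const struct ent *ents, size_t n, uint64_t off)
{
	struct ent key = { off, NULL };

	assert(ents != NULL || n == 0);
	return (bsearch(&key, ents, n, sizeof(*ents), ent_cmp_off));
}
```

Sorting once per reply keeps each later offset lookup within a page at O(log n) comparisons.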
If all pages are scanned and the offset is not found, a new page handle
is created and marked LOADING so as not to be used.
An asynchronous RPC is issued and a callback is set up.
The FUSE READDIR handler thread then waits on the page for the callback
to run, either via timeout or via failure/successful reply.
A better approach may be to return immediately from the FUSE READDIR handler and have the callback itself issue pscfs_reply_readdir(3), so that no thread is tied up waiting.
Perhaps a balanced tree to lessen the expense of linear searching would be a better approach.
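The page scan and the LOADING placeholder can be sketched as below. All names here (`struct page`, `struct dir`, `PGF_LOADING`, `dir_find_page()` and so on) are hypothetical, and the assumption that a page matches when the offset falls within a range is a simplification for illustration. The RPC, the waiting and the locking are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PGF_LOADING	(1 << 0)	/* hypothetical: contents not yet valid */

struct page {
	uint64_t	 off;		/* offset of the first dirent in the page */
	uint64_t	 nextoff;	/* offset following the last dirent */
	int		 flags;
	struct page	*next;
};

struct dir {
	struct page	*pages;		/* linear list of cached pages */
};

/* Linear scan for a ready page covering `off`; LOADING pages are skipped. */
static struct page *
dir_find_page(struct dir *d, uint64_t off)
{
	struct page *p;

	for (p = d->pages; p != NULL; p = p->next)
		if ((p->flags & PGF_LOADING) == 0 &&
		    off >= p->off && off < p->nextoff)
			return (p);
	return (NULL);
}

/* On a miss, add a placeholder page that other lookups must not use. */
static struct page *
dir_new_loading_page(struct dir *d, uint64_t off)
{
	struct page *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return (NULL);
	p->off = off;
	p->nextoff = off;
	p->flags = PGF_LOADING;
	p->next = d->pages;
	d->pages = p;
	return (p);
}

/* Run when the reply arrives: record the extent and clear LOADING. */
static void
page_loaded(struct page *p, uint64_t nextoff)
{
	assert(p->flags & PGF_LOADING);
	p->nextoff = nextoff;
	p->flags &= ~PGF_LOADING;
}
```

Replacing the list with a tree keyed on `off` would change only `dir_find_page()` and the insertion step.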
The callback invokes msl_readdir_cb() which examines the reply.
If the reply is small, it is processed immediately; otherwise a bulk
RPC will be on its way so the callback essentially does nothing.
An error return from the MDS RPC reply also triggers immediate
processing.
If the reply was large and a bulk RPC is necessary to complete the request, an incoming RPC from the MDS will eventually be received.
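The decision made in the callback can be summarized with a small sketch. It does not reproduce msl_readdir_cb(): the function name, the `inline_max` threshold parameter and the enum values are all assumptions chosen for illustration, since the actual size limit is not stated here.

```c
#include <assert.h>
#include <stddef.h>

enum cb_action {
	CB_PROCESS_NOW,		/* dirents (or an error) are in the reply itself */
	CB_AWAIT_BULK		/* dirents will arrive later in a bulk RPC */
};

/*
 * rc:         status carried in the MDS reply (0 on success)
 * nbytes:     size of the dirent data being returned
 * inline_max: hypothetical limit for data carried inside the reply
 */
static enum cb_action
readdir_cb_action(int rc, size_t nbytes, size_t inline_max)
{
	assert(inline_max > 0);
	if (rc != 0)
		return (CB_PROCESS_NOW);	/* errors are handled right away */
	if (nbytes <= inline_max)
		return (CB_PROCESS_NOW);	/* small reply: data is already here */
	return (CB_AWAIT_BULK);			/* large reply: nothing to do yet */
}
```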
TODO: if the connection to the MDS is severed, the client will hang as there is no code to reissue the request in such cases.
Once the dirents are received by the client, msl_readdir_finish() is
called which registers the dirents into the dircache and namecache and
stashes file attributes contained alongside the dirents in the RPC (like
NFS READDIR+).
