DocDirCache

Jared Yanovich edited this page Jun 24, 2015 · 2 revisions

Directory Entry Caching

Overview

This document describes directory entry caching in mount_slash. The code is implemented in slash2/mount_slash/dircache.c and slash2/mount_slash/main.c.

There are two ways dirents enter the dircache:

  • via READDIR reply from the MDS
  • via per-dirent metadata operations such as LOOKUP, RENAME, etc.

Both paths create entries in the global hash table (called the namecache), which provides quick lookups on subsequent basename requests. The READDIR path also caches the entire contents of the READDIR reply buffer, which is returned directly to the user application that called readdir(3).

Data Structure

Entries in the namecache hash table are hashed on their parent directory's FID and the basename. The namecache is maintained on each namespace modification or access (e.g. LOOKUP, RENAME, UNLINK, CREATE) and in bulk from READDIR replies.
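
As an illustration of hashing on the (parent FID, basename) pair, here is a minimal sketch. The names nc_key, nc_hash and nc_key_eq are hypothetical, and FNV-1a is used only as a stand-in; the hash function actually used in dircache.c may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key: parent directory FID plus basename. */
struct nc_key {
	uint64_t	 parent_fid;
	const char	*basename;
};

/* FNV-1a over the FID bytes and the name; an illustrative choice only. */
static uint64_t
nc_hash(const struct nc_key *k)
{
	uint64_t h = 14695981039346656037ULL;
	const unsigned char *p;
	size_t i;

	assert(k != NULL && k->basename != NULL);

	/* Mix in the parent FID one byte at a time. */
	for (i = 0; i < sizeof(k->parent_fid); i++) {
		h ^= (k->parent_fid >> (8 * i)) & 0xff;
		h *= 1099511628211ULL;
	}
	/* Then mix in the basename. */
	for (p = (const unsigned char *)k->basename; *p != '\0'; p++) {
		h ^= *p;
		h *= 1099511628211ULL;
	}
	return (h);
}

/* Two keys match only if both the parent FID and the basename match. */
static int
nc_key_eq(const struct nc_key *a, const struct nc_key *b)
{
	return (a->parent_fid == b->parent_fid &&
	    strcmp(a->basename, b->basename) == 0);
}
```

Including the parent FID in the key lets the same basename appear in different directories without the entries being confused with one another.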

The entries themselves are allocated from a pool, but the backing dirent structure is allocated differently depending on which method brought the entry into the cache:

  • READDIR allocates the dirent buffers and points the dircache_ent to this memory.
  • individual operations each PSCALLOC(3) their own buffer to hold the pscfs_dirent.
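
The two allocation paths imply that an entry must know whether it owns its dirent memory. A minimal sketch of that idea follows; the struct, the ENT_OWNS_DIRENT flag and the function names are hypothetical, and calloc(3) stands in for PSCALLOC. The real dircache_ent layout is not shown in this document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ENT_OWNS_DIRENT	0x1	/* hypothetical: entry owns its dirent buffer */

struct ent {
	int	 flags;
	void	*dirent;	/* inside a READDIR buffer, or private */
};

/* READDIR path: borrow memory inside the cached reply buffer. */
static void
ent_init_from_page(struct ent *e, char *page_buf, size_t off)
{
	assert(e != NULL && page_buf != NULL);
	e->flags = 0;
	e->dirent = page_buf + off;
}

/* Per-dirent path: copy into a private, zeroed allocation. */
static int
ent_init_private(struct ent *e, const void *src, size_t len)
{
	assert(e != NULL && src != NULL && len > 0);
	e->dirent = calloc(1, len);	/* stand-in for PSCALLOC */
	if (e->dirent == NULL)
		return (-1);
	memcpy(e->dirent, src, len);
	e->flags = ENT_OWNS_DIRENT;
	return (0);
}

/* Free the dirent only when this entry allocated it. */
static void
ent_release(struct ent *e)
{
	if (e->flags & ENT_OWNS_DIRENT)
		free(e->dirent);
	e->dirent = NULL;
	e->flags = 0;
}
```

Tracking ownership this way means a borrowed dirent is never freed out from under the page buffer that contains it.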

READDIR Handling

mslfsop_readdir() is the routine that handles READDIRs from an application via FUSE. This routine scans each page in the dircache attached to the file ("FID cache member handle" or just "fcmh") for a dircache_page that matches the request, as determined by the getdents(2) offset argument.

Upon reception of a READDIR reply, the dirents buffer from the reply is processed and a sorted array is made so binary searches can be performed to find a dirent with the given offset. This is used to determine which dircache_page cached in the fcmh contains the 'next' dirent the application is requesting.
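
The sorted array and binary search can be sketched with the standard qsort(3) and bsearch(3) routines. The dirent_ref struct and the page_index_* names are hypothetical; the real code may lay out its index differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index entry: a dirent's offset and its position in the buffer. */
struct dirent_ref {
	uint64_t	off;
	size_t		buf_pos;
};

/* Order entries by offset without risking integer overflow. */
static int
ref_cmp(const void *a, const void *b)
{
	const struct dirent_ref *x = a, *y = b;

	return ((x->off > y->off) - (x->off < y->off));
}

/* Sort the index once, when the READDIR reply is processed. */
static void
page_index_sort(struct dirent_ref *refs, size_t n)
{
	assert(refs != NULL || n == 0);
	if (n > 1)
		qsort(refs, n, sizeof(*refs), ref_cmp);
}

/* Binary search for the dirent at exactly `off`; NULL if absent. */
static const struct dirent_ref *
page_index_find(const struct dirent_ref *refs, size_t n, uint64_t off)
{
	struct dirent_ref key = { off, 0 };

	if (refs == NULL || n == 0)
		return (NULL);
	return (bsearch(&key, refs, n, sizeof(*refs), ref_cmp));
}
```

A NULL result from page_index_find() would indicate that this page does not hold the requested offset, so the caller moves on to the next page.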

If all pages are scanned and the offset is not found, a new page handle is created and marked LOADING so that it is not used before it is populated. An asynchronous RPC is issued and a callback is set up. The FUSE READDIR handler thread then waits on the page until the callback runs, whether because of a timeout or a failed or successful reply.

A better approach may be to return from the FUSE READDIR handler immediately and have the callback itself issue pscfs_reply_readdir(3), so that no thread is tied up waiting.

Keeping the pages in a balanced tree, rather than scanning them linearly, may also be a better approach.
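
The linear page scan and the LOADING placeholder described above can be sketched as follows. All names here (struct page, struct dir, PAGE_LOADING, dir_find_page, dir_get_page) are hypothetical, and the assumption that a page covers a half-open offset range is made only for illustration. The sketch is single-threaded: it omits locking, the RPC itself, and the wait on the page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_LOADING	0x1	/* hypothetical: page not yet populated */

struct page {
	struct page	*next;
	int		 flags;
	uint64_t	 off_first;	/* offset the page was requested at */
	uint64_t	 off_next;	/* offset just past the last dirent */
};

struct dir {
	struct page	*pages;		/* pages attached to the directory */
};

/* Linear scan for a ready page covering `off`; LOADING pages are skipped. */
static struct page *
dir_find_page(struct dir *d, uint64_t off)
{
	struct page *p;

	assert(d != NULL);
	for (p = d->pages; p != NULL; p = p->next) {
		if (p->flags & PAGE_LOADING)
			continue;
		if (off >= p->off_first && off < p->off_next)
			return (p);
	}
	return (NULL);
}

/*
 * Return a page for `off`.  On a miss, attach a LOADING placeholder and
 * set *need_rpc so the caller knows to issue the READDIR RPC.
 */
static struct page *
dir_get_page(struct dir *d, uint64_t off, int *need_rpc)
{
	struct page *p;

	assert(need_rpc != NULL);
	*need_rpc = 0;
	p = dir_find_page(d, off);
	if (p != NULL)
		return (p);

	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return (NULL);
	p->flags = PAGE_LOADING;
	p->off_first = p->off_next = off;
	p->next = d->pages;
	d->pages = p;
	*need_rpc = 1;
	return (p);
}
```

In this sketch the reply handler would fill in off_next and clear PAGE_LOADING once the dirents arrive, at which point dir_find_page() starts returning the page.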

The callback invokes msl_readdir_cb(), which examines the reply. If the reply is small, it is processed immediately; otherwise a bulk RPC will be on its way, so the callback essentially does nothing. An error in the MDS RPC reply also triggers immediate processing.
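
The decision made in the callback can be summarized by the sketch below. The enum, the function name and the inline_max threshold are hypothetical; this document does not state how msl_readdir_cb() decides that a reply is "small".

```c
#include <assert.h>
#include <stddef.h>

enum rd_action {
	RD_PROCESS_NOW,		/* handle the reply (or error) in the callback */
	RD_WAIT_BULK		/* do nothing; a bulk RPC will deliver the data */
};

/*
 * rc:         status from the MDS reply (nonzero means error)
 * reply_len:  size of the dirent data
 * inline_max: hypothetical largest size carried inline in the reply
 */
static enum rd_action
readdir_cb_action(int rc, size_t reply_len, size_t inline_max)
{
	assert(inline_max > 0);

	if (rc != 0)
		return (RD_PROCESS_NOW);	/* errors finish the request */
	if (reply_len <= inline_max)
		return (RD_PROCESS_NOW);	/* small reply: data is here */
	return (RD_WAIT_BULK);			/* large reply: wait for bulk */
}
```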

If the reply was large and a bulk RPC is necessary to complete the request, an incoming RPC from the MDS will eventually be received.

TODO: if the connection to the MDS is severed, the client will hang as there is no code to reissue the request in such cases.

Once the dirents are received by the client, msl_readdir_finish() is called, which registers the dirents into the dircache and namecache and stashes the file attributes that accompany the dirents in the RPC (like NFS READDIR+).
