Jared Yanovich edited this page Aug 9, 2015 · 4 revisions

Remote Procedure Calls (RPC)

This document describes the architecture of the networking and RPC layers in SLASH2.

Note: the terms "server" and "client" in this document follow the standard convention of one host sending a request to a peer and receiving a reply. The three components in a SLASH2 deployment (MDS, IOS, CLI) often play both server and client roles with other component types to provide their service.

Overview

SLASH2 leverages a number of libraries that provide higher level networking facilities than the lower level primitives typically provided by the kernel and its support libraries. These are LNET, the Lustre networking stack, and its accompanying libraries.

Lustre Networking Stack (LNET)

SLASH2 uses a forked version of the Lustre Networking (LNET) library with a few changes to fit into the SLASH2 code base. As SLASH2 runs fully in user mode, much of the kernel-mode code in SLASH2's LNET fork has been removed.

The LNET fork actually resides in PFL, alongside the accompanying RPC library, PFLRPC.

Our version of LNET uses the same constructs for underlying transport support, although SLASH2 primarily uses only usocklnd (the user-mode sockets Lustre networking device).

Please refer to the Lustre documentation for additional information on the workings of these libraries.

PFL RPC layer

This module provides an API for higher level RPC operations. It is forked from an older version of the Lustre ptlrpc API.

Please refer to the Lustre documentation for additional information on the workings of these libraries.

Note: significant changes have been made to this library to support the different environments in which SLASH2 is often deployed.

Note: this API is in the process of being renamed from the "PSC" prefix to the "PFL" prefix, so unfortunately some API names are still in transition.

The PFL RPC API is primarily implemented in pfl/rpcclient.c and pfl/service.c which provide three constructs for applications built upon PFL RPC:

  • pscrpc_export - a handle to a connected client of a service provided by the running daemon

  • pscrpc_import - a handle for a connection to a remote service

  • pscrpc_request (rq) - a single RPC exchange, which has a slightly different lifecycle on the client than on the server:

    • clients always initiate requests (filling in rq_reqmsg), and the PFL RPC API attaches the reply message rq_repmsg when it is received from the server;

    • servers invoke the corresponding service handling routine in the context of one of the service's worker threads when an incoming request is received. The thread processes the request and generates a response, which the PFL RPC module then transmits back to the client.

An additional API layer defined in pfl/rsx.h (RPC simple exchange) provides a higher level RPC send interface: pfl_rsx_newreq() and pfl_rsx_waitrep(). RSX also contains the bulk data processing methods rsx_bulkclient() and rsx_bulkserver(), which are used to transmit data larger than the maximum message sizes defined when services are registered.

SLASH2 application layer

The SLASH2 networking API is primarily implemented in share/rpc_common.c and include/slashrpc.h which provide an additional construct for communication among SLASH2 daemons:

  • slashrpc_cservice (csvc) - a higher level structure which uses a pscrpc_import to issue requests to servers and handle replies in a versatile fashion

Server mode operation in SLASH2 is pretty straightforward: a PFL RPC service is registered during daemon initialization and incoming requests are handled when received. pscrpc_thread_spawn() registers a new service for clients and spawns worker threads to handle the requests.

Certain tie-ins are made when operations must be accommodated in a persistent fashion, e.g. bmap leases issued to clients are written to stable storage and also retained in memory after an RPC exchange has finished.

Many RPC operations are stateless and this mode is preferred when possible.

API Tour

A higher level API "wrapper" is defined in slash2/include/slconn.h which calls the PFL RPC and RSX primitives explained above but offers additional functionality beyond that provided by the lower layers:

  • piggybacking of extra fields onto RPCs to keep API usage simple while conveying additional information
  • encryption/authentication/integrity of message contents
  • automatic operation counting
  • higher level event handling, such as cleanup during failure

Clients can create new RPC requests with MSL_RMC_NEWREQ(). This mount_slash API (hence the msl prefix) creates a new RPC structure issued by a CLI and intended to be received by the MDS (hence RMC). RPCs can be processed synchronously or asynchronously: SL_RSX_WAITREP() performs a blocking wait until the server replies or the request times out, while SL_NBRQSET_ADD() pushes the request out and arranges for a callback to be invoked when the server replies or the request times out.

Many RPC behavioral specifics can be tuned by setting the appropriate fields in pscrpc_request, e.g. rq_bulk_abortable, which prevents the entire import (and thus connection) from failing if the server does not need a bulk transmission to satisfy the request.
