Key Features of RestFlow

tmcphillips edited this page Feb 2, 2013 · 10 revisions

RestFlow is designed for scientists and developers accustomed to writing (even simple) scripts and executing them at the command prompt. RestFlow provides an easy way to connect scripts together to yield sophisticated computational pipelines.

Features for scientists

RestFlow is meant to be used directly by scientists automating scientific workflows. The following features likely will be appreciated by anyone building computational pipelines with RestFlow:

  • Specify workflows simply. Tell the system how to route data between steps in a workflow using a simple, text-based language that is easy to understand and share.

  • Use scripting languages to implement workflow steps. Specify individual steps in a workflow by writing scripts in Python, Perl, Bash, Tcl, or Groovy. Let the RestFlow engine load variables in your scripts with input values each time they are called.

  • Write steps that execute within the workflow engine. Use the Groovy programming language to quickly develop new workflow steps--often using a single line of code--that execute within the RestFlow engine itself.

  • Use a simple text editor. No software development environment, compiler, or graphical user interface is required to develop and run new workflows or to implement new actors (reusable workflow steps). A simple text editor and a command prompt will do.

  • Script your workflows. RestFlow workflows can be run from the command line and scripted. Workflows can be defined and run within self-contained, executable shell scripts.

  • Organize your results. The text expressions used to describe the connections between steps in a workflow also serve to specify locations of intermediate and final output files on disk.

  • Visualize your workflow. Generate dataflow diagrams that graphically summarize the steps in a workflow and how data flows between them. Use the diagrams to check your design and get feedback from colleagues. Include the diagrams in publications of your results.
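
The idea of an engine loading script variables with input values can be illustrated with a minimal Python sketch. This is not RestFlow's API or its workflow language: `run_step` and `run_pipeline` are hypothetical names, and the routing that RestFlow expresses declaratively in a text file is written here as ordinary Python calls.

```python
# Hypothetical sketch only: not RestFlow's API or workflow syntax.

def run_step(script, inputs, outputs):
    """Run a script with its input variables preloaded; return the named outputs."""
    namespace = dict(inputs)   # input values appear to the script as ordinary variables
    exec(script, namespace)    # the script body knows nothing about the engine
    return {name: namespace[name] for name in outputs}


# Two steps written as plain scripts that only read and assign variables.
scale = "scaled = value * factor"
describe = "message = 'result is ' + str(scaled)"


def run_pipeline(value):
    # Route the output of the first step to the input of the second.
    first = run_step(scale, {"value": value, "factor": 10}, ["scaled"])
    second = run_step(describe, first, ["message"])
    return second["message"]
```

Because each script only reads and assigns variables, the same script can be reused in a different pipeline by changing what is routed into it.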

Features for software developers

RestFlow can also be seen as a general-purpose platform for developing scientific applications, based on the dataflow metaphor. Scientific programmers will appreciate the following features:

  • Minimal footprint. RestFlow is based on the Spring Framework and employs its dependency injection approach. Workflow steps are represented within the workflow engine by Java objects, and only a minimum of additional support objects are required beyond them. In other words, a workflow implemented in RestFlow is a custom Java application that comprises little more than the objects representing the steps in the workflow.

  • Loosely coupled applications. Workflows are based on the actor-oriented programming paradigm. Each step in a workflow is executed by an actor, and actors are completely ignorant of the function and state of other actors, of their connections to upstream and downstream actors, and of the function of the workflow as a whole. Furthermore, no class inheritance is required to develop new actors. RestFlow employs interfaces and reflection to keep inheritance hierarchies shallow and to minimize coupling between software components. A simple Java bean can serve as an actor without modification.

  • Pluggable directors. RestFlow encapsulates the algorithms used to schedule actor steps and transfer data between actors in special workflow components called directors (analogous to the directors of Kepler and Ptolemy II). This approach allows workflow designers to select a director based on the requirements of a particular workflow, to replace the director later when those requirements change, and even to develop new, custom directors with different capabilities.

  • Safe concurrency. Multithreaded directors enable actors in a workflow to execute concurrently. Because actors interact only via immutable data flowing between them, many common concurrency pitfalls (risk of data corruption, deadlock, and race conditions) are automatically avoided without requiring locks. RestFlow thus provides a safe multithreaded programming model.

  • Standards-based. RestFlow depends wherever possible on broadly used and supported technologies and standards, including the Spring Framework and Apache Commons libraries for component-based application design, YAML for declaring actors and specifying workflows, and Maven for dependency management.
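
Two of the ideas above, plain objects driven by reflection and interchangeable directors, can be sketched in a few lines. The sketch is in Python rather than RestFlow's Java for brevity, and every name in it (`Adder`, `fire`, `SequentialDirector`, `ThreadedDirector`) is hypothetical. It illustrates the pattern only and does not reproduce RestFlow's actual classes or scheduling algorithms.

```python
from concurrent.futures import ThreadPoolExecutor


class Adder:
    """A plain object: no base class and no knowledge of any workflow."""

    def __init__(self):
        self.a = 0
        self.b = 0
        self.total = None

    def step(self):
        self.total = self.a + self.b


def fire(actor, inputs, output_names):
    """Set inputs by name, call step(), then read outputs by name (reflection)."""
    for name, value in inputs.items():
        setattr(actor, name, value)
    actor.step()
    return {name: getattr(actor, name) for name in output_names}


class SequentialDirector:
    """Invokes the actor once per input set, one after another."""

    def run(self, make_actor, input_sets, output_names):
        return [fire(make_actor(), inputs, output_names) for inputs in input_sets]


class ThreadedDirector:
    """Same interface, but invocations run on a thread pool."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, make_actor, input_sets, output_names):
        def invoke(inputs):
            # Each invocation gets its own actor instance, so no state is shared.
            return fire(make_actor(), inputs, output_names)

        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(invoke, input_sets))
```

Swapping `SequentialDirector()` for `ThreadedDirector()` changes how invocations are scheduled without touching `Adder`, which is the point of keeping scheduling out of the actor.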

First-class scientific workflow capabilities

  • Pipeline parallelism. Actors may be invoked multiple times in a single workflow run and concurrently with invocations of other actors. This provides support for workflows that require upstream instrument control and data acquisition steps to occur concurrently with downstream data processing steps.

  • Workflow cycles (loops). RestFlow supports general directed graphs that may include cycles implementing feedback loops and indefinite (do-while) loops.

  • Nested workflows. Any workflow can serve as an actor in a different workflow. Each subworkflow of a hierarchically composed workflow can have a different director as needed without restriction.

  • Actor class hierarchies. Actors can be derived from other actors (without writing any code) so that customizations and specializations of actors can be saved, reused, and shared.

  • Data modeling and streaming. RestFlow makes explicit the overall organization of data operated on and produced by the workflow, while simultaneously supporting data streaming and multiple actor invocations per workflow run.
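
Pipeline parallelism, as described in the list above, can be sketched with two threads joined by a queue: an upstream "acquisition" step keeps producing values while a downstream step processes the ones already delivered. This is a generic illustration in Python under assumed names (`acquire`, `process`, `run_streaming`), not RestFlow code and not a description of how its directors are implemented.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def acquire(frames, out_q):
    """Upstream step: emit one value per invocation, then signal completion."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(_END)


def process(in_q, results):
    """Downstream step: handle each value as soon as it arrives."""
    while True:
        item = in_q.get()
        if item is _END:
            break
        results.append(item * item)


def run_streaming(frames):
    # A small bounded queue lets the two steps overlap without unbounded buffering.
    channel = queue.Queue(maxsize=2)
    results = []
    producer = threading.Thread(target=acquire, args=(frames, channel))
    consumer = threading.Thread(target=process, args=(channel, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

The two steps share nothing except the values passed through the queue, which mirrors the claim that actors interact only through the data flowing between them.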
