SciJava Location API Improvements #251

Open
@gab1one

During a conversation with @ctrueden we created the following plan for further improvements of the scijava-location API:

Abstraction structure:

  1. Location:
    Contains the metadata: where a file is located, what credentials are needed for access, etc.
  2. Session:
    Provides access to data sources on a host.
  3. DataHandle:
    Provides access to the bytes in a data source; used, e.g., by SCIFIO formats to read images.
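
The three layers above could be sketched roughly as follows. All names here (SketchLocation, SketchSession, SketchDataHandle, SketchBytesHandle) are hypothetical, simplified stand-ins made up for illustration; they are not the actual scijava interfaces, and credentials are left out entirely.

```java
import java.io.Closeable;
import java.io.IOException;
import java.net.URI;

/** Metadata only: where the data lives (hypothetical, simplified). */
interface SketchLocation {
    URI getURI();
}

/** Access to data sources on one host (hypothetical, simplified). */
interface SketchSession<L extends SketchLocation> extends Closeable {
    String remote();
}

/** Access to the bytes of a single data source (hypothetical, simplified). */
interface SketchDataHandle<L extends SketchLocation> extends Closeable {
    L location();

    /** Reads up to len bytes into buf; returns -1 at end of data. */
    int read(byte[] buf, int off, int len) throws IOException;
}

/** Toy in-memory handle: an example of a handle that needs no session. */
final class SketchBytesHandle implements SketchDataHandle<SketchLocation> {
    private final SketchLocation location;
    private final byte[] data;
    private int pos = 0;

    SketchBytesHandle(SketchLocation location, byte[] data) {
        this.location = location;
        this.data = data;
    }

    @Override
    public SketchLocation location() {
        return location;
    }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // Nothing to release for in-memory data.
    }
}
```

The point of the split is that a format reader only ever sees the handle; whether a session sits behind it is an implementation detail of the handle.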

SessionService

  • can create a Session for a remote; takes a Location as input
    • ("remote" = the host/non-path part of a location URI)
  • caches Sessions for each remote
  • whenever a Session is handed out, increment a usage ref
  • whenever a DataHandle is closed, it decrements its session's usage ref
    • if applicable -- some DataHandles don't need Sessions
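
A minimal sketch of the caching and usage-ref bookkeeping described above. RefCountedSessions, acquire, release, and remoteOf are hypothetical names, and deriving the "remote" key from scheme plus authority is an assumption about what "the host/non-path part" would mean in practice.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-remote session cache with usage refs. */
final class RefCountedSessions<S> {

    private static final class Entry<S> {
        final S session;
        int refs = 0;

        Entry(S session) {
            this.session = session;
        }
    }

    private final Map<String, Entry<S>> byRemote = new HashMap<>();
    private final Function<String, S> factory;

    RefCountedSessions(Function<String, S> factory) {
        this.factory = factory;
    }

    /** Assumption: the remote is scheme + authority, with the path dropped. */
    static String remoteOf(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** Hands out the cached session for the URI's remote and increments its usage ref. */
    synchronized S acquire(URI uri) {
        Entry<S> e = byRemote.computeIfAbsent(remoteOf(uri),
            remote -> new Entry<>(factory.apply(remote)));
        e.refs++;
        return e.session;
    }

    /** Called when a handle is closed: decrements the usage ref, never below zero. */
    synchronized void release(URI uri) {
        Entry<S> e = byRemote.get(remoteOf(uri));
        if (e != null && e.refs > 0) e.refs--;
    }

    synchronized int refCount(URI uri) {
        Entry<S> e = byRemote.get(remoteOf(uri));
        return e == null ? 0 : e.refs;
    }
}
```

Two locations on the same host share one session here, and the session stays cached after its count drops to zero; when to actually close it is a separate policy question.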

Protocols

  • List of protocols we want:
    • SSH SCP
    • (S)FTP
    • HTTP/HTTPS:
      • with resume, to avoid excessive rereading
      • support PUT for uploads?
    • OMERO
    • HDFS
    • Cloud block storage:
      • Amazon S3
      • OpenStack Swift
      • Azure Blob Storage
      • Google Cloud Storage

Service/plugin architecture

  • Session will need to be a new Plugin type
  • SessionService will be a HandlerService
    • Model it after DataHandleService; it is very similar
      but with additional API, obviously -- e.g., fetchOrCreate or some such
  • Update the DataHandleService to have a new method:
    • create(L extends Location, Session<L>)
    • This method allows reusing a specific session.
    • A session passed in this way is not closed when the handle is closed
  • The existing DataHandleService method create(Location) will simply ask the SessionService to fetchOrCreate a session for that Location.
  • Naively, it might seem like we need a "Remote" interface or some such, but I
    actually think we won't need it. I think each Session will be able to extract
    the information it needs from its associated type of Location, and that will be
    good enough, and simpler.
  • Finish the StreamHandle interface(?) for DataHandles that are built on InputStream/OutputStreams.
    • Some handle types like URLHandle would probably(?) benefit from extending StreamHandle
    • But if none of the protocols would actually benefit from extending StreamHandle, we could decide not to do it.
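
A rough sketch of how the two create methods could relate. The Plan* classes are hypothetical, and a plain String remote stands in for the Location to keep the example short; only the name fetchOrCreate is taken from the notes above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical session with a usage-ref counter. */
final class PlanSession {
    final String remote;
    int refs = 0;

    PlanSession(String remote) {
        this.remote = remote;
    }
}

/** Hypothetical stand-in for the SessionService. */
final class PlanSessionService {
    private final Map<String, PlanSession> cache = new HashMap<>();

    PlanSession fetchOrCreate(String remote) {
        PlanSession s = cache.computeIfAbsent(remote, PlanSession::new);
        s.refs++;
        return s;
    }

    void release(PlanSession s) {
        if (s.refs > 0) s.refs--;
    }
}

/** Hypothetical handle; what happens on close is decided by its creator. */
final class PlanHandle implements AutoCloseable {
    final PlanSession session;
    private final Runnable onClose;
    private boolean closed = false;

    PlanHandle(PlanSession session, Runnable onClose) {
        this.session = session;
        this.onClose = onClose;
    }

    @Override
    public void close() {
        if (closed) return; // closing twice must not release twice
        closed = true;
        onClose.run();
    }
}

/** Hypothetical stand-in for the DataHandleService. */
final class PlanHandleService {
    private final PlanSessionService sessions;

    PlanHandleService(PlanSessionService sessions) {
        this.sessions = sessions;
    }

    /** Reuses a caller-supplied session; closing the handle leaves it untouched. */
    PlanHandle create(String remote, PlanSession session) {
        return new PlanHandle(session, () -> { });
    }

    /** Asks the session service for a session; closing the handle releases the ref. */
    PlanHandle create(String remote) {
        PlanSession s = sessions.fetchOrCreate(remote);
        return new PlanHandle(s, () -> sessions.release(s));
    }
}
```

With this shape, ownership is explicit: whoever obtained the session is responsible for giving it back.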

Protocol specific notes

  • Native HTTP/HTTPS support in Java seems to lack support for resume; it looks like we need an external library for that.
  • We need to make sure that we only cache sessions that support concurrent access. -> Need a method public boolean isConcurrent() in the interface.
  • Sessions might only allow limited concurrent access; if we encounter this, we will need a public int concurrentAccessLimit() method.
  • When do we close automatically created sessions?
    • Need a cache eviction strategy like LRU
    • Don't immediately close Session when reference count reaches 0
    • Maybe use a connection pool with a modifiable limit?
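
One possible shape for the eviction idea, as a sketch only: sessions stay cached after their reference count reaches 0, and the least recently used idle ones are closed once more than a configurable number are idle. IdleLruSessions, Slot, and maxIdle are hypothetical names, and the closed flag stands in for actually closing a connection.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that only ever evicts idle (refs == 0) sessions. */
final class IdleLruSessions {

    static final class Slot {
        int refs = 0;
        boolean closed = false;
    }

    private final int maxIdle;
    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<String, Slot> slots = new LinkedHashMap<>(16, 0.75f, true);

    IdleLruSessions(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    synchronized Slot acquire(String remote) {
        Slot s = slots.get(remote); // get() also marks the entry as recently used
        if (s == null) {
            s = new Slot();
            slots.put(remote, s);
        }
        s.refs++;
        return s;
    }

    synchronized void release(String remote) {
        Slot s = slots.get(remote);
        if (s != null && s.refs > 0) s.refs--;
        evictIdle(); // the session is not closed just because refs reached 0
    }

    synchronized boolean isCached(String remote) {
        return slots.containsKey(remote);
    }

    /** Closes least recently used idle sessions until at most maxIdle remain idle. */
    private void evictIdle() {
        int idle = 0;
        for (Slot s : slots.values()) {
            if (s.refs == 0) idle++;
        }
        Iterator<Map.Entry<String, Slot>> it = slots.entrySet().iterator();
        while (idle > maxIdle && it.hasNext()) {
            Slot s = it.next().getValue();
            if (s.refs == 0) {
                s.closed = true;
                it.remove();
                idle--;
            }
        }
    }
}
```

Sessions that are still in use are skipped during eviction, so maxIdle bounds only the number of idle connections, not the total.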

Implementation specific notes

  • Calling dispose() on the SessionService will close all connections.
  • Currently working on this on the more-handles-gabriel branch
