During a conversation with @ctrueden we created the following plan for further improvements of the scijava-location API:
Abstraction structure:
- Location: Contains the metadata: where a file is located, what credentials are needed for access, etc.
- Session: Provides access to data sources on a host.
- DataHandle: Provides access to the bytes in a data source; used e.g. by SCIFIO formats to read images.
SessionService
- can create a Session for a remote, takes Location as input
- ("remote" = the host/non-path part of a location URI)
- caches Sessions for each remote
- whenever a Session is handed out, increment a usage ref
- whenever a DataHandle is closed, it decrements its session's usage ref
- if applicable -- some DataHandles don't need Sessions
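
A minimal sketch of the caching and usage-ref behaviour listed above. Everything here is hypothetical (`SketchSession`, `SketchSessionService`, `remoteOf`, `release`); none of it is existing scijava API. It also assumes the "remote" is the URI scheme plus authority, which is only one possible reading of "host/non-path part".

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a session; not the scijava API. */
class SketchSession {
    final String remote;
    int usageCount = 0; // the "usage ref" from the plan

    SketchSession(String remote) { this.remote = remote; }
}

/** Hypothetical sketch of the per-remote cache with reference counting. */
class SketchSessionService {
    private final Map<String, SketchSession> cache = new HashMap<>();

    /** Assumption: "remote" = scheme plus authority, i.e. the URI without its path. */
    static String remoteOf(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** Hands out the cached session for the remote, creating it if needed. */
    synchronized SketchSession fetchOrCreate(URI location) {
        SketchSession s = cache.computeIfAbsent(remoteOf(location), SketchSession::new);
        s.usageCount++; // incremented every time the session is handed out
        return s;
    }

    /** Would be called when a DataHandle that used the session is closed. */
    synchronized void release(SketchSession s) {
        if (s.usageCount > 0) s.usageCount--;
    }
}
```

Two locations on the same host share one session here, and each closed handle gives back exactly one reference.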
Protocols
- List of protocols we want:
- SSH SCP
- (S)FTP
- HTTP/HTTPS:
- with resume, to avoid excessive rereading
- support PUT for uploads?
- OMERO
- HDFS
- Cloud block storage:
- Amazon S3
- OpenStack Swift
- Azure Blob Storage
- Google Cloud Storage
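
For the HTTP/HTTPS "resume" item in the list above: resuming usually comes down to asking the server for the remaining bytes with a `Range` header. The sketch below only builds such a request with the JDK's `java.net.http` API (Java 11+) and does not send it. The class name `ResumeRequest` and the `alreadyRead` parameter are made up for illustration, and whether this is enough for our use case is still open.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch only: builds (but does not send) a request for the rest of a file. */
class ResumeRequest {
    /** @param alreadyRead number of bytes that were read before the interruption */
    static HttpRequest from(URI uri, long alreadyRead) {
        return HttpRequest.newBuilder(uri)
            // byte range: everything from offset alreadyRead to the end of the resource
            .header("Range", "bytes=" + alreadyRead + "-")
            .GET()
            .build();
    }
}
```

A server that honours the range answers with 206 Partial Content; one that ignores it answers with 200 and the full body, so the caller has to check the status before appending.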
Service/plugin architecture
- Session will need to be a new Plugin type
- SessionService will be a HandlerService
- Model it after DataHandleService; it is very similar, but with additional API, obviously -- e.g., fetchOrCreate or some such
- Update the DataHandleService to have a new method: `create(L extends Location, Session<L>)`
- This method allows reusing a specific session.
- This session may not be closed when the handle is closed
- The existing DataHandleService method `create(Location)` will simply ask the SessionService to `fetchOrCreate` a session for that Location.
- Naively, it might seem like we need a "Remote" interface or some such, but I actually think we won't need it. I think each Session will be able to extract the information it needs from its associated type of Location, and that will be good enough, and simpler.
- Finish the StreamHandle interface(?) for DataHandles that are built on InputStream/OutputStreams.
- Some location types like URLHandle would probably(?) benefit from extending StreamHandle
- But if none of the protocols would actually benefit from extending StreamHandle, we could decide not to do it.
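
A rough sketch of how the two `create` methods could relate. All types below are simplified, hypothetical stand-ins that only borrow the names from this plan; the real scijava interfaces carry far more API. It reads "may not be closed" as "the caller keeps ownership of a session it passed in", which is an assumption.

```java
import java.io.Closeable;

// Simplified, hypothetical stand-ins; not the real scijava interfaces.
interface Location {}

interface Session<L extends Location> {}

interface SessionService {
    <L extends Location> Session<L> fetchOrCreate(L location);
    void release(Session<?> session);
}

/** A handle that runs a callback when closed, e.g. to give its session back. */
class DataHandle<L extends Location> implements Closeable {
    final L location;
    final Session<L> session;
    private final Runnable onClose;

    DataHandle(L location, Session<L> session, Runnable onClose) {
        this.location = location;
        this.session = session;
        this.onClose = onClose;
    }

    @Override
    public void close() { onClose.run(); }
}

class DataHandleService {
    private final SessionService sessions;

    DataHandleService(SessionService sessions) { this.sessions = sessions; }

    /** New method: reuse a caller-supplied session; closing the handle leaves it alone. */
    <L extends Location> DataHandle<L> create(L location, Session<L> session) {
        return new DataHandle<>(location, session, () -> {});
    }

    /** Existing method: ask the SessionService, and release the session on close. */
    <L extends Location> DataHandle<L> create(L location) {
        Session<L> session = sessions.fetchOrCreate(location);
        return new DataHandle<>(location, session, () -> sessions.release(session));
    }
}
```

Note that no "Remote" type appears anywhere: the session service only ever sees the Location.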
Protocol specific notes
- Native HTTP / HTTPS support in Java seems to lack support for resume; it looks like we need an external library for that.
- We need to make sure that we only cache sessions that support concurrent access. -> Need a method `public boolean isConcurrent()` in the interface.
- Sessions might only allow limited concurrent access; if we encounter this, we will need a `public int concurrentAccessLimit()` method.
- When do we close automatically created sessions?
- Need a cache eviction strategy like LRU
- Don't immediately close Session when reference count reaches 0
- Maybe use a connection pool with a modifiable limit?
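
One way the caching notes above could fit together, as a sketch under assumptions: sessions are cached only if they report concurrent access, a session that reaches reference count 0 stays open, and idle sessions are closed least-recently-used first once there are more than a modifiable limit. `PooledSession`, `SessionPool`, `register`, `acquire` and `setMaxIdle` are invented names; the two proposed methods are spelled `isConcurrent` and `concurrentAccessLimit` here.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical session type carrying the two proposed methods. */
interface PooledSession {
    boolean isConcurrent();      // may the session be shared between handles?
    int concurrentAccessLimit(); // max simultaneous users (assumed > 0)
    void close();
}

/** Sketch: ref-counted cache that keeps idle sessions open, evicting them LRU. */
class SessionPool {
    private static final class Entry {
        final PooledSession session;
        int refs;
        Entry(PooledSession s) { session = s; }
    }

    // access-ordered map: iteration starts at the least recently used remote
    private final Map<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    private int maxIdle;

    SessionPool(int maxIdle) { this.maxIdle = maxIdle; }

    /** The "modifiable limit" on how many idle sessions are kept open. */
    synchronized void setMaxIdle(int maxIdle) { this.maxIdle = maxIdle; evictIdle(); }

    /** Returns the cached session if it can take another user, otherwise null. */
    synchronized PooledSession acquire(String remote) {
        Entry e = entries.get(remote);
        if (e == null || e.refs >= e.session.concurrentAccessLimit()) return null;
        e.refs++;
        return e.session;
    }

    /** Caches a freshly opened session with one user; others stay with the caller. */
    synchronized void register(String remote, PooledSession s) {
        if (!s.isConcurrent() || entries.containsKey(remote)) return;
        Entry e = new Entry(s);
        e.refs = 1;
        entries.put(remote, e);
    }

    /** Drops one reference; the session is not closed until it is evicted. */
    synchronized void release(String remote) {
        Entry e = entries.get(remote);
        if (e != null && e.refs > 0) e.refs--;
        evictIdle();
    }

    /** Closes least recently used idle sessions until at most maxIdle remain. */
    private void evictIdle() {
        int idle = 0;
        for (Entry e : entries.values()) if (e.refs == 0) idle++;
        Iterator<Entry> it = entries.values().iterator();
        while (idle > maxIdle && it.hasNext()) {
            Entry e = it.next();
            if (e.refs == 0) { e.session.close(); it.remove(); idle--; }
        }
    }
}
```

Sessions that are not cached (non-concurrent ones, or extra ones opened because the limit was reached) would have to be closed by whoever opened them.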