Assembler: Mode.DEFAULT / Mode.REUSE are not considered and result in Mode.CREATE #1476

Open
Aklakan opened this issue Aug 11, 2022 · 3 comments

@Aklakan
Contributor

Aklakan commented Aug 11, 2022

Version

4.6.0-SNAPSHOT

What happened?

(Credits to @LorenzBuehmann for spotting this)

While creating an assembler config that defines two services with different configurations over the same spatial dataset, we noticed that the spatial index was unexpectedly loaded twice. The assembler framework documentation suggests that, by default, the Java object created from an RDF resource is reused wherever that RDF resource is referenced in the assembler spec.
However, in the code the Mode flag is ultimately ignored and a fresh Java object is always returned:

public abstract class DatasetAssembler {
    @Override
    public Dataset open(Assembler a, Resource root, Mode mode) {
        // 'mode' is never consulted: a new dataset is created on every call.
        DatasetGraph dsg = createDataset(a, root) ;
        return DatasetFactory.wrap(dsg);
    }
}

I am not yet sure about the assembler life-cycle: whether the cache of created objects would have to be passed to the assembler as an additional argument, or whether it can be an attribute of the assembler.
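To make the question concrete, here is a minimal sketch of a cache that honours a reuse flag. Everything in it is hypothetical: `ReusePolicy` and `PolicyAwareCache` are illustrative names, the policy values are not claimed to match the semantics of Jena's `Mode` constants, and a plain string stands in for the RDF resource. The cache is an explicit object, so it could equally be passed in as an argument or held as an attribute of an assembler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical reuse policy; NOT Jena's Mode class. */
enum ReusePolicy { ALWAYS_CREATE, REUSE_OR_CREATE, REUSE_ONLY }

/** Remembers built objects by the URI of the describing resource (single-threaded sketch). */
final class PolicyAwareCache<T> {
    private final Map<String, T> built = new HashMap<>();

    T open(String resourceUri, ReusePolicy policy, Function<String, T> factory) {
        if (policy == ReusePolicy.ALWAYS_CREATE) {
            // Fresh object every time; it is not recorded in the cache.
            return factory.apply(resourceUri);
        }
        T existing = built.get(resourceUri);
        if (existing != null) {
            return existing;
        }
        if (policy == ReusePolicy.REUSE_ONLY) {
            throw new IllegalStateException("Nothing built yet for " + resourceUri);
        }
        // REUSE_OR_CREATE: build once, then remember for later references.
        T created = factory.apply(resourceUri);
        built.put(resourceUri, created);
        return created;
    }
}
```

With `REUSE_OR_CREATE`, two references to the same resource URI yield the same Java object, which is the behaviour the spatial-index case expected.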

Are you interested in making a pull request?

Maybe

@Aklakan Aklakan added the bug label Aug 11, 2022
@afs
Member

afs commented Aug 16, 2022

Simpler example: two references to an in-memory dataset. This always causes two databases to be created.

:service1 rdf:type fuseki:Service ;
    fuseki:name "first" ;
    fuseki:dataset :dataset ;
    .
:service2 rdf:type fuseki:Service ;
    fuseki:name "second" ;
    fuseki:dataset :dataset ;
    .
:dataset rdf:type ja:MemoryDataset ;
    .

but

:dataset rdf:type  tdb2:DatasetTDB2 ;
    tdb2:location "DB2" ;
    .

is one database because TDB manages instances itself (it has to anyway) using the location as a key.
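The idea of managing instances by location can be illustrated with a small registry. This is only a sketch of the general technique, not TDB2's actual implementation; `StoreHandle` and `StoreRegistry` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a handle on a disk-backed database. */
final class StoreHandle {
    final String location;
    StoreHandle(String location) { this.location = location; }
}

/** Process-wide registry: at most one handle per storage location. */
final class StoreRegistry {
    private static final Map<String, StoreHandle> OPEN = new ConcurrentHashMap<>();

    /** Returns the existing handle for the location, creating it on first use. */
    static StoreHandle connect(String location) {
        return OPEN.computeIfAbsent(location, StoreHandle::new);
    }
}
```

Because the key is the location rather than the RDF description, any number of assembler references to "DB2" end up sharing one handle, regardless of how the assembler treats reuse.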

Dealing with shared/fresh via Mode does not work. In the first example, should it be one database or two? Either may have been intended.

The assembler design implicitly assumes use-once assemblers (which contradicts the registry) and use on a single thread.

  • There needs to be a scope object passed around.
  • There needs to be a better policy for same vs fresh.

The assembler subsystem needs replacing with a new, simplified API. Realistically, that means a new "thing builder", with a new API and a new algorithm.

Fuseki is the main user nowadays and ought to be the primary use case. A Fuseki service is not itself assembled; specific code processes its description. Only the dataset of the service is built by an assembler.

Probably (it needs detailed analysis) the build policy should be "same URI, same thing" per build run which an individual builder can ignore. Something like

    interface Constructor<X> {
        // "Same URI, same thing": build the item at most once per build context.
        default X construct(BuildContext cxt, Graph descriptionGraph, Node description) {
            return cxt.computeIfAbsent(description, (d)->newItem(cxt, descriptionGraph, d));
        }
        // Always builds a fresh item; concrete builders override this placeholder.
        default X newItem(BuildContext cxt, Graph descriptionGraph, Node description) { return null; }
    }

but before code, we need to scope this with use cases and ask what use is made of existing assemblers outside of Fuseki.
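For discussion purposes, the "same URI, same thing per build run" policy could look roughly like the following self-contained sketch. `BuildScope` and `ThingBuilder` are hypothetical names, the description graph is left out, and a plain string stands in for the description node; none of this is an existing Jena API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-build-run scope: remembers what each description produced. */
final class BuildScope {
    private final Map<String, Object> built = new HashMap<>();

    @SuppressWarnings("unchecked")
    <X> X computeIfAbsent(String description, Function<String, X> maker) {
        Object existing = built.get(description);
        if (existing == null) {
            // Build first, then record: the maker may recursively build other descriptions.
            existing = maker.apply(description);
            built.put(description, existing);
        }
        return (X) existing;
    }
}

/** Builder for one kind of thing. */
interface ThingBuilder<X> {
    /** Shared result: the same description yields the same object within one scope. */
    default X construct(BuildScope scope, String description) {
        return scope.computeIfAbsent(description, d -> newItem(scope, d));
    }

    /** Always builds a fresh object. */
    X newItem(BuildScope scope, String description);
}
```

A builder that wants to ignore the policy can override `construct` to delegate straight to `newItem`. A new `BuildScope` per build run keeps sharing local to that run.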

The new process would be able to call existing assemblers via Assembler.general, so it supports everything legacy, including legacy configurations in the field. It should work on graph/node, not Resource.

The good news is that building services in Fuseki is located in one area of code and also that the core of build-dispatch isn't that large.

tl;dr: design first!

@Aklakan
Contributor Author

Aklakan commented Aug 16, 2022

> Probably (it needs detailed analysis) the build policy should be "same URI, same thing" per build run which an individual builder can ignore.

The concept of the assembler is very similar to that of Spring bean definitions, which support different scopes:
https://docs.spring.io/spring-framework/docs/3.0.0.M3/reference/html/ch04s04.html

The most relevant ones are:

Scope      Description
singleton  Scopes a single bean definition to a single object instance per Spring IoC container.
prototype  Scopes a single bean definition to any number of object instances.

In Java this is done with:

class MyBeanDefinitions {
  @Bean
  @Scope("prototype")
  MyBean myBean() { ... }

  // When Spring IoC proceeds to create the dependent beans, it takes
  // care of calling the following bean creation functions each with a fresh instance
  // of MyBean because of its prototype scope annotation:
  @Bean
  MyDependentBean myDependentBean1(MyBean myBean) { ... } 

  @Bean
  MyDependentBean myDependentBean2(MyBean myBean) { ... } 
}

Perhaps these annotations could simply be expressed in the RDF using a ja:scope property?
E.g.

  <#myMemoryDatasetBean> a ja:MemoryDataset ;
   ja:scope ja:prototype ; # Each reference to <#myMemoryDatasetBean> would create a new Java object
   .

A limitation of this approach might be that it becomes difficult to reference a specific instance. But then again there could be a special helper construct for turning a "prototype" bean into a "singleton" one, such as:

  <#myConcreteBean> a ja:Reference ;
  ja:scope ja:singleton ; # singleton scope should be the default anyway
  ja:target <#myMemoryDatasetBean> .

Now any reference to <#myConcreteBean> would refer to the same Java object, whereas any reference to <#myMemoryDatasetBean> would create a new instance.
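How a builder might interpret such scopes can be sketched as follows. This is only an illustration of the proposal: `ja:scope` and `ja:Reference` do not exist in the assembler vocabulary, and `BeanScope` and `ScopedResolver` are hypothetical names with strings standing in for RDF resources.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical counterpart of the proposed ja:singleton / ja:prototype values. */
enum BeanScope { SINGLETON, PROTOTYPE }

/** Resolves references to described objects according to their declared scope. */
final class ScopedResolver {
    private final Map<String, BeanScope> scopes = new HashMap<>();
    private final Map<String, Supplier<Object>> factories = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();

    void define(String uri, BeanScope scope, Supplier<Object> factory) {
        scopes.put(uri, scope);
        factories.put(uri, factory);
    }

    Object resolve(String uri) {
        Supplier<Object> factory = factories.get(uri);
        if (factory == null) {
            throw new IllegalArgumentException("No definition for " + uri);
        }
        if (scopes.get(uri) == BeanScope.PROTOTYPE) {
            // Prototype: every reference gets its own new object.
            return factory.get();
        }
        // Singleton: build on first reference, then share.
        Object existing = singletons.get(uri);
        if (existing == null) {
            existing = factory.get();
            singletons.put(uri, existing);
        }
        return existing;
    }
}
```

The reference construct then falls out naturally: a singleton definition whose factory resolves the prototype pins one particular instance of it, for example `r.define("#myConcreteBean", BeanScope.SINGLETON, () -> r.resolve("#myMemoryDatasetBean"))`.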

@afs
Member

afs commented Aug 16, 2022

On the theme of prototype but making it context sensitive, not static either-or, and more RDF-y:

That concept can be extended with an intermediate:

<#fresh> a ja:NewInstance ;
    ja:prototype <#description>;
    .

i.e. the constructor for ja:NewInstance would make a newItem call to <#description> whereas direct use would make a construct call.
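A rough sketch of that dispatch, under several assumptions: `BuildRun` and its methods are hypothetical, strings stand in for RDF nodes, and the choice that the ja:NewInstance node itself is shared within a run is a guess, since the comment leaves it open.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical build run with a "construct" (shared) and a "newItem" (fresh) path. */
final class BuildRun {
    private final Map<String, Supplier<Object>> makers = new HashMap<>();
    private final Map<String, Object> built = new HashMap<>();

    /** Registers how to make a fresh object for an ordinary description. */
    void describe(String node, Supplier<Object> maker) {
        makers.put(node, maker);
    }

    /** Registers a NewInstance-style node: building it makes a newItem call on the prototype. */
    void describeNewInstance(String node, String prototype) {
        makers.put(node, () -> newItem(prototype));
    }

    /** Always returns a fresh object for the description. */
    Object newItem(String node) {
        Supplier<Object> maker = makers.get(node);
        if (maker == null) {
            throw new IllegalArgumentException("No description for " + node);
        }
        return maker.get();
    }

    /** Direct use: the same description yields the same object within this run. */
    Object construct(String node) {
        Object existing = built.get(node);
        if (existing == null) {
            existing = newItem(node);
            built.put(node, existing);
        }
        return existing;
    }
}
```

Here, direct references to `#description` share one object, while `#fresh` gets its own copy built from the same description.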

But we need Jena/Fuseki use cases first. Avoid over-engineering!
