rdflib.namespace + rdflib.term Type Refactoring #1241

@JaimieMurdock

Description

The class hierarchy for rdflib.namespace and rdflib.term is introducing implementation errors.

- object
  - rdflib.namespace.ClosedNamespace
    - rdflib.namespace._RDFNamespace
  - rdflib.term.Node
    - rdflib.term.Identifier
      - rdflib.term.BNode
      - rdflib.term.Literal
      - rdflib.term.URIRef
        - rdflib.term.Genid
        - rdflib.term.RDFLibGenid
- str
  - rdflib.namespace.Namespace
  - rdflib.namespace.URIPattern

For example, because Namespace and ClosedNamespace are unrelated classes, each carries its own definition of term, __getitem__, __getattr__, and (with PR#1237) __contains__.

Additionally, ClosedNamespace does not inherit from str, so it lacks the str conveniences that Namespace has, and the two are not interchangeable. For example, to check whether ref = URIRef(...) falls under ns = Namespace(...), the code would be ref.startswith(ns). With cns = ClosedNamespace(...), however, that call is an error, and the code has to be ref.startswith(cns.uri) (ignoring for a moment the validity of the membership relationship; this only checks prefixes).
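A minimal sketch of the difference, using simplified stand-in classes rather than rdflib's real ones (the class bodies, the `terms` argument, and the example.org URIs are assumptions made for illustration; only the `.uri` attribute and the str inheritance come from the description above):

```python
class URIRef(str):
    """Simplified stand-in for rdflib.term.URIRef."""


class Namespace(str):
    """Simplified stand-in: inherits from str, as described above."""


class ClosedNamespace:
    """Simplified stand-in: a plain object that keeps its base URI in .uri."""

    def __init__(self, uri, terms):
        self.uri = uri
        self.terms = frozenset(terms)


ref = URIRef("http://example.org/ns#foo")
ns = Namespace("http://example.org/ns#")
cns = ClosedNamespace("http://example.org/ns#", ["foo"])

# Works: ns is a str, so it is a valid prefix argument.
in_open = ref.startswith(ns)

# Fails: str.startswith only accepts a str or a tuple of str.
try:
    ref.startswith(cns)
    closed_error = None
except TypeError as exc:
    closed_error = exc

# Workaround: pass the underlying URI string explicitly.
in_closed = ref.startswith(cns.uri)
```

With these stand-ins, the only way to get a prefix check against the closed namespace is to reach into `cns.uri`, which is the portability gap in question.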

First Step

Refactoring the Namespace and ClosedNamespace hierarchy should be easy to do without breaking existing functionality.

Proposed class hierarchy:

- str
  - rdflib.namespace.Namespace
    - rdflib.namespace.ClosedNamespace
      - rdflib.namespace._RDFNamespace

As a side effect, ClosedNamespace would gain the methods of the str type that it currently lacks.
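One way the proposed hierarchy could look, as a rough sketch rather than rdflib's actual code. The method bodies, the `terms` constructor argument, the `_terms` attribute, and the choice of KeyError for unknown terms are all assumptions; the point is only that term, __getitem__, and __getattr__ are defined once on the base class and the closed variant overrides a single method:

```python
class URIRef(str):
    """Simplified stand-in for rdflib.term.URIRef."""


class Namespace(str):
    """Open namespace: any name yields a term."""

    def term(self, name):
        return URIRef(self + name)

    def __getitem__(self, key):
        # Defined once; subclasses customise behaviour through term().
        return self.term(key)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        try:
            return self.term(name)
        except KeyError as exc:
            raise AttributeError(name) from exc


class ClosedNamespace(Namespace):
    """Closed namespace: only the listed terms are allowed."""

    def __new__(cls, uri, terms):
        self = super().__new__(cls, uri)
        self._terms = frozenset(terms)  # hypothetical attribute name
        return self

    def term(self, name):
        if name not in self._terms:
            raise KeyError(f"term '{name}' not in namespace '{self}'")
        return super().term(name)

    def __contains__(self, ref):
        # Membership means: correct prefix and a known local name.
        return ref.startswith(self) and ref[len(self):] in self._terms


ns = Namespace("http://example.org/ns#")
cns = ClosedNamespace("http://example.org/ns#", ["foo"])
ref = URIRef("http://example.org/ns#foo")
```

Here `ref.startswith(cns)` works without going through `cns.uri`, and `cns["foo"]`, `cns.foo`, and `ref in cns` all route through the single overridden `term` or the one `__contains__`. A caveat of any str-based namespace is that names matching str methods (for example `title`) resolve to the method rather than a term when accessed as attributes.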

This may already be covered by the open PR #1213. However, the implementation there doesn't address the duplicate implementations of __getitem__, __getattr__, and term.

Next Steps

Investigate issues with the URIRef representation. Identifier currently uses multiple inheritance from Node and text_type. Double-check the representation.
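To make the multiple-inheritance question concrete, here is a small sketch with stand-in classes (not rdflib's definitions; `str` stands in for text_type, and the `__repr__` on URIRef is a hypothetical override added for illustration). It shows how the method resolution order decides which `__repr__` an instance ends up with:

```python
class Node:
    """Stand-in for rdflib.term.Node; defines no __repr__ of its own."""


class Identifier(Node, str):
    """Stand-in for rdflib.term.Identifier, with str in place of text_type."""


class URIRef(Identifier):
    """Stand-in with a hypothetical explicit representation."""

    def __repr__(self):
        return f"URIRef({str.__repr__(self)})"


# Resolution order: the class itself, then Node, then str, then object.
mro_names = [cls.__name__ for cls in URIRef.__mro__]

ident = Identifier("http://example.org/a")
ref = URIRef("http://example.org/a")

# Identifier falls through Node to str.__repr__; URIRef uses its override.
ident_repr = repr(ident)
ref_repr = repr(ref)
```

In this sketch a subclass that does not define `__repr__` silently inherits the quoted-string form from str, which is the kind of behaviour worth confirming against the real classes.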

There is a lot of added complexity coming from Python 2 support and six. Removing six and the now-EOL Python 2 compatibility could simplify long-term maintenance. (#1014 polled on this for 6.0.0, but there doesn't seem to be an issue open to remove it and coordinate?) We'd also be inheriting from stdlib str and not text_type!

Scope Creep

I'm unsure of what the proper granularity for a ticket in rdflib is, but if we want to add some outer bounds for scope: adding type annotations to the code base may help tremendously with finding errors like this.

Also, I'd like to contribute, not just make work for other people :) I've got dev cycles I can put on rdflib and am happy to discuss the best direction for my energy with the maintainers. Already meeting to discuss some work on PySHACL soon. If there's a public roadmap beyond the 6.0.0 tag or Wiki, I'd love to learn more!
