Make legacyDataType accessible? #2977

Open
hoijui opened this issue Jan 25, 2025 · 1 comment
Labels
enhancement Incrementally add new feature

Comments

hoijui commented Jan 25, 2025

Version

5.4.0-SNAPSHOT (a1ca2b7)

Feature

In jena-core/src/main/java/org/apache/jena/graph/NodeFactory.java, there is a line:

/*package*/ static final boolean legacyLangTag = false;

There is currently no way for a library user to change this value.
For my use-case (an RDF Linter), it is very useful to set this to true.
I did so by changing the value in the source, installing the library locally from that source, and then using that build in the linter. That allows me to find issues with language tags in Turtle files, for example.

Would it be an option to make this field public, so that I don't have to maintain a separate version of the library and keep it up to date just for this functionality?

Are you interested in contributing a solution yourself?

Yes

@hoijui hoijui added the enhancement Incrementally add new feature label Jan 25, 2025
afs commented Jan 27, 2025

There should be no problem exposing this for Eyeball-NG.
But it affects the whole system, unless you mean setting it to true, parsing, and then resetting it. When run as a command-line tool, that could be acceptable; when run as a library, it is not.

The rest of the system assumes language tags are unique.

A better way that should work:

There is a way to tap into exactly what is coming out of a parser before it reaches NodeFactory.

A parser run has a FactoryRDF object. It can be set with RDFParserBuilder.factory. All node creation should go via this route.

ParserProfile is the interface for events coming out of the parser, including node creation; it includes the line/column in the parser. It calls a FactoryRDF.

One method of FactoryRDF is createLangLiteral(String lexical, String langTag), so it sees the language tag before it goes to NodeFactory, which canonicalizes it.

By inheriting or wrapping, you could test the language tag and, having noted any issues, pass it on to the usual FactoryRDF method.
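
The wrapping variant can be sketched without Jena on the classpath. Everything here is a stand-in chosen for illustration: `LangLiteralFactory`, `BaseFactory`, and `CheckingFactory` are hypothetical names, and the one-method interface is not Jena's FactoryRDF (a real wrapper would have to delegate every FactoryRDF method). The point is only the shape: the wrapper sees the tag exactly as written, records it if a check fails, and then delegates unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

public class WrappingSketch {

    /** Hypothetical one-method stand-in; NOT Jena's FactoryRDF interface. */
    public interface LangLiteralFactory {
        String createLangLiteral(String lexical, String langTag);
    }

    /** Stand-in for the usual factory: it normalizes the tag, so the original spelling is lost. */
    public static class BaseFactory implements LangLiteralFactory {
        @Override
        public String createLangLiteral(String lexical, String langTag) {
            if (langTag == null)
                return "\"" + lexical + "\"";
            return "\"" + lexical + "\"@" + langTag.toLowerCase(Locale.ROOT);
        }
    }

    /** Wrapper: inspects the tag as written, notes problems, then delegates unchanged. */
    public static class CheckingFactory implements LangLiteralFactory {
        private final LangLiteralFactory delegate;
        private final Predicate<String> isSuspicious;
        public final List<String> flaggedTags = new ArrayList<>();

        public CheckingFactory(LangLiteralFactory delegate, Predicate<String> isSuspicious) {
            this.delegate = delegate;
            this.isSuspicious = isSuspicious;
        }

        @Override
        public String createLangLiteral(String lexical, String langTag) {
            if (langTag != null && isSuspicious.test(langTag))
                flaggedTags.add(langTag);
            return delegate.createLangLiteral(lexical, langTag);
        }
    }

    public static void main(String[] args) {
        // Illustrative check only: flag any tag containing an upper-case character.
        Predicate<String> hasUpperCase = t -> !t.equals(t.toLowerCase(Locale.ROOT));
        CheckingFactory factory = new CheckingFactory(new BaseFactory(), hasUpperCase);
        factory.createLangLiteral("abc1", "en");
        factory.createLangLiteral("abc2", "en-GB");
        System.out.println(factory.flaggedTags);
    }
}
```

Wrapping keeps the check independent of whichever factory is in use, at the cost of having to forward every method; inheriting from the standard implementation needs only the one override.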

It is only at NodeFactory.createLiteralLang/createLiteralDirLang that the language tag is manipulated. The equals and hashCode of Node_Literal are case sensitive so they don't get in the way.

Constructors for Node_Literal are package-scoped to prevent apps creating such bad literals.

Getting the line/column number needs a ParserProfile, but it is harder to set a custom one (possible, but it may need to be per language).

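    // Imports assume Jena 5.x package names; adjust if they differ in your version.
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.jena.graph.Node;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFParser;
    import org.apache.jena.riot.system.FactoryRDFStd;
    import org.apache.jena.riot.web.LangTag;

    // Records every language tag that is not already in canonical form,
    // then creates the literal in the usual way.
    static class FactoryRDF2 extends FactoryRDFStd {

        List<String> unwiseLangTags = new ArrayList<>();

        @Override
        public Node createLangLiteral(String lexical, String langTag) {
            if ( langTag != null ) {
                // Compare the tag as written in the data with its canonical form.
                String langTag2 = LangTag.canonical(langTag);
                if ( ! langTag.equals(langTag2) )
                    unwiseLangTags.add(langTag);
            }
            return super.createLangLiteral(lexical, langTag);
        }
    }

    public static void main(String... args) {

        String PREFIX = "PREFIX : <http://example/>\n";
        String data = PREFIX+"""
                :s1 :p1 'abc1'@en .
                :s2 :p2 'abc2'@en-GB .
                :s2 :p2 'abc3'@en-gb .
                  """;

        // Install the checking factory for this parser run only.
        FactoryRDF2 factory = new FactoryRDF2();
        RDFParser.fromString(data, Lang.TTL).factory(factory).toGraph();
        System.out.println(factory.unwiseLangTags);
    }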
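
To see which of the three tags in the data a "differs from its canonical form" comparison would catch, the check can be approximated with the JDK alone. This is a sketch under an assumption: `java.util.Locale` is used as a stand-in for Jena's `LangTag.canonical`, and the two may disagree on some inputs (for example, `Locale` can rewrite deprecated language codes and turns ill-formed tags into `und`). The class and method names are made up for illustration.

```java
import java.util.Locale;

public class LangTagCaseCheck {

    /** Case-normalized tag via java.util.Locale; an approximation, not Jena's LangTag.canonical. */
    public static String normalizeCase(String langTag) {
        return Locale.forLanguageTag(langTag).toLanguageTag();
    }

    /** True if the tag as written differs from its normalized form. */
    public static boolean isUnwise(String langTag) {
        return !langTag.equals(normalizeCase(langTag));
    }

    public static void main(String[] args) {
        // The same three tags as in the Turtle data.
        for (String tag : new String[] {"en", "en-GB", "en-gb"}) {
            System.out.println(tag + " -> " + normalizeCase(tag) + " unwise=" + isUnwise(tag));
        }
    }
}
```

With this stand-in, `en` and `en-GB` pass and only `en-gb` is reported, because the region subtag is normalized to upper case.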
