Clarification on graph digest function #825

Open
@scossu

I was glad to discover that RDFLib implements an algorithm to calculate the checksum, or digest, of a graph. However, I am confused by the results I am getting.

The documentation does not say which hash algorithm produces the output, and the output format looks quite unusual to me:

>>> from rdflib import Graph, URIRef, compare
>>> gr = Graph()
>>> gr.add((URIRef('urn:s:1'), URIRef('urn:p:1'), URIRef('urn:o:1')))
>>> igr = compare.to_isomorphic(gr)
>>> igr
 <Graph identifier=N7c1f147d3aef41c38f4e65dadb69fb0f (<class 'rdflib.compare.IsomorphicGraph'>)
>>> stats = {}
>>> digest = igr.graph_digest(stats)
>>> digest
36839361122531509846328502269903823252312473480584219232869415679526775553513
>>> type(digest)
int
>>> len(str(digest))
77
>>> print(stats)
{'individuations': 0, 'graph_digest': '517256e8a181f880608b974dd782af082175884605e793549529f9e8b84ffde9', 'color_count': 0, 'initial_color_count': 0, 'to_hash_runtime': 0.000295, 'adjacent_nodes': 0, 'tree_depth': 0, 'triple_count': 1, 'initial_coloring_runtime': 0.000139, 'canonicalize_triples_runtime': 0.000166}
>>> len(stats['graph_digest'])
65

What does the result of the function represent? How should I interpret the 77-digit integer that I receive?

Note that the number of digits may change with the graph. A different graph gives me 78 digits.
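
To illustrate why the digit count can vary, here is a minimal sketch using only built-in Python integers. It assumes (this is not confirmed by the documentation) that the digest behaves like an unsigned integer of roughly 256 bits. Under that assumption, a change between 77 and 78 decimal digits would not by itself mean the digest changes size, because the decimal length of a fixed-width binary value depends on its magnitude.

```python
# Value copied from the session above.
digest = 36839361122531509846328502269903823252312473480584219232869415679526775553513

# Assumption: the digest is an unsigned integer of about 256 bits.
decimal_digits = len(str(digest))
bits = digest.bit_length()

# Decimal width of the largest value that fits in 256 bits.
max_256_digits = len(str(2**256 - 1))

print("decimal digits:", decimal_digits)
print("bit length:", bits)
print("digits of largest 256-bit value:", max_256_digits)
```

If the bit length stays at or below 256 while the decimal digit count moves between 77 and 78, the integer is consistent with a 256-bit hash printed in base 10.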

What does the "graph_digest" key in the stats represent? It is made up of 65 hex characters!
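
One way to check how the two values relate is to parse the hex string as a base-16 integer and compare it with the returned integer. The sketch below uses only built-in Python and the values copied from the session above; that the two are the same number in different bases is a guess to be tested, not something the documentation states.

```python
# Values copied from the session above.
digest = 36839361122531509846328502269903823252312473480584219232869415679526775553513
hex_digest = "517256e8a181f880608b974dd782af082175884605e793549529f9e8b84ffde9"

# Parse the hex string as an integer and convert it back again.
as_int = int(hex_digest, 16)
round_trip = format(as_int, "x")

print("hex characters in the string as copied:", len(hex_digest))
print("decimal digits of the parsed value:", len(str(as_int)))
print("same number as the returned integer:", as_int == digest)
```

If the last line prints `True`, the `graph_digest` entry in `stats` is just the hexadecimal rendering of the integer returned by `graph_digest()`. Note that `len()` in this sketch counts the string as it is copied here, which may differ from the count reported in the session.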

Labels: documentation, id-as-cntxt