Commit 8eb1b04

Merge pull request #2006 from aboutcode-org/747-new-hashid
Create new aboutcode.federated library #747
2 parents 98e5160 + 7096cb5 commit 8eb1b04

7 files changed

+3196
-0
lines changed

aboutcode/federated/CHANGELOG.rst

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
Changelog
=============


v0.1.0 (October 20, 2025)
---------------------------

- Initial release of the ``aboutcode.federated`` library based on
  original work in the ``aboutcode.hashid`` library.

aboutcode/federated/README.rst

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
aboutcode.federated
===================

This is a library of utilities to compute ids and file paths for AboutCode
federated data based on Package URL (PURL).

The goal of these utilities is to handle content-defined, hash-addressable
package data that is keyed by PURL and stored in many Git repositories. This
approach to federating decentralized data is called FederatedCode.

Overview
========

The main design elements for these utilities are:

1. **Data Federation**: A Data Federation is a database representing a consistent,
   non-overlapping set of data kind clusters (like scans, vulnerabilities or SBOMs)
   across many package ecosystems, also known as PURL types.
   A Federation is similar to a traditional database.

2. **Data Cluster**: A Data Federation contains Data Clusters. The purpose of a
   Data Cluster is to store the data of a single kind (like scans) across multiple
   PURL types. The cluster name is the data kind name and is used as the prefix for
   repository names. A Data Cluster is akin to a table in a traditional database.

3. **Data Repository**: A Data Cluster consists of one or more Git Data Repositories,
   each storing the datafiles of the cluster's data kind for a single PURL type and
   spreading those datafiles across multiple Data Directories. The repository name
   is data-kind+PURL-type+hashid. A Repository is similar to a shard or tablespace
   in a traditional database.

4. **Data Directory**: In a Repository, a Data Directory contains the datafiles for
   PURLs. The directory name is PURL-type+hashid.

5. **Data File**: A Data File holds data of the Data Cluster's data kind. It is
   stored in subdirectories structured after the PURL components::

       namespace/name/version/qualifiers/subpath

   A Data File is stored at one of these levels:

   - Either at the PURL name level: namespace/name,
   - Or at the PURL version level: namespace/name/version,
   - Or at the PURL qualifiers+PURL subpath level.

   A Data File can be, for instance, a JSON scan results file or a list of PURLs in
   YAML.

   For example, a list of PURLs as a Data Kind would be stored at the name
   subdirectory level::

       gem-0107/gem/random_password_generator/purls.yml

   Or a ScanCode scan as a Data Kind at the version subdirectory level::

       gem-0107/npm/file/3.24.3/scancode.yml

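The path layout described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the library's API: the helper
name ``data_file_path`` and its parameters are hypothetical, the hashid is taken
as an already-computed string rather than derived from the PURL, and the ``-``
separator and the PURL type segment after the directory name are inferred from
the ``gem-0107/gem/...`` example.

```python
from pathlib import PurePosixPath


def data_file_path(purl_type, hashid, name, filename, namespace=None, version=None):
    """Build a relative Data File path (hypothetical helper, not the library API).

    The hashid is assumed to be computed elsewhere and passed in as a string.
    """
    # Data Directory name: PURL-type + hashid, joined with "-" as in "gem-0107".
    directory = f"{purl_type}-{hashid}"
    # The README examples repeat the PURL type right after the directory name.
    segments = [directory, purl_type]
    if namespace:
        segments.append(namespace)
    segments.append(name)
    # A version segment is only present for data stored at the version level.
    if version:
        segments.append(version)
    segments.append(filename)
    return str(PurePosixPath(*segments))


# Name-level data file, matching the list-of-PURLs example above.
print(data_file_path("gem", "0107", "random_password_generator", "purls.yml"))
```

Passing a ``version`` argument adds one more path segment before the file name,
which corresponds to the version-level layout. Qualifiers and subpath are left
out of this sketch.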


License
-------

Copyright (c) AboutCode and others. All rights reserved.

SPDX-License-Identifier: Apache-2.0

See https://github.com/aboutcode-org/vulnerablecode for support or download.

See https://aboutcode.org for more information about AboutCode OSS projects.
