Commit 0065d2b

Add gearhash, xetchunk and blake3 WASM packages (#1500)
This PR is an experiment to see what a pure-WASM [gearhash](https://github.com/srijs/rust-gearhash) looks like, using [AssemblyScript](https://www.assemblyscript.org), a language purpose-built to generate WASM.

Generating WASM from Rust requires a lot of glue; the glue for AssemblyScript seems a lot thinner:

```js
async function instantiate(module, imports = {}) {
  const { exports } = await WebAssembly.instantiate(module, imports);
  return exports;
}
export const {
  memory,
  add,
} = await (async url => instantiate(
  await (async () => {
    const isNodeOrBun = typeof process != "undefined" && process.versions != null && (process.versions.node != null || process.versions.bun != null);
    if (isNodeOrBun) { return globalThis.WebAssembly.compile(await (await import("node:fs/promises")).readFile(url)); }
    else { return await globalThis.WebAssembly.compileStreaming(globalThis.fetch(url)); }
  })(), {
  }
))(new URL("debug.wasm", import.meta.url));
```

It will also be interesting to compare WASM size between pure WASM and from-Rust WASM, as well as to provide `gearhash` to the wider JS/npm ecosystem.

`gearhash` is a core component of the Xet protocol's chunking and hashing, and a cool piece of tech!

cc @Wauplin @hanouticelina @assafvayner @rajatarya for viz

Note: also added `xetChunk` and `blake3` packages
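For readers unfamiliar with gear hashing, here is a minimal JavaScript sketch of the general technique, not the code in this PR. It assumes the usual gear update `hash = (hash << 1) + TABLE[byte]` on 64-bit values and declares a chunk boundary wherever `hash & mask` is zero. The table is filled from a SplitMix64 generator purely for illustration and is not the table used by `rust-gearhash`; the names `makeTable` and `findBoundaries` are hypothetical.

```javascript
// Illustrative gear rolling hash. The table is NOT the rust-gearhash table;
// it is generated with SplitMix64 only so the example is self-contained.
const MASK64 = (1n << 64n) - 1n;

// One SplitMix64 step: returns the next state and a 64-bit output.
function splitmix64(state) {
  const next = (state + 0x9e3779b97f4a7c15n) & MASK64;
  let z = next;
  z = ((z ^ (z >> 30n)) * 0xbf58476d1ce4e5b9n) & MASK64;
  z = ((z ^ (z >> 27n)) * 0x94d049bb133111ebn) & MASK64;
  return { state: next, value: z ^ (z >> 31n) };
}

// Hypothetical 256-entry table: one pseudo-random 64-bit value per byte value.
function makeTable(seed) {
  const table = [];
  let state = seed;
  for (let i = 0; i < 256; i++) {
    const step = splitmix64(state);
    state = step.state;
    table.push(step.value);
  }
  return table;
}

// Returns chunk end offsets. A boundary is declared after any byte where
// (hash & mask) === 0n; the final offset always closes the last chunk.
function findBoundaries(data, table, mask) {
  const boundaries = [];
  let hash = 0n;
  for (let i = 0; i < data.length; i++) {
    // Gear update: shift left by one bit, add the table entry, wrap to 64 bits.
    hash = ((hash << 1n) + table[data[i]]) & MASK64;
    if ((hash & mask) === 0n) boundaries.push(i + 1);
  }
  if (data.length > 0 && boundaries[boundaries.length - 1] !== data.length) {
    boundaries.push(data.length);
  }
  return boundaries;
}
```

Because each byte's contribution is shifted left one bit per step, the upper bits of the hash mix more of the preceding bytes than the lower bits do, which is why a mask over the top bits (for example `0xfc00000000000000n`) is a reasonable choice for a demonstration.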
1 parent 62ae923 commit 0065d2b

77 files changed: +5971 -0 lines changed
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
name: Blake3 WASM - Version and Release

on:
  workflow_dispatch:
    inputs:
      newversion:
        type: choice
        description: "Semantic Version Bump Type"
        default: patch
        options:
          - patch
          - minor
          - major

concurrency:
  group: "push-to-main"

defaults:
  run:
    working-directory: packages/blake3-wasm

jobs:
  version_and_release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Needed to push the tag and the commit on the main branch, otherwise we get:
          # > Run git push --follow-tags
          # remote: error: GH006: Protected branch update failed for refs/heads/main.
          # remote: error: Changes must be made through a pull request. Required status check "lint" is expected.
          token: ${{ secrets.BOT_ACCESS_TOKEN }}
      - run: npm install -g corepack@latest && corepack enable
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          cache: "pnpm"
          cache-dependency-path: |
            packages/blake3-wasm/pnpm-lock.yaml
          # setting a registry enables the NODE_AUTH_TOKEN env variable where we can set an npm token. REQUIRED
          registry-url: "https://registry.npmjs.org"
      - run: pnpm install
      - run: git config --global user.name machineuser
      - run: git config --global user.email [email protected]
      - run: |
          PACKAGE_VERSION=$(node -p "require('./package.json').version")
          BUMPED_VERSION=$(node -p "require('semver').inc('$PACKAGE_VERSION', '${{ github.event.inputs.newversion }}')")
          # Update package.json with the new version
          node -e "const fs = require('fs'); const package = JSON.parse(fs.readFileSync('./package.json')); package.version = '$BUMPED_VERSION'; fs.writeFileSync('./package.json', JSON.stringify(package, null, '\t') + '\n');"
          git commit . -m "🔖 @huggingface/blake3-wasm $BUMPED_VERSION"
          git tag "blake3-wasm-v$BUMPED_VERSION"
      - run: pnpm publish --no-git-checks .
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
      - run: (git pull --rebase && git push --follow-tags) || (git pull --rebase && git push --follow-tags)
      # hack - reuse actions/setup-node@v3 just to set a new registry
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          registry-url: "https://npm.pkg.github.com"
      # Disable for now, until github supports PATs for writing github packages (https://github.com/github/roadmap/issues/558)
      # - run: pnpm publish --no-git-checks .
      #   env:
      #     NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
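
The version bump step in the workflow above delegates to `semver`'s `inc`. As a rough illustration of what that step computes for plain `x.y.z` versions, here is a simplified stand-in. The function `bumpVersion` is hypothetical and deliberately ignores prerelease and build metadata, which the real library handles.

```javascript
// Simplified stand-in for semver's inc(): supports plain "x.y.z" only,
// with no prerelease tags or build metadata.
function bumpVersion(version, type) {
  const parts = version.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version: ${version}`);
  }
  const [major, minor, patch] = parts;
  switch (type) {
    case "major":
      return `${major + 1}.0.0`; // minor and patch reset to 0
    case "minor":
      return `${major}.${minor + 1}.0`; // patch resets to 0
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      throw new Error(`Unknown bump type: ${type}`);
  }
}
```

With the workflow's default input (`patch`), a package at `1.2.3` would be bumped to `1.2.4` and tagged `blake3-wasm-v1.2.4`.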
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
name: Gearhash WASM - Version and Release

on:
  workflow_dispatch:
    inputs:
      newversion:
        type: choice
        description: "Semantic Version Bump Type"
        default: patch
        options:
          - patch
          - minor
          - major

concurrency:
  group: "push-to-main"

defaults:
  run:
    working-directory: packages/gearhash-wasm

jobs:
  version_and_release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Needed to push the tag and the commit on the main branch, otherwise we get:
          # > Run git push --follow-tags
          # remote: error: GH006: Protected branch update failed for refs/heads/main.
          # remote: error: Changes must be made through a pull request. Required status check "lint" is expected.
          token: ${{ secrets.BOT_ACCESS_TOKEN }}
      - run: npm install -g corepack@latest && corepack enable
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          cache: "pnpm"
          cache-dependency-path: |
            packages/gearhash-wasm/pnpm-lock.yaml
          # setting a registry enables the NODE_AUTH_TOKEN env variable where we can set an npm token. REQUIRED
          registry-url: "https://registry.npmjs.org"
      - run: pnpm install
      - run: git config --global user.name machineuser
      - run: git config --global user.email [email protected]
      - run: |
          PACKAGE_VERSION=$(node -p "require('./package.json').version")
          BUMPED_VERSION=$(node -p "require('semver').inc('$PACKAGE_VERSION', '${{ github.event.inputs.newversion }}')")
          # Update package.json with the new version
          node -e "const fs = require('fs'); const package = JSON.parse(fs.readFileSync('./package.json')); package.version = '$BUMPED_VERSION'; fs.writeFileSync('./package.json', JSON.stringify(package, null, '\t') + '\n');"
          git commit . -m "🔖 @huggingface/gearhash-wasm $BUMPED_VERSION"
          git tag "gearhash-wasm-v$BUMPED_VERSION"
      - run: pnpm publish --no-git-checks .
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
      - run: (git pull --rebase && git push --follow-tags) || (git pull --rebase && git push --follow-tags)
      # hack - reuse actions/setup-node@v3 just to set a new registry
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          registry-url: "https://npm.pkg.github.com"
      # Disable for now, until github supports PATs for writing github packages (https://github.com/github/roadmap/issues/558)
      # - run: pnpm publish --no-git-checks .
      #   env:
      #     NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
name: Splitmix64 WASM - Version and Release

on:
  workflow_dispatch:
    inputs:
      newversion:
        type: choice
        description: "Semantic Version Bump Type"
        default: patch
        options:
          - patch
          - minor
          - major

concurrency:
  group: "push-to-main"

defaults:
  run:
    working-directory: packages/splitmix64-wasm

jobs:
  version_and_release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Needed to push the tag and the commit on the main branch, otherwise we get:
          # > Run git push --follow-tags
          # remote: error: GH006: Protected branch update failed for refs/heads/main.
          # remote: error: Changes must be made through a pull request. Required status check "lint" is expected.
          token: ${{ secrets.BOT_ACCESS_TOKEN }}
      - run: npm install -g corepack@latest && corepack enable
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          cache: "pnpm"
          cache-dependency-path: |
            packages/splitmix64-wasm/pnpm-lock.yaml
          # setting a registry enables the NODE_AUTH_TOKEN env variable where we can set an npm token. REQUIRED
          registry-url: "https://registry.npmjs.org"
      - run: pnpm install
      - run: git config --global user.name machineuser
      - run: git config --global user.email [email protected]
      - run: |
          PACKAGE_VERSION=$(node -p "require('./package.json').version")
          BUMPED_VERSION=$(node -p "require('semver').inc('$PACKAGE_VERSION', '${{ github.event.inputs.newversion }}')")
          # Update package.json with the new version
          node -e "const fs = require('fs'); const package = JSON.parse(fs.readFileSync('./package.json')); package.version = '$BUMPED_VERSION'; fs.writeFileSync('./package.json', JSON.stringify(package, null, '\t') + '\n');"
          git commit . -m "🔖 @huggingface/splitmix64-wasm $BUMPED_VERSION"
          git tag "splitmix64-wasm-v$BUMPED_VERSION"
      - run: pnpm publish --no-git-checks .
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
      - run: (git pull --rebase && git push --follow-tags) || (git pull --rebase && git push --follow-tags)
      # hack - reuse actions/setup-node@v3 just to set a new registry
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          registry-url: "https://npm.pkg.github.com"
      # Disable for now, until github supports PATs for writing github packages (https://github.com/github/roadmap/issues/558)
      # - run: pnpm publish --no-git-checks .
      #   env:
      #     NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
name: Xetchunk WASM - Version and Release

on:
  workflow_dispatch:
    inputs:
      newversion:
        type: choice
        description: "Semantic Version Bump Type"
        default: patch
        options:
          - patch
          - minor
          - major

concurrency:
  group: "push-to-main"

defaults:
  run:
    working-directory: packages/xetchunk-wasm

jobs:
  version_and_release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Needed to push the tag and the commit on the main branch, otherwise we get:
          # > Run git push --follow-tags
          # remote: error: GH006: Protected branch update failed for refs/heads/main.
          # remote: error: Changes must be made through a pull request. Required status check "lint" is expected.
          token: ${{ secrets.BOT_ACCESS_TOKEN }}
      - run: npm install -g corepack@latest && corepack enable
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          cache: "pnpm"
          cache-dependency-path: |
            packages/xetchunk-wasm/pnpm-lock.yaml
          # setting a registry enables the NODE_AUTH_TOKEN env variable where we can set an npm token. REQUIRED
          registry-url: "https://registry.npmjs.org"
      - run: pnpm install
      - run: git config --global user.name machineuser
      - run: git config --global user.email [email protected]
      - run: |
          PACKAGE_VERSION=$(node -p "require('./package.json').version")
          BUMPED_VERSION=$(node -p "require('semver').inc('$PACKAGE_VERSION', '${{ github.event.inputs.newversion }}')")
          # Update package.json with the new version
          node -e "const fs = require('fs'); const package = JSON.parse(fs.readFileSync('./package.json')); package.version = '$BUMPED_VERSION'; fs.writeFileSync('./package.json', JSON.stringify(package, null, '\t') + '\n');"
          git commit . -m "🔖 @huggingface/xetchunk-wasm $BUMPED_VERSION"
          git tag "xetchunk-wasm-v$BUMPED_VERSION"
      - run: pnpm publish --no-git-checks .
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
      - run: (git pull --rebase && git push --follow-tags) || (git pull --rebase && git push --follow-tags)
      # hack - reuse actions/setup-node@v3 just to set a new registry
      - uses: actions/setup-node@v3
        with:
          node-version: "20"
          registry-url: "https://npm.pkg.github.com"
      # Disable for now, until github supports PATs for writing github packages (https://github.com/github/roadmap/issues/558)
      # - run: pnpm publish --no-git-checks .
      #   env:
      #     NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

packages/blake3-wasm/BENCHMARK.md

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
# BLAKE3 Performance Benchmark

This benchmark measures the throughput (MB/s) of the BLAKE3 hashing implementation when processing random data of various sizes.

## Features

- **Multiple data sizes**: Tests from 1 KB to 100 MB
- **Three hashing methods**:
  - **Single-shot**: Direct `blake3(data)` calls
  - **Streaming**: Using `createHasher()` with a single update
  - **Chunked**: Simulating large file processing with 64 KB chunks
- **Automatic iteration adjustment**: More iterations for smaller data sizes
- **Warm-up runs**: Ensures consistent performance measurements
- **Detailed reporting**: Shows time, throughput, and summary
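
As an illustration of the chunked method listed above, the sketch below splits a buffer into fixed-size views and feeds them to a hasher one at a time. It assumes the hasher exposes an `update(chunk)` method and that a 64 KB chunk means 64,000 bytes (decimal units); both are assumptions, and `chunksOf` and `feedInChunks` are hypothetical names rather than functions from this package.

```javascript
// Yields consecutive views over `data`, each at most `chunkSize` bytes.
// subarray() returns views on the same buffer, so no bytes are copied.
function* chunksOf(data, chunkSize = 64 * 1000) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    yield data.subarray(offset, offset + chunkSize);
  }
}

// Feeds a buffer chunk by chunk into any object with an update(chunk) method.
// The hasher interface here is an assumption made for illustration.
function feedInChunks(hasher, data, chunkSize) {
  for (const chunk of chunksOf(data, chunkSize)) {
    hasher.update(chunk);
  }
  return hasher;
}
```

Feeding views rather than copies keeps the chunked path from paying for extra allocations, so any gap between chunked and single-shot throughput mostly reflects per-call overhead.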

## Usage

### Run the benchmark:

```bash
pnpm run bench
```

### From Node.js:

```javascript
import { runBenchmark } from "./tests/bench.js";

const results = runBenchmark();
```

### Individual size benchmark:

```javascript
import { benchmarkSize } from "./tests/bench.js";

const result = benchmarkSize(1000 * 1000, 10); // 1 MB, 10 iterations
console.log(result);
```

## Output Format

The benchmark provides:

- **Per-size results**: Time and throughput for each data size
- **Summary table**: Comparison across all sizes and methods
- **Best performance**: Highlights the fastest method and size

Example output:

```
BLAKE3 Performance Benchmark
============================================================

📊 Benchmarking 1.0 KB data (100 iterations, 100.0 KB total)
────────────────────────────────────────────────────────────
🔹 Single-shot: 12.65ms (7.90 MB/s)
🔹 Streaming: 11.94ms (8.37 MB/s)
🔹 Chunked: 12.44ms (8.04 MB/s)

📊 Benchmarking 64.0 KB data (100 iterations, 6.4 MB total)
────────────────────────────────────────────────────────────
🔹 Single-shot: 701.26ms (9.13 MB/s)
🔹 Streaming: 688.19ms (9.30 MB/s)
🔹 Chunked: 703.23ms (9.10 MB/s)

📈 SUMMARY
============================================================
Data Size | Single-shot | Streaming | Chunked
────────────────────────────────────────────────────────────
1.0 KB    | 7.90 MB/s   | 8.37 MB/s | 8.04 MB/s
64.0 KB   | 9.13 MB/s   | 9.30 MB/s | 9.10 MB/s

🏆 BEST PERFORMANCE
────────────────────────────────────────────────────────────
Method: Streaming
Data Size: 64.0 KB
Throughput: 9.30 MB/s
```

## Throughput Units

The benchmark uses decimal units (powers of 1000) for consistency:

- **MB/s**: Megabytes per second (1,000,000 bytes/second)
- **GB/s**: Gigabytes per second (1,000,000,000 bytes/second)
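
To make the unit convention concrete, here is a small sketch of how a throughput figure in decimal MB/s can be derived from a byte count and an elapsed time in milliseconds. The helper name `throughputMBps` is hypothetical and not taken from the benchmark source.

```javascript
const BYTES_PER_MB = 1000 * 1000; // decimal megabyte, as stated above

// Converts total bytes processed and elapsed milliseconds into MB/s.
function throughputMBps(totalBytes, elapsedMs) {
  if (elapsedMs <= 0) {
    throw new RangeError("elapsedMs must be positive");
  }
  const megabytes = totalBytes / BYTES_PER_MB;
  const seconds = elapsedMs / 1000;
  return megabytes / seconds;
}

// Sample figures from the example output: 64 KB x 100 iterations in 701.26 ms.
console.log(throughputMBps(64 * 1000 * 100, 701.26).toFixed(2)); // "9.13"
```

The same arithmetic reproduces the other rows of the example output only if 1 KB is taken as 1,000 bytes, which is consistent with the decimal convention.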

## Data Sizes Tested

- **1 KB**: Small data performance
- **64 KB**: Medium data performance
- **1 MB**: Large data performance
- **10 MB**: Very large data performance
- **100 MB**: Massive data performance

## Iteration Counts

- **Small data** (< 1 MB): 100 iterations for statistical accuracy
- **Medium data** (1-10 MB): 10 iterations for reasonable runtime
- **Large data** (> 10 MB): 3 iterations to avoid excessive runtime
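
The tiers above can be expressed as a small lookup. This is a sketch only: the function name `iterationsFor` is hypothetical, and treating exactly 1 MB and exactly 10 MB as "medium" is an assumption read off the ranges listed, not confirmed against the benchmark source.

```javascript
const MB = 1000 * 1000; // decimal megabyte, matching the benchmark's units

// Maps a data size in bytes to an iteration count using the tiers above.
function iterationsFor(sizeBytes) {
  if (sizeBytes < 1 * MB) return 100; // small data
  if (sizeBytes <= 10 * MB) return 10; // medium data (1-10 MB inclusive, assumed)
  return 3; // large data
}

// The five sizes listed under "Data Sizes Tested".
for (const size of [1000, 64 * 1000, 1 * MB, 10 * MB, 100 * MB]) {
  console.log(size, iterationsFor(size));
}
```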

## Notes

- Random data is generated for each test to ensure realistic performance
- Warm-up runs are performed before timing to ensure consistent results
- All measurements use `performance.now()` for high-precision timing
- The benchmark automatically adjusts iterations based on data size
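
The measurement approach described in these notes can be sketched as a small timing helper: run the function a few times untimed, then time a fixed number of iterations with `performance.now()`. The names `timeIt` and `randomBytes` and the default of three warm-up runs are assumptions for illustration; the actual benchmark may structure this differently.

```javascript
// Fills a buffer with pseudo-random bytes. Math.random() is used for brevity;
// it is adequate for benchmark input but not for anything security-related.
function randomBytes(size) {
  const data = new Uint8Array(size);
  for (let i = 0; i < size; i++) {
    data[i] = Math.floor(Math.random() * 256);
  }
  return data;
}

// Runs `fn` untimed `warmupRuns` times, then returns the elapsed milliseconds
// for `iterations` timed calls.
function timeIt(fn, iterations, warmupRuns = 3) {
  for (let i = 0; i < warmupRuns; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return performance.now() - start;
}
```

The warm-up calls give the JavaScript engine a chance to compile hot paths before timing starts, which is the usual reason for including them.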
