doc: update bench and documents
liuq19 committed Oct 23, 2023
1 parent 84ba2d7 commit a5a279c
Showing 6 changed files with 108 additions and 73 deletions.
77 changes: 49 additions & 28 deletions README.md
@@ -39,14 +39,19 @@ More details about optimization can be found in [performance.md](docs/performanc

5. Supports `RawValue`, `Number`, and `RawNumber` (just like Golang's `JsonNumber`) by default.

6. Floating-point parsing precision matches the Rust standard library by default.

## Quick to use sonic-rs

To ensure that SIMD instructions are used in sonic-rs, add the rustflags `-C target-cpu=native` and compile on the host machine. For example, Rust flags can be configured in the Cargo [config](.cargo/config).
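
As a quick sanity check that the flag took effect, a small program can report which SIMD target features were enabled at compile time. This is a minimal sketch using only the standard `cfg!(target_feature = ...)` macro; the function name `compiled_simd_features` and the feature names checked (`sse2`, `avx2`, `neon`) are illustrative assumptions, not a list of what sonic-rs requires.

```rust
/// Lists a few SIMD target features that were enabled at compile time.
/// The feature names are illustrative; sonic-rs may rely on others.
fn compiled_simd_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        enabled.push("neon");
    }
    enabled
}

fn main() {
    // Built with `-C target-cpu=native`, this reflects the host CPU;
    // without it, only the target's baseline features appear.
    println!("compile-time SIMD features: {:?}", compiled_simd_features());
}
```

Building this once with and once without the flag should show whether the wider instruction sets are actually being enabled on your machine.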

Which features should you choose?

`default`: the fast version, which skips UTF-8 validation when parsing for better performance.

`utf8`: provides UTF-8 validation when parsing JSON from a slice.


## Benchmark

Benchmark environment:
@@ -63,6 +68,7 @@ Benchmarks:

The serialize benchmarks work in the opposite way.

All deserialization benchmarks enable UTF-8 validation, and enable the `float_roundtrip` feature in `serde-json` so that its floating-point precision matches the Rust standard library.

### Deserialize Struct (Enabled utf8 validation)

@@ -74,31 +80,31 @@ Sonic-rs is faster than simd-json because simd-json (Rust) first parses the JSON

```
twitter/sonic_rs::from_slice
-  time: [718.60 µs 724.47 µs 731.05 µs]
+  time: [721.80 µs 747.81 µs 776.19 µs]
twitter/simd_json::from_slice
-  time: [1.0325 ms 1.0486 ms 1.0664 ms]
+  time: [1.0909 ms 1.1225 ms 1.1561 ms]
twitter/serde_json::from_slice
-  time: [2.3070 ms 2.3271 ms 2.3506 ms]
+  time: [2.3218 ms 2.3491 ms 2.3787 ms]
twitter/serde_json::from_str
-  time: [1.3797 ms 1.3996 ms 1.4237 ms]
+  time: [1.4123 ms 1.4460 ms 1.4842 ms]
citm_catalog/sonic_rs::from_slice
-  time: [1.3413 ms 1.3673 ms 1.3985 ms]
+  time: [1.2133 ms 1.2447 ms 1.2827 ms]
citm_catalog/simd_json::from_slice
-  time: [2.3324 ms 2.4122 ms 2.4988 ms]
+  time: [2.0556 ms 2.0822 ms 2.1126 ms]
citm_catalog/serde_json::from_slice
-  time: [3.0485 ms 3.0965 ms 3.1535 ms]
+  time: [2.9939 ms 3.0271 ms 3.0674 ms]
citm_catalog/serde_json::from_str
-  time: [2.4495 ms 2.4661 ms 2.4836 ms]
+  time: [2.4043 ms 2.4604 ms 2.5283 ms]
canada/sonic_rs::from_slice
-  time: [4.3249 ms 4.4713 ms 4.6286 ms]
+  time: [3.8612 ms 3.9070 ms 3.9574 ms]
canada/simd_json::from_slice
-  time: [8.3872 ms 8.5095 ms 8.6519 ms]
+  time: [8.8144 ms 8.9206 ms 9.0317 ms]
canada/serde_json::from_slice
-  time: [6.5207 ms 6.5938 ms 6.6787 ms]
+  time: [8.8703 ms 8.9586 ms 9.0555 ms]
canada/serde_json::from_str
-  time: [6.6534 ms 6.8373 ms 7.0402 ms]
+  time: [9.2865 ms 9.4272 ms 9.6032 ms]
```


@@ -113,40 +119,39 @@ The benchmark will parse JSON into a document. Sonic-rs seems faster for several

```
twitter/sonic_rs_dom::from_slice
-  time: [624.60 µs 631.67 µs 639.76 µs]
+  time: [589.34 µs 593.81 µs 599.02 µs]
twitter/simd_json::slice_to_borrowed_value
-  time: [1.2524 ms 1.2784 ms 1.3083 ms]
+  time: [1.2174 ms 1.2281 ms 1.2406 ms]
twitter/serde_json::from_slice
-  time: [4.1991 ms 4.3552 ms 4.5264 ms]
+  time: [3.9370 ms 3.9658 ms 3.9960 ms]
twitter/serde_json::from_str
-  time: [3.0258 ms 3.1086 ms 3.2005 ms]
+  time: [2.8013 ms 2.8278 ms 2.8584 ms]
twitter/simd_json::slice_to_owned_value
-  time: [1.8195 ms 1.8382 ms 1.8583 ms]
+  time: [1.7537 ms 1.7857 ms 1.8220 ms]
citm_catalog/sonic_rs_dom::from_slice
-  time: [1.8528 ms 1.8962 ms 1.9452 ms]
+  time: [1.7779 ms 1.8326 ms 1.8942 ms]
citm_catalog/simd_json::slice_to_borrowed_value
-  time: [3.5543 ms 3.6127 ms 3.6814 ms]
+  time: [4.0278 ms 4.1167 ms 4.2103 ms]
citm_catalog/serde_json::from_slice
-  time: [9.0163 ms 9.2052 ms 9.4167 ms]
+  time: [9.4022 ms 9.5598 ms 9.7242 ms]
citm_catalog/serde_json::from_str
-  time: [8.0306 ms 8.1450 ms 8.2843 ms]
+  time: [7.7487 ms 7.9720 ms 8.2212 ms]
citm_catalog/simd_json::slice_to_owned_value
-  time: [4.2538 ms 4.3171 ms 4.3990 ms]
+  time: [4.1156 ms 4.1760 ms 4.2489 ms]
canada/sonic_rs_dom::from_slice
-  time: [5.2105 ms 5.2761 ms 5.3474 ms]
+  time: [4.9905 ms 5.0650 ms 5.1539 ms]
canada/simd_json::slice_to_borrowed_value
-  time: [12.557 ms 12.773 ms 13.031 ms]
+  time: [11.931 ms 12.142 ms 12.384 ms]
canada/serde_json::from_slice
-  time: [14.875 ms 15.073 ms 15.315 ms]
+  time: [17.262 ms 17.433 ms 17.634 ms]
canada/serde_json::from_str
-  time: [14.603 ms 14.868 ms 15.173 ms]
+  time: [16.579 ms 16.773 ms 17.025 ms]
canada/simd_json::slice_to_owned_value
-  time: [12.548 ms 12.637 ms 12.737 ms]
+  time: [12.024 ms 12.209 ms 12.423 ms]
```


### Serialize Untyped

`cargo bench --bench serialize_value -- --quiet`
@@ -357,6 +362,22 @@ If we need to parse a JSON number ***without loss of precision***, we can use `RawN

Detailed examples can be found in [raw_value.rs](examples/raw_value.rs) and [json_number.rs](examples/json_number.rs).

## FAQs

### About UTF-8

By default, sonic-rs does not enable UTF-8 validation. This is a trade-off to achieve the fastest performance.

- For the `from_slice` and `dom_from_slice` interfaces, if you need to validate UTF-8 for the parsed JSON, please use the `utf8` feature.

- For the `get` and `lazyvalue` related interfaces, due to the algorithm design, these interfaces are ***only suitable for use in valid-json scenarios***, and we will not provide UTF-8 validation in the future.
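
When the `utf8` feature is not an option, one possible approach is to check the bytes yourself before handing them to the parser. The sketch below uses only `std::str::from_utf8`; the helper name `checked_json_bytes` is hypothetical and is not part of sonic-rs. It verifies the byte encoding only and does not establish that the input is valid JSON, so it does not lift the valid-JSON requirement of the `get` and `lazyvalue` interfaces.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not part of sonic-rs): returns the input unchanged
/// if it is well-formed UTF-8, otherwise the standard library's error.
fn checked_json_bytes(input: &[u8]) -> Result<&[u8], Utf8Error> {
    std::str::from_utf8(input).map(|s| s.as_bytes())
}

fn main() {
    let good = br#"{"name":"sonic-rs"}"#;
    // The byte 0xFF can never appear in well-formed UTF-8.
    let bad = b"{\"name\":\"\xff\xfe\"}";

    assert!(checked_json_bytes(good).is_ok());

    match checked_json_bytes(bad) {
        Ok(_) => println!("input is valid UTF-8"),
        // `valid_up_to` reports the offset of the first invalid byte.
        Err(e) => println!("invalid UTF-8 after {} bytes", e.valid_up_to()),
    }
}
```

This adds a full extra pass over the input, so it gives back part of the speed that skipping validation buys; whether that is acceptable depends on how much you trust the data source.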

### About floating point precision

By default, sonic-rs uses floating-point precision consistent with the Rust standard library, so there is no need to enable an extra feature, such as `float_roundtrip` in `serde-json`, to ensure floating-point precision.

If you want lossless precision when parsing floating-point numbers, as with Golang's `JsonNumber` or `serde-json`'s `arbitrary_precision`, you can use `RawNumber`.
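
The difference between "precision consistent with the standard library" and "lossless" can be illustrated with the standard library's own `f64` parser. This is a sketch under that assumption only: `roundtrip_f64` is a hypothetical helper, and no sonic-rs API is used. A literal with more significant digits than an `f64` can hold comes back changed, which is the situation where keeping the raw text of the number is needed.

```rust
/// Hypothetical helper: parses a number literal with the standard library's
/// `f64` parser and formats the result back to text.
fn roundtrip_f64(literal: &str) -> Option<String> {
    literal.parse::<f64>().ok().map(|v| v.to_string())
}

fn main() {
    // Short enough to survive a trip through f64 unchanged.
    let short = "0.1";
    // More significant digits than an f64 can represent.
    let long = "3.141592653589793238462643383279";

    for literal in [short, long] {
        let back = roundtrip_f64(literal).expect("valid number literal");
        println!("{literal} -> {back} (lossless: {})", back == literal);
    }
}
```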

## Acknowledgement

Thanks to the following open-source libraries. sonic-rs draws on other open-source libraries such as [sonic_cpp](https://github.com/bytedance/sonic-cpp), [serde_json](https://github.com/serde-rs/json), [sonic](https://github.com/bytedance/sonic), [simdjson](https://github.com/simdjson/simdjson), [yyjson](https://github.com/ibireme/yyjson), [rust-std](https://github.com/rust-lang/rust/tree/master/library/core/src/num) and so on.
72 changes: 45 additions & 27 deletions README_ZH.md
@@ -35,6 +35,7 @@ sonic-rs 的主要优化是使用 SIMD。然而,sonic-rs 没有使用来自`si
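3. Get specific fields from JSON
4. Parse JSON as a lazy iterator
5. Supports `RawValue`, `Number`, and `RawNumber` (just like Golang's `JsonNumber`) by default.
6. Floating-point precision matches the Rust standard library by default.

## How to use sonic-rs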

@@ -63,6 +64,8 @@ Model name: Intel(R) Xeon(R) Platinum 8260 CPU @ 2.40GHz

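The same applies to the serialize benchmarks.

All parsing benchmarks enable UTF-8 validation, and `serde-json` enables the `float_roundtrip` feature so that parsed floating-point numbers have sufficient precision, matching the Rust standard library.

### Deserialize Struct (utf8 validation enabled)

The benchmark parses JSON into a Rust struct, and the JSON text contains no unknown fields. All fields in the JSON are parsed into struct fields.
@@ -73,31 +76,31 @@ Sonic-rs is faster than simd-json because simd-json (Rust) first parses the JSON into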

```
twitter/sonic_rs::from_slice
-  time: [718.60 µs 724.47 µs 731.05 µs]
+  time: [721.80 µs 747.81 µs 776.19 µs]
twitter/simd_json::from_slice
-  time: [1.0325 ms 1.0486 ms 1.0664 ms]
+  time: [1.0909 ms 1.1225 ms 1.1561 ms]
twitter/serde_json::from_slice
-  time: [2.3070 ms 2.3271 ms 2.3506 ms]
+  time: [2.3218 ms 2.3491 ms 2.3787 ms]
twitter/serde_json::from_str
-  time: [1.3797 ms 1.3996 ms 1.4237 ms]
+  time: [1.4123 ms 1.4460 ms 1.4842 ms]
citm_catalog/sonic_rs::from_slice
-  time: [1.3413 ms 1.3673 ms 1.3985 ms]
+  time: [1.2133 ms 1.2447 ms 1.2827 ms]
citm_catalog/simd_json::from_slice
-  time: [2.3324 ms 2.4122 ms 2.4988 ms]
+  time: [2.0556 ms 2.0822 ms 2.1126 ms]
citm_catalog/serde_json::from_slice
-  time: [3.0485 ms 3.0965 ms 3.1535 ms]
+  time: [2.9939 ms 3.0271 ms 3.0674 ms]
citm_catalog/serde_json::from_str
-  time: [2.4495 ms 2.4661 ms 2.4836 ms]
+  time: [2.4043 ms 2.4604 ms 2.5283 ms]
canada/sonic_rs::from_slice
-  time: [4.3249 ms 4.4713 ms 4.6286 ms]
+  time: [3.8612 ms 3.9070 ms 3.9574 ms]
canada/simd_json::from_slice
-  time: [8.3872 ms 8.5095 ms 8.6519 ms]
+  time: [8.8144 ms 8.9206 ms 9.0317 ms]
canada/serde_json::from_slice
-  time: [6.5207 ms 6.5938 ms 6.6787 ms]
+  time: [8.8703 ms 8.9586 ms 9.0555 ms]
canada/serde_json::from_str
-  time: [6.6534 ms 6.8373 ms 7.0402 ms]
+  time: [9.2865 ms 9.4272 ms 9.6032 ms]
```


@@ -112,37 +115,37 @@ canada/serde_json::from_str

```
twitter/sonic_rs_dom::from_slice
-  time: [624.60 µs 631.67 µs 639.76 µs]
+  time: [589.34 µs 593.81 µs 599.02 µs]
twitter/simd_json::slice_to_borrowed_value
-  time: [1.2524 ms 1.2784 ms 1.3083 ms]
+  time: [1.2174 ms 1.2281 ms 1.2406 ms]
twitter/serde_json::from_slice
-  time: [4.1991 ms 4.3552 ms 4.5264 ms]
+  time: [3.9370 ms 3.9658 ms 3.9960 ms]
twitter/serde_json::from_str
-  time: [3.0258 ms 3.1086 ms 3.2005 ms]
+  time: [2.8013 ms 2.8278 ms 2.8584 ms]
twitter/simd_json::slice_to_owned_value
-  time: [1.8195 ms 1.8382 ms 1.8583 ms]
+  time: [1.7537 ms 1.7857 ms 1.8220 ms]
citm_catalog/sonic_rs_dom::from_slice
-  time: [1.8528 ms 1.8962 ms 1.9452 ms]
+  time: [1.7779 ms 1.8326 ms 1.8942 ms]
citm_catalog/simd_json::slice_to_borrowed_value
-  time: [3.5543 ms 3.6127 ms 3.6814 ms]
+  time: [4.0278 ms 4.1167 ms 4.2103 ms]
citm_catalog/serde_json::from_slice
-  time: [9.0163 ms 9.2052 ms 9.4167 ms]
+  time: [9.4022 ms 9.5598 ms 9.7242 ms]
citm_catalog/serde_json::from_str
-  time: [8.0306 ms 8.1450 ms 8.2843 ms]
+  time: [7.7487 ms 7.9720 ms 8.2212 ms]
citm_catalog/simd_json::slice_to_owned_value
-  time: [4.2538 ms 4.3171 ms 4.3990 ms]
+  time: [4.1156 ms 4.1760 ms 4.2489 ms]
canada/sonic_rs_dom::from_slice
-  time: [5.2105 ms 5.2761 ms 5.3474 ms]
+  time: [4.9905 ms 5.0650 ms 5.1539 ms]
canada/simd_json::slice_to_borrowed_value
-  time: [12.557 ms 12.773 ms 13.031 ms]
+  time: [11.931 ms 12.142 ms 12.384 ms]
canada/serde_json::from_slice
-  time: [14.875 ms 15.073 ms 15.315 ms]
+  time: [17.262 ms 17.433 ms 17.634 ms]
canada/serde_json::from_str
-  time: [14.603 ms 14.868 ms 15.173 ms]
+  time: [16.579 ms 16.773 ms 17.025 ms]
canada/simd_json::slice_to_owned_value
-  time: [12.548 ms 12.637 ms 12.737 ms]
+  time: [12.024 ms 12.209 ms 12.423 ms]
```


@@ -356,6 +359,21 @@ fn main() {

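Detailed examples can be found in [raw_value.rs](examples/raw_value.rs) and [json_number.rs](examples/json_number.rs).

## FAQs

### About UTF-8

By default, sonic-rs does not enable UTF-8 validation. This is a trade-off made for performance.

- For the `from_slice` and `dom_from_slice` interfaces, if you need to validate UTF-8 in the parsed JSON, use the `utf8` feature.

- For the `get` and `lazyvalue` related interfaces, due to the design of the underlying algorithm, these interfaces are ***only suitable for use in valid-json scenarios***, and we will not provide UTF-8 validation for them in the future.

### About floating point precision

By default, sonic-rs uses floating-point precision consistent with the Rust standard library, so there is no need to add an extra `float_roundtrip` feature, as `serde-json` requires, to ensure floating-point precision.

If you want lossless precision when parsing floating-point numbers, as with Golang's `JsonNumber` or `serde-json`'s `arbitrary_precision`, you can use `RawNumber`.

## Acknowledgement
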
14 changes: 6 additions & 8 deletions benches/deserialize_struct.rs
@@ -75,14 +75,12 @@ macro_rules! bench_file {
.unwrap();

// verify sonic-rs parse
-        if stringify!($name) != "canada" {
-            let serde_val: $structure = serde_json::from_slice(&vec).unwrap();
-            let serde_out = serde_json::to_string_pretty(&serde_val).unwrap();
-
-            let value : $structure = sonic_rs::from_slice(&vec).unwrap();
-            let out = sonic_rs::to_string_pretty(&value).unwrap();
-            assert!(diff_json(&out, &serde_out));
-        }
+        let serde_val: $structure = serde_json::from_slice(&vec).unwrap();
+        let serde_out = serde_json::to_string_pretty(&serde_val).unwrap();
+
+        let value : $structure = sonic_rs::from_slice(&vec).unwrap();
+        let out = sonic_rs::to_string_pretty(&value).unwrap();
+        assert!(diff_json(&out, &serde_out));

let mut group = c.benchmark_group(stringify!($name));
group.sampling_mode(SamplingMode::Flat);
14 changes: 6 additions & 8 deletions benches/deserialize_value.rs
@@ -46,14 +46,12 @@ macro_rules! bench_file {
.unwrap();

// verify sonic-rs parse
-        if stringify!($name) != "canada" {
-            let serde_out: serde_json::Value = serde_json::from_slice(&vec).unwrap();
+        let serde_out: serde_json::Value = serde_json::from_slice(&vec).unwrap();

-            let value = sonic_rs::value::dom_from_slice(&vec).unwrap();
-            let out = sonic_rs::to_string(&value).unwrap();
-            let rs_out1: serde_json::Value = serde_json::from_str(&out).unwrap();
-            assert_eq!(rs_out1, serde_out);
-        }
+        let value = sonic_rs::value::dom_from_slice(&vec).unwrap();
+        let out = sonic_rs::to_string(&value).unwrap();
+        let rs_out1: serde_json::Value = serde_json::from_str(&out).unwrap();
+        assert_eq!(rs_out1, serde_out);

let mut group = c.benchmark_group(stringify!($name));
group.sampling_mode(SamplingMode::Flat);
@@ -109,5 +107,5 @@ bench_file!(twitter);
bench_file!(github_events);

// criterion_group!(benches, canada, otfcc, citm_catalog, twitter, lottie, github_events, twitterescaped, book, poet, fgo);
-criterion_group!(benches, twitter, citm_catalog, canada);
+criterion_group!(benches, canada);
criterion_main!(benches);
2 changes: 1 addition & 1 deletion docs/performance.md
@@ -1,4 +1,4 @@
-# Some deatils of sonic-rs optimization
+# Some details of sonic-rs optimization

This document will introduce some performance optimization details of sonic-rs (commit `631411b`). Here are four main sections of optimization:

2 changes: 1 addition & 1 deletion fuzz/Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
