
Commit

*.md: various minor edits
Bulat-Ziganshin committed May 15, 2017
1 parent 5fb217d commit efa8d5a
Showing 6 changed files with 15 additions and 12 deletions.
Benchmarks.md: 5 changes (3 additions & 2 deletions)
@@ -1,5 +1,6 @@

All tests are performed on i7-4770 with 2-channel DDR3-1600 memory, employing all CPU cores.
All tests are performed on [i7-4770](https://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz)
with 2-channel DDR3-1600 memory, employing all CPU cores.
Speeds are measured in MiB/s (mebibytes per second); add about 5% to convert to MB/s (megabytes per second).
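For reference, the exact factor behind that "add 5%" rule of thumb (1 MiB = 2^20 bytes):

```
\text{MB/s} = \text{MiB/s} \cdot \frac{2^{20}}{10^{6}} = \text{MiB/s} \cdot 1.048576
```

i.e. about 4.9% more.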

Executables are compiled with the following commands (-DSIMD selects the vectorizable code path):
@@ -19,7 +20,7 @@ For NTT(2^20), we expect speed of 1 GB/s for SSE2 version, and 2 GB/s for AVX2 v

### Reed-Solomon encoding

Reed-Solomon encoding (2^19 source blocks => 2^19 ECC blocks, 2052 bytes each) in GF(0xFFF00001):
Reed-Solomon encoding (2^19 data blocks => 2^19 parity blocks, 2052 bytes each) in GF(0xFFF00001):
```
rs64g-avx2: 1766 ms = 1162 MiB/s, cpu 12932 ms = 732%, os 31 ms
rs64g-sse2: 2354 ms = 872 MiB/s, cpu 16677 ms = 708%, os 62 ms
```
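As a sanity check of these figures (assuming the reported MiB/s counts the combined source + parity volume, i.e. 2^20 blocks of 2052 bytes each):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Assumption: throughput is measured over source + parity data combined.
    const uint64_t blocks     = 1 << 20;   // 2^19 data + 2^19 parity blocks
    const uint64_t block_size = 2052;      // bytes per block
    const double   mib = double(blocks * block_size) / (1 << 20);        // 2052 MiB
    printf("avx2: %.0f MiB / 1.766 s = %.0f MiB/s\n", mib, mib / 1.766); // ~1162 MiB/s
    printf("sse2: %.0f MiB / 2.354 s = %.0f MiB/s\n", mib, mib / 2.354); // ~872 MiB/s
}
```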
GF.md: 4 changes (2 additions & 2 deletions)
@@ -8,7 +8,7 @@ This program is useful for researching ring properties, in particular maximal or

### Lucky number: choosing the best base for computations

Since GF(2^n) doesn't have many roots of unity, an NTT-based Reed-Solomon implementation can't perform computations in this field.
Since GF(2^n) doesn't have many roots of unity, an efficient NTT-based Reed-Solomon implementation can't perform computations in this field.
Instead, we need to use another Galois Field, or even a ring of integers modulo some number. GF(p^n) has a maximal element order of p^n-1.
For rings, the maximal order is given by a more complex formula that you can find in chapter `39.7 Composite modulus: the ring Z/mZ` of the [FxtBook](http://www.jjj.de/fxt/fxtbook.pdf).
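As a concrete example of a lucky choice: the prime 0xFFF00001 used in the benchmarks above has p-1 = 4095*2^20, so GF(p) contains roots of unity of every power-of-two order up to 2^20. A small sketch (my illustration, not code from the repository) that finds such a root:

```cpp
#include <cstdint>
#include <cstdio>

// Modular exponentiation; p < 2^32, so 64-bit intermediates never overflow.
static uint64_t powmod(uint64_t a, uint64_t e, uint64_t p) {
    uint64_t r = 1;
    for (a %= p; e; e >>= 1, a = a * a % p)
        if (e & 1) r = r * a % p;
    return r;
}

int main() {
    const uint64_t p = 0xFFF00001;   // prime, p - 1 = 4095 * 2^20
    const uint64_t N = 1 << 20;      // largest power-of-two transform size
    // Look for an element of order exactly N: w = a^((p-1)/N) always has order
    // dividing N; it has order exactly N unless w^(N/2) is already 1.
    for (uint64_t a = 2; a < p; ++a) {
        uint64_t w = powmod(a, (p - 1) / N, p);
        if (powmod(w, N / 2, p) != 1) {
            printf("primitive 2^20-th root of unity in GF(p): %llu\n",
                   (unsigned long long)w);
            return 0;
        }
    }
}
```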

@@ -85,7 +85,7 @@ After the flag 0, remaining input items contains values of remaining output elem

Once input (source) data are recoded in this way, we need to store the extra bit in a way which ensures that the bit can be restored
in any situation where the data block can be restored. The best way I found to ensure this is to save the extra bit as one more (1025th)
source word. So, all operations are performed on 4100-byte blocks, and the ECC sectors stored are 4100 bytes long. Sad, but I don't see a better choice.
source word. So, all operations are performed on 4100-byte blocks, and the parity sectors stored are 4100 bytes long. Sad, but I don't see a better choice.
Remaining bits of the extra word can be used to store a block checksum, although I don't see much gain in that.

Of course, when a 64-bit base and/or the GF(p^2) field is used, the extra data grows to 8-16 bytes.
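In code, a stored sector for the 32-bit base would then look roughly like this (a hypothetical layout for illustration; the names are mine, not taken from the codebase):

```cpp
#include <cstdint>

// 1024 recoded 32-bit data words plus one extra word that carries the recoding
// flag bit; its remaining bits could optionally hold a block checksum.
struct EccSector {
    uint32_t words[1024];   // 4096 bytes of recoded source data
    uint32_t extra;         // extra (1025th) word: flag bit + spare bits
};
static_assert(sizeof(EccSector) == 4100, "all operations run on 4100-byte blocks");
```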
Overview.md: 4 changes (2 additions & 2 deletions)
@@ -11,7 +11,7 @@ Topics:
* [Discrete Fourier transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform)
* [Fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) and in particular
[Cooley–Tukey FFT algorithm](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm) as an O(N*log(N)) algorithm implementing the DFT
* [Number-theoretic transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)) as a modified FFT
* [Fast Number-Theoretic Transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)) as a modified FFT
employing the same add/sub/mul operations and roots of unity, but in a Galois Field

Once you have grasped all these topics, you can grab some FFT implementation and convert it into an NTT.
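To illustrate, here is a minimal, completely unoptimized sketch of such a conversion (not FastECC's actual code): the usual radix-2 Cooley–Tukey butterflies, with complex arithmetic replaced by arithmetic modulo a prime `p`, and `w` being a primitive n-th root of unity in GF(p).

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// p < 2^32 here, so plain 64-bit products never overflow.
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p) { return a * b % p; }

// Recursive radix-2 NTT: identical structure to a decimation-in-time FFT,
// but all arithmetic is done modulo p and `w` is a primitive n-th root of
// unity in GF(p) instead of exp(-2*pi*i/n).
static void ntt(std::vector<uint64_t>& a, uint64_t w, uint64_t p) {
    size_t n = a.size();           // n must be a power of two
    if (n == 1) return;
    std::vector<uint64_t> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i]  = a[2 * i + 1];
    }
    ntt(even, mulmod(w, w, p), p); // w^2 is a primitive (n/2)-th root of unity
    ntt(odd,  mulmod(w, w, p), p);
    uint64_t wk = 1;               // w^k
    for (size_t k = 0; k < n / 2; ++k) {
        uint64_t t   = mulmod(wk, odd[k], p);
        a[k]         = (even[k] + t) % p;       // butterfly: E[k] + w^k * O[k]
        a[k + n / 2] = (even[k] + p - t) % p;   //            E[k] - w^k * O[k]
        wk = mulmod(wk, w, p);
    }
}

int main() {
    // Tiny example over GF(17): 4 has order 4 (4^2 = 16 = -1 mod 17).
    std::vector<uint64_t> a = {1, 2, 3, 4};
    ntt(a, 4, 17);
    for (uint64_t x : a) printf("%llu ", (unsigned long long)x);  // prints: 10 7 15 6
    printf("\n");
}
```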
@@ -51,7 +51,7 @@ So we just multiply source `A` vector by Vandermonde `(n+k)*n` matrix generated
It's guaranteed that any `n` different `a[i]` numbers form an invertible Vandermonde matrix, so we can restore from any `n` remaining words after a loss.
* [Plank proposed](http://web.eecs.utk.edu/~plank/plank/papers/SPE-04.html) to start with a Vandermonde `(n+k)*n` matrix
and then apply [Gaussian elimination](https://en.wikipedia.org/wiki/Gaussian_elimination) in order to convert it into some `(I,M)` matrix.
Since we perform this operation only once for a lot of ECC computations, we can ignore the time it requires.
Since we perform this operation only once for a lot of parity computations, we can ignore the time it requires.
* The PAR2 format employs an `(I,V)` encoding matrix, i.e. it uses a Vandermonde `k*n` matrix to compute `k` ECC words while keeping the code systematic.
Despite the special form of `a[i]` used in its Vandermonde matrix, the restoration matrix is sometimes non-invertible.
But it seems to be a good compromise between the speed/complexity of computations and recovery strength.
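For reference, the plain quadratic-time matrix form of this encoding looks like the sketch below (a toy illustration over a small prime field; the function name and evaluation points are mine, and this is not how FastECC or PAR2 actually implement it):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Multiply the data vector by a k*n Vandermonde matrix: row i is
// [1, a_i, a_i^2, ..., a_i^(n-1)], so parity[i] = sum_j data[j] * a_i^j (mod p).
// This is the O(n*k) encoding that the NTT-based approach speeds up.
static std::vector<uint64_t> vandermonde_parity(const std::vector<uint64_t>& data,
                                                const std::vector<uint64_t>& a,
                                                uint64_t p) {
    std::vector<uint64_t> parity(a.size(), 0);
    for (size_t i = 0; i < a.size(); ++i) {
        uint64_t x = 1;                                   // a[i]^j
        for (size_t j = 0; j < data.size(); ++j) {
            parity[i] = (parity[i] + x * (data[j] % p)) % p;
            x = x * a[i] % p;
        }
    }
    return parity;
}

int main() {
    // 4 data words, 2 parity words over GF(17), evaluation points 2 and 3.
    std::vector<uint64_t> parity = vandermonde_parity({1, 2, 3, 4}, {2, 3}, 17);
    printf("%llu %llu\n", (unsigned long long)parity[0],
                          (unsigned long long)parity[1]);   // prints: 15 6
}
```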
README.md: 10 changes (6 additions & 4 deletions)
@@ -1,4 +1,4 @@
FastECC implements an O(N*log(N)) [Reed-Solomon coder], running at 1.2 GB/s on i7-4770 in (2^20, 2^19) config,
FastECC implements an O(N*log(N)) [Reed-Solomon coder], running at [1.2 GB/s] on [i7-4770] in (2^20, 2^19) config,
i.e. calculating 524288 parity blocks from 524288 data blocks.
Version 0.1 implements only encoding, so it isn't yet ready for real use.

@@ -16,11 +16,11 @@ And computations in GF(2^32), implemented in the same way, will build one millio
The only exception is the closed-source [RSC32 by persicum] with O(N*log(N)) speed, i.e. it spends O(log(N)) time per parity block.
Its speed with a million parity blocks is 100 MB/s, i.e. it computes one million 4 KB parity blocks
from one million data blocks (processing 8 GB overall) in just 80 seconds.
Note that all speeds mentioned here are measured on i7-4770, employing all features available in a particular program -
Note that all speeds mentioned here are measured on [i7-4770], employing all features available in a particular program -
including multi-threading, SIMD and x64 support.

FastECC is an open-source library implementing an O(N*log(N)) encoding algorithm.
It computes a million parity blocks at 1.2 GB/s.
It computes a million parity blocks at [1.2 GB/s].
Future versions will implement decoding that's also `O(N*log(N))`, although 1.5-3 times slower than encoding.
The current implementation is limited to 2^20 blocks; removing this limit is the main priority for future work,
aside from the decoder implementation.
@@ -41,7 +41,7 @@ Moreover, it works with binary data, so no need for [recoding](GF.md#data-packin
## How

All O(N*log(N)) Reed-Solomon implementations I'm aware of use fast transforms like FFT or FWT.
FastECC employs the Number-Theoretic Transform, which is just an FFT over an integer field or ring.
FastECC employs the fast Number-Theoretic Transform, which is just an FFT over an integer field or ring.
Let's see how it works. Note that below by `length-N polynomial` I mean any polynomial with order < N.

For any given set of N points, only one length-N polynomial may go through all these points.
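This is the classical uniqueness of polynomial interpolation (over a field): the polynomial can be written down explicitly in Lagrange form, and it is unique because the difference of two length-N polynomials agreeing on N points would have N roots while having degree below N, hence is identically zero:

```
P(x) = \sum_{i=0}^{N-1} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}, \qquad \deg P < N
```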
@@ -184,6 +184,8 @@ So, overall, FastECC should replace any use of 16-bit RS codecs, while LDPC and
- [Hacker News story](https://news.ycombinator.com/item?id=14290617)


[1.2 GB/s]: Benchmarks.md#reed-solomon-encoding
[i7-4770]: https://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz
[Reed-Solomon coder]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
[MultiPar]: https://www.livebusinesschat.com/smf/index.php?board=396.0
[RSC32 by persicum]: https://www.livebusinesschat.com/smf/index.php?board=399.0
RS.md: 2 changes (1 addition & 1 deletion)
@@ -7,7 +7,7 @@
### Prior art

The encoding and decoding algorithms implemented by FastECC were described in the paper
[An Efficient (n,k) Information Dispersal Algorithm based on Fermat Number Transforms](http://ieeexplore.ieee.org/document/6545355/)
[An Efficient (n,k) Information Dispersal Algorithm based on Fermat Number Transforms](https://pdfs.semanticscholar.org/141d/c4ee4cca45b4ed1c07f890f758e427597db8.pdf)
published in 2013 by Sian-Jheng Lin and Wei-Ho Chung.

The following are my own investigations, written prior to reading this great paper :)
compile.cmd: 2 changes (1 addition & 1 deletion)
@@ -27,6 +27,6 @@ cl -Fe%name%32m.exe -Fa%name%32.asm -arch:SSE2 %options_ms_cl% %options_ms_x86%
::g++ -std=c++14 -m32 -O3 %main% -static -fopenmp -o%name%32g-sse2 -msse2 -DSIMD=SSE2 -Xlinker --large-address-aware
::g++ -std=c++14 -m32 -O3 %main% -static -fopenmp -o%name%32g -mmmx -Xlinker --large-address-aware

::cl -Feprime.exe -O2 -EHsc prime.cpp
::cl -Feprime.exe -O2 -EHsc prime.cpp -link %options_ms_x86%

del *.exe.bak *.obj *.res >nul 2>nul
