
Commit

*.md: various minor edits
Bulat-Ziganshin committed May 15, 2017
1 parent 5fb217d commit efa8d5a
Showing 6 changed files with 15 additions and 12 deletions.
Benchmarks.md: 5 changes (3 additions & 2 deletions)
@@ -1,5 +1,6 @@

All tests are performed on i7-4770 with 2-channel DDR3-1600 memory, employing all CPU cores.
All tests are performed on [i7-4770](https://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz)
with 2-channel DDR3-1600 memory, employing all CPU cores.
Speeds are measured in MiB/s (mebibytes per second); add about 5% to convert to MB/s (megabytes per second).
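For reference, the exact factor behind that "add 5%" rule of thumb (1 MiB = 2^20 bytes):

```
\text{MB/s} = \text{MiB/s} \cdot \frac{2^{20}}{10^{6}} = \text{MiB/s} \cdot 1.048576
```

i.e. about 4.9% more.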

Executables are compiled with the following commands (-DSIMD selects the vectorizable code path):
@@ -19,7 +20,7 @@ For NTT(2^20), we expect speed of 1 GB/s for SSE2 version, and 2 GB/s for AVX2 v

### Reed-Solomon encoding

Reed-Solomon encoding (2^19 source blocks => 2^19 ECC blocks, 2052 bytes each) in GF(0xFFF00001):
Reed-Solomon encoding (2^19 data blocks => 2^19 parity blocks, 2052 bytes each) in GF(0xFFF00001):
```
rs64g-avx2: 1766 ms = 1162 MiB/s, cpu 12932 ms = 732%, os 31 ms
rs64g-sse2: 2354 ms = 872 MiB/s, cpu 16677 ms = 708%, os 62 ms
```
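As a sanity check of these figures (assuming the reported MiB/s counts the combined source + parity volume, i.e. 2^20 blocks of 2052 bytes each):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Assumption: throughput is measured over source + parity data combined.
    const uint64_t blocks     = 1 << 20;   // 2^19 data + 2^19 parity blocks
    const uint64_t block_size = 2052;      // bytes per block
    const double   mib = double(blocks * block_size) / (1 << 20);        // 2052 MiB
    printf("avx2: %.0f MiB / 1.766 s = %.0f MiB/s\n", mib, mib / 1.766); // ~1162 MiB/s
    printf("sse2: %.0f MiB / 2.354 s = %.0f MiB/s\n", mib, mib / 2.354); // ~872 MiB/s
}
```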
GF.md: 4 changes (2 additions & 2 deletions)
@@ -8,7 +8,7 @@ This program is useful for researching ring properties, in particular maximal or

### Lucky number: choosing the best base for computations

Since GF(2^n) doesn't have many roots of unity, an NTT-based Reed-Solomon implementation can't perform computations in this field.
Since GF(2^n) doesn't have many roots of unity, an efficient NTT-based Reed-Solomon implementation can't perform computations in this field.
Instead, we need to use another Galois Field, or even a ring of integers modulo some number. GF(p^n) has a maximal element order of p^n-1.
For rings, the maximal order is given by a more complex formula that you can find in chapter `39.7 Composite modulus: the ring Z/mZ` of the [FxtBook](http://www.jjj.de/fxt/fxtbook.pdf).
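As a concrete example of a lucky choice: the prime 0xFFF00001 used in the benchmarks above has p-1 = 4095*2^20, so GF(p) contains roots of unity of every power-of-two order up to 2^20. A small sketch (my illustration, not code from the repository) that finds such a root:

```cpp
#include <cstdint>
#include <cstdio>

// Modular exponentiation; p < 2^32, so 64-bit intermediates never overflow.
static uint64_t powmod(uint64_t a, uint64_t e, uint64_t p) {
    uint64_t r = 1;
    for (a %= p; e; e >>= 1, a = a * a % p)
        if (e & 1) r = r * a % p;
    return r;
}

int main() {
    const uint64_t p = 0xFFF00001;   // prime, p - 1 = 4095 * 2^20
    const uint64_t N = 1 << 20;      // largest power-of-two transform size
    // Look for an element of order exactly N: w = a^((p-1)/N) always has order
    // dividing N; it has order exactly N unless w^(N/2) is already 1.
    for (uint64_t a = 2; a < p; ++a) {
        uint64_t w = powmod(a, (p - 1) / N, p);
        if (powmod(w, N / 2, p) != 1) {
            printf("primitive 2^20-th root of unity in GF(p): %llu\n",
                   (unsigned long long)w);
            return 0;
        }
    }
}
```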

@@ -85,7 +85,7 @@ After the flag 0, remaining input items contains values of remaining output elem

Once input (source) data are recoded in this way, we need to store the extra bit in a way which ensures that the bit can be restored
in any situation where the data block can be restored. The best way I found to ensure this is to save the extra bit as one more (1025th)
source word. So, all operations are performed on 4100-byte blocks, and the ECC sectors stored are 4100 bytes long. Sad, but I don't see a better choice.
source word. So, all operations are performed on 4100-byte blocks, and the parity sectors stored are 4100 bytes long. Sad, but I don't see a better choice.
Remaining bits of the extra word can be used to store a block checksum, although I don't see much gain in that.

Of course, when a 64-bit base and/or the GF(p^2) field is used, the extra data grows to 8-16 bytes.
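In code, a stored sector for the 32-bit base would then look roughly like this (a hypothetical layout for illustration; the names are mine, not taken from the codebase):

```cpp
#include <cstdint>

// 1024 recoded 32-bit data words plus one extra word that carries the recoding
// flag bit; its remaining bits could optionally hold a block checksum.
struct EccSector {
    uint32_t words[1024];   // 4096 bytes of recoded source data
    uint32_t extra;         // extra (1025th) word: flag bit + spare bits
};
static_assert(sizeof(EccSector) == 4100, "all operations run on 4100-byte blocks");
```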
Overview.md: 4 changes (2 additions & 2 deletions)
@@ -11,7 +11,7 @@ Topics:
* [Discrete Fourier transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform)
* [Fast Fourier transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) and in particular
[Cooley–Tukey FFT algorithm](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm) as an O(N*log(N)) algorithm implementing the DFT
* [Number-theoretic transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)) as a modified FFT
* [Fast Number-Theoretic Transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform_(general)) as a modified FFT
employing the same add/sub/mul operations and roots of unity, but in a Galois Field

Once you have grasped all these topics, you can grab some FFT implementation and convert it into an NTT.
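To illustrate, here is a minimal, completely unoptimized sketch of such a conversion (not FastECC's actual code): the usual radix-2 Cooley–Tukey butterflies, with complex arithmetic replaced by arithmetic modulo a prime `p`, and `w` being a primitive n-th root of unity in GF(p).

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// p < 2^32 here, so plain 64-bit products never overflow.
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p) { return a * b % p; }

// Recursive radix-2 NTT: identical structure to a decimation-in-time FFT,
// but all arithmetic is done modulo p and `w` is a primitive n-th root of
// unity in GF(p) instead of exp(-2*pi*i/n).
static void ntt(std::vector<uint64_t>& a, uint64_t w, uint64_t p) {
    size_t n = a.size();           // n must be a power of two
    if (n == 1) return;
    std::vector<uint64_t> even(n / 2), odd(n / 2);
    for (size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i]  = a[2 * i + 1];
    }
    ntt(even, mulmod(w, w, p), p); // w^2 is a primitive (n/2)-th root of unity
    ntt(odd,  mulmod(w, w, p), p);
    uint64_t wk = 1;               // w^k
    for (size_t k = 0; k < n / 2; ++k) {
        uint64_t t   = mulmod(wk, odd[k], p);
        a[k]         = (even[k] + t) % p;       // butterfly: E[k] + w^k * O[k]
        a[k + n / 2] = (even[k] + p - t) % p;   //            E[k] - w^k * O[k]
        wk = mulmod(wk, w, p);
    }
}

int main() {
    // Tiny example over GF(17): 4 has order 4 (4^2 = 16 = -1 mod 17).
    std::vector<uint64_t> a = {1, 2, 3, 4};
    ntt(a, 4, 17);
    for (uint64_t x : a) printf("%llu ", (unsigned long long)x);  // prints: 10 7 15 6
    printf("\n");
}
```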
@@ -51,7 +51,7 @@ So we just multiply source `A` vector by Vandermonde `(n+k)*n` matrix generated
It's guaranteed that any `n` different `a[i]` numbers form an invertible Vandermonde matrix, so we can restore from any `n` remaining words after a loss.
* [Plank proposed](http://web.eecs.utk.edu/~plank/plank/papers/SPE-04.html) to start with a Vandermonde `(n+k)*n` matrix
and then apply [Gaussian elimination](https://en.wikipedia.org/wiki/Gaussian_elimination) in order to convert it into some `(I,M)` matrix.
Since we perform this operation only once for a lot of ECC computations, we can ignore the time it requires.
Since we perform this operation only once for a lot of parity computations, we can ignore the time it requires.
* The PAR2 format employs an `(I,V)` encoding matrix, i.e. it uses a Vandermonde `k*n` matrix to compute `k` ECC words while keeping the code systematic.
Despite the special form of `a[i]` used in its Vandermonde matrix, the restoration matrix is sometimes non-invertible.
But it seems to be a good compromise between the speed/complexity of computations and recovery strength.
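For reference, the plain quadratic-time matrix form of this encoding looks like the sketch below (a toy illustration over a small prime field; the function name and evaluation points are mine, and this is not how FastECC or PAR2 actually implement it):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Multiply the data vector by a k*n Vandermonde matrix: row i is
// [1, a_i, a_i^2, ..., a_i^(n-1)], so parity[i] = sum_j data[j] * a_i^j (mod p).
// This is the O(n*k) encoding that the NTT-based approach speeds up.
static std::vector<uint64_t> vandermonde_parity(const std::vector<uint64_t>& data,
                                                const std::vector<uint64_t>& a,
                                                uint64_t p) {
    std::vector<uint64_t> parity(a.size(), 0);
    for (size_t i = 0; i < a.size(); ++i) {
        uint64_t x = 1;                                   // a[i]^j
        for (size_t j = 0; j < data.size(); ++j) {
            parity[i] = (parity[i] + x * (data[j] % p)) % p;
            x = x * a[i] % p;
        }
    }
    return parity;
}

int main() {
    // 4 data words, 2 parity words over GF(17), evaluation points 2 and 3.
    std::vector<uint64_t> parity = vandermonde_parity({1, 2, 3, 4}, {2, 3}, 17);
    printf("%llu %llu\n", (unsigned long long)parity[0],
                          (unsigned long long)parity[1]);   // prints: 15 6
}
```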
README.md: 10 changes (6 additions & 4 deletions)
@@ -1,4 +1,4 @@
FastECC implements an O(N*log(N)) [Reed-Solomon coder], running at 1.2 GB/s on i7-4770 in (2^20, 2^19) config,
FastECC implements an O(N*log(N)) [Reed-Solomon coder], running at [1.2 GB/s] on [i7-4770] in (2^20, 2^19) config,
i.e. calculating 524288 parity blocks from 524288 data blocks.
Version 0.1 implements only encoding, so it isn't yet ready for real use.

@@ -16,11 +16,11 @@ And computations in GF(2^32), implemented in the same way, will build one millio
The only exception is the closed-source [RSC32 by persicum] with O(N*log(N)) speed, i.e. it spends O(log(N)) time per parity block.
Its speed with a million parity blocks is 100 MB/s, i.e. it computes one million 4 KB parity blocks
from one million data blocks (processing 8 GB overall) in just 80 seconds.
Note that all speeds mentioned here are measured on i7-4770, employing all features available in a particular program -
Note that all speeds mentioned here are measured on [i7-4770], employing all features available in a particular program -
including multi-threading, SIMD and x64 support.

FastECC is an open-source library implementing an O(N*log(N)) encoding algorithm.
It computes a million parity blocks at 1.2 GB/s.
It computes a million parity blocks at [1.2 GB/s].
Future versions will implement decoding that's also `O(N*log(N))`, although 1.5-3 times slower than encoding.
The current implementation is limited to 2^20 blocks; removing this limit is the main priority for future work,
aside from the decoder implementation.
@@ -41,7 +41,7 @@ Moreover, it works with binary data, so no need for [recoding](GF.md#data-packin
## How

All O(N*log(N)) Reed-Solomon implementations I'm aware of use fast transforms like FFT or FWT.
FastECC employs the Number-Theoretic Transform, which is just an FFT over an integer field or ring.
FastECC employs the fast Number-Theoretic Transform, which is just an FFT over an integer field or ring.
Let's see how it works. Note that below by `length-N polynomial` I mean any polynomial with order < N.

For any given set of N points, only one length-N polynomial may go through all these points.
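This is the classical uniqueness of polynomial interpolation (over a field): the polynomial can be written down explicitly in Lagrange form, and it is unique because the difference of two length-N polynomials agreeing on N points would have N roots while having degree below N, hence is identically zero:

```
P(x) = \sum_{i=0}^{N-1} y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}, \qquad \deg P < N
```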
@@ -184,6 +184,8 @@ So, overall, FastECC should replace any use of 16-bit RS codecs, while LDPC and
- [Hacker News story](https://news.ycombinator.com/item?id=14290617)


[1.2 GB/s]: Benchmarks.md#reed-solomon-encoding
[i7-4770]: https://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz
[Reed-Solomon coder]: https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
[MultiPar]: https://www.livebusinesschat.com/smf/index.php?board=396.0
[RSC32 by persicum]: https://www.livebusinesschat.com/smf/index.php?board=399.0
RS.md: 2 changes (1 addition & 1 deletion)
@@ -7,7 +7,7 @@
### Prior art

The encoding and decoding algorithms implemented by FastECC were described in the paper
[An Efficient (n,k) Information Dispersal Algorithm based on Fermat Number Transforms](http://ieeexplore.ieee.org/document/6545355/)
[An Efficient (n,k) Information Dispersal Algorithm based on Fermat Number Transforms](https://pdfs.semanticscholar.org/141d/c4ee4cca45b4ed1c07f890f758e427597db8.pdf)
published in 2013 by Sian-Jheng Lin and Wei-Ho Chung.

The following are my own investigations, written prior to reading this great paper :)
compile.cmd: 2 changes (1 addition & 1 deletion)
@@ -27,6 +27,6 @@ cl -Fe%name%32m.exe -Fa%name%32.asm -arch:SSE2 %options_ms_cl% %options_ms_x86%
::g++ -std=c++14 -m32 -O3 %main% -static -fopenmp -o%name%32g-sse2 -msse2 -DSIMD=SSE2 -Xlinker --large-address-aware
::g++ -std=c++14 -m32 -O3 %main% -static -fopenmp -o%name%32g -mmmx -Xlinker --large-address-aware

::cl -Feprime.exe -O2 -EHsc prime.cpp
::cl -Feprime.exe -O2 -EHsc prime.cpp -link %options_ms_x86%

del *.exe.bak *.obj *.res >nul 2>nul
