Readme fixes and dependencies upgrade (#3)
d-led authored Oct 1, 2020
1 parent dc3323a commit cdf2189
Showing 5 changed files with 142 additions and 143 deletions.
168 changes: 48 additions & 120 deletions README.md
@@ -8,12 +8,32 @@ Endpoints:
- `/version` returns the server version


## Installing
## Quickstart

get the binary into `$GOPATH/bin`

```
go get -u github.com/siemens/link-checker-service
```

↓

```
link-checker-service serve
```

or run the service dockerized, without installing Go:

```
docker-compose up --build
```

or run from this source:

```
go run . serve
```

## Motivation

For a website author willing to provide link checking functionality, there are few options available.
@@ -26,7 +46,9 @@ Thus, to minimize risk, a link checker should be isolated into a separate servic
Checking whether a link is broken seems like a trivial task, but consider checking a thousand links a thousand times.
Several optimizations will need to be applied, along with workarounds for the peculiarities of servers, gateways, CDNs, and proxies. This repository contains an implementation of such a service.

## Example Request
## Usage

### Example Request

Start the server, e.g. `link-checker-service serve`, and send the following request body to `http://localhost:8080/checkUrls`:

@@ -81,6 +103,25 @@ Sample response:
}
```

### Large Requests Using JSON Streaming

JSON Streaming can be used to optimize the client user experience, so that the client
does not have to wait for the whole check result to complete to render.

In the sample HTTPie request, post the streaming request to the `/checkUrls/stream` endpoint:
```
http --stream POST localhost:8080/checkUrls/stream ...
```

URL check result objects will be streamed continuously, delimited by a newline character `\n`, as they become available.
These can then be rendered immediately. E.g. see the [sample UI](test/jquery_example/public/index.html).

### Sample Front-Ends

- For a programmatic large URL list check, see [test/large_list_check](test/large_list_check), which crawls a markdown page for URLs and checks them via the running link checker service
- For a simple page that uses the service to check links and displays the results with jQuery, see [test/jquery_example](test/jquery_example)


## Configuration

For up-to-date help, check `link-checker-service help` or `link-checker-service help <command>`.
@@ -134,115 +175,20 @@ regex = "google"
The names of the found patterns will be available in the URL check results.


## Development

### CI

[CI](https://github.com/siemens/link-checker-service/-/pipelines) creates executables for Linux/amd64 and Windows

### Running the Service

```
go run . serve
```

or dockerized, without installing Go:

```
docker-compose up
```

### Using a Custom Configuration

e.g. when a proxy is needed for the HTTP client, see the sample [.link-checker-service.toml](.link-checker-service.toml),
and start the server with the argument: `--config .link-checker-service.toml`

alternatively, set the client proxy via an environment variable: `LCS_PROXY=http://myproxy:8080`

### Running the Tests

```
go test -v ./...
```

### Generating Serializers

```
go generate -v ./...
```


### Load Testing

via [hey](https://github.com/rakyll/hey):

```
hey -m POST -n 10000 -c 300 -T "application/json" -t 30 -D sample_request_body.json http://localhost:8080/checkUrls
```
## Development

where the `-c 300` is the client concurrency setting, and `-n 10000` is the approximate total number of requests to fire.
see [development.md](development.md)

01.09.2020:
## Request Optimization Architecture

```
>hey -m POST -n 10000 -c 200 -T "application/json" -t 30 -D sample_request_body.json http://localhost:8080/checkUrls
Summary:
Total: 0.2867 secs
Slowest: 0.0933 secs
Fastest: 0.0002 secs
Average: 0.0052 secs
Requests/sec: 34879.9936
Total data: 3950000 bytes
Size/request: 395 bytes
Response time histogram:
0.000 [1] |
0.009 [8720] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [988] |■■■■■
0.028 [83] |
0.037 [47] |
0.047 [57] |
0.056 [29] |
0.065 [27] |
0.075 [15] |
0.084 [11] |
0.093 [22] |
Latency distribution:
10% in 0.0004 secs
25% in 0.0011 secs
50% in 0.0032 secs
75% in 0.0060 secs
90% in 0.0109 secs
95% in 0.0146 secs
99% in 0.0485 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0002 secs, 0.0933 secs
DNS-lookup: 0.0004 secs, 0.0000 secs, 0.0262 secs
req write: 0.0000 secs, 0.0000 secs, 0.0080 secs
resp wait: 0.0043 secs, 0.0001 secs, 0.0632 secs
resp read: 0.0003 secs, 0.0000 secs, 0.0117 secs
Status code distribution:
[200] 10000 responses
```

### Request Optimization Architecture

```mermaid
graph TD
Request --> Handler[["Handler (parallel)"]]
Handler --> CachedURLChecker[[Cached URL Checker]]
CachedURLChecker --> CCLimitedURLChecker[[Concurrency-limited URL checker]]
DomainRateLimitedChecker
CCLimitedURLChecker --> DomainRateLimitedChecker[[Domain-rate-limited checker]]
DomainRateLimitedChecker --> URLCheckerClient[[HTTP URL Checker Client]]
URLCheckerClient --> URL(URL)
```
![optimization chain](docs/img/optimization-chain.svg)

Rate limiting based on IPs can be turned on in the configuration via a rate specification.
See [ulule/limiter](https://github.com/ulule/limiter).
@@ -257,29 +203,11 @@ Status code distribution:
[429] 990 responses
```
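
As a rough illustration of what IP-based rate limiting does, the sketch below counts requests per client key in a fixed time window and rejects the excess; a server would typically answer those with `429 Too Many Requests`. This is not the ulule/limiter implementation and not this service's code: the type `fixedWindowLimiter`, the helper `countAllowed`, the sample IP, and the limit values are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fixedWindowLimiter allows at most limit requests per key in each window.
// Hypothetical type, for illustration only.
type fixedWindowLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	start  map[string]time.Time // start of the current window per key
	count  map[string]int       // requests seen in the current window per key
}

func newFixedWindowLimiter(limit int, window time.Duration) *fixedWindowLimiter {
	return &fixedWindowLimiter{
		limit:  limit,
		window: window,
		start:  make(map[string]time.Time),
		count:  make(map[string]int),
	}
}

// allow reports whether a request from key at time now is within the limit.
func (l *fixedWindowLimiter) allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	// Open a new window if the key is unknown or its window has expired.
	if s, ok := l.start[key]; !ok || now.Sub(s) >= l.window {
		l.start[key] = now
		l.count[key] = 0
	}
	if l.count[key] >= l.limit {
		return false // the caller would respond with HTTP 429
	}
	l.count[key]++
	return true
}

// countAllowed simulates a burst of requests from one client IP at a single
// instant and returns how many of them pass the limiter.
func countAllowed(limit, requests int) int {
	l := newFixedWindowLimiter(limit, time.Second)
	now := time.Now()
	allowed := 0
	for i := 0; i < requests; i++ {
		if l.allow("203.0.113.7", now) {
			allowed++
		}
	}
	return allowed
}

func main() {
	allowed := countAllowed(5, 20)
	fmt.Printf("allowed=%d rejected=%d\n", allowed, 20-allowed)
}
```

A fixed window is the simplest variant to reason about; real limiters may use different algorithms and storage back-ends.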

### Large Requests Using JSON Streaming

JSON Streaming can be used to optimize the client user experience, so that the client
does not have to wait for the whole check result to complete to render.

In the sample HTTPie request, post the streaming request to the `/checkUrls/stream` endpoint:
```
http --stream POST localhost:8080/checkUrls/stream ...
```

URL check result objects will be streamed continuously, delimited by a newline character `\n`, as they become available.
These can then be rendered immediately. E.g. see the [sample UI](test/jquery_example/public/index.html).
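
A client can consume such a stream line by line. The following is a minimal Go sketch, assuming only what is stated above: one JSON object per line, delimited by `\n`. The function names `readStream` and `countObjects` and the field names in the canned sample are hypothetical; in a real client, the reader would be the body of the HTTP response from `/checkUrls/stream`.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readStream reads newline-delimited JSON objects from r and calls handle
// for each object as soon as its line has been read.
func readStream(r io.Reader, handle func(map[string]interface{})) error {
	scanner := bufio.NewScanner(r)
	// Allow lines of up to 1 MiB (the default token limit is 64 KiB).
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // tolerate blank lines between objects
		}
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(line), &obj); err != nil {
			return fmt.Errorf("decoding %q: %w", line, err)
		}
		handle(obj)
	}
	return scanner.Err()
}

// countObjects returns the number of JSON objects found in stream.
func countObjects(stream string) (int, error) {
	n := 0
	err := readStream(strings.NewReader(stream), func(map[string]interface{}) { n++ })
	return n, err
}

func main() {
	// Canned stand-in for a streamed response; the field names are made up.
	sample := `{"url":"https://example.com","status":"ok"}
{"url":"https://example.org","status":"broken"}
`
	err := readStream(strings.NewReader(sample), func(obj map[string]interface{}) {
		fmt.Println("got result:", obj) // a UI would render the result here
	})
	if err != nil {
		fmt.Println("stream error:", err)
	}
	n, _ := countObjects(sample)
	fmt.Println("objects:", n)
}
```

Decoding into a generic map keeps the sketch independent of the actual result schema; a real client would likely decode into a typed struct instead.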

### Dependencies
## Dependencies

- Go (1.15)
- see [go.mod](go.mod)

### Samples

- [test/large_list_check](test/large_list_check) - crawls a markdown page for URLs and checks them via the running link checker service
- [test/jquery_example](test/jquery_example) - example usage of the service from a simple page to check links and display the results using jQuery

## Alternatives

the alternatives that are not URL list check web services:
72 changes: 72 additions & 0 deletions development.md
@@ -0,0 +1,72 @@
# Development

## Running the Tests

```
go test -v ./...
```

## Generating Serializers

```
go generate -v ./...
```

## Load Testing

via [hey](https://github.com/rakyll/hey):

```
hey -m POST -n 10000 -c 300 -T "application/json" -t 30 -D sample_request_body.json http://localhost:8080/checkUrls
```

where the `-c 300` is the client concurrency setting, and `-n 10000` is the approximate total number of requests to fire.
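
To make the two settings concrete, here is a minimal Go sketch of a load generator in which `concurrency` workers share `total` requests. It is not how `hey` is implemented. The names `fire` and `demo`, the placeholder request body, and the stub server are assumptions made so that the example runs without the service.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// fire sends total POST requests to url using concurrency workers and
// returns the number of responses per HTTP status code.
// Status code 0 counts requests that failed at the transport level.
func fire(url, body string, total, concurrency int) map[int]int {
	jobs := make(chan struct{})
	counts := make(map[int]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs { // each received job is one request to send
				code := 0
				resp, err := http.Post(url, "application/json", strings.NewReader(body))
				if err == nil {
					code = resp.StatusCode
					resp.Body.Close()
				}
				mu.Lock()
				counts[code]++
				mu.Unlock()
			}
		}()
	}

	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)
	wg.Wait()
	return counts
}

// demo runs fire against a local stub server that always answers 200.
func demo() map[int]int {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()
	// "{}" is a placeholder body, not the service's request format.
	return fire(srv.URL, "{}", 20, 4)
}

func main() {
	fmt.Println(demo())
}
```

Pointing `fire` at a running instance, with a real request body, gives a status code distribution similar in spirit to the one `hey` prints.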

01.09.2020:

```
>hey -m POST -n 10000 -c 200 -T "application/json" -t 30 -D sample_request_body.json http://localhost:8080/checkUrls
Summary:
Total: 0.2867 secs
Slowest: 0.0933 secs
Fastest: 0.0002 secs
Average: 0.0052 secs
Requests/sec: 34879.9936
Total data: 3950000 bytes
Size/request: 395 bytes
Response time histogram:
0.000 [1] |
0.009 [8720] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [988] |■■■■■
0.028 [83] |
0.037 [47] |
0.047 [57] |
0.056 [29] |
0.065 [27] |
0.075 [15] |
0.084 [11] |
0.093 [22] |
Latency distribution:
10% in 0.0004 secs
25% in 0.0011 secs
50% in 0.0032 secs
75% in 0.0060 secs
90% in 0.0109 secs
95% in 0.0146 secs
99% in 0.0485 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0002 secs, 0.0933 secs
DNS-lookup: 0.0004 secs, 0.0000 secs, 0.0262 secs
req write: 0.0000 secs, 0.0000 secs, 0.0080 secs
resp wait: 0.0043 secs, 0.0001 secs, 0.0632 secs
resp read: 0.0003 secs, 0.0000 secs, 0.0117 secs
Status code distribution:
[200] 10000 responses
```
1 change: 1 addition & 0 deletions docs/img/optimization-chain.svg