
Commit 11346a6

Merge pull request #39 from opendatagroup/1.8-doc-updates
1.8 doc updates
2 parents c9ec780 + 9ca814d commit 11346a6

14 files changed (+983 -135 lines)

Archived/Product Documentation/Java Model Runner.md

+8-10
````diff
@@ -14,14 +14,14 @@ This page describes how to load and run models in each of these cases.
 
 ## Generic Java models
 
-A generic Java model can execute arbitrary Java code. In order to run this model in FastScore, it must implement a particular model interface: the `IJavaModel` interface. This interface includes `begin`, `action`, and `end` methods, analogous to Python and R models.
+A generic Java model can execute arbitrary Java code. In order to run this model in FastScore, it must implement a particular model interface: the `FastScoreModel` interface. This interface includes `begin`, `action`, and `end` methods, analogous to Python and R models. Note that only the `action` method is required; the other methods have default implementations and only need to be overridden when the model uses them.
 
 ``` java
-import fastscore.IJavaModel;
+import fastscore.FastScoreModel;
 
-public class MyModel implements IJavaModel
+public class MyModel implements FastScoreModel
 {
-
+    @Override
     public void begin()
     {
         ...
@@ -32,6 +32,7 @@ public class MyModel implements IJavaModel
         ...
     }
 
+    @Override
     public void end()
     {
         ...
@@ -76,7 +77,7 @@ A Spark model must follow the same conformance guidelines as a generic Java mode
 Here is an example Spark model that assumes that the `LogisticRegressionModel` was previously created and saved under the `scalaLogisticRegressionWithBFGSModel` folder and then uploaded to FastScore as an attachment.
 
 ``` java
-import fastscore.IJavaModel;
+import fastscore.FastScoreModel;
 import org.apache.spark.SparkConf;
 import org.apache.spark.SparkContext;
 import org.apache.spark.sql.SparkSession;
@@ -86,14 +87,15 @@ import org.apache.spark.mllib.classification.LogisticRegressionModel;
 import org.apache.spark.mllib.linalg.Vector;
 import org.apache.spark.mllib.linalg.Vectors;
 
-public class MLLibLRModel implements IJavaModel {
+public class MLLibLRModel implements FastScoreModel {
 
     LogisticRegressionModel _lrModel;
 
     public MLLibLRModel() {
         System.out.println("MLLib Linear Regression model");
     }
 
+    @Override
     public void begin() {
         SparkConf conf = new SparkConf();
         conf.setAppName("ML Lib LR Model");
@@ -120,10 +122,6 @@ public class MLLibLRModel implements IJavaModel {
         }
 
     }
-
-    public void end() {
-
-    }
 }
 ```
````
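To make the conformance rules concrete, here is a minimal sketch of a model that overrides only the required method. The stand-in interface below is declared locally so the example compiles on its own; it is not FastScore SDK code, and the `action` signature is assumed here for illustration only (the real `fastscore.FastScoreModel` may differ).

```java
// Stand-in for fastscore.FastScoreModel, declared locally so this sketch is
// self-contained; the real interface ships with the FastScore SDK and its
// action() signature may differ from the one assumed here.
interface FastScoreModel {
    default void begin() {}      // optional lifecycle hook (default no-op)
    String action(String datum); // the only required method
    default void end() {}        // optional lifecycle hook (default no-op)
}

// A minimal conforming model: it implements only the required action()
// method and inherits the no-op begin() and end() defaults.
class EchoModel implements FastScoreModel {
    @Override
    public String action(String datum) {
        return "echo: " + datum;
    }
}
```

Because `begin` and `end` carry default implementations, a model overrides them only when it has setup or teardown work to do, as `MLLibLRModel` does for its Spark context.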

@@ -0,0 +1,122 @@
---
title: "Time Streams"
excerpt: "A stream that delivers timestamps instead of data"
---
# Time Streams

## Overview

Many analytic models have a notion of time. Such models may receive time-related
values as part of their inputs, or they may obtain them through standard
library calls. The latter adds an external dependency to the model: FastScore
cannot control the time value the model gets from the system. The time
streams proposed here remove that dependency.

A time stream feeds timestamps to the model according to the settings in its
stream descriptor. Time streams enable the following two important use cases:

* A 'fake' time for simulations, verification, and model training;
* Periodic model activations without actual data.

## A time stream descriptor

An example of a time stream descriptor:
``` json
{
  "Transport": {
    "Type": "time",
    "Period": 2.0
  },
  "Schema": {
    "type": "long",
    "logicalType": "timestamp-millis"
  }
}
```

The above stream delivers a timestamp to the model every 2s.
The Transport element of a time stream supports the following properties:

Property | Type | Required | Default | Description
---------|------|----------|---------|------------
Type | string | Yes | | Set to "time" or "Time"
TimeZero | string or null | No | null | The beginning of simulated time (ISO 8601)
Delay | number | No | 0.0 | Wait this number of seconds before sending the first timestamp
Period | number | No | 1.0 | Time between timestamps, in seconds
MaxCount | integer or null | No | null | Generate no more than this number of timestamps
Overflow | string | No | "all" | How to handle out-of-sync timestamps ("skip" or "all")
TimeZero controls the simulated time. The difference between normal and
simulated time is calculated when the stream is instantiated. The model
receives the following timestamps: TimeZero + Delay, TimeZero + Delay +
Period, TimeZero + Delay + 2 * Period, and so on. If TimeZero is omitted or
set to null, the time stream uses the current time.

The number of timestamps delivered to the model can be capped using the
MaxCount property. After the stream generates this many timestamps, it
signals EOF.

If the model is slow, it may not be able to process all timestamps on time.
The stream behaviour with respect to out-of-sync timestamps depends on the
Overflow property. If Overflow is "all", the stream delivers all timestamps
regardless of their timeliness. This may result in batches of timestamps
delivered at the same time. If Overflow is "skip", only timely timestamps are
delivered, and each skipped timestamp produces a warning message.
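The delivery schedule described above (TimeZero + Delay + k * Period, capped by MaxCount) can be sketched in Java. This is an illustration only; `TimeStreamSchedule` and its parameters are hypothetical names, not part of FastScore.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: compute the instants at which
// a time stream with the given settings delivers timestamps, expressed as
// timestamp-millis values.
class TimeStreamSchedule {
    static List<Long> schedule(long timeZeroMillis, double delaySec,
                               double periodSec, int maxCount) {
        List<Long> out = new ArrayList<>();
        for (int k = 0; k < maxCount; k++) {
            // TimeZero + Delay + k * Period, converted to milliseconds
            out.add(timeZeroMillis + (long) ((delaySec + k * periodSec) * 1000.0));
        }
        return out;
    }
}
```

For example, with TimeZero = 0, no delay, Period = 2.0, and MaxCount = 3, the stream would deliver timestamps 0, 2000, and 4000 before signalling EOF.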

The Transport element may be set to just "time" to assume default values for
all properties.

As with any boundary-preserving stream, the Envelope property of a time stream
must be either omitted or set to null. The Encoding property must be either
omitted or set to "bert".

The Avro schema has a special logical type for timestamps, or rather two such
types: one for millisecond resolution and another for microsecond resolution.
We only support millisecond timestamps (timestamp-millis).

The time stream must not use batching: a timestamp must be delivered to the
model immediately, without buffering. Thus, Batching must be omitted or set to
null.

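The constraints just described (Envelope and Batching omitted or null, Encoding omitted or "bert") can be checked mechanically. Below is a sketch that assumes the descriptor has already been parsed into a map; `TimeStreamChecks` is a hypothetical name, not a FastScore API.

```java
import java.util.Map;

// Hypothetical validator, for illustration only: enforce the time-stream
// rules for Envelope, Encoding, and Batching described above.
class TimeStreamChecks {
    static boolean isValid(Map<String, Object> descriptor) {
        Object envelope = descriptor.get("Envelope"); // must be omitted or null
        Object batching = descriptor.get("Batching"); // must be omitted or null
        Object encoding = descriptor.get("Encoding"); // omitted or "bert"
        return envelope == null
            && batching == null
            && (encoding == null || "bert".equals(encoding));
    }
}
```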
The simplest valid time stream descriptor looks as follows:
``` json
{
  "Transport": "time"
}
```

It is equivalent to the following stream descriptor:
``` json
{
  "Transport": {
    "Type": "time",
    "TimeZero": null,   // normal time
    "Delay": 0.0,       // no delay
    "Period": 1.0,      // every 1s
    "MaxCount": null,   // indefinite length
    "Overflow": "all"   // deliver out-of-sync timestamps
  },
  "Envelope": null,     // no envelope
  "Encoding": "bert",   // internal encoding
  "Schema": {
    "type": "long",
    "logicalType": "timestamp-millis"
  },
  "Batching": null      // no batching
}
```

Time streams are input only.

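The shorthand-to-defaults equivalence above can be sketched as a small expansion routine. This is a hypothetical helper for illustration only; FastScore itself performs any such normalization internally according to the defaults table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only: expand the "Transport": "time"
// shorthand into the full set of default transport properties.
class TimeTransportDefaults {
    static Map<String, Object> expand(Object transport) {
        Map<String, Object> full = defaults();
        if (transport instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, Object> given = (Map<String, Object>) transport;
            full.putAll(given); // explicit settings override the defaults
        }
        return full; // the shorthand string form keeps all defaults
    }

    static Map<String, Object> defaults() {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("Type", "time");
        d.put("TimeZero", null);  // normal time
        d.put("Delay", 0.0);      // no delay
        d.put("Period", 1.0);     // every 1s
        d.put("MaxCount", null);  // indefinite length
        d.put("Overflow", "all"); // deliver out-of-sync timestamps
        return d;
    }
}
```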
## Other considerations

There is a somewhat similar situation with access to a random number
generator. Model verification may need to 'replay' the sequence of random
numbers used by the model. Or, the model may need cryptographically strong
random numbers from a hardware source. The mapping to the FastScore stream
concept is less obvious here. A practical workaround could be to seed the RNG
using a timestamp provided by a simulated time stream.

## More info

TODO

Getting Started/FastScore Specs/index.md

+40-39
```diff
@@ -9,42 +9,43 @@
 | H2O || | CPU utilization (data deserialization) ||
 | Matlab || | Sensors ||
 | C || | Default sensors installed ||
-| | | | Dashboard sensor support ||
-| **Certified Deployment Options** | | | | |
-| Linux || | **Workflow, Concurrency, Scaling, etc** | |
-| AWS || | Single model complex analytic workflows ||
-| On-premise || | Multi-model complex analytic workflows ||
-| Private Cloud || | Single machine scaling ||
-| Public Cloud || | Infrastructure Scaling (multi-server, cloud, etc) ||
-| Azure || | Intra-engine concurrecy ||
-| Google Cloud || | Multi-engine concurrency ||
-| MacOS || | Model state persistence checkpointing ||
-| | | | Model state staring ||
-| **Data Source Types** | | | Multiple input/output streams ||
-| REST || | | |
-| Kafka || | **Third Party Orchestrators** ||
-| File || | Mesos/Marathon/DCOS ||
-| ODBC || | Swarm ||
-| HTTP || | Kubernetes ||
-| Experimental (TCP/UDP/Exec) || | | |
-| Kafka (Authenticated) || | **Model Management and AnalyticOps** | |
-| S3 (Authenticated) || | Store/Edit/Select Models ||
-| | | | Store/Edit/Select Streams ||
-| **Schema Definition Formats** | | | Store/Edit/Select Schemas ||
-| Avro Schema || | | |
-| Avro Schema Extensions (Restrictions) || | **Machine Learning Integration** | |
-| | | | R [ R ] ||
-| **Data Encoding Formats** | | | scikit-learn [ Python ] ||
-| Raw || | ml.lib [POJO ] ||
-| JSON || | H2O [POJO] ||
-| Avro-binary || | Tensorflow [ Python, R ] ||
-| UTF-8 || | | |
-| SOAP/RPC || | **Integration and Management Interfaces** | |
-| | | | RESTful API ||
-| **Environment Management** | | | GUI Dashboard ||
-| Import Policy || | CLI ||
-| | | | Model deploy Jupyter ||
-| **FastScore SDK** | | | | |
-| Python 2 || | **Authentication and Access Control** | |
-| Python 3 || | LDAP Authentication ||
-| Scala/Java || | Dashboard LDAP Authentication ||
+| Scala || | Dashboard sensor support ||
+| | | | | |
+| **Certified Deployment Options** | | | **Workflow, Concurrency, Scaling, etc** | |
+| Linux || | Single model complex analytic workflows ||
+| AWS || | Multi-model complex analytic workflows ||
+| On-premise || | Single machine scaling ||
+| Private Cloud || | Infrastructure Scaling (multi-server, cloud, etc) ||
+| Public Cloud || | Intra-engine concurrency ||
+| Azure || | Multi-engine concurrency ||
+| Google Cloud || | Model state persistence checkpointing ||
+| MacOS || | Model state sharing ||
+| | | | Multiple input/output streams ||
+| **Data Source Types** | | | | |
+| REST || | **Third Party Orchestrators** ||
+| Kafka || | Mesos/Marathon/DCOS ||
+| File || | Swarm ||
+| ODBC || | Kubernetes ||
+| HTTP || | | |
+| Experimental (TCP/UDP/Exec) || | **Model Management and AnalyticOps** | |
+| Kafka (Authenticated) || | Store/Edit/Select Models ||
+| S3 (Authenticated) || | Store/Edit/Select Streams ||
+| | | | Store/Edit/Select Schemas ||
+| **Schema Definition Formats** | | | | |
+| Avro Schema || | **Machine Learning Integration** | |
+| Avro Schema Extensions (Restrictions) || | R [ R ] ||
+| | | | scikit-learn [ Python ] ||
+| **Data Encoding Formats** | | | ml.lib [ POJO ] ||
+| Raw || | H2O [ POJO ] ||
+| JSON || | Tensorflow [ Python, R ] ||
+| Avro-binary || | | |
+| UTF-8 || | **Integration and Management Interfaces** | |
+| SOAP/RPC || | RESTful API ||
+| | | | GUI Dashboard ||
+| **Environment Management** | | | CLI ||
+| Import Policy || | Model deploy Jupyter ||
+| | | | | |
+| **FastScore SDK** | | | **Authentication and Access Control** | |
+| Python 2 || | LDAP Authentication ||
+| Python 3 || | Dashboard LDAP Authentication ||
+| Scala/Java || | |
```
