Commit 0b3911c

Rewrite index page
1 parent 15a5fa8 commit 0b3911c

3 files changed

+260
-256
lines changed

README.md

Lines changed: 140 additions & 102 deletions
@@ -23,10 +23,20 @@ $ pip install fasttransform
2323
### Transform
2424

2525
Transform is a class that lets you create reusable data transformations.
26-
It behaves like a function, you can call it to `encode` your data. In
27-
addition it has an optional `decode` method that will reverse the
28-
function. And an optional `setup` method that can initialize some inner
29-
state.
26+
You initialize a Transform by passing in or decorating a raw function.
27+
The Transform then provides an enhanced version of that function via
28+
`Transform.encodes`, which can be used in your data pipeline.
29+
30+
It provides various conveniences:
31+
32+
- **Reversibility**. You can collect the raw function and its inverse
33+
into one transform object.
34+
- **Customized initialization**. You can customize the exact behavior of
35+
a transform function on initialization.
36+
- **Type-based multiple dispatch**. Transforms can specialize their
37+
behavior based on the runtime types of their arguments.
38+
- **Type conversion/preservation**. Transforms help you maintain desired
39+
return types.
3040

3141
The simplest way to create a Transform is by decorating a function:
3242

@@ -36,7 +46,7 @@ from fasttransform import Transform, Pipeline
3646

3747
``` python
3848
@Transform
39-
def add_one(x: int):
49+
def add_one(x):
4050
return x + 1
4151

4252
# Usage
@@ -45,31 +55,12 @@ add_one(2)
4555

4656
3
4757

48-
Transforms are **flexible**. You can specify multiple transforms with
49-
different type annotations and it will automatically pick up the correct
50-
one.
51-
52-
``` python
53-
def inc1(x:int): return x+1
54-
def inc2(x:str): return x+"a"
55-
56-
t = Transform(enc=(inc1,inc2))
57-
58-
t(5), t('b')
59-
```
58+
### Reversibility
6059

61-
(6, 'ba')
62-
63-
If an input type does not match any of the type annotations then the
64-
original input is returned.
65-
66-
``` python
67-
add_one(2.0)
68-
```
69-
70-
2.0
71-
72-
Transforms are **reversible**, if you provide a `decode` function.
60+
To make a transform reversible, you provide the raw function and its
61+
inverse. This is useful in data pipelines where, for instance, you might
62+
want to normalize and then de-normalize numerical values, or encode to
63+
category indexes and then decode back to categories.
7364

7465
``` python
7566
def enc(x): return x*2
@@ -82,148 +73,195 @@ t(2), t.decode(2), t.decode(t(2))
8273

8374
(4, 1, 2)
8475

85-
Transforms can be **stateful**, you can initialize them with the `setup`
86-
method. This may be useful when you want to set scaling parameters based
87-
on your training split in your machine learning pipeline.
76+
### Customized initialization
77+
78+
You can customize an individual Transform instance at initialization
79+
time, so that it can depend on aggregate properties of the data set.
80+
81+
Here we define a z-score normalization Transform by defining `encodes`
82+
and `decodes` methods directly:
8883

8984
``` python
85+
import statistics
86+
9087
class NormalizeMean(Transform):
9188
def setups(self, items):
92-
self.mean = sum(items) / len(items)
89+
self.mean = statistics.mean(items)
90+
self.std = statistics.stdev(items)
9391

9492
def encodes(self, x):
95-
return x - self.mean
93+
return (x - self.mean) / self.std
9694

9795
def decodes(self, x):
98-
return x + self.mean
96+
return x * self.std + self.mean
9997

10098
normalize = NormalizeMean()
10199
normalize.setup([1, 2, 3, 4, 5])
102100
normalize.mean
103101
```
104102

105-
3.0
103+
3
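The same z-score round trip can be checked without fasttransform. The sketch below is plain standard-library Python, and the names `encode` and `decode` are illustrative stand-ins for the `encodes` and `decodes` methods, not part of the library:

```python
import statistics

items = [1, 2, 3, 4, 5]
mean = statistics.mean(items)   # arithmetic mean of the sample
std = statistics.stdev(items)   # sample standard deviation (n - 1 denominator)

def encode(x):
    # z-score: center on the mean, then scale by the standard deviation
    return (x - mean) / std

def decode(z):
    # inverse of encode: undo the scaling, then undo the centering
    return z * std + mean

z = encode(4)
restored = decode(z)   # equal to 4 up to floating-point rounding
```

Because `decode` undoes the two steps of `encode` in the opposite order, the pair round-trips any numeric input up to rounding error.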
106104

107-
``` python
108-
normalize(3.0)
109-
```
105+
### Type-based multiple dispatch
110106

111-
0.0
107+
Instead of providing one raw function, you can provide multiple raw
108+
functions that differ in their parameter types. Transform will use
109+
type-based dispatch to automatically execute the correct function.
112110

113-
Transforms are **extendedible**, this may be useful when you want to
114-
create one Transform that can handle different data types.
111+
This is handy when your inputs come in different types (e.g., different
112+
image formats, different numerical types).
115113

116114
``` python
117-
@NormalizeMean
118-
def encodes(self, x:float): return x + self.mean + 5
115+
def inc1(x:int): return x+1
116+
def inc2(x:str): return x+"a"
119117

120-
@NormalizeMean
121-
def decodes(self, x:float): return x + self.mean + 5
118+
t = Transform(enc=(inc1,inc2))
122119

123-
normalize(2.0)
120+
t(5), t('b')
124121
```
125122

126-
10.0
127-
128-
Transforms try to be **type preserving** in the following order:
129-
130-
1. your function’s return type annotation
131-
2. your function’s actual input type, if it was a subtype of the return
132-
value
133-
3. if None is the return type annotation then no conversion will be
134-
done
123+
(6, 'ba')
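For comparison, the standard library offers a related mechanism in `functools.singledispatch`. The sketch below is only an analogy, not how fasttransform implements dispatch: `singledispatch` looks at the first argument only, and the pass-through fallback is a choice made for this example.

```python
from functools import singledispatch

@singledispatch
def inc(x):
    # Fallback for types with no registered implementation:
    # in this sketch, return the input unchanged.
    return x

@inc.register
def _(x: int):
    return x + 1

@inc.register
def _(x: str):
    return x + "a"

results = (inc(5), inc("b"), inc(2.0))
```

Here `inc(5)` and `inc("b")` reach the `int` and `str` implementations, while `inc(2.0)` falls through to the generic version because no `float` implementation is registered.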
135124

136-
Let’s illustrate this with an example of a custom `float` subtype:
125+
If an input type does not match any of the type annotations then the
126+
original input is returned.
137127

138128
``` python
139-
class FS(float):
140-
def __repr__(self): return f'FS({float(self)})'
129+
add_one(2.0)
141130
```
142131

143-
By default multiplying such a subtype with a regular `float` returns a
144-
`float`.
132+
3.0
145133

146134
``` python
147-
FS(5.0) * 5.0
135+
normalize(3.0)
148136
```
149137

150-
25.0
138+
0.0
139+
140+
### Type conversion/preservation
141+
142+
You initialize a Transform by passing in or decorating a raw function.
143+
144+
A Transform’s `encodes` or `decodes` will note the return type of its raw
145+
function, which may be defined explicitly or implicitly, and enhance
146+
type-handling behavior in three ways:
147+
148+
1. **Guaranteed return type**. It will always return the return type of
149+
the raw function, promoting values if necessary.
150+
151+
2. **Type Preservation**. It will return the runtime type of its
152+
argument, whenever that is a subtype of the return type.
153+
154+
3. **Opt-out conversion**. If you explicitly mark the raw function’s
155+
return type as `None`, then it will not perform any type conversion
156+
or preservation.
157+
158+
Examples help make this clear:
151159

152-
However, in Transform you can change this behavior with type
153-
annotations.
160+
#### Guaranteed return type
154161

155-
Illustration of case 1:
162+
Say you define `FS`, a subclass of `float`. The usual Python type
163+
promotion behavior means that an `FS` times a `float` is still a
164+
`float`:
156165

157166
``` python
158-
def enc(x)->FS: return x*2
159-
t = Transform(enc)
160-
t(1)
167+
class FS(float):
168+
def __repr__(self): return f'FS({float(self)})'
169+
170+
f1 = float(1)
171+
FS2 = FS(2)
172+
173+
val = f1 * FS2
174+
type(val) # => float
161175
```
162176

163-
FS(2.0)
177+
float
164178

165-
Illustration of case 2:
179+
With Transform, you can define a new multiplication operation which will
180+
be guaranteed to return an `FS`, because Transform reads the raw
181+
function’s annotated return type:
166182

167183
``` python
168-
def enc(x): return x*2
169-
t = Transform(enc)
170-
t(FS(1))
184+
def double_FS(x)->FS: return FS(2)*x
185+
t = Transform(double_FS)
186+
val = t(1)
187+
assert isinstance(val,FS)
188+
val
171189
```
172190

173191
FS(2.0)
174192

175-
Note that in the case below, where the input is a `float` and the return
176-
type is `FS` there’s not conversion. The reason is: we can’t make sure
177-
some special information about `FS` is lost when converting to its
178-
parent class `float`.
193+
#### Type preservation
194+
195+
Let us say that we define a transform *without* any return type
196+
annotation, so that the raw function is defined only by the behavior of
197+
multiplying its argument by the float 2.0.
198+
199+
Multiplying the subtype `FS` with the float value 2 would normally
200+
return a `float`. However, Transform’s `encodes` will *preserve the
201+
runtime type of its argument*, so that it returns `FS`:
179202

180203
``` python
181-
def enc(x): return FS(x*2)
182-
t = Transform(enc)
183-
t(1.0)
204+
def double(x): return x*2.0 # no type annotation
205+
t = Transform(double)
206+
fs1 = FS(1)
207+
val = t(fs1)
208+
assert isinstance(val,FS)
209+
val # => FS(2), an FS value of 2
184210
```
185211

186212
FS(2.0)
187213

188-
Illustration of case 3:
214+
#### Opt-out conversion
215+
216+
Sometimes you don’t want Transform to do any type-based logic. You can
217+
opt out of this system by declaring that your raw function’s return type
218+
is `None`:
189219

190220
``` python
191-
def enc(x)->None: return x*2
192-
t = Transform(enc)
193-
t(FS(1))
221+
def double_none(x) -> None: return x*2.0 # "None" return type means "no conversion"
222+
t = Transform(double_none)
223+
fs1 = FS(1)
224+
val = t(fs1)
225+
assert isinstance(val,float)
226+
val # => 2.0, a float of 2, because of fallback to standard Python type logic
194227
```
195228

196229
2.0
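To make the three rules concrete, here is a rough plain-Python approximation. The `with_type_rules` wrapper is hypothetical and is not fasttransform's implementation. In particular, it approximates rule 2 by comparing the argument against the type of the value actually returned.

```python
import inspect

class FS(float):
    def __repr__(self):
        return f"FS({float(self)})"

_MISSING = inspect.Signature.empty  # marker for "no return annotation"

def with_type_rules(fn):
    """Hypothetical wrapper approximating the three rules above."""
    ret = inspect.signature(fn).return_annotation

    def wrapped(x):
        out = fn(x)
        if ret is None:
            # Rule 3: "-> None" opts out of any conversion.
            return out
        if ret is not _MISSING:
            # Rule 1: cast to the annotated return type if needed.
            return out if isinstance(out, ret) else ret(out)
        # Rule 2: no annotation; keep the argument's runtime type
        # when it is a subtype of the type that was returned.
        if type(out) is not type(x) and isinstance(x, type(out)):
            return type(x)(out)
        return out

    return wrapped

@with_type_rules
def double_fs(x) -> FS: return FS(2) * x

@with_type_rules
def double(x): return x * 2.0

@with_type_rules
def double_none(x) -> None: return x * 2.0

a, b, c = double_fs(1), double(FS(1)), double_none(FS(1))
```

Under these assumptions `a` and `b` come back as `FS` values, while `c` stays a plain `float`, matching the three outputs shown above.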
197230

198-
In the last case we see a `float` because a mutiplication of `FS` with a
199-
`float` returns a `float` and no additional type conversion is done.
200-
201231
### Pipelines
202232

203233
Transforms can be combined into larger **Pipelines**:
204234

205235
``` python
206-
p = Pipeline((t, normalize))
236+
def double(x): return x*2.0
237+
def halve(x): return x/2.0
238+
dt = Transform(double,halve)
207239

208-
p(5) # 5 * 2 - 3
209-
```
240+
class NormalizeMean(Transform):
241+
def setups(self, items):
242+
self.mean = statistics.mean(items)
243+
self.std = statistics.stdev(items)
244+
245+
def encodes(self, x):
246+
return (x - self.mean) / self.std
247+
248+
def decodes(self, x):
249+
return x * self.std + self.mean
210250

211-
7.0
212251

213-
``` python
214-
p.decode(7) # (7 + 3) / 2
252+
p = Pipeline((dt, normalize))
253+
254+
v = p(5)
255+
v
215256
```
216257

217-
10.0
258+
4.427188724235731
218259

219-
If you’re wondering the types are changing from `int` to `float` in this
220-
case:
260+
``` python
261+
p.decode(v)
262+
```
221263

222-
`self.mean` in the `NormalizeMean` transform is a `float`. So the
223-
automatic type conversion does not trigger here, as `float` is not a
224-
subtype of `int`. And that’s probably a good thing, because otherwise we
225-
might lose some information here whenever `self.mean` has some decimal
226-
value.
264+
5.0
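The arithmetic behind these two outputs can be reproduced in plain Python. The sketch below assumes that decoding applies each step's inverse in reverse order, which is consistent with the round trip shown here. `Step`, `run`, and `undo` are hypothetical names, not fasttransform APIs.

```python
import statistics

class Step:
    """Hypothetical stand-in for a reversible transform: an encoder and its inverse."""
    def __init__(self, enc, dec):
        self.enc, self.dec = enc, dec

def run(steps, x):
    # Apply each encoder in order.
    for s in steps:
        x = s.enc(x)
    return x

def undo(steps, y):
    # Apply each inverse in reverse order.
    for s in reversed(steps):
        y = s.dec(y)
    return y

items = [1, 2, 3, 4, 5]
mean, std = statistics.mean(items), statistics.stdev(items)

steps = [
    Step(lambda x: x * 2.0, lambda x: x / 2.0),                  # double / halve
    Step(lambda x: (x - mean) / std, lambda z: z * std + mean),  # z-score / inverse
]

v = run(steps, 5)          # (5 * 2.0 - 3) / std
restored = undo(steps, v)  # back to 5.0, up to rounding
```

Reversing the order matters: the last encoding applied has to be the first one undone, otherwise the inverses are applied to values they were not designed for.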
227265

228266
### Documentation
229267
