@@ -23,10 +23,20 @@ $ pip install fasttransform
2323### Transform
2424
2525Transform is a class that lets you create reusable data transformations.
26- It behaves like a function, you can call it to ` encode ` your data. In
27- addition it has an optional ` decode ` method that will reverse the
28- function. And an optional ` setup ` method that can initialize some inner
29- state.
26+ You initialize a Transform by passing in or decorating a raw function.
27+ The Transform then provides an enhanced version of that function via
28+ ` Transform.encodes ` , which can be used in your data pipeline.
29+
30+ It provides various conveniences:
31+
32+ - **Reversibility**. You can collect the raw function and its inverse
33+ into one transform object.
34+ - **Customized initialization**. You can customize the exact behavior of
35+ a transform function on initialization.
36+ - **Type-based multiple dispatch**. Transforms can specialize their
37+ behavior based on the runtime types of their arguments.
38+ - **Type conversion/preservation**. Transforms help you maintain desired
39+ return types.
3040
3141The simplest way to create a Transform is by decorating a function:
3242
@@ -36,7 +46,7 @@ from fasttransform import Transform, Pipeline
3646
3747``` python
3848@Transform
39- def add_one (x : int ):
49+ def add_one (x ):
4050 return x + 1
4151
4252# Usage
@@ -45,31 +55,12 @@ add_one(2)
4555
4656 3
4757
48- Transforms are ** flexible** . You can specify multiple transforms with
49- different type annotations and it will automatically pick up the correct
50- one.
51-
52- ``` python
53- def inc1 (x :int ): return x+ 1
54- def inc2 (x :str ): return x+ " a"
55-
56- t = Transform(enc = (inc1,inc2))
57-
58- t(5 ), t(' b' )
59- ```
58+ ### Reversibility
6059
61- (6, 'ba')
62-
63- If an input type does not match any of the type annotations then the
64- original input is returned.
65-
66- ``` python
67- add_one(2.0 )
68- ```
69-
70- 2.0
71-
72- Transforms are ** reversible** , if you provide a ` decode ` function.
60+ To make a transform reversible, you provide the raw function and its
61+ inverse. This is useful in data pipelines where, for instance, you might
62+ want to normalize and then de-normalize numerical values, or encode to
63+ category indexes and then decode back to categories.
7364
7465``` python
7566def enc (x ): return x* 2
@@ -82,148 +73,195 @@ t(2), t.decode(2), t.decode(t(2))
8273
8374 (4, 1, 2)
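
As a rough illustration of the category use case mentioned above, here is a plain-Python sketch of an encode/decode pair. It does not use fasttransform, and the names `vocab`, `to_index`, and `to_category` are hypothetical.

``` python
# Hypothetical vocabulary; in practice this would be derived from the data.
vocab = ["cat", "dog", "fish"]
index_of = {c: i for i, c in enumerate(vocab)}

def to_index(category):
    """Encode: map a category label to its integer index."""
    return index_of[category]

def to_category(index):
    """Decode: map an integer index back to its category label."""
    return vocab[index]

encoded = to_index("dog")
decoded = to_category(encoded)
print(encoded, decoded)
```

A pair like this could presumably be wrapped in the same two-function form used elsewhere in this README, for example `Transform(to_index, to_category)`.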
8475
85- Transforms can be ** stateful** , you can initialize them with the ` setup `
86- method. This may be useful when you want to set scaling parameters based
87- on your training split in your machine learning pipeline.
76+ ### Customized initialization
77+
78+ You can customize an individual Transform instance at initialization
79+ time, so that it can depend on aggregate properties of the data set.
80+
81+ Here we define a z-score normalization Transform by defining ` encodes `
82+ and ` decodes ` methods directly:
8883
8984``` python
85+ import statistics
86+
9087class NormalizeMean (Transform ):
9188 def setups (self , items ):
92- self .mean = sum (items) / len (items)
89+ self .mean = statistics.mean(items)
90+ self .std = statistics.stdev(items)
9391
9492 def encodes (self , x ):
95- return x - self .mean
93+ return ( x - self .mean) / self .std
9694
9795 def decodes (self , x ):
98- return x + self .mean
96+ return x * self .std + self .mean
9997
10098normalize = NormalizeMean()
10199normalize.setup([1 , 2 , 3 , 4 , 5 ])
102100normalize.mean
103101```
104102
105- 3.0
103+ 3
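
To make the arithmetic concrete, the following plain-Python sketch reproduces the same z-score round trip without fasttransform. The statistics are computed once up front, which is the role `setups` plays above; the free functions `encode` and `decode` are illustrative stand-ins, not part of the library.

``` python
import math
import statistics

items = [1, 2, 3, 4, 5]
mean = statistics.mean(items)   # aggregate statistic computed once, up front
std = statistics.stdev(items)   # sample standard deviation

def encode(x):
    """Map a raw value to its z-score."""
    return (x - mean) / std

def decode(z):
    """Map a z-score back to the raw scale."""
    return z * std + mean

z = encode(5)
restored = decode(z)
print(z, restored)
```

Encoding the mean itself gives zero, and decoding an encoded value returns the original up to floating-point error.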
106104
107- ``` python
108- normalize(3.0 )
109- ```
105+ ### Type-based multiple dispatch
110106
111- 0.0
107+ Instead of providing one raw function, you can provide multiple raw
108+ functions which differ in their parameter types. Transform will use
109+ type-based dispatch to automatically execute the correct function.
112110
113- Transforms are ** extendedible ** , this may be useful when you want to
114- create one Transform that can handle different data types.
111+ This is handy when your inputs come in different types (e.g., different
112+ image formats, different numerical types).
115113
116114``` python
117- @NormalizeMean
118- def encodes ( self , x : float ): return x + self .mean + 5
115+ def inc1 ( x : int ): return x + 1
116+ def inc2 ( x : str ): return x+ " a "
119117
120- @NormalizeMean
121- def decodes (self , x :float ): return x + self .mean + 5
118+ t = Transform(enc = (inc1,inc2))
122119
123- normalize( 2.0 )
120+ t( 5 ), t( ' b ' )
124121```
125122
126- 10.0
127-
128- Transforms try to be ** type preserving** in the following order:
129-
130- 1 . your function’s return type annotation
131- 2 . your function’s actual input type, if it was a subtype of the return
132- value
133- 3 . if None is the return type annotation then no conversion will be
134- done
123+ (6, 'ba')
135124
136- Let’s illustrate this with an example of a custom ` float ` subtype:
125+ If an input type does not match any of the type annotations then the
126+ original input is returned.
137127
138128``` python
139- class FS (float ):
140- def __repr__ (self ): return f ' FS( { float (self )} ) '
129+ t(2.0 )
141130```
142131
143- By default multiplying such a subtype with a regular ` float ` returns a
144- ` float ` .
132+ 2.0
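
For readers who want a mental model of dispatch with a pass-through fallback, the standard library's `functools.singledispatch` gives a similar effect. This is only an analogy under that assumption, not how fasttransform is implemented, and the function name `inc` is made up for the example.

``` python
from functools import singledispatch

@singledispatch
def inc(x):
    # Fallback: no registered implementation matches, so return the input unchanged.
    return x

@inc.register
def _(x: int):
    return x + 1

@inc.register
def _(x: str):
    return x + "a"

results = (inc(5), inc("b"), inc(2.0))
print(results)
```

Here the `int` and `str` inputs reach their specialized implementations, while the `float` input falls through to the default and comes back untouched.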
145133
146134``` python
147- FS( 5 .0 ) * 5.0
135+ normalize( 3 .0 )
148136```
149137
150- 25.0
138+ 0.0
139+
140+ ### Type conversion/preservation
141+
142+ You initialize a Transform by passing in or decorating a raw function.
143+
144+ A Transform’s `encodes` or `decodes` method notes the return type of its
145+ raw function, which may be defined explicitly or implicitly, and enhances
146+ type-handling behavior in three ways:
147+
148+ 1. **Guaranteed return type**. It will always return a value of the raw
149+ function’s return type, converting values if necessary.
150+
151+ 2. **Type preservation**. It will return a value of its argument’s runtime
152+ type, whenever that is a subtype of the return type.
153+
154+ 3. **Opt-out conversion**. If you explicitly mark the raw function’s
155+ return type as `None`, then it will not perform any type conversion
156+ or preservation.
157+
158+ Examples help make this clear:
151159
152- However, in Transform you can change this behavior with type
153- annotations.
160+ #### Guaranteed return type
154161
155- Illustration of case 1:
162+ Say you define ` FS ` , a subclass of ` float ` . The usual Python type
163+ promotion behavior means that an ` FS ` times a ` float ` is still a
164+ ` float ` :
156165
157166``` python
158- def enc (x )->FS : return x* 2
159- t = Transform(enc)
160- t(1 )
167+ class FS (float ):
168+ def __repr__ (self ): return f ' FS( { float (self )} ) '
169+
170+ f1 = float (1 )
171+ FS2 = FS(2 )
172+
173+ val = f1 * FS2
174+ type (val) # => float
161175```
162176
163- FS(2.0)
177+ float
164178
165- Illustration of case 2:
179+ With Transform, you can define a new multiplication operation that is
180+ guaranteed to return an `FS`, because Transform reads the raw
181+ function’s annotated return type:
166182
167183``` python
168- def enc (x ): return x* 2
169- t = Transform(enc)
170- t(FS(1 ))
184+ def double_FS (x )->FS : return FS(2 )* x
185+ t = Transform(double_FS)
186+ val = t(1 )
187+ assert isinstance (val,FS )
188+ val
171189```
172190
173191 FS(2.0)
174192
175- Note that in the case below, where the input is a ` float ` and the return
176- type is ` FS ` there’s not conversion. The reason is: we can’t make sure
177- some special information about ` FS ` is lost when converting to its
178- parent class ` float ` .
193+ #### Type preservation
194+
195+ Let us say that we define a transform * without* any return type
196+ annotation, so that the raw function is defined only by the behavior of
197+ multiplying its argument by the float 2.0.
198+
199+ Multiplying the subtype ` FS ` with the float value 2 would normally
200+ return a ` float ` . However, Transform’s ` encodes ` will * preserve the
201+ runtime type of its argument* , so that it returns ` FS ` :
179202
180203``` python
181- def enc (x ): return FS(x* 2 )
182- t = Transform(enc)
183- t(1.0 )
204+ def double (x ): return x* 2.0 # no type annotation
205+ t = Transform(double)
206+ fs1 = FS(1 )
207+ val = t(fs1)
208+ assert isinstance (val,FS )
209+ val # => FS(2.0), an FS value of 2.0
184210```
185211
186212 FS(2.0)
187213
188- Illustration of case 3:
214+ #### Opt-out conversion
215+
216+ Sometimes you don’t want Transform to do any type-based logic. You can
217+ opt out of this system by declaring that your raw function’s return type
218+ is `None`:
189219
190220``` python
191- def enc (x )->None : return x* 2
192- t = Transform(enc)
193- t(FS(1 ))
221+ def double_none(x) -> None: return x*2.0 # "None" return type means "no conversion"
222+ t = Transform(double_none)
223+ fs1 = FS(1 )
224+ val = t(fs1)
225+ assert isinstance (val,float )
226+ val # => 2.0, a float of 2, because of fallback to standard Python type logic
194227```
195228
196229 2.0
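
The three rules can be condensed into a small plain-Python decorator. This is a sketch of the decision logic as described in this section, written without fasttransform; the helper `with_type_rules` is hypothetical, and the library's real implementation may differ.

``` python
_MISSING = object()  # sentinel: the function has no return annotation at all

def with_type_rules(fn):
    """Wrap fn so its result follows the three rules described above (sketch only)."""
    ret = getattr(fn, "__annotations__", {}).get("return", _MISSING)

    def wrapper(x):
        res = fn(x)
        if ret is None:                      # rule 3: "-> None" opts out entirely
            return res
        if ret is not _MISSING:              # rule 1: convert to the annotated type
            return res if isinstance(res, ret) else ret(res)
        if isinstance(res, type(x)):         # already the argument's type
            return res
        if issubclass(type(x), type(res)):   # rule 2: preserve the argument's subtype
            return type(x)(res)
        return res
    return wrapper

class FS(float):
    def __repr__(self): return f"FS({float(self)})"

@with_type_rules
def double_fs(x) -> FS: return x * 2

@with_type_rules
def double(x): return x * 2.0

@with_type_rules
def double_none(x) -> None: return x * 2.0

print(double_fs(1), double(FS(1)), double_none(FS(1)))
```

Note that an `int` argument passed to `double` stays a plain `float` result, because `int` is not a subtype of `float` and so rule 2 does not apply.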
197230
198- In the last case we see a ` float ` because a mutiplication of ` FS ` with a
199- ` float ` returns a ` float ` and no additional type conversion is done.
200-
201231### Pipelines
202232
203233Transforms can be combined into larger ** Pipelines** :
204234
205235``` python
206- p = Pipeline((t, normalize))
236+ def double (x ): return x* 2.0
237+ def halve (x ): return x/ 2.0
238+ dt = Transform(double,halve)
207239
208- p(5 ) # 5 * 2 - 3
209- ```
240+ class NormalizeMean (Transform ):
241+ def setups (self , items ):
242+ self .mean = statistics.mean(items)
243+ self .std = statistics.stdev(items)
244+
245+ def encodes (self , x ):
246+ return (x - self .mean) / self .std
247+
248+ def decodes (self , x ):
249+ return x * self .std + self .mean
210250
211- 7.0
212251
213- ``` python
214- p.decode(7 ) # (7 + 3) / 2
252+ p = Pipeline((dt, normalize))
253+
254+ v = p(5 )
255+ v
215256```
216257
217- 10.0
258+ 4.427188724235731
218259
219- If you’re wondering the types are changing from ` int ` to ` float ` in this
220- case:
260+ ``` python
261+ p.decode(v)
262+ ```
221263
222- ` self.mean ` in the ` NormalizeMean ` transform is a ` float ` . So the
223- automatic type conversion does not trigger here, as ` float ` is not a
224- subtype of ` int ` . And that’s probably a good thing, because otherwise we
225- might lose some information here whenever ` self.mean ` has some decimal
226- value.
264+ 5.0
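
The round trip above suggests that a pipeline encodes by applying each stage in order and decodes by applying the inverses in reverse order. The plain-Python sketch below illustrates that composition under this assumption; `stages`, `pipeline_encode`, and `pipeline_decode` are hypothetical names, and no fasttransform code is involved.

``` python
import math
import statistics

items = [1, 2, 3, 4, 5]
mean, std = statistics.mean(items), statistics.stdev(items)

# Each stage is an (encode, decode) pair of plain functions.
stages = [
    (lambda x: x * 2.0, lambda x: x / 2.0),                   # double / halve
    (lambda x: (x - mean) / std, lambda z: z * std + mean),   # z-score / inverse
]

def pipeline_encode(x):
    """Apply every stage's encoder, first to last."""
    for enc, _ in stages:
        x = enc(x)
    return x

def pipeline_decode(x):
    """Apply every stage's decoder, last to first."""
    for _, dec in reversed(stages):
        x = dec(x)
    return x

v = pipeline_encode(5)
restored = pipeline_decode(v)
print(v, restored)
```

Reversing the stage order on decode is what makes the round trip recover the original input.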
227265
228266### Documentation
229267