Commit 4ac13b5

committed
chapter 04 revised
1 parent abab60a commit 4ac13b5

9 files changed

+4245
-1003
lines changed

01_machine_learning_for_trading/README.md

+9
Original file line number | Diff line number | Diff line change
@@ -169,6 +169,15 @@ ML extracts signals from a wide range of market, fundamental, and alternative da
169169
- [deeplearning.ai](http://deeplearning.ai/), Andrew Ng
170170
- Andrew Ng’s introductory deep learning course
171171

172+
### ML Competitions & Trading
173+
174+
- [IEEE Investment Ranking Challenge](https://www.crowdai.org/challenges/ieee-investment-ranking-challenge)
175+
- [Investment Ranking Challenge: Identifying the best performing stocks based on their semi-annual returns](https://arxiv.org/pdf/1906.08636.pdf)
176+
- [Two Sigma Financial Modeling Challenge](https://www.kaggle.com/c/two-sigma-financial-modeling)
177+
- [Two Sigma: Using News to Predict Stock Movements](https://www.kaggle.com/c/two-sigma-financial-news)
178+
- [The Winton Stock Market Challenge](https://www.kaggle.com/c/the-winton-stock-market-challenge)
179+
- [Algorithmic Trading Challenge](https://www.kaggle.com/c/AlgorithmicTradingChallenge)
180+
172181
### Python Libraries
173182

174183
- matplotlib [docs](https://github.com/matplotlib/matplotlib)

04_alpha_factor_research/00_alpha_factors_in_practice/feature_engineering.ipynb

+478-444
Large diffs are not rendered by default.

04_alpha_factor_research/00_alpha_factors_in_practice/how_to_use_talib.ipynb

+38-27
Large diffs are not rendered by default.

04_alpha_factor_research/00_alpha_factors_in_practice/kalman_filter_and_wavelets.ipynb

+31-67
Large diffs are not rendered by default.

04_alpha_factor_research/README.md

+42-11
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,25 @@
1-
# Chapter 04: Alpha Factor Research & Evaluation
1+
# Chapter 04: Financial Feature Engineering: How to research Alpha Factors
22

33
Alpha factors aim to predict the price movements of assets in the investment universe based on the available market, fundamental, or alternative data. A factor may combine one or several input variables, but takes a single value for each asset every time the strategy evaluates the factor.
44

55
Trade decisions typically rely on relative values across assets. Trading strategies are often based on signals emitted by multiple factors, and we will see that machine learning (ML) models are particularly well suited to integrate the various signals efficiently to make more accurate predictions.
66

77
This chapter provides a framework for understanding how factors work and how to measure their performance, for example using the information coefficient (IC). It demonstrates how to engineer alpha factors from data using Python libraries offline and on the Quantopian platform. It also introduces the `zipline` library to backtest factors and the `alphalens` library to evaluate their predictive power. More specifically, this chapter covers:
88

9-
- How to characterize, justify and measure key types of alpha factors
10-
- How to create alpha factors using financial feature engineering
11-
- How to use `zipline` offline to test individual alpha factors
12-
- How to use `zipline` on Quantopian to combine alpha factors and identify more sophisticated signals
9+
- Which categories of factors exist, why they work and how to measure them
10+
- How to create alpha factors using numpy, pandas, and talib
11+
- How to denoise data using wavelets and the Kalman filter
12+
- How to use zipline offline and on Quantopian to test individual and multiple alpha factors
1313
- How the information coefficient (IC) measures an alpha factor's predictive performance
14-
- How to use `alphalens` to evaluate predictive performance and turnover
14+
- How to use alphalens to evaluate predictive performance and turnover using, among other metrics, the information coefficient (IC)
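
As a rough illustration of the information coefficient mentioned in the list above, the following sketch computes a per-period Spearman rank correlation between factor values and forward returns using only pandas. The data are synthetic and the function name `information_coefficient` is a hypothetical helper, not part of `alphalens`; it only assumes that both inputs are date-by-asset frames with matching labels.

```python
import numpy as np
import pandas as pd


def information_coefficient(factor: pd.DataFrame,
                            forward_returns: pd.DataFrame) -> pd.Series:
    """Per-date Spearman rank correlation between factor and forward returns.

    Both inputs are date x asset frames with identical index and columns.
    """
    # Rank across assets within each date; the Pearson correlation of the
    # ranks is the Spearman rank correlation.
    factor_ranks = factor.rank(axis=1)
    return_ranks = forward_returns.rank(axis=1)
    return factor_ranks.corrwith(return_ranks, axis=1)


# Synthetic example data (assumption: illustration only, not market data).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=5, freq="D")
assets = list("ABCDEFGH")
factor = pd.DataFrame(rng.normal(size=(5, 8)), index=dates, columns=assets)
# Hypothetical forward returns: partly driven by the factor, partly noise.
forward_returns = 0.5 * factor + rng.normal(size=(5, 8))

ic = information_coefficient(factor, forward_returns)
print(ic.mean())
```

Averaging the per-date values gives a single summary number; a factor that ranks assets in exactly the same order as their subsequent returns would score 1 on every date.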
1515

16-
## Engineering Alpha Factor
16+
## Alpha Factors in practice: from data to signals
1717

1818
Alpha factors are transformations of market, fundamental, and alternative data that contain predictive signals. They are designed to capture risks that drive asset returns. One set of factors describes fundamental, economy-wide variables such as growth, inflation, volatility, productivity, and demographic risk. Another set consists of tradeable investment styles such as the market portfolio, value-growth investing, and momentum investing.
1919

2020
There are also factors that explain price movements based on the economics or institutional setting of financial markets, or on investor behavior, including known biases of this behavior. The economic theory behind factors can be rational, where the factors have high returns over the long run to compensate for their low returns during bad times, or behavioral, where factor risk premiums result from the possibly biased or not entirely rational behavior of agents that is not arbitraged away.
2121

22-
### Important Factor Categories
22+
### On the shoulders of giants: meet the factor establishment
2323

2424
In an idealized world, categories of risk factors should be independent of each other (orthogonal), yield positive risk premia, and form a complete set that spans all dimensions of risk and explains the systematic risks for assets in a given class. In practice, these requirements will hold only approximately.
2525

@@ -31,9 +31,36 @@ In an idealized world, categories of risk factors should be independent of each
3131
- [Anomalies and Market Efficiency](https://www.nber.org/papers/w9277.pdf) by G. William Schwert (Ch. 15 in "Handbook of the Economics of Finance", by Constantinides, Harris, and Stulz, 2003)
3232
- [Investor Psychology and Asset Pricing](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=265132), by David Hirshleifer (2001)
3333

34-
### How to transform Data into Factors
34+
## Engineering alpha factors that predict returns
35+
36+
37+
38+
### How to engineer factors using pandas and NumPy
3539

3640
- The notebook [feature_engineering.ipynb](00_data/feature_engineering.ipynb) in the [data](00_data) directory illustrates how to engineer basic factors.
41+
- [Fama French](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html) Data Library
42+
- [numpy](https://numpy.org/) website
43+
- [Quickstart Tutorial](https://numpy.org/devdocs/user/quickstart.html)
44+
- [pandas](https://pandas.pydata.org/) website
45+
- [User Guide](https://pandas.pydata.org/docs/user_guide/index.html)
46+
- [10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)
47+
- [Python Pandas Tutorial: A Complete Introduction for Beginners](https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/)
48+
- [alphatools](https://github.com/marketneutral/alphatools) - Quantitative finance research tools in Python
49+
- [mlfinlab](https://github.com/hudson-and-thames/mlfinlab) - Package based on the work of Dr Marcos Lopez de Prado regarding his research with respect to Advances in Financial Machine Learning
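
To make the pandas and NumPy workflow concrete, here is a minimal sketch of one possible factor: a lookback return (momentum) that is then standardized across assets on each date. The prices are synthetic, and the helper names `momentum` and `cross_sectional_zscore` as well as the 20-day lookback are illustrative assumptions rather than the notebook's actual code.

```python
import numpy as np
import pandas as pd

# Synthetic daily close prices for four hypothetical tickers.
rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=60)
log_returns = rng.normal(0, 0.01, size=(60, 4))
prices = pd.DataFrame(100 * np.exp(np.cumsum(log_returns, axis=0)),
                      index=dates, columns=["A", "B", "C", "D"])


def momentum(prices: pd.DataFrame, lookback: int = 20) -> pd.DataFrame:
    """Return over the trailing `lookback` periods for each asset."""
    return prices.pct_change(lookback)


def cross_sectional_zscore(factor: pd.DataFrame) -> pd.DataFrame:
    """Standardize factor values across assets on each date."""
    mean = factor.mean(axis=1)
    std = factor.std(axis=1)
    return factor.sub(mean, axis=0).div(std, axis=0)


raw_factor = momentum(prices)
# The first `lookback` rows have no value yet and are dropped.
factor = cross_sectional_zscore(raw_factor).dropna()
print(factor.tail())
```

Standardizing per date yields one comparable value per asset and date, which is the shape that ranking-based trade decisions and tools such as `alphalens` typically expect.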
50+
51+
#### How to denoise your Alpha Factors with the Kalman Filter
52+
53+
- [PyKalman](https://pykalman.github.io/) documentation
54+
- [Tutorial: The Kalman Filter](http://web.mit.edu/kirtley/kirtley/binlustuff/literature/control/Kalman%20filter.pdf)
55+
- [Understanding and Applying Kalman Filtering](http://biorobotics.ri.cmu.edu/papers/sbp_papers/integrated3/kleeman_kalman_basics.pdf)
56+
- [How a Kalman filter works, in pictures](https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/)
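
The idea behind Kalman-filter denoising can be sketched without any library. The example below hand-codes a one-dimensional local-level model (a random-walk level observed with noise) in NumPy; it does not use the PyKalman API, and the function name `kalman_filter_level`, the variance parameters, and the initial state variance are assumptions chosen for illustration.

```python
import numpy as np


def kalman_filter_level(observations, process_var=1e-3, obs_var=1.0):
    """Filter a 1-D series with a local-level (random walk plus noise) model."""
    observations = np.asarray(observations, dtype=float)
    estimates = np.empty_like(observations)
    # Assumed initial belief: start at the first observation with unit variance.
    state_mean, state_var = observations[0], 1.0
    for t, y in enumerate(observations):
        # Predict: the level follows a random walk, so only uncertainty grows.
        prior_var = state_var + process_var
        # Update: the Kalman gain weighs the new observation against the prior.
        gain = prior_var / (prior_var + obs_var)
        state_mean = state_mean + gain * (y - state_mean)
        state_var = (1.0 - gain) * prior_var
        estimates[t] = state_mean
    return estimates


# Synthetic example: a constant level of 100 observed with unit-variance noise.
rng = np.random.default_rng(7)
noisy = 100 + rng.normal(0, 1.0, size=200)
filtered = kalman_filter_level(noisy)
print(noisy[-3:], filtered[-3:])
```

A smaller `process_var` relative to `obs_var` produces a smoother estimate that reacts more slowly to genuine changes in the level, which is the main trade-off to tune.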
57+
58+
#### How to preprocess your noisy signals using Wavelets
59+
60+
- [PyWavelets](https://pywavelets.readthedocs.io/en/latest/) - Wavelet Transforms in Python
61+
- [An Introduction to Wavelets](https://www.eecis.udel.edu/~amer/CISC651/IEEEwavelet.pdf)
62+
- [The Wavelet Tutorial](http://web.iitd.ac.in/~sumeet/WaveletTutorial.pdf)
63+
- [Wavelets for Kids](http://www.gtwavelet.bme.gatech.edu/wp/kidsA.pdf)
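
Wavelet denoising follows a decompose, threshold, reconstruct pattern. The sketch below implements it with a hand-written Haar transform in NumPy so that it runs without extra dependencies; it is not the PyWavelets API, and the helper names, the three decomposition levels, and the universal threshold with a median-based noise estimate are illustrative assumptions.

```python
import numpy as np


def haar_forward(signal):
    """One Haar level: split into approximation and detail coefficients."""
    pairs = signal.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero by `threshold`."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def haar_denoise(signal, levels=3, threshold=None):
    """Decompose, shrink the detail coefficients, and reconstruct."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = signal, []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)
    if threshold is None:
        # Universal threshold; noise scale estimated from the finest details.
        sigma = np.median(np.abs(details[0])) / 0.6745
        threshold = sigma * np.sqrt(2 * np.log(signal.size))
    for detail in reversed(details):
        approx = haar_inverse(approx, soft_threshold(detail, threshold))
    return approx


# Synthetic example: a slow sine wave plus Gaussian noise.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(0, 0.3, size=t.size)
denoised = haar_denoise(noisy)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

With a threshold of zero the function reproduces its input, which is a quick way to check that the transform and its inverse match.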
3764

3865
#### References
3966

@@ -44,11 +71,15 @@ In an idealized world, categories of risk factors should be independent of each
4471
- [Spearman Rank Correlation](https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php)
4572

4673

47-
## Seeking Signals - How to use `zipline`
74+
## From signals to trades: backtesting with `zipline`
4875

4976
The open-source [zipline](http://www.zipline.io/index.html) library is an event-driven backtesting system maintained and used in production by the crowd-sourced quantitative investment fund [Quantopian](https://www.quantopian.com/) to facilitate algorithm development and live trading. It automates the algorithm's reaction to trade events and provides it with current and historical point-in-time data that avoids look-ahead bias.
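
The notion of an event-driven loop with point-in-time data can be illustrated with a toy example in plain pandas. This is not `zipline` code: the function `decide_positions`, the trailing-mean rule, and the synthetic prices are hypothetical, and the sketch only shows how restricting each decision to data available at that bar avoids look-ahead bias.

```python
import numpy as np
import pandas as pd


def decide_positions(prices: pd.Series, lookback: int = 5) -> pd.Series:
    """Toy event loop: one bar per date, using only data up to that bar."""
    positions = pd.Series(0.0, index=prices.index)
    for i, date in enumerate(prices.index):
        # Point-in-time view: nothing after `date` is visible to the rule.
        history = prices.iloc[: i + 1]
        if len(history) <= lookback:
            continue
        # Hypothetical rule: go long if the latest price exceeds the mean
        # of the `lookback` bars before it.
        trailing_mean = history.iloc[-lookback - 1:-1].mean()
        positions.loc[date] = 1.0 if history.iloc[-1] > trailing_mean else 0.0
    return positions


# Synthetic price path (assumption: illustration only).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-04", periods=50)
prices = pd.Series(100 + np.cumsum(rng.normal(size=50)), index=dates)

positions = decide_positions(prices)
# A position chosen at bar t earns the return from bar t to bar t + 1.
strategy_returns = (positions * prices.pct_change().shift(-1)).dropna()
print(strategy_returns.sum())
```

One simple check for look-ahead bias in a setup like this is to alter prices after some date and confirm that positions up to that date do not change.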
5077

51-
- `zipline` installation: see [docs](http://www.zipline.io/index.html) and the introduction to `zipline` in [Chapter 2](../02_market_and_fundamental_data/02_data_providers/04_zipline) for more detail.
78+
### Installation
79+
80+
- The current release, 1.3, has a few shortcomings, such as the [dependency on benchmark data from the IEX exchange](https://github.com/quantopian/zipline/issues/2480) and limitations on importing features beyond the basic OHLCV data points.
81+
- To enable the use of `zipline`, I've provided a [patched version](https://github.com/stefan-jansen/zipline) that works for the purposes of this book.
82+
- To install, clone the repo, `cd` into the package's root folder and, after activating the `ml4t` environment, run `pip install -e .`
5283

5384
## Separating signal and noise – how to use alphalens
5485

07_linear_models/03_preparing_the_model_data.ipynb

+1-1
Original file line numberDiff line numberDiff line change
@@ -1112,7 +1112,7 @@
11121112
"name": "python",
11131113
"nbconvert_exporter": "python",
11141114
"pygments_lexer": "ipython3",
1115-
"version": "3.6.8"
1115+
"version": "3.7.6"
11161116
},
11171117
"toc": {
11181118
"base_numbering": 1,
