We are a portfolio investment company and we make investments in the emerging markets around the world. Our company profits by investing in profitable companies, buying, holding and selling company stocks based on value investing principles.
Our goal is to establish a robust intelligent system to aid our value investing efforts using stock market data. We make investment decisions and based on intrinsic value of companies and do not trade on the basis of daily market volatility. Our profit realization strategy typically involves weekly, monthly and quarterly performance of stocks we buy or hold.
You are given a set of portfolio companies trading data from emerging markets including 2020 Q1-Q2-Q3-Q4 2021 Q1 stock prices. Each company stock is provided in different sheets. Each market's operating days varies based on the country of the company and the market the stocks are exchanged. Use only 2020 data and predict with 2021 Q1 data.
Predict stock price valuations on a daily, weekly and monthly basis. Recommend BUY, HOLD, SELL decisions. Maximize capital returns, minimize losses. Ideally a loss should never happen. Minimize HOLD period.
This repository will predict the stock price for 8 different companies and each dataset will be divided into two parts, which are:
1) Analysing and preprocessing the data.
2) Stock Price forecasting with the buy, sell and hold recommendations..
To achieve this, both the analysing_report.py
and models.py
python files will be used. So, let's see what every dataset going through:
-
Data preprocessing:
- Make the index equal to the date.
- Remove special characters or string(s) like the "M" in the
vol
column, which mean the volume in millions. - Transform all the value in thousands of units to million.
-
Make the time series stationary:
- Check if the price column is stationary or not, and if it's not, the price time series will be transform to stationary by taking the difference between every data point and the previous one.
-
Add new features:
- Make new features from the existing ones by taking the mean and the standard deviation for all of the original columns for
3
,7
, and30
days and adding them as new columns. - Note: With testing different features with different models, it seems the models' performance is better with the original features only.
- Make new features from the existing ones by taking the mean and the standard deviation for all of the original columns for
-
Plotting some stat:
-
Plotting the price to notice if there are any trends and to take a general idea about the price time series, like in the following picture.
-
The year and months box plot as in the following pictures, but we need to take into our mind that, the available data in
2021
is only for the first quarter, so this could be changing with more data for the rest of the year.
and to have a clear idea, there is a figure showing every month price box plot with hue by the
year
like in the following picture.-
Finally, there are two figures showing the price time series before and after making it stationary like in the following pictures.
All of this could be done by using only one function, which is
analysing_report
and passing both of the data names which is SBER in this case, and the sheet number in the excel file which is 0 in this case.
-
The analysis_report
function output will be used in models.py
for:
- Statical and machine learning models:
- Arima.
- Sarima.
- Arimax.
- Prophet.
For all of the statical models, auto models were have been used to adjust the model parameters.
-
Deep learinig models:
-
Univariate LSTM:
- Vanilla LSTM with the price feature only with 100 hidden units.
-
Multivariate LSTM:
- Vanilla LSTM with the
open
,high
,low
,vol
andchange
features with 100 hidden units.
- Vanilla LSTM with the
-
Univariate ConvLSTM:
- ConvLSTM with only the
price
column with one layer with64
hidden units.
- ConvLSTM with only the
-
Multivariate ConvLSTM:
- ConvLSTM with
open
,high
,low
,vol
andchange
features with one layer with64
hidden units.
- ConvLSTM with
-
-
Plotting the predictions VS. real price, as well as plotting the Train VS. Val Loss scores for the deep learning models as in the following pictures.
-
Plotting the Bollinger band for the best model for each dataset and saving the prediction as well as the labels which are the
buy signal
,sell signal
andhold signal
into a new CSV file in the output folder as in the following picture.
All of this could be done by using only one function, which is models_report
and passing both of the data names which is SBER in this case, and the dataframe.
The stock market in all of the 8 companies suffers in Mar and the market correct itself around May which is make sense with the coronavirus pandemic. For the modelling, Arimax shows a great result with all of the datasets.