yaourt intel-mkl-slim
The non-slim variant of the above fails. Maybe the 4 GB /tmp partition filled up while extracting the .tgz archive?
Not sure about the following:
yaourt intel-opencl-sdk
https://wiki.archlinux.org/index.php/GPGPU
Dell desktop:
yaourt intel-opencl-runtime
Samsung laptop:
pacman -S beignet
https://neanderthal.uncomplicate.org/articles/tutorial_native.html
has hands-on examples of the above concepts. Use the material below to work on anything you may not have fully grasped.
The following sessions cover general tooling: the command line, Python (NumPy, Matplotlib, Pandas, Seaborn), Jupyter Notebooks, Git (and GitHub), and sending HTTP requests. You must be comfortable with these before attending the classes. The following sessions may help with that:
(Note that Jupyter Notebook has evolved into Jupyter Lab since the sessions were recorded; we will be using Jupyter Lab in class.)
b. Lecture 2 for Pandas (scraping part optional)
- The following sessions are concept refreshers on cohort prerequisites:
Study the academy’s pre-course presentations and search online for any concepts you are not familiar with.
Start Jupyter Lab:
jupyter-lab
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
You can fit a straight line relating two sets of data (linear regression) and use it to predict one from the other.
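Assuming the "line" here is a least-squares fit relating one variable to another (linear regression), NumPy can fit it with `np.polyfit` at degree 1. The x/y values below are made up for illustration.

```python
import numpy as np

# Hypothetical example data: y is roughly 2*x + 1 plus a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Least-squares fit of a degree-1 polynomial (a straight line);
# polyfit returns the highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
print(f"y = {slope:.2f} * x + {intercept:.2f}")   # y = 1.96 * x + 1.10

# Use the fitted line to predict a value we have not seen
x_new = 5.0
print("prediction at x=5:", slope * x_new + intercept)
```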
k-nearest neighbours: just save all the training data (e.g. in a database); for each future query, look up the k closest stored points and use their values to work out what the answer should be.
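A minimal from-scratch sketch of that idea, assuming a regression setting (average the k neighbours' values; for classification you would take a majority vote instead) and Euclidean distance. `knn_predict` and the data are hypothetical, for illustration only.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """Predict a value for `query` by averaging the targets of its k nearest
    stored training points (Euclidean distance)."""
    distances = np.linalg.norm(train_X - query, axis=1)  # distance to every stored row
    nearest = np.argsort(distances)[:k]                  # indices of the k closest rows
    return train_y[nearest].mean()

# Hypothetical stored data: 2 features per row, one target value per row
train_X = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0], [1.5, 2.0]])
train_y = np.array([10.0, 12.0, 50.0, 52.0, 11.0])

print(knn_predict(train_X, train_y, np.array([1.2, 1.3]), k=3))  # 11.0
```

Note there is no training step: all the work happens at query time, which is why prediction gets slower as the stored data grows.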
k-fold cross-validation:
1. Shuffle the dataset randomly.
2. Split the dataset into k groups.
3. For each unique group:
   - Take the group as a hold-out (test) data set.
   - Take the remaining groups as a training data set.
   - Fit a model on the training set and evaluate it on the test set.
   - Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
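The k-fold steps can be sketched with scikit-learn's `KFold`, which handles the shuffle-and-split; the loop does the rest. The synthetic data, the choice of k=5, and the decision tree model are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic data: 100 rows, 3 features
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=100)

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # shuffle, then split into k groups
scores = []
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeRegressor(random_state=0)           # fresh model for each fold
    model.fit(X[train_idx], y[train_idx])                   # train on the other k-1 groups
    preds = model.predict(X[test_idx])                      # evaluate on the held-out group
    scores.append(mean_absolute_error(y[test_idx], preds))  # keep the score, drop the model

print("MAE per fold:", np.round(scores, 3))
print("mean MAE: %.3f (std %.3f)" % (np.mean(scores), np.std(scores)))
```

`sklearn.model_selection.cross_val_score(model, X, y, cv=kf, scoring="neg_mean_absolute_error")` does the same loop in one call (it reports negated MAE).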
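High bias means the model's assumptions are too simple for the data (underfitting), so its predictions are systematically off, with large error even on the training data.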
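A small illustration of high bias, assuming noise-free data that follows y = x²: a straight line (degree 1) cannot bend to fit it, so its error stays large even on the data it was trained on, while a degree-2 fit matches it. The data is made up for the demonstration.

```python
import numpy as np

# Hypothetical data from a curved relationship: y = x**2 (no noise)
x = np.linspace(-3, 3, 61)
y = x ** 2

maes = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, deg=degree)        # least-squares polynomial fit
    preds = np.polyval(coeffs, x)
    maes[degree] = np.mean(np.abs(y - preds))    # error on the *training* data
    print(f"degree {degree}: training MAE = {maes[degree]:.3f}")
```

Adding more data would not help the straight line here; only a more flexible model reduces this kind of error.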
The loss (error) function measures how far we are from the right answer.
In gradient descent, the size of the step we take downhill on the loss at each update is called the learning rate.
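A minimal gradient descent sketch tying loss and learning rate together, assuming a linear model `w * x + b` and mean squared error as the loss. The data (generated from y = 2x + 1) and the learning rate of 0.05 are illustrative choices.

```python
import numpy as np

# Hypothetical data generated from y = 2x + 1 (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

w, b = 0.0, 0.0          # start with a bad guess
learning_rate = 0.05     # size of each step
for _ in range(2000):
    preds = w * x + b
    error = preds - y
    loss = np.mean(error ** 2)         # mean squared error: how far we are from the answer
    grad_w = 2 * np.mean(error * x)    # d(loss)/dw
    grad_b = 2 * np.mean(error)        # d(loss)/db
    w -= learning_rate * grad_w        # step downhill, scaled by the learning rate
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.6f}")   # w = 2.000, b = 1.000
```

With this data a learning rate above roughly 0.15 (e.g. 0.2) makes each step overshoot and the values diverge; a much smaller one converges but needs many more steps.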
import pandas as pd
# tab separated, column 0 is an index column
the_data = pd.read_csv("mydata.tsv", sep="\t", index_col=0)
# summary statistics (count, mean, std, min, quartiles, max) for each numeric column
the_data.describe()
# get the column names (NOTE: an attribute, not a method):
the_data.columns
# general info about the data: column dtypes, non-null counts, memory usage
the_data.info()
# show the first rows (5 by default) to get a feel for the data
the_data.head()
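The snippet above needs a real `mydata.tsv`. To try it without one, `io.StringIO` can stand in for the file; the column names and values below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for mydata.tsv: tab separated, first column is the row index
tsv_text = "id\tsource_intensity\tinsulation\n1\t80\t0.5\n2\t95\t0.2\n3\t70\t0.9\n"

the_data = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

print(the_data.columns.tolist())   # ['source_intensity', 'insulation'] ('id' became the index)
print(the_data.describe())         # summary statistics per numeric column
the_data.info()                    # prints dtypes, non-null counts and memory usage itself
print(the_data.head())
```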
The following is based on Kaggle's Intro to Machine Learning tutorial:
https://www.kaggle.com/dansbecker/your-first-machine-learning-model
First, let's select the prediction target, the thing we want to predict:
y = the_data.hearing_damage
Choosing features: the columns used as inputs to the predictions.
We select multiple features by providing a list of column names inside brackets.
the_data_features = ['source_intensity', 'horizontal_distance', 'vertical_distance', 'insulation']
Now let's get a pandas DataFrame with just these columns:
X = the_data[the_data_features]
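A self-contained version of the target/feature selection, using a hypothetical three-row DataFrame with the same column names: single attribute access gives a Series (the target), while a list of names in brackets gives a DataFrame (the features).

```python
import pandas as pd

# Hypothetical miniature version of the_data with the same column names
the_data = pd.DataFrame({
    "source_intensity": [80, 95, 70],
    "horizontal_distance": [1.0, 0.5, 3.0],
    "vertical_distance": [0.0, 0.2, 1.0],
    "insulation": [0.5, 0.2, 0.9],
    "hearing_damage": [0.3, 0.8, 0.1],
})

y = the_data.hearing_damage          # a Series, same as the_data["hearing_damage"]
the_data_features = ['source_intensity', 'horizontal_distance', 'vertical_distance', 'insulation']
X = the_data[the_data_features]      # a DataFrame with just these columns

print(type(y).__name__, type(X).__name__)   # Series DataFrame
print(X.shape)                              # (3, 4)
```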
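from sklearn.tree import DecisionTreeRegressor
# Define model. Specify a number for random_state to ensure same results each run
data_model = DecisionTreeRegressor(random_state=1)
# Fit model: learn patterns from the features X and the target y
data_model.fit(X, y)
# Predict for the first few rows, and look at those rows for comparison
data_model.predict(X.head())
X.head()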
Let's check how good the model is:
from sklearn.metrics import mean_absolute_error
predicted_hearing_damage = data_model.predict(X)
mean_absolute_error(y, predicted_hearing_damage)
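It's not good to train AND test with the same data: the model is scored on rows it has already seen, so the error looks much smaller than it will be on new data.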
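A quick demonstration on hypothetical synthetic data: an unrestricted decision tree memorises its training rows, so the in-sample MAE is essentially zero, while the MAE on held-out rows is clearly larger.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic data: the target depends on the features plus noise
rng = np.random.default_rng(1)
X = rng.random((200, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=200)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

in_sample_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print("in-sample MAE:  %.3f" % in_sample_mae)   # ~0: each training row ends up in its own leaf
print("validation MAE: %.3f" % val_mae)         # larger on rows the model has not seen
```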
from sklearn.model_selection import train_test_split
# Split data into training and validation data, for both features and target.
# A fixed random_state gives the same split on every run.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Define model
data_model = DecisionTreeRegressor()
# Fit model on the training data only
data_model.fit(train_X, train_y)
# Get predicted values on the validation data
val_predictions = data_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
We can control the size of the decision tree, and so how much it overfits or underfits, by capping the number of leaf nodes with a line like:
model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
We can compare a variety of tree sizes by their validation MAE with:
def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return mae

# Compare validation MAE for different values of max_leaf_nodes
for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %.3f" % (max_leaf_nodes, my_mae))
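Instead of reading the best size off the printout, it can be picked programmatically and the model refit on all the data. A self-contained sketch on hypothetical synthetic data (the sine-shaped target is only there to give the tree something non-linear to learn):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))

# Hypothetical synthetic data
rng = np.random.default_rng(2)
X = rng.random((500, 4))
y = np.sin(6 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=500)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

candidates = [5, 50, 500, 5000]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidates}
best_size = min(scores, key=scores.get)          # the size with the lowest validation MAE
print("validation MAE by max_leaf_nodes:", {n: round(s, 3) for n, s in scores.items()})
print("best max_leaf_nodes:", best_size)

# Once the size is chosen, refit on all the data
final_model = DecisionTreeRegressor(max_leaf_nodes=best_size, random_state=0).fit(X, y)
```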
** random forests
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
forest_model = RandomForestRegressor(random_state=1)
forest_model.fit(train_X, train_y)
forest_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, forest_preds))
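A random forest averages the predictions of many decision trees, each trained on a bootstrap sample of the training rows, which tends to smooth out a single tree's overfitting. A side-by-side sketch on hypothetical synthetic data; `n_estimators=100` is written out explicitly for clarity.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic data
rng = np.random.default_rng(3)
X = rng.random((500, 4))
y = np.sin(6 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=500)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)
forest = RandomForestRegressor(n_estimators=100, random_state=1).fit(train_X, train_y)

tree_mae = mean_absolute_error(val_y, tree.predict(val_X))
forest_mae = mean_absolute_error(val_y, forest.predict(val_X))
print("single tree MAE:   %.3f" % tree_mae)
print("random forest MAE: %.3f" % forest_mae)
```

On noisy data like this the forest usually comes out ahead, but that is not guaranteed for every dataset, so it is still worth checking on a validation set.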
# initialize list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])
For now, though, we are only dealing with inference, the process of computing the output using the given structure, input, and whatever weights there are.
(dge rows columns): creates a GE (general dense) matrix of double-precision floats using the native CPU engine.
dv: creates a vector of double-precision floats using the native CPU engine.
(axpy! alpha x y): alpha times x plus y, destructive.
Multiplies the elements of vector/matrix x by scalar alpha, then adds the result to vector/matrix y, overwriting y.
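For intuition, a NumPy analogue of what axpy! computes (y <- alpha*x + y, in place). `axpy_inplace` is a hypothetical helper written for illustration; it is not part of Neanderthal or NumPy.

```python
import numpy as np

def axpy_inplace(alpha, x, y):
    """NumPy analogue of Neanderthal's (axpy! alpha x y): y <- alpha*x + y, overwriting y."""
    y += alpha * x     # in-place update, like the destructive '!' version
    return y

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
axpy_inplace(2.0, x, y)
print(y)   # [12. 24. 36.]
```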
mm! - matrix-matrix multiplication
(mm! alpha a b)
(mm! alpha a b c)
(mm! alpha a b beta c)
Multiplies matrix a by b and scales the product by alpha.
With only a and b, the result is put into whichever of a/b is the GE matrix.
If c is supplied, the result is added to c.
If scalar beta is also supplied, c is first scaled by it: c := alpha * a * b + beta * c.
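For GE matrices this is the BLAS gemm update, c := alpha * a * b + beta * c. A NumPy analogue of the five-argument form, for intuition only; `mm_inplace` is a hypothetical helper and ignores Neanderthal's two-argument triangular-matrix case.

```python
import numpy as np

def mm_inplace(alpha, a, b, beta, c):
    """NumPy analogue of (mm! alpha a b beta c): c <- alpha * (a @ b) + beta * c, overwriting c."""
    c *= beta              # scale c first
    c += alpha * (a @ b)   # then add the scaled matrix product
    return c

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[1.0, 0.0], [0.0, 1.0]])   # identity, so a @ b == a
c = np.ones((2, 2))
mm_inplace(2.0, a, b, 0.5, c)
print(c)   # c is now [[2.5, 4.5], [6.5, 8.5]]
```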
(mrows a): returns the number of rows of matrix a.
mv! - matrix-vector multiplication
(mv! m1 x1 y)
Multiplies matrix m1 by vector x1 and adds the result to vector y.
(ncols a): returns the number of columns of matrix a.
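A NumPy analogue of mv! (y <- m @ x + y, in place), which also shows how the row/column counts that mrows/ncols return constrain the vector lengths. `mv_inplace` is a hypothetical helper for illustration, not Neanderthal's API.

```python
import numpy as np

def mv_inplace(m, x, y):
    """NumPy analogue of (mv! m x y): y <- m @ x + y, overwriting y."""
    y += m @ x
    return y

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # 2 rows, 3 columns
x = np.array([1.0, 0.0, 1.0])       # length must equal the number of columns
y = np.array([10.0, 20.0])          # length must equal the number of rows

rows, cols = m.shape                # analogue of (mrows m) and (ncols m)
mv_inplace(m, x, y)
print(rows, cols, y)                # 2 3 [14. 30.]
```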
rk!
(rk! alpha x y a)
Multiplies vector x by transposed vector y, scales the resulting matrix by alpha, and adds the result to a.
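x times transposed y is the outer product, so rk! is the rank-1 update a <- alpha * x y^T + a (BLAS ger). A NumPy analogue using `np.outer`; `rk_inplace` is a hypothetical helper for illustration.

```python
import numpy as np

def rk_inplace(alpha, x, y, a):
    """NumPy analogue of (rk! alpha x y a): a <- alpha * x y^T + a, overwriting a."""
    a += alpha * np.outer(x, y)
    return a

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0, 5.0])
a = np.zeros((2, 3))       # shape must be (len(x), len(y))
rk_inplace(2.0, x, y, a)
print(a)                   # rows: [6, 8, 10] and [12, 16, 20]
```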
fmax - keeps the larger value of each pair of elements from two vectors:
(let [v1 (dv [1 2 3]) v2 (dv [0 2 7])] (fmax v1 v2)) ;; => [1.00 2.00 7.00]