Matplotlib lets you draw charts to visualize data.
- pyplot provides a MATLAB-like interface for drawing plots
- pyplot.plot plots y versus x as lines and/or markers
- pyplot.show displays all open figures
- pyplot.scatter draws a scatter plot, a diagram that shows the relationship between two sets of data
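As a rough illustration of the three functions listed above, the following sketch draws a line plot and a scatter plot of the same data. The hours and scores values are invented for this example, and the axis labels are an optional extra rather than something the notes require.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]

# Line plot: pyplot.plot returns a list of Line2D objects
plt.figure()
(line,) = plt.plot(hours, scores)
plt.xlabel("hours studied")
plt.ylabel("score")

# Scatter plot of the same two sets of data
plt.figure()
points = plt.scatter(hours, scores)
plt.xlabel("hours studied")
plt.ylabel("score")

# Display every open figure (does nothing visible on a non-GUI backend)
plt.show()
```

In a notebook or desktop session, plt.show() opens both figures; on a machine without a display, Matplotlib typically falls back to a non-interactive backend and nothing is shown.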
Explore related tutorials on Microsoft Learn.
When preparing data for machine learning, you need to remove duplicate rows, and you may need to delete rows with missing values.
- dropna removes rows with missing values
- duplicated returns a Boolean Series with True for each row that is a duplicate of a previous row and False otherwise
- drop_duplicates returns a DataFrame with duplicate rows removed
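The three methods above can be tried on a small DataFrame. This is a minimal sketch: the city and temp_c columns and their values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one repeated row and one missing value
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Lima", "Cairo"],
    "temp_c": [4.0, 19.0, 19.0, np.nan],
})

# Rows containing any missing value are dropped (the Cairo row here)
no_missing = df.dropna()

# Boolean Series: True only for the second "Lima" row
dup_flags = df.duplicated()

# New DataFrame that keeps the first occurrence of each row
deduped = df.drop_duplicates()

# The calls can be chained to apply both cleaning steps
clean = df.dropna().drop_duplicates()
print(clean)
```

Each call returns a new object and leaves df unchanged, so the result has to be assigned to a variable to be kept.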
Once a model is built, you can use it to predict values. You can supply new input values to see what the model predicts for them, which also lets you test the generated model.
- LinearRegression fits a linear model
- LinearRegression.predict is used to predict outcomes for new data based on the trained linear model
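A short sketch of predicting with a trained model follows. The training and test values are invented and lie exactly on the line y = 2x + 1, so the expected predictions are easy to check by hand. Using score on a few held-out points is one possible way to test the model, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data that follows y = 2x + 1 exactly
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)

# Predict outcomes for inputs the model has not seen
X_new = np.array([[5.0], [10.0]])
predictions = model.predict(X_new)
print(predictions)  # values close to 11 and 21, since the data lies on y = 2x + 1

# Test the model on held-out points with known answers (R^2 score)
X_test = np.array([[6.0], [7.0]])
y_test = np.array([13.0, 15.0])
r2 = model.score(X_test, y_test)
print(r2)
```

Note that predict expects a two-dimensional input (one row per sample, one column per feature), which is why each new value is wrapped in its own list.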
Many libraries are available to data scientists. NumPy and pandas are two of the most common.
Different operations may return different data types. You can use Python's built-in type function to determine the type of an object.
NumPy is a Python package for scientific computing that provides an N-dimensional array object for data analysis.
- array creates an N-dimensional array object
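For instance, array can build one-dimensional and two-dimensional arrays from nested Python lists. The values below are arbitrary.

```python
import numpy as np

# One-dimensional array from a flat list
vector = np.array([1, 2, 3])

# Two-dimensional array from a list of lists (2 rows, 3 columns)
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(vector.ndim, vector.shape)
print(matrix.ndim, matrix.shape)
print(type(matrix))
```

The ndim and shape attributes report the number of dimensions and the size along each one.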
pandas is a Python package for data analysis that provides one-dimensional (Series) and two-dimensional (DataFrame) labeled data structures.
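The sketch below shows how type can reveal which kind of object an operation returned. The column names and values are invented. Selecting a column with a single label returns a Series, selecting with a list of labels returns a DataFrame, and to_numpy returns a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column table
frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

one_column = frame["a"]      # single label: one-dimensional result
sub_table = frame[["a"]]     # list of labels: two-dimensional result
raw_values = frame.to_numpy()  # underlying data as a NumPy array

print(type(frame))
print(type(one_column))
print(type(sub_table))
print(type(raw_values))
```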
When preparing data for machine learning, you may need to remove specific columns from the DataFrame.
- drop returns a DataFrame with the specified columns removed when you pass them in the columns argument
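As a small example, the following drops two columns that would not be useful as model inputs. The column names and data are hypothetical.

```python
import pandas as pd

# Hypothetical table with identifier columns that a model should not use
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cy"],
    "score": [88, 92, 79],
})

# Pass the labels to remove through the columns argument
features = df.drop(columns=["id", "name"])
print(features)
```

drop returns a new DataFrame by default, so df still contains all three columns after the call.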
scikit-learn is a library of tools for predictive data analysis that lets you prepare your data for machine learning and create models.
- train_test_split splits arrays into random train and test subsets
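A minimal sketch of splitting data is shown here. The arrays are placeholder values, and the test_size and random_state settings are choices made for this example, not values prescribed by the notes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples with 2 features each, plus 10 labels
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

Rows of X and y are shuffled together, so each feature row stays paired with its label in both subsets.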
Linear regression is a common algorithm for predicting values based on a given dataset.
- LinearRegression fits a linear model
- LinearRegression.fit is used to fit the linear model based on training data
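To make the fitting step concrete, the sketch below fits a model to invented data that lies exactly on y = 3x - 2 and then inspects the learned slope and intercept through the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data that follows y = 3x - 2 exactly
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([-2.0, 1.0, 4.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)

# Learned parameters: one coefficient per feature, plus the intercept
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)  # close to 3 and -2 for this data
```

With real, noisy data the fitted values will only approximate the underlying relationship.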