Week 5

Visualizing data with Matplotlib

Matplotlib is a Python library for drawing charts that help you visualize data.

Common tools and functions

  • pyplot is the Matplotlib module that provides a MATLAB-like interface for drawing plots
  • pyplot.plot draws a line graph
  • pyplot.show displays the open figures, such as a graph
  • pyplot.scatter draws a scatter plot, a diagram that shows the relationship between two sets of data
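
As a rough illustration of the functions listed above, the sketch below draws a line graph and a scatter plot on the same figure. The variable names and the sample values are made up for this example and are not part of the course material.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 71, 80]

# pyplot.plot draws a line graph connecting the points in order
lines = plt.plot(hours_studied, exam_scores)

# pyplot.scatter draws each (x, y) pair as an individual marker
points = plt.scatter(hours_studied, exam_scores, color="red")

plt.xlabel("Hours studied")
plt.ylabel("Exam score")

# pyplot.show displays the current figure
plt.show()
```

In a notebook the chart usually appears inline; when run as a script, pyplot.show typically opens a window and waits until it is closed.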

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

Handling duplicates and rows with missing values

When preparing data for machine learning, you need to remove duplicate rows, and you may also need to delete rows with missing values.

Common functions

  • dropna removes rows with missing values
  • duplicated returns a Series of True or False values indicating whether each row is a duplicate of a previous row
  • drop_duplicates returns a DataFrame with duplicate rows removed
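
The three functions above can be sketched on a small DataFrame. The column names and values here are hypothetical, chosen so that one row is a duplicate and one row has a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 repeats row 0, and row 2 has a missing value
flights = pd.DataFrame({
    "airline": ["AA", "AA", "DL", "UA"],
    "delay_minutes": [10.0, 10.0, np.nan, 25.0],
})

# duplicated returns a boolean Series, True where a row repeats an earlier one
is_duplicate = flights.duplicated()

# drop_duplicates returns a new DataFrame without the repeated rows
unique_flights = flights.drop_duplicates()

# dropna returns a new DataFrame without rows that contain missing values
clean_flights = unique_flights.dropna()

print(is_duplicate.tolist())  # [False, True, False, False]
print(clean_flights)
```

Neither drop_duplicates nor dropna changes the original DataFrame by default, so the result has to be assigned to a variable.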

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

Testing a model

Once a model is built, it can be used to predict values. You can provide new values to see where they would fall on the spectrum, and compare the predictions against known results to test the generated model.

Common classes and functions
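
One possible way to test a model is sketched below. It assumes a scikit-learn LinearRegression model and uses mean_squared_error and r2_score from sklearn.metrics as the evaluation functions. That choice, and the tiny dataset, are assumptions made for this example rather than something these notes specify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical data that follows y = 2x + 1 exactly
X_train = np.array([[1], [2], [3], [4]])
y_train = np.array([3, 5, 7, 9])
X_test = np.array([[5], [6]])
y_test = np.array([11, 13])

model = LinearRegression()
model.fit(X_train, y_train)

# predict returns the model's estimate for each row of new input
predictions = model.predict(X_test)

# Compare the predictions against the known answers held back for testing
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(predictions, mse, r2)
```

Because the sample data is perfectly linear, the error here is essentially zero; real data would normally give a larger error and a lower R² score.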

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

NumPy vs pandas

There are numerous libraries available to data scientists. NumPy and pandas are two of the most common.

Some operations may return different data types. You can use the Python function type to determine the type of an object.
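
A short sketch of checking types with the built-in type function follows. The variable names are invented for this example.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])
table = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(type(values))            # a NumPy ndarray
print(type(table))             # a pandas DataFrame
print(type(table["a"]))        # selecting one column gives a pandas Series
print(type(table.to_numpy()))  # converting back gives a NumPy ndarray
```

Checking the type this way is useful when a function expects a DataFrame but an earlier step has quietly returned a Series or an array.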

NumPy

NumPy is a Python package for scientific computing that includes an N-dimensional array object for data analysis.

Common object

  • array creates an N-dimensional array object
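
A minimal sketch of creating arrays with numpy.array, using made-up values:

```python
import numpy as np

# A one-dimensional array built from a list
temperatures = np.array([20.5, 21.0, 19.8])

# A two-dimensional array built from a list of lists
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(temperatures.ndim, temperatures.shape)  # 1 (3,)
print(grid.ndim, grid.shape)                  # 2 (2, 3)
```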

pandas

pandas is a Python package for data analysis that includes one-dimensional and two-dimensional array objects.

Common objects

  • Series stores a one-dimensional array
  • DataFrame stores a two-dimensional array
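
The two objects above can be sketched as follows. The column names and values are hypothetical.

```python
import pandas as pd

# A Series is one-dimensional: a single labeled column of values
delays = pd.Series([10, 0, 25], name="delay_minutes")

# A DataFrame is two-dimensional: rows and named columns
flights = pd.DataFrame({
    "airline": ["AA", "DL", "UA"],
    "delay_minutes": [10, 0, 25],
})

print(delays.shape)   # (3,)
print(flights.shape)  # (3, 2)
```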

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

Removing and splitting DataFrame columns

When preparing data for machine learning you may need to remove specific columns from the DataFrame.

Common functions

  • drop returns a DataFrame with the specified columns removed
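
The sketch below removes a column with drop and then separates the remaining columns into features and a label. It assumes that "splitting" here means separating the input columns from the column to be predicted; the column names and the X and y variable names are hypothetical.

```python
import pandas as pd

flights = pd.DataFrame({
    "distance": [300, 800, 1200],
    "tail_number": ["N1", "N2", "N3"],
    "delay_minutes": [10, 0, 25],
})

# drop returns a new DataFrame; the original is left unchanged
without_tail = flights.drop(columns=["tail_number"])

# Assumed meaning of "splitting": features (X) versus the label to predict (y)
X = without_tail.drop(columns=["delay_minutes"])
y = without_tail["delay_minutes"]

print(without_tail.columns.tolist())  # ['distance', 'delay_minutes']
print(X.columns.tolist())             # ['distance']
```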

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

Splitting test and training data with scikit-learn

scikit-learn is a library of tools for predictive data analysis that lets you prepare your data for machine learning and create models.

Common functions
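
One way to split data into training and test sets is scikit-learn's train_test_split, sketched below. The 30% test size, the random_state value, and the sample arrays are arbitrary choices for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows with 2 feature columns, plus 10 labels
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold back 30% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```

Keeping the test rows out of training means the model is later evaluated on data it has not seen.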

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.

Train a linear regression model with scikit-learn

Linear regression is a common algorithm for predicting numeric values by fitting a straight-line relationship to a given dataset.

Common classes and functions
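
A minimal sketch of training a model with scikit-learn's LinearRegression class and its fit method follows. The data is invented so that it follows y = 2x + 1, which makes the learned values easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data following y = 2x + 1
X = np.array([[1], [2], [3], [4]])  # features must be two-dimensional
y = np.array([3, 5, 7, 9])

model = LinearRegression()

# fit learns the slope (coef_) and intercept (intercept_) from the data
model.fit(X, y)

print(model.coef_, model.intercept_)
```

For this data the slope comes out close to 2 and the intercept close to 1, matching the line the values were generated from.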

Microsoft Learn Resources

Explore related tutorials on Microsoft Learn.