Homework part 4 ‐ Data Analysis

Attila Ficsor edited this page Nov 20, 2024 · 4 revisions

Overview

The goal of this homework is to analyze a Smart Home dataset: exploring the data, visualizing it, and deriving meaningful insights. The analysis involves the following steps:

  1. Preparation: Importing the necessary Python libraries and ensuring the environment is set up for analysis.
  2. Data Loading: Reading the dataset from a specified source (URL) and examining its structure, including the column names and types of data available.
  3. Data Profiling: Generating a detailed summary report to understand the dataset’s characteristics, such as missing values, distribution of variables, and basic statistics.
  4. Data Visualization: Creating various visualizations to explore the dataset in depth:
     • Bar charts to illustrate categorical distributions (e.g., device types).
     • Histograms to analyze the distribution of continuous variables (e.g., energy consumption).
     • Boxplots to examine variability and patterns across different groups (e.g., device types or efficiency levels).
     • Scatterplots to uncover relationships between key variables (e.g., energy consumption vs. usage hours).
     • Heatmaps to visualize correlations between continuous variables in the dataset.
  5. Insights: Using the visualizations and correlations identified to understand trends, patterns, and potential relationships within the data.
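
The loading and profiling steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's solution: the rows in `CSV_TEXT` are made up, and the in-memory CSV is an assumed stand-in for the dataset URL given in the notebook (which `pd.read_csv` would accept in its place). Only column names from the feature list are used.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset: a few made-up rows in CSV form.
# In the notebook, pd.read_csv would be given the dataset URL instead.
CSV_TEXT = """UserID,DeviceType,UsageHoursPerDay,EnergyConsumption,SmartHomeEfficiency
1,Lights,5.0,1.2,1
2,Thermostat,12.5,4.8,0
3,Lights,3.0,0.9,1
4,Thermostat,20.0,7.5,0
"""

# Data loading: read the data and examine its structure.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.columns.tolist())
print(df.dtypes)

# Data profiling: missing values per column and basic statistics.
missing_per_column = df.isna().sum()
summary = df.describe()
print(missing_per_column)
print(summary)

# Categorical distribution (the numbers behind a bar chart of device types).
device_counts = df["DeviceType"].value_counts()
print(device_counts)

# Pairwise correlation of continuous variables (the numbers behind a heatmap).
corr = df[["UsageHoursPerDay", "EnergyConsumption"]].corr()
print(corr)
```

Computing `device_counts` and `corr` before plotting keeps the numbers available for the written observations, independent of whichever plotting library draws the chart.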

Dataset and environment

The dataset contains the following features:

  • UserID: Unique identifier for each user.
  • DeviceType: Type of smart home device (e.g., Lights, Thermostat).
  • UsageHoursPerDay: Average hours per day the device is used.
  • EnergyConsumption: Daily energy consumption of the device (kWh).
  • UserPreferences: User preference for device usage (0 - Low, 1 - High).
  • MalfunctionIncidents: Number of malfunction incidents reported.
  • DeviceAgeMonths: Age of the device in months.
  • SmartHomeEfficiency (Target Variable): Efficiency status of the smart home device (0 - Inefficient, 1 - Efficient).
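
After loading, it can be useful to confirm that the data actually matches the feature list above. The sketch below is one possible sanity check, assuming the data is held in a pandas DataFrame; the helper `check_schema` and the two sample rows are hypothetical and are not part of the homework notebook.

```python
import pandas as pd

# Column names taken from the feature list above.
EXPECTED_COLUMNS = [
    "UserID", "DeviceType", "UsageHoursPerDay", "EnergyConsumption",
    "UserPreferences", "MalfunctionIncidents", "DeviceAgeMonths",
    "SmartHomeEfficiency",
]
# Features described as taking only the values 0 and 1.
BINARY_COLUMNS = ["UserPreferences", "SmartHomeEfficiency"]


def check_schema(df):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    for col in BINARY_COLUMNS:
        # Missing values are ignored here; only non-null entries are checked.
        if col in df.columns and not df[col].dropna().isin([0, 1]).all():
            problems.append(f"{col} has values other than 0 and 1")
    return problems


# Two made-up rows, for illustration only.
sample = pd.DataFrame({
    "UserID": [1, 2],
    "DeviceType": ["Lights", "Thermostat"],
    "UsageHoursPerDay": [5.0, 12.5],
    "EnergyConsumption": [1.2, 4.8],
    "UserPreferences": [0, 1],
    "MalfunctionIncidents": [0, 3],
    "DeviceAgeMonths": [12, 40],
    "SmartHomeEfficiency": [1, 0],
})
print(check_schema(sample))
```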

The following packages are recommended for completing the assignment:

Tasks

You can access the homework notebook here: https://colab.research.google.com/drive/1mnoRCkiEDfYvQubrLrqrEbVWSmzJsSaT?usp=sharing

Preparation: Make a personal copy (File -> Save a copy in Drive) of the Colab Notebook provided and work on the tasks within your copy. Rename the file to include your team name. Share the notebook with your teammates so you can work together (in the top right corner, click Share and either add people with their email addresses or create a link).

Tasks: The task descriptions are provided as comments within the notebook cells. Complete each task in its respective cell and add a Text cell after each task to describe your observations.

Submission: To submit the assignment, download the completed notebook file (File -> Download -> Download .ipynb) and upload it to your team's GitHub repository on the hw4 branch.

Tip: Use the Colab Notebook provided during the practical session as a reference for solving the tasks. Pay attention to labeling the axes on your charts appropriately.
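
As an illustration of the axis-labeling tip, here is a minimal scatterplot sketch using Matplotlib. The data values are made up, the label wording is only a suggestion, and the `Agg` backend line is an assumption for running outside a notebook (it is not needed in Colab).

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; unnecessary in Colab
import matplotlib.pyplot as plt

# Made-up values, for illustration only.
usage_hours = [5.0, 12.5, 3.0, 20.0]
energy_kwh = [1.2, 4.8, 0.9, 7.5]

fig, ax = plt.subplots()
ax.scatter(usage_hours, energy_kwh)

# Name the quantity and its unit on each axis, and give the chart a title.
ax.set_xlabel("Usage (hours per day)")
ax.set_ylabel("Energy consumption (kWh per day)")
ax.set_title("Energy consumption vs. usage hours")

fig.tight_layout()
```

Including the unit in each label (hours, kWh) lets a reader interpret the chart without referring back to the feature list.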
