week 1
This document provides a beginner-friendly overview of key concepts in data analysis and machine learning. Whether you're just starting with data science or need a quick refresher.
Data refers to raw facts and figures collected from observations, measurements, transactions, etc. It has no meaning unless processed.
- Example:
- Numbers:
21, 35, 47 - Text:
"Red","India","Sunny"
- Numbers:
Data Analytics is the process of collecting, cleaning, transforming, and interpreting data to discover useful insights for decision-making.
- Example:
- Finding which product sold the most last month.
- Analyzing website traffic to improve performance.
| Feature | Data Analyst | Data Scientist |
|---|---|---|
| Goal | Understand trends and patterns | Predict future outcomes |
| Focus | Reporting and visualization | Modeling and experimentation |
| Tools | Excel, SQL, Power BI, Tableau | Python, R, TensorFlow, Scikit-learn |
| Output | Dashboards, reports | Predictive models, algorithms |
| Type | Description | Example |
|---|---|---|
| Structured | Organized into tables | SQL Databases, Excel |
| Unstructured | No specific format | Images, Videos, Emails |
| Semi-Structured | Partial structure | JSON, XML, HTML logs |
| Quantitative | Numeric and measurable | Age, Salary |
| Qualitative | Descriptive or categorical | Gender, Country, Color |
| Type | Description | Example |
|---|---|---|
| Supervised Learning | Input and labeled output available | Email spam detection |
| Unsupervised Learning | Only input, no labels | Customer segmentation |
| Semi-Supervised | Few labeled and many unlabeled examples | Webpage classification |
| Reinforcement Learning | Agent learns through rewards/punishments | Game AI, robotic control |
| Type | Description |
|---|---|
| Descriptive | Summarizes past data (what happened) |
| Diagnostic | Investigates causes (why it happened) |
| Predictive | Forecasts future outcomes |
| Prescriptive | Suggests actions based on predictions |
| Exploratory | Finds unknown patterns or relationships |
| Term | Description | Python Example |
|---|---|---|
| Mean | Average value | df['col'].mean() |
| Median | Middle value in sorted data | df['col'].median() |
| Mode | Most frequently occurring value | df['col'].mode() |
Name, Age, Gender, Score
Alice, 24, Female, 85
Bob, 27, Male, 78
Charlie, , Male, 92
Diana, 23, Female,
๐ What is the method called? (Listwise Deletion)
Listwise Deletion is a method for handling missing data by removing any row that contains even a single missing value.
Pros: Simple and easy to implement
Cons: Can lead to significant data loss
๐ How to Perform Listwise Deletion in Python (Pandas)
python
Copy
Edit
import pandas as pd
# Load data
df = pd.read_csv('data.csv')
# Drop rows with any missing values
df_clean = df.dropna()
# Show cleaned data
print(df_clean)
๐ License
This project is free to use for learning and educational purposes.
week 2
from pathlib import Path
# Content for the README.md file on NumPy fundamentals
numpy_readme_content = """
# ๐งฎ NumPy Fundamentals
This document provides an overview of essential NumPy operations for beginners, including array creation, reshaping, random generation, vector operations, and memory efficiency.
---
## ๐ What is NumPy?
**NumPy** (Numerical Python) is a powerful library for numerical computing in Python. It offers fast operations on arrays and matrices, and is the foundation for most data science libraries.
---
## ๐ Basic Array Creation
```python
import numpy as np
# 1D Array
arr1 = np.array([1, 2, 3])
# 2D Array
arr2 = np.array([[1, 2], [3, 4]])
# Zeros and Ones
zeros = np.zeros((2, 3))
ones = np.ones((3, 3))
# Empty Array
empty = np.empty((2, 2))
๐ Array Generation Functions
python
Always show details
Copy
# Range of values
np.arange(0, 10, 2) # [0 2 4 6 8]
# Evenly spaced numbers
np.linspace(0, 1, 5) # [0. 0.25 0.5 0.75 1. ]
# Identity matrix
np.eye(3) # 3x3 Identity matrix
๐ Random Generation Functions
python
Always show details
Copy
# Random floats between 0 and 1
np.random.rand(3, 2)
# Random integers
np.random.randint(1, 10, size=(2, 3))
# Normal distribution
np.random.randn(4)
# Seed for reproducibility
np.random.seed(42)
๐ Array Attributes
python
Always show details
Copy
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr.shape # (2, 3)
arr.ndim # 2 (dimensions)
arr.size # 6 (total elements)
arr.dtype # int64 (data type)
arr.itemsize # 8 (bytes per element)
๐ Data Types in NumPy
python
Always show details
Copy
# Specifying data types
np.array([1, 2, 3], dtype='float32')
np.array([1, 2, 3], dtype='int8')
๐ Performance and Memory Efficiency
NumPy uses contiguous memory blocks and vectorized operations, making it significantly faster and more memory-efficient than Python lists.
python
Always show details
Copy
import time
a = list(range(1000000))
b = np.arange(1000000)
# Performance comparison (list vs array operations)
๐ Vectorized Operations
python
Always show details
Copy
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise operations
a + b # [5 7 9]
a * b # [4 10 18]
a ** 2 # [1 4 9]
๐ Reshaping and Resizing
python
Always show details
Copy
arr = np.arange(12)
# Reshape to 3x4
arr.reshape(3, 4)
# Flatten array
arr.flatten()
# Resize (in-place)
arr.resize((2, 6))
๐ Useful Array Methods
python
Always show details
Copy
arr.sum() # Sum of all elements
arr.mean() # Mean
arr.std() # Standard deviation
arr.max() # Maximum value
arr.min() # Minimum value
arr.argmin() # Index of min
arr.argmax() # Index of max
arr.T # Transpose
๐ License
This project is free to use for educational purposes.