Data Science Learning Path
What are machine learning, artificial intelligence, and data science?
What problems can be solved with machine learning?
Fields related to machine learning
What do you need to master to become a machine learning practitioner?
Introduction to regression (including evaluation metrics, e.g. MSE and MAE)
Simple linear regression
Polynomial regression
Regularized regression
Support vector regression
Generalized linear model
Introduction to classification and the confusion matrix
Logistic regression
LDA (Linear Discriminant Analysis)
k-NN (k-Nearest Neighbors)
Naive Bayes
Decision tree
Support vector machine
Neural networks
Introduction to clustering
k-means clustering
EM (Expectation-Maximization) clustering
Hierarchical clustering
Chapter 05 - Kernel Methods
Introduction to kernel methods
Kernel k-means
Kernel SVM
Kernel regression
Chapter 06 - Data Preprocessing
Feature engineering
Transformasi data
Data cleaning
Dimensionality reduction (PCA, LDA)
Variable selection
Introduction to deep learning and tools
Chapter 02 - Deep Learning Model
CNN (Convolutional Neural Networks): case study on MNIST digit classification
RNN (Recurrent Neural Networks)
Generative Model: GAN (Generative Adversarial Networks) dan Autoencoder
Chapter 03 - State of the Art Model
Deep learning Object Detection: SSD, YOLO, Mask R-CNN
Deep learning Image Segmentation: FCN, SegNet, Mask R-CNN
Text Mining and Natural Language Processing
Chapter 01 - Introduction to Text Mining and NLP
Overview of Text Mining and NLP
Corpus
Dictionary
Chapter 02 - Feature Extraction
Feature extraction
Bag of words
Term-document matrix
Term frequency and weighting
TF-IDF
POS Tagging
Named Entity Recognition
Chapter 04 - Text Classification
Overview Text Classification
Binary Classification
Multiclass Classification
Multilabel Classification
Information Retrieval
Text Clustering
Document Similarity
Topic Modeling
Word2Vec
Skip-gram
CBOW
Language Modeling
Natural Language Understanding
Natural Language Generation
Introduction to computer vision and tools
Representation of images and video in a computer
Chapter 02 - Image Thresholding
Binary thresholding
Otsu thresholding
Chapter 03 - Spatial Filtering
Introduction to spatial filtering
Smoothing (averaging filter)
Sharpening
Median filter
Sobel filter
Chapter 04 - Morphological Processing
Erosion
Dilation
Morphological opening & closing
Chapter 05 - Image Analysis
Connected component analysis
Image segmentation
Object detection: case study on face detection
Automatic Speech Recognition
Chapter 01 - Introduction
Overview Speech Recognition
Chapter 02 - Signal Processing
MFCC
LPC
Noise Reduction
Advanced Speech Recognition
Speech Recognition for Low Resource
Large Vocabulary Continuous Speech Recognition
Speaker Identification
Speech Enhancement
Speech separation
Overview Data Visualization
Principles of Data Visualization
Overview Chart
Pie Chart
Line Chart
Bar Chart
Stacked Bar Chart
Heat Map
Bubble Chart
Area Charts
Box Plot
Whisker plot
Scatter Plot
GeoSpatial
Real Time Data Visualization
MS Excel with Analysis ToolPak
Java, Python
R, RStudio, Rattle
Weka, KNIME, RapidMiner
Hadoop distribution of choice
Spark, Storm
Flume, Scribe, Chukwa
Nutch, Talend, ScraperWiki
Webscraper, Flume, Sqoop
tm, RWeka, NLTK
RHIPE
D3.js, ggplot2, Shiny
IBM LanguageWare
Microsoft Azure, AWS, Google Cloud
Cassandra, MongoDB
Microsoft Cognitive API
TensorFlow
Git
Introduction to Databases
Basic SQL
Intermediate SQL
Advanced SQL
Chapter 01 - Fundamentals
Matrices, vectors & linear algebra fundamentals
Hash function, binary tree, O(n)
Relational algebra, DB basics (with SQL)
Inner, Outer, Cross, theta-join
CAP theorem
Tabular data
Entropy
Data frames & series
Sharding
OLAP
Multidimensional Data model
ETL
Reporting vs BI vs Analytics
JSON and XML
NoSQL
Regex
Pick a dataset
Descriptive statistics
Exploratory data analysis
Histograms
Percentiles & outliers
Probability theory
Bayes theorem
Random variables
Cumulative distribution function (CDF)
Continuous distributions
Skewness
ANOVA
Probability density function (PDF)
Central Limit theorem
Monte Carlo method
Hypothesis Testing
p-Value
Chi-squared test
Estimation
Confidence interval (CI)
MLE
Kernel Density estimate
Regression
Covariance
Correlation
Pearson correlation coefficient
Causation
Least-squares fit
Euclidean distance
Measures of central tendency
Measures of spread
Python Basics
Working in Excel
R setup / RStudio
R basics
Expressions
Variables
IBM SPSS
RapidMiner
Vectors
Matrices
Arrays
Factors
Lists
Data frames
Reading CSV data
Reading raw data
Subsetting data
Manipulate data frames
Functions
Factor analysis
Installing packages
Code versioning
Data Table
MapReduce fundamentals
Hadoop Components
HDFS
Data replication principles
Setup Hadoop
Name & data nodes
Job & task tracker
M/R programming
Sqoop: Loading data into HDFS
Flume, Scribe
SQL with Pig
DWH with Hive
Scribe, Chukwa for Weblog
Using Mahout
ZooKeeper, Avro
Storm: Hadoop Realtime
RHadoop, RHIPE
RMR
Cassandra
MongoDB, Neo4j
Chapter 05 - Data Munging
Summary of data formats
Data discovery
Data sources & Acquisition
Data integration
Data fusion
Transformation & enrichment
Data survey
Google OpenRefine
How much data?
Using ETL
Dimensionality and numerosity reduction
Normalization
Data scrubbing
Handling missing Values
Unbiased estimators
Binning Sparse Values
Feature extraction
Denoising
Sampling
Stratified sampling
PCA
Intro Python
Set Up Environment
Data Structure
Iteration & Conditional
Intro Libraries
Function
OOP
Package
NumPy