This repo contains all of the practical exercises I did during the Data Analytics Bootcamp @ Ironhack.

duarteharris/ironhack-labs

Ironhack Data Analytics Bootcamp

Inspired by my colleague Ricardo Zacarias.

This repo contains all of the practical exercises I did during the Data Analytics Bootcamp @ Ironhack. The course lasted 9 weeks (20 January to 20 March 2020), with an additional career week. It was divided into 3 modules:

  1. Git, Python and SQL;
  2. Statistics and probability;
  3. Machine Learning.

Projects

| Mod/Week | Project | Language | Libraries | Topics/Methods |
| --- | --- | --- | --- | --- |
| M1-W3 | Covid-19 & Global Awareness | Python | regex, os, numpy, pandas, requests, getpass, GetOldTweets3, csv, tweepy, wordcloud, matplotlib, seaborn | Team project with Ana Frias and Tristan Piat. We collected data from Johns Hopkins University, scraped the Twitter activity of the World Health Organization (WHO) user account, collected data from Google Trends and the Chinese National Health Center, and analysed the relationship between them. |
| M2-W6 | CAN PRISONERS, HORSE KICKS AND A MAN CALLED FISH PREDICT FOOTBALL SCORES? | Python | pandas, numpy, os, statsmodels, scipy, matplotlib, seaborn; stats, poisson, chi2, chisquare, norm | Used hypothesis testing and the chi-squared test to check whether the number of goals scored in a football (soccer) match fits the Poisson distribution. |
| M3-W9 | O_Lemma | Python | eli5, io, itertools, json, nltk, numpy, os, pandas, pdfminer, random, re, sklearn.metrics, sklearn.model_selection, sklearn_crfsuite, spacy, sys, tqdm | Created a Residual CNN (Convolutional Neural Network) to automatically redact documents: identifying sensitive words in context, replacing them, and returning a redacted copy of the documents. |
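
As a rough illustration of the idea behind the M2-W6 project, the sketch below runs a chi-squared goodness-of-fit test of goal counts against a Poisson distribution with scipy.stats. It is not the project's actual code: the function name `poisson_gof`, the `max_bin` pooling threshold, and the synthetic sample are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def poisson_gof(goal_counts, max_bin=4):
    """Chi-squared goodness-of-fit test against a Poisson distribution.

    Hypothetical helper: counts of `max_bin` or more goals are pooled
    into a single tail bin so expected frequencies do not get too small.
    """
    goals = np.asarray(goal_counts)
    n = goals.size
    lam = goals.mean()  # maximum-likelihood estimate of the Poisson rate

    # Observed frequencies for 0 .. max_bin-1 goals, plus the pooled tail.
    observed = np.array(
        [(goals == k).sum() for k in range(max_bin)] + [(goals >= max_bin).sum()]
    )

    # Expected frequencies under Poisson(lam); sf(max_bin - 1) = P(X >= max_bin).
    probs = np.append(
        stats.poisson.pmf(np.arange(max_bin), lam),
        stats.poisson.sf(max_bin - 1, lam),
    )
    expected = n * probs

    # ddof=1 because one parameter (lam) was estimated from the data.
    chi2_stat, p_value = stats.chisquare(observed, expected, ddof=1)
    return lam, chi2_stat, p_value


# Synthetic stand-in for goals per match (NOT the project's dataset).
rng = np.random.default_rng(0)
sample = rng.poisson(lam=1.4, size=500)
lam, chi2_stat, p_value = poisson_gof(sample)
print(f"lambda={lam:.3f}, chi2={chi2_stat:.3f}, p={p_value:.3f}")
```

A large p-value means the test finds no evidence against the Poisson fit at the chosen significance level; a small one suggests the goal counts deviate from it.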

Lab Index

The table below indexes each exercise by bootcamp module and week, with a link to the exercise, the programming language, the libraries used, and the main topics covered or methods I used to solve the problems.

| Mod/Week | Lab | Language | Libraries | Topics/Methods |
| --- | --- | --- | --- | --- |
| M1-W1 | resolving-git-conflicts | Git, Command Line, Bash | - | GitHub, add, commit, push, pull, merge, conflicts, pull requests |
| M1-W1 | tuple-set-dict | Python | random, operator, pandas | random.sample, operator.itemgetter, pd.DataFrame |
| M1-W1 | list-comprehensions | Python | os, numpy, pandas | os.listdir, os.path.join, pd.concat, np.array, _get_numeric_data |
| M1-W1 | string-operations | Python | re, math | f-strings, str.lower, str.endswith, str.join, str.split, str.replace, re.findall, re.search, bag of words |
| M1-W1 | lambda-functions | Python | - | functions, lambda, zip, sorted, dict.items |
| M1-W1 | numpy | Python | numpy | np.random (random, rand, sample), np.ones, size, shape, np.reshape, np.transpose, np.array_equal, max, min, mean, np.empty, np.nditer |
| M1-W1 | functions | Python | iter | functions, iterators, generators, yield |
| M1-W1 | intro-pandas | Python | pandas, numpy | pd.Series, pd.DataFrame, df.columns, subsetting, df.mean, df.max, df.median, df.sum |
| M1-W2 | map-reduce-filter | Python | numpy, pandas, functools | functions, map, reduce, filter |
| M1-W2 | import-export | Python | pandas | pd.read_csv, df.to_csv, pd.read_excel, df.head, df.value_counts |
| M1-W2 | dataframe-calculations | Python | pandas, numpy, zipfile | df.shape, df.unique, str.contains, df.astype, df.isnull, df.apply, df.sort_values, df.equals, pd.get_dummies, df.corr, df.drop, df.groupby.agg, df.quantile |
| M1-W2 | first-queries | SQL | - | create db, create table, select, distinct, group by, order by, where, limit, count |
| M1-W2 | my-sql-select | SQL | - | aliases, inner join, left join, sum, coalesce |
| M1-W2 | my-sql | SQL | - | db design, table relationships, db seeding, forward engineering schemas, one-to-many, many-to-one, many-to-many, linking tables |
| M1-W2 | advanced-mysql | SQL | - | temporary tables, subqueries, permanent tables |
| M1-W2 | data-cleaning | Python | pandas, numpy, scipy.stats | df.rename, df.dtypes, pd.merge, df.fillna, np.abs, stats.zscore |
| M1-W3 | api-scavenger | Python, APIs, Command Line | pandas, pandas.io.json | curl, pd.read_json, json_normalize, pd.to_datetime |
| M1-W3 | web-scraping | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
| M1-W3 | advanced-regex | Python | re | re.findall, re.sub |
| M1-W3 | matplotlib-seaborn | Python | matplotlib.pyplot, seaborn, numpy, pandas | plt.plot, plt.show, plt.subplots, plt.legend, plt.bar, plt.barh, plt.pie, plt.boxplot, plt.xticks, ax.set_title, ax.set_xlabel, sns.set, sns.distplot, sns.barplot, sns.despine, sns.violinplot, sns.catplot, sns.heatmap, np.linspace, df.select_dtypes, pd.Categorical, df.cat.codes, np.triu, sns.diverging_palette |
| M1-W3 | pandas-deep-dive | Python | pandas | df.describe, df.groupby.agg, df.apply |
| M2-W4 | subsetting-and-descriptive-stats | Python | pandas, matplotlib, seaborn | df.loc, df.groupby.agg, df.quantile, df.describe |
| M2-W4 | understanding-descriptive-stats | Python | pandas, random, matplotlib, numpy | random.choice, plt.hist, plt.vlines, np.mean, np.std |
| M2-W4 | regression-analysis | Python | numpy, pandas, scipy, sklearn.linear_model, matplotlib, seaborn | plt.scatter, df.corr, scipy.stats.linregress, sns.heatmap, sklearn.LinearRegression, lm.fit, lm.score, lm.coef_, lm.intercept_ |
| M2-W4 | advanced-pandas | Python | pandas, numpy, random | df.isnull, df.set_index, df.reset_index, random.choices, df.lookup, pd.cut |
| M2-W4 | mini-project1 | Python | pandas, numpy, matplotlib, seaborn, scipy.stats | EDA, df.map, df.info, df.apply (with lambda), df.replace, df.dropna, sns.boxplot, plt.subplots_adjust, df.drop, sns.pairplot, sns.regplot, sns.jointplot, stats.linregress |
| M2-W4 | pivot-table-and-correlation | Python | pandas, scipy.stats | df.pivot_table(index, columns, aggfunc), stats.linregress, plt.scatter, stats.pearsonr, stats.spearmanr |
| M2-W4 | tableau | Tableau | - | mini project: analyzed the relationship between the number of characters in the title and description of apps and the number of downloads |
| M2-W5 | intro-probability | Probability | - | probability space, conditional probability, contingency tables |
| M2-W5 | reading-stats-concepts | Statistics | - | p-values, AB testing, means and expected values |
| M2-W5 | probability-distributions | Python | scipy.stats, numpy | discrete: stats.binom, stats.poisson; continuous: stats.uniform, stats.norm, stats.expon, np.random.exponential, stats.rvs, stats.cdf, stats.pdf, stats.ppf |
| M2-W5 | confidence-intervals | Python | scipy.stats, numpy | stats.norm.interval, calculating sample sizes |
| M2-W5 | intro-to-scipy | Python | scipy, numpy | stats.tmean, stats.fisher_exact, scipy.interpolate, interpolate.interp1d, np.arange |
| M2-W5 | hypothesis-testing-1 | Python | scipy.stats, numpy, pandas, statsmodels | stats.ttest_1samp, stats.sem, stats.t.interval, pd.crosstab, statsmodels.proportions_ztest |
| M2-W5 | hypothesis-testing-2 | Python | pandas, scipy.stats | stats.f_oneway, stats.ttest_ind, stats.ttest_rel, pd.concat |
| M2-W5 | mini-project2 | Python | pandas, numpy, scipy.stats, matplotlib | stats.norm, stats.ppf, stats.t.interval, stats.pdf, np.linspace, stats.shapiro |
| M2-W6 | two-sample-hyp-test | Python | pandas, scipy.stats, numpy | stats.ttest_ind, stats.ttest_rel, stats.ttest_1samp, stats.chi2_contingency, np.where |
| M2-W6 | goodfit-indeptests | Python | scipy.stats, numpy | stats.poisson, stats.pmf, stats.chisquare, stats.norm, stats.kstest, stats.cdf, stats.chi2_contingency, stats.binom |
| M3-W7 | intro-to-ml | Python | pandas, numpy, datetime, sklearn.model_selection | pd.to_numeric, df.interpolate, np.where, dt.strptime, dt.toordinal, train_test_split |
| M3-W7 | supervised-learning-feature-extraction | Python | pandas, numpy | pd.to_numeric, df.apply, pd.to_datetime, np.where, pd.merge |
| M3-W7 | supervised-learning | Python | pandas, seaborn, sklearn.model_selection, sklearn.linear_model, LogisticRegression, sklearn.neighbors, sklearn.preprocessing | df.corr, sns.heatmap, df.drop, df.dropna, pd.get_dummies, train_test_split, LogisticRegression, confusion_matrix, accuracy_score, KNeighborsClassifier, RobustScaler |
| M3-W7 | supervised-learning-sklearn | Python | sklearn.linear_model, sklearn.datasets, sklearn.preprocessing, sklearn.model_selection, statsmodels.api, sklearn.metrics, sklearn.feature_selection | LinearRegression, load_diabetes, PolynomialFeatures, StandardScaler, train_test_split, sm.OLS, r2_score, RFE |
| M3-W7 | unsupervised-learning | Python | sklearn.preprocessing, sklearn.cluster, sklearn.metrics, yellowbrick.cluster | StandardScaler, KMeans, silhouette_score, KElbowVisualizer, DBSCAN |
| M3-W7 | unsupervised-learning-and-sklearn | Python | sklearn.preprocessing, sklearn.cluster, mpl_toolkits.mplot3d | LabelEncoder, KMeans, fig.gca(projection='3d') |
| M3-W8 | problems-in-ml | Python | sklearn.metrics, sklearn.model_selection, sklearn.ensemble, sklearn.datasets, sklearn.svm, matplotlib.colors | r2_score, mean_squared_error, train_test_split, RandomForestRegressor, load_boston, SVC, ListedColormap |
| M3-W8 | imbalance | Python | sklearn.model_selection, sklearn.preprocessing, sklearn.linear_model, sklearn.tree, sklearn.metrics | train_test_split, LabelEncoder, LogisticRegression, DecisionTreeClassifier, RobustScaler, StandardScaler, PolynomialFeatures, MinMaxScaler, confusion_matrix, accuracy_score |
| M3-W8 | deep-learning | Python | tensorflow, keras.models, keras.layers, keras.utils, sklearn.model_selection | keras.Sequential, keras.Dense, keras.to_categorical, save_weights, load_weights |
| M3-W8 | nlp | Python | re, nltk, nltk.stem, nltk.corpus, sklearn.feature_extraction.text, nltk.probability | WordNetLemmatizer, stopwords, CountVectorizer, TfidfVectorizer, ConditionalFreqDist, nltk.word_tokenize, nltk.PorterStemmer, nltk.WordNetLemmatizer, nltk.NaiveBayesClassifier, nltk.classify.accuracy, classifier.show_most_informative_features |
