Inspired by my colleague Ricardo Zacarias.
This repo contains all of the practical exercises I completed during the Data Analytics Bootcamp at Ironhack. The course lasted 9 weeks (20 January to 20 March 2020), plus an additional career week, and was divided into 3 modules, each ending with the project listed below:
| Mod/Week | Project | Language | Libraries | Topics/Methods |
|---|---|---|---|---|
| M1-W3 | Covid-19 & Global Awareness | Python | regex, os, numpy, pandas, requests, getpass, GetOldTweets3, csv, tweepy, wordcloud, matplotlib, seaborn | Team project with Ana Frias and Tristan Piat. We collected data from Johns Hopkins University, scraped the Twitter activity of the World Health Organization (WHO) account, gathered data from Google Trends and the Chinese National Health Center, and analysed the relationships between these sources. |
| M2-W6 | CAN PRISONERS, HORSE KICKS AND A MAN CALLED FISH PREDICT FOOTBALL SCORES? | Python | pandas, numpy, os, statsmodels, scipy, matplotlib, seaborn; stats, poisson, chi2, chisquare, norm | Used hypothesis testing and the chi-squared goodness-of-fit test to check whether the number of goals scored in a football (soccer) match follows a Poisson distribution. |
| M3-W9 | O_Lemma | Python | eli5, io, itertools, json, nltk, numpy, os, pandas, pdfminer, random, re, sklearn.metrics, sklearn.model_selection, sklearn_crfsuite, spacy, sys, tqdm | Built a residual CNN (convolutional neural network) to automatically redact documents: it identifies sensitive words in context, replaces them, and returns a redacted copy of each document. |
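The goodness-of-fit idea behind the M2-W6 project can be sketched as follows. This is a minimal sketch, not the project's actual code. It assumes the goal counts per match are in a 1-D array, estimates the Poisson rate λ as the sample mean, and pools the upper tail into a single "`max_bin` or more" bin (`max_bin` is a hypothetical parameter) so that expected counts are not too small. It then calls `scipy.stats.chisquare` with `ddof=1` because λ was estimated from the same data. The sample data is made up for illustration only.

```python
import numpy as np
from scipy import stats


def poisson_goodness_of_fit(goals, max_bin=None):
    """Chi-squared goodness-of-fit test of per-match goal counts against a Poisson model.

    Hypothetical helper: returns (statistic, p_value, observed, expected).
    """
    goals = np.asarray(goals)
    lam = goals.mean()  # maximum-likelihood estimate of the Poisson rate
    if max_bin is None:
        max_bin = int(goals.max())

    # Observed counts for 0, 1, ..., max_bin - 1 goals, plus one pooled bin for ">= max_bin"
    observed = np.array(
        [np.sum(goals == k) for k in range(max_bin)] + [np.sum(goals >= max_bin)]
    )

    # Poisson probabilities for the same bins; the last bin uses the survival function,
    # since sf(max_bin - 1) = P(X > max_bin - 1) = P(X >= max_bin)
    probs = stats.poisson.pmf(np.arange(max_bin), lam)
    probs = np.append(probs, stats.poisson.sf(max_bin - 1, lam))
    expected = probs * len(goals)

    # ddof=1: one parameter (lambda) was estimated from the data
    statistic, p_value = stats.chisquare(observed, expected, ddof=1)
    return statistic, p_value, observed, expected


if __name__ == "__main__":
    # Made-up goal counts for 120 hypothetical matches
    goals = np.repeat([0, 1, 2, 3, 4, 5], [25, 38, 30, 16, 7, 4])
    stat, p, obs, exp = poisson_goodness_of_fit(goals, max_bin=4)
    print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A large p-value means the data gives no evidence against the Poisson model; it does not prove the model is correct.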
The table below indexes each exercise by bootcamp module and week, along with the lab name, the programming language, the libraries used, and the main topics covered or methods I used to solve the problems.
| Mod/Week | Lab | Language | Libraries | Topics/Methods |
|---|---|---|---|---|
| M1-W1 | resolving-git-conflicts | Git, Command Line, Bash | - | GitHub, add, commit, push, pull, merge, conflicts, pull requests |
| M1-W1 | tuple-set-dict | Python | random, operator, pandas | random.sample, operator.itemgetter, pd.DataFrame |
| M1-W1 | list-comprehensions | Python | os, numpy, pandas | os.listdir, os.path.join, pd.concat, np.array, _get_numeric_data |
| M1-W1 | string-operations | Python | re, math | f-strings, str.lower, str.endswith, str.join, str.split, str.replace, re.findall, re.search, bag of words |
| M1-W1 | lambda-functions | Python | - | functions, lambda, zip, sorted, dict.items |
| M1-W1 | numpy | Python | numpy | np.random (random, rand, sample), np.ones, size, shape, np.reshape, np.transpose, np.array_equal, max, min, mean, np.empty, np.nditer |
| M1-W1 | functions | Python | - | functions, iterators (iter), generators, yield |
| M1-W1 | intro-pandas | Python | pandas, numpy | pd.Series, pd.DataFrame, df.columns, subsetting, df.mean, df.max, df.median, df.sum |
| M1-W2 | map-reduce-filter | Python | numpy, pandas, functools | functions, map, reduce, filter |
| M1-W2 | import-export | Python | pandas | pd.read_csv, df.to_csv, pd.read_excel, df.head, df.value_counts |
| M1-W2 | dataframe-calculations | Python | pandas, numpy, zipfile | df.shape, df.unique, str.contains, df.astype, df.isnull, df.apply, df.sort_values, df.equals, pd.get_dummies, df.corr, df.drop, df.groupby.agg, df.quantile |
| M1-W2 | first-queries | SQL | - | create db, create table, select, distinct, group by, order by, where, limit, count |
| M1-W2 | my-sql-select | SQL | - | aliases, inner join, left join, sum, coalesce |
| M1-W2 | my-sql | SQL | - | db design, table relationships, db seeding, forward engineering schemas, one-to-many, many-to-one, many-to-many, linking tables |
| M1-W2 | advanced-mysql | SQL | - | temporary tables, subqueries, permanent tables |
| M1-W2 | data-cleaning | Python | pandas, numpy, scipy.stats | df.rename, df.dtypes, pd.merge, df.fillna, np.abs, stats.zscore |
| M1-W3 | api-scavenger | Python, APIs, Command Line | pandas, pandas.io.json | curl, pd.read_json, json_normalize, pd.to_datetime |
| M1-W3 | web-scraping | Python, APIs | requests, beautifulsoup, tweepy | requests.get, requests.get.content, BeautifulSoup, soup.find_all, soup.tag.text, soup.tag.get, soup.tag.find, tweepy.get_user, tweepy.user_timeline, tweepy.user.statuses_count, tweepy.user.followers_count |
| M1-W3 | advanced-regex | Python | re | re.findall, re.sub |
| M1-W3 | matplotlib-seaborn | Python | matplotlib.pyplot, seaborn, numpy, pandas | plt.plot, plt.show, plt.subplots, plt.legend, plt.bar, plt.barh, plt.pie, plt.boxplot, plt.xticks, ax.set_title, ax.set_xlabel, sns.set, sns.distplot, sns.barplot, sns.despine, sns.violinplot, sns.catplot, sns.heatmap, np.linspace, df.select_dtypes, pd.Categorical, df.cat.codes, np.triu, sns.diverging_palette |
| M1-W3 | pandas-deep-dive | Python | pandas | df.describe, df.groupby.agg, df.apply |
| M2-W4 | subsetting-and-descriptive-stats | Python | pandas, matplotlib, seaborn | df.loc, df.groupby.agg, df.quantile, df.describe |
| M2-W4 | understanding-descriptive-stats | Python | pandas, random, matplotlib, numpy | random.choice, plt.hist, plt.vlines, np.mean, np.std |
| M2-W4 | regression-analysis | Python | numpy, pandas, scipy, sklearn.linear_model, matplotlib, seaborn | plt.scatter, df.corr, scipy.stats.linregress, sns.heatmap, sklearn.LinearRegression, lm.fit, lm.score, lm.coef_, lm.intercept_ |
| M2-W4 | advanced-pandas | Python | pandas, numpy, random | df.isnull, df.set_index, df.reset_index, random.choices, df.lookup, pd.cut |
| M2-W4 | mini-project1 | Python | pandas, numpy, matplotlib, seaborn, scipy.stats | EDA, df.map, df.info, df.apply (with lambda), df.replace, df.dropna, sns.boxplot, plt.subplots_adjust, df.drop, sns.pairplot, sns.regplot, sns.jointplot, stats.linregress |
| M2-W4 | pivot-table-and-correlation | Python | pandas, scipy.stats | df.pivot_table(index, columns, aggfunc), stats.linregress, plt.scatter, stats.pearsonr, stats.spearmanr |
| M2-W4 | tableau | Tableau | - | mini project: analyzed the relationship between the number of characters in app titles and descriptions and the number of downloads |
| M2-W5 | intro-probability | Probability | - | probability space, conditional probability, contingency tables |
| M2-W5 | reading-stats-concepts | Statistics | - | p-values, AB testing, means and expected values |
| M2-W5 | probability-distributions | Python | scipy.stats, numpy | discrete: stats.binom, stats.poisson. continuous: stats.uniform, stats.norm, stats.expon, np.random.exponential, stats.rvs, stats.cdf, stats.pdf, stats.ppf |
| M2-W5 | confidence-intervals | Python | scipy.stats, numpy | stats.norm.interval, calculating sample sizes |
| M2-W5 | intro-to-scipy | Python | scipy, numpy | stats.tmean, stats.fisher_exact, scipy.interpolate, interpolate.interp1d, np.arange |
| M2-W5 | hypothesis-testing-1 | Python | scipy.stats, numpy, pandas, statsmodels | stats.ttest_1samp, stats.sem, stats.t.interval, pd.crosstab, statsmodels.proportions_ztest |
| M2-W5 | hypothesis-testing-2 | Python | pandas, scipy.stats | stats.f_oneway, stats.ttest_ind, stats.ttest_rel, pd.concat |
| M2-W5 | mini-project2 | Python | pandas, numpy, scipy.stats, matplotlib | stats.norm, stats.ppf, stats.t.interval, stats.pdf, np.linspace, stats.shapiro |
| M2-W6 | two-sample-hyp-test | Python | pandas, scipy.stats, numpy | stats.ttest_ind, stats.ttest_rel, stats.ttest_1samp, stats.chi2_contingency, np.where |
| M2-W6 | goodfit-indeptests | Python | scipy.stats, numpy | stats.poisson, stats.pmf, stats.chisquare, stats.norm, stats.kstest, stats.cdf, stats.chi2_contingency, stats.binom |
| M3-W7 | intro-to-ml | Python | pandas, numpy, datetime, sklearn.model_selection | pd.to_numeric, df.interpolate, np.where, dt.strptime, dt.toordinal, train_test_split |
| M3-W7 | supervised-learning-feature-extraction | Python | pandas, numpy | pd.to_numeric, df.apply, pd.to_datetime, np.where, pd.merge |
| M3-W7 | supervised-learning | Python | pandas, seaborn, sklearn.model_selection, sklearn.linear_model, LogisticRegression, sklearn.neighbors, sklearn.preprocessing | df.corr, sns.heatmap, df.drop, df.dropna, pd.get_dummies, train_test_split, LogisticRegression, confusion_matrix, accuracy_score, KNeighborsClassifier, RobustScaler |
| M3-W7 | supervised-learning-sklearn | Python | sklearn.linear_model, sklearn.datasets, sklearn.preprocessing, sklearn.model_selection, statsmodels.api, sklearn.metrics, sklearn.feature_selection | LinearRegression, load_diabetes, PolynomialFeatures, StandardScaler, train_test_split, sm.OLS, r2_score, RFE |
| M3-W7 | unsupervised-learning | Python | sklearn.preprocessing, sklearn.cluster, sklearn.metrics, yellowbrick.cluster | StandardScaler, KMeans, silhouette_score, KElbowVisualizer, DBSCAN |
| M3-W7 | unsupervised-learning-and-sklearn | Python | sklearn.preprocessing, sklearn.cluster, mpl_toolkits.mplot3d | LabelEncoder, KMeans, fig.gca(projection='3d') |
| M3-W8 | problems-in-ml | Python | sklearn.metrics, sklearn.model_selection, sklearn.ensemble, sklearn.datasets, sklearn.svm, matplotlib.colors | r2_score, mean_squared_error, train_test_split, RandomForestRegressor, load_boston, SVC, ListedColormap |
| M3-W8 | imbalance | Python | sklearn.model_selection, sklearn.preprocessing, sklearn.linear_model, sklearn.tree, sklearn.metrics | train_test_split, LabelEncoder, LogisticRegression, DecisionTreeClassifier, RobustScaler, StandardScaler, PolynomialFeatures, MinMaxScaler, confusion_matrix, accuracy_score |
| M3-W8 | deep-learning | Python | tensorflow, keras.models, keras.layers, keras.utils, sklearn.model_selection | keras.Sequential, layers.Dense, utils.to_categorical, save_weights, load_weights |
| M3-W8 | nlp | Python | re, nltk, nltk.stem, nltk.corpus, sklearn.feature_extraction.text, nltk.probability | WordNetLemmatizer, stopwords, CountVectorizer, TfidfVectorizer, ConditionalFreqDist, nltk.word_tokenize, nltk.PorterStemmer, nltk.WordNetLemmatizer, nltk.NaiveBayesClassifier, nltk.classify.accuracy, classifier.show_most_informative_features |