a2-fedeit-Federico-Galbiati #18

Open
wants to merge 13 commits into
base: main
142 changes: 142 additions & 0 deletions ASSIGNMENT-README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
# 02-DataVis-5ways

Assignment 2 - Data Visualization, 5 Ways
===

Now that you have successfully made a "visualization" of shapes and lines using d3, your next assignment is to successfully make an *actual visualization*... 5 times.

The goal of this project is to gain experience with as many data visualization libraries, languages, and tools as possible.

I have provided a small dataset about cars, `cars-sample.csv`.
Each row contains a car and several variables about it, including miles-per-gallon, manufacturer, and more.

Your goal is to use 5 different tools to make the following chart:

![ggplot2](img/ggplot2.png)

These features should be preserved as much as possible in your replication:

- Data positioning: it should be a downward-trending scatterplot as shown. Weight should be on the x-axis and MPG on the y-axis.
- Scales: Note the scales do not start at 0.
- Axis ticks and labels: both axes are labeled and there are tick marks at 10, 20, 30, etcetera.
- Color mapping to Manufacturer.
- Size mapping to Weight.
- Opacity of circles set to 0.5 or 50%.

Other features are not required. This includes:

- The background grid.
- The legends.

Note that some software packages will make it **impossible** to perfectly preserve the above requirements.
Be sure to note where these deviate.

Improvements are also welcome as part of Technical and Design achievements.

Libraries, Tools, Languages
---

You are required to use 5 different tools or libraries.
Of the 5 tools, you must use at least 3 libraries (libraries require code of some kind).
This could be `Python, R, Javascript`, or `Java, Javascript, Matlab` or any other combination.
Dedicated tools (i.e. Excel) do not count towards the language requirement.

Otherwise, you should seek tools and libraries to fill out your 5.

Below are a few ideas. Do not limit yourself to this list!
Some may be difficult choices, like Matlab or SPSS, which require large installations, licenses, and occasionally difficult UIs.

I have marked a few that are strongly suggested.

- R + ggplot2 `<- definitely worth trying`
- Excel
- d3 `<- since the rest of the class uses this, we're requiring it`
- Matplotlib
- three.js `<- well, it's a 3d library. not really recommended, but could be interesting and fun`
- p5js `<- good for playing around. not really a chart lib`
- Tableau
- Java 2d
- GNUplot `<- the CS department head uses this all the time :)`
- Vega-lite `<- very interesting formal visualization model; might be the future of the field`
- Flourish `<- popular in recent years`
- PowerBI
- SPSS

You may write everything from scratch, or start with demo programs from books or the web.
If you do start with code that you found, please identify the source of the code in your README and, most importantly, make non-trivial changes to the code to make it your own so you really learn what you're doing.

Tips
---

- If you're using d3, key to this assignment is knowing how to load data.
You will likely use the [`d3.json` or `d3.csv` functions](https://github.com/mbostock/d3/wiki/Requests) to load the data you found.
Beware that these functions are *asynchronous*, meaning it's possible to "build" an empty visualization before the data actually loads.

- *For web languages like d3* Don't forget to run a local webserver when you're debugging.
See this [ebook](http://chimera.labs.oreilly.com/books/1230000000345/ch04.html#_setting_up_a_web_server) if you're stuck.


Readme Requirements
---

A good readme with screenshots and structured documentation is required for this project.
It should be possible to scroll through your readme to get an overview of all the tools and visualizations you produced.

- Each visualization should start with a top-level heading (e.g. `# d3`)
- Each visualization should include a screenshot. Put these in an `img` folder and link through the readme (markdown command: `![caption](img/<imgname>)`).
- Write a paragraph for each visualization tool you use. What was easy? Difficult? Where could you see the tool being useful in the future? Did you have to use any hacks or data manipulation to get the right chart?

Other Requirements
---

0. Your code should be forked from the GitHub repo.
1. Place all code, Excel sheets, etcetera in a named folder. For example, `r-ggplot, matlab, mathematica, excel` and so on.
2. Your writeup (readme.md in the repo) should also contain the following:

- Description of the Technical achievements you attempted with this visualization.
- Some ideas include interaction, such as mousing over to see more detail about the point selected.
- Description of the Design achievements you attempted with this visualization.
- Some ideas include consistent color choice, font choice, element size (e.g. the size of the circles).

GitHub Details
---

- Fork the GitHub Repository. You now have a copy associated with your username.
- Make changes to fulfill the project requirements.
- To submit, make a [Pull Request](https://help.github.com/articles/using-pull-requests/) on the original repository.

Grading
---

Grades on a 120 point scale.
24 points will be based on your Technical and Design achievements, as explained in your readme.

Make sure you include the files necessary to reproduce your plots.
You should structure these in folders if helpful.
We will choose some at random to run and test.

**NOTE: THE BELOW IS A SAMPLE ENTRY TO GET YOU STARTED ON YOUR README. YOU MAY DELETE THE ABOVE.**

# R + ggplot2 + R Markdown

R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.
R Markdown is a document format that compiles to HTML or PDF and allows you to include the output of R code directly in the document.

To visualize the cars dataset, I made use of ggplot2's `geom_point()` layer, with aesthetics functions for the color and size.

While it takes time to find the correct documentation, these functions made the effort of creating this chart minimal.

![ggplot2](img/ggplot2.png)

# d3...

(And so on...)


## Technical Achievements
- **Proved P=NP**: Using a combination of...
- **Solved AI Forever**: ...

### Design Achievements
- **Re-vamped Apple's Design Philosophy**: As demonstrated in my colorscheme...
Binary file added Flourish/[email protected]
2 changes: 2 additions & 0 deletions Flourish/link.txt
@@ -0,0 +1,2 @@
Flourish Studio Published URL:
https://public.flourish.studio/visualisation/8490758/
1 change: 1 addition & 0 deletions R + ggplot2/help-sources.txt
@@ -0,0 +1 @@
https://ggplot2.tidyverse.org
9 changes: 9 additions & 0 deletions R + ggplot2/plot-script.R
@@ -0,0 +1,9 @@
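# Install ggplot2 only if it is missing, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")
library(ggplot2)

# Assumes the working directory is the repository root, where cars-sample.csv lives
cars <- read.csv("cars-sample.csv", header = TRUE, stringsAsFactors = FALSE)

# Weight on x, MPG on y; color by manufacturer, size by weight, 50% opacity
p <- ggplot(cars, aes(x = Weight, y = MPG, colour = Manufacturer, size = Weight)) +
  geom_point(alpha = 0.5)

# Pass the plot explicitly instead of relying on the last plot displayed
ggsave("plot.png", plot = p, width = 6, height = 4)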
Binary file added R + ggplot2/plot.png
177 changes: 64 additions & 113 deletions README.md
@@ -1,123 +1,37 @@
# 02-DataVis-5ways

Assignment 2 - Data Visualization, 5 Ways
===

# Plotly + Python

Plotly is a Python graphing library similar to Matplotlib, but geared more towards ML/AI work and interactivity. Companies such as NVIDIA and Tesla use it for data visualization. Plotly also integrates natively with Dash, an interactive web visualization platform for showing and interacting with plots.

With just three lines of code (plus library imports), and relying only on the documentation, I was able to recreate the plot. I first read the dataset into a Pandas dataframe. I then used the `px.scatter` function to make the plot and the `fig.show` function to display it. I simply had to pass the x, y, size, color, and opacity mappings to `px.scatter`.
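The steps described above might look roughly like the following sketch. The column names `Weight`, `MPG`, and `Manufacturer` come from the R script in this repository; the file path, the helper name `make_figure`, and the `dropna` step are assumptions for illustration, not the code actually submitted.

```python
# Keyword arguments mirroring the mappings named above: x, y, size, color, opacity.
SCATTER_ARGS = {
    "x": "Weight",
    "y": "MPG",
    "color": "Manufacturer",
    "size": "Weight",
    "opacity": 0.5,
}


def make_figure(csv_path="cars-sample.csv"):
    """Read the cars dataset and return a Plotly scatter figure (hypothetical helper)."""
    # Imported lazily so the mapping above can be inspected without the libraries installed.
    import pandas as pd
    import plotly.express as px

    df = pd.read_csv(csv_path)
    # Assumption: rows with a missing MPG or Weight are dropped before plotting.
    df = df.dropna(subset=["MPG", "Weight"])
    return px.scatter(df, **SCATTER_ARGS)
```

Calling `make_figure().show()` would then display the interactive plot, assuming Pandas and Plotly are installed and the CSV is in the working directory.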

Pros:
- Very easy to use
- Powerful
- Can be connected to Dash for larger visualizations and interactivity

Cons:
- (Minor) Need to install Pandas separately to read a dataset

![Plot reproduced in Plotly](plotly%20+%20python/plot.png)

# Flourish

I think Flourish is the best platform for making complex data visualizations without any coding knowledge. The platform lets you select a type of plot, upload a dataset, and link each marker property to a column of the data. Marker properties include size, color, shape, and more.

Pros:
- Easy to use for people who don't know how to code
- Minimal setup needed
- Intuitive without reading any documentation

Cons:
- Not as customizable as the code-based tools
- Cannot manipulate the dataset at a low level, as is possible with Pandas

![Plot reproduced in Flourish](Flourish/[email protected])

# R + ggplot2

R is a language primarily focused on statistical computing.
ggplot2 is a popular library for charting in R.
@@ -129,14 +43,51 @@ While it takes time to find the correct documentation, these functions made the

![ggplot2](img/ggplot2.png)

# Vega-Lite
Vega-Lite lets you create complex plots by defining a JSON specification that is compiled into a plot. Using the online editor at [https://vega.github.io/editor/#/custom/vega-lite](https://vega.github.io/editor/#/custom/vega-lite), it's possible to write the JSON and see it rendered automatically. Compared to the previous solutions, Vega-Lite required me to read multiple pages of documentation as well as a discussion on ObservableHQ. Once I understood the structure of the JSON, however, it was easy to customize a plot with minimal effort.
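To give a sense of the JSON structure mentioned above, here is a minimal sketch that builds a comparable specification as a Python dictionary and prints it as JSON. It is not the specification used for the screenshot: the schema version, the data URL, and the choice of the `circle` mark are assumptions, and only the field names are taken from the dataset.

```python
import json

# Hypothetical Vega-Lite specification; the actual JSON used for the plot is not shown here.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",  # assumed version
    "data": {"url": "cars-sample.csv"},  # assumed location of the dataset
    "mark": {"type": "circle", "opacity": 0.5},
    "encoding": {
        # "zero": False lets the axes start near the data instead of at 0.
        "x": {"field": "Weight", "type": "quantitative", "scale": {"zero": False}},
        "y": {"field": "MPG", "type": "quantitative", "scale": {"zero": False}},
        "color": {"field": "Manufacturer", "type": "nominal"},
        "size": {"field": "Weight", "type": "quantitative"},
    },
}

# Serialize to the JSON text that could be pasted into the editor.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Each visual property (position, color, size) is one entry under `encoding`, which is why small customizations usually mean editing a single key.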

Pros:
- Powerful
- Plot encoded in JSON and easy to share

Cons:
- Need to have the dataset ready to be plotted
- Steeper learning curve than previous solutions

![Plot reproduced in Vega-Lite](Vega-lite/visualization.png)

# d3.js
d3.js lets you create complex data visualizations using JavaScript. It reads the CSV file with a loader function, maps each row to an object, and lets you visualize the data using SVG components. Whereas the other solutions plot the data directly, d3.js requires you to build the parts of the plot one by one. For example, each axis has to be added and translated to a specific position, as do the labels, ticks, and numbers. Using d3.js certainly requires reading some documentation and sample code. However, I think d3.js was the most versatile of the data visualization frameworks I tried. Compared to the other tools, the developer has full control over the graphical components of the plot. I think it is a great option for complex visualizations, but for something simple, the other solutions save a great deal of time.
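To illustrate the kind of work that building a plot piece by piece involves, the sketch below shows the arithmetic behind a linear scale and its tick marks. It is written in Python and is not d3 code; the function names, the MPG domain, and the pixel height are made up for illustration.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping a data value to a pixel coordinate."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        # Linear interpolation between the two ends of the pixel range.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


def ticks(start, stop, step):
    """Tick values at multiples of `step` that fall inside [start, stop]."""
    first = -(-start // step) * step  # round start up to a multiple of step
    return list(range(int(first), int(stop) + 1, int(step)))


# Hypothetical numbers: MPG from 8 to 48 drawn on a 400-pixel-tall plot.
# SVG y-coordinates grow downward, so the pixel range is flipped.
y = linear_scale(domain=(8, 48), pixel_range=(400, 0))
print(y(8), y(28), y(48))  # 400.0 200.0 0.0
print(ticks(8, 48, 10))    # [10, 20, 30, 40]
```

A plotting library that works directly from the data does this mapping internally; with a lower-level approach, each scale, axis, and label is positioned explicitly.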

Pros:
- Powerful
- Complete control over graphic components
- Can manipulate the data before plotting it

Cons:
- Need to read the documentation and sample code
- For simple plots it takes much longer than other data visualization platforms

![Plot reproduced in d3.js](d3/d3js.png)

# Other
All design and technical achievements were completed in Python or JavaScript because they are the most versatile.
### Technical Achievements
- Managed NA values in the dataset
  - The dataset contained some missing (NA) values. I could either filter them out or estimate the missing values. I imported the [Simple Statistics](https://simplestatistics.org) library to fit a linear regression on the data points, and then used the model to predict the MPG values for the missing data.
- Implemented Plotly with Dash
  - I connected the Plotly visualization with Dash, a web framework for interactive data visualization. With just four lines of code, I was able to make an interactive plot as seen in the GIF below.
  ![Dash and Plotly](plotly%20+%20python/dash.gif)

- Lighthouse tests:
  - I tested the website using the Google Lighthouse tests and scored 100 on Performance, Accessibility, and Best Practices. I did this to check that I am following best practices, that the site is accessible (the tests include a background/foreground color contrast check), and that the dataset is read, interpolated, and processed quickly enough (performance metric).
  ![Lighthouse test](readme-img/d3js-lighthouse.png)
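
The NA handling listed above was done in JavaScript with Simple Statistics, and that code is not reproduced here. As a rough sketch of the same idea in Python, the example below fits a least-squares line and fills in missing MPG values. Using Weight as the predictor, the function names, the `predicted` flag, and the sample rows are all assumptions for illustration.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_mpg(rows):
    """Fill missing MPG values from Weight and flag the filled rows as predicted."""
    known = [(r["Weight"], r["MPG"]) for r in rows if r["MPG"] is not None]
    slope, intercept = fit_line(known)
    filled = []
    for r in rows:
        predicted = r["MPG"] is None
        mpg = slope * r["Weight"] + intercept if predicted else r["MPG"]
        filled.append({**r, "MPG": mpg, "predicted": predicted})
    return filled


# Made-up rows for illustration only; not taken from cars-sample.csv.
rows = [
    {"Weight": 2000, "MPG": 30.0},
    {"Weight": 3000, "MPG": 20.0},
    {"Weight": 4000, "MPG": 10.0},
    {"Weight": 3500, "MPG": None},
]
for row in impute_mpg(rows):
    print(row)
```

Keeping a flag on each imputed row makes it possible to style predicted points differently from measured ones when drawing the chart.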

### Design Achievements
- Accessible color palette (screenshot below):
  - I created a custom color palette of five colors using [https://colors.adobe.com](https://colors.adobe.com). I also validated the palette for accessibility using [https://colors.adobe.com/create/color-accessibility](https://colors.adobe.com/create/color-accessibility) to make sure it works for people with color blindness or other accessibility needs.
  ![Accessible color palette](readme-img/color-palette.png)
- Visualize NA values in the dataset
  - To identify the points with an MPG value predicted by the linear regression model, I changed the stroke of those circles to make them darker. This keeps the same color palette and leaves the manufacturer easily recognizable, while warning the viewer that the point is predicted rather than actual.

![Predicted point stroke](readme-img/predicted-vis.png)
4 changes: 4 additions & 0 deletions Vega-lite/help-sources.txt
@@ -0,0 +1,4 @@
https://talk.observablehq.com/t/changing-colour-of-marks-based-on-data-values-vega-lite/3174
https://vega.github.io/vega-lite/examples/point_2d.html
https://vega.github.io/vega-lite/docs/size.html
https://vega.github.io/vega-lite/docs/config.html#mark-config
Binary file added Vega-lite/visualization.png