Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add some material for the two initial extras sections
1 parent 381cdd6, commit c9baaf6
Showing 4 changed files with 112 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
--- | ||
|
||
layout: ots | ||
title: Alternative Approaches | ||
|
||
--- | ||
|
||
In this workshop we've mostly looked at processing unstructured data | ||
from plain text files. This is a very common and simple way to come | ||
across data, but it's not the only way and it's not always the easiest | ||
way to work with data. | ||
|
||
# Relational Databases | ||
|
||
Relational databases provide a way of storing data in tables with | ||
relationships between them. Many large organisations and websites | ||
store their data in some kind of relational database. | ||
|
||
For instance, the OpenFlights data we worked with in the | ||
[CSV](../core/csv.html) chapter is almost certainly exported from a | ||
relational database of some kind. The relational database holds all of | ||
its data in tables: for instance, the "airports" table would hold all the | ||
airports and the "routes" table would hold all the airline routes. | ||
|
||
However, the relational database also holds *relations* between | ||
different kinds of data. For example, it can know that all airline | ||
routes in the routes table contain references to a source and a | ||
destination airport, and that these airports should exist in the | ||
airports table. | ||
|
||
We often use a query language called SQL to retrieve information from | ||
a relational database. For example, here is a made-up SQL query to | ||
count the number of airports in Russia: | ||
|
||
SELECT COUNT(*) FROM airports WHERE country = 'Russia'; | ||
|
||
You can integrate SQL queries into other general purpose programming | ||
languages like Python. | ||
|
||
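As a minimal sketch of what that integration can look like, the example below uses the `sqlite3` module from Python's standard library to run the same kind of query. The table layout, column names, `id` values and the three sample airports are assumptions made up for illustration; they are not the real OpenFlights schema. The `routes` table also declares that its source and destination must exist in `airports`, which is one way a database can enforce the relations described above.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE airports ("
        "id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    # Hypothetical layout: each route refers to two rows in airports.
    conn.execute(
        "CREATE TABLE routes ("
        "airline TEXT, "
        "source_id INTEGER REFERENCES airports(id), "
        "dest_id INTEGER REFERENCES airports(id))"
    )
    conn.executemany(
        "INSERT INTO airports VALUES (?, ?, ?)",
        [
            (1, "Sheremetyevo", "Russia"),
            (2, "Pulkovo", "Russia"),
            (3, "Tegel", "Germany"),
        ],
    )
    conn.executemany(
        "INSERT INTO routes VALUES (?, ?, ?)",
        [("SU", 1, 2), ("SU", 1, 3)],
    )
    conn.commit()
    return conn


def count_airports(conn, country):
    """Count airports in one country, passing the value as a parameter."""
    row = conn.execute(
        "SELECT COUNT(*) FROM airports WHERE country = ?", (country,)
    ).fetchone()
    return row[0]


conn = build_demo_db()
print(count_airports(conn, "Russia"))
```

Passing the country as a `?` parameter, rather than pasting it into the query string, lets the database library handle quoting for you.
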
OpenTechSchool doesn't have specific workshops about SQL yet, although | ||
"Django 101" uses SQL for its databases. In the meantime you might | ||
want to check out Zed Shaw's book | ||
[Learn SQL The Hard Way](http://sql.learncodethehardway.org/) (free to | ||
read online). | ||
|
||
|
||
# Pandas | ||
|
||
[Pandas](http://pandas.pydata.org/) is a suite of data analysis tools | ||
for Python, and it allows you to do more complex data modelling and | ||
analysis with Python. | ||
|
||
For this workshop we haven't needed Pandas, but if you're looking to | ||
use Python for a lot of numerical data analysis then you should look | ||
into it - there are tutorials linked from the homepage. Pandas can | ||
make complex tasks much easier to work with. | ||
|
||
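To give a rough feel for it, here is a small sketch that loads CSV text into a Pandas `DataFrame` and counts rows per country. The sample data and the column names `name` and `country` are made up for illustration, and the example assumes Pandas is already installed.

```python
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file.
SAMPLE_CSV = """name,country
Sheremetyevo,Russia
Pulkovo,Russia
Tegel,Germany
"""


def airports_per_country(csv_text):
    """Return a Series mapping each country to its number of rows."""
    airports = pd.read_csv(io.StringIO(csv_text))
    return airports.groupby("country").size()


counts = airports_per_country(SAMPLE_CSV)
print(counts)
```

With a real file you could pass its path to `pd.read_csv` directly instead of wrapping text in `io.StringIO`.
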
Pandas also makes it easy to integrate with more complex data sources | ||
than simple text files. For example, here's [an IPython Notebook that | ||
uses Pandas to import data the Guardian published regarding the Gaza-Israel 2012 crisis](https://gistpynb.herokuapp.com/4121857). | ||
The Guardian publishes this data in the format of "Google Fusion | ||
Tables", and Pandas can read this format directly from the web. | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
|
||
layout: ots | ||
title: Open Data Sources | ||
|
||
--- | ||
|
||
In this workshop we used a small dataset published by the | ||
[OpenFlights](http://openflights.org/) project. This data is published | ||
under the Open Database License, one of several open data licenses | ||
that grants rights to anyone who wants to use or redistribute the | ||
data. | ||
|
||
In recent years there has been a strong movement encouraging | ||
organisations to publish data openly on the web. As a result there are | ||
many public data repositories, both government and non-government, | ||
that you can source data from: | ||
|
||
|
||
* The | ||
[Google Public Data Explorer](http://www.google.com/publicdata/directory) | ||
indexes many public datasets and features an in-browser data explorer. | ||
You can also download the data to perform more in-depth analysis. | ||
|
||
* The UK's Guardian newspaper [Data Store](http://www.guardian.co.uk/data) | ||
provides a wide range of data and data-based analysis. | ||
|
||
* The [World Bank](http://data.worldbank.org/) publishes its data | ||
catalog online. | ||
|
||
* Numerous governments, including [Australia](http://data.gov.au/), | ||
the [European Union](http://ec.europa.eu/atoz_en.htm) and the | ||
[United States](http://www.data.gov/), have open dataset repositories. | ||
|
||
* Some countries are sponsoring open data "hackathons" to raise | ||
awareness and find new uses for their data, for instance | ||
[GovHack](http://www.govhack.org/) in Australia. |