Add some material for the two initial extras sections

OpenTechSchool · Jun 4, 2013 · c9baaf6
1 parent 381cdd6
commit c9baaf6
Showing 4 changed files with 112 additions and 7 deletions.
diff --git a/_config.yml b/_config.yml
@@ -32,9 +32,9 @@ map:
  - title: Extras
    caption: Additional workshop content
    subpages:
-   - title: The Pandas library
-     path: /extras/pandas.html
-     caption: An introduction to using Pandas for data analysis.
+   - title: Alternative Approaches
+     path: /extras/alternatives.html
+     caption: Some other ways to store and process data.
    - title: Open Data
      path: /extras/opendata.html
-     caption: The Open Data movement and some places to find open data sets.
+     caption: Some places to find open data sets.
diff --git a/extras/alternatives.md b/extras/alternatives.md
@@ -0,0 +1,64 @@
+---
+
+layout: ots
+title: Alternative Approaches
+
+---
+
+In this workshop we've mostly looked at processing unstructured data
+from plain text files. This is a very common and simple way to come
+across data, but it's not the only way and it's not always the easiest
+way to work with data.
+
+# Relational Databases
+
+Relational databases provide a way of storing data in tables with
+relationships between them. Many large organisations and websites
+store their data in some kind of relational database.
+
+For instance, the OpenFlights data we worked with in the
+[CSV](../core/csv.html) chapter is almost certainly exported from a
+relational database of some kind. The relational database holds all of
+its data in tables, for instance the "airports" table would hold all the
+airports and the "routes" table would hold all the airline routes.
+
+However, the relational database also holds *relations* between
+different kinds of data - for example it can know that all airline
+routes in the routes table contain references to a source and a
+destination airport, and that these airports should exist in the
+airports table.
+
+We often use a query language called SQL to retrieve information from
+a relational database. For example, here is a made-up SQL query to
+count the number of airports in Russia:
+
+    SELECT COUNT(*) FROM airports WHERE country = 'Russia';
+
+You can integrate SQL queries into other general purpose programming
+languages like Python.
+
+OpenTechSchool doesn't have specific workshops about SQL yet, although
+"Django 101" uses SQL for its databases. In the meantime you might
+want to check out Zed Shaw's book
+[Learn SQL The Hard Way](http://sql.learncodethehardway.org/) (free to
+read online).
+
+
+# Pandas
+
+[Pandas](http://pandas.pydata.org/) is a suite of data analysis tools
+for Python, and it allows you to do more complex data modelling and
+analysis with Python.
+
+For this workshop we haven't needed Pandas, but if you're looking to
+use Python for a lot of numerical data analysis then you should look
+into it - there are tutorials linked from the homepage. Pandas can
+make complex tasks much easier to work with.
+
+Pandas also makes it easy to integrate with more complex data sources
+than simple text files. For example, here's [an IPython Notebook that
+uses Pandas to import data the Guardian published regarding the Gaza-Israel 2012 crisis](https://gistpynb.herokuapp.com/4121857).
+The Guardian publishes this data in the format of "Google Fusion
+Tables", and Pandas can read this format directly from the web.
+
+
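A note on the `extras/alternatives.md` text above: its points about relations between tables and about running SQL from Python can be sketched with the standard library's `sqlite3` module. This is only an illustration under stated assumptions. The table layout, column names, helper function names and sample rows below are made up for the example and are not the real OpenFlights schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical airports/routes tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE airports ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " country TEXT NOT NULL)"
    )
    # Each route refers to a source and a destination row in airports.
    conn.execute(
        "CREATE TABLE routes ("
        " id INTEGER PRIMARY KEY,"
        " source_id INTEGER NOT NULL REFERENCES airports(id),"
        " dest_id INTEGER NOT NULL REFERENCES airports(id))"
    )
    # Sample rows, chosen only for illustration.
    conn.executemany(
        "INSERT INTO airports (id, name, country) VALUES (?, ?, ?)",
        [
            (1, "Sheremetyevo", "Russia"),
            (2, "Pulkovo", "Russia"),
            (3, "Tegel", "Germany"),
        ],
    )
    conn.execute("INSERT INTO routes (source_id, dest_id) VALUES (1, 3)")
    conn.commit()
    return conn


def count_airports(conn, country):
    """Run the COUNT(*) query from the text, with the country as a parameter."""
    row = conn.execute(
        "SELECT COUNT(*) FROM airports WHERE country = ?", (country,)
    ).fetchone()
    return row[0]


def route_to_missing_airport_rejected(conn):
    """Return True if the database refuses a route to a non-existent airport."""
    try:
        conn.execute("INSERT INTO routes (source_id, dest_id) VALUES (1, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    db = build_demo_db()
    print(count_airports(db, "Russia"))
    print(route_to_missing_airport_rejected(db))
```

Passing the country through a `?` placeholder rather than pasting it into the SQL string lets the database driver handle quoting, which is the usual habit when mixing SQL with a general purpose language.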
diff --git a/extras/opendata.md b/extras/opendata.md
@@ -0,0 +1,37 @@
+---
+
+layout: ots
+title: Open Data Sources
+
+---
+
+In this workshop we used a small dataset published by the
+[OpenFlights](http://openflights.org/) project. This data is published
+under the Open Database License, one of several open data licenses
+that grants rights to anyone who wants to use or redistribute the
+data.
+
+In recent years there has been a strong movement encouraging
+organisations to publish data openly on the web. As a result there are
+many public data repositories, both government and non-government,
+that you can source data from:
+
+
+* The
+[Google Public Data Explorer](http://www.google.com/publicdata/directory)
+indexes many public datasets and features an in-browser data explorer.
+You can also download the data to perform more in-depth analysis.
+
+* The UK's Guardian newspaper [Data Store](http://www.guardian.co.uk/data)
+provides a wide range of data and data-based analysis.
+
+* The [World Bank](http://data.worldbank.org/) publishes its data
+catalog online.
+
+* Numerous governments, including [Australia](http://data.gov.au/),
+the [European Union](http://ec.europa.eu/atoz_en.htm) and the
+[United States](http://www.data.gov/), have open dataset repositories.
+
+* Some countries are sponsoring open data "hackathons" to raise
+awareness and find new uses for their data, for instance
+[GovHack](http://www.govhack.org/) in Australia.
diff --git a/index.md b/index.md
@@ -29,10 +29,14 @@ workshop then that will be perfect.
 
 # Extra fun stuff
 
-* [The Pandas library](extras/pandas.html) - An introduction to using Pandas for data analysis.
+* [Alternative Approaches](extras/alternatives.html) - Other ways to store and process data (Pandas, SQL databases).
 
-* [Open Data](extras/opendata.html) - About Open Data.
+* [Open Data Sources](extras/opendata.html) - Some places to find open data sets.
 
 # Reference material
 
-* TODO
+* [IPython NBViewer home page](http://nbviewer.ipython.org/)
+
+* [IPython Notebook gallery](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks) 
+* [matplotlib gallery](http://matplotlib.org/gallery.html)
+