The SciPy stack includes:
- NumPy: Base N-dimensional array package
- SciPy library: Fundamental library for scientific computing
- Matplotlib: Comprehensive 2D Plotting (like ggplot2 for Python)
- IPython: Enhanced Interactive Console (notebook like RStudio or Mathematica)
- SymPy: Symbolic mathematics
- pandas: Data structures & analysis (like R for Python)
One easy, sane option is to just install Anaconda, a big bundle of Python data science packages that manages dependencies and keeps itself reasonably up to date. You can keep it out of your PATH so it stays scoped and doesn't conflict with anything else.
cd ~/anaconda/bin
./ipython notebook
A couple of fun things to try:
The pandas cookbook is a great introduction to the whole stack.
NLTK (natural language toolkit) is fun to try. Skim the book or docs.
E.g., as we were looking at on 8 May:
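import nltk
nltk.download()  # opens the interactive downloader; grab the "book" collection
from nltk.book import *  # loads the sample texts text1 ... text9
# where these words show up across Moby Dick (needs Matplotlib)
text1.dispersion_plot(["whale","sea","captain","harpoon"])
text3.generate()  # random text in the style of Genesis
def lexical_richness(text):
    # average number of times each distinct token is used
    return len(text)/len(set(text))
richness_map = [lexical_richness(x) for x in (text1, text2, text3)]
text7.collocations()  # word pairs that often occur together
# drop punctuation and numbers, keep alphabetic tokens only
just_words = nltk.Text([x for x in text1 if x.isalpha()])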
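To see what lexical_richness and the isalpha() filter actually do without downloading any corpora, here is a small sketch on a made-up sentence. The sentence and the variable names tokens and words are just for illustration; plain Python lists stand in for the NLTK texts.

```python
def lexical_richness(tokens):
    # Average number of times each distinct token is used.
    return len(tokens) / len(set(tokens))

# Made-up sample sentence, pre-tokenized by splitting on whitespace.
sentence = "the whale , the sea , and the captain"
tokens = sentence.split()

# Same filter as just_words above: keep alphabetic tokens only.
words = [t for t in tokens if t.isalpha()]

print(lexical_richness(tokens))  # 9 tokens, 6 distinct
print(lexical_richness(words))   # 7 tokens, 5 distinct
```

Note that under Python 2 the `/` here would be integer division unless you add `from __future__ import division`.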
Naive Bayes classifiers are hella cool and reasonably easy. (E.g., guess gender from the last letter of a name.)
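A rough from-scratch sketch of the last-letter idea, to show how little is going on under the hood. This is not NLTK's implementation (NLTK ships its own nltk.NaiveBayesClassifier); the tiny TRAIN list, the helper names, and the add-one smoothing over a 26-letter alphabet are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny hand-picked training set, for illustration only (not a real corpus).
TRAIN = [
    ("Anna", "female"), ("Maria", "female"), ("Sophia", "female"),
    ("Emma", "female"), ("Olivia", "female"), ("Grace", "female"),
    ("John", "male"), ("Peter", "male"), ("Mark", "male"),
    ("David", "male"), ("Kevin", "male"), ("Robert", "male"),
]

def last_letter(name):
    # The single feature used by this classifier.
    return name[-1].lower()

def train(examples):
    # Count labels, and count last letters per label.
    label_counts = Counter(label for _, label in examples)
    feature_counts = defaultdict(Counter)
    for name, label in examples:
        feature_counts[label][last_letter(name)] += 1
    return label_counts, feature_counts

def classify(name, model, alphabet_size=26):
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    letter = last_letter(name)
    scores = {}
    for label, n in label_counts.items():
        # log P(label) + log P(letter | label), with add-one smoothing
        prior = math.log(n / total)
        likelihood = math.log(
            (feature_counts[label][letter] + 1) / (n + alphabet_size)
        )
        scores[label] = prior + likelihood
    return max(scores, key=scores.get)

model = train(TRAIN)
print(classify("Julia", model))
print(classify("Brian", model))
```

With a real corpus of labeled names you would swap TRAIN for that data and probably add more features (last two letters, length, and so on).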