idank
.gitignore                               +4
Makefile                                 +7
README                                   +38
TODO                                     +40
Vagrantfile                              +113
dump/explainshell/classifier.bson        294 KB (binary)
dump/explainshell/system.indexes.bson    235 bytes (binary)
explainshell/__init__.py                 (empty)
explainshell/algo/__init__.py            +2
explainshell/algo/classifier.py          +113
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+*.pyc
+*.swp
+.coverage
+.vagrant
diff --git a/Makefile b/Makefile
@@ -0,0 +1,7 @@
+tests:
+	nosetests --with-doctest tests/ explainshell/
+
+serve:
+	python runserver.py
+
+.PHONY: tests serve
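`make tests` runs nose with `--with-doctest` over both `tests/` and `explainshell/`, so interactive examples written in docstrings inside the package are executed as tests too. Below is a minimal sketch of such a docstring test. `first_line_word_count` here is a hypothetical helper written for illustration (it is not the one in `explainshell.algo.features`), and the standard library's `doctest` module checks the same examples without nose:

```python
import doctest


def first_line_word_count(text):
    """Count the words on the first line of text.

    Hypothetical helper, used only to illustrate a doctest.

    >>> first_line_word_count('-n  do not output\\nthe trailing newline')
    4
    >>> first_line_word_count('')
    0
    """
    lines = text.splitlines()
    return len(lines[0].split()) if lines else 0


if __name__ == '__main__':
    # Runs every '>>>' example in this module's docstrings; nose's
    # --with-doctest option collects the same examples automatically.
    doctest.testmod()
```

Running the file directly calls `doctest.testmod()`, which prints nothing when every example passes.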
diff --git a/README b/README
@@ -0,0 +1,38 @@
+explainshell.com - match command-line arguments to their help text
+
+To get a working environment that lets you run the web interface locally, you'll need to:
+
+  $ pip install -r requirements.txt
+
+  # load classifier data
+  $ mongorestore dump/explainshell && mongorestore -d explainshell_tests dump/explainshell
+  $ make tests
+  .....................................................
+  ----------------------------------------------------------------------
+  Ran 53 tests in 2.847s
+
+  OK
+
+To add a gzipped man page, use the manager:
+
+  $ python explainshell/manager.py --log info echo
+  INFO:explainshell.store:creating store, db = 'explainshell_tests', host = 'mongodb://localhost'
+  INFO:explainshell.algo.classifier:train on 994 instances
+  INFO:explainshell.manager:handling manpage echo (from /tmp/es/manpages/1/echo.1.gz)
+  INFO:explainshell.store:looking up manpage in mapping with src 'echo'
+  INFO:explainshell.manpage:executing '/tmp/es/tools/w3mman2html.cgi local=%2Ftmp%2Fes%2Fmanpages%2F1%2Fecho.1.gz'
+  INFO:explainshell.algo.classifier:classified <paragraph 3, DESCRIPTION: '-n     do not output the trailing newlin'> (0.991381) as an option paragraph
+  INFO:explainshell.algo.classifier:classified <paragraph 4, DESCRIPTION: '-e     enable interpretation of backslash escape'> (0.996904) as an option paragraph
+  INFO:explainshell.algo.classifier:classified <paragraph 5, DESCRIPTION: '-E     disable interpretation of backslash escapes (default'> (0.998640) as an option paragraph
+  INFO:explainshell.algo.classifier:classified <paragraph 6, DESCRIPTION: '--help display this help and exi'> (0.999215) as an option paragraph
+  INFO:explainshell.algo.classifier:classified <paragraph 7, DESCRIPTION: '--version'> (0.999993) as an option paragraph
+  INFO:explainshell.store:inserting mapping (alias) echo -> echo (52207a1fa9b52e42fb59df36) with score 10
+  successfully added echo
+
+
+To start up a local web server:
+
+  $ make serve
+  python runserver.py
+   * Running on http://127.0.0.1:5000/
+   * Restarting with reloader
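In the `manager.py` log in the README above, each `classified ...` line reports the probability of the most likely label for a paragraph. In `classifier.classify()`, a paragraph is kept only when that label is "option" and the probability reaches `config.CLASSIFIER_CUTOFF`. The following is a minimal sketch of that filter using plain dicts in place of NLTK probability distributions. The cutoff value 0.7 and the paragraph names are assumptions for illustration, not values from this commit:

```python
# Hypothetical cutoff for illustration; the real value is
# config.CLASSIFIER_CUTOFF in explainshell's config module.
CLASSIFIER_CUTOFF = 0.7


def select_option_paragraphs(distributions, cutoff=CLASSIFIER_CUTOFF):
    """Yield (certainty, paragraph) for paragraphs kept as option paragraphs.

    distributions maps a paragraph name to {True: p_option, False: p_other},
    a plain-dict stand-in for the distribution prob_classify() returns.
    """
    for paragraph, probs in distributions.items():
        # most likely label, like ProbDist.max()
        label = max(probs, key=probs.get)
        certainty = probs[label]
        # keep only option paragraphs that clear the cutoff
        if label and certainty >= cutoff:
            yield certainty, paragraph


# Toy inputs: the first probability (and its complement) comes from the
# echo log above; the other two paragraphs are made up.
dists = {
    'paragraph 3 (-n)': {True: 0.991381, False: 0.008619},
    'paragraph 1 (NAME)': {True: 0.05, False: 0.95},
    'paragraph 8 (ambiguous)': {True: 0.6, False: 0.4},
}
print(list(select_option_paragraphs(dists)))
# [(0.991381, 'paragraph 3 (-n)')]
```

A paragraph whose most likely label is "option" but whose probability falls below the cutoff is dropped rather than reported. This trades some recall for precision.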
diff --git a/TODO b/TODO
@@ -0,0 +1,40 @@
+- add support for pipes, redirections and other shell syntax
+- add JSON api
+- handle options of the form '-a|b' and 'a or b' (gspl-padd.1.gz)
+- check parsing of sshfs
+- print stats after processing a manpage, such as: read x paragraphs, classified y out of x, options found in z out of y
+- merge adjacent options with an unknown between them (can be done client side), e.g. node -cxc
+- mine fish shell completions to enrich options that expect an arg and weren't identified as such
+- rewrite options.py to use a DFA instead of a regex
+- handle -- that cuts args
+- handle long options abbreviations
+- handle args of the form -e something=something that get interpreted as '-e', 'something', 'something'
+- add 'no options extracted' message on /explain/<foo> if foo has no options
+- collapse positional arguments in options.html (see tee)
+- fix es.js '?' font (should be the default, not courier)
+- use cdn for d3, bootstrap, jquery
+
+ongoing
+
+- check xargs example explain
+
+done
+- add a debug view that shows all manpages - accessible at /debug
+- handle multiple manpages under the same name -- this is handled in the backend by
+  returning multiple manpages from store.findmanpage
+- add a rating when explaining without a section, so some manpages take priority over others (e.g. node(1) over node(8)); on a tie, perhaps return the one with the most matched options (more expensive)
+- handle short options values with no space, e.g. -w32
+- handle '-' arg
+- skip empty man pages
+- handle multiple lines returned from lexgrog -- we figure out aliases from each line
+- handle links generated by w3mman2html
+- don't join adjacent paragraphs in fixer across sections
+- set classifier threshold
+- handle aliases
+- add a feature for bold words (use script -c)
+- implement fixers
+- include the option in the extracted help text from the manpage
+
+abandoned
+- handle ANSI escape sequences elegantly
+- refactor feature utility functions
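Several TODO items above are about how arguments map to options: short-option values with no space (`-w32`), a lone `-`, clustered flags (`node -cxc`), and `--` ending option processing. Below is a minimal, hypothetical sketch of those getopt-style conventions. It is not explainshell's `options.py`. The assumptions are that `takes_arg` is a known set of short options that expect a value, and that long options only take values via `=` (`--file x` treats `x` as positional):

```python
def split_args(argv, takes_arg):
    """Split argv into (options, positionals) using getopt-style rules.

    options is a list of (flag, value) pairs; value is None for flags
    that take no argument. takes_arg is a set of single letters.
    """
    options, positionals = [], []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == '--':
            # '--' ends option processing; everything after is positional.
            positionals.extend(argv[i + 1:])
            break
        if arg == '-' or not arg.startswith('-'):
            # A lone '-' (often meaning stdin) is a positional argument.
            positionals.append(arg)
        elif arg.startswith('--'):
            # Long option, optionally with an attached '=value'.
            name, sep, value = arg.partition('=')
            options.append((name, value if sep else None))
        else:
            # Short option cluster, e.g. '-cxc' or '-w32'.
            for j in range(1, len(arg)):
                letter = arg[j]
                if letter in takes_arg:
                    rest = arg[j + 1:]
                    if rest:
                        value = rest        # attached value: -w32
                    elif i + 1 < len(argv):
                        i += 1
                        value = argv[i]     # separate value: -w 32
                    else:
                        value = None        # value missing at end of argv
                    options.append(('-' + letter, value))
                    break
                options.append(('-' + letter, None))
        i += 1
    return options, positionals


print(split_args(['-w32', '-n', '-', '--color=auto', '--', '-x', 'file'], {'w'}))
```

Note that after `--`, `-x` is reported as a positional argument rather than an option.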
diff --git a/Vagrantfile b/Vagrantfile
@@ -0,0 +1,113 @@
+# -*- mode: ruby -*-
+# vi: set ft=ruby :
+
+Vagrant.configure("2") do |config|
+  # All Vagrant configuration is done here. The most common configuration
+  # options are documented and commented below. For a complete reference,
+  # please see the online documentation at vagrantup.com.
+
+  # Every Vagrant virtual environment requires a box to build off of.
+  config.vm.box = "precise64"
+
+  # The url from where the 'config.vm.box' box will be fetched if it
+  # doesn't already exist on the user's system.
+  config.vm.box_url = "http://files.vagrantup.com/precise64.box"
+
+  # Create a forwarded port mapping which allows access to a specific port
+  # within the machine from a port on the host machine. In the example below,
+  # accessing "localhost:8080" will access port 80 on the guest; host port 27027 reaches the guest's MongoDB (27017).
+  config.vm.network :forwarded_port, guest: 80, host: 8080
+  config.vm.network :forwarded_port, guest: 27017, host: 27027
+
+  # Create a private network, which allows host-only access to the machine
+  # using a specific IP.
+  config.vm.network :private_network, ip: "192.168.33.10"
+
+  # Create a public network, which generally maps to a bridged network.
+  # Bridged networks make the machine appear as another physical device on
+  # your network.
+  # config.vm.network :public_network
+
+  # Share an additional folder to the guest VM. The first argument is
+  # the path on the host to the actual folder. The second argument is
+  # the path on the guest to mount the folder. And the optional third
+  # argument is a set of non-required options.
+  # config.vm.synced_folder "../data", "/vagrant_data"
+
+  # Provider-specific configuration so you can fine-tune various
+  # backing providers for Vagrant. These expose provider-specific options.
+  # Example for VirtualBox:
+  #
+  # config.vm.provider :virtualbox do |vb|
+  #   # Don't boot with headless mode
+  #   vb.gui = true
+  #
+  #   # Use VBoxManage to customize the VM. For example to change memory:
+  #   vb.customize ["modifyvm", :id, "--memory", "1024"]
+  # end
+  #
+  # View the documentation for the provider you're using for more
+  # information on available options.
+
+  # Enable provisioning with Puppet stand alone.  Puppet manifests
+  # are contained in a directory path relative to this Vagrantfile.
+  # You will need to create the manifests directory and a manifest in
+  # the file precise64.pp in the manifests_path directory.
+  #
+  # An example Puppet manifest to provision the message of the day:
+  #
+  # # group { "puppet":
+  # #   ensure => "present",
+  # # }
+  # #
+  # # File { owner => 0, group => 0, mode => 0644 }
+  # #
+  # # file { '/etc/motd':
+  # #   content => "Welcome to your Vagrant-built virtual machine!
+  # #               Managed by Puppet.\n"
+  # # }
+  #
+  # config.vm.provision :puppet do |puppet|
+  #   puppet.manifests_path = "manifests"
+  #   puppet.manifest_file  = "init.pp"
+  # end
+
+  # Enable provisioning with chef solo, specifying a cookbooks path, roles
+  # path, and data_bags path (all relative to this Vagrantfile), and adding
+  # some recipes and/or roles.
+  #
+  # config.vm.provision :chef_solo do |chef|
+  #   chef.cookbooks_path = "../my-recipes/cookbooks"
+  #   chef.roles_path = "../my-recipes/roles"
+  #   chef.data_bags_path = "../my-recipes/data_bags"
+  #   chef.add_recipe "mysql"
+  #   chef.add_role "web"
+  #
+  #   # You may also specify custom JSON attributes:
+  #   chef.json = { :mysql_password => "foo" }
+  # end
+
+  # Enable provisioning with chef server, specifying the chef server URL,
+  # and the path to the validation key (relative to this Vagrantfile).
+  #
+  # The Opscode Platform uses HTTPS. Substitute your organization for
+  # ORGNAME in the URL and validation key.
+  #
+  # If you have your own Chef Server, use the appropriate URL, which may be
+  # HTTP instead of HTTPS depending on your configuration. Also change the
+  # validation key to validation.pem.
+  #
+  # config.vm.provision :chef_client do |chef|
+  #   chef.chef_server_url = "https://api.opscode.com/organizations/ORGNAME"
+  #   chef.validation_key_path = "ORGNAME-validator.pem"
+  # end
+  #
+  # If you're using the Opscode platform, your validator client is
+  # ORGNAME-validator, replacing ORGNAME with your organization name.
+  #
+  # If you have your own Chef Server, the default validation client name is
+  # chef-validator, unless you changed the configuration.
+  #
+  #   chef.validation_client_name = "ORGNAME-validator"
+  #config.ssh.default.private_key_path = "/home/idan/.ssh/id_rsa"
+end
diff --git a/explainshell/algo/__init__.py b/explainshell/algo/__init__.py
@@ -0,0 +1,2 @@
+import explainshell.algo.features
+import explainshell.util
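`algo/__init__.py` imports `explainshell.algo.features`, which is not part of this diff, but `get_features()` in `classifier.py` shows which helpers it expects. The following is a minimal sketch of plausible text-only versions of those helpers that builds the same feature dict shape. Every helper body is an assumption, and `is_good_section` is left out because it needs the paragraph's section rather than its text:

```python
# Hypothetical stand-ins for helpers in explainshell.algo.features, which
# this diff does not include; the real definitions may well differ.


def _first_line(text):
    # First line of the paragraph, or '' for empty text.
    lines = text.splitlines()
    return lines[0] if lines else ''


def starts_with_hyphen(text):
    return text.lstrip().startswith('-')


def is_indented(text):
    return text[:1] in (' ', '\t')


def par_length(text):
    return len(text.strip())


def first_line_contains(text, what):
    return what in _first_line(text)


def first_line_length(text):
    return len(_first_line(text))


def first_line_word_count(text):
    return len(_first_line(text).split())


def word_count(text):
    return len(text.split())


def text_features(text):
    """Feature dict shaped like get_features() in classifier.py,
    minus is_good_section (which needs the paragraph's section)."""
    features = {
        'starts_with_hyphen': starts_with_hyphen(text),
        'is_indented': is_indented(text),
        'par_length': par_length(text),
        'first_line_length': first_line_length(text),
        'first_line_word_count': first_line_word_count(text),
        'word_count': word_count(text),
    }
    for w in ('=', '--', '[', '|', ','):
        features['first_line_contains_%s' % w] = first_line_contains(text, w)
    return features


feats = text_features('-n     do not output the trailing newline')
print(feats['starts_with_hyphen'], feats['first_line_word_count'])  # True 7
```

NLTK's classifiers take feature dicts like this one directly, which is why `get_features()` returns a plain `dict`.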
diff --git a/explainshell/algo/classifier.py b/explainshell/algo/classifier.py
@@ -0,0 +1,113 @@
+import itertools, collections, logging
+
+import nltk
+import nltk.metrics
+import nltk.classify
+import nltk.classify.maxent
+
+from explainshell import store, algo, options, config, util
+
+logger = logging.getLogger(__name__)
+
+def get_features(paragraph):
+    features = {}
+    ptext = paragraph.cleantext()
+    assert ptext
+
+    features['starts_with_hyphen'] = algo.features.starts_with_hyphen(ptext)
+    features['is_indented'] = algo.features.is_indented(ptext)
+    features['par_length'] = algo.features.par_length(ptext)
+    for w in ('=', '--', '[', '|', ','):
+        features['first_line_contains_%s' % w] = algo.features.first_line_contains(ptext, w)
+    features['first_line_length'] = algo.features.first_line_length(ptext)
+    features['first_line_word_count'] = algo.features.first_line_word_count(ptext)
+    features['is_good_section'] = algo.features.is_good_section(paragraph)
+    features['word_count'] = algo.features.word_count(ptext)
+    #features['bold'] = algo.features.is_first_word_bold(manpage, paragraph.text)
+    #features['has_bold'] = algo.features.has_bold(paragraph.text)
+
+    return features
+
+class classifier(object):
+    '''classify the paragraphs of a man page as having command line options
+    or not'''
+    def __init__(self, store, algo, **classifier_args):
+        self.store = store
+        self.algo = algo
+        self.classifier_args = classifier_args
+        self.classifier = None
+
+    def train(self):
+        if self.classifier:
+            return
+
+        manpages = self.store.trainingset()
+
+        # flatten the manpages into a single list of paragraphs
+        def flatten_manpages(manpage):
+            l = []
+            for para in manpage.paragraphs:
+                l.append(para)
+            return l
+        paragraphs = itertools.chain(*[flatten_manpages(m) for m in manpages])
+        training = list(paragraphs)
+
+        negids = [p for p in training if not p.is_option]
+        posids = [p for p in training if p.is_option]
+
+        negfeats = [(get_features(p), False) for p in negids]
+        posfeats = [(get_features(p), True) for p in posids]
+
+        negcutoff = len(negfeats) * 3 // 4
+        poscutoff = len(posfeats) * 3 // 4
+
+        trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
+        self.testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
+
+        logger.info('train on %d instances', len(trainfeats))
+
+        if self.algo == 'maxent':
+            c = nltk.classify.maxent.MaxentClassifier
+        elif self.algo == 'bayes':
+            c = nltk.classify.NaiveBayesClassifier
+        else:
+            raise ValueError('unknown classifier %r' % self.algo)
+
+        self.classifier = c.train(trainfeats, **self.classifier_args)
+
+    def evaluate(self):
+        self.train()
+        refsets = collections.defaultdict(set)
+        testsets = collections.defaultdict(set)
+
+        for i, (feats, label) in enumerate(self.testfeats):
+            refsets[label].add(i)
+            guess = self.classifier.prob_classify(feats)
+            observed = guess.max()
+            testsets[observed].add(i)
+            #if label != observed:
+            #    print 'label:', label, 'observed:', observed, feats
+
+        print 'pos precision:', nltk.metrics.precision(refsets[True], testsets[True])
+        print 'pos recall:', nltk.metrics.recall(refsets[True], testsets[True])
+        print 'neg precision:', nltk.metrics.precision(refsets[False], testsets[False])
+        print 'neg recall:', nltk.metrics.recall(refsets[False], testsets[False])
+
+        self.classifier.show_most_informative_features(10)
+
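+    def classify(self, manpage):
+        '''classify the paragraphs of manpage, yielding (certainty, paragraph)
+        for each one classified as an option with a certainty of at least
+        config.CLASSIFIER_CUTOFF'''
+        self.train()
+        for item in manpage.paragraphs:
+            features = get_features(item)
+            guess = self.classifier.prob_classify(features)
+            option = guess.max()
+            certainty = guess.prob(option)
+
+            # skip non-options and options below the certainty cutoff
+            if option and certainty >= config.CLASSIFIER_CUTOFF:
+                logger.info('classified %s (%f) as an option paragraph', item, certainty)
+                item.is_option = True
+                yield certainty, item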
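The per-class 3/4 split in `train()` and the set-based scoring in `evaluate()` can be reproduced without NLTK. The following is a minimal sketch on made-up data, where a one-feature rule stands in for the trained classifier. `precision` and `recall` follow the definitions `nltk.metrics` uses, |ref ∩ test| / |test| and |ref ∩ test| / |ref|, each returning `None` when its denominator set is empty:

```python
import collections


def split_per_class(negatives, positives):
    """First 3/4 of each class for training, last 1/4 for testing,
    mirroring the cutoffs in classifier.train() (no shuffling)."""
    negcutoff = len(negatives) * 3 // 4
    poscutoff = len(positives) * 3 // 4
    train = negatives[:negcutoff] + positives[:poscutoff]
    test = negatives[negcutoff:] + positives[poscutoff:]
    return train, test


def precision(reference, test):
    # share of predicted members that are correct; None if nothing predicted
    if not test:
        return None
    return len(reference & test) / len(test)


def recall(reference, test):
    # share of true members that were predicted; None if there are none
    if not reference:
        return None
    return len(reference & test) / len(reference)


def evaluate(testfeats, predict):
    """Score predict(features) -> label against (features, label) pairs,
    grouping instance indices by gold and predicted label as evaluate() does."""
    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (feats, label) in enumerate(testfeats):
        refsets[label].add(i)
        testsets[predict(feats)].add(i)
    return {
        'pos precision': precision(refsets[True], testsets[True]),
        'pos recall': recall(refsets[True], testsets[True]),
        'neg precision': precision(refsets[False], testsets[False]),
        'neg recall': recall(refsets[False], testsets[False]),
    }


# Toy data (made up): one negative paragraph starts with a hyphen.
neg = [({'starts_with_hyphen': False}, False)] * 3 + [({'starts_with_hyphen': True}, False)]
pos = [({'starts_with_hyphen': True}, True)] * 4
train, test = split_per_class(neg, pos)
# A one-feature rule stands in for the trained NLTK classifier.
print(evaluate(test, lambda feats: feats['starts_with_hyphen']))
# {'pos precision': 0.5, 'pos recall': 1.0, 'neg precision': None, 'neg recall': 0.0}
```

Neither list is shuffled before the split. The held-out quarter in `train()` therefore consists of paragraphs from whichever manpages `trainingset()` returns last, so the scores can shift with the order of the training set.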