Commit 7ea0acc

initial commit
Unfortunately I may or may not have had sensitive data in the old repo, so gotta start fresh.
0 parents  commit 7ea0acc

73 files changed: +32750 -0 lines changed

.gitignore

+4
@@ -0,0 +1,4 @@
+*.pyc
+*.swp
+.coverage
+.vagrant

Makefile

+7
@@ -0,0 +1,7 @@
+tests:
+	nosetests --with-doctest tests/ explainshell/
+
+serve:
+	python runserver.py
+
+.PHONY: tests

README

+38
@@ -0,0 +1,38 @@
+explainshell.com - match command-line arguments to their help text
+
+To get a working environment that lets you run the web interface locally, you'll need to:
+
+    $ pip install -r requirements.txt
+
+    # load classifier data
+    $ mongorestore dump/explainshell && mongorestore -d explainshell_tests dump/explainshell
+    $ make tests
+    .....................................................
+    ----------------------------------------------------------------------
+    Ran 53 tests in 2.847s
+
+    OK
+
+To add a gzipped man page, use the manager:
+
+    $ python explainshell/manager.py --log info echo
+    INFO:explainshell.store:creating store, db = 'explainshell_tests', host = 'mongodb://localhost'
+    INFO:explainshell.algo.classifier:train on 994 instances
+    INFO:explainshell.manager:handling manpage echo (from /tmp/es/manpages/1/echo.1.gz)
+    INFO:explainshell.store:looking up manpage in mapping with src 'echo'
+    INFO:explainshell.manpage:executing '/tmp/es/tools/w3mman2html.cgi local=%2Ftmp%2Fes%2Fmanpages%2F1%2Fecho.1.gz'
+    INFO:explainshell.algo.classifier:classified <paragraph 3, DESCRIPTION: '-n do not output the trailing newlin'> (0.991381) as an option paragraph
+    INFO:explainshell.algo.classifier:classified <paragraph 4, DESCRIPTION: '-e enable interpretation of backslash escape'> (0.996904) as an option paragraph
+    INFO:explainshell.algo.classifier:classified <paragraph 5, DESCRIPTION: '-E disable interpretation of backslash escapes (default'> (0.998640) as an option paragraph
+    INFO:explainshell.algo.classifier:classified <paragraph 6, DESCRIPTION: '--help display this help and exi'> (0.999215) as an option paragraph
+    INFO:explainshell.algo.classifier:classified <paragraph 7, DESCRIPTION: '--version'> (0.999993) as an option paragraph
+    INFO:explainshell.store:inserting mapping (alias) echo -> echo (52207a1fa9b52e42fb59df36) with score 10
+    successfully added echo
+
+
+To start up a local web server:
+
+    $ make serve
+    python runserver.py
+     * Running on http://127.0.0.1:5000/
+     * Restarting with reloader
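
The `local=` argument in the `w3mman2html.cgi` log line of the README is the percent-encoded form of the man page path. The commit does not show how explainshell builds that string, so the following is only a minimal sketch, assuming the standard library does the quoting; `encode_local_arg` is a hypothetical helper, not a function from the repository.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2


def encode_local_arg(path):
    """Return a 'local=<path>' argument with reserved characters escaped.

    Hypothetical helper for illustration only.
    """
    # safe='' makes quote() escape '/' too, which matches the %2F
    # sequences visible in the README log line.
    return 'local=' + quote(path, safe='')


arg = encode_local_arg('/tmp/es/manpages/1/echo.1.gz')
```

With the path from the README, `arg` equals the `local=%2Ftmp%2Fes%2Fmanpages%2F1%2Fecho.1.gz` string shown in the log.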

TODO

+40
@@ -0,0 +1,40 @@
+- add support for pipes, redirections and other shell syntax
+- add JSON api
+- handle options of the form '-a|b' and 'a or b' (gspl-padd.1.gz)
+- check parsing of sshfs
+- print stats after processing a manpage such as: read x paragraphs, classified y out of x, options found in z out of y
+- merge adjacent options with an unknown between them (can be done client side), e.g. node -cxc
+- mine fish shell completions to enrich options that expect an arg and weren't identified as such
+- rewrite options.py to use a DFA instead of a regex
+- handle -- that cuts args
+- handle long option abbreviations
+- handle args of the form -e something=something that get interpreted as '-e', 'something', 'something'
+- add 'no options extracted' message on /explain/<foo> if foo has no options
+- collapse positional arguments in options.html (see tee)
+- fix es.js '?' font (should be the default, not courier)
+- use cdn for d3, bootstrap, jquery
+
+ongoing
+
+- check xargs example explain
+
+done
+- add a debug view that shows all manpages - accessible at /debug
+- handle multiple manpages under the same name -- this is handled in the backend by
+  returning multiple manpages from store.findmanpage
+- adding a rating when explaining without a section so some manpages have priority over others (e.g. node(1) over node(8)), and when there's a tie perhaps return the one with most matched options? (more expensive)
+- handle short option values with no space, e.g. -w32
+- handle '-' arg
+- skip empty man pages
+- handle multiple lines returned from lexgrog -- we figure out aliases from each line
+- handle links generated by w3mman2html
+- don't join adjacent paragraphs in fixer across sections
+- set classifier threshold
+- handle aliases
+- add a feature for bold words (use script -c)
+- implement fixers
+- include the option in the extracted help text from the manpage
+
+abandoned
+- handle ANSI escape sequences elegantly
+- refactor feature utility functions
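
Two of the TODO items, "handle -- that cuts args" and "handle short option values with no space, e.g. -w32", describe small token-splitting rules. The sketch below illustrates them under simplifying assumptions; both function names are hypothetical and this is not the logic in the project's options.py. In particular, a real implementation would need to know whether `-w` expects a value, since `-cxc` could equally be a cluster of flags.

```python
def split_at_double_dash(args):
    """Return (before, after): tokens before the first '--' and those after it."""
    if '--' in args:
        i = args.index('--')
        return list(args[:i]), list(args[i + 1:])
    return list(args), []


def split_short_option(token):
    """Split '-w32' into ('-w', '32').

    Returns (token, None) when there is no attached value. Assumes the
    caller already knows the option takes a value.
    """
    # Only single-dash tokens longer than '-x' can carry an attached value.
    if len(token) > 2 and token[0] == '-' and token[1] != '-':
        return token[:2], token[2:]
    return token, None


before, after = split_at_double_dash(['rm', '-f', '--', '-x'])
flag, value = split_short_option('-w32')
```

Here `before` is `['rm', '-f']`, `after` is `['-x']`, and `-w32` splits into `-w` and `32`, while `--help` and a bare `-` are returned unchanged.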

Vagrantfile

+113
@@ -0,0 +1,113 @@
+# -*- mode: ruby -*-
+# vi: set ft=ruby :
+
+Vagrant.configure("2") do |config|
+  # All Vagrant configuration is done here. The most common configuration
+  # options are documented and commented below. For a complete reference,
+  # please see the online documentation at vagrantup.com.
+
+  # Every Vagrant virtual environment requires a box to build off of.
+  config.vm.box = "precise64"
+
+  # The url from where the 'config.vm.box' box will be fetched if it
+  # doesn't already exist on the user's system.
+  config.vm.box_url = "http://files.vagrantup.com/precise64.box"
+
+  # Create a forwarded port mapping which allows access to a specific port
+  # within the machine from a port on the host machine. In the example below,
+  # accessing "localhost:8080" will access port 80 on the guest machine.
+  config.vm.network :forwarded_port, guest: 80, host: 8080
+  config.vm.network :forwarded_port, guest: 27017, host: 27027
+
+  # Create a private network, which allows host-only access to the machine
+  # using a specific IP.
+  config.vm.network :private_network, ip: "192.168.33.10"
+
+  # Create a public network, which generally matched to bridged network.
+  # Bridged networks make the machine appear as another physical device on
+  # your network.
+  # config.vm.network :public_network
+
+  # Share an additional folder to the guest VM. The first argument is
+  # the path on the host to the actual folder. The second argument is
+  # the path on the guest to mount the folder. And the optional third
+  # argument is a set of non-required options.
+  # config.vm.synced_folder "../data", "/vagrant_data"
+
+  # Provider-specific configuration so you can fine-tune various
+  # backing providers for Vagrant. These expose provider-specific options.
+  # Example for VirtualBox:
+  #
+  # config.vm.provider :virtualbox do |vb|
+  #   # Don't boot with headless mode
+  #   vb.gui = true
+  #
+  #   # Use VBoxManage to customize the VM. For example to change memory:
+  #   vb.customize ["modifyvm", :id, "--memory", "1024"]
+  # end
+  #
+  # View the documentation for the provider you're using for more
+  # information on available options.
+
+  # Enable provisioning with Puppet stand alone. Puppet manifests
+  # are contained in a directory path relative to this Vagrantfile.
+  # You will need to create the manifests directory and a manifest in
+  # the file precise64.pp in the manifests_path directory.
+  #
+  # An example Puppet manifest to provision the message of the day:
+  #
+  # # group { "puppet":
+  # #   ensure => "present",
+  # # }
+  # #
+  # # File { owner => 0, group => 0, mode => 0644 }
+  # #
+  # # file { '/etc/motd':
+  # #   content => "Welcome to your Vagrant-built virtual machine!
+  # #               Managed by Puppet.\n"
+  # # }
+  #
+  # config.vm.provision :puppet do |puppet|
+  #   puppet.manifests_path = "manifests"
+  #   puppet.manifest_file = "init.pp"
+  # end
+
+  # Enable provisioning with chef solo, specifying a cookbooks path, roles
+  # path, and data_bags path (all relative to this Vagrantfile), and adding
+  # some recipes and/or roles.
+  #
+  # config.vm.provision :chef_solo do |chef|
+  #   chef.cookbooks_path = "../my-recipes/cookbooks"
+  #   chef.roles_path = "../my-recipes/roles"
+  #   chef.data_bags_path = "../my-recipes/data_bags"
+  #   chef.add_recipe "mysql"
+  #   chef.add_role "web"
+  #
+  #   # You may also specify custom JSON attributes:
+  #   chef.json = { :mysql_password => "foo" }
+  # end
+
+  # Enable provisioning with chef server, specifying the chef server URL,
+  # and the path to the validation key (relative to this Vagrantfile).
+  #
+  # The Opscode Platform uses HTTPS. Substitute your organization for
+  # ORGNAME in the URL and validation key.
+  #
+  # If you have your own Chef Server, use the appropriate URL, which may be
+  # HTTP instead of HTTPS depending on your configuration. Also change the
+  # validation key to validation.pem.
+  #
+  # config.vm.provision :chef_client do |chef|
+  #   chef.chef_server_url = "https://api.opscode.com/organizations/ORGNAME"
+  #   chef.validation_key_path = "ORGNAME-validator.pem"
+  # end
+  #
+  # If you're using the Opscode platform, your validator client is
+  # ORGNAME-validator, replacing ORGNAME with your organization name.
+  #
+  # If you have your own Chef Server, the default validation client name is
+  # chef-validator, unless you changed the configuration.
+  #
+  #   chef.validation_client_name = "ORGNAME-validator"
+  #config.ssh.default.private_key_path = "/home/idan/.ssh/id_rsa"
+end

dump/explainshell/classifier.bson

294 KB
Binary file not shown.

dump/explainshell/system.indexes.bson

235 Bytes
Binary file not shown.

explainshell/__init__.py

Whitespace-only changes.

explainshell/algo/__init__.py

+2
@@ -0,0 +1,2 @@
+import explainshell.algo.features
+import explainshell.util

explainshell/algo/classifier.py

+113
@@ -0,0 +1,113 @@
+import itertools, collections, logging
+
+import nltk
+import nltk.metrics
+import nltk.classify
+import nltk.classify.maxent
+
+from explainshell import store, algo, options, config, util
+
+logger = logging.getLogger(__name__)
+
+def get_features(paragraph):
+    features = {}
+    ptext = paragraph.cleantext()
+    assert ptext
+
+    features['starts_with_hyphen'] = algo.features.starts_with_hyphen(ptext)
+    features['is_indented'] = algo.features.is_indented(ptext)
+    features['par_length'] = algo.features.par_length(ptext)
+    for w in ('=', '--', '[', '|', ','):
+        features['first_line_contains_%s' % w] = algo.features.first_line_contains(ptext, w)
+    features['first_line_length'] = algo.features.first_line_length(ptext)
+    features['first_line_word_count'] = algo.features.first_line_word_count(ptext)
+    features['is_good_section'] = algo.features.is_good_section(paragraph)
+    features['word_count'] = algo.features.word_count(ptext)
+    #features['bold'] = algo.features.is_first_word_bold(manpage, paragraph.text)
+    #features['has_bold'] = algo.features.has_bold(paragraph.text)
+
+    return features
+
+class classifier(object):
+    '''classify the paragraphs of a man page as having command line options
+    or not'''
+    def __init__(self, store, algo, **classifier_args):
+        self.store = store
+        self.algo = algo
+        self.classifier_args = classifier_args
+        self.classifier = None
+
+    def train(self):
+        if self.classifier:
+            return
+
+        manpages = self.store.trainingset()
+
+        # flatten the manpages so we get a list of (manpage-name, paragraph)
+        def flatten_manpages(manpage):
+            l = []
+            for para in manpage.paragraphs:
+                l.append(para)
+            return l
+        paragraphs = itertools.chain(*[flatten_manpages(m) for m in manpages])
+        training = list(paragraphs)
+
+        negids = [p for p in training if not p.is_option]
+        posids = [p for p in training if p.is_option]
+
+        negfeats = [(get_features(p), False) for p in negids]
+        posfeats = [(get_features(p), True) for p in posids]
+
+        negcutoff = len(negfeats)*3/4
+        poscutoff = len(posfeats)*3/4
+
+        trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
+        self.testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
+
+        logger.info('train on %d instances', len(trainfeats))
+
+        if self.algo == 'maxent':
+            c = nltk.classify.maxent.MaxentClassifier
+        elif self.algo == 'bayes':
+            c = nltk.classify.NaiveBayesClassifier
+        else:
+            raise ValueError('unknown classifier')
+
+        self.classifier = c.train(trainfeats, **self.classifier_args)
+
+    def evaluate(self):
+        self.train()
+        refsets = collections.defaultdict(set)
+        testsets = collections.defaultdict(set)
+
+        for i, (feats, label) in enumerate(self.testfeats):
+            refsets[label].add(i)
+            guess = self.classifier.prob_classify(feats)
+            observed = guess.max()
+            testsets[observed].add(i)
+            #if label != observed:
+            #    print 'label:', label, 'observed:', observed, feats
+
+        print 'pos precision:', nltk.metrics.precision(refsets[True], testsets[True])
+        print 'pos recall:', nltk.metrics.recall(refsets[True], testsets[True])
+        print 'neg precision:', nltk.metrics.precision(refsets[False], testsets[False])
+        print 'neg recall:', nltk.metrics.recall(refsets[False], testsets[False])
+
+        print self.classifier.show_most_informative_features(10)
+
+    def classify(self, manpage):
+        self.train()
+        for item in manpage.paragraphs:
+
+            features = get_features(item)
+            guess = self.classifier.prob_classify(features)
+            option = guess.max()
+            certainty = guess.prob(option)
+
+            if option:
+                if certainty < config.CLASSIFIER_CUTOFF:
+                    pass
+                else:
+                    logger.info('classified %s (%f) as an option paragraph', item, certainty)
+                    item.is_option = True
+                    yield certainty, item
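
`train()` keeps the first three quarters of the negative and positive examples for training and holds the rest back, and `evaluate()` scores the held-out part with precision and recall over index sets. The sketch below reproduces that arithmetic without nltk so it can be run on its own. The data and the stand-in guess rule are made up for illustration, and `split_three_quarters`, `precision` and `recall` are hypothetical helpers written to mirror the set-based definitions, not code from the repository.

```python
def split_three_quarters(items):
    """Return (train, test): the first 3/4 of items and the remainder."""
    # Floor division gives the same cutoff as Python 2's integer '/'.
    cutoff = len(items) * 3 // 4
    return items[:cutoff], items[cutoff:]


def precision(reference, predicted):
    """Share of predicted indices that are in reference; None if nothing was predicted."""
    if not predicted:
        return None
    return len(reference & predicted) / float(len(predicted))


def recall(reference, predicted):
    """Share of reference indices that were predicted; None if reference is empty."""
    if not reference:
        return None
    return len(reference & predicted) / float(len(reference))


# Made-up labelled data: (features, is_option) pairs, 8 negative and 4 positive.
neg = [({'starts_with_hyphen': False}, False)] * 8
pos = [({'starts_with_hyphen': True}, True)] * 4
neg_train, neg_test = split_three_quarters(neg)
pos_train, pos_test = split_three_quarters(pos)
train = neg_train + pos_train  # 6 + 3 instances
test = neg_test + pos_test     # 2 + 1 instances

# refsets hold the true labels, testsets hold the guesses, keyed by label.
refsets = {True: set(), False: set()}
testsets = {True: set(), False: set()}
for i, (feats, label) in enumerate(test):
    refsets[label].add(i)
    guess = feats['starts_with_hyphen']  # stand-in for a trained classifier
    testsets[guess].add(i)

pos_precision = precision(refsets[True], testsets[True])
pos_recall = recall(refsets[True], testsets[True])
```

Because the split is taken in list order rather than shuffled, the held-out quarter is always the tail of each class, which is worth keeping in mind when reading the evaluation numbers.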
