This is now called version 0.99.0.

Lots of changes happened since the last version; see the CHANGELOG for details.

git-svn-id: https://pet.opendfki.de/repos/pet/main@222 4200e16c-5112-0410-ac55-d7fb557a720a
delph-in · Oct 4, 2004 · 1804dc6
1 parent 2ea8791
Showing 136 changed files with 10,638 additions and 4,701 deletions.
diff --git a/BUGS b/BUGS
@@ -0,0 +1,9 @@
+- the bound on the number of inflection rules (setting max-inflections) does 
+  not work
+- flop is not able to dump cyclic structures
+- Berthold: packing vs. relative clauses (what is the exact error?)
+
+- wrong/no characterization when unfilling is used
+  :-( No clean way to implement this; in fact, characterization should be 
+  implemented in the grammar, not in the processing engine.
+
diff --git a/CHANGELOG b/CHANGELOG
@@ -0,0 +1,88 @@
+v0.99.0
+  Since this is the first entry, change descriptions are quite coarse
+  grained. This should maybe change in the future.
+
+  - Added doxygen compatible comments to most of the .h files, and some
+    comments to source files too.
+  - New input/lexical processing stage to allow more modularization and
+    flexible exchange of tokenization, morphology, etc.
+  - "japanese multiword bug" fixed
+  - Application of inflection and lexical rules can now be completed before any
+    syntactical processing takes place (which might be beneficial for 
+    chart dependencies in German)
+  - fixed bug in acyclic transitive reduction in the boost version of flop
+  - expansion failures (flop) now report the failure path
+
+  - XML input mode (also as a replacement for the whiteboard version)
+  - first version of fragmentary results in case of parse failure, maybe needs
+    more flexibility for better heuristics.
+  - activation of packing without restrictor setting no longer leads to a
+    segmentation fault; packing is simply not activated.
+  - translation of iso chars to isomorphix in YY input mode
+  - incr(tsdb[]) file dump mode
+  - version string now included in flop and cheap binaries. version number is
+    printed with usage information
+  - printer for hierarchies in VCG tool style, can be used in cheap and flop
+  - support for dynamic symbols
+  - dag_expand now does the job correctly using a scheme similar to delta
+    expansion.
+  - moved the whole agenda code into the .h file with the hope of some positive
+    inlining effects (and, besides, to get rid of another file).
+  - more flexible restrictor functionality
+
+  - lots of minor cleanup issues
+  - first attempt at CHANGELOG, TODO, BUGS, version.h
+
+Done previously (from old ToDo file, partially redundant)
+
++- XML input mode
+  + complete DTD specification (Uli S. and me did this)
+  + build SAX parser
+  + supersedes integration of bernd's (whiteboard) version
+
++ fragmentary results in case of parse failure (v1.0)
+  fragment search/output in case of parse failures
+
++ integrate ecls LISP (seems to work now)
+  + unfilling in PET leads to wrong / incomplete results (German grammar)
+    re-expansion (dag_expand) fixed
+  + packing/unpacking does characterization too
+
++ leda -> boost migration done and checked
+
++ CFROM/CTO fix: toplevel errors
+
++ packing without packing-restrictor: segfault; now: warning & disable
+
++ null error in MRS must produce output
++ YY mode does not do translate-iso-chars
+
++ writing TSDB tables from PET
+  + create item, parse and result tables when PET runs inside the HOG.
+    cannibalize yy.cpp: TSDBFILEAPI !!
+  + add option description
++ counts for lexical ambiguity
+
++ correct sorting of results according to score
++ -results=n option to get only the best n results
++ fullform-morphology also outputs the stem when printing
++ Restricting the number of inflection rule applications
+
++ positions and counts for YY and XML tokenizer
++ bring perforce main branch up to date:
+  removed:
+     cheap:
+        agenda.cpp inputchart.cpp/h inputtoken.cpp/h chartpositions.h
+        tokenizer.cpp/h parser.cpp/h mrs.cpp/h  
+     common:
+        errors.cpp
+
+  new:
+     cheap:
+        xmlparser* xml-tokenizer* pic-handler* pic-states.h lexparser.*
+     common:
+        hashing.h vcg_print.h version.h
+
++ japanese multiword bug (requires input chart redesign)
++? implement mrs/rmrs code - processor interface (is this implemented or not?)
+
diff --git a/Makefile b/Makefile
@@ -0,0 +1,17 @@
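+.PHONY: all flop cheap doc flopdoc cheapdoc
+# flop/ and cheap/ are directories, so the targets must be phony.
+all: flop cheap doc
+
+flop:
+	$(MAKE) -C flop flop
+
+cheap:
+	$(MAKE) -C cheap cheap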
+
+doc: flopdoc cheapdoc
+
+flopdoc:
+	doxygen doxyconfig.flop
+
+cheapdoc:
+	doxygen doxyconfig.cheap
diff --git a/README b/README
@@ -32,20 +32,16 @@ Binaries for Linux are provided. To run the binaries you need the
 Compiling it
 ============
 
-If you want to compile yourself, you additionally need the LEDA
-library (for flop). The build system used for PET is Jam from perforce
-(which is also included in Boost). gcc version 2.95.3. is the only
-tested compiler for flop, for cheap gcc 2.95.3 and gcc 3.2.2 both work
-fine. cheap has also been compiled using Borland C++ under Windows and
-KCC for Linux and Solaris, so porting it to a new compiler should not
-be too hard.
+If you want to compile it yourself, you additionally need the Boost
+library (for flop). The build system used for PET is GNU `make'.
+gcc/g++ versions > 3.1.2 are known to work fine.
 
-For unit tests and source code documentation you can optionally use
-cppunit and doxygen. Note that so far only small parts of the system
-take advantage of this.
+Doxygen compatible documentation is included in most header files and some of 
+the source files. Call 'make' in the root directory to build the documentation.
 
-A version of the preprocessor which does not depend on LEDA (but uses the
-free Boost library instead) will be available shortly.
+For unit tests you can optionally use cppunit.
+Note that so far only small parts of the system take advantage of
+this.
 
 External Components
 ===================
@@ -67,6 +63,9 @@ http://cppunit.sourceforge.net/
 doxygen:
 http://www.doxygen.org/
 
+When using (optional) XML input mode: Apache xerces C++ library:
+http://xml.apache.org/xerces-c/
+
 Layout of the sources
 =====================
 

diff --git a/TODO b/TODO
@@ -1,50 +1,86 @@
+- build process: auto(config|make)
+
+- cheap dynamic library / API
+
+- flop returns zero even in the presence of errors like non-unique feature
+  introduction
+- Better error handling in flop for use with external applications
+- emacs compatible error messages for flop ?
+
+- Documentation
+  - flop & cheap user doc
+  - missing header file documentation (oe, please help here, if possible)
+    itsdb.h, extdict.h, psqllex.h, tsdb++.h
+
+- unreleased memory? (see valgrind-errors-15-apr-04)
+
+- more flexible way to do selection of generic entries, e.g., based only on a 
+  (highly scored) subset of POS, or combined clues from morphology
+
+- more flexible heuristics / better selection of partial results
+
+- separate switches for unification and subsumption quickcheck computation
+
 - cleaning up:
   - option handling
-  - YY references; split yy.cpp module into seperate modules
-  - runtime selection of online-morphology vs full-forms
+  - YY references; split yy.cpp module into separate modules
+    + yy_tokenizer removed from yy.cpp
+    - server mode still unused, yy.cpp/h should become socket.cpp/h
+  + runtime selection of online-morphology vs full-forms
   - logging / debugging info: get rid of global verbosity,
-    implement some central logging facility
-
-- build process
-  - autoconfig
-  - version.h mechanism
+    implement some central logging facility (take log4cxx)
 
 - complete lexical database (postgres) integration
-- integrate silo
 
-- integrate ecls LISP
-- implement mrs/rmrs code - processor interface
-
-- leda -> boost migration
-
-- lsl completion - minimal
+- integrate silo
 
-- integrate bernd's (whiteboard) version
+- lsl completion - minimal ?? What does that mean ?
 
 - scoring:
   - offline scoring
-  - simplified model for compatibility 
+  - simplified model for compatibility ??  What does that mean ?
 
 - packing:
   - fix & integrate subsumption quickcheck
     currently, it gives incorrect results for non-existing paths
-  - generalise restrictor
+  +- generalise restrictor: new restrictor interface implemented
   - simplify/optimise subsume
   - subtype caching
   - re-enable unfilling as far as possible 
 
-- documentation
-
-- japanese multiword bug (requires input chart redesign)
-
 - defaults
 
 - generator
 
 - whenever dag_get_path_value is called, structure should be filled, at least
   under that path.
 
-- apply chart dependencies after lexical processing
++ apply chart dependencies after lexical processing
+  - chart dependencies after lex lookup (1) AND lex processing (2)
+  -+ still to be tested, Berthold will try it
 
 - extend chart dependencies to allow a dependency to be conditioned on
-  a specified path-value pair.  
+  a specified path-value pair. chart dependencies could take a variety of
+  forms: (OP could be unifies, subsumes, is_subsumed_by, equals)
+  - val(path1) OP val(path2)
+  - val(path1) OP const1 && val(path2) OP const2
+  - val(path1) OP const1 && val(path1) OP val(path2)
+  - val(path1) OP val(path2) && val(path2) OP const2  (??)
+
+- restrictors that are paths instead of features
+
+refactoring:
+- make tAgenda a template
+- make the unification engine(s) more modular
+- better decoupling of the dag allocation mechanism
+- replace item print routines by item printers where possible
+
+- diagnostic messages for errors in the MRS construction
+- performance loss compared to ~kiefer/duo/public/pet-730.tgz is 30% --
+  because of the data structures in the chart that are necessary for packing,
+  like _Cp_span? this has to be checked.
+
+- performance loss flop Leda vs. flop boost: seems to stem from a huge amount 
+  of minor page faults. How can the code be found that is responsible for this
+  behaviour ?
+
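
Illustration (not part of the commit): the TODO hunk above lists possible forms
for conditioned chart dependencies, each a conjunction of atoms of the shape
val(path) OP val(path) or val(path) OP const. The following is a minimal Python
sketch of how such conditions could be evaluated. It is not PET code: the names
PARENT, OPS, Path, Const, val and holds are hypothetical, feature structures are
modelled as nested dicts with atomic type names at the leaves, and the type
hierarchy is a toy single-inheritance tree, so "unifies" is approximated as
"one type subsumes the other".

```python
from dataclasses import dataclass

# Hypothetical toy type hierarchy: child -> parent, "*top*" is the root.
PARENT = {"noun": "*top*", "verb": "*top*", "proper_noun": "noun"}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    t = specific
    while t is not None:
        if t == general:
            return True
        t = PARENT.get(t)
    return False


# The four operators named in the TODO entry.
OPS = {
    "equals": lambda a, b: a == b,
    "subsumes": subsumes,
    "is_subsumed_by": lambda a, b: subsumes(b, a),
    # Simplification: in a tree-shaped hierarchy two types are compatible
    # exactly when one subsumes the other.
    "unifies": lambda a, b: subsumes(a, b) or subsumes(b, a),
}


@dataclass(frozen=True)
class Path:
    path: str  # dotted feature path, e.g. "SYNSEM.CAT"


@dataclass(frozen=True)
class Const:
    value: str  # a type name used as a constant operand


def val(fs, path):
    """Follow a dotted feature path through nested dicts; None if missing."""
    node = fs
    for feat in path.split("."):
        if not isinstance(node, dict) or feat not in node:
            return None
        node = node[feat]
    return node


def holds(fs, condition):
    """Evaluate a conjunction of (Path, op_name, Path | Const) atoms."""
    for left, op, right in condition:
        a = val(fs, left.path)
        b = val(fs, right.path) if isinstance(right, Path) else right.value
        # Missing paths and non-atomic values never satisfy an atom.
        if not isinstance(a, str) or not isinstance(b, str):
            return False
        if not OPS[op](a, b):
            return False
    return True
```

Under these assumptions, the form "val(path1) OP const1 && val(path1) OP val(path2)"
would be written as
holds(fs, [(Path("SYNSEM.CAT"), "is_subsumed_by", Const("noun")), (Path("SYNSEM.CAT"), "unifies", Path("ARG.CAT"))]).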