This is now called version 0.99.0.
       Lots of changes happened since the last version, see the CHANGELOG for
       details.


git-svn-id: https://pet.opendfki.de/repos/pet/main@222 4200e16c-5112-0410-ac55-d7fb557a720a
kiefer committed Oct 4, 2004
1 parent 2ea8791 commit 1804dc6
Showing 136 changed files with 10,638 additions and 4,701 deletions.
9 changes: 9 additions & 0 deletions BUGS
@@ -0,0 +1,9 @@
- the bound on the number of inflection rules (setting max-inflections) does
not work
- flop is not able to dump cyclic structures
- Berthold: packing vs. relative clauses (what is the exact error?)

- wrong/no characterization when unfilling is used
:-( No clean way to implement this; in fact, characterization should be
implemented in the grammar, not in the processing engine.

88 changes: 88 additions & 0 deletions CHANGELOG
@@ -0,0 +1,88 @@
v0.99.0
Since this is the first entry, the change descriptions are quite
coarse-grained. Later entries should be more detailed.

- Added doxygen compatible comments to most of the .h files, and some
comments to source files too.
- New input/lexical processing stage to allow more modularization and
flexible exchange of tokenization, morphology, etc.
- "Japanese multiword bug" fixed
- Application of inflection and lexical rules can now be completed before any
syntactic processing takes place (which might be beneficial for
chart dependencies in German)
- fixed bug in acyclic transitive reduction in the boost version of flop
- expansion failures (flop) now report the failure path

- XML input mode (also as a replacement for the whiteboard version)
- first version of fragmentary results in case of parse failure, maybe needs
more flexibility for better heuristics.
- activating packing without a restrictor setting no longer leads to a
segmentation fault; packing is simply not activated.
- translation of iso chars to isomorphix in YY input mode
- incr(tsdb[]) file dump mode
- version string now included in flop and cheap binaries. version number is
printed with usage information
- printer for hierarchies in VCG tool style, can be used in cheap and flop
- support for dynamic symbols
- dag_expand now does the job correctly using a scheme similar to delta
expansion.
- moved the whole agenda code into the .h file with the hope of some positive
inlining effects (and, besides, to get rid of another file).
- more flexible restrictor functionality

- lots of minor cleanup issues
- first attempt to CHANGELOG, TODO, BUGS, version.h

Done previously (from old ToDo file, partially redundant)

+- XML input mode
+ complete DTD specification (Uli S. and me did this)
+ build SAX parser
+ supersedes integration of bernd's (whiteboard) version

+ fragmentary results in case of parse failure (v1.0)
Fragment search/output in case of parse failures

+ integrate ecls LISP (seems to work now)
+ unfilling in PET leads to wrong / incomplete results (German grammar)
re-expansion (dag_expand) fixed
+ packing/unpacking does characterization too

+ leda -> boost migration done and checked

+ CFROM/CTO fix: toplevel errors

+ packing without a packing restrictor used to segfault; now: warning & disable

+ a null error in MRS must produce output
+ YY mode does not do translate-iso-chars

+ Writing TSDB tables from PET
+ Generate item, parse and result tables when PET runs inside the HOG.
salvage code from yy.cpp: TSDBFILEAPI !!
+ add option descriptions
+ counts for lexical ambiguity

+ correct sorting of results according to score
+ -results=n option to get only the best n results
+ fullform morphology also outputs the stem when printing
+ Restricting the number of inflection rule applications
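
The two result-related entries above (sorting by score, and -results=n) can be
illustrated with a minimal sketch. This is not the code from this commit: the
Result struct, the best_results function and the convention that n == 0 means
"no limit" are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical result record; the real PET result type is not shown here.
struct Result {
    int id;        // identifier of the reading
    double score;  // higher means better
};

// Return the best n results in descending score order.
// Assumption: n == 0 means "no limit", so all results are returned, sorted.
std::vector<Result> best_results(std::vector<Result> results, std::size_t n) {
    // stable_sort keeps the original order among equally scored results
    std::stable_sort(results.begin(), results.end(),
                     [](const Result& a, const Result& b) {
                         return a.score > b.score;
                     });
    if (n != 0 && n < results.size()) {
        results.resize(n);  // drop everything after the n best
    }
    return results;
}
```

Sorting first and truncating afterwards keeps the two options independent: the
order of the output is the same whether or not a limit is given.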

+ positions and counts for YY and XML tokenizer
+ bring the perforce main branch up to date:
removed:
cheap:
agenda.cpp inputchart.cpp/h inputtoken.cpp/h chartpositions.h
tokenizer.cpp/h parser.cpp/h mrs.cpp/h
common:
errors.cpp

new:
cheap:
xmlparser* xml-tokenizer* pic-handler* pic-states.h lexparser.*
common:
hashing.h vcg_print.h version.h

+ japanese multiword bug (requires input chart redesign)
+? implement mrs/rmrs code - processor interface ?Is this implemented or not?

17 changes: 17 additions & 0 deletions Makefile
@@ -0,0 +1,17 @@
.PHONY: all flop cheap doc flopdoc cheapdoc

all: flop cheap doc

flop:
	$(MAKE) -C flop flop

cheap:
	$(MAKE) -C cheap cheap

doc: flopdoc cheapdoc

flopdoc:
	doxygen doxyconfig.flop

cheapdoc:
	doxygen doxyconfig.cheap
23 changes: 11 additions & 12 deletions README
@@ -32,20 +32,16 @@ Binaries for Linux are provided. To run the binaries you need the
Compiling it
============

If you want to compile yourself, you additionally need the LEDA
library (for flop). The build system used for PET is Jam from perforce
(which is also included in Boost). gcc version 2.95.3. is the only
tested compiler for flop, for cheap gcc 2.95.3 and gcc 3.2.2 both work
fine. cheap has also been compiled using Borland C++ under Windows and
KCC for Linux and Solaris, so porting it to a new compiler should not
be too hard.
If you want to compile yourself, you additionally need the boost
library (for flop). The build system used for PET is gnu `make'.
gcc/g++ versions > 3.1.2 are known to work fine.

For unit tests and source code documentation you can optionally use
cppunit and doxygen. Note that so far only small parts of the system
take advantage of this.
Doxygen compatible documentation is included in most header files and some of
the source files. Call 'make' in the root directory to build the documentation.

A version of the preprocessor which does not depend on LEDA (but uses the
free Boost library instead) will be available shortly.
For unit tests and source code documentation you can optionally use
cppunit. Note that so far only small parts of the system take advantage of
this.

External Components
===================
@@ -67,6 +63,9 @@ http://cppunit.sourceforge.net/
doxygen:
http://www.doxygen.org/

When using (optional) XML input mode: Apache xerces C++ library:
http://xml.apache.org/xerces-c/

Layout of the sources
=====================

82 changes: 59 additions & 23 deletions TODO
@@ -1,50 +1,86 @@
- build process: auto(config|make)

- cheap dynamic library / API

- flop returns zero even in the presence of errors like non-unique feature
introduction
- Better error handling in flop for use with external applications
- emacs compatible error messages for flop ?

- Documentation
- flop & cheap user doc
- missing header file documentation (oe, please help here, if possible)
itsdb.h, extdict.h, psqllex.h, tsdb++.h

- unreleased memory? (see valgrind-errors-15-apr-04)

- more flexible way to do selection of generic entries, e.g., based only on a
(highly scored) subset of POS, or combined clues from morphology

- more flexible heuristics / better selection of partial results

- separate switches for unification and subsumption quickcheck computation

- cleaning up:
- option handling
- YY references; split yy.cpp module into seperate modules
- runtime selection of online-morphology vs full-forms
- YY references; split yy.cpp module into separate modules
+ yy_tokenizer removed from yy.cpp
- server mode still unused, yy.cpp/h should become socket.cpp/h
+ runtime selection of online-morphology vs full-forms
- logging / debugging info: get rid of global verbosity,
implement some central logging facility

- build process
- autoconfig
- version.h mechanism
implement some central logging facility (take log4cxx)

- complete lexical database (postgres) integration
- integrate silo

- integrate ecls LISP
- implement mrs/rmrs code - processor interface

- leda -> boost migration

- lsl completion - minimal
- integrate silo

- integrate bernd's (whiteboard) version
- lsl completion - minimal ?? What does that mean ?

- scoring:
- offline scoring
- simplified model for compatibility
- simplified model for compatibility ?? What does that mean ?

- packing:
- fix & integrate subsumption quickcheck
currently, it gives incorrect results for non-existing paths
- generalise restrictor
+- generalise restrictor: new restrictor interface implemented
- simplify/optimise subsume
- subtype caching
- re-enable unfilling as far as possible

- documentation

- japanese multiword bug (requires input chart redesign)

- defaults

- generator

- whenever dag_get_path_value is called, structure should be filled, at least
under that path.

- apply chart dependencies after lexical processing
+ apply chart dependencies after lexical processing
- chart dependencies after lex lookup (1) AND lex processing (2)
-+ still to be tested, Berthold will try it

- extend chart dependencies to allow a dependency to be conditioned on
a specified path-value pair.
a specified path-value pair. chart dependencies could take a variety of
forms: (OP could be unifies, subsumes, is_subsumed_by, equals)
- val(path1) OP val(path2)
- val(path1) OP const1 && val(path2) OP const2
- val(path1) OP const1 && val(path1) OP val(path2)
- val(path1) OP val(path2) && val(path2) OP const2 (??)
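
The condition forms listed above could be modelled as a conjunction of
"lhs OP rhs" tests, where each side is either val(path) or a constant. The
sketch below is only an illustration under strong assumptions: feature
structures are replaced by a flat map from path strings to atomic values, all
type and function names are hypothetical, and only the equals operator is made
concrete, since unifies and subsumes would need the type hierarchy.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical flat stand-in for a feature structure: path -> atomic value.
typedef std::map<std::string, std::string> PathValues;

// OP from the list above, passed in as a predicate on two values.
typedef std::function<bool(const std::string&, const std::string&)> Op;

// One operand of a condition: either val(path) or a constant.
struct Operand {
    bool is_path;
    std::string text;
};

Operand path(const std::string& p) { Operand o = {true, p}; return o; }
Operand constant(const std::string& c) { Operand o = {false, c}; return o; }

// A single test "lhs OP rhs".
struct Condition {
    Operand lhs;
    Op op;
    Operand rhs;
};

// Look up an operand; returns a null pointer if a path has no value.
const std::string* resolve(const PathValues& fs, const Operand& o) {
    if (!o.is_path) return &o.text;
    PathValues::const_iterator it = fs.find(o.text);
    return it == fs.end() ? nullptr : &it->second;
}

// A dependency holds if all of its conditions hold (the && in the list above).
bool holds(const PathValues& fs, const std::vector<Condition>& dep) {
    for (const Condition& c : dep) {
        const std::string* l = resolve(fs, c.lhs);
        const std::string* r = resolve(fs, c.rhs);
        if (l == nullptr || r == nullptr || !c.op(*l, *r)) return false;
    }
    return true;
}

// The only operator made concrete in this sketch.
bool equals(const std::string& a, const std::string& b) { return a == b; }
```

With this representation all four forms in the list are vectors of one or two
Condition entries, so no separate case is needed for each form.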

- restrictors that are paths instead of features

refactoring:
- make tAgenda a template
- make the unification engine(s) more modular
- better decoupling of the dag allocation mechanism
- replace item print routines by item printers where possible
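
As an illustration of the first refactoring item (making tAgenda a template),
here is a minimal header-only sketch. It assumes that the agenda is essentially
a priority queue of tasks; the class name Agenda, its interface and the default
ordering are hypothetical and not taken from the PET sources.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Hypothetical header-only agenda: a priority queue over tasks of type T.
// Compare orders the tasks; with std::less the largest task is popped first.
template <typename T, typename Compare = std::less<T> >
class Agenda {
public:
    void push(const T& task) { queue_.push(task); }

    bool empty() const { return queue_.empty(); }

    // Remove and return the highest-priority task.
    // Precondition: the agenda is not empty.
    T pop() {
        T best = queue_.top();
        queue_.pop();
        return best;
    }

private:
    std::priority_queue<T, std::vector<T>, Compare> queue_;
};
```

Keeping the whole class in the header, as the CHANGELOG entry on the agenda
code describes, is also what a template requires, since the definitions must be
visible wherever the template is instantiated.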

- diagnostic messages for errors in the MRS construction
- performance loss compared to ~kiefer/duo/public/pet-730.tgz is 30%. Is this
because of the data structures in the chart that are necessary for packing,
like _Cp_span? This has to be checked.

- performance loss flop LEDA vs. flop boost: seems to stem from a huge number
of minor page faults. How can the code responsible for this behaviour be
found?
