LS-GRAM developed grammatical resources for nine European languages (Danish, Dutch, English, French, German, Greek, Italian, Portuguese, and Spanish) on the basis of the Advanced Language Engineering Platform (ALEP). Grammar development was based on corpus investigations in order to determine a realistic coverage for the grammars. The LS-GRAM project addressed the need for high-quality, large-scale grammatical resources as a basis for advanced NLP technology.

The general goal of LS-GRAM was to narrow the gap between the scientific concept of unification grammar and real industrial applications. LS-GRAM was thus an exercise in large-scale grammar engineering, accommodating methodological requirements as well as building the prototype of a `realistic' system, where `realistic' is defined by a set of requirements regarding efficiency, completeness of modules, and coverage of grammars.

The project achieved the following results. There are lingware modules for nine EU languages, the majority of them including a text handling component covering a broad range of phenomena. There are two-level morphology and word structure components for most languages (covering phenomena related to inflectional morphology), as well as syntactic components and components of semantic interpretation for all nine languages, implemented on the same platform according to the same principles and thus standardized to a great extent. All modules cover core linguistic phenomena; some of them come quite close to full coverage of the test corpus. All this comes with a large body of test suites and very detailed documentation. Some tools and devices related to the project were also developed or improved: tools were integrated with ALEP, paths were defined for the reuse of other resources (e.g. lexical resources), and small-scale demonstrators were developed.

Some examples:

CELEX to ALEP compiler: This is a tool that converts lexical resources for Dutch from CELEX to ALEP macros.

Railway Messages Information Extraction Tool: This is a tagger written in Perl which converts Dutch Railway Teletext into a format processable by ProFIT. The semantic output is presented in a table.

GramCheck: This is a grammar checker based on the ALEP software. It was developed for Spanish using the Spanish LS-GRAM grammatical resources.

Interactive tagger: This is a tagger integrated into the ALEP system. It allows the user to intervene after text handling, e.g. to classify expressions as proper names.

MPRO: This is a tagger (not statistics-based) and a search tool for syntactic structures in German texts. It also has basic German-English MT functionalities: a bilingual glossary and MT based on shallow syntactic analysis.


January 1994 - July 1996