Module mwe

Multi-Word Expression (MWE) processing.

Post-tokenization pipeline that collapses multi-token sequences into single semantic units (e.g., “fire engine” → FireEngine).

How It Works

The MWE pipeline runs between lexing and parsing:

  1. Build a trie from known multi-word expressions with build_mwe_trie
  2. Scan the token stream for matches with apply_mwe_pipeline
  3. Replace each matched sequence with a single token
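The three steps above can be sketched as a word-level trie plus a greedy longest-match scan. This is a minimal, self-contained illustration, not the module's implementation: `Node`, `build_trie`, and `collapse` are hypothetical names, tokens are plain strings rather than the crate's token type, and each expression's collapsed form is supplied by hand.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the three pipeline steps; names and types here are
// illustrative, not the mwe module's API.

/// Trie node: children keyed by word; `target` is set where an expression ends.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    target: Option<String>,
}

/// Step 1: insert each (phrase, collapsed token) pair word by word.
fn build_trie(entries: &[(&str, &str)]) -> Node {
    let mut root = Node::default();
    for (phrase, target) in entries {
        let mut node = &mut root;
        for word in phrase.split_whitespace() {
            node = node.children.entry(word.to_string()).or_default();
        }
        node.target = Some(target.to_string());
    }
    root
}

/// Steps 2 and 3: at each position, walk the trie as far as the tokens allow,
/// remember the longest complete match, and emit its target in place of the
/// matched tokens. Unmatched tokens pass through unchanged.
fn collapse(tokens: &[&str], root: &Node) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let mut node = root;
        let mut best: Option<(usize, &str)> = None; // (tokens consumed, target)
        for (offset, tok) in tokens[i..].iter().enumerate() {
            match node.children.get(*tok) {
                Some(next) => {
                    node = next;
                    if let Some(t) = &node.target {
                        best = Some((offset + 1, t.as_str()));
                    }
                }
                None => break,
            }
        }
        match best {
            Some((len, target)) => {
                out.push(target.to_string());
                i += len;
            }
            None => {
                out.push(tokens[i].to_string());
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let trie = build_trie(&[("fire engine", "FireEngine"), ("ice cream", "IceCream")]);
    let out = collapse(&["the", "fire", "engine", "sold", "ice", "cream"], &trie);
    // Prints ["the", "FireEngine", "sold", "IceCream"]
    println!("{:?}", out);
}
```

Preferring the longest complete match means a longer entry (say, a hypothetical "ice cream cone") wins over "ice cream" when both apply, while a bare prefix with no terminal entry ("ice" alone) passes through untouched. Whether apply_mwe_pipeline resolves overlaps the same way is an assumption.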

Supported MWE Types

  • Compound nouns: “fire engine”, “ice cream”
  • Phrasal verbs: “look up”, “give in”
  • Fixed phrases: “in order to”, “as well as”
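Phrasal verbs inflect ("looked up", "gives in"), which is why apply_mwe_pipeline is documented to match on lemmas rather than raw strings. The sketch below illustrates that idea under stated assumptions: `Token` and `collapse_by_lemma` are hypothetical, each token is assumed to already carry its lemma, and for brevity a `HashMap` keyed by lemma sequences, tried longest first, stands in for the trie.

```rust
use std::collections::HashMap;

/// Hypothetical token type: surface form as written plus its lemma.
/// The crate's real token type is not shown on this page.
struct Token {
    surface: String,
    lemma: String,
}

impl Token {
    fn new(surface: &str, lemma: &str) -> Self {
        Token { surface: surface.to_string(), lemma: lemma.to_string() }
    }
}

/// Collapse runs of tokens whose lemmas spell a known expression.
/// Sketch only: expressions are stored in a HashMap keyed by lemma sequence
/// and tried longest first, standing in for the trie walk.
fn collapse_by_lemma(tokens: &[Token], mwes: &HashMap<Vec<String>, String>) -> Vec<String> {
    let max_len = mwes.keys().map(|k| k.len()).max().unwrap_or(0);
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let longest = max_len.min(tokens.len() - i);
        // Longest window first; windows shorter than 2 are never MWEs.
        let hit = (2..=longest).rev().find_map(|len| {
            let key: Vec<String> = tokens[i..i + len].iter().map(|t| t.lemma.clone()).collect();
            mwes.get(&key).map(|target| (len, target.clone()))
        });
        match hit {
            Some((len, target)) => {
                out.push(target); // "looked up" and "looks up" both become "LookUp"
                i += len;
            }
            None => {
                out.push(tokens[i].surface.clone()); // no match: keep the surface form
                i += 1;
            }
        }
    }
    out
}

fn main() {
    let mut mwes = HashMap::new();
    mwes.insert(vec!["look".to_string(), "up".to_string()], "LookUp".to_string());
    let tokens = vec![
        Token::new("She", "she"),
        Token::new("looked", "look"),
        Token::new("up", "up"),
        Token::new("the", "the"),
        Token::new("word", "word"),
    ];
    // Prints ["She", "LookUp", "the", "word"]
    println!("{:?}", collapse_by_lemma(&tokens, &mwes));
}
```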

Key Functions

Structs

MweTarget
MweTrie

Functions

apply_mwe_pipeline
Apply MWE collapsing to a token stream. Matches on lemmas (not raw strings) to handle morphological variants.
build_mwe_trie
Build the MWE trie from lexicon data.
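build_mwe_trie builds the trie from lexicon data, but neither the lexicon format nor the contents of MweTrie and MweTarget are documented on this page. The sketch below assumes a plain-text lexicon with one expression per line and derives each target name by capitalizing the words, mirroring the "fire engine" → FireEngine example. `Trie`, `Node`, `Target`, and `build_trie_from_lexicon` are hypothetical stand-ins, not the crate's types.

```rust
use std::collections::HashMap;

// Hypothetical sketch: these types stand in for MweTrie / MweTarget and are
// not the mwe module's real definitions.

/// The single token a matched expression collapses to.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    name: String,
}

/// One trie node: children are indices into `Trie::nodes`.
#[derive(Default)]
struct Node {
    children: HashMap<String, usize>,
    target: Option<Target>,
}

/// Arena-backed trie; index 0 is the root.
struct Trie {
    nodes: Vec<Node>,
}

impl Trie {
    fn new() -> Self {
        Trie { nodes: vec![Node::default()] }
    }

    /// Walk (creating nodes as needed) along `words` and mark the end node.
    fn insert(&mut self, words: &[&str], target: Target) {
        let mut cur = 0;
        for w in words {
            let existing = self.nodes[cur].children.get(*w).copied();
            cur = match existing {
                Some(next) => next,
                None => {
                    let next = self.nodes.len();
                    self.nodes.push(Node::default());
                    self.nodes[cur].children.insert(w.to_string(), next);
                    next
                }
            };
        }
        self.nodes[cur].target = Some(target);
    }

    /// Return the target only if `words` is a complete stored expression.
    fn lookup(&self, words: &[&str]) -> Option<&Target> {
        let mut cur = 0;
        for w in words {
            cur = *self.nodes[cur].children.get(*w)?;
        }
        self.nodes[cur].target.as_ref()
    }
}

/// Uppercase the first character: "fire" -> "Fire".
fn capitalize(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => String::new(),
    }
}

/// Assumed lexicon format: one expression per line; blank lines and
/// single-word lines are skipped because they are not multi-word expressions.
fn build_trie_from_lexicon(lexicon: &str) -> Trie {
    let mut trie = Trie::new();
    for line in lexicon.lines() {
        let words: Vec<&str> = line.split_whitespace().collect();
        if words.len() < 2 {
            continue;
        }
        let name: String = words.iter().map(|w| capitalize(w)).collect();
        trie.insert(&words, Target { name });
    }
    trie
}

fn main() {
    let trie = build_trie_from_lexicon("fire engine\nice cream\nin order to\n");
    println!("{:?}", trie.lookup(&["in", "order", "to"]));
    println!("{:?}", trie.lookup(&["in", "order"]));
}
```

Storing nodes in a `Vec` and linking them by index is a common Rust layout for tries because it avoids nested mutable borrows during insertion; a prefix such as "in order" has a node but no target, so `lookup` returns `None` for it.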