Module lexer


Two-stage lexer for LOGOS natural language input.

The lexer transforms natural language text into a token stream suitable for parsing. It operates in two stages:

§Stage 1: Line Lexer

The LineLexer handles structural concerns:

  • Indentation: Tracks indent levels, emits Indent/Dedent tokens
  • Block boundaries: Identifies where significant whitespace opens or closes a block
  • Content extraction: Passes line content to Stage 2
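
A minimal sketch of how a Stage 1 pass of this kind could track indentation with a stack of indent widths. The names `LineTok` and `lex_lines` are hypothetical and are not the crate's API; the sketch assumes spaces-only indentation, skips blank lines, and does not report inconsistent dedents.

```rust
/// Hypothetical structural tokens; the crate's real `LineToken` may differ.
#[derive(Debug, PartialEq, Clone)]
enum LineTok {
    Indent,
    Dedent,
    Newline,
    /// Line text left opaque for the word-level stage.
    Content(String),
}

/// Assumption: indentation uses spaces only, and blank lines carry no structure.
fn lex_lines(src: &str) -> Vec<LineTok> {
    // Stack of currently open indent widths; the base level 0 is never popped.
    let mut stack: Vec<usize> = vec![0];
    let mut out = Vec::new();

    for line in src.lines() {
        let trimmed = line.trim_start_matches(' ');
        if trimmed.trim().is_empty() {
            continue; // skip blank lines
        }
        let width = line.len() - trimmed.len();
        let top = *stack.last().unwrap();

        if width > top {
            // Deeper than the current block: open a new one.
            stack.push(width);
            out.push(LineTok::Indent);
        } else {
            // Shallower: close every block deeper than this line.
            while width < *stack.last().unwrap() {
                stack.pop();
                out.push(LineTok::Dedent);
            }
        }

        out.push(LineTok::Content(trimmed.trim_end().to_string()));
        out.push(LineTok::Newline);
    }

    // Close any blocks still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(LineTok::Dedent);
    }
    out
}
```

Calling `lex_lines("a\n  b\nc\n")` in this sketch yields content for `a`, an `Indent` before `b`, and a `Dedent` before `c`. Keeping the widths on a stack lets a single line close several nested blocks at once.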

§Stage 2: Word Lexer

The Lexer performs word-level tokenization:

  • Vocabulary lookup: Identifies words via the lexicon database
  • Morphological analysis: Handles inflection (verb tenses, plurals)
  • Ambiguity resolution: Uses priority rules for ambiguous words

§Ambiguity Rules

When a word matches multiple lexicon entries, priority determines the token:

  1. Quantifiers over nouns (“some” → Quantifier, not Noun)
  2. Determiners over adjectives (“the” → Determiner, not Adjective)
  3. Verbs over nouns for -ing/-ed forms (“running” → Verb)
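
The three rules above can be sketched as a small resolver over the candidate categories of a word. `Category` and `resolve` are hypothetical names for illustration, and falling back to the first candidate when no rule applies is an assumption of this sketch, not documented behavior.

```rust
/// Hypothetical word categories; the crate's token types may differ.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Category {
    Quantifier,
    Determiner,
    Adjective,
    Verb,
    Noun,
}

/// Picks one category for `word` from its matching lexicon entries.
fn resolve(word: &str, candidates: &[Category]) -> Option<Category> {
    use Category::*;
    let has = |c: Category| candidates.contains(&c);

    // Rule 1: quantifiers over nouns.
    if has(Quantifier) && has(Noun) {
        return Some(Quantifier);
    }
    // Rule 2: determiners over adjectives.
    if has(Determiner) && has(Adjective) {
        return Some(Determiner);
    }
    // Rule 3: verbs over nouns, but only for -ing/-ed forms.
    let participle = word.ends_with("ing") || word.ends_with("ed");
    if participle && has(Verb) && has(Noun) {
        return Some(Verb);
    }
    // Assumption: with no applicable rule, take the first entry (None if empty).
    candidates.first().copied()
}
```

For instance, `resolve("running", &[Category::Noun, Category::Verb])` selects `Verb` under rule 3, while a word with a single entry is returned unchanged.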

§Example

Input:  "Every cat sleeps."
Output: [Quantifier("every"), Noun("cat"), Verb("sleeps"), Period]
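
To make the example concrete, here is a toy word-level pass that reproduces the output above. Everything in it is an assumption for illustration: the `Token` and `Class` enums, the three-word lexicon, and the crude morphology that retries lookup with a trailing "s" removed. The real lexicon database and analysis are not shown in this documentation.

```rust
use std::collections::HashMap;

/// Hypothetical token type mirroring the example output.
#[derive(Debug, PartialEq, Clone)]
enum Token {
    Quantifier(String),
    Noun(String),
    Verb(String),
    Period,
}

#[derive(Clone, Copy)]
enum Class {
    Quantifier,
    Noun,
    Verb,
}

/// Toy stand-in for the lexicon database (base forms only).
fn toy_lexicon() -> HashMap<&'static str, Class> {
    HashMap::from([
        ("every", Class::Quantifier),
        ("cat", Class::Noun),
        ("sleep", Class::Verb),
    ])
}

fn tokenize(input: &str) -> Vec<Token> {
    let lexicon = toy_lexicon();
    let mut tokens = Vec::new();

    for raw in input.split_whitespace() {
        // Split a sentence-final period off the word.
        let (word, has_period) = match raw.strip_suffix('.') {
            Some(w) => (w, true),
            None => (raw, false),
        };
        let lower = word.to_lowercase();

        // Vocabulary lookup, then a crude inflection fallback: drop a final "s".
        let class = lexicon.get(lower.as_str()).copied().or_else(|| {
            lower
                .strip_suffix('s')
                .and_then(|stem| lexicon.get(stem).copied())
        });

        match class {
            Some(Class::Quantifier) => tokens.push(Token::Quantifier(lower)),
            Some(Class::Noun) => tokens.push(Token::Noun(lower)),
            Some(Class::Verb) => tokens.push(Token::Verb(lower)),
            None => {} // this sketch silently skips unknown words
        }
        if has_period {
            tokens.push(Token::Period);
        }
    }
    tokens
}
```

In this sketch each token keeps the lowercased surface form ("sleeps"), not the stem, which matches the example output.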

Structs§

Lexer
LineLexer
Stage 1 lexer: handles only lines, indentation, and structural tokens. Treats all other text as opaque Content for the Stage 2 word lexer (Lexer).

Enums§

LexerMode
LineToken
Tokens emitted by the LineLexer (Stage 1): the structural tokens Indent, Dedent, and Newline, plus opaque content that is passed to Stage 2 for word classification.