RFC 005 - OLiA Mapper Extraction And Graph-Based Intent Inference Completion
Status
- Status: requested
- Scope: completion of the OLiA architecture in the ontobdc run intent-resolution flow
- Primary surface: resolution_to_parsed, resolution_to_validated, RDF parsed-intent semantics, and graph-based intent inference
Purpose
This RFC proposes the completion of the current OLiA integration already present in the runtime.
The goal is to move from a partially embedded OLiA usage model to a clearer architectural boundary with:
- a dedicated linguistic mapper
- a more explicit semantic contract for parsed intent items
- graph-based inference for intent classification during PARSED -> VALIDATED
This RFC builds directly on the implemented baseline documented in:
Context
The current runtime already uses OLiA in meaningful ways.
Today, the system already:
- maps selected spaCy POS and PronType=Int signals to OLiA URIs
- persists those URIs as rdf:type inside nested parsedIntent nodes in context.rdf
- restores those URIs during RDF round-trip
- uses OLiA-derived signals in resolution_to_validated to filter query and action capabilities
That baseline is valuable, but it remains incomplete because:
- the mapping logic is still embedded in ResolutionToParsedCapability
- the semantic contract is only partial
- capability filtering still depends on direct in-memory heuristics rather than graph-native inference
- the current design does not yet provide a reusable OLiA abstraction for future languages
Motivation
The project should avoid hardcoded interrogative word lists and language-specific textual shortcuts as more languages are added.
The current OLiA baseline already points in the right direction, but the architecture is not yet complete enough to serve as the universal linguistic layer envisioned by the design.
The missing pieces matter because they affect:
- extensibility across languages
- reuse of the mapping logic outside a single capability
- auditability of linguistic inference rules
- the ability to evolve from local heuristics to graph-native semantic queries
Proposal
Complete the OLiA integration in three coordinated steps:
- extract a dedicated OliaLinguisticMapper
- formalize an OLiA-first parsed-intent contract
- replace direct validation heuristics with graph-based intent inference
Proposed Step 1 - Extract OliaLinguisticMapper
The current inline MAPPER_OLIA and _resolve_olia_token() logic should be extracted into a dedicated component.
Suggested responsibility of OliaLinguisticMapper:
- receive a spaCy token
- inspect POS and morphological features
- resolve the best matching OLiA URI
- expose a stable, reusable contract for future parsers and languages
The first version may still remain spaCy-oriented, but the mapping logic should no longer be buried inside ResolutionToParsedCapability.
This extraction should make it easier to:
- expand mapping coverage
- test linguistic mapping independently
- reuse the mapper from multiple runtime surfaces
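The extraction described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the `TokenLike` and `SimpleToken` types, the `resolve` method name, and the contents of the rule tables are hypothetical, and the OLiA namespace and class names should be verified against the ontology version in use. A spaCy token would presumably be adapted to this shape (for example from `token.pos_` and `token.morph.to_dict()`), which keeps the mapper testable without loading a language model.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional, Protocol

# Assumption: OLiA reference-model namespace; verify against the ontology in use.
OLIA = "http://purl.org/olia/olia.owl#"


class TokenLike(Protocol):
    """Minimal token view (hypothetical); a spaCy token could be adapted to it."""

    pos: str
    morph: Mapping[str, str]


@dataclass(frozen=True)
class SimpleToken:
    """Plain stand-in for a parsed token, used here for illustration and tests."""

    pos: str
    morph: Mapping[str, str] = field(default_factory=dict)


class OliaLinguisticMapper:
    """Resolves a token's POS and morphological features to an OLiA class URI."""

    # Illustrative tables only; they do not reproduce the existing MAPPER_OLIA.
    # Morphology-specific rules are checked before plain POS rules.
    _MORPH_RULES = {
        ("PRON", "PronType", "Int"): OLIA + "InterrogativePronoun",
    }
    _POS_RULES = {
        "VERB": OLIA + "Verb",
        "NOUN": OLIA + "Noun",
        "PRON": OLIA + "Pronoun",
    }

    def resolve(self, token: TokenLike) -> Optional[str]:
        """Return the most specific matching OLiA URI, or None if unmapped."""
        for (pos, feature, value), uri in self._MORPH_RULES.items():
            if token.pos == pos and token.morph.get(feature) == value:
                return uri
        return self._POS_RULES.get(token.pos)
```

Because the mapper depends only on POS and morphological features, the same rule table applies to any language whose parser emits those features, with no interrogative word list involved.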
Proposed Step 2 - Formalize The Semantic Contract
The parsed intent model should evolve from a mixed textual structure with optional semantic typing into a clearer semantic contract.
This does not require deleting all raw textual fields immediately.
However, the decision-making contract should become OLiA-first.
That means:
- OLiA typing becomes the primary semantic signal for intent reasoning
- raw text remains auxiliary for debugging, traceability, and human inspection
- the RDF representation documents semantic class membership explicitly and consistently
The project may keep fields such as:
- itemText
- itemPos
- itemLemma
- itemHead
but runtime reasoning should progressively stop depending on those raw values when a semantic alternative is available.
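One way to picture an OLiA-first contract is an in-memory item whose reasoning helpers read only the semantic types, while the raw fields stay optional. The sketch below is illustrative: the `ParsedIntentItem` class, its snake_case field names, and the `is_interrogative` helper are hypothetical, and the OLiA namespace is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumption: OLiA reference-model namespace; verify against the ontology in use.
OLIA = "http://purl.org/olia/olia.owl#"


@dataclass(frozen=True)
class ParsedIntentItem:
    """Hypothetical in-memory view of one parsed-intent item."""

    # Primary semantic signal: OLiA class URIs (the item's rdf:type values).
    olia_types: FrozenSet[str]
    # Auxiliary raw fields, kept for debugging and traceability only.
    item_text: Optional[str] = None
    item_pos: Optional[str] = None
    item_lemma: Optional[str] = None
    item_head: Optional[str] = None

    def has_type(self, local_name: str) -> bool:
        """True if the item is typed with the given OLiA class."""
        return OLIA + local_name in self.olia_types


def is_interrogative(item: ParsedIntentItem) -> bool:
    # Decides on OLiA typing alone; item_text is never inspected.
    return item.has_type("InterrogativePronoun")
```

With this shape, an English "what" and a Spanish "qué" are treated identically as long as both carry the same OLiA type, which is the property the contract is meant to guarantee.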
Proposed Step 3 - Graph-Based Intent Inference
The current PARSED -> VALIDATED reasoning should move away from direct in-memory heuristics over token dictionaries.
The target is graph-based inference over the parsed-intent structure already materialized in RDF.
The first completion version should support reasoning such as:
- detect whether a parsed item is typed as olia:InterrogativePronoun
- detect whether that item is linked to the root structure in a query-relevant way
- infer whether the intent is query-oriented or action-oriented using semantic graph patterns
The query mechanism may be implemented through:
- SPARQL over the loaded graph
- or an equivalent graph-query abstraction with the same semantic effect
The important contract is not the syntax itself.
The important contract is that intent classification becomes graph-native and ontology-oriented instead of ad hoc dictionary inspection.
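To make the idea of graph-native classification concrete, the sketch below uses a small triple-pattern matcher as the "equivalent graph-query abstraction"; a SPARQL query over the loaded graph would express the same pattern. Everything project-specific here is assumed for illustration: the `urn:example:ontobdc#` namespace, the `headNode` predicate linking an item to its head item, and the classification rule itself (an interrogative pronoun whose head is a verb means "query"; otherwise any verb means "action"). The OLiA namespace is also an assumption.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OLIA = "http://purl.org/olia/olia.owl#"  # assumed OLiA namespace
EX = "urn:example:ontobdc#"              # hypothetical project namespace
HEAD_NODE = EX + "headNode"              # hypothetical predicate: item -> head item


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in graph:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def infer_intent_class(graph: Set[Triple]) -> str:
    """Classify the parsed intent as 'query', 'action', or 'unknown'."""
    verbs = {s for s, _, _ in match(graph, p=RDF_TYPE, o=OLIA + "Verb")}
    # Query pattern: an interrogative pronoun attached to a verb node.
    for item, _, _ in match(graph, p=RDF_TYPE, o=OLIA + "InterrogativePronoun"):
        if any(head in verbs for _, _, head in match(graph, s=item, p=HEAD_NODE)):
            return "query"
    # Illustrative fallback: a verb without an interrogative dependent.
    return "action" if verbs else "unknown"
```

The result depends only on which triples are present, not on iteration order, so the inference stays deterministic and can be audited by inspecting the same triples in context.rdf.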
Proposed Runtime Boundary
The intended runtime shape becomes:
- raw user input
- spaCy parsing
- OliaLinguisticMapper assigns semantic classes
- parsed intent is materialized in RDF with explicit OLiA typing
- the PARSED -> VALIDATED layer infers query versus action semantics through graph-native reasoning
- capability filtering consumes that inference result
This RFC does not propose moving that reasoning into ContextIntentEvaluatorAdapter as it currently exists.
That adapter is presently a lifecycle-state detector.
The target boundary is the logic that decides candidate capability class during the PARSED -> VALIDATED transition, whether that remains in resolution_to_validated or is delegated to a dedicated query service.
Constraints
The completed OLiA architecture should:
- remain deterministic
- remain local and auditable
- avoid language-specific hardcoded word lists as the primary decision mechanism
- keep graph semantics inspectable in context.rdf
- remain compatible with the current nested RDF materialization model
- be testable independently at mapper level and transition-inference level
It should not:
- introduce opaque statistical inference as the primary linguistic decision layer
- bypass the deterministic capability-selection flow
- depend on remote linguistic services
- turn raw text into the main source of truth when OLiA typing is available
Expected Impact
If implemented, this RFC would likely affect:
- src/ontobdc/run/plugin/capability/resolution_to_parsed.py
- src/ontobdc/run/plugin/capability/resolution_to_validated.py
- src/ontobdc/run/adapter/machine.py
- a new dedicated OLiA mapping module or adapter
Likely new or updated tests:
- mapper-level tests for POS and morphology to OLiA URI resolution
- RDF round-trip tests for expanded OLiA coverage
- validation tests proving graph-based query versus action inference
- language-extensibility tests that avoid hardcoded interrogative word lists
Likely documentation impact:
- SPEC009_olia_semantic_intent_typing.md
- docs/documentation/spec/SPEC006_run_cli_context_resolution.md
- future test documentation for parser and validator coverage
Correlation With SPEC009
SPEC009 documents what already exists.
This RFC proposes what should be added or refactored next.
The main correlation is:
- SPEC009: inline OLiA mapping in ResolutionToParsedCapability
  - RFC005: extract that logic into OliaLinguisticMapper
- SPEC009: parsed items carry both raw text and OLiA uri
  - RFC005: keep both if needed, but make OLiA the primary reasoning contract
- SPEC009: capability filtering uses direct URI and head-text heuristics
  - RFC005: move to graph-native semantic inference
- SPEC009: ContextIntentEvaluatorAdapter only detects materialized runtime state
  - RFC005: keep state evaluation separate and evolve the PARSED -> VALIDATED reasoning boundary instead
The intended migration rule is:
- treat SPEC009 as the baseline that must remain true until code changes land
- update SPEC009 after each accepted part of this RFC becomes current behavior
Open Questions
- Should graph-native reasoning be implemented directly with SPARQL, or wrapped behind a query adapter?
- What is the minimum OLiA class coverage required before the mapper is considered reusable across languages?
- Should dependency heads remain textual, or should they also evolve toward semantic node references?
- Should the inferred intent class be materialized explicitly in context.rdf, or remain derived at runtime?
- Which additional OLiA classes are most valuable after interrogative pronouns and verbs?
Follow-Up
If accepted, the next step should be to define:
- the interface of OliaLinguisticMapper
- the first expanded mapping table and test corpus
- the graph query contract for query-versus-action inference
- the migration plan from direct heuristics to graph-native reasoning
- the exact RDF invariants that must hold after the refactor