org.ek9lang.assist.QuestionMatcher

public class QuestionMatcher extends Object

BM25F-based fuzzy matcher for Q&A questions. Uses field-weighted BM25 scoring across five zones (keywords, question text, alternate phrasings, answer text, migration context) with Levenshtein-based fuzzy matching integrated as fractional term frequency. IDF (inverse document frequency) ensures rare, discriminating terms contribute more to the score than common ones.
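To illustrate why IDF lets rare terms outweigh common ones, here is a minimal sketch using the smoothed IDF formula commonly paired with BM25. The class name `IdfSketch`, its methods, and the choice of formula are assumptions made for illustration; the page does not state which IDF variant `QuestionMatcher` uses.

```java
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not taken from the QuestionMatcher source. */
final class IdfSketch {

  /** Smoothed BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)). */
  static double idf(int totalDocs, int docsContainingTerm) {
    return Math.log(1.0 + (totalDocs - docsContainingTerm + 0.5) / (docsContainingTerm + 0.5));
  }

  /** Counts the documents (modelled here as token sets) that contain the term. */
  static int documentFrequency(String term, List<Set<String>> docs) {
    int count = 0;
    for (Set<String> doc : docs) {
      if (doc.contains(term)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Set<String>> docs = List.of(
        Set.of("dynamic", "class"),
        Set.of("class", "naming"),
        Set.of("class", "trait"));
    for (String term : List.of("class", "dynamic")) {
      int df = documentFrequency(term, docs);
      System.out.println(term + " df=" + df + " idf=" + idf(docs.size(), df));
    }
  }
}
```

In this toy corpus "class" appears in every document and "dynamic" in only one, so "dynamic" receives the larger IDF and contributes more to a match score.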

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final record | QuestionMatcher.ScoredQuestion | A question paired with its BM25F match score and question-text relevance. |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| QuestionMatcher() | |
| QuestionMatcher(ConceptRegistry conceptRegistry) | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| (package private) static double | applyRelevanceBoost(double score, int relevance, int inputTokenCount) | Boost BM25F score based on how many original query tokens appear in the question title. |
| (package private) double | bm25fScore(Map<String,Double> weightedTokens, QuestionAndAnswer candidate, Map<String,Double> idfMap, double[] avgFieldLengths) | Compute BM25F score for a query against a candidate question. |
| List<QuestionAndAnswer> | findBestMatches(String input, QuestionRegistry registry) | Find the best matching questions using default max results (3). |
| List<QuestionAndAnswer> | findBestMatches(String input, QuestionRegistry registry, int maxResults) | Find the best matching questions for the given input text. |
| List<QuestionAndAnswer> | findByCategory(String input, QuestionRegistry registry) | Find a category whose name fuzzy-matches the input. |
| List<QuestionMatcher.ScoredQuestion> | findScoredMatches(String input, QuestionRegistry registry, int maxResults) | Find best matches with scores, so callers can distinguish best from also-relevant. |
| (package private) static double | fuzzyScore(int levenshteinCost) | |
| (package private) int | questionRelevance(Set<String> inputTokens, QuestionAndAnswer candidate) | Count how many input tokens match words in the canonical question text. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- QuestionMatcher
  
  public QuestionMatcher()
- QuestionMatcher
  
  public QuestionMatcher(ConceptRegistry conceptRegistry)
Method Details
- findBestMatches
  
  public List<QuestionAndAnswer> findBestMatches(String input, QuestionRegistry registry, int maxResults)
  
  Find the best matching questions for the given input text. Uses BM25F scoring with field weights, IDF, and concept-based query expansion.
- findBestMatches
  
  public List<QuestionAndAnswer> findBestMatches(String input, QuestionRegistry registry)
  
  Find the best matching questions using default max results (3).
- findScoredMatches
  
  public List<QuestionMatcher.ScoredQuestion> findScoredMatches(String input, QuestionRegistry registry, int maxResults)
  
  Find best matches with scores, so callers can distinguish best from also-relevant. Higher score means better match (BM25F convention).
- applyRelevanceBoost
  
  static double applyRelevanceBoost(double score, int relevance, int inputTokenCount)
  
  Boost BM25F score based on how many original query tokens appear in the question title. Prevents concept expansion from drowning specific matches under broad generic Q&As. For example, the query "dynamic class" matched against the title "How do dynamic classes work?" is boosted over "Naming conventions for types", which matches only via concept expansion.
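The boost described above can be pictured as a multiplier that grows with the fraction of original query tokens found in the title. The following is a hedged sketch, not the actual formula: the class `RelevanceBoostSketch`, the constant `MAX_BOOST`, and the linear scaling are all hypothetical choices made for illustration.

```java
/** Illustrative sketch only; the real boost formula is not documented on this page. */
final class RelevanceBoostSketch {

  // Hypothetical cap: a title containing every query token doubles the score.
  private static final double MAX_BOOST = 1.0;

  /**
   * Scales the score by (1 + MAX_BOOST * relevance / inputTokenCount).
   * Returns the score unchanged when there is nothing to boost on.
   */
  static double boost(double score, int relevance, int inputTokenCount) {
    if (inputTokenCount <= 0 || relevance <= 0) {
      return score;
    }
    double fraction = Math.min(1.0, (double) relevance / inputTokenCount);
    return score * (1.0 + MAX_BOOST * fraction);
  }

  public static void main(String[] args) {
    // Both query tokens appear in the title.
    System.out.println(boost(2.0, 2, 2));
    // Matched only through concept expansion, so no title tokens.
    System.out.println(boost(2.0, 0, 2));
  }
}
```

Under these assumptions a candidate reached only through concept expansion keeps its raw score, while a candidate whose title contains the literal query terms is lifted above it.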
- bm25fScore
  
  double bm25fScore(Map<String,Double> weightedTokens, QuestionAndAnswer candidate, Map<String,Double> idfMap, double[] avgFieldLengths)
  
  Compute BM25F score for a query against a candidate question. Higher score means better match. Scores each query token across five weighted fields (keywords, question text, alternates, answer text, migration context), combines with IDF weighting, applies BM25 term frequency saturation, and scales by concept ring weight.
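The steps listed above (per-field weighting, IDF, term frequency saturation, and a per-token weight) follow the general shape of BM25F. The sketch below shows that general shape only. The class `Bm25fSketch`, its parameter layout, and the constants `K1` and `B` are assumptions; the field weights and parameters actually used by `bm25fScore` are not given on this page.

```java
import java.util.Map;

/** Illustrative BM25F sketch; not the QuestionMatcher implementation. */
final class Bm25fSketch {

  static final double K1 = 1.2;  // assumed saturation parameter
  static final double B = 0.75;  // assumed length-normalisation parameter

  /**
   * @param tokenWeights query token to weight (for example a lower weight for expanded concepts)
   * @param fieldTf query token to per-field term frequency (may be fractional for fuzzy hits)
   * @param idf query token to inverse document frequency
   * @param fieldWeights weight of each field
   * @param fieldLengths token count of each field in the candidate
   * @param avgFieldLengths average token count of each field across all candidates
   */
  static double score(Map<String, Double> tokenWeights,
                      Map<String, double[]> fieldTf,
                      Map<String, Double> idf,
                      double[] fieldWeights,
                      double[] fieldLengths,
                      double[] avgFieldLengths) {
    double total = 0.0;
    for (Map.Entry<String, Double> entry : tokenWeights.entrySet()) {
      double[] tf = fieldTf.get(entry.getKey());
      if (tf == null) {
        continue; // token does not occur in this candidate
      }
      // Combine length-normalised, weighted term frequencies across fields.
      double pseudoTf = 0.0;
      for (int f = 0; f < fieldWeights.length; f++) {
        double avg = avgFieldLengths[f] > 0.0 ? avgFieldLengths[f] : 1.0;
        double lengthNorm = 1.0 - B + B * (fieldLengths[f] / avg);
        pseudoTf += fieldWeights[f] * tf[f] / lengthNorm;
      }
      // Saturate once, after combining fields, then apply IDF and token weight.
      double saturated = pseudoTf / (K1 + pseudoTf);
      total += entry.getValue() * idf.getOrDefault(entry.getKey(), 0.0) * saturated;
    }
    return total;
  }

  public static void main(String[] args) {
    double[] weights = {3.0, 2.0, 1.5, 1.0, 0.5}; // hypothetical weights for the five fields
    double[] lengths = {4, 6, 6, 40, 10};
    Map<String, Double> query = Map.of("dynamic", 1.0);
    Map<String, Double> idf = Map.of("dynamic", 1.0);
    System.out.println(score(query, Map.of("dynamic", new double[] {1, 0, 0, 0, 0}),
        idf, weights, lengths, lengths));
  }
}
```

Saturating after the fields are combined is what distinguishes BM25F from running BM25 separately on each field and summing: a term repeated in a heavily weighted field cannot grow the score without bound.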
- questionRelevance
  
  int questionRelevance(Set<String> inputTokens, QuestionAndAnswer candidate)
  
  Count how many input tokens match words in the canonical question text. Used as a tiebreaker: questions whose title directly mentions the query terms are more relevant than questions where those terms appear only as secondary keywords.
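A minimal sketch of this kind of count is shown below. It assumes exact, case-insensitive word matching against a plain `String` title; the class `QuestionRelevanceSketch` is hypothetical, and the real method takes a `QuestionAndAnswer` and may match more loosely (for example with stemming or fuzzy matching).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; exact word matching is an assumption. */
final class QuestionRelevanceSketch {

  /** Counts the input tokens that occur as whole words in the question text. */
  static int relevance(Set<String> inputTokens, String questionText) {
    // Lower-case the title and split on anything that is not a letter or digit.
    Set<String> words = new HashSet<>(Arrays.asList(
        questionText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
    int count = 0;
    for (String token : inputTokens) {
      if (words.contains(token.toLowerCase(Locale.ROOT))) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String title = "How do dynamic classes work?";
    System.out.println(relevance(Set.of("dynamic", "classes"), title));
    System.out.println(relevance(Set.of("naming"), title));
  }
}
```

Two candidates with similar BM25F scores can then be ordered by this count, so the one whose title mentions the query terms directly comes first.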
- findByCategory
  
  public List<QuestionAndAnswer> findByCategory(String input, QuestionRegistry registry)
  
  Find a category whose name fuzzy-matches the input. Returns the matching category's questions, or empty list if no match.
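The contract above (questions of the fuzzy-matched category, otherwise an empty list) can be sketched with a plain Levenshtein distance and a cut-off. Everything here is hypothetical: the class `CategoryMatchSketch`, the `MAX_DISTANCE` threshold, and the use of a `Map` in place of `QuestionRegistry` are stand-ins chosen so the example is self-contained.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; the registry is modelled as a simple map. */
final class CategoryMatchSketch {

  static final int MAX_DISTANCE = 2; // hypothetical tolerance for typos

  /** Classic two-row Levenshtein edit distance. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Returns the questions of the closest category, or an empty list if none is close enough. */
  static List<String> findByCategory(String input, Map<String, List<String>> categories) {
    String needle = input.trim().toLowerCase(Locale.ROOT);
    String best = null;
    int bestDistance = Integer.MAX_VALUE;
    for (String name : categories.keySet()) {
      int distance = levenshtein(needle, name.toLowerCase(Locale.ROOT));
      if (distance < bestDistance) {
        bestDistance = distance;
        best = name;
      }
    }
    return best != null && bestDistance <= MAX_DISTANCE ? categories.get(best) : List.of();
  }

  public static void main(String[] args) {
    Map<String, List<String>> categories = Map.of(
        "collections", List.of("q1", "q2"),
        "generics", List.of("q3"));
    System.out.println(findByCategory("colections", categories));
    System.out.println(findByCategory("zzzzzz", categories));
  }
}
```

Returning an empty list rather than `null` lets callers fall back to a full-text search without a null check.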
- fuzzyScore
  
  static double fuzzyScore(int levenshteinCost)
