Class QuestionMatcher
java.lang.Object
org.ek9lang.assist.QuestionMatcher
BM25F-based fuzzy matcher for Q&A questions.
Uses field-weighted BM25 scoring across five zones (keywords, question text,
alternate phrasings, answer text, migration context) with Levenshtein-based
fuzzy matching integrated as fractional term frequency.
IDF (inverse document frequency) ensures rare, discriminating terms
contribute more to the score than common ones.
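
The combination of fuzzy matching and IDF described above can be illustrated with a minimal sketch. The mapping from edit distance to a fractional hit (`fractionalHit`), the class name, and the particular IDF formula are assumptions made for illustration; the constants actually used by this class are not documented here.

```java
import java.util.List;

/** Hypothetical sketch: Levenshtein-based fractional term frequency plus a BM25-style IDF. */
final class FuzzyTermFrequencySketch {

  /** Classic dynamic-programming Levenshtein edit distance, using two rolling rows. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        curr[j] = Math.min(substitution, Math.min(prev[j] + 1, curr[j - 1] + 1));
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Assumed mapping: an exact match counts 1.0, one edit counts 0.5, anything else 0. */
  static double fractionalHit(int levenshteinCost) {
    if (levenshteinCost == 0) {
      return 1.0;
    }
    return levenshteinCost == 1 ? 0.5 : 0.0;
  }

  /** Sums fractional hits over a field, so near-misses contribute partial term frequency. */
  static double fuzzyTermFrequency(String queryToken, List<String> fieldTokens) {
    double tf = 0.0;
    for (String token : fieldTokens) {
      tf += fractionalHit(levenshtein(queryToken, token));
    }
    return tf;
  }

  /** One common BM25 IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)). */
  static double idf(int totalDocs, int docsContaining) {
    return Math.log(1.0 + (totalDocs - docsContaining + 0.5) / (docsContaining + 0.5));
  }
}
```

In this sketch a misspelled field token such as "clas" still contributes half a hit for the query token "class", and a term found in 1 of 100 documents receives a larger IDF than one found in 50 of them.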
Nested Class Summary
Nested Classes
static final record QuestionMatcher.ScoredQuestion
    A question paired with its BM25F match score and question-text relevance.
Constructor Summary
Constructors
QuestionMatcher()
Method Summary
(package private) static double applyRelevanceBoost(double score, int relevance, int inputTokenCount)
    Boost BM25F score based on how many original query tokens appear in the question title.
(package private) double bm25fScore(Map<String, Double> weightedTokens, QuestionAndAnswer candidate, Map<String, Double> idfMap, double[] avgFieldLengths)
    Compute BM25F score for a query against a candidate question.
findBestMatches(String input, QuestionRegistry registry)
    Find the best matching questions using default max results (3).
findBestMatches(String input, QuestionRegistry registry, int maxResults)
    Find the best matching questions for the given input text.
findByCategory(String input, QuestionRegistry registry)
    Find a category whose name fuzzy-matches the input.
findScoredMatches(String input, QuestionRegistry registry, int maxResults)
    Find best matches with scores, so callers can distinguish best from also-relevant.
(package private) static double fuzzyScore(int levenshteinCost)
(package private) int questionRelevance(Set<String> inputTokens, QuestionAndAnswer candidate)
    Count how many input tokens match words in the canonical question text.
Constructor Details
QuestionMatcher
public QuestionMatcher()
Method Details
findBestMatches
public List<QuestionAndAnswer> findBestMatches(String input, QuestionRegistry registry, int maxResults)
Find the best matching questions for the given input text. Uses BM25F scoring with field weights, IDF, and concept-based query expansion.
findBestMatches
Find the best matching questions using default max results (3).
findScoredMatches
public List<QuestionMatcher.ScoredQuestion> findScoredMatches(String input, QuestionRegistry registry, int maxResults)
Find best matches with scores, so callers can distinguish best from also-relevant. Higher score means better match (BM25F convention).
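
One way a caller might separate the best match from the also-relevant ones is sketched below. The `Scored` record and the ratio cutoff are hypothetical stand-ins: the accessor names of the real `QuestionMatcher.ScoredQuestion` are not shown in this page, so the sketch does not use it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splitting a ranked result list into best and also-relevant. */
final class ScoredMatchSketch {

  /** Stand-in for a scored result; not the real ScoredQuestion record. */
  record Scored(String question, double score) {}

  /**
   * Returns the entries after the first whose score is at least
   * {@code ratio} times the top score. Assumes {@code ranked} is sorted
   * in descending score order (higher score means better match).
   */
  static List<Scored> alsoRelevant(List<Scored> ranked, double ratio) {
    List<Scored> result = new ArrayList<>();
    if (ranked.isEmpty()) {
      return result;
    }
    double cutoff = ranked.get(0).score() * ratio;
    for (Scored candidate : ranked.subList(1, ranked.size())) {
      if (candidate.score() >= cutoff) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

A relative cutoff is used here because BM25F scores are unbounded, so an absolute threshold would not transfer well between queries.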
applyRelevanceBoost
static double applyRelevanceBoost(double score, int relevance, int inputTokenCount)
Boost the BM25F score according to how many of the original query tokens appear in the question title. This prevents concept expansion from burying specific matches under broad, generic Q&As. For example, the query "dynamic class" matching the title "How do dynamic classes work?" is boosted over "Naming conventions for types", which matches only via concept expansion.
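
The boost formula itself is not given in this page. The following is a minimal sketch of one plausible shape, assuming the boost is proportional to the fraction of query tokens found in the title; the `BOOST` constant and the class name are hypothetical.

```java
/** Hypothetical sketch of a title-coverage boost; not the documented formula. */
final class RelevanceBoostSketch {

  /** Assumed maximum boost: a fully covered query gains 50%. */
  static final double BOOST = 0.5;

  static double boost(double score, int relevance, int inputTokenCount) {
    if (inputTokenCount <= 0 || relevance <= 0) {
      return score; // nothing matched in the title, leave the score unchanged
    }
    // Fraction of original query tokens present in the title, capped at 1.
    double coverage = Math.min(1.0, (double) relevance / inputTokenCount);
    return score * (1.0 + BOOST * coverage);
  }
}
```

Under these assumptions, a candidate whose title contains both tokens of a two-token query goes from 4.0 to 6.0, while a candidate reached only through concept expansion (relevance 0) stays at 4.0.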
bm25fScore
double bm25fScore(Map<String, Double> weightedTokens, QuestionAndAnswer candidate, Map<String, Double> idfMap, double[] avgFieldLengths)
Compute BM25F score for a query against a candidate question. Higher score means better match. Scores each query token across five weighted fields (keywords, question text, alternates, answer text, migration context), combines with IDF weighting, applies BM25 term frequency saturation, and scales by concept ring weight.
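
The scoring steps can be sketched as follows. This is an illustration of the general BM25F technique, not this class's implementation: the field weights, `K1`, and `B` are assumed values, the candidate is represented as five plain token lists rather than a `QuestionAndAnswer`, and exact token counts are used where the real matcher integrates fuzzy fractional frequencies.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical BM25F sketch over five fields; all constants are assumptions. */
final class Bm25fSketch {

  static final double K1 = 1.2;  // term-frequency saturation
  static final double B = 0.75;  // length-normalisation strength

  /** Assumed weights, in order: keywords, question, alternates, answer, migration. */
  static final double[] FIELD_WEIGHTS = {3.0, 2.5, 2.0, 1.0, 1.0};

  static double score(Map<String, Double> weightedTokens,
                      List<List<String>> fields,
                      Map<String, Double> idfMap,
                      double[] avgFieldLengths) {
    double total = 0.0;
    for (Map.Entry<String, Double> entry : weightedTokens.entrySet()) {
      String token = entry.getKey();
      double ringWeight = entry.getValue(); // concept ring weight of this query token

      // Combine per-field frequencies into one weighted pseudo-frequency.
      double pseudoTf = 0.0;
      for (int f = 0; f < fields.size(); f++) {
        List<String> tokens = fields.get(f);
        int tf = Collections.frequency(tokens, token);
        double norm = avgFieldLengths[f] > 0
            ? 1.0 - B + B * tokens.size() / avgFieldLengths[f]
            : 1.0;
        pseudoTf += FIELD_WEIGHTS[f] * tf / norm;
      }

      // Saturate once, after combining fields, then apply IDF and ring weight.
      double idf = idfMap.getOrDefault(token, 0.0);
      total += ringWeight * idf * pseudoTf / (K1 + pseudoTf);
    }
    return total;
  }
}
```

Saturating the combined pseudo-frequency once, rather than saturating each field separately, is what distinguishes BM25F from summing independent per-field BM25 scores.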
questionRelevance
Count how many input tokens match words in the canonical question text. Used as a tiebreaker: questions whose title directly mentions the query terms are more relevant than questions where those terms appear only as secondary keywords.
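
A minimal sketch of such a count is shown below. It takes the question text as a plain `String` instead of a `QuestionAndAnswer`, assumes the input tokens are already lower-cased, and uses a crude prefix rule so that "class" matches "classes"; how the real method matches words is not documented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: counting query tokens that appear in a question title. */
final class QuestionRelevanceSketch {

  /** Lower-cases the text and splits it on anything that is not a letter or digit. */
  static List<String> tokenize(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
        .filter(word -> !word.isEmpty())
        .collect(Collectors.toList());
  }

  /** Counts input tokens that are a prefix of (or equal to) some title word. */
  static int relevance(Set<String> inputTokens, String questionText) {
    List<String> words = tokenize(questionText);
    int count = 0;
    for (String token : inputTokens) {
      // Assumed matching rule: prefix match as a rough stand-in for plural handling.
      if (!token.isEmpty() && words.stream().anyMatch(word -> word.startsWith(token))) {
        count++;
      }
    }
    return count;
  }
}
```

With this rule, the tokens "dynamic" and "class" both match the title "How do dynamic classes work?", while neither matches "Naming conventions for types".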
findByCategory
Find a category whose name fuzzy-matches the input. Returns the matching category's questions, or an empty list if no category matches.
fuzzyScore
static double fuzzyScore(int levenshteinCost)