Class QuestionMatcher

java.lang.Object
org.ek9lang.assist.QuestionMatcher

public class QuestionMatcher extends Object
BM25F-based fuzzy matcher for Q&A questions. Uses field-weighted BM25 scoring across five zones (keywords, question text, alternate phrasings, answer text, migration context) with Levenshtein-based fuzzy matching integrated as fractional term frequency. IDF (inverse document frequency) ensures rare, discriminating terms contribute more to the score than common ones.
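
The idea of fuzzy matching as fractional term frequency can be sketched as follows. This is a minimal illustration, not the QuestionMatcher implementation: the class `FuzzyTermFrequencySketch`, the method names, and the mapping from edit distance to a fractional value (halving per edit, cut off after two edits) are all assumptions made for the example.

```java
final class FuzzyTermFrequencySketch {

  // Classic two-row dynamic-programming Levenshtein edit distance.
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        curr[j] = Math.min(substitution, Math.min(prev[j] + 1, curr[j - 1] + 1));
      }
      int[] swap = prev;
      prev = curr;
      curr = swap;
    }
    return prev[b.length()];
  }

  // Assumed mapping: exact match counts as 1.0, one edit as 0.5,
  // two edits as 0.25, anything further as no occurrence at all.
  static double fractionalTf(String queryToken, String fieldToken) {
    int distance = levenshtein(queryToken, fieldToken);
    return distance <= 2 ? 1.0 / (1 << distance) : 0.0;
  }
}
```

With a mapping like this, a near miss such as "clas" still contributes to the term frequency of "class", but less than an exact occurrence would.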
  • Constructor Details

    • QuestionMatcher

      public QuestionMatcher()
    • QuestionMatcher

      public QuestionMatcher(ConceptRegistry conceptRegistry)
  • Method Details

    • findBestMatches

      public List<QuestionAndAnswer> findBestMatches(String input, QuestionRegistry registry, int maxResults)
      Find the best matching questions for the given input text. Uses BM25F scoring with field weights, IDF, and concept-based query expansion.
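
The IDF component mentioned above can be illustrated with a widely used BM25 formulation, ln(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents and n the number containing the term. This is a sketch under that assumption; the class `IdfSketch` and the choice of formula are not taken from QuestionMatcher.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IdfSketch {

  // Assumed BM25-style IDF; always positive, larger for rarer terms.
  static double idf(int docsContainingTerm, int totalDocs) {
    return Math.log(1.0 + (totalDocs - docsContainingTerm + 0.5) / (docsContainingTerm + 0.5));
  }

  // Builds a term -> IDF map from a corpus where each document is a set of tokens.
  static Map<String, Double> idfMap(List<Set<String>> docs) {
    Map<String, Integer> docFreq = new HashMap<>();
    for (Set<String> doc : docs) {
      for (String term : doc) {
        docFreq.merge(term, 1, Integer::sum);
      }
    }
    Map<String, Double> result = new HashMap<>();
    docFreq.forEach((term, df) -> result.put(term, idf(df, docs.size())));
    return result;
  }

  public static void main(String[] args) {
    List<Set<String>> docs = List.of(
        Set.of("dynamic", "class"),
        Set.of("class", "naming"),
        Set.of("class", "trait"));
    // "dynamic" occurs in one document and "class" in all three,
    // so "dynamic" receives the higher IDF.
    System.out.println(idfMap(docs));
  }
}
```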
    • findBestMatches

      public List<QuestionAndAnswer> findBestMatches(String input, QuestionRegistry registry)
      Find the best matching questions, returning at most the default maximum of 3 results.
    • findScoredMatches

      public List<QuestionMatcher.ScoredQuestion> findScoredMatches(String input, QuestionRegistry registry, int maxResults)
      Find the best matches together with their scores, so callers can distinguish the best match from merely relevant ones. A higher score means a better match (BM25F convention).
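
The "higher is better" ordering can be sketched as below. The record `Scored` is a hypothetical stand-in for QuestionMatcher.ScoredQuestion, whose actual accessors are not shown in this page, and `topMatches` is an illustrative helper rather than part of the API.

```java
import java.util.Comparator;
import java.util.List;

final class ScoredMatchSketch {

  // Hypothetical stand-in for QuestionMatcher.ScoredQuestion.
  record Scored(String question, double score) {}

  // Returns up to maxResults entries, highest score first.
  static List<Scored> topMatches(List<Scored> candidates, int maxResults) {
    return candidates.stream()
        .sorted(Comparator.comparingDouble(Scored::score).reversed())
        .limit(maxResults)
        .toList();
  }
}
```

A caller could then compare the first and second scores to decide whether the top result is a clear winner or one of several similar candidates.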
    • applyRelevanceBoost

      static double applyRelevanceBoost(double score, int relevance, int inputTokenCount)
      Boost the BM25F score based on how many of the original query tokens appear in the question title. This prevents concept expansion from drowning specific matches under broad, generic Q&As. For example, the query "dynamic class" matching the title "How do dynamic classes work?" is boosted above "Naming conventions for types", which matches only via concept expansion.
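
One way such a boost might work is to scale the score by the fraction of query tokens found in the title. The formula below is an assumption chosen for illustration (the real boost function is not documented here), and `RelevanceBoostSketch` is a hypothetical class.

```java
final class RelevanceBoostSketch {

  // Assumed boost: multiply the score by (1 + fraction of query tokens in the title),
  // so a full title match doubles the score and no title match leaves it unchanged.
  static double boost(double score, int relevance, int inputTokenCount) {
    if (inputTokenCount <= 0) {
      return score;
    }
    double fraction = Math.min(1.0, (double) relevance / inputTokenCount);
    return score * (1.0 + fraction);
  }
}
```

Under this assumption, a candidate with base score 2.0 whose title contains both query tokens ends at 4.0 and overtakes a candidate with base score 3.0 that matched only through expanded concepts.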
    • bm25fScore

      double bm25fScore(Map<String,Double> weightedTokens, QuestionAndAnswer candidate, Map<String,Double> idfMap, double[] avgFieldLengths)
      Compute BM25F score for a query against a candidate question. Higher score means better match. Scores each query token across five weighted fields (keywords, question text, alternates, answer text, migration context), combines with IDF weighting, applies BM25 term frequency saturation, and scales by concept ring weight.
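
The scoring steps described above can be sketched in a simplified form. This is not the actual implementation: `Bm25fSketch`, the constants K1 and B, the representation of fields as token lists, and the explicit `fieldWeights` parameter are assumptions, and the sketch uses exact token matching only.

```java
import java.util.List;
import java.util.Map;

final class Bm25fSketch {

  static final double K1 = 1.2;  // assumed saturation constant
  static final double B = 0.75;  // assumed length-normalisation strength

  // fields.get(f) holds the tokens of field f; fieldWeights[f] is that field's weight.
  static double score(Map<String, Double> weightedTokens,
                      List<List<String>> fields,
                      double[] fieldWeights,
                      Map<String, Double> idfMap,
                      double[] avgFieldLengths) {
    double total = 0.0;
    for (Map.Entry<String, Double> query : weightedTokens.entrySet()) {
      // Combine per-field term frequencies into one weighted pseudo-frequency.
      double pseudoTf = 0.0;
      for (int f = 0; f < fields.size(); f++) {
        List<String> tokens = fields.get(f);
        long tf = tokens.stream().filter(query.getKey()::equals).count();
        if (tf == 0) {
          continue;
        }
        double avg = avgFieldLengths[f] > 0 ? avgFieldLengths[f] : 1.0;
        double lengthNorm = 1.0 - B + B * tokens.size() / avg;
        pseudoTf += fieldWeights[f] * tf / lengthNorm;
      }
      // Saturate the frequency, then scale by IDF and the query token's own weight.
      double idf = idfMap.getOrDefault(query.getKey(), 0.0);
      total += query.getValue() * idf * pseudoTf / (K1 + pseudoTf);
    }
    return total;
  }
}
```

Because the field weight is applied before saturation, a hit in a heavily weighted field (such as keywords) raises the score more than the same hit in a lightly weighted one, while repeated hits yield diminishing returns.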
    • questionRelevance

      int questionRelevance(Set<String> inputTokens, QuestionAndAnswer candidate)
      Count how many input tokens match words in the canonical question text. Used as a tiebreaker: questions whose title directly mentions the query terms are more relevant than questions where those terms appear only as secondary keywords.
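
A simple version of this count might look like the following. The tokenisation rule (lowercase, split on non-alphanumeric characters) and the use of exact word equality are assumptions; the real method takes a QuestionAndAnswer and may match more loosely. `QuestionRelevanceSketch` is a hypothetical class.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class QuestionRelevanceSketch {

  // Assumed tokenisation: lowercase, split on anything that is not a letter or digit.
  static Set<String> tokenize(String text) {
    return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"))
        .filter(word -> !word.isEmpty())
        .collect(Collectors.toSet());
  }

  // Counts the input tokens that occur verbatim as words in the question text.
  static int relevance(Set<String> inputTokens, String questionText) {
    Set<String> words = tokenize(questionText);
    return (int) inputTokens.stream().filter(words::contains).count();
  }
}
```

With exact matching, the tokens "dynamic" and "class" score 1 against "How do dynamic classes work?" because "classes" differs from "class"; a fuzzy or stemmed comparison would be needed to count both.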
    • findByCategory

      public List<QuestionAndAnswer> findByCategory(String input, QuestionRegistry registry)
      Find a category whose name fuzzy-matches the input. Returns the matching category's questions, or an empty list if there is no match.
    • fuzzyScore

      static double fuzzyScore(int levenshteinCost)