Overview of search modes
SoundMinds.ai offers three search modes for matching customer questions with answers in your knowledge base:
- Semantic Search: Understands the meaning behind questions
- BM25 Search: Matches specific keywords and phrases
- Hybrid Search: Combines both for optimal results
By default, your chatbot uses hybrid search with a 70/30 weighting (70% semantic, 30% BM25) that balances semantic understanding with keyword precision.
Tip: The hybrid approach works well for most use cases. Understanding how each mode works helps you optimize your knowledge base content.
Semantic search explained
Semantic search understands the meaning of questions, not just the words used. It converts questions and answers into mathematical representations called embeddings that capture semantic relationships.
How semantic search works:
- When you generate embeddings, your Q&A pairs are converted into vector representations
- When a customer asks a question, it's converted to the same vector format
- The system finds Q&A pairs with vectors "close" to the question vector
- Closeness is measured by cosine similarity, shown here on a 0 to 1 scale (higher is better)
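The "closeness" step above can be sketched in a few lines of Python. This is a minimal illustration, not the SoundMinds.ai implementation: the three-number vectors are made up for the example (real embeddings have hundreds of dimensions), and the function name cosine_similarity is our own.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

# Toy "embeddings" (invented values, for illustration only)
customer_question = [0.9, 0.1, 0.3]   # "When do you open?"
business_hours_qa = [0.8, 0.2, 0.4]   # Q&A about business hours
shipping_qa = [0.1, 0.9, 0.2]         # unrelated Q&A about shipping

hours_sim = cosine_similarity(customer_question, business_hours_qa)
shipping_sim = cosine_similarity(customer_question, shipping_qa)
print(hours_sim > shipping_sim)  # True: the business hours Q&A is closer
```

Vectors that point in nearly the same direction score close to 1, which is why a differently worded question can still land on the right Q&A pair.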
Example of semantic matching:
If your knowledge base has:
Q: What are your business hours?
A: We're open Monday-Friday, 9am to 5pm EST.
Semantic search successfully matches questions like:
- "When do you open?"
- "What time do you close?"
- "Are you open on weekends?"
- "Can I reach you after 6pm?"
None of these contains the phrase "business hours," but semantic search recognizes that they all ask about when you are available.
Strength: Semantic search excels at handling paraphrases, synonyms, and questions phrased differently than your knowledge base content.
Important: Semantic search requires embeddings. Always click "Generate Embeddings" after adding or updating Q&A pairs.
BM25 keyword search explained
BM25 (Best Matching 25) is a keyword-based ranking algorithm that finds Q&A pairs containing specific terms from the customer's question.
How BM25 works:
- Extracts keywords from the customer's question
- Searches your knowledge base for exact keyword matches
- Scores matches based on term frequency and rarity
- Prioritizes rare, specific terms over common words
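To make these steps concrete, here is a compact sketch of the standard BM25 formula in Python. It is an illustration under assumptions, not the SoundMinds.ai implementation: the tokenizer is deliberately naive, the parameters k1=1.5 and b=0.75 are common textbook defaults, and the three sample questions are invented.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase, split on spaces, trim punctuation."""
    words = (w.strip("?.,!\"'").lower() for w in text.split())
    return [w for w in words if w]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the standard BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = (sum(len(d) for d in tokenized) / n) or 1
    doc_freq = Counter()  # number of documents containing each term
    for d in tokenized:
        doc_freq.update(set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in tokenized:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Rarity: terms found in few documents get a larger weight
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Frequency, dampened and adjusted for document length
            tf = term_freq[term]
            tf_part = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * tf_part
        scores.append(score)
    return scores

# Invented sample questions from a knowledge base
docs = [
    "Do you support OAuth2 for single sign-on?",
    "Do you support phone orders?",
    "Do you support international shipping?",
]
scores = bm25_scores("Do you support OAuth2 authentication?", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # the OAuth2 question ranks first
```

All three sample questions contain "do", "you", and "support", so those words barely separate them. Only one contains "OAuth2", and that rarity is what lifts it to the top.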
Example of BM25 matching:
If a customer asks: "Do you support OAuth2 authentication?"
BM25 will prioritize Q&A pairs containing the specific terms:
- "OAuth2" (rare, technical term = high importance)
- "authentication" (specific domain term = medium importance)
- "support" (common word = lower importance)
When BM25 excels:
- Technical terms: Product codes, model numbers, technical jargon
- Exact phrases: Company names, feature names, specific terminology
- Acronyms: API, SSO, GDPR, etc.
- Proper nouns: Location names, brand names
Strength: BM25 ensures specific, technical terms get matched precisely, even when a semantic model would generalize them to a broader topic.
BM25 limitations:
- Doesn't understand synonyms (e.g., "cost" vs "price")
- Can't handle paraphrasing: misses questions that express the same meaning with different words
Hybrid search: Best of both
Hybrid search combines semantic and BM25 scores to balance understanding and precision. This is the default mode for SoundMinds.ai chatbots.
Default weighting:
- 70% Semantic Score: Prioritizes meaning and intent
- 30% BM25 Score: Ensures specific keywords aren't missed
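The weighting above is a simple weighted average. The sketch below shows the arithmetic in Python; the function name hybrid_score and the sample scores are illustrative assumptions, and it assumes the BM25 score has already been normalized to the same 0 to 1 scale as the semantic score.

```python
def hybrid_score(semantic, bm25, semantic_weight=0.7):
    """Blend two 0-to-1 scores; the BM25 weight is whatever remains."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0.0 and 1.0")
    return semantic_weight * semantic + (1.0 - semantic_weight) * bm25

# Invented scores for two candidate answers to "Do you integrate with OAuth2?"
candidates = {
    "Generic authentication answer": hybrid_score(semantic=0.85, bm25=0.20),
    "OAuth2-specific answer": hybrid_score(semantic=0.82, bm25=0.90),
}
for name, score in sorted(candidates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

In this made-up case the generic answer has a slightly higher semantic score, but the keyword match on "OAuth2" gives the specific answer the higher combined score.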
Why hybrid search works better:
Scenario 1: Customer asks "What's the fee for returns?"
- Semantic: Understands this is about return costs (matches "return policy" Q&A)
- BM25: Boosts results containing "returns" keyword
- Hybrid: Returns the correct "Return Policy" answer with high confidence
Scenario 2: Customer asks "Do you integrate with OAuth2?"
- Semantic: Understands this is about authentication integrations
- BM25: Strongly boosts the Q&A containing "OAuth2" (rare, specific term)
- Hybrid: Ensures the OAuth2-specific answer ranks higher than generic auth answers
Best Practice: Hybrid search works well for most knowledge bases. You can adjust the weights if your use case heavily favors either semantic understanding or keyword precision. See our Tuning Search Weights guide.
Understanding search scores
When you test questions in the Knowledge Base > Test tab, you'll see three scores for each matched Q&A pair:
Score types:
- Semantic Score (0.0 - 1.0): How well the meaning matches
- BM25 Score (raw range varies; normalized for display): Keyword matching strength
- Combined Score (0.0 - 1.0): Weighted blend (70% semantic + 30% BM25)
Interpreting scores:
- 0.85 - 1.0: Excellent match (high confidence answer)
- 0.70 - 0.84: Good match (likely correct answer)
- 0.50 - 0.69: Fair match (may need clarification)
- Below 0.50: Weak match (consider adding a new Q&A pair)
Tip: Your chatbot's minimum confidence threshold determines which scores trigger direct answers vs. fallback responses. See Troubleshooting Search for details.
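If you track test results in a spreadsheet or script, the score bands can be expressed as a small lookup. This is a sketch only: the function names are our own, and the min_confidence value used in the example is an illustrative placeholder, not your chatbot's actual setting.

```python
def confidence_band(combined_score):
    """Map a combined score (0.0 to 1.0) to the bands listed above."""
    if not 0.0 <= combined_score <= 1.0:
        raise ValueError("combined score must be between 0.0 and 1.0")
    if combined_score >= 0.85:
        return "excellent"
    if combined_score >= 0.70:
        return "good"
    if combined_score >= 0.50:
        return "fair"
    return "weak"

def answers_directly(combined_score, min_confidence):
    """True if the score meets the threshold for a direct answer."""
    return combined_score >= min_confidence

# min_confidence=0.70 is an example value, not a product default
for score in (0.91, 0.74, 0.55, 0.32):
    print(score, confidence_band(score), answers_directly(score, min_confidence=0.70))
```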
Example from test results:
Question: "How do I reset my password?"
Match #1:
Question: "What is the password reset process?"
Semantic: 0.92 (strong semantic match)
BM25: 0.78 (good keyword overlap: "password", "reset")
Combined: 0.88 (0.92 × 0.7 + 0.78 × 0.3 = 0.878)
Status: ✓ High confidence answer
When each mode excels
Semantic search is best for:
- General customer support questions
- Questions phrased in many different ways
- Natural, conversational language
- Questions using synonyms or paraphrases
BM25 is best for:
- Technical documentation
- Product-specific terminology
- Model numbers, SKUs, product codes
- Industry-specific acronyms and jargon
Hybrid search excels at:
- Mixed content (general + technical)
- Broad customer support use cases
- Balancing precision with flexibility
- Most real-world chatbot applications
Recommendation: Start with hybrid search (70/30 default). Monitor your chatbot's performance and adjust weights only if you notice consistent issues with specific question types.
Testing your search modes
Use the Knowledge Base > Test tab to understand how search modes rank your Q&A pairs.
Testing best practices:
- Test variations: Try different ways customers might phrase the same question
- Check all three scores: See how semantic, BM25, and combined scores differ
- Test edge cases: Try questions with technical terms, acronyms, or slang
- Review rankings: Ensure the best match appears first
What to look for:
- High semantic, low BM25: Question uses different words but same meaning
- Low semantic, high BM25: Exact keyword match, but possibly different intent
- High combined score: Strong match on both dimensions (best outcome)
- Low combined score: May need a new Q&A pair or better phrasing
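The patterns above can be turned into a quick triage helper for a batch of test results. This is a rough sketch: the cutoffs for "high" (0.70) and "low" (0.50) are assumptions borrowed from the score bands, the labels are our own, and it assumes both scores are on a 0 to 1 scale with the default 70/30 weighting.

```python
def diagnose(semantic, bm25, high=0.70, low=0.50, semantic_weight=0.7):
    """Label a test result using the patterns described above."""
    combined = semantic_weight * semantic + (1.0 - semantic_weight) * bm25
    if semantic >= high and bm25 < low:
        return "paraphrase"      # different words, same meaning
    if semantic < low and bm25 >= high:
        return "keyword_only"    # shared keywords, intent may differ
    if combined >= high:
        return "strong"          # solid match overall
    if combined < low:
        return "weak"            # consider a new Q&A pair or better phrasing
    return "moderate"

print(diagnose(0.90, 0.20))  # paraphrase
print(diagnose(0.30, 0.90))  # keyword_only
print(diagnose(0.92, 0.78))  # strong
print(diagnose(0.30, 0.20))  # weak
```

Results labeled "keyword_only" deserve a manual look first, since a shared term can pull up an answer about a different topic.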
Pro Tip: Save common test questions as a checklist. Re-test after adding new Q&A pairs to ensure you haven't negatively impacted existing matches.
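One way to keep such a checklist is as a list of test questions paired with the Q&A you expect to rank first. The sketch below is hypothetical throughout: it does not call any SoundMinds.ai API, the observed results stand in for what you note down from the Test tab, and the "email address" Q&A is invented for the example.

```python
def run_checklist(checklist, top_match):
    """Return the checklist entries whose top match is not the expected Q&A pair."""
    failures = []
    for question, expected in checklist:
        actual = top_match(question)
        if actual != expected:
            failures.append({"question": question, "expected": expected, "actual": actual})
    return failures

# (test question, Q&A pair that should rank first)
checklist = [
    ("When do you open?", "What are your business hours?"),
    ("How do I reset my password?", "What is the password reset process?"),
]

# Stand-in for top matches recorded from the Test tab after an update
observed = {
    "When do you open?": "What are your business hours?",
    "How do I reset my password?": "How do I change my email address?",
}

failures = run_checklist(checklist, observed.get)
for failure in failures:
    print(f"{failure['question']!r}: expected {failure['expected']!r}, got {failure['actual']!r}")
```

Any entry that appears in the failures list is a match that changed after your update and is worth re-testing by hand.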