Tuning Search Weights for Better Answers

When to adjust weights

The default 70/30 (semantic/BM25) weighting works well for most chatbots. Consider adjusting weights only if you notice consistent patterns of incorrect matches.

Signs you might need to adjust weights:

Technical terms are consistently missed: Your chatbot returns the wrong answer to questions that hinge on a specific technical term
Exact keyword matches are ignored: Questions with product codes or model numbers don't match correctly
Overly broad matching: Questions match answers that are semantically similar but contextually wrong
Paraphrases fail: Customers rephrase questions and get no answer, even though you have the information

Important: Always test with real customer questions before adjusting weights. A few isolated issues may indicate knowledge base problems, not weight problems.

Understanding default weights

SoundMinds.ai uses a 70% semantic + 30% BM25 weighting by default. This balance prioritizes meaning and intent while ensuring specific keywords aren't ignored.

How weights affect scoring:

Combined Score = (Semantic Score × 0.7) + (BM25 Score × 0.3)

Example:
  Semantic Score: 0.85
  BM25 Score: 0.60
  Combined: (0.85 × 0.7) + (0.60 × 0.3) = 0.595 + 0.18 = 0.775
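The formula can be sketched in a few lines of Python. This is only an illustration, not SoundMinds.ai's implementation: the function name `combined_score` and its parameters are hypothetical, and it assumes both input scores have already been normalized to the 0-1 range (raw BM25 scores are not bounded on their own).

```python
def combined_score(semantic: float, bm25: float, semantic_weight: float = 0.7) -> float:
    """Blend two relevance scores; both are assumed to be normalized to [0, 1]."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    bm25_weight = 1.0 - semantic_weight  # the two weights always sum to 1
    return semantic * semantic_weight + bm25 * bm25_weight


# The worked example: semantic 0.85, BM25 0.60, default 70/30 split.
print(round(combined_score(0.85, 0.60), 3))  # 0.775
```

Passing `semantic_weight=0.5` to the same call gives 0.725, which shows how a more keyword-focused split pulls the result toward the lower BM25 score.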

What the default weighting means:

70% semantic: Meaning and intent matter most
30% BM25: Specific keywords provide a significant boost
A Q&A pair with strong keyword matches can overtake one with slightly better semantic similarity
The BM25 share prevents purely semantic matching from ignoring important technical terms
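To see when a keyword-strong Q&A pair overtakes a semantically stronger one, it can help to compute the semantic weight at which the two would tie. The sketch below solves the combined-score formula for that crossover point. The function name and the example scores are hypothetical, and scores are assumed to be normalized to 0-1.

```python
from typing import Optional, Tuple

Scores = Tuple[float, float]  # (semantic score, BM25 score), both assumed in [0, 1]


def tie_semantic_weight(a: Scores, b: Scores) -> Optional[float]:
    """Return the semantic weight at which a and b get the same combined score.

    Returns None when there is no single crossover point inside the 0-1 range.
    """
    semantic_gap = a[0] - b[0]  # how much better a is semantically
    keyword_gap = b[1] - a[1]   # how much better b is on keywords
    denominator = semantic_gap + keyword_gap
    if denominator == 0:
        return None
    weight = keyword_gap / denominator
    return weight if 0.0 <= weight <= 1.0 else None


# Hypothetical pair: a is slightly better semantically, b has much stronger keywords.
a = (0.85, 0.40)
b = (0.80, 0.90)
print(round(tie_semantic_weight(a, b), 2))  # 0.91
```

With these made-up numbers, b ranks first at any semantic weight below roughly 0.91, including the default 0.7. The semantic weight would have to be pushed very high before the small semantic advantage of a wins out.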

Tip: The 70/30 split works well for general customer support. Only adjust if you have data showing consistent issues with a specific question type.

Increasing semantic weight

Increase the semantic weight (e.g., 80/20 or 85/15) when your knowledge base focuses on conversational questions with varied phrasing.

When to use higher semantic weight:

General customer support: FAQs, policies, procedures
Non-technical content: Questions about hours, locations, pricing, returns
Varied customer language: Customers phrase the same question many different ways
Minimal jargon: Few product codes, model numbers, or technical acronyms

Example scenario:

Problem: Your chatbot has trouble matching varied phrasings of the same question.

Knowledge base Q&A:

Q: What is your return policy?
A: We accept returns within 30 days of purchase...

Customer questions that should match:

"Can I send this back?"
"What if I don't like it?"
"How do I get a refund?"

Solution: Increase semantic weight to 80/20 or 85/15 so meaning takes precedence over exact keyword matches.

Recommended weights for general support: 80% semantic + 20% BM25

Trade-offs:

Benefit: Better handling of paraphrases and synonyms
Risk: May ignore important specific terms or product codes

Increasing BM25 weight

Increase the BM25 weight (e.g., 50/50 or 40/60) when your knowledge base contains technical content with specific terminology that must match exactly.

When to use higher BM25 weight:

Technical documentation: API references, integration guides, system specs
Product catalogs: SKUs, model numbers, part codes
Compliance content: Specific regulatory terms, legal language
Industry jargon: Medical, legal, financial, or technical fields with specialized vocabulary

Example scenario:

Problem: Customers asking about specific products get generic answers.

Knowledge base has multiple similar Q&As:

Q: How do I configure OAuth2 authentication?
A: [OAuth2-specific instructions]

Q: How do I configure SAML authentication?
A: [SAML-specific instructions]

Q: How do I configure API key authentication?
A: [API key-specific instructions]

Customer asks: "How do I set up OAuth2?"

With 70/30 weighting, semantic similarity might score all three almost equally (they're all about authentication setup), so the correct answer wins by only a narrow margin or not at all.

Solution: Increase BM25 weight to 50/50 or 40/60 so the exact term "OAuth2" strongly boosts the correct answer.
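The effect can be illustrated with a small ranking sketch. The scores below are invented for illustration (identical semantic scores, with a high BM25 score only for the Q&A that contains "OAuth2"), and the helper names `rank` and `lead` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical (semantic, BM25) scores for the query "How do I set up OAuth2?".
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "How do I configure OAuth2 authentication?": (0.80, 0.85),
    "How do I configure SAML authentication?": (0.80, 0.45),
    "How do I configure API key authentication?": (0.80, 0.40),
}


def rank(candidates: Dict[str, Tuple[float, float]],
         semantic_weight: float) -> List[Tuple[str, float]]:
    """Sort candidates by combined score, best first."""
    scored = [
        (question, semantic * semantic_weight + bm25 * (1.0 - semantic_weight))
        for question, (semantic, bm25) in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def lead(candidates: Dict[str, Tuple[float, float]], semantic_weight: float) -> float:
    """Gap between the best and second-best combined scores."""
    ranked = rank(candidates, semantic_weight)
    return ranked[0][1] - ranked[1][1]


for weight in (0.7, 0.5):
    best, score = rank(CANDIDATES, weight)[0]
    print(f"semantic weight {weight}: {best} ({score:.3f}), "
          f"lead {lead(CANDIDATES, weight):.3f}")
```

In this sketch the OAuth2 answer leads the runner-up by 0.12 at 70/30 and by 0.20 at 50/50. A wider lead makes the correct match less sensitive to small differences in semantic score.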

Recommended weights for technical content: 50% semantic + 50% BM25 (balanced) or 40% semantic + 60% BM25 (keyword-focused)

Trade-offs:

Benefit: Ensures specific technical terms and product codes match precisely
Risk: May miss paraphrases or questions using different terminology

Important: High BM25 weights work best when your Q&A questions include the exact technical terms customers use. Ensure your knowledge base uses consistent terminology.

Testing weight adjustments

Before changing weights in production, test your adjustments thoroughly using the Knowledge Base test tool.

Testing methodology:

Collect real questions: Gather 20-30 actual customer questions from your chatbot logs or support tickets
Test with current weights: Run each question and record which Q&A pair matches and with what score
Adjust weights: Change the semantic/BM25 balance in your chatbot settings
Re-test questions: Run the same questions and compare results
Evaluate improvements: Count how many questions now match better vs. worse

What to measure:

Correct matches improved: Questions that now match the right answer
Correct matches degraded: Questions that matched the right answer before but now match the wrong one
Score changes: Combined scores that increased or decreased significantly
Ranking changes: Cases where the best answer moved from 1st to 2nd place (or vice versa)

Success Criteria: Only deploy new weights if you see net improvement (more questions fixed than broken) and no critical regressions.
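If you record the results in a spreadsheet or script, the comparison can be automated. The sketch below is one possible way to count fixed and broken questions. It assumes you have, for each test question, the ID of the Q&A pair that should match plus a semantic and BM25 score per candidate. The data layout, IDs, and scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Each candidate maps a Q&A id to (semantic score, BM25 score), both assumed in [0, 1].
Candidates = Dict[str, Tuple[float, float]]
# A test case pairs the expected Q&A id with the candidates retrieved for the question.
TestCase = Tuple[str, Candidates]


def top_match(candidates: Candidates, semantic_weight: float) -> str:
    """Return the id of the highest-scoring Q&A pair under the given weight."""
    def score(item: Tuple[str, Tuple[float, float]]) -> float:
        _, (semantic, bm25) = item
        return semantic * semantic_weight + bm25 * (1.0 - semantic_weight)
    return max(candidates.items(), key=score)[0]


def compare_weights(test_cases: List[TestCase], current: float,
                    proposed: float) -> Dict[str, int]:
    """Count questions fixed and broken when moving from current to proposed."""
    fixed = broken = 0
    for expected_id, candidates in test_cases:
        was_right = top_match(candidates, current) == expected_id
        is_right = top_match(candidates, proposed) == expected_id
        if is_right and not was_right:
            fixed += 1
        elif was_right and not is_right:
            broken += 1
    return {"fixed": fixed, "broken": broken, "net": fixed - broken}


# Hypothetical test set: one technical question, one conversational question.
TEST_CASES: List[TestCase] = [
    ("oauth2-setup", {"auth-methods": (0.885, 0.535), "oauth2-setup": (0.645, 0.995)}),
    ("returns", {"returns": (0.90, 0.20), "refund-status": (0.70, 0.60)}),
]

print(compare_weights(TEST_CASES, current=0.7, proposed=0.5))
```

In this made-up set, moving to 50/50 fixes the OAuth2 question but breaks the returns question, so the net change is zero and the new weights would not meet the success criteria.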

Example test comparison:

Question: "Do you support OAuth2?"

70/30 Weighting (current):
  Match #1: "What authentication methods are supported?" (0.78)
  Match #2: "How do I configure OAuth2?" (0.75) ← Should be #1

50/50 Weighting (proposed):
  Match #1: "How do I configure OAuth2?" (0.82) ← Fixed!
  Match #2: "What authentication methods are supported?" (0.71)

Result: Improvement ✓

Common use cases

E-commerce / Retail (80/20 - semantic heavy):

Customers ask about shipping, returns, sizing in many ways
Conversational language varies widely
Few technical terms to match

SaaS Product Support (70/30 - default balanced):

Mix of conversational and technical questions
Some feature names and UI elements to match exactly
Balance between flexibility and precision

API Documentation (50/50 or 40/60 - balanced to keyword heavy):

Heavy use of technical terms and endpoint names
Developers use specific, precise language
Need to distinguish between similar endpoints

Healthcare / Medical (60/40 - moderate semantic):

Medical terminology must match precisely
Patients use varied language for symptoms
Balance between professional terms and lay language

Manufacturing / Parts Catalog (40/60 - keyword heavy):

SKUs, model numbers, part codes critical
Exact product identification required
Technical specifications use precise terminology
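If you manage several chatbots, the starting points above can be kept in one lookup table. The sketch below is only an illustration: the key names and the `weights_for` helper are hypothetical and do not correspond to any SoundMinds.ai setting. The percentages are the ones listed in this section.

```python
from typing import Dict, Tuple

DEFAULT_SPLIT: Tuple[int, int] = (70, 30)  # (semantic %, BM25 %)

# Percent splits taken from the use cases above; key names are illustrative.
PRESETS: Dict[str, Tuple[int, int]] = {
    "ecommerce_retail": (80, 20),
    "saas_support": (70, 30),
    "api_documentation": (50, 50),  # 40/60 is the more keyword-focused alternative
    "healthcare_medical": (60, 40),
    "manufacturing_parts": (40, 60),
}


def weights_for(use_case: str) -> Tuple[float, float]:
    """Return (semantic, BM25) weights as fractions, falling back to the default."""
    semantic_pct, bm25_pct = PRESETS.get(use_case, DEFAULT_SPLIT)
    if semantic_pct + bm25_pct != 100:
        raise ValueError(f"weights for {use_case!r} must sum to 100")
    return semantic_pct / 100, bm25_pct / 100


print(weights_for("manufacturing_parts"))  # (0.4, 0.6)
```

Treat these values as starting points to test against real customer questions, not as final settings.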

Monitoring performance

After adjusting weights, monitor your chatbot's performance to ensure the changes have the desired effect.

Metrics to track:

Match rate: Percentage of questions that find a Q&A match above the confidence threshold
Average confidence score: Mean combined score across all matched questions
Fallback rate: How often the chatbot uses fallback messages vs. direct answers
Customer satisfaction: Ratings or feedback on answer quality

Warning signs:

Match rate drops: Fewer questions matching above threshold
Increased fallbacks: More "I don't know" responses
Lower satisfaction: Customers report irrelevant answers
Support tickets increase: More questions escalate to human agents

Important: If any metric worsens significantly after you adjust weights, revert to the previous configuration and reassess your testing methodology.
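If you export chatbot logs, the first three metrics can be computed with a short script. The sketch below makes several assumptions that may not hold for your setup: each logged question has one top combined score (or none when nothing was retrieved), every question below the confidence threshold triggers a fallback, and the threshold value of 0.5 and the 5-point tolerance are placeholders. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize(top_scores: List[Optional[float]], threshold: float = 0.5) -> Dict[str, float]:
    """Compute match rate, fallback rate, and average confidence for one period."""
    if not top_scores:
        raise ValueError("top_scores must contain at least one question")
    matched = [s for s in top_scores if s is not None and s >= threshold]
    match_rate = len(matched) / len(top_scores)
    return {
        "match_rate": match_rate,
        "fallback_rate": 1.0 - match_rate,  # assumes every non-match falls back
        "avg_confidence": mean(matched) if matched else 0.0,
    }


def should_revert(before: Dict[str, float], after: Dict[str, float],
                  tolerance: float = 0.05) -> bool:
    """Flag a revert when match rate or average confidence drops past the tolerance."""
    return (
        after["match_rate"] < before["match_rate"] - tolerance
        or after["avg_confidence"] < before["avg_confidence"] - tolerance
    )


# Hypothetical top scores logged before and after a weight change.
before = summarize([0.8, 0.9, 0.6, 0.4])
after = summarize([0.8, 0.45, None, 0.4])
print(before["match_rate"], after["match_rate"], should_revert(before, after))
```

Customer satisfaction and escalation counts come from other sources, so review them alongside these numbers before deciding.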

Continuous improvement:

Review chatbot analytics monthly
Collect questions that matched poorly
Determine if weight adjustments or knowledge base improvements are needed
Test changes in a controlled way
Deploy only improvements that show clear benefits

Pro Tip: Most search quality issues come from knowledge base problems (missing Q&A pairs, poor phrasing, outdated content) rather than weight settings. Always improve your knowledge base before tuning weights.

Related Guides

Search Modes Explained: Semantic, BM25, and Hybrid

Improving Search Quality with Knowledge Base Hygiene

Hybrid Search Troubleshooting Checklist