Why hygiene matters

A clean, well-maintained knowledge base is essential for high-quality search results. Poor hygiene leads to:

  • Duplicate matches: Multiple similar Q&A pairs compete, diluting confidence scores
  • Outdated answers: Customers receive incorrect information about current policies or features
  • Inconsistent terminology: Search can't find answers because terms don't match customer language
  • Bloated knowledge base: Too many low-quality entries reduce overall search precision

Key Principle: Quality over quantity. A focused knowledge base with 50 well-written Q&A pairs performs better than one with 500 poorly maintained pairs.

Writing effective questions

Questions act as semantic anchors for matching. Write them using language your customers actually use.

Best practices for questions:

  • Use customer language: Match how customers phrase questions, not internal jargon
  • Keep questions specific: Use one question per Q&A pair and avoid combining multiple topics
  • Natural phrasing: Write as a real customer would ask, not as a documentation title
  • Include key terms: Use the specific nouns and verbs customers search for

Good question examples:

✓ "How do I reset my password?"
✓ "What are your shipping costs to Canada?"
✓ "Can I upgrade my subscription mid-month?"
✓ "Do you offer a student discount?"

Poor question examples:

✗ "Password Reset Procedure" (title format, not a question)
✗ "Shipping and Returns Policy" (too broad, combines topics)
✗ "Account credential recovery methodology" (jargon-heavy)
✗ "How do I reset password / change email / update profile?" (multiple questions)
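
The patterns in the poor examples above can be partly caught by a quick automated check before a question is added. The following is a minimal sketch, not a feature of the product: the function name lint_question and its heuristics are assumptions made for illustration. It flags title-style and multi-part questions, but it cannot detect jargon, which still needs human review.

```python
import re


def lint_question(question: str) -> list[str]:
    """Return heuristic warnings for a candidate Q&A question (hypothetical helper)."""
    warnings = []
    text = question.strip()

    # A customer question normally ends with a question mark.
    if not text.endswith("?"):
        warnings.append("not phrased as a question")

    # A spaced slash or several question marks suggest multiple questions in one.
    if " / " in text or text.count("?") > 1:
        warnings.append("may combine multiple questions")

    # Every word capitalized and no question mark usually means a documentation title.
    words = re.findall(r"[A-Za-z']+", text)
    if len(words) > 1 and all(w[0].isupper() for w in words) and not text.endswith("?"):
        warnings.append("looks like a title")

    return warnings


for candidate in ["How do I reset my password?", "Password Reset Procedure"]:
    print(candidate, "->", lint_question(candidate))
```

An empty list means none of the heuristics fired. It does not guarantee that the question matches customer language.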

Pro Tip: Review actual customer questions from your support tickets or chatbot logs. Use their exact phrasing when creating Q&A pairs.

Handling question variations:

For questions with common variations, you have two options:

  • Single Q&A with inclusive phrasing: "How do I reset or change my password?"
  • Multiple Q&As pointing to the same answer: Separate "How do I reset my password?" and "How do I change my password?" entries (both with identical answers)

For minor variations (synonyms), trust semantic search. For major variations (different intents), create separate Q&A pairs.

Writing effective answers

Answers should be concise, actionable, and complete. Avoid forcing customers to ask follow-up questions.

Best practices for answers:

  • Start with the answer: Lead with the key information, then provide details
  • Be concise: 2-4 sentences ideal; if longer, use bullet points
  • Include actionable steps: Tell customers what to do, not just what the policy is
  • Avoid jargon: Use plain language unless technical terms are necessary
  • Provide context: Include limitations, requirements, or important caveats
  • Link to details: If needed, reference where to find more comprehensive information

Good answer example:

Q: How do I reset my password?

A: Click "Forgot Password" on the login page and enter your email. 
You'll receive a reset link within 5 minutes. If you don't see it, 
check your spam folder. Password reset links expire after 24 hours.

Poor answer example:

Q: How do I reset my password?

A: We offer multiple credential recovery mechanisms via our 
authentication portal. Users can initiate a password reset workflow 
through the forgot password functionality. Email validation is 
required. See our Security Best Practices documentation for more 
information on password policies and account security measures.

Why the second example is poor: Too formal, uses jargon, buries the answer, includes irrelevant information.

Security Note: Never include personal information (emails, phone numbers, account IDs) in answers. Use placeholders like "your registered email" or "your account dashboard."
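
A simple scan can help catch personal information before an answer is published. The sketch below is an assumption-laden illustration: the patterns are deliberately loose, the function name find_personal_info is hypothetical, and account IDs are not covered because their format depends on your system. Treat a hit as a prompt for manual review, not as proof.

```python
import re

# Loose patterns for illustration only; they will miss some cases and over-match others.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Ten or more digit-like characters, allowing spaces, dots, dashes, and parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")


def find_personal_info(answer: str) -> list[str]:
    """Return the kinds of personal information that appear in an answer."""
    hits = []
    if EMAIL_RE.search(answer):
        hits.append("email")
    if PHONE_RE.search(answer):
        hits.append("phone")
    return hits


print(find_personal_info("Contact jane.doe@example.com for help."))
print(find_personal_info("Use your registered email to sign in."))
```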

Formatting tips:

  • Use bullet points for lists or steps
  • Bold key terms or actions
  • Break long answers into short paragraphs
  • Include relevant URLs for detailed documentation

Managing duplicates

Duplicates confuse search by presenting multiple similar matches. The AI-Validated Import feature helps detect them, but regular audits are essential.

Types of duplicates:

  • Exact duplicates: Identical question and answer (easy to detect and remove)
  • Semantic duplicates: Different wording, same meaning (e.g., "How do I cancel?" and "How do I close my account?")
  • Overlapping answers: Different questions with similar or conflicting answers

How to handle duplicates:

  1. Exact duplicates: Delete all but one
  2. Semantic duplicates with the same answer (choose one):
    • Keep both if they represent genuinely different phrasings that customers use
    • Otherwise, merge them into one Q&A with broader question phrasing
  3. Semantic duplicates with different answers (this indicates a problem):
    • If the answers conflict, determine which is correct and delete the other
    • If both are correct, clarify the questions to distinguish them

Example of semantic duplicates to merge:

Before (duplicates):
  Q1: "Do you ship internationally?"
  A1: "Yes, we ship to over 50 countries..."

  Q2: "Can I order from outside the US?"
  A2: "Yes, we ship to over 50 countries..."

After (merged):
  Q: "Do you ship internationally?"
  A: "Yes, we ship to over 50 countries..."

Semantic search will handle the variation "Can I order from outside the US?" without needing a separate entry.

Detection Tip: Sort your Q&A pairs alphabetically or by category to spot duplicates visually. Also test similar questions to see if multiple entries rank highly.
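
For larger knowledge bases, a script can shortlist duplicate candidates for manual review. The sketch below assumes your Q&A pairs are available as a list of (question, answer) tuples, for example from an export. The function names and the 0.8 threshold are illustrative assumptions. It compares wording using Python's difflib, so it finds exact and near-identical questions plus entries that share an answer. It does not measure meaning the way embedding-based search does.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivial differences do not matter."""
    return " ".join(re.findall(r"[a-z0-9']+", question.lower()))


def find_duplicate_candidates(pairs, threshold=0.8):
    """Return (index_a, index_b, kind) for pairs of entries worth reviewing."""
    results = []
    for (i, (q1, a1)), (j, (q2, a2)) in combinations(enumerate(pairs), 2):
        n1, n2 = normalize(q1), normalize(q2)
        same_answer = a1.strip() == a2.strip()
        if n1 == n2 and same_answer:
            results.append((i, j, "exact"))
        elif SequenceMatcher(None, n1, n2).ratio() >= threshold:
            results.append((i, j, "similar wording"))
        elif same_answer:
            # Different wording, identical answer: a likely semantic duplicate.
            results.append((i, j, "same answer"))
    return results


pairs = [
    ("Do you ship internationally?", "Yes, we ship to over 50 countries..."),
    ("Can I order from outside the US?", "Yes, we ship to over 50 countries..."),
    ("How do I reset my password?", "Click Forgot Password on the login page."),
]
print(find_duplicate_candidates(pairs))
```

Every pair of entries is compared, so the run time grows quadratically with the number of Q&A pairs. That is acceptable for a periodic audit of a few hundred entries.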

Organizing with categories

Categories help you manage and maintain your knowledge base but don't directly affect search quality. Use them for organization and reporting.

Category best practices:

  • Consistent naming: Use standardized category names across your knowledge base
  • Logical grouping: Group related Q&A pairs together for easier maintenance
  • Not too granular: 5-15 categories work well for most knowledge bases
  • Not too broad: Avoid dumping everything into "General"

Example category structure:

  • Account Management (login, password, profile)
  • Billing & Payments (pricing, invoices, refunds)
  • Product Features (how-to, capabilities)
  • Shipping & Returns (delivery, returns policy)
  • Technical Support (troubleshooting, errors)
  • Integrations (API, third-party tools)

Maintenance Benefit: Categories make it easier to audit specific topics when policies change. For example, you can quickly review all "Billing & Payments" entries when pricing changes.
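
If you can export your Q&A pairs, a few lines of code can check the category guidelines above and pull one category for an audit. This is a minimal sketch that assumes each entry is a dictionary with a "category" key. That shape and the function names are assumptions for illustration, not the product's export format.

```python
from collections import Counter


def category_report(entries, min_categories=5, max_categories=15):
    """Count entries per category and note when the 5-15 guideline is missed."""
    counts = Counter(entry["category"] for entry in entries)
    notes = []
    if not min_categories <= len(counts) <= max_categories:
        notes.append(
            f"{len(counts)} categories (guideline: {min_categories}-{max_categories})"
        )
    return counts, notes


def entries_in(entries, category):
    """Return every entry in one category, e.g. for a policy-change audit."""
    return [entry for entry in entries if entry["category"] == category]


entries = [
    {"question": "How do I get a refund?", "category": "Billing & Payments"},
    {"question": "How do I reset my password?", "category": "Account Management"},
    {"question": "Where is my invoice?", "category": "Billing & Payments"},
]
counts, notes = category_report(entries)
print(counts.most_common(), notes)
print(entries_in(entries, "Billing & Payments"))
```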

Removing outdated content

Outdated Q&A pairs damage customer trust and can create support issues when incorrect information is provided.

Signs content is outdated:

  • References discontinued products or features
  • Mentions old pricing, policies, or business hours
  • Describes UI elements that have changed
  • Contains broken links or references to retired documentation

Options for handling outdated content:

  1. Update the answer: Keep the question, revise the answer to reflect current information
  2. Disable the Q&A: Set status to "disabled" to keep for reference but remove from search
  3. Delete permanently: Remove if completely irrelevant

Important: After updating answers, always regenerate embeddings so the semantic search reflects the new content.

Triggers for content review:

  • Product launches or updates
  • Policy changes (pricing, returns, terms)
  • Seasonal changes (business hours, shipping delays)
  • Feature deprecations or retirements

Embeddings maintenance

Embeddings enable semantic search, but they must be regenerated whenever you change Q&A content.

When to regenerate embeddings:

  • After adding new Q&A pairs
  • After editing questions or answers
  • After bulk imports
  • After deleting Q&A pairs (optional, but keeps the embeddings database clean)

How to regenerate embeddings:

  1. Navigate to the Knowledge Base for your chatbot
  2. Click the "Generate Embeddings" button
  3. Wait for the process to complete (takes longer with more Q&A pairs)
  4. Test affected questions to verify search quality

Critical: Without regenerating embeddings, your edits won't be reflected in semantic search. The chatbot will continue using the old embeddings, matching against outdated content.

Embedding best practices:

  • Batch multiple edits, then regenerate once (more efficient than regenerating after each change)
  • Schedule embedding regeneration during low-traffic periods if your knowledge base is very large
  • Test search results after regenerating to catch any unexpected changes
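
When you batch edits, it is easy to lose track of whether anything has changed since the last regeneration. One way to keep track outside the product is to fingerprint the content and compare it with the fingerprint recorded at the last run. The sketch below is an illustration under assumptions: it works on an exported list of (question, answer) tuples, and the function names are hypothetical. It does not call or replace the "Generate Embeddings" button.

```python
import hashlib
import json


def content_fingerprint(pairs):
    """Hash all question and answer text; the order of entries does not matter."""
    canonical = json.dumps(sorted(pairs), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_regeneration(pairs, fingerprint_at_last_run):
    """True when content was added, edited, or deleted since the recorded run."""
    return content_fingerprint(pairs) != fingerprint_at_last_run


pairs = [
    ("How do I reset my password?", "Click Forgot Password on the login page."),
    ("Do you ship internationally?", "Yes, we ship to over 50 countries..."),
]
recorded = content_fingerprint(pairs)  # store this after each regeneration

pairs[1] = ("Do you ship internationally?", "Yes, we ship to over 60 countries...")
print(needs_regeneration(pairs, recorded))
```

Deletions also change the fingerprint. Regenerating after a deletion is optional, so treat that case as a reminder and not as a requirement.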

Monitoring quality

Proactive monitoring helps you identify knowledge base issues before they impact customer experience.

Key metrics to track:

  • Match rate: Percentage of questions that successfully match a Q&A pair
  • Fallback rate: How often the chatbot can't find a confident match
  • Low-confidence matches: Questions that match, but with scores below 0.70
  • Unanswered questions: Questions that receive no match or use fallback responses
  • Customer satisfaction: Ratings or feedback on answer quality

Review chatbot analytics:

  1. Check your chatbot dashboard weekly or monthly
  2. Identify the most common unanswered questions
  3. Add Q&A pairs to cover frequently asked but unanswered questions
  4. Review low-scoring matches to improve question phrasing

Pro Tip: Export your unanswered questions monthly. Look for patterns or themes that indicate missing knowledge base content.
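
Once the unanswered questions are exported, counting recurring terms is a quick first pass at finding themes. The sketch below assumes the export is a plain list of question strings. The stop-word list and the function name top_terms are illustrative assumptions. It does no stemming, so "card" and "cards" are counted separately.

```python
import re
from collections import Counter

# A small, hand-picked stop-word list; extend it for your own data.
STOP_WORDS = {
    "a", "an", "the", "i", "do", "does", "can", "how", "what", "is", "are",
    "my", "your", "you", "to", "for", "of", "in", "on", "it", "and", "or",
}


def top_terms(questions, n=5):
    """Return the n terms that appear in the most unanswered questions."""
    counts = Counter()
    for question in questions:
        # Count each term once per question so one long message cannot dominate.
        terms = set(re.findall(r"[a-z']+", question.lower())) - STOP_WORDS
        counts.update(terms)
    return counts.most_common(n)


unanswered = [
    "Do you offer gift cards?",
    "Can I buy a gift card for a friend?",
    "How do I redeem a gift card?",
    "Where is my invoice?",
]
print(top_terms(unanswered, n=2))
```

A term that shows up in many unanswered questions, such as "gift" in this sample, points to a topic that may be missing from the knowledge base.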

Quality indicators:

Healthy knowledge base:

  • 80%+ of questions match with a confidence of 0.70 or higher
  • Fallback rate < 20%
  • Customer satisfaction rating > 4.0/5.0

Knowledge base needs attention:

  • Match rate < 60%
  • Fallback rate > 30%
  • Many low-confidence matches (scores 0.50-0.69)
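
The thresholds above can be applied to exported chat logs with a short script. The sketch below assumes each log record is a (best_score, used_fallback) tuple, where best_score is None when nothing matched. That record shape, the function name summarize, and the "borderline" label for results between the two lists are assumptions for illustration. Customer satisfaction is left out because it comes from a separate source, and "many" low-confidence matches is reported as a count because the text gives no cutoff.

```python
def summarize(records, confident=0.70, low_floor=0.50):
    """Compute match rate, fallback rate, and a rough health status.

    records: list of (best_score_or_None, used_fallback) per customer question.
    A score of exactly 0.70 is treated as confident here (an assumption).
    """
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")

    confident_matches = sum(
        1 for score, _ in records if score is not None and score >= confident
    )
    low_confidence = sum(
        1 for score, _ in records
        if score is not None and low_floor <= score < confident
    )
    fallbacks = sum(1 for _, used_fallback in records if used_fallback)

    match_rate = confident_matches / total
    fallback_rate = fallbacks / total

    if match_rate >= 0.80 and fallback_rate < 0.20:
        status = "healthy"
    elif match_rate < 0.60 or fallback_rate > 0.30:
        status = "needs attention"
    else:
        status = "borderline"

    return {
        "match_rate": match_rate,
        "fallback_rate": fallback_rate,
        "low_confidence": low_confidence,
        "status": status,
    }


log = [(0.91, False)] * 8 + [(0.62, False), (None, True)]
print(summarize(log))
```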

Maintenance schedule

Regular maintenance keeps your knowledge base effective and prevents quality degradation over time.

Weekly tasks:

  • Review unanswered questions from chatbot analytics
  • Add 2-5 new Q&A pairs for common unanswered questions
  • Regenerate embeddings if you made changes

Monthly tasks:

  • Review match rate and fallback rate trends
  • Audit low-confidence matches (scores 0.50-0.69)
  • Check for duplicate Q&A pairs
  • Update answers that reference time-sensitive information

Quarterly tasks:

  • Full knowledge base audit (review all Q&A pairs)
  • Remove or update outdated content
  • Reorganize categories if needed
  • Test 20-30 common questions to validate search quality

After major changes:

  • Product launches → Add Q&A for new features
  • Policy updates → Update affected Q&A pairs
  • Rebranding → Update terminology across knowledge base
  • Always regenerate embeddings after bulk updates

Pro Tip: Schedule recurring calendar reminders for weekly and monthly maintenance. Consistent, small improvements are better than infrequent large overhauls.