Hallucination Guardrail
The Hallucination Guardrail is a safety feature that improves the reliability of assistant responses. It detects when text quoted from your data sources has been altered or does not match the original source material.
What It Does
The guardrail system:
- Reviews quoted text from your data sources for accuracy
- Compares quotes against the original source material
- Flags potential alterations in quoted content, including sentence structure changes
- Provides source citations at the end of responses when quotes are present
- Helps maintain trust in assistant-generated content
How It Works
Quote-Based Detection
The Hallucination Guardrail runs only when the assistant quotes text from your data sources. In that case, it:
- Identifies quoted passages in assistant responses (typically whole sentences or paragraphs)
- Compares quotes against the original source material
- Flags differences if the quoted text doesn’t match exactly
- Detects structural changes in sentence construction
- Provides source citations at the end of responses
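The steps above can be pictured with a minimal Python sketch. It is not DataQI's implementation: the function names extract_quotes and check_quotes, the quote-matching pattern, and the sample strings are all assumptions made for illustration, and the check shown is a plain verbatim substring test.

```python
import re

# Assumed convention: a quote is any text inside straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["\u201c]([^"\u201c\u201d]+)["\u201d]')

def extract_quotes(response: str) -> list[str]:
    """Return every double-quoted passage found in the response."""
    return [match.strip() for match in QUOTE_PATTERN.findall(response)]

def check_quotes(response: str, source_text: str) -> list[tuple[str, bool]]:
    """Pair each quote with True if it appears verbatim in the source text."""
    return [(quote, quote in source_text) for quote in extract_quotes(response)]

# Hypothetical sample data.
source = "Section 4. The policy allows 15 days of vacation annually. Requests need approval."
response = 'The handbook says "The policy allows fifteen days of vacation per year".'

for quote, verbatim in check_quotes(response, source):
    print("OK  " if verbatim else "FLAG", quote)
```

A response that contains no quoted passages yields an empty list, which mirrors the limitation described in the next section: with nothing quoted, there is nothing to compare.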
When It Cannot Check
The guardrail cannot detect hallucinations when:
- No quotes are present in the response
- The assistant synthesizes information from multiple sources without quoting
- Responses contain general summaries rather than specific quoted text
Source Citations
How Citations Work
When the assistant quotes text from your data sources, DataQI automatically:
- Reviews each quote against the original source material
- Displays source citations at the end of responses
- Lists the specific files where quoted information was found
- Links quotes to their source documents
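As a rough illustration of how quotes could be linked to files, the sketch below looks up each quote in a dictionary of document texts and builds a citation list. The function names, the dictionary layout, and the verbatim lookup are assumptions; the product may well attach sources from retrieval metadata rather than by searching text.

```python
def find_sources(quotes: list[str], documents: dict[str, str]) -> list[str]:
    """Return filenames, in first-seen order, whose text contains any quote."""
    cited: list[str] = []
    for quote in quotes:
        for filename, text in documents.items():
            if quote in text and filename not in cited:
                cited.append(filename)
    return cited

def render_citations(quotes: list[str], documents: dict[str, str]) -> str:
    """Build a 'Sources:' block, or an empty string when nothing was quoted."""
    files = find_sources(quotes, documents)
    if not files:
        return ""
    return "Sources:\n" + "\n".join(f"- {name}" for name in files)

# Hypothetical documents keyed by filename.
documents = {
    "Employee Handbook 2024.pdf": "Employees receive 15 days of paid vacation.",
    "Safety Manual v3.pdf": "Wear protective equipment.",
}
print(render_citations(["Employees receive 15 days of paid vacation."], documents))
```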
When Citations Appear
Source citations are shown when:
- Quoted text is present in the response (sentences or paragraphs)
- Specific information is directly referenced from your documents
- The assistant cites exact passages from your data sources
When Citations Don’t Appear
No source citations will be displayed when:
- No quotes are used in the response
- Information is synthesized from multiple sources without direct quotes
- General knowledge is used instead of your specific data
- Summaries are provided rather than specific quoted content
Tip: Encouraging More Citations
To see more source citations in responses:
- Ask for specific quotes: “What does the policy document say about vacation time?”
- Request filename references: Instruct the assistant in its prompt to mention the source files it draws from
- Use direct questions: “Quote the section about safety requirements”
- Encourage quotation: Modify prompts to ask assistants to quote relevant passages
Visual Indicators
- Warning banners in chat and document writer
- Highlighted text showing potentially inaccurate content
- Clear messaging about what has been flagged and why
- Consistent presentation across all interfaces
Enabling the Guardrail
For New Assistants
- During assistant creation, go to Advanced Settings
- Find the Hallucination Guardrail option
- Toggle it ON to enable
- Save your assistant configuration
For Existing Assistants
- Go to the assistant’s Manage page
- Navigate to Advanced Settings
- Toggle the Hallucination Guardrail setting
- Changes take effect immediately
What Gets Flagged
Quote Alterations
The system flags quoted text that differs from the original source. Flagged differences include:
- Modified wording in a quoted passage
- Changed sentence structure in quoted passages
- Added or removed words within quoted content
- Paraphrased quotes that don’t match the exact source text
- Number format changes (e.g., “15” vs “fifteen”)
- Word substitutions or reordering within quotes
Examples of Flagged Content
Example 1:
- Original source: “The policy allows 15 days of vacation annually”
- Flagged quote: “The policy allows fifteen days of vacation per year”
- Reason: Changed number format and “annually” to “per year”
Example 2:
- Original source: “Sales increased by 20% in Q3”
- Flagged quote: “Q3 saw a 20% increase in sales”
- Reason: Restructured sentence order
Example 3:
- Original source: “All employees must complete safety training”
- Flagged quote: “Every employee must complete safety training”
- Reason: Changed “All employees” to “Every employee”
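To see how a reason like those above could be derived, here is a small sketch using Python's standard difflib module to compare the two passages word by word. The function name describe_changes and the output format are hypothetical; the guardrail's actual comparison method is not documented here.

```python
import difflib

def describe_changes(original: str, quoted: str) -> list[str]:
    """List word-level differences between the source passage and the quote."""
    source_words, quote_words = original.split(), quoted.split()
    matcher = difflib.SequenceMatcher(None, source_words, quote_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs of words need no explanation
        before = " ".join(source_words[i1:i2])
        after = " ".join(quote_words[j1:j2])
        changes.append(f"{tag}: '{before}' -> '{after}'")
    return changes

print(describe_changes(
    "All employees must complete safety training",
    "Every employee must complete safety training",
))
# prints ["replace: 'All employees' -> 'Every employee'"]
```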
What Doesn’t Get Flagged
- Synthesized information without quotes
- General summaries that don’t quote specific text
- Responses without source citations
- Paraphrased content not presented as quotes
User Experience
In Chat
- Warning banner appears above potentially problematic responses
- Highlighted text shows specific areas of concern
- Clear explanation of why content was flagged
- Option to proceed with awareness of potential issues
In Document Writer
- Inline warnings highlight problematic sections
- Sidebar indicators show flagged content
- Review mode to address flagged areas before finalizing
- Clear visual distinction between verified and flagged content
Configuration Options
Sensitivity Levels
- High: Flags even minor differences in quoted text
- Medium: Balanced detection of quote alterations (default)
- Low: Flags only significant changes to quoted content
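One way to think about the three levels is as thresholds on a similarity score. The sketch below is an assumption-heavy illustration: the threshold values, the word-level similarity measure from difflib, and the function names are invented for this example and are not DataQI settings.

```python
import difflib

# Hypothetical thresholds: a quote is flagged when its word-level similarity
# to the source passage falls below the value for the chosen level.
THRESHOLDS = {"high": 1.0, "medium": 0.9, "low": 0.6}

def similarity(original: str, quoted: str) -> float:
    """Word-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    matcher = difflib.SequenceMatcher(None, original.split(), quoted.split(), autojunk=False)
    return matcher.ratio()

def is_flagged(original: str, quoted: str, level: str = "medium") -> bool:
    """Return True when the quote differs enough to be flagged at this level."""
    return similarity(original, quoted) < THRESHOLDS[level]

original = "All employees must complete safety training"
quoted = "Every employee must complete safety training"
for level in ("high", "medium", "low"):
    print(level, is_flagged(original, quoted, level))
```

With these made-up thresholds, the example quote is flagged at High and Medium but passes at Low, since four of its six words still match the source.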
Response Actions
- Show warnings only: Display a warning when quotes don’t match, but still deliver the response
- Block responses: Prevent responses with mismatched quotes
- Require review: Force manual review of flagged quote differences
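The difference between the three actions can be sketched as a small dispatch function. Everything here (the ResponseAction enum, the apply_action function, the result fields, and the warning texts) is hypothetical and only meant to show how the options diverge once a mismatch is found.

```python
from enum import Enum

class ResponseAction(Enum):
    WARN = "show_warnings_only"
    BLOCK = "block_responses"
    REVIEW = "require_review"

def apply_action(response: str, has_mismatch: bool, action: ResponseAction) -> dict:
    """Decide what happens to a response given the guardrail result."""
    if not has_mismatch:
        # Nothing was flagged, so every action delivers the response as-is.
        return {"deliver": True, "text": response, "warning": None}
    if action is ResponseAction.WARN:
        return {"deliver": True, "text": response,
                "warning": "Quoted text may not match the source."}
    if action is ResponseAction.BLOCK:
        return {"deliver": False, "text": None,
                "warning": "Response blocked: quoted text does not match the source."}
    # REVIEW: keep the text, but hold it until someone approves it.
    return {"deliver": False, "text": response, "warning": "Held for manual review."}

print(apply_action("Draft answer", True, ResponseAction.REVIEW))
```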
Best Practices
Encouraging Source Citations
To maximize the effectiveness of the Hallucination Guardrail and source citations:
- Customize assistant prompts to encourage quoting specific information
- Ask for filename references in responses to promote source tracking
- Request specific quotes when you need exact information from documents
- Use questions that naturally lead to quoting (e.g., “What does the policy say about…?”)
Understanding Limitations
- Synthesized responses won’t show source citations or guardrail checks
- General questions may not trigger quote-based responses
- Summary requests typically don’t include specific quotes
For Different Use Cases
- Compliance content: Encourage quoting for exact policy language
- Reference materials: Ask for specific quotes when accuracy is critical
- General assistance: Accept that some responses won’t include citations
Examples
Response With Citations and Guardrail Check
User Question: “What is our company’s vacation policy?”
Assistant Response (with quotes and source citations):
According to your employee handbook: “Employees receive 15 days of paid vacation for the first 5 years of service, then 20 days thereafter. All vacation requests must be submitted at least 2 weeks in advance.”
Sources:
- Employee Handbook 2024.pdf
Response Without Citations or Guardrail Check
User Question: “What are the specifications for our main product?”
Assistant Response (no quotes, synthesized information):
Based on the information in your product documentation, the main product features include advanced security protocols, cloud-based storage, and multi-platform compatibility. The system supports up to 1000 concurrent users and integrates with major enterprise software.
Note: No source citations available as this response synthesizes information from multiple sources without direct quotes.
Flagged Response Example
User Question: “What does our safety policy say about equipment?”
Assistant Response (with hallucination warning):
⚠️ Warning: The quoted text below may not match the original source exactly.
Your safety manual states: “All employees must wear protective equipment at all times in the facility.”
Sources:
- Safety Manual v3.pdf
Troubleshooting
No Source Citations Appearing
- Encourage quoting by asking for specific policy language or exact wording
- Modify assistant prompts to request filename references
- Ask direct questions about what documents say rather than general topics
- Use phrases like “What does [document] say about…?”
Too Many False Flags on Quotes
- Review source document quality: ensure text is clear and not corrupted
- Check for OCR errors in scanned documents that might cause mismatches
- Lower sensitivity if minor paraphrasing is acceptable
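If scanned or PDF-extracted documents are the cause, cleaning the text before uploading it may reduce mismatches. The sketch below shows one possible cleanup pass using only Python's standard library; the function name is hypothetical, and the rules it applies are assumptions about common extraction debris rather than anything DataQI does internally.

```python
import re
import unicodedata

def normalize_for_comparison(text: str) -> str:
    """Clean common OCR and PDF-extraction debris from a passage."""
    # Fold compatibility characters, e.g. the ligature U+FB01 becomes "fi"
    # and non-breaking spaces become ordinary spaces.
    text = unicodedata.normalize("NFKC", text)
    # Drop soft hyphens that PDF extraction often leaves behind.
    text = text.replace("\u00ad", "")
    # Rejoin words hyphenated across line breaks. Caution: this also joins
    # genuine hyphenated compounds that happen to break at a line end.
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(normalize_for_comparison("All employees must com-\nplete  safety\u00a0training"))
# prints: All employees must complete safety training
```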
Quotes Not Being Detected as Different
- Increase sensitivity to catch subtle changes
- Review flagged examples to understand detection patterns
- Ensure source documents are properly indexed and searchable
Performance Issues
- Reduce sensitivity if processing is slow
- Optimize data sources for faster analysis
- Consider disabling for high-volume, low-risk use cases
Security and Compliance
Data Privacy
- No external data sharing: analysis happens locally
- Your data stays private: no information is sent to external services
- Compliance-friendly: helps meet accuracy requirements
Audit Trail
- Track flagged responses for compliance reporting
- Monitor guardrail effectiveness over time
- Generate reports on potential issues
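As an illustration of the kind of report this could feed, the sketch below computes the share of flagged responses per assistant from a list of event records. The record layout (dictionaries with "assistant" and "flagged" keys) and the function name are assumptions; how DataQI actually stores or exports audit data is not covered here.

```python
from collections import Counter

def flag_rate_by_assistant(events: list[dict]) -> dict[str, float]:
    """Return the fraction of responses flagged for each assistant."""
    totals: Counter = Counter()
    flagged: Counter = Counter()
    for event in events:
        totals[event["assistant"]] += 1
        if event["flagged"]:
            flagged[event["assistant"]] += 1
    return {name: flagged[name] / totals[name] for name in totals}

# Hypothetical audit records.
events = [
    {"assistant": "HR", "flagged": True},
    {"assistant": "HR", "flagged": False},
    {"assistant": "Sales", "flagged": False},
]
print(flag_rate_by_assistant(events))
# prints {'HR': 0.5, 'Sales': 0.0}
```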