Hallucination Guardrail

The Hallucination Guardrail is a safety feature that helps keep assistant responses accurate and reliable. It detects when text quoted from your data sources has been altered or doesn’t match the original source material.

What It Does

The guardrail system:

Reviews quoted text from your data sources for accuracy
Compares quotes against the original source material
Flags potential alterations in quoted content, including sentence structure changes
Provides source citations at the end of responses when quotes are present
Helps maintain trust in assistant-generated content

How It Works

Quote-Based Detection

The Hallucination Guardrail only works when the assistant quotes text from your data sources:

Identifies quoted passages in assistant responses (typically a sentence or a paragraph long)
Compares quotes against the original source material
Flags the quote if it doesn’t match the source text (how strictly depends on the configured sensitivity level)
Detects structural changes in sentence construction
Provides source citations at the end of responses
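
As a rough illustration of the steps above, the following Python sketch extracts quoted passages from a response and checks each one for a verbatim match in the source text. The function names, the regular expression, and the in-memory sources dictionary are assumptions made for this example; they do not describe DataQI’s internal implementation.

```python
import re

# Hypothetical sketch; not DataQI's actual implementation.
# Matches text enclosed in straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')


def extract_quotes(response: str) -> list[str]:
    """Return every passage the response presents inside double quotes."""
    return [match.strip() for match in QUOTE_PATTERN.findall(response)]


def check_quotes(response: str, sources: dict[str, str]) -> list[dict]:
    """For each quote, record which source files contain it verbatim."""
    results = []
    for quote in extract_quotes(response):
        matching_files = [name for name, text in sources.items() if quote in text]
        results.append({
            "quote": quote,
            "flagged": not matching_files,  # no verbatim match in any source
            "sources": matching_files,
        })
    return results


# Illustrative data only: file name mapped to its extracted text.
sources = {
    "Employee Handbook 2024.pdf": "The policy allows 15 days of vacation annually.",
}
response = 'The handbook says: "The policy allows fifteen days of vacation per year".'

for result in check_quotes(response, sources):
    print(result)
```

A response with no quoted passages yields an empty result list, which mirrors the limitation described in the next section: without quotes there is nothing to compare.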

When It Cannot Check

The guardrail cannot detect hallucinations when:

No quotes are present in the response
The assistant synthesizes information from multiple sources without quoting
Responses contain general summaries rather than specific quoted text

Source Citations

How Citations Work

When the assistant quotes text from your data sources, DataQI automatically:

Reviews each quote against the original source material
Displays source citations at the end of responses
Lists the specific files where quoted information was found
Links quotes to their source documents
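
The citation footer described above can be pictured with a short Python sketch. It assumes the quote check has already produced a mapping from each quote to the files that contain it; the function name and the data shape are hypothetical and chosen only for illustration.

```python
def append_sources(response: str, quote_sources: dict[str, list[str]]) -> str:
    """Append a 'Sources:' footer listing each cited file once, in first-seen order."""
    files: list[str] = []
    for matched_files in quote_sources.values():
        for name in matched_files:
            if name not in files:
                files.append(name)
    if not files:
        return response  # no quote matched a file, so no footer is added
    return response + "\n\nSources:\n\n" + "\n".join(files)


# Illustrative data only: quote text mapped to the files containing it.
quote_sources = {
    "Employees receive 15 days of paid vacation": ["Employee Handbook 2024.pdf"],
    "All vacation requests must be submitted": ["Employee Handbook 2024.pdf"],
}
print(append_sources("According to your employee handbook, vacation is accrued.", quote_sources))
```

Both quotes come from the same file, so the footer lists that file once.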

When Citations Appear

Source citations are shown when:

Quoted text is present in the response (sentences or paragraphs)
Specific information is directly referenced from your documents
The assistant cites exact passages from your data sources

When Citations Don’t Appear

No source citations will be displayed when:

No quotes are used in the response
Information is synthesized from multiple sources without direct quotes
General knowledge is used instead of your specific data
Summaries are provided rather than specific quoted content

Tip: Encouraging More Citations

To see more source citations in responses:

Ask for specific quotes: “What does the policy document say about vacation time?”
Request filename references: Add an instruction to your assistant prompts to mention source files
Use direct questions: “Quote the section about safety requirements”
Encourage quotation: Modify prompts to ask assistants to quote relevant passages

Visual Indicators

Warning banners in Chat and Document Writer
Highlighted text showing potentially inaccurate content
Clear messaging about what has been flagged and why
Consistent presentation across all interfaces

Enabling the Guardrail

For New Assistants

During assistant creation, go to Advanced Settings
Find the Hallucination Guardrail option
Toggle it ON to enable
Save your assistant configuration

For Existing Assistants

Go to the assistant’s Manage page
Navigate to Advanced Settings
Toggle the Hallucination Guardrail setting ON or OFF
Changes take effect immediately

What Gets Flagged

Quote Alterations

The system flags quoted text when it differs from the original source, including:

Modified quoted text that differs from the original source
Changed sentence structure in quoted passages
Added or removed words within quoted content
Paraphrased quotes that don’t match the exact source text
Number format changes (e.g., “15” vs “fifteen”)
Word substitutions or reordering within quotes

Examples of Flagged Content

Example 1:

Original source: “The policy allows 15 days of vacation annually”
Flagged quote: “The policy allows fifteen days of vacation per year”
Reason: Changed “15” to “fifteen” and “annually” to “per year”

Example 2:

Original source: “Sales increased by 20% in Q3”
Flagged quote: “Q3 saw a 20% increase in sales”
Reason: Restructured the sentence and changed its wording (“increased by 20%” became “saw a 20% increase”)

Example 3:

Original source: “All employees must complete safety training”
Flagged quote: “Every employee must complete safety training”
Reason: Changed “All employees” to “Every employee”
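
To see how differences like these can be surfaced, the sketch below runs a word-level comparison with Python’s standard difflib module. This is one possible way to produce a “reason” for a flag, not a description of how DataQI computes it; the function name is hypothetical.

```python
import difflib


def describe_changes(original: str, quoted: str) -> list[str]:
    """List word-level differences between a source passage and a quote."""
    source_words, quote_words = original.split(), quoted.split()
    matcher = difflib.SequenceMatcher(a=source_words, b=quote_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical runs of words need no explanation
        before = " ".join(source_words[i1:i2])
        after = " ".join(quote_words[j1:j2])
        changes.append(f"{tag}: {before!r} -> {after!r}")
    return changes


print(describe_changes(
    "All employees must complete safety training",
    "Every employee must complete safety training",
))
# ["replace: 'All employees' -> 'Every employee'"]
```

An empty list means the quote matches the source word for word.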

What Doesn’t Get Flagged

Synthesized information without quotes
General summaries that don’t quote specific text
Responses without source citations
Paraphrased content not presented as quotes

User Experience

In Chat

Warning banner appears above potentially problematic responses
Highlighted text shows specific areas of concern
Clear explanation of why content was flagged
Option to proceed with awareness of potential issues

In Document Writer

Inline warnings highlight problematic sections
Sidebar indicators show flagged content
Review mode to address flagged areas before finalizing
Clear visual distinction between verified and flagged content

Configuration Options

Sensitivity Levels

High: Flags even minor differences in quoted text
Medium: Balanced detection of quote alterations (default)
Low: Flags only significant changes to quoted content
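
One way to think about the three levels is as minimum similarity scores that a quote must reach to pass. The sketch below uses difflib’s character-level similarity ratio; the threshold values (1.0, 0.95, 0.80) and the function name are invented for illustration and are not DataQI’s actual settings.

```python
import difflib

# Hypothetical thresholds, chosen only to illustrate the idea.
MIN_SIMILARITY = {"high": 1.0, "medium": 0.95, "low": 0.80}


def is_flagged(original: str, quoted: str, sensitivity: str = "medium") -> bool:
    """Flag a quote whose similarity to the source falls below the level's threshold."""
    similarity = difflib.SequenceMatcher(a=original, b=quoted, autojunk=False).ratio()
    return similarity < MIN_SIMILARITY[sensitivity]


source = "All employees must complete safety training"
quote = "All employees must complete safety training."  # only a trailing period added

for level in ("high", "medium", "low"):
    print(level, is_flagged(source, quote, level))
```

With these assumed thresholds, a stray punctuation mark is flagged only at the high level, while an unrelated sentence is flagged at every level.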

Response Actions

Show warnings only: Display flags when quotes don’t match but allow responses
Block responses: Prevent responses with mismatched quotes
Require review: Force manual review of flagged quote differences
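
The three actions differ only in what happens to a response once a mismatch has been found. The following sketch models that decision; the enum values, status strings, and warning text are assumptions made for this example rather than DataQI’s real configuration keys.

```python
from enum import Enum


class ResponseAction(Enum):
    # Hypothetical identifiers for the three documented options.
    WARN = "show_warnings_only"
    BLOCK = "block_responses"
    REVIEW = "require_review"


WARNING_TEXT = "The quoted text may not match the original source exactly."


def apply_action(response: str, has_mismatch: bool, action: ResponseAction) -> dict:
    """Decide what the user sees, given the check result and the configured action."""
    if not has_mismatch:
        return {"status": "delivered", "text": response, "warning": None}
    if action is ResponseAction.WARN:
        # The response is still shown, accompanied by a warning.
        return {"status": "delivered", "text": response, "warning": WARNING_TEXT}
    if action is ResponseAction.BLOCK:
        # The response is withheld entirely.
        return {"status": "blocked", "text": None, "warning": WARNING_TEXT}
    # REVIEW: the response is held until someone checks the flagged quote.
    return {"status": "pending_review", "text": response, "warning": WARNING_TEXT}


print(apply_action("Your safety manual states ...", True, ResponseAction.BLOCK))
```

Responses without a mismatch are delivered unchanged regardless of the configured action.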

Best Practices

Encouraging Source Citations

To maximize the effectiveness of the Hallucination Guardrail and source citations:

Customize assistant prompts to encourage quoting specific information
Ask for filename references in responses to promote source tracking
Request specific quotes when you need exact information from documents
Use questions that naturally lead to quoting (e.g., “What does the policy say about…?”)

Understanding Limitations

Synthesized responses won’t show source citations or guardrail checks
General questions may not trigger quote-based responses
Summary requests typically don’t include specific quotes

For Different Use Cases

Compliance content: Encourage quoting for exact policy language
Reference materials: Ask for specific quotes when accuracy is critical
General assistance: Accept that some responses won’t include citations

Examples

Response With Citations and Guardrail Check

User Question: “What is our company’s vacation policy?”

Assistant Response (with quotes and source citations):

According to your employee handbook: “Employees receive 15 days of paid vacation for the first 5 years of service, then 20 days thereafter. All vacation requests must be submitted at least 2 weeks in advance.”

Sources:

Employee Handbook 2024.pdf

Response Without Citations or Guardrail Check

User Question: “What are the specifications for our main product?”

Assistant Response (no quotes, synthesized information):

Based on the information in your product documentation, the main product features include advanced security protocols, cloud-based storage, and multi-platform compatibility. The system supports up to 1000 concurrent users and integrates with major enterprise software.

Note: No source citations are available because this response synthesizes information from multiple sources without direct quotes.

Flagged Response Example

User Question: “What does our safety policy say about equipment?”

Assistant Response (with hallucination warning):

⚠️ Warning: The quoted text below may not match the original source exactly.

Your safety manual states: “All employees must wear protective equipment at all times in the facility.”

Sources:

Safety Manual v3.pdf

Troubleshooting

No Source Citations Appearing

Encourage quoting by asking for specific policy language or exact wording
Modify assistant prompts to request filename references
Ask direct questions about what documents say rather than general topics
Use phrases like “What does [document] say about…?”

Too Many False Flags on Quotes

Review source document quality: ensure text is clear and not corrupted
Check for OCR errors in scanned documents that might cause mismatches
Lower sensitivity if minor paraphrasing is acceptable
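
If you have access to the text extracted from a scanned document, a quick check like the one below can show whether a mismatch is purely cosmetic (ligatures, non-breaking spaces, doubled whitespace) rather than a real alteration. This is a diagnostic you could run yourself, sketched with Python’s standard unicodedata module; it is not a DataQI feature, and the function names are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Fold compatibility characters (ligatures, non-breaking spaces) and collapse whitespace."""
    folded = unicodedata.normalize("NFKC", text)
    return " ".join(folded.split())


def mismatch_is_cosmetic(source_text: str, quote: str) -> bool:
    """True when the quote fails an exact match but matches after normalization."""
    return quote not in source_text and normalize(quote) in normalize(source_text)


# Illustrative OCR-style text: a non-breaking space, a double space, and an "fi" ligature.
source_text = "All employees must complete\u00a0safety  training and \ufb01re drills."
quote = "All employees must complete safety training and fire drills"

print(mismatch_is_cosmetic(source_text, quote))
```

When this returns True, cleaning or re-extracting the source document is likely a better fix than lowering sensitivity.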

Quotes Not Being Detected as Different

Increase sensitivity to catch subtle changes
Review flagged examples to understand detection patterns
Ensure source documents are properly indexed and searchable

Performance Issues

Reduce sensitivity if processing is slow
Optimize data sources for faster analysis
Consider disabling for high-volume, low-risk use cases

Security and Compliance

Data Privacy

No external data sharing: analysis happens locally
Your data stays private: no information is sent to external services
Compliance-friendly: helps meet accuracy requirements

Audit Trail

Track flagged responses for compliance reporting
Monitor guardrail effectiveness over time
Generate reports on potential issues
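
As an illustration of the kind of report an audit trail makes possible, the sketch below counts flagged responses per assistant and per source file. It assumes flagged responses are available as simple records; the record fields and function name are hypothetical, since this page does not describe an export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class FlagEvent:
    # Hypothetical record shape for one flagged response.
    assistant: str
    source_file: str
    reason: str
    timestamp: datetime


def summarize(events: list[FlagEvent]) -> dict[str, Counter]:
    """Count flagged responses per assistant and per source file."""
    return {
        "by_assistant": Counter(event.assistant for event in events),
        "by_source_file": Counter(event.source_file for event in events),
    }


now = datetime.now(timezone.utc)
events = [
    FlagEvent("HR Assistant", "Employee Handbook 2024.pdf", "word substitution", now),
    FlagEvent("HR Assistant", "Safety Manual v3.pdf", "added words", now),
]
print(summarize(events))
```

A source file that accounts for many flags may point to extraction problems in that document rather than to the assistant.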