Semantic Duplicates Detection
Automated notification system that detects high-risk duplicate past answers and alerts configured users.
Overview
The Semantic Duplicates project adds a background process that detects high-risk duplicate past answers within a company's data and alerts configured users. Teams no longer need to check manually: the system notifies them proactively when potential duplicates are found.
How It Works
Configuration
To enable duplicate detection, go to the company settings and open the Notifications section. There you can choose how to receive notifications about high-risk duplicates. You can add as many email addresses as needed; notifications are sent to all of them.
Processing Schedule
A cron job runs at midnight on Sundays. Duplicate detection is computationally intensive, so the schedule deliberately targets one of the quietest periods of the week. The job only runs for companies that have explicitly configured notifications, which avoids processing data unnecessarily and sending unexpected notifications to companies that haven't opted in.
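The opt-in gating described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production job: the `Company` shape, its `notification_emails` and `slack_channel` fields, and the `companies_to_process` helper are hypothetical names. The schedule string is standard five-field cron syntax for 00:00 on Sunday, assuming that is the moment "midnight on Sundays" refers to in the scheduler's timezone.

```python
from dataclasses import dataclass, field
from typing import Optional

# Standard five-field cron syntax: minute hour day-of-month month day-of-week.
# "0 0 * * 0" fires at 00:00 every Sunday (day-of-week 0).
WEEKLY_SCHEDULE = "0 0 * * 0"


@dataclass
class Company:
    # Hypothetical shape; the real settings model is not described in this document.
    name: str
    notification_emails: list[str] = field(default_factory=list)
    slack_channel: Optional[str] = None

    def has_notifications_configured(self) -> bool:
        # A company counts as opted in if at least one channel is set up.
        return bool(self.notification_emails) or self.slack_channel is not None


def companies_to_process(companies: list[Company]) -> list[Company]:
    """Keep only opted-in companies, so no others are scanned or notified."""
    return [c for c in companies if c.has_notifications_configured()]


if __name__ == "__main__":
    companies = [
        Company("no-config"),
        Company("email-only", notification_emails=["team@example.com"]),
        Company("slack-only", slack_channel="#duplicates"),
    ]
    for company in companies_to_process(companies):
        print(company.name)
```

Filtering before any detection work starts means the expensive comparison step is never run for companies that have not opted in.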
Notification Channels
Email
Duplicate detection emails are sent via SendGrid from hello@getvera.ai with the subject line "Duplicate Past Answers Detected." Each email includes:
- A summary of how many high-risk duplicates were found (e.g., "We found 3 high-risk duplicates in your group")
- A detailed breakdown of each duplicate, including risk level, category, and similarity percentages
- Direct links to view the original past answer (opens the actual past asset in a new tab)
- Direct links to view the duplicates for side-by-side comparison
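The email contents listed above could be assembled roughly as shown below. This is a sketch, not the actual integration: the `Duplicate` fields, `summary_line`, and `build_mail_payload` are hypothetical names, and the example URLs are placeholders. The returned dictionary follows the general shape of a SendGrid v3 Mail Send request body (`personalizations`, `from`, `subject`, `content`); whether the real system calls that endpoint directly or uses a client library is not stated here. The subject is written without the trailing period, on the assumption that the period in the text is sentence punctuation.

```python
from dataclasses import dataclass

SENDER = "hello@getvera.ai"
# Assumption: the trailing period in the documented subject is sentence punctuation.
SUBJECT = "Duplicate Past Answers Detected"


@dataclass
class Duplicate:
    # Hypothetical fields mirroring the per-duplicate breakdown in the email.
    risk_level: str
    category: str
    similarity_pct: float
    original_url: str
    duplicates_url: str


def summary_line(count: int) -> str:
    """Return the headline sentence, pluralised for the number of duplicates."""
    noun = "duplicate" if count == 1 else "duplicates"
    return f"We found {count} high-risk {noun} in your group"


def build_mail_payload(recipients: list[str], duplicates: list[Duplicate]) -> dict:
    """Build a request body in the shape of SendGrid's v3 Mail Send API."""
    lines = [summary_line(len(duplicates)), ""]
    for i, d in enumerate(duplicates, start=1):
        lines.append(
            f"{i}. {d.risk_level} risk | {d.category} | {d.similarity_pct:.0f}% similar"
        )
        lines.append(f"   Original: {d.original_url}")
        lines.append(f"   Compare duplicates: {d.duplicates_url}")
    return {
        # One personalization per address, so every configured recipient
        # receives their own copy of the message.
        "personalizations": [{"to": [{"email": r}]} for r in recipients],
        "from": {"email": SENDER},
        "subject": SUBJECT,
        "content": [{"type": "text/plain", "value": "\n".join(lines)}],
    }
```

Using one personalization per recipient keeps the addresses from being exposed to each other, which seems a reasonable default when a company adds many addresses.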
Slack
Slack notifications are also supported. Users who have registered for notifications are tagged directly in the message. Each Slack notification includes:
- Who added the notification
- The date of detection
- Relevant labels
- The source of the duplicate and where it has been used
- Truncated details (if there are many duplicates) with links to view the full originals
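A Slack message with the fields listed above might be built along these lines. This is an illustrative sketch only: `build_slack_message`, its parameters, and the `MAX_DETAILS` cut-off are hypothetical, since the real truncation limit and message layout are not documented here. It relies on two Slack conventions: `<@USER_ID>` to mention a user and `<url|label>` for links in `mrkdwn` text.

```python
from datetime import date

# Hypothetical cut-off; the real truncation limit is not documented here.
MAX_DETAILS = 5


def build_slack_message(
    user_ids: list[str],
    added_by: str,
    detected_on: date,
    labels: list[str],
    details: list[str],
    full_url: str,
) -> dict:
    """Build a message payload with fallback `text` and Block Kit `blocks`.

    Assumes `details` is non-empty, since a notification is only sent
    when duplicates were found.
    """
    # <@USER_ID> is Slack's syntax for tagging a user in a message.
    mentions = " ".join(f"<@{uid}>" for uid in user_ids)
    shown = details[:MAX_DETAILS]
    hidden = len(details) - len(shown)
    body = "\n".join(f"• {d}" for d in shown)
    if hidden > 0:
        # Truncate long lists and link to the full originals instead.
        body += f"\n...and {hidden} more. <{full_url}|View the full originals>"
    label_text = ", ".join(labels)
    header = (
        f"{mentions} High-risk duplicates detected\n"
        f"*Added by:* {added_by}\n"
        f"*Detected:* {detected_on.isoformat()}\n"
        f"*Labels:* {label_text}"
    )
    return {
        "text": "High-risk duplicates detected",  # plain-text fallback
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": header}},
            {"type": "section", "text": {"type": "mrkdwn", "text": body}},
        ],
    }
```

Each entry in `details` would carry the source of a duplicate and where it has been used; how that string is formatted is left open in this sketch.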
Current Status
The feature has shipped (released on the 5th of February) and was announced via a changelog email.
Design Decisions
- Opt-in only: Notifications are only sent to companies that have explicitly configured them, avoiding confusion for companies that haven't set up the feature.
- Weekend processing: The heavy computation runs during low-traffic hours (Sunday midnight) to minimise impact on system performance.
- Customer-driven iteration: The initial approach is intentionally focused. Future improvements will be guided by customer feedback rather than speculative feature additions.
Future Considerations
- Additional UI features such as a "Check Past Answers" or "Check for Duplicates" button for on-demand scanning
- Golden answers and merge functionality (moved to a separate project to avoid scope creep)
- Further enhancements based on customer feedback and usage patterns