Before running evaluations, you’ll need a dataset with existing mappings. See Mapping Schemas to create mappings first.
Why evaluate mappings
Manual validation tests specific cases you define, but it can’t anticipate every issue. AI evaluation analyzes your mappings holistically, looking for:
- Transformation gaps: Cases your transformation doesn’t handle, like unexpected input values or edge cases.
- Pattern mismatches: Source data that doesn’t align with the target attribute’s expected format.
- Quality indicators: Signals that suggest a mapping might produce incorrect results under certain conditions.
Evaluations complement manual validation: they surface issues you might not have thought to test for.
Running an evaluation
Using the UI
- Navigate to Rosetta Stone → Normalized Datasets
- Click on a dataset to open its detail page
- On the Evaluate tab, click Evaluate Mappings
When the evaluation completes, the results view shows:
- A confidence gradient bar showing the distribution across quality tiers
- Confidence level cards with counts for High, Medium, Low, and Not Scored
- Individual mapping cards with scores and AI feedback
Understanding evaluation results
The confidence gradient
The gradient bar provides a visual summary of your dataset’s mapping quality:
| Color | Score Range | Meaning |
|---|---|---|
| Green | 80-100% | High confidence—mappings are likely correct |
| Yellow | 50-79% | Medium confidence—review recommended |
| Red | 0-49% | Low confidence—manual review required |
| Gray | Not scored | Mappings that couldn’t be evaluated |
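The thresholds in the table above amount to simple bucketing logic. Here is an illustrative sketch (a hypothetical helper, not the product’s actual implementation):

```python
def confidence_tier(score):
    """Bucket a 0-100 confidence score into a quality tier.

    `score` is None for mappings that could not be evaluated.
    Illustrative only; the product computes tiers internally.
    """
    if score is None:
        return "Not scored"
    if score >= 80:
        return "High"
    if score >= 50:
        return "Medium"
    return "Low"
```

Note that a score of exactly 80 falls in the High tier and exactly 50 in the Medium tier, matching the ranges in the table.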
Confidence level cards
Each card shows the count of mappings in that tier:
- High confidence (80-100%): These mappings show strong alignment between source columns and target attributes. The AI found consistent patterns and appropriate transformations. You can generally trust these, though spot-checking critical data flows is still recommended.
- Medium confidence (50-79%): The mapping is probably correct but has characteristics worth reviewing. Common reasons include partial transformation coverage or ambiguous column names.
- Low confidence (0-49%): The AI identified significant concerns. These mappings require human verification before relying on them.
- Not scored: Mappings that couldn’t be evaluated, typically system-generated mappings for internal columns.
Individual mapping feedback
Click on any mapping card to see detailed AI feedback:
- Score breakdown: Factors contributing to the confidence score
- Identified issues: Specific concerns the AI found
- Recommendations: Suggested improvements
- Sample analysis: How the mapping performs on actual data samples
Filtering mappings
By confidence level
Click any confidence level card to filter the mapping list:
- Click the Low card to see only low-confidence mappings
- Click the Medium card to see medium-confidence mappings
- Click All to clear the filter
By mapping type
Filter by how the mapping was created:
| Type | Description |
|---|---|
| System | Auto-generated for internal columns (always correct) |
| AI | Created from AI suggestions |
| User | Manually created by users |
| Lineage | Inherited from parent datasets |
Understanding low confidence scores
When a mapping scores low, the AI provides specific feedback. Common issues include:
Missing transformation cases
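As a hypothetical illustration, consider a transformation that maps raw status codes but was built only from the values observed when the mapping was created:

```python
# Hypothetical status-code mapping covering only the values
# seen in the source data at mapping time.
STATUS_MAP = {"A": "active", "I": "inactive"}

def transform_status(code):
    # Raises KeyError for any code the mapping never anticipated,
    # e.g. a new "P" (pending) value appearing later in the source.
    return STATUS_MAP[code]
```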
The transformation doesn’t handle all values in the source data.
Type mismatch concerns
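For instance, a source column that is mostly numeric strings may also contain placeholder values that a naive cast cannot convert. A sketch (with hypothetical sample values):

```python
raw_values = ["42", "17", "N/A", ""]  # hypothetical source column

def to_int_safe(value):
    # A bare int(value) raises ValueError on placeholders like
    # "N/A" or ""; guard the cast and surface failures as None.
    try:
        return int(value)
    except ValueError:
        return None
```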
The source data contains values that might not convert correctly.
Ambiguous column mapping
The column name doesn’t clearly indicate its content.
Edge case failures
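A classic example, sketched here with a hypothetical name-splitting transformation that silently assumes every name has exactly two tokens:

```python
def last_name(full_name):
    # Takes the second whitespace-separated token. Works for
    # "First Last" but raises IndexError for single-token names
    # like "Cher", and returns the middle name for three-token
    # names like "Mary Jane Smith".
    return full_name.split(" ")[1]
```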
The transformation fails for certain input patterns.
Refreshing stale evaluations
Evaluation results can become outdated when:
- You modify transformation expressions
- The source data changes significantly
- Significant time has passed since the last evaluation
Recognizing stale evaluations
The UI indicates when results may be stale:
- Last evaluated timestamp shows when evaluation ran
- A Stale badge appears if mappings changed since evaluation
- A Stale badge appears if mappings changed since evaluation
- Confidence scores may show a warning indicator
Re-running evaluations
To refresh results:
- Navigate to the dataset’s Evaluate tab
- Click Evaluate Mappings to run a new evaluation
- The new results replace the previous evaluation
Consider re-running evaluations after:
- Modifying any transformation expression
- Accepting or rejecting AI suggestions
- Observing unexpected query results
- Major updates to source data

