📋 1. Report Overview
Purpose: To document actions taken to ensure the accuracy, completeness, and consistency of raw M&E data before analysis.
| Field | Description |
|---|---|
| Report Name | June 2025 Data Cleaning and Validation Report (SCLMR-1) |
| Reporting Officer | [Name of M&E Analyst or Data Officer] |
| Reporting Period | 01–30 June 2025 |
| Data Sources | Youth Surveys, Attendance Registers, Beneficiary Registration Forms |
| Programs Covered | ICT Skills, Job Placement, Mental Health Awareness |
🧹 2. Data Cleaning Actions
| Issue Type | Description | Affected Records | Resolution | Notes |
|---|---|---|---|---|
| Missing Values | Gender field left blank | 63 | Imputed from registration data | All fixed |
| Inconsistent Date Format | Mixed formats (dd/mm/yyyy and yyyy-mm-dd) | 124 | Standardized to ISO 8601 (yyyy-mm-dd) | Applied Excel transformation |
| Duplicate Entries | Same name/ID repeated | 21 | Removed duplicates based on timestamp | Retained earliest entry |
| Invalid Age Entries | Ages below 10 or above 35 in youth database | 12 | Flagged for verification | Pending site confirmation |
| Text Errors | Typos in region names (e.g., “Limpop” instead of “Limpopo”) | 8 | Corrected via lookup table | Automated rule applied |
| Outlier Values | “Years unemployed” > 20 | 3 | Flagged; confirmed as correct | Not removed |
| Mismatched IDs | Attendance sheet IDs not found in registration data | 19 | Linked manually using names | Records matched |
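The script-based fixes in the table above (date standardization, deduplication keeping the earliest entry, region lookup, and age flagging) can be sketched as follows. This is a minimal illustration, not the procedure actually used for this report: the record keys (`youth_id`, `submitted_at`, `date`, `region`, `age`), the `REGION_FIXES` lookup table, and the function names are hypothetical. Ages outside 10–35 are flagged rather than dropped, matching the table.

```python
from datetime import datetime

# Hypothetical lookup table: known misspellings -> canonical region names.
REGION_FIXES = {"Limpop": "Limpopo"}

# Input date formats seen in the raw data, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

# Youth age range used to flag invalid ages.
AGE_MIN, AGE_MAX = 10, 35


def to_iso_date(raw):
    """Convert a yyyy-mm-dd or dd/mm/yyyy string to ISO yyyy-mm-dd."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")


def clean_records(records):
    """Return (cleaned_records, flagged_ids) for a list of record dicts.

    Hypothetical keys: youth_id, submitted_at (ISO timestamp string),
    date, region, age.
    """
    # Deduplicate on youth_id, keeping the earliest submission.
    # ISO timestamp strings in one format compare correctly as text.
    earliest = {}
    for rec in records:
        key = rec["youth_id"]
        if key not in earliest or rec["submitted_at"] < earliest[key]["submitted_at"]:
            earliest[key] = rec

    cleaned, flagged = [], []
    for rec in earliest.values():
        rec = dict(rec)  # copy so the raw input is left untouched
        rec["date"] = to_iso_date(rec["date"])
        rec["region"] = REGION_FIXES.get(rec["region"], rec["region"])
        # Flag, but keep, ages outside the youth range for site verification.
        if not AGE_MIN <= rec["age"] <= AGE_MAX:
            flagged.append(rec["youth_id"])
        cleaned.append(rec)
    return cleaned, flagged


raw = [
    {"youth_id": "Y001", "submitted_at": "2025-06-02T08:00", "date": "02/06/2025",
     "region": "Limpop", "age": 22},
    {"youth_id": "Y001", "submitted_at": "2025-06-03T09:00", "date": "2025-06-03",
     "region": "Limpopo", "age": 22},
    {"youth_id": "Y002", "submitted_at": "2025-06-04T10:00", "date": "2025-06-04",
     "region": "Gauteng", "age": 41},
]
cleaned, flagged = clean_records(raw)
print(cleaned)
print(flagged)
```

Keeping flagged records in the output, instead of deleting them, leaves the final decision to site confirmation, as the report does for invalid ages.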
✔️ 3. Validation Checks Performed
| Check | Description | Result |
|---|---|---|
| Uniqueness Check | Each Youth ID appears only once | ✅ Passed |
| Completeness | All mandatory fields completed | ⚠️ 98% complete |
| Range Validation | Age, income, and hours trained fall within expected ranges | ✅ Passed |
| Categorical Accuracy | Gender, region, and program type match the allowed options | ✅ Passed |
| Logic Consistency | If “Job placement = Yes” → “Income > 0” | ⚠️ 6 inconsistencies |
| Date Consistency | No future or implausible past dates | ✅ Passed |
| Referral Status Linkage | Referral entries match the referral logs | ⚠️ 5 unmatched entries |
| Location Consistency | Coordinates fall within the recorded region | ✅ Passed (100% matched) |
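Several of the checks above (uniqueness, completeness, the job placement/income logic rule, and referral linkage) reduce to simple per-record tests. The sketch below is one possible way to express them, assuming records are dicts; the field names (`youth_id`, `gender`, `region`, `program`, `job_placement`, `income`, `referral_id`) and the `MANDATORY` list are hypothetical placeholders, not the report's actual data dictionary.

```python
# Hypothetical mandatory fields; the real list would come from the data dictionary.
MANDATORY = ("youth_id", "gender", "region", "program")


def validate(records, referral_ids):
    """Return a dict mapping each check name to the youth IDs that fail it."""
    issues = {"uniqueness": [], "completeness": [], "logic": [], "referral": []}
    seen = set()
    for rec in records:
        yid = rec.get("youth_id")
        # Uniqueness: each Youth ID should appear only once.
        if yid in seen:
            issues["uniqueness"].append(yid)
        seen.add(yid)
        # Completeness: every mandatory field is present and non-blank.
        if any(not str(rec.get(field) or "").strip() for field in MANDATORY):
            issues["completeness"].append(yid)
        # Logic consistency: job placement = "Yes" implies income > 0.
        if rec.get("job_placement") == "Yes" and (rec.get("income") or 0) <= 0:
            issues["logic"].append(yid)
        # Referral linkage: a referral ID, when given, must exist in the referral log.
        ref = rec.get("referral_id")
        if ref and ref not in referral_ids:
            issues["referral"].append(yid)
    return issues


records = [
    {"youth_id": "Y001", "gender": "F", "region": "Limpopo", "program": "ICT Skills",
     "job_placement": "Yes", "income": 0, "referral_id": "R10"},
    {"youth_id": "Y002", "gender": "", "region": "Gauteng", "program": "Job Placement",
     "job_placement": "No", "income": 0, "referral_id": None},
    {"youth_id": "Y002", "gender": "M", "region": "Gauteng", "program": "Job Placement",
     "job_placement": "Yes", "income": 3500, "referral_id": "R99"},
]
issues = validate(records, referral_ids={"R10"})
print(issues)
```

Returning the failing IDs per check, rather than a pass/fail flag, gives the follow-up list directly (for example, the 6 logic inconsistencies and 5 unmatched referrals).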
📈 4. Summary of Changes
- Total Records Cleaned: 1,247
- Duplicates Removed: 21
- Manual Corrections Made: 47
- Fields Auto-Corrected by Script: 382
- Pending Issues for Follow-Up: 9
- Quality Score (Post-cleaning): 94%
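Counts such as duplicates removed, manual corrections, script corrections, and pending issues could be tallied directly from the cleaning log rather than by hand. The sketch below assumes a hypothetical CSV layout with `record_id`, `action`, `method`, and `status` columns; the actual columns of June_Cleaning_Log.csv are not described in this report.

```python
import csv
import io
from collections import Counter


def summarize_log(csv_text):
    """Tally summary figures from a cleaning log.

    Hypothetical columns: record_id, action, method, status.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["action"] == "removed_duplicate":
            counts["duplicates_removed"] += 1
        elif row["action"] == "corrected":
            # Split corrections by who made them.
            key = "manual_corrections" if row["method"] == "manual" else "auto_corrections"
            counts[key] += 1
        if row["status"] == "pending":
            counts["pending_issues"] += 1
    return counts


sample_log = (
    "record_id,action,method,status\n"
    "Y001,removed_duplicate,script,done\n"
    "Y002,corrected,manual,done\n"
    "Y003,corrected,script,done\n"
    "Y004,flagged,manual,pending\n"
)
print(summarize_log(sample_log))
```

Deriving the figures from the log makes each summary number traceable to individual log entries.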
📎 5. Notes & Recommendations
- Implement validation checks during data entry (e.g., dropdowns in mobile forms).
- Conduct field staff training on consistent spelling for regions and program types.
- Build auto-formatting scripts in Excel for dates and ID fields.
- Improve linkage between attendance logs and registration IDs.
- Integrate real-time quality checks in KoBoToolbox forms.
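Automated ID formatting and better attendance-to-registration linkage could start from a normalization step applied to both sources before matching. This is a sketch under an assumed ID format (a letter prefix followed by digits, padded to five digits, e.g. "YP-00042"); `ID_PATTERN`, `PAD`, and both function names are hypothetical. Unmatched IDs are returned so they can still be reviewed manually.

```python
import re

# Hypothetical ID format: letter prefix, optional separator, then digits.
ID_PATTERN = re.compile(r"^([A-Z]+)[-_ ]?(\d+)$")
PAD = 5  # hypothetical width of the numeric part


def normalize_id(raw):
    """Return a canonical ID such as 'YP-00042', or None if the value does not fit."""
    match = ID_PATTERN.match(raw.strip().upper())
    if not match:
        return None
    prefix, number = match.groups()
    return f"{prefix}-{int(number):0{PAD}d}"


def link_attendance(attendance_ids, registration_ids):
    """Split raw attendance IDs into (matched, unmatched) against registrations."""
    registered = {normalize_id(r) for r in registration_ids}
    registered.discard(None)  # ignore registration IDs that could not be parsed
    matched, unmatched = [], []
    for raw in attendance_ids:
        target = matched if normalize_id(raw) in registered else unmatched
        target.append(raw)
    return matched, unmatched


matched, unmatched = link_attendance(["yp 42", "YP-7", "ZZ-1"], ["YP-00042", "yp0007"])
print(matched, unmatched)
```

Normalizing both sides catches formatting differences (case, separators, zero padding) automatically, leaving only genuinely missing IDs for name-based manual matching.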
📤 6. Attachments (linked or referenced)
- ✔️ Cleaned Dataset: June_Cleaned_YouthSurvey_2025.xlsx
- ✔️ Cleaning Log: June_Cleaning_Log.csv
- ✔️ Data Quality Dashboard: SayPro_DQ_Summary_June2025.pdf