From Sync Bridge to Data Warehouse:
A Production System Transformation
The Challenge
Inherited a PropertyWare-ServiceFusion integration system with fundamental limitations:
- 15-20% error rate in production
- 45-60 minute sync cycles with no recovery mechanism
- Complete data loss between syncs (no historical tracking)
- Critical bug: 40 work orders routing technicians to wrong units
- Zero visibility into sync failures
The system processed $400K+ monthly work orders but couldn't be trusted for accurate dispatching.
What I Built
Transformed a transient sync bridge into a persistent data warehouse while maintaining 100% uptime.
Architecture Evolution
Before
- Sequential processing through DynamoDB
- 45MB Lambda layer, Node.js 18.x
- Delete-after-sync pattern
- No duplicate detection
- Manual error recovery
After
- Parallel processing via SNS fan-out
- 109MB enhanced layer with deduplication service
- Persistent PostgreSQL (Supabase) with Kimball dimensional model
- Automated duplicate detection across multiple criteria
- Self-healing error recovery with exponential backoff
Key Improvements Delivered
Performance
- Sync time: 45-60 minutes → 13-20 minutes (70% reduction)
- Error rate: 15-20% → <1%
- Memory usage: 512MB limit (with OOM errors) → 216MB/1024MB (79% headroom)
- Recovery time: Hours of manual intervention → Automatic
Reliability Fixes
- Solved PropertyWare connection failures with keep-alive disable and connection headers
- Fixed unit ID corruption affecting 40 work orders through SQL correction and validation logic
- Restored STATUS_OPEN array for accurate work order categorization
- Documented and solved Lambda warm container trap causing stale code execution
Data Architecture
-- Implemented Kimball dimensional model
fact_work_orders (with SCD Type 2 history)
fact_leases
dim_portfolio, dim_building, dim_unit, dim_tenant
mapping tables for cross-system reconciliation
Operational Control
- Feature flags for ServiceFusion sync control without deployment
- Dry-run mode for safe production testing
- Configurable status mappings
- Real-time monitoring through CloudWatch and Supabase
Technical Implementation
Problem: Data Loss Between Syncs
Solution: Implemented persistent storage pattern with PostgreSQL, maintaining full historical tracking while preserving original sync logic.
Problem: PropertyWare Socket Hang-ups
Solution:
// Disabled keep-alive, added connection management
{
keepAlive: false,
headers: { 'Connection': 'close' },
timeout: 90000
}
Result: Zero connection errors in production since implementation.
Problem: Unit/Building ID Corruption
Solution: SQL correction with validation logic
UPDATE fact_work_orders
SET unit_id = NULL
WHERE unit_id = building_id
AND building_id IN (multi_unit_buildings);
Result: Correct technician routing for all work orders.
Problem: No Duplicate Detection
Solution: Built deduplication service with multi-criteria matching
- Check number validation
- Cross-system ID mapping
- Temporal matching within time windows
Architecture Decisions
Why PostgreSQL over DynamoDB: Need for complex queries, historical tracking, and dimensional modeling that NoSQL couldn't efficiently provide.
Why SNS fan-out over sequential: Reduced sync time by 70% through parallel processing while maintaining data consistency.
Why feature flags: Allow business users to control sync behavior without engineering intervention, critical for production incidents.
Current Production State
Version: Lambda v166, Layer v126 Status: Stable production since August 2025 Scale: Processing 1000+ work orders daily Uptime: 99.9% (excluding scheduled maintenance)
Technologies
- AWS: Lambda (Node.js 20.x), SNS, EventBridge, Parameter Store
- Database: Supabase (PostgreSQL) with Kimball dimensional model
- APIs: SOAP/XML (PropertyWare), REST/OAuth (ServiceFusion)
- Monitoring: CloudWatch custom metrics, Supabase real-time monitoring
- IaC: AWS SAM for deployment automation
Impact
- Eliminated manual intervention for sync failures
- Enabled historical analytics previously impossible
- Reduced technician dispatch errors by 95%
- Created foundation for predictive maintenance analytics
- Saved 3-4 hours weekly in manual error resolution
Original System Credit
Original architecture by Walter Quesada (CTO, Talisman) - provided solid foundation that served production needs 2019-2024. My work built upon his codebase, preserving core business logic while addressing architectural limitations that emerged as business scaled.
Detailed Analysis
More about me
My aim is to live a balanced and meaningful life, where all areas of my life are in harmony. By living this way, I can be the best version of myself and make a positive difference in the world. About me →