From Sync Bridge to Data Warehouse

From Sync Bridge to Data Warehouse

← Go back

From Sync Bridge to Data Warehouse:

A Production System Transformation

The Challenge

Inherited a PropertyWare-ServiceFusion integration system with fundamental limitations:

  • 15-20% error rate in production
  • 45-60 minute sync cycles with no recovery mechanism
  • Complete data loss between syncs (no historical tracking)
  • Critical bug: 40 work orders routing technicians to wrong units
  • Zero visibility into sync failures

The system processed $400K+ monthly work orders but couldn't be trusted for accurate dispatching.

What I Built

Transformed a transient sync bridge into a persistent data warehouse while maintaining 100% uptime.

Architecture Evolution

Before

  • Sequential processing through DynamoDB
  • 45MB Lambda layer, Node.js 18.x
  • Delete-after-sync pattern
  • No duplicate detection
  • Manual error recovery

After

  • Parallel processing via SNS fan-out
  • 109MB enhanced layer with deduplication service
  • Persistent PostgreSQL (Supabase) with Kimball dimensional model
  • Automated duplicate detection across multiple criteria
  • Self-healing error recovery with exponential backoff

Key Improvements Delivered

Performance

  • Sync time: 45-60 minutes → 13-20 minutes (70% reduction)
  • Error rate: 15-20% → <1%
  • Memory usage: 512MB limit (with OOM errors) → 216MB/1024MB (79% headroom)
  • Recovery time: Hours of manual intervention → Automatic

Reliability Fixes

  • Solved PropertyWare connection failures with keep-alive disable and connection headers
  • Fixed unit ID corruption affecting 40 work orders through SQL correction and validation logic
  • Restored STATUS_OPEN array for accurate work order categorization
  • Documented and solved Lambda warm container trap causing stale code execution

Data Architecture

-- Implemented Kimball dimensional model
fact_work_orders (with SCD Type 2 history)
fact_leases
dim_portfolio, dim_building, dim_unit, dim_tenant
mapping tables for cross-system reconciliation

Operational Control

  • Feature flags for ServiceFusion sync control without deployment
  • Dry-run mode for safe production testing
  • Configurable status mappings
  • Real-time monitoring through CloudWatch and Supabase

Technical Implementation

Problem: Data Loss Between Syncs

Solution: Implemented persistent storage pattern with PostgreSQL, maintaining full historical tracking while preserving original sync logic.

Problem: PropertyWare Socket Hang-ups

Solution:

// Disabled keep-alive, added connection management
{
    keepAlive: false,
    headers: { 'Connection': 'close' },
    timeout: 90000
}

Result: Zero connection errors in production since implementation.

Problem: Unit/Building ID Corruption

Solution: SQL correction with validation logic

UPDATE fact_work_orders
SET unit_id = NULL
WHERE unit_id = building_id
AND building_id IN (multi_unit_buildings);

Result: Correct technician routing for all work orders.

Problem: No Duplicate Detection

Solution: Built deduplication service with multi-criteria matching

  • Check number validation
  • Cross-system ID mapping
  • Temporal matching within time windows

Architecture Decisions

Why PostgreSQL over DynamoDB: Need for complex queries, historical tracking, and dimensional modeling that NoSQL couldn't efficiently provide.

Why SNS fan-out over sequential: Reduced sync time by 70% through parallel processing while maintaining data consistency.

Why feature flags: Allow business users to control sync behavior without engineering intervention, critical for production incidents.

Current Production State

Version: Lambda v166, Layer v126 Status: Stable production since August 2025 Scale: Processing 1000+ work orders daily Uptime: 99.9% (excluding scheduled maintenance)

Technologies

  • AWS: Lambda (Node.js 20.x), SNS, EventBridge, Parameter Store
  • Database: Supabase (PostgreSQL) with Kimball dimensional model
  • APIs: SOAP/XML (PropertyWare), REST/OAuth (ServiceFusion)
  • Monitoring: CloudWatch custom metrics, Supabase real-time monitoring
  • IaC: AWS SAM for deployment automation

Impact

  • Eliminated manual intervention for sync failures
  • Enabled historical analytics previously impossible
  • Reduced technician dispatch errors by 95%
  • Created foundation for predictive maintenance analytics
  • Saved 3-4 hours weekly in manual error resolution

Original System Credit

Original architecture by Walter Quesada (CTO, Talisman) - provided solid foundation that served production needs 2019-2024. My work built upon his codebase, preserving core business logic while addressing architectural limitations that emerged as business scaled.

Detailed Analysis

More about me

My aim is to live a balanced and meaningful life, where all areas of my life are in harmony. By living this way, I can be the best version of myself and make a positive difference in the world. About me →