Why Batch Processing Falls Short
Traditional ETL pipelines run on nightly schedules. For franchise operators making staffing decisions at 7 AM for the lunch rush, yesterday's data is not good enough. We needed a pipeline that could process events as they happen.
Our requirements were straightforward: sub-second latency from source event to queryable graph node, with exactly-once semantics to prevent duplicate transactions in financial reporting.
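Exactly-once is the harder requirement of the two. The sketch below shows the usual mechanism: an idempotency key assigned at capture time, checked before an event is applied. The names are illustrative, and the in-memory set stands in for a durable key store:

```python
# Minimal sketch of exactly-once application via idempotency keys.
# In production the "seen" check and the graph write must commit in the
# same transaction; the in-memory set here is only for illustration.
from typing import Callable

processed: set[str] = set()

def apply_exactly_once(event: dict, apply: Callable[[dict], None]) -> bool:
    """Apply an event at most once per event_id; a double-fired webhook
    becomes a cheap no-op instead of a duplicate transaction."""
    key = event["event_id"]  # assigned when the event is first captured
    if key in processed:
        return False         # duplicate delivery: drop it
    apply(event)             # transform and write to the graph
    processed.add(key)
    return True
```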
The Streaming Architecture
We settled on an architecture built around event sourcing. Four properties fall out of that choice:
- Every data change from every tool is captured as an immutable event the moment it happens: no nightly batch jobs, no stale dashboards
- Events are deduplicated before they reach the graph, so a double-fired webhook from a POS will never inflate your revenue numbers
- Transformations run atomically: a transaction, its line items, and the employee who rang it up all land in the graph together, not in pieces (see the sketch after this list)
- If a connector misses a window or a vendor API goes down, the pipeline replays from the last known-good state — you never lose a day of data
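To make the atomicity point concrete, here is a minimal sketch of a sale landing in the graph as a single unit. It assumes Neo4j and its official Python driver purely for illustration; the labels, properties, and connection details are hypothetical, not our production schema:

```python
# Illustrative sketch, assuming Neo4j and its official Python driver.
# The transaction, its line items, and the employee commit together:
# if any part fails, nothing lands in the graph.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

def write_sale(tx, sale: dict) -> None:
    tx.run(
        """
        MERGE (e:Employee {id: $emp_id})
        CREATE (t:Transaction {id: $txn_id, total: $total})
        CREATE (t)-[:RUNG_UP_BY]->(e)
        WITH t
        UNWIND $items AS item
        CREATE (t)-[:HAS_LINE_ITEM]->(:LineItem {sku: item.sku, qty: item.qty})
        """,
        emp_id=sale["employee_id"],
        txn_id=sale["id"],
        total=sale["total"],
        items=sale["items"],
    )

sale = {
    "id": "txn-1042",
    "employee_id": "emp-7",
    "total": 23.50,
    "items": [{"sku": "burger", "qty": 2}, {"sku": "soda", "qty": 1}],
}

with driver.session() as session:
    # execute_write runs write_sale inside one managed transaction
    # and retries it on transient failures.
    session.execute_write(write_sale, sale)
```

Because the whole function runs in one managed transaction, a failure anywhere (say, a malformed line item) rolls back everything, and the retry starts from a clean slate.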
Handling Connector Diversity
Every franchise tool has its own API patterns: REST, webhooks, SFTP drops, even email parsing for some legacy systems. We built an abstraction layer that normalizes all of these into a consistent event stream (sketched after the list below).
- Webhook receivers for modern APIs (Toast, Square, HubSpot)
- Polling adapters for REST-only systems with configurable intervals
- File watchers for SFTP-based integrations common in legacy franchise tech
- Email parsers for automated report digestion
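In rough strokes, the abstraction looks like this; the class and method names are illustrative, not our production code:

```python
# Simplified sketch of the connector abstraction; names are illustrative.
# Every connector, whatever its transport, reduces to "yield normalized events."
import abc
import time
import uuid
from typing import Callable, Iterator

def normalize(source: str, kind: str, payload: dict) -> dict:
    """Wrap a raw record in the common event envelope."""
    return {
        "event_id": str(uuid.uuid4()),
        "source": source,       # e.g. "toast", "square", "legacy-pos-sftp"
        "kind": kind,           # e.g. "transaction.created"
        "occurred_at": time.time(),
        "payload": payload,
    }

class Connector(abc.ABC):
    """One per upstream system: webhook receiver, poller, file watcher, ..."""

    @abc.abstractmethod
    def events(self) -> Iterator[dict]:
        """Yield normalized events in source order."""

class PollingConnector(Connector):
    """Adapter for REST-only systems; the poll interval is configurable
    per connector, since vendor rate limits vary wildly."""

    def __init__(self, source: str, fetch: Callable[[], list[dict]],
                 interval_s: float = 30.0):
        self.source = source
        self.fetch = fetch            # caller-supplied API call
        self.interval_s = interval_s

    def events(self) -> Iterator[dict]:
        while True:
            for record in self.fetch():
                yield normalize(self.source, record.get("kind", "unknown"), record)
            time.sleep(self.interval_s)
```

Webhook receivers, file watchers, and email parsers implement the same events() interface, so everything downstream sees one uniform stream regardless of transport.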
The hardest part was not the streaming infrastructure. It was convincing ourselves that a franchise POS system from 2008 could be integrated gracefully into a real-time pipeline.
Performance at Scale
In production, our pipeline processes an average of 2.3 million events per day across all connected franchise systems, with a P99 latency of 340ms from source to graph. The system auto-scales based on event volume, handling the lunch and dinner rush spikes without manual intervention.