Data Engineer
Requirements:
- Python proficiency: ETL, data pipelines, pandas, SQLAlchemy
- PostgreSQL depth: materialised views, audit tables, schema migrations, multi-tenant isolation, connection pooling
- ETL design and implementation: ID-based vs. timestamp-based change tracking, idempotency, fault-tolerant resumption, operational runbooks
- Data modelling: normalised schemas supporting both operational queries and analytical aggregations
- AWS fundamentals: RDS, S3, EventBridge, CloudWatch, Secrets Manager, Site-to-Site VPN
- HIPAA awareness: PHI boundaries, data isolation, compliant pipeline design
- Ability to debug and rewrite legacy Python codebases
Nice-to-Have:
- Multi-tenant SaaS data architecture experience (per-tenant DB patterns, cross-tenant aggregation)
- Power BI or equivalent BI tooling (Tableau, Metabase, Superset): connecting data sources, building semantic layers, and designing dashboards
- Healthcare data domain knowledge: prior authorisation workflows, CPT/HCPCS codes, payer/provider relationships, EDI 278/275
- PostgreSQL to AWS RDS migration experience
- FastAPI
- Python scheduling frameworks (Prefect, custom schedulers); note: Airflow was explicitly rejected as over-engineered
- SOC 2 or HITRUST environment experience
Responsibilities:
- ETL Rebuild:
  - Rewrite the replication pipeline from 20 legacy PostgreSQL instances to the unified data lake
  - Replace timestamp-based extraction with ID-based replication using the audit_logged_actions table
  - Build operational controls: add/remove org (with full HIPAA-compliant data purge), pause/resume per tenant
  - Implement an audit trail that tracks processed IDs per source database
  - Write requirements documentation for team-wide maintainability
- Data Lake Stabilisation & Data Modelling:
  - Own the V-Master data lake schema, materialised views, and metadata structures
  - Maintain cross-client metadata: lines of business, bucketing of ~150 statuses, payer definitions, provider data
About the Project
Our client is a private equity-backed healthcare technology company undergoing a major platform modernisation initiative.
The engagement focuses on stabilising and improving existing automation systems while simultaneously building a next-generation AI-powered platform. This transformation is designed to increase development velocity, improve system reliability, and establish a scalable technology foundation for future growth.
The environment is fast-paced, product-driven, and centred around AI-native engineering practices.