Senior Data Engineer
If you want to know about the requirements for this role, read on for all the relevant information.
£576 per day Umbrella
London – Hybrid (2 to 3 days per week)
6 Month Contract
Our client is currently searching for a Senior Data Engineer to support their team in London.
Responsibilities
• Execute migration of raw and derived datasets between on-prem and cloud data locations (e.g. GCP, Azure, AWS). Dataset sizes range from small scale (GB) to large scale (TB).
• Ensure consistency between the data ingested and the data manifests.
• Organise raw and derived data into appropriate hierarchies.
• Collaborate with AI/ML engineers and product managers to:
o Develop data pipelines for incoming batch data and update existing pipelines where necessary.
o Design and implement well-decoupled, modular, reusable and scalable scripts and code for the retrieval and pre-processing of large-scale histopathology images (each on the order of gigabytes in size) into the AI/ML pipeline.
• Document data flows and ingestion pipelines, as well as data use and re-use.
• Implement data flows connecting operational systems, analytics data and business intelligence (BI) systems (e.g. Power BI).
• Ensure completion of requisite documentation, i.e. the ingestion form and any related IHD documentation.
• Track and report completion of data migration to AI/ML and Onyx stakeholders, and raise any blockers preventing migration.
[Non-comp path requirements]
• Migrate ML pipelines from on-prem HPC solutions to the cloud.
• Migrate ML pipelines between cloud environments and across cloud computing providers.
• Optimise and parallelise these ML pipelines for scalability, speed and cost efficiency.
Experience
• 5+ years of work experience as a professional data/software engineer.
• Machine learning experience or background.
• CI/CD experience.
• Expert-level industry experience in the design, development and deployment of data engineering pipelines.
• Advanced programming expertise in Python and in developing and delivering robust software solutions.
• Advanced programming expertise in SQL and/or similar database languages.
• Experience with cloud platforms such as Google Cloud Platform, Azure and AWS (GCP preferred).
• Experience in handling big data at scale.
• Experience with large images and the data formats used in computational pathology (e.g. .svs, .tiff, .h5) would be a plus.
• Experience with business intelligence platforms, e.g. Power BI.