Airfinity is seeking a skilled and proactive Data Engineer to join its Data Science team, supporting predictive analytics by preparing and optimising large, complex datasets for infectious disease modelling and forecasting. This role is integral to delivering high-quality, model-ready data that enables our data scientists to develop insights across pharmaceuticals and infectious diseases. The Data Engineer will work collaboratively with the data science, engineering and data teams to ensure robust data pipelines and workflows, ultimately facilitating data-driven decisions that help governments, pharmaceutical companies, and NGOs safeguard public health. This role offers the opportunity to contribute to Airfinity’s mission by ensuring high data integrity and availability, paving the way for groundbreaking health predictions and insights.
Key Responsibilities:
- Design, build, and maintain reliable and efficient data pipelines to process, clean, and enrich raw data from diverse sources, ensuring it is ready for analysis by the data science team.
- Collaborate with data scientists to understand their needs, facilitating data flow that optimises model development and predictive analytics across infectious disease indicators and vaccine data.
- Implement data quality checks, validation processes, and monitoring to ensure the accuracy, consistency, and timeliness of data assets.
- Work closely with the data and software engineering teams to streamline the integration of new data sources into the data infrastructure.
- Develop and optimise databases, data lakes, and other data storage solutions to support high-volume data processing.
- Document and maintain best practices for data engineering processes, encouraging a scalable, maintainable, and repeatable data ecosystem.
- Contribute to the design and implementation of ETL (Extract, Transform, Load) pipelines to align with evolving business needs.
- Identify opportunities for automation and data processing efficiencies, contributing to faster data delivery cycles.
- Provide technical support for data-related issues, troubleshooting data discrepancies and pipeline failures as needed.
Requirements:
- Proven experience in data engineering (3-5 years), with a focus on preparing data for analytical and predictive purposes.
- Proficiency in SQL and a deep understanding of relational and non-relational databases.
- Experience with Python for data manipulation and ETL processes; familiarity with other programming languages is a plus.
- Knowledge of data warehousing and big data technologies (e.g., Spark, Hadoop) and experience working with cloud-based solutions (e.g., AWS, Azure, or Google Cloud).
- Familiarity with version control systems (e.g., Git) and CI/CD pipelines to support data engineering practices.
- Ability to manage large datasets and build scalable data architecture that supports rapid access and retrieval.
- Excellent problem-solving skills, with a proactive approach to ensuring data quality and pipeline reliability.
- Strong communication skills, with the ability to collaborate effectively with data scientists, analysts, and cross-functional teams.
- An innovative, detail-oriented mindset with a passion for using data to improve public health outcomes.
- Fluent conversational business English.
What We Offer:
- The opportunity to work with a pioneering team in predictive health analytics and contribute to a mission with real-world impact.
- A collaborative and fast-paced environment with a focus on innovation.
- The chance to work closely with a multidisciplinary team of data scientists, analysts, and researchers.