
Data Engineer


Job details
  • Oh
  • 3 days ago

About Us

 

We're building the future of uncensored AI infrastructure & products. Our technology powers hyper-immersive experiences and enables the ownership of personalized, interoperable AI characters, unlocking vast monetization opportunities across our ecosystem and beyond.

 

We are initially focused on the Creator and Social-Fi landscapes, building interoperable 'superModel' characters powered by our advanced proprietary multi-modal, uncensored AI models. These superModels can be first experienced on our platform, OhChat, with additional platform integrations in the works.


OhChat has gained 70,000 users across 174 countries in a matter of weeks. The site lets users enjoy hyper-immersive experiences with digital AI characters, enabling real-time interactions and uncensored exchanges with original characters as well as ‘digital twins’ based on celebrities and real-world creators, launched in partnership with them.


Website: https://chat.oh.xyz/


Job Overview


As a Data Engineer at Oh, you will play a crucial role in building and optimizing our data pipeline and infrastructure. You’ll be responsible for data collection, particularly large-scale image scraping, and managing structured and unstructured datasets for training generative AI models. You will work closely with machine learning engineers and developers to ensure data quality, availability, and scalability.


Key Responsibilities


  • Data Pipeline Development: Design, build, and maintain data pipelines to support the collection, ingestion, and processing of large-scale image, video, and audio datasets.
  • Data Scraping and Collection: Develop and optimize web scraping scripts to collect high-quality multimedia datasets.
  • Data Storage and Management: Implement efficient storage solutions for large volumes of structured and unstructured data, ensuring data accessibility and scalability.
  • ETL Processes: Develop and manage ETL processes to transform raw data into formats suitable for model training.
  • Data Quality Assurance: Ensure data quality and consistency across different sources. Implement monitoring tools and workflows to maintain data accuracy and relevance.
  • Documentation: Maintain clear documentation of data sources, scraping processes, and pipeline workflows for team reference and reproducibility.


Required Skills & Qualifications


  • Programming Languages: Proficiency in either Python or JavaScript for data scraping, ETL, and pipeline development.
  • Web Scraping: Experience with web scraping tools and libraries (e.g., BeautifulSoup, Scrapy).
  • Data Storage and Processing: Experience with databases (SQL and NoSQL, such as PostgreSQL, MongoDB) and cloud storage and warehousing (e.g., AWS S3, Amazon Redshift).
  • Data Pipeline and Workflow Orchestration: Familiarity with data pipeline tools such as Apache Airflow, Prefect, or Luigi.
  • Data Transformation: Strong knowledge of data transformation and processing techniques and libraries (e.g., Pandas, Dask for Python).
  • Data Quality Control: Experience with data quality monitoring tools (e.g., dbt, Great Expectations).
  • Version Control: Proficiency in using Git for version control, as well as data versioning tools (e.g., DVC).
  • Pipeline Monitoring: Strong experience implementing and owning pipeline monitoring stacks (e.g., Sentry, Grafana, AWS CloudWatch).
  • Testing and Code Quality: Extensive experience with common frameworks for unit, behavioural, integration, and end-to-end testing (e.g., Pytest, Behave, Postman) and with general code quality tools and principles (e.g., Ruff, MyPy, Bandit, Black).


Preferred Qualifications


  • Experience in Generative AI Data Collection: Understanding of the types of data needed for training generative AI models (e.g., GANs, LLMs, diffusion models).
  • Knowledge of ML/DL Basics: Familiarity with machine learning concepts, particularly around data needs for training and evaluation in the context of generative models.
  • Familiarity with Blockchain: Though not mandatory, a keen interest in the blockchain ecosystem and data sources is an advantage.
  • Data Governance: Understanding of legal and ethical implications of data collection, including copyright and privacy concerns.
  • Experience with Image and Video Processing: Familiarity with libraries for image processing (e.g., OpenCV, PIL) and video data handling is a plus.
  • Big Data Experience: Familiarity with big data tools and frameworks (e.g., Spark, Hadoop) is a plus.
  • DevOps: Some experience with common DevOps tools (e.g., CI/CD pipelines, Terraform/CDK, Docker) and best practices is a bonus.


As part of our team, you’ll enjoy:


  • The hustle of a startup with the impact of a global business
  • Tremendous opportunity to join a business pioneering the future of AI
  • Working with an extraordinary team of smart, creative, fun and highly motivated people
  • Flexible working hours, including remote working
  • Modern, uplifting work environment
  • Pension scheme
  • Generous starting salary

 

