As a Data Engineer Intern, you will develop and optimize ETL/ELT pipelines that move data from HR source systems into our Data Warehouse. Key projects include managing data orchestration with Astronomer (Airflow), querying large datasets across S3 using Trino/Athena, and refining the DWH schema for better query performance.
Orchestration: Basic understanding of workflow tools like Airflow/Astronomer to schedule and monitor data jobs.
Cloud Storage: Experience navigating and managing data objects within AWS S3.
Processing: Familiarity with using Trino or Athena for distributed SQL querying and data exploration.
Advanced SQL: Proficiency in writing complex queries, including Common Table Expressions (CTEs) and window functions.
Schema Design: Understanding of data warehousing schemas such as star and snowflake models.
Scripting: Basic Python skills for interacting with APIs and automating data movements.
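The SQL and scripting requirements above can be sketched together in one short example. This is a minimal illustration, not production code: an in-memory SQLite database stands in for the warehouse, and the `hires` table, its columns, and its values are all hypothetical.

```python
import sqlite3

# Hypothetical sample data standing in for an HR source extract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hires (dept TEXT, name TEXT, salary INTEGER);
    INSERT INTO hires VALUES
        ('HR', 'Ana', 50000),
        ('HR', 'Bo', 60000),
        ('IT', 'Cy', 70000);
""")

# A CTE stages the rows; a window function then ranks salaries
# within each department without collapsing the result set.
query = """
WITH dept_hires AS (
    SELECT dept, name, salary FROM hires
)
SELECT dept,
       name,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM dept_hires
ORDER BY dept, salary_rank;
"""

for row in conn.execute(query):
    print(row)
```

The same CTE/window-function pattern carries over directly to Trino and Athena, which share this standard SQL syntax.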
Nice to have
Experience with dbt
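For context on the dbt item: a dbt model is just a SELECT statement saved as a `.sql` file, which dbt materializes as a table or view in the warehouse. A minimal sketch, assuming a hypothetical `hr` source with a `raw_employees` table (all names here are illustrative):

```sql
-- models/staging/stg_employees.sql (hypothetical model)
SELECT
    id AS employee_id,
    department,
    hire_date
FROM {{ source('hr', 'raw_employees') }}
```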