Minimum Qualifications
- 5+ years of relevant experience as a data engineer.
- 2+ years of relevant experience as a senior, staff, or principal data engineer.
- Excellent knowledge of Python, SQL, and Apache Spark.
- Excellent knowledge of at least one real-time data processing framework, such as Spark Streaming, Flink, or Kafka.
- Demonstrated ability to design and build high-volume batch and streaming pipelines.
- Demonstrated ability to design and build scalable data infrastructure.
- Working experience designing and implementing data quality checks and alerting systems.
- Working experience optimizing SQL queries in an OLAP cluster (e.g., data partitioning, bucketing, z-ordering, indexing); a brief sketch of this kind of tuning follows this list.
- You are coachable: able to own mistakes, reflect, and take feedback with maturity and a willingness to improve.
- Strong written and verbal communication skills.
- Bachelor’s degree in a technical field or equivalent work experience.
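For context on the OLAP tuning item above, here is a minimal sketch, assuming PySpark and a Delta-backed warehouse; all table and column names (raw.payments, analytics.payments_fact, merchant_id, transaction_date) are hypothetical, not part of an actual Xendit codebase.

```python
# Minimal sketch of OLAP layout tuning in Spark; all names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("olap-layout-sketch").getOrCreate()

payments = spark.table("raw.payments")  # hypothetical source table

(
    payments.write
    .partitionBy("transaction_date")   # prune whole partitions on date filters
    .bucketBy(64, "merchant_id")       # co-locate rows to avoid shuffles on merchant_id joins
    .sortBy("merchant_id")
    .mode("overwrite")
    .saveAsTable("analytics.payments_fact")
)

# On a Delta-backed warehouse (e.g. Databricks SQL), z-ordering plays a similar
# role for high-cardinality filter columns; Delta tables do not use bucketBy,
# so this would apply to a separate Delta table:
# spark.sql("OPTIMIZE analytics.payments_delta ZORDER BY (merchant_id)")
```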
Preferred Qualifications
- Working experience developing a Python library used by internal teams, including best practices for development and deployment.
- You have good knowledge of Terraform, CloudFormation, or other infrastructure-as-code (IaC) tools.
- You have built data products that have scaled on AWS or another cloud.
- You have experience with modern data tools such as Airflow, dbt, Kafka, Trino, Databricks, Looker, or similar (a minimal Airflow sketch follows this list).
- You have experience with different databases and SQL engines and understand their trade-offs (e.g., Trino, Druid, PostgreSQL, MongoDB).
- You have worked with sensitive data and ensured secure access through a data governance tool.
- You have worked on building a data platform that enables stakeholders to self-serve.
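As a minimal illustration of the orchestration tooling mentioned above, the sketch below defines a hypothetical Airflow DAG with a single placeholder task; the DAG id, task, and schedule are assumptions and presume Airflow 2.4+.

```python
# Minimal sketch of an Airflow DAG; all names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_daily_payments():
    # Placeholder load step; in practice this would submit a Spark job.
    print("Loading daily payments")


with DAG(
    dag_id="daily_payments_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_daily_payments",
        python_callable=load_daily_payments,
    )
```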
Responsibilities
- Improve data pipeline logic to ensure reliability, scalability, and cost efficiency (Python, Spark, Airflow).
- Ensure fast and reliable execution of analytical queries by building a robust OLAP cluster (Trino, Terraform, Databricks SQL warehouse).
- Design and implement a data governance policy to minimize security risks (Unity Catalog, Apache Ranger).
- Own the design and development of the Data Engineering team's internal library (Python, Spark, dbt).
- Enable internal teams to build real-time applications on top of the data lakehouse (Spark Streaming, Kafka).
- Automate common data requests and unlock self-service (Retool, Flask).
- Ensure high data quality through automated tests and data contracts (dbt, Great Expectations); a minimal sketch of such a check follows this list.
- Improve the deployment process for various applications (Buddy).
- Collaborate with other data engineers, analysts, and business users to design effective solutions.
- Guide junior engineers and set engineering standards for the team.
- Minimize incident detection and recovery time and ensure the team meets its key metrics and SLOs.
- Research innovative technologies and integrate them into our data infrastructure.
- Do whatever it takes to make Xendit succeed.
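For the data quality responsibility above, here is a dependency-light sketch using plain PySpark, with hypothetical table and column names (analytics.payments_fact, payment_id, amount), of the kind of contract check that dbt tests or Great Expectations would formalize.

```python
# Minimal sketch of automated data quality checks; all names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

payments = spark.table("analytics.payments_fact")  # hypothetical table

# Contract 1: the primary key must be non-null and unique.
null_keys = payments.filter(F.col("payment_id").isNull()).count()
dup_keys = payments.groupBy("payment_id").count().filter("count > 1").count()

# Contract 2: amounts must be non-negative.
negative_amounts = payments.filter(F.col("amount") < 0).count()

violations = {
    "null_payment_id": null_keys,
    "duplicate_payment_id": dup_keys,
    "negative_amount": negative_amounts,
}

failed = {name: n for name, n in violations.items() if n > 0}
if failed:
    # In a real pipeline this would also alert the on-call channel.
    raise ValueError(f"Data quality checks failed: {failed}")
```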