As a Data Engineer, you will play a crucial role in designing, developing, and maintaining the data architecture, infrastructure, and systems required for efficient data processing, storage, and retrieval. Your primary responsibility will be to ensure the availability and reliability of data for use by data analysts, data scientists, and other stakeholders within the organization.
Job Requirements:
- A minimum of 2 years of experience as a Data Engineer, Data Warehouse Engineer, or ETL Developer
Programming Languages
- Python: Widely used for data processing, ETL (Extract, Transform, Load) tasks, and scripting.
- Java/Scala: Commonly used for distributed computing frameworks like Apache Spark.
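The ETL work mentioned above can be illustrated with a minimal Python sketch. The input data, field names, and in-memory "target" are hypothetical, standing in for a real source file and warehouse table:

```python
import csv
import io

# Hypothetical CSV extract standing in for a real source system.
RAW = """id,amount,currency
1,10.50,usd
2,3.25,USD
3,7.00,usd
"""

def extract(text):
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and normalise currency codes."""
    return [
        {"id": int(r["id"]),
         "amount": float(r["amount"]),
         "currency": r["currency"].upper()}
        for r in rows
    ]

def load(rows, target):
    """Load: append rows to an in-memory list standing in for a table."""
    target.extend(rows)
    return target

table = []
load(transform(extract(RAW)), table)
print(len(table), "rows loaded")
```

Real pipelines replace each stage with connectors to actual sources and sinks, but the extract → transform → load shape stays the same.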
Big Data Technologies
- Apache Hadoop: Distributed storage and processing framework.
- Apache Spark: In-memory data processing for large-scale data processing and analytics.
- Apache Kafka: Distributed streaming platform for building real-time data pipelines.
- Apache HBase: Distributed, scalable NoSQL database for fast random read/write access to large datasets.
Databases
- SQL (Structured Query Language) and relational databases (RDBMS): Essential for querying and managing structured data.
- NoSQL Databases (e.g., MongoDB, Cassandra): Useful for handling unstructured or semi-structured data.
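As a small illustration of the relational side, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a production RDBMS; the `orders` table and its data are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for a production RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)],
)

# Aggregate query of the kind used in reporting workloads.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 150.0), ('bob', 75.5)]
conn.close()
```

The same `GROUP BY` / aggregate pattern carries over to any SQL engine; only the connection setup changes.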
Data Warehousing
- Dimensional Modelling: Designing data warehouses for efficient querying and reporting.
- Data profiling, validation, and cleansing techniques: Ensuring data accuracy and quality.
- Unit testing and integration testing for ETL processes.
- Apache NiFi, Apache Airflow, Talend, Informatica, or Pentaho: Tools for extracting, transforming, and loading (ETL) data.
Cloud Platforms
- Experience using Google Cloud Platform (GCP) or AWS.
Reporting and Visualization
- Tableau, Power BI, etc.: Reporting and dashboarding tools.
Scripting
- Shell scripting: Automating routine tasks and processes.
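The unit-testing expectation for ETL processes listed above can be sketched in Python. The `cleanse` function and its validation rules are hypothetical, illustrating how a single transform step is tested in isolation:

```python
def cleanse(records):
    """Drop incomplete records and normalise email values
    (a simplified stand-in for a real cleansing step)."""
    cleaned = []
    for r in records:
        # Validation: discard records missing required fields.
        if r.get("id") is None or r.get("email") is None:
            continue
        cleaned.append({"id": r["id"], "email": r["email"].strip().lower()})
    return cleaned

def test_drops_incomplete_records():
    assert cleanse([{"id": 1}, {"email": "x@y.com"}]) == []

def test_normalises_email():
    out = cleanse([{"id": 1, "email": "  Foo@Example.com "}])
    assert out == [{"id": 1, "email": "foo@example.com"}]

# Run the tests directly; a runner such as pytest would discover
# these test_* functions automatically.
test_drops_incomplete_records()
test_normalises_email()
print("all ETL unit tests passed")
```

Integration tests extend the same idea: run the full pipeline against a small known input and assert on the rows that reach the target.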