Your responsibilities in this role:
- Perform data exploration, cleaning, imputation, and feature engineering on unstructured and structured data (a brief sketch of this kind of work follows this list).
- Build the infrastructure for optimal extraction, transformation, and loading (ETL) of data from a wide variety of data sources.
- Develop and maintain optimal data pipeline architecture for training statistical and machine learning models such as regression and classification.
- Develop and maintain evaluations that measure the effectiveness of training data, including measuring model capabilities across a variety of tasks and domains.
- Collaborate with data scientists and machine learning engineers to develop a comprehensive data science/machine learning solution pipeline.
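To give a concrete flavor of the cleaning, imputation, and feature-engineering work above, here is a minimal illustrative sketch using pandas and scikit-learn; the dataset and column names (age, income, city) are hypothetical, not from an actual pipeline.

```python
# Illustrative only: the dataset and column names are hypothetical.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame(
    {
        "age": [34, None, 29, 51],
        "income": [72000.0, 58000.0, None, 91000.0],
        "city": ["Austin", "Boston", "Austin", None],
    }
)

# Impute missing numeric values with the column median.
num_cols = ["age", "income"]
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

# Fill missing categoricals, then one-hot encode them as model features.
df["city"] = df["city"].fillna("unknown")
df = pd.concat([df, pd.get_dummies(df["city"], prefix="city")], axis=1)

# One simple engineered feature: income normalized by age.
df["income_per_age"] = df["income"] / df["age"]
print(df.head())
```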
Requirements: What you need to have (Minimum Qualifications):
- Bachelor’s degree in computer science or a related field, or equivalent software engineering experience.
- Proficiency in the Python programming language.
- Experience in dataset processing and feature engineering using tools such as NumPy, pandas, and scikit-learn.
- Visualization skills using tools such as Matplotlib, Seaborn, and Bokeh.
- Understanding of deep learning frameworks such as PyTorch and TensorFlow.
- Understanding of SQL and NoSQL databases.
- Understanding of Hadoop, Spark, Kafka, Hive, and Presto (see the Spark sketch after this list).
- Proficiency with source control, e.g., Git.
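As a rough illustration of the Spark line above, a minimal PySpark aggregation sketch follows; the input path and column names are hypothetical, and the same query could equally be written in SQL or Hive.

```python
# Illustrative only: the path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example-aggregation").getOrCreate()

# Read a partitioned Parquet dataset and compute daily event counts,
# the same shape of query you might otherwise express in SQL or Hive.
events = spark.read.parquet("s3://example-bucket/events/")
daily_counts = (
    events.groupBy("event_date", "event_type")
    .agg(F.count("*").alias("n_events"))
    .orderBy("event_date")
)
daily_counts.show()
```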
Preferred: What would make you stand out from the crowd (Preferred Qualifications):
- Deep understanding of object-oriented programming (OOP) concepts such as inheritance, delegation, and abstract classes.
- Understanding of cloud platforms such as AWS, GCP, and Azure.
- Experience using Docker.
- Experience using AWS services such as S3, EC2, Glue, and SageMaker (a short S3 sketch follows this list).
- Experience with AWS Step Functions and/or AWS Lambda is a plus.
- Proficiency in the Scala and Java programming languages.
- Enjoys iterating quickly on research prototypes and learning new technologies.
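For the S3 item above, a minimal boto3 sketch is shown below; the bucket and key names are hypothetical, and AWS credentials are assumed to be configured in the environment.

```python
# Illustrative only: bucket/key names are hypothetical; assumes AWS
# credentials are already configured (e.g., via environment variables).
import boto3

s3 = boto3.client("s3")

# Download a (hypothetical) training dataset from S3 to local disk.
s3.download_file("example-bucket", "datasets/train.csv", "/tmp/train.csv")

# List a few objects under a prefix to verify what is available.
resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="datasets/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```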