Montreal, Canada or Remote
Data Engineer
Full-Time, Part-Time, or Contract
About the Role
We are seeking a talented Data Engineer with extensive experience in Python, SQL, NoSQL, and API creation and integration for near-live data streaming. The Data Engineer will play a vital role in designing, building, and maintaining data pipelines, ensuring the efficient flow of information within the organization. The ideal candidate will have a strong understanding of data architecture, a passion for working with large-scale data, and the ability to deliver insights that drive business success in the retail sector.
Responsibilities
- Design and Develop Data Pipelines: Create and maintain scalable data pipelines using Python, ensuring that data is accessible, consistent, and reliable. Develop, construct, test, and maintain data architectures such as data lakes, databases, and large-scale processing systems.
- Implement ETL Processes: Build, schedule, and maintain ETL jobs using AWS Glue and PySpark (see the first sketch following this list).
- Create and Integrate APIs for Near-Live Data Streaming: Develop and manage APIs to facilitate near-live data streaming, integrating with various systems and platforms (see the second sketch following this list).
- Collaborate with Data Architects: Work closely with data architects to implement data models, data lakes, and data warehouses that align with organizational goals and industry standards, and to deliver complex data projects.
- Collaborate with Data Scientists: Assist data scientists with data-related technical issues and support their data infrastructure needs.
- Implement Data Integration Solutions: Develop and manage data integration strategies, including pub/sub and data streaming services, to support various platforms and systems.
- Optimize Performance: Monitor and optimize the performance of data systems to ensure smooth operations and optimal resource utilization.
- Ensure Data Compliance: Establish data governance policies and adhere to regulatory requirements, maintaining data integrity and security.
- Support AI and Machine Learning Initiatives: Collaborate with AI teams to provide data support for machine learning models and algorithms.
- Evaluate New Technologies: Assess and implement new technologies and tools that align with the company's vision, including cloud platforms such as AWS, GCP, and Azure.
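To give candidates a feel for the day-to-day work, here is a minimal PySpark sketch of the kind of ETL job this role involves. The bucket paths and column names (event_timestamp, store_id, amount) are illustrative assumptions, not references to our actual systems.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("retail-sales-etl").getOrCreate()

# Extract: read raw sales events from the data lake (path is illustrative)
raw = spark.read.json("s3://example-data-lake/raw/sales/")

# Transform: derive a date column and aggregate revenue per store per day
daily = (
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .groupBy("store_id", "event_date")
       .agg(F.sum("amount").alias("daily_revenue"))
)

# Load: write partitioned Parquet into the curated zone of the lake
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-data-lake/curated/daily_revenue/"
)
```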
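And a minimal sketch of a near-live streaming endpoint, using FastAPI with server-sent events as one possible approach. The framework choice, endpoint path, and event payload are illustrative assumptions rather than a description of our stack.

```python
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def sales_events():
    # A real implementation would read from a queue or change stream;
    # this loop simply emits a placeholder event once per second.
    counter = 0
    while True:
        event = {"event_id": counter, "type": "sale"}
        yield f"data: {json.dumps(event)}\n\n"
        counter += 1
        await asyncio.sleep(1)

@app.get("/stream/sales")
async def stream_sales():
    # text/event-stream lets clients consume events as they arrive
    return StreamingResponse(sales_events(), media_type="text/event-stream")
```

Run with `uvicorn app:app` and consume the stream from any SSE-capable client.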
Skills
- Python, PySpark
- AWS Glue, S3, data lakes
- ETL processes
- Terraform (a plus)
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- 3+ years of experience in data engineering, data modeling, or related areas.
- Extensive experience with Python and the ability to develop efficient data pipelines and processes.
- Knowledge of and experience with SQL and NoSQL databases, including how to manipulate and analyze data effectively.
- Experience in API creation and integration for near-live data streaming, with a deep understanding of data synchronization and real-time processing.
- Familiarity with data lake and data warehouse technologies, and experience with at least one large-scale data implementation.
- Knowledge of major cloud platforms such as AWS, GCP, and Azure, and experience with pub/sub and data streaming services.
- Strong understanding of the retail industry, with a focus on technology-driven solutions.