Senior Lead Engineer – Data Engineering
We are seeking a Senior Data Engineer with in-depth knowledge of Databricks and Unity Catalog to serve as the subject matter expert for all things Databricks within our organization. This role requires deep Databricks expertise, including CI/CD setup, data lineage through Unity Catalog, and strong proficiency in ETL, SQL, and modern data engineering practices. You will be the go-to person for designing, implementing, and optimizing data solutions on Databricks.
Key Responsibilities:
- Serve as the point of contact and subject matter expert for all Databricks-related activities, including architecture, development, and operational best practices.
- Work closely with the Sales team: propose data roadmaps to prospects planning a migration to the cloud and create proofs of concept to showcase our expertise.
- Design, develop, and manage ETL/ELT pipelines in Databricks using Python (PySpark), integrating various data sources to support business operations.
- Leverage Unity Catalog to ensure data lineage, security, and governance are properly managed across the Databricks environment.
- Implement and maintain CI/CD pipelines for Databricks, ensuring smooth deployments, version control, and automation using Git and other DevOps tools.
- Build scalable data architectures, including Data Lakes, Lakehouses, and Data Warehouses, ensuring efficient data management and accessibility.
- Configure and optimize Databricks clusters, jobs, and workflows for both batch and streaming data processing to handle large-scale datasets.
- Stay up-to-date with the latest Databricks features and advancements, continuously enhancing our data engineering practices.
- Collaborate with cross-functional teams to implement data governance and ensure compliance with security and industry regulations.
- Monitor and tune Databricks workloads to ensure high performance and scalability, adapting to business needs as required.
- Provide training, guidance, and mentorship to fellow cloud engineers, ensuring adherence to best practices and fostering a collaborative environment.
Qualifications:
- 5+ years of experience in data engineering with significant expertise in Databricks and Apache Spark.
- Proficient in Unity Catalog for managing data lineage, security, and governance within the Databricks ecosystem.
- Experience estimating and migrating legacy data warehouse workloads to Azure/hybrid cloud environments.
- Experience building and optimizing ETL pipelines using tools like Azure Data Factory, Informatica, or similar.
- Strong understanding of CI/CD practices with experience in Git for version control and integration with Databricks.
- Expertise in SQL development and performance tuning for large-scale datasets.
- Knowledge of the Azure ecosystem, including data services like Azure Data Factory, Azure Data Lake and Azure Storage.
- Ability to work with both batch and streaming data processing pipelines.
- Experience with data modeling and dimensional design (e.g., star schema).
- Good understanding of data governance, compliance, and security best practices.
- Excellent communication and problem-solving skills, with the ability to manage multiple priorities.
- Ability to stay current on Databricks innovations and proactively introduce new features and capabilities to the team.