Job Description
• Data Engineering: Strong foundation in data engineering principles, ETL/ELT processes, and data pipeline design patterns
• PySpark: Proven hands-on experience developing data pipelines using PySpark, including the DataFrame API, Spark SQL, and performance optimization
• Databricks Platform: Practical experience with the Databricks workspace, cluster management, notebooks, and job orchestration
• Workspace AI Agent: Knowledge of Databricks Workspace AI Agent capabilities and integration
• Data Modelling: Experience implementing data models, including dimensional modelling, Data Vault, or lakehouse architectures
• Delta Lake: Understanding of Delta Lake features, including ACID transactions, schema evolution, and optimization techniques
• Python: Strong Python programming skills for data processing and automation
• SQL proficiency for data querying and transformation
• Experience with cloud platforms (Azure, AWS, or GCP)
• Knowledge of streaming data processing (Structured Streaming)
• Familiarity with DevOps practices and CI/CD pipelines
• Experience with version control systems (Git)
Requirements
• Must have experience in data engineering or related roles
• Hands-on experience with the Databricks platform
• Proven track record of refactoring legacy code to modern frameworks
• Experience building and maintaining production data pipelines at scale
• Background working across multiple data sources and formats
• Experience in agile development environments
• Databricks Certified Data Engineer Associate or Databricks Certified Data Engineer Professional certification
Additional Certifications (Preferred)
• Databricks Certified Associate Developer for Apache Spark
• Cloud platform certifications (Azure Data Engineer Associate, AWS Certified Data Analytics, or Google Cloud Professional Data Engineer)
• Relevant data engineering or big data certifications