Job Description
As a big data engineer at Zarrin Roya, your mission is to design, develop, and maintain distributed systems for processing large-scale datasets. You will also be responsible for building reliable, scalable, and efficient data infrastructure that empowers our teams to access data and run complex analytics in real time.
Key Responsibilities:
- Build and maintain scalable ETL pipelines for batch and real-time data processing (see the sketch after this list).
- Design and implement distributed SQL query solutions on top of Trino and Apache Spark for large-scale data analytics.
- Optimize data processing workflows to support low-latency querying and real-time analytics.
- Work with table formats and storage systems such as Apache Iceberg, MinIO, and Ceph to ensure efficient data management and accessibility.
- Collaborate with data scientists and analysts to deliver solutions for data modeling, data transformation, and real-time analytics.
- Troubleshoot and optimize the performance of distributed data systems to meet SLA requirements.
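To give candidates a concrete feel for this kind of pipeline work, here is a minimal sketch of a Spark Structured Streaming job that reads events from Kafka and appends them to an Apache Iceberg table. The broker address, topic name, event schema, and catalog configuration are illustrative assumptions, not a description of our production setup.

```python
# Minimal sketch: stream events from Kafka into an Iceberg table with PySpark.
# Assumes the spark-sql-kafka and iceberg-spark-runtime packages are on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = (
    SparkSession.builder
    .appName("events-stream")
    # Hypothetical Iceberg catalog backed by object storage (e.g. MinIO via s3a).
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3a://warehouse/")
    .getOrCreate()
)

# Assumed shape of the incoming JSON events.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event", StringType()),
    StructField("ts", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "events")                         # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append each micro-batch to the Iceberg table; checkpointing makes the job restartable.
query = (
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://warehouse/checkpoints/events")
    .toTable("demo.analytics.events")
)
query.awaitTermination()
```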
Requirements:
- Proficiency in Python or Scala for building data processing systems.
- Experience with Apache Kafka or Redpanda for streaming data pipelines.
- Solid understanding of Trino and Apache Spark for distributed SQL querying and real-time data processing (a minimal client example follows this list).
- Hands-on experience with Apache Iceberg for managing versioned data lakes.
- Familiarity with object storage solutions like MinIO and Ceph for self-hosted environments.
- Experience with ClickHouse or other OLAP systems for high-performance analytics.
- Knowledge of workflow orchestration and transformation tools such as Apache Airflow and dbt.
- Strong problem-solving, analytical, and communication skills.
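On the querying side, the sketch below shows how an analyst-facing service might run an aggregate query against Trino using the trino-python-client; the coordinator host, catalog, and table names are hypothetical and assume the events table from the pipeline sketch above.

```python
# Minimal sketch: query an Iceberg table through Trino with the trino-python-client.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # hypothetical coordinator address
    port=8080,
    user="analytics",
    catalog="iceberg",  # hypothetical catalog name
    schema="analytics",
)

cur = conn.cursor()
# Count events per type over the last hour; assumes the events table sketched above.
cur.execute(
    """
    SELECT event, count(*) AS n
    FROM events
    WHERE ts > current_timestamp - INTERVAL '1' HOUR
    GROUP BY event
    ORDER BY n DESC
    """
)
for event, n in cur.fetchall():
    print(f"{event}: {n}")
```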