Job Description

Your mission as a Big Data Engineer at Zarrin Roya is to design, develop, and maintain distributed systems for processing large-scale datasets. You will also be responsible for building reliable, scalable, and efficient data infrastructure that empowers our teams to access data and run complex analytics in real time.

Key Responsibilities:

  • Build and maintain scalable ETL pipelines for batch and real-time data processing (see the pipeline sketch after this list).
  • Design and implement distributed SQL querying on Trino and Apache Spark for large-scale data analytics.
  • Optimize data processing workflows to support real-time querying and analytics.
  • Work with data storage systems like Apache Iceberg, MinIO, and Ceph to ensure efficient data management and accessibility.
  • Collaborate with data scientists and analysts to deliver solutions for data modeling, data transformation, and real-time analytics.
  • Troubleshoot and optimize the performance of distributed data systems to meet SLA requirements.
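
To give a concrete, purely illustrative picture of this stack, the sketch below shows a minimal PySpark Structured Streaming job that consumes JSON events from a Kafka/Redpanda topic and appends them to an Apache Iceberg table backed by MinIO. All names (catalog, warehouse bucket, topic, table, broker address) are hypothetical, and the Kafka, Iceberg, and S3A runtime packages are assumed to be on the Spark classpath; this is a sketch of the pattern, not a description of Zarrin Roya's actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Catalog, bucket, topic, and table names are hypothetical; the S3A endpoint
# and credentials for MinIO are assumed to be configured elsewhere.
spark = (
    SparkSession.builder
    .appName("events-to-iceberg")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://warehouse/")
    .getOrCreate()
)

# Target Iceberg table; streaming writes expect the table to exist.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.events (
        user_id STRING, event_type STRING, ts TIMESTAMP)
    USING iceberg
""")

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("ts", TimestampType()),
])

# Consume JSON events from a Kafka/Redpanda topic and parse them.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "redpanda:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append the parsed stream to the Iceberg table, checkpointing to MinIO.
(
    events.writeStream
    .format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://warehouse/checkpoints/events")
    .toTable("lake.analytics.events")
    .awaitTermination()
)
```

Because Iceberg tables live in a shared catalog on object storage, the same table can then be queried through Trino's Iceberg connector, which is how a stack like this serves both streaming ingestion and interactive SQL analytics.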

Requirements:

  • Proficiency in Python or Scala for building data processing systems.
  • Experience with Apache Kafka or Redpanda for streaming data pipelines.
  • Solid understanding of Trino and Apache Spark for distributed SQL-based querying and real-time data processing.
  • Hands-on experience with Apache Iceberg for managing versioned data lakes.
  • Familiarity with object storage solutions like MinIO and Ceph for self-hosted environments.
  • Experience with ClickHouse or other OLAP systems for high-performance analytics.
  • Knowledge of data orchestration tools like Apache Airflow and dbt for managing workflows (a minimal DAG sketch follows this list).
  • Strong problem-solving and analytical skills, and effective communication.
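
As a small illustration of the orchestration side, here is a minimal Airflow DAG (Airflow 2.4+ syntax) that runs dbt transformations on a daily schedule. The DAG id, schedule, and project paths are hypothetical placeholders, not a prescribed setup.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily workflow: refresh dbt models after upstream ingestion.
with DAG(
    dag_id="daily_dbt_transformations",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        # Paths are placeholders; point these at your dbt project and profiles.
        bash_command="dbt run --project-dir /opt/dbt/project --profiles-dir /opt/dbt",
    )
```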