Startale
Web3 for Billions
Remote
Full Time

Data Infrastructure Engineer (Big Data)

Backend Engineer - Data

Key Responsibilities

  • Big Data Processing: Implement and manage big data processing systems. Experience or strong interest in big data implementation is required.
  • Data Pipeline Implementation: Develop and maintain robust data processing pipelines. Candidates should have experience or a strong interest in data pipeline architectures.
  • Batch vs. Real-Time Processing: Clearly articulate the differences in implementation strategies between batch processing and real-time APIs.
  • Streaming Data Responsibilities: Explain the separation of responsibilities between producers and consumers in streaming data processes.
  • Database Management: Build and operate databases storing over 10 GiB of data, ensuring efficiency and scalability.
  • Data Platform Operations: Operate platforms such as Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Experience in managing these or similar platforms is highly desirable.

Qualifications and Skills

  • Experience with Big Data: Proven track record in handling large-scale data projects, with specific skills in time-series databases, streaming data processing, and multi-tiered database architectures.
  • Data Warehousing and Data Lakes: Hands-on experience with data warehouse and data lake technologies, including an understanding of the Lambda architecture.
  • Technical Proficiency: Strong technical skills in relevant big data technologies and frameworks.
  • Problem Solving: Excellent analytical and problem-solving skills, capable of managing complex data challenges.
  • Communication: Effective communication skills, with the ability to document and explain data processes clearly to both technical and non-technical stakeholders.

Nice to Have

  • Cloud Experience:
    • Experience with cloud platforms such as AWS, Google Cloud Platform (GCP), or Microsoft Azure.
    • Knowledge of cloud-based data storage solutions (e.g., S3, Google Cloud Storage, Azure Blob Storage).
    • Familiarity with cloud-based data processing services (e.g., AWS Lambda, Google Cloud Dataflow, Azure Data Factory).
    • Experience with cloud infrastructure automation and management tools (e.g., Terraform, CloudFormation, Ansible).
  • Machine Learning Integration: Understanding of integrating machine learning models into data pipelines.
  • DevOps Practices: Experience with DevOps practices and tools for continuous integration and deployment (CI/CD).
  • Data Security: Knowledge of data security best practices and compliance standards in cloud environments.
  • Visualization Tools: Experience with data visualization tools and platforms (e.g., Tableau, Power BI, Looker).
  • Programming Languages: Proficiency in additional programming languages relevant to data processing and backend development (e.g., Scala, Go, Rust).