Remote, Full Time
Data Infrastructure Engineer (Big Data)
Backend Engineer - Data
Key Responsibilities
- Big Data Processing: Implement and manage big data processing systems. Experience or strong interest in big data implementation is required.
- Data Pipeline Implementation: Develop and maintain robust data processing pipelines. Candidates should have experience or a strong interest in data pipeline architectures.
- Batch vs. Real-Time Processing: Clearly articulate the differences in implementation strategies between batch processing and real-time APIs.
- Streaming Data Responsibilities: Explain the separation of responsibilities between producers and consumers in streaming data processes.
- Database Management: Build and operate databases storing over 10 GiB of data, ensuring efficiency and scalability.
- Data Platform Operations: Operate platforms such as Amazon Redshift, Google BigQuery, Snowflake, and Databricks. Experience in managing these or similar platforms is highly desirable.
Qualifications and Skills
- Experience with Big Data: Proven track record in handling large-scale data projects, with specific skills in time-series databases, streaming data processing, and multi-tiered database architectures.
- Data Warehousing and Data Lakes: Hands-on experience with data warehouse and data lake technologies, including an understanding of the Lambda architecture.
- Technical Proficiency: Strong technical skills in relevant big data technologies and frameworks.
- Problem Solving: Excellent analytical and problem-solving skills, capable of managing complex data challenges.
- Communication: Effective communication skills, able to document and explain data processes clearly to both technical and non-technical stakeholders.
Nice to Have
- Cloud Experience:
  - Experience with cloud platforms such as AWS, Google Cloud Platform (GCP), or Microsoft Azure.
  - Knowledge of cloud-based data storage solutions (e.g., S3, Google Cloud Storage, Azure Blob Storage).
  - Familiarity with cloud-based data processing services (e.g., AWS Lambda, Google Cloud Dataflow, Azure Data Factory).
  - Experience with cloud infrastructure automation and management tools (e.g., Terraform, CloudFormation, Ansible).
- Machine Learning Integration: Understanding of integrating machine learning models into data pipelines.
- DevOps Practices: Experience with DevOps practices and tools for continuous integration and deployment (CI/CD).
- Data Security: Knowledge of data security best practices and compliance standards in cloud environments.
- Visualization Tools: Experience with data visualization tools and platforms (e.g., Tableau, Power BI, Looker).
- Programming Languages: Proficiency in additional programming languages relevant to data processing and backend development (e.g., Scala, Go, Rust).