We are seeking an experienced DevOps Engineer with a strong focus on AWS cloud infrastructure to join our dynamic team. The ideal candidate will play a key role in designing and building a Platform as a Service (PaaS) to support our Data and Product Engineering teams in developing the next-generation Runtime. This role demands extensive knowledge of runtime environments, troubleshooting expertise, and the ability to lead and engage a team toward a common mission.
If you thrive in a collaborative, ownership-driven environment and possess a passion for building robust platforms, this is the role for you.
Key Responsibilities:
- Platform Design & Development:
  - Design, build, and manage scalable Platform as a Service (PaaS) solutions using Kubernetes, Nomad, and other technologies.
  - Develop runtime environments tailored to the needs of Data and Product Engineering teams.
- Cloud Infrastructure Management:
  - Design, deploy, and manage infrastructure using AWS services like EC2, S3, Lambda, RDS, and others.
  - Automate infrastructure provisioning and management using Terraform.
- Observability & Monitoring:
  - Implement and maintain observability tools, including Grafana, Prometheus, and Kibana, to ensure platform health and performance.
  - Set up and refine alerting systems to promptly identify and address issues.
- Troubleshooting & Support:
  - Provide advanced troubleshooting and root cause analysis for runtime and platform issues.
  - Ensure high availability and reliability across platform services.
- Leadership & Collaboration:
  - Lead and coordinate the team to align with the shared mission and goals.
  - Foster a culture of shared ownership and accountability, ensuring all team members are engaged and aligned.
Required Qualifications:
- 5+ years of experience as a DevOps Engineer or in a similar role.
- Extensive hands-on experience with AWS cloud services (EC2, ECS, EKS, IAM, RDS, S3, CloudWatch, etc.).
- Proficiency in Infrastructure as Code (IaC) tools, particularly Terraform.
- Strong knowledge of Kubernetes, Nomad, or other orchestration tools for runtime environments.
- Expertise in observability and alerting tools like Prometheus, Grafana, and Kibana.
- Demonstrated ability to diagnose and resolve complex runtime issues.
- Experience leading and coordinating teams, with a focus on collaboration and shared mission.
Preferred Qualifications:
- Experience with CI/CD pipelines and tools like Jenkins, GitHub Actions, or GitLab CI/CD.
- Knowledge of security best practices in cloud environments, including IAM policies and compliance.
- Familiarity with scripting languages (e.g., Python, Bash) for automation and tooling.
- Exposure to HashiCorp Nomad or similar tools is a plus.
Soft Skills:
- Strong problem-solving and troubleshooting skills.
- Excellent communication and teamwork abilities.
- A proactive and ownership-driven mindset, contributing to a culture of accountability.