Job Purpose
At Emirates, we believe in connecting the world, to and through our global hub in Dubai, and in constantly innovating to ensure our customers ‘Fly Better’. Join our Observability team as a Senior Technical DevOps Engineer - Observability, where you will lead the implementation and administration of state-of-the-art monitoring tools for infrastructure, applications, and platforms.
The Senior Technical DevOps Engineer - Observability will act as a key technologist in the DevOps and Automation team, driving DevOps and Automation practices at product and program level. You will engineer enterprise-level technology solutions for CI/CD and Cloud automation through infrastructure as code, containerisation and configuration management, and ensure robust Cloud engineering and DevOps practices for solution planning, technology development, deployment, release, hyper-care and operations of technology solutions. You will also apply Site reliability principles to ensure availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning, and act as a bridge between development and operations by applying an engineering mind-set to system administration.
In this role, you will:
- Lead the discovery phase of Cloud adoption/migration by partnering with the Architects and Security to strategize well-defined DevOps and Cloud engineering architectures.
- Lead the automation of change management, resiliency building, monitoring, capacity planning and emergency response for the set of products that you are responsible for.
- Act as a lead for service reliability, scalability and performance for a set of products as part of the system enablement team for Agile release trains. Serve as the primary lead who identifies and improves the lifecycle of products and services by addressing the flow of value through the continuous deployment pipelines.
- Review deployment and operational environments: execute initiatives to reduce failure, troubleshoot issues across the entire infrastructure stack, expand monitoring capabilities, and manage technical operations.
- Collaborate with Release Engineering, Product Development and Security teams to develop change management strategies such as canary-style deployments, pushing out new releases without interrupting services, and rolling back features that demonstrate problems.
- Automate everything possible, keep the continuous deployment pipeline in good health at all times, and take responsibility for the technical design and high-quality coding of technology products that have built-in quality and are ready for production deployment.
- Build, maintain and operate technical environments for software development, continuous integration and continuous deployment based on cloud application design principles such as twelve-factor apps.
- Embrace DevSecOps principles to advise and architect security solutions with engineering and product teams, ensuring all technologies and projects are robust and adhere to security standards.
- Be an advisor to enable the teams to drive corrective, adaptive, preventative and perfective maintenance of solutions automating environment management, service management, infrastructure management and cyber security aspects of the systems.
Qualification
To be considered for this role, you must meet the requirements below:
- Degree or Honours (12+3 or equivalent) in Information Technology or a relevant field.
- 5+ years of experience in DevOps, Site Reliability Engineering (SRE), or similar roles.
Knowledge/Skills:
- Hands-on experience implementing and managing enterprise-class observability and self-healing solutions such as Splunk, Dynatrace, AppDynamics, Datadog, Grafana, and Automic Orchestrator.
- Strong hands-on experience in designing and implementing observability solutions across cloud-native, hybrid, on-premises, and mainframe environments (AWS, Azure, VMware, and data centres).
- Proven expertise in fully automated deployment pipelines, infrastructure automation, and Infrastructure as Code (IaC) using tools such as Terraform and Ansible.
- Strong experience in Observability as Code, including instrumentation and monitoring configuration for applications, databases, and infrastructure platforms.
- Strong scripting and automation skills using Python, Bash, PowerShell, shell scripting, REXX, and JCL, with experience integrating and consuming REST APIs.
- Experience with containerization and orchestration platforms such as OpenShift and Kubernetes.
- Familiarity with CI/CD pipelines and tools such as Jenkins and GitLab.
- Strong understanding of Application Performance Monitoring (APM), log analytics, monitoring concepts, dashboard creation, alert configuration, and metric collection.
- Strong analytical and troubleshooting skills with the ability to write and manage complex queries using PL/SQL, Splunk Processing Language (SPL), or equivalent query languages.
Other preferred skills:
- SNMP and network monitoring using Zabbix.
- Strong administration experience with Linux and Windows operating systems.
- Knowledge of IBM mainframe z/OS, System Management Facilities (SMF), RACF, and mainframe observability and monitoring.
- Experience configuring and creating dashboards and alerts in observability tools.
Leadership Role: No
Salary & benefits
Join us in Dubai and enjoy an attractive tax-free salary and travel benefits that are exclusive to our industry, including discounts on flights and hotel stays around the world. Find out what it’s like to live and work in our fast-paced, cosmopolitan home city in the Dubai Lifestyle section of our website www.emirates.com/careers.