Platform Reliability Engineer
Posting date: 08/16/2024
Location: Chicago, IL
Job Requisition: 391366_external_USA-IL-Chicago
Address: USA-IL-Chicago-300 South Riverside Plaza
Store Code: Greenville Data Center - It (5118640)
Ahold Delhaize USA, a division of global food retailer Ahold Delhaize, is part of the U.S. family of brands, which also includes five leading omnichannel grocery brands - Food Lion, Giant Food, The GIANT Company, Hannaford and Stop & Shop. Ahold Delhaize USA associates support the brands with a wide range of services, including Finance, Legal, Sustainability, Commercial, Digital and E-commerce, Technology and more.
Primary Purpose
The Platform Reliability Engineer will help ensure service availability, identify and automate manual processes, and bridge the gaps between product development teams and operations. Operational improvements in availability, latency, performance, efficiency, change management, monitoring, incident response, patch management and capacity planning are all within scope for this role. Whether through code, the introduction of modern tools, or better processes, continuous improvement and efficiency are the goal.
You'll provide operational excellence through strong troubleshooting skills and ownership in supporting various Azure services.
Our flexible/hybrid work schedule includes 3 in-person days at one of our core locations and 2 remote days.
Applicants must be currently authorized to work in the United States on a full-time basis until the end of their appointment.
Duties and Responsibilities
- Build, manage, and operate Azure Core Services with automation and infrastructure as code
- Manage and operate the continuous delivery framework and tools, manage and automate the lifecycle of the different cloud platform components, and help support product teams
- Leverage cloud architecture, site reliability principles, and full-stack troubleshooting skills across network, application, security, identity, OS, container, on-prem, and distributed services layers.
- Reason about system and application architecture, and be comfortable reviewing code and offering feedback on how it can be improved to increase reliability.
- Identify opportunities and drive the implementation of automation to improve patch management, service health, manageability, reliability, and telemetry.
- Own, triage, investigate and resolve service issues with an emphasis on broad communications, learning & teaching throughout the process
- Design process or technology solutions that monitor, identify, and resolve platform, system, deployment, and environmental issues both before and after production releases, and ensure measurable improvements against Service KPIs.
- Drive Security and compliance aspects for services in accordance with Azure compliance requirements.
- Engage in service capacity planning, demand forecasting and work towards Azure cost optimizations.
- Create and document runbooks, operational procedures, and standards on Confluence
- Communicate on a deeply technical level with product engineering, project management and product teams to improve and optimize products, improve infrastructure, and evolve services.
- Work within project management/agile scrum teams in a support role as part of a wider team
- Remain current on new technologies, methods and procedures including, but not limited to, coding practices such as Test Driven Development, Continuous Integration, Continuous Deployment and Operational excellence
Required Qualifications
- Bachelor's Degree in Computer Science, Information Technology, Engineering, or related field
- 7+ years of IT experience focused on infrastructure, including server, storage, network, security, and identity
- 3+ years of experience supporting, maintaining, and automating Azure environments
- 2+ years of experience using IaC tools (ARM, Terraform, JSON, YAML, PowerShell, GitHub, etc.)
- Production experience in Cloud technologies - Azure IaaS, PaaS, networking, Azure Functions, Azure Automation and runbooks, workbooks, Insights, Security Center, Azure Monitor, Log Analytics.
- Ability to read, write, configure, design, and script end-to-end service telemetry, alerting and self-healing capabilities for platform services
- Ability to work in an Extreme Programming environment and in a pair programming/operating model
- Able to facilitate diverse teams, multi-task, and work under pressure to meet aggressive schedule targets
- Hands-on experience with IaC tools like ADO, ARM, Terraform, Ansible, PowerShell, Python, Azure CLI, GitHub
- Experience in service capacity planning, demand forecasting, software performance analysis and system tuning
- Technical and operational expertise in Windows/Linux/VMware/Hyper-V/AKS, SQL and NoSQL databases, IaaS, PaaS, FaaS, Data, BCDR, Security, Management, Storage, Networking, Monitoring, Identity and Connectivity
- Experience managing and maintaining code repos, build systems, and CI/CD pipelines
- Experience in infrastructure and configuration as code, as well as service auto-scale capabilities.
- Experience working in DevOps and Agile environments, with a blend of development and SRE mindsets
- Systematic problem-solving and troubleshooting skills coupled with a strong sense of ownership and drive.
- Participate in on call rotation. Participate, collaborate, and provide guidance in retrospectives.
- At least 4 years of hands-on operational experience supporting the following, or related experience:
  - Azure Virtual Network, VWAN, ExpressRoute, Load Balancer (L4/L7), Traffic Manager, CDN, Azure DNS, routing and routing protocols like BGP, firewall concepts
  - Azure Identity including any of the following: Azure AD, PIM, Conditional Access, MFA, Azure AD Connect, passwordless sign-in, Microsoft Defender, Key Vault
  - Azure Governance, Security, Monitoring, Workbooks, Compliance, and cost awareness
  - Azure Virtual Machines, Containers and/or Kubernetes and/or OpenShift (infrastructure perspective)
  - Azure Storage Account, Disk, Snapshot, Backup, Site Recovery, File Sync, Data Lake
Preferred Qualifications
- Certifications: Azure Administrator (required), Azure DevOps (preferred), Azure Solutions Architect (preferred)
#LI-Hybrid #LI-CW1 #DiceJobs
When considered together, the companies of Ahold Delhaize USA comprise the largest grocery retail group on the East Coast and the fourth largest grocery retail group in the nation, serving millions of omnichannel customers each week.