Distributed Systems - Site Reliability Engineer (SRE)

Cupertino, CA
  • Job Code
    200057184
Summary

Posted: May 2, 2019

Weekly Hours: 40

Role Number: 200057184

The Software Engineering Operations team within Software Delivery is looking for Site Reliability Engineers to maintain and improve the services that enable thousands of Apple engineers to develop the software products that delight millions of Apple customers. In this position, you will work with a group of top-notch systems engineers from related but different backgrounds who foster a culture of innovation and continuous improvement. To be successful in this role, you must be hands-on, proactive, and good at problem solving, with a strong desire to learn and work toward excellence.

This job will provide you with:

  • A team of highly skilled coworkers ready to both mentor and learn from you
  • Unique distributed computing problems and an open mind about how they can be solved
  • The opportunity to collaborate with talented engineering teams across a wide range of technology disciplines
  • The freedom to take ownership and drive meaningful improvements in the operational reliability of mission-critical services

Key Qualifications

  • Passion for continually learning and exploring new technologies
  • Well versed in Linux and macOS systems management
  • Familiar with application and service monitoring tools and techniques
  • Knowledge of cloud platforms and virtualization technologies
  • Development experience with Python, Ruby, Scala or Go
  • Involvement with incident management and response
  • Excellent collaborative skills, with strong written and verbal communication

Description

Responsibilities will include:
  • Identify sources of instability in distributed systems and drive operational excellence
  • Monitor and stress test systems to collect metrics for tuning and capacity planning
  • Reduce the burden of toil with iterative development of tooling and automation
  • Collaborate with engineering teams to release new features and become an authority on our services
  • Participate in on-call rotation

Education & Experience

B.S. or equivalent experience in a technical discipline

Additional Requirements

These are not hard requirements, but this position might be of interest if you have experience with or a desire to learn about:
  • Cloud orchestration technologies such as Mesos or Kubernetes
  • Virtualization platforms such as KVM, Docker, and QEMU
  • Object and distributed block storage technologies such as S3 or Ceph
  • Splunk, Grafana, Graphite or other monitoring tools
  • Puppet, Ansible or other configuration management tools
  • Understanding of server hardware and tools such as HP iLO and IPMI to monitor for hardware failures
