Site Reliability Engineer
Microsoft
Multiple Locations, United States
Overview
Do you have a passion for problem-solving and troubleshooting complex distributed systems? Are you eager to learn and to tackle intricate challenges in a dynamic environment? If so, we encourage you to apply to join our team as a Site Reliability Engineer.
As part of our team, you’ll contribute to solving real-world distributed systems challenges in a dynamic cloud environment. Our platform supports services deployed across thousands of machines, containers, and microservices, spanning multiple regions and availability zones. Ensuring consistent and timely configuration delivery is key to keeping the platform secure, reliable, and responsive. This is an opportunity to work on high-impact problems with engineers who are passionate about scalability, reliability, and clean design. You'll be involved in building systems that other teams rely on every day—and you'll learn a lot doing it.
The position offers an entry point into cloud-scale infrastructure, giving you the chance to grow your technical skills while contributing to a foundational service within Azure. You’ll work alongside experienced engineers, writing production-quality code, participating in design discussions, and learning how to build and operate reliable distributed systems. You’ll gain hands-on experience with real-world cloud infrastructure challenges, from scaling services to maintaining system reliability across global deployments.
The team supports continuous learning and mentorship, offering a collaborative environment where you can deepen your engineering skills and grow your understanding of large-scale systems. With flexibility for up to 100% remote work, this role provides a solid foundation for long-term technical growth and career development.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required/Minimum Qualifications:
- 4+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in Computer Science, Information Technology, or related field
- 2+ years of coding/designing experience in distributed systems and cloud-based services
- 1+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Rust.
Other Requirements:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
  - Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Additional or Preferred Qualifications:
- 5+ years technical experience in software engineering, network engineering, or systems administration
- OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration
- OR Master's Degree in Computer Science, Information Technology, or related field AND 1+ year(s) technical experience in software engineering, network engineering, or systems administration
- 3+ years of coding/designing experience in distributed systems and cloud-based services
- 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, or Rust.
Responsibilities
- Develops, tests, troubleshoots, and implements changes to optimize code and improve products.
- Participates in onboarding, code/design reviews, and regular meetings with the engineering teams that develop and manage those products.
- Independently develops code or scripts that automate the performance of repetitive and easily scalable operations processes.
- Designs, develops, and maintains telemetry pipelines and monitoring tools that detail operations metrics.
- Responds to incidents during regular on-call rotations.