Senior Software Engineer/SRE - Automated Disaster Recovery

Bloomberg

Software Engineering
New York, NY, USA
Posted on Aug 20, 2025

The Team:

We are the Platform Database Services Disaster Recovery as a Service (DRaaS) SRE team, charged with administering the end-to-end testing of Bloomberg's datacenters for disaster recovery scenarios across the numerous services that support the applications making up Bloomberg’s line of products! On any given day we're inventing, engineering, developing, building, coding, troubleshooting and maintaining a wide range of tools, monitors, frameworks, interfaces, protocols, solutions and best practices around Disaster Recovery. These components stitch together a robust suite of automated and self-healing systems that manage the services that the Platform Database Services SRE team provides to the rest of the firm.

What's in it for you:

You will be part of a team that works to help meet company- and regulator-defined Disaster Testing standards. You will manage and develop solutions that support various disaster recovery tools, building these applications to integrate the services they provide into the Bloomberg operational environment as well as Bloomberg products. This in-house tooling suite is required to test the clusters and managed services that reside in our datacenters and nodesites in an automated, scalable and self-driven fashion, complete with the accompanying metrics and transparency tools required by internal and external clients. Tooling is expected to be written with end-to-end unit testing and continuous integration to provide the highest level of stability.

We have product ownership and "the classic SRE responsibilities" such as system tuning, performance analysis, and defining and following availability targets such as SLAs, SLOs and SLIs. We also have immediate access to the experts who design and code the Bloomberg-specific components, APIs and methods used by and supporting the disaster recovery infrastructure. You’ll gain insight into, and access to, the lowest levels of how Bloomberg applications interact with each other and the runtime environments, for the purposes of both in-depth troubleshooting and enhancing stability, reliability, performance and feature set.

You'll need to have:

  • 4+ years of experience in Python and/or TypeScript

  • A degree in Computer Science, Engineering or similar field of study or equivalent work experience

  • 5+ years of experience with Unix, Unix tools and shell scripting

  • Experience designing stable, long-lasting APIs

  • Deep understanding of TCP/IP networking and the OSI model

  • Experience designing and automating repeatable processes in a client/server environment

  • Ability to build and maintain sophisticated, highly available, performant and scalable systems that are critically important

  • Experience building monitors and alarms for system performance, status and stability

  • Experience with CI/CD systems and writing robust unit and system tests

We'd love to see:

  • Basic knowledge of the Rapid framework

  • Experience analyzing existing systems and identifying shortcomings with proven methods for improvement

  • Experience with Chaos Engineering

  • Experience with Splunk/Humio and Grafana or other metric-based reporting tools

  • Experience with GitHub and JIRA

  • Passion for product ownership