Senior Software Engineer - Reliability - Artificial Intelligence

Bloomberg

Software Engineering
Posted on Jan 17, 2026

Bloomberg’s Engineering AI department has 400+ AI practitioners building highly sought-after products and features that often require novel innovations. We are investing in AI to build better search, discovery, and workflow solutions using technologies such as transformers, gradient boosted decision trees, large language models, and dense vector databases. We are expanding our group and seeking highly skilled individuals to join the team (or teams) of Machine Learning (ML) and Software Engineers who are bringing innovative solutions to AI-driven, customer-facing products.

At Bloomberg, we believe in fostering a transparent and efficient financial marketplace. Our business is built on technology that makes news, research, financial data, and analytics on over 35 million financial instruments searchable, discoverable, and actionable across the global capital markets.

Bloomberg has been building Artificial Intelligence applications that offer solutions to these problems with high accuracy and low latency since 2009. We build AI systems to help process and organize the ever-increasing volume of structured and unstructured information needed to make informed decisions. Our use of AI uncovers signals, helps us produce analytics about financial instruments in all asset classes, and delivers clarity when our clients need it most.

About the Team

Bloomberg’s AI is increasingly embedded in premier client-facing applications. The AI Resilience & Insights (AIRI) team is being formed to raise reliability across ENG AI, unify reliability standards, and make reliability visible through client-impact measurement and clear operational insights.

The Role

We’re hiring the first Senior Software Engineer (Reliability) on our AI Resilience & Insights team. You’ll build the foundations that help detect issues earlier, respond faster, and prevent repeat incidents, starting with a new generative AI-powered chat function being integrated into the Bloomberg Terminal.

As the AI department expands into agentic and tool-driven systems, you’ll help define how reliability is measured and improved for multi-step workflows and external dependencies, including LLM providers.

What’s in it for you:

  • Define how we measure reliability for key AI user experiences, and roll that measurement out with service owners.
  • Instrument the generative AI-powered conversational agent with real user monitoring and client error tracking so we can see failures the way clients do.
  • Improve alert quality so alerts are actionable and tied to client impact.
  • Standardize incident response practices across ENG AI (runbooks, readiness checks, post-incident learning).
  • Build dashboards that connect user impact to the underlying drivers, giving teams a clear view of what matters.
  • Strengthen resilience around upstream dependencies, including external model providers, using pragmatic controls like timeouts, retries, and fallbacks.
  • Participate in a secondary on-call rotation after ramp, focused on strengthening systems through automation and engineering.

What success looks like within 6–12 months:

  • Reliability measurement is in place across top AI experiences, starting with ASKB.
  • Client-impact signals are trusted and used to detect and prioritize issues.
  • Alerting is cleaner and more actionable, reducing noise and speeding up resolution.
  • Incident response is more consistent, with fewer repeat issues and faster learning.

You’ll need to have:

  • Strong software engineering skills in Python and/or Go, with experience building production systems and automation.
  • Ability to debug distributed systems and improve reliability through instrumentation and engineering.
  • Familiarity with observability, incident response, and building tools that reduce toil.
  • Strong collaboration skills and good judgment to balance “push standards” vs “enable teams.”
  • 5+ years of relevant engineering experience.

Nice to have, bonus points for:

  • Experience with Grafana, OpenTelemetry, Kubernetes, and Infrastructure-as-Code.
  • Experience working with client telemetry or real user monitoring.
  • Exposure to external AI/LLM providers and building resilient integrations.
  • Interest in reliability for agent/tool systems and multi-step AI workflows.

Why This Role?

You’ll be a founding engineer shaping how Bloomberg delivers reliable AI experiences for clients. The work is highly visible, tied directly to real-world product impact, and centered on engineering solutions that improve reliability at scale.

We give back to the technology community and you can read more about our outreach at: http://www.techatbloomberg.com/ai