Site Reliability Engineer

Signal AI

UK · hybrid · £70K - £85K · Posted May 29, 2026

At a glance

Highlights

  • Strategic AI-augmented operations focus
  • Hands-on impact on platform reliability
  • Acquisition integration with technical complexity
  • Cost-saving initiatives with measurable results
  • Collaborative infrastructure team environment

Heads up

  • On-call rotation required
  • Multi-quarter acquisition integration

Why this role might suit you

The position provides influence over a strategic AI-augmented operations initiative, exposure to cutting-edge infrastructure at a fast-growing AI company, and the opportunity to lead high-impact reliability projects that directly affect platform performance and cost efficiency.

Skills

aws, terraform, python, go, monitoring, incident-response, capacity-planning, cost-analysis, security, linux, networking, vpc, dns, tcpip, eks, claude-enterprise

About the role

We're on a mission to change the way businesses make decisions with our cutting-edge AI technology. To achieve that, we’re looking for passionate people to join our open and inclusive workplace. Our inclusive environment welcomes skills and experiences from diverse backgrounds, and defines who we are.

We're hiring an SRE to help us run and evolve the infrastructure behind Signal AI's decision intelligence platform.

You'd be joining a small, collaborative Infrastructure team at a moment when the work is genuinely changing shape. Over the last year we've hardened the platform, reduced cost, and built serious observability into our highest-volume systems. The next year is about scaling that work, absorbing infrastructure from a recent acquisition, and being thoughtful about how AI shows up in operational work: not as a gimmick, but as a tool we trust ourselves to use well.

We're looking for someone who wants to shape the direction of the team; someone who brings curiosity and care to the work, and who wants to leave things meaningfully better than they found them.

What we've shipped recently

Cut ~$50k/year off our Elasticsearch bill by migrating compute to more efficient chips. (Apr 2026)

Built the foundation for our MCP server platform, leveraging and contributing to open-source tooling to give the whole company extensible, production-grade AI integrations. (2025–2026)

Rebuilt production from scratch in a full DR gameday. End-to-end restore validated across our multi-account AWS setup. (Jan 2026)

What we're working on next

AI-augmented operations: Claude Enterprise is deployed across Signal. We want this team to help define what good looks like for SRE: incident triage, runbook generation, capacity planning, cost analysis. This is a strategic investment, not a side project, and we'd love someone genuinely curious about what these tools can and can't do.

Security in the age of AI: The threat landscape has shifted. Supply chains are under greater threat than ever, and powerful models are emerging that promise to change how the industry thinks about security. We're looking for someone interested in thinking seriously about what actually matters to protect now.

Acquisition integration: Bringing a recently acquired product's infrastructure under our reliability, security, and operational standards. A substantial, multi-quarter piece of work with real technical and organisational complexity, and plenty of room to make your mark.

Batch workload consolidation: Moving disparate batch jobs onto EKS for unified scheduling, cost visibility, and operational tooling.

Your first six months

We want to set you up to thrive. Here's what that looks like in practice:

Month 1: You're onboarded across our AWS estate, Terraform, and observability stack. You've completed your first on-call shift with support from the team, landed your first PR in the DevOps repo, and started working Claude Enterprise into your daily flow.

Month 3: You're owning a workstream end-to-end. You've led the SRE response to at least one production incident and hosted your first post-mortem. You've surfaced a real opportunity and pushed it to a measurable result.

Month 6: You're driving a multi-quarter workstream with clear direction, and you're contributing insights to our AI-in-operations playbook, including where Claude adds real leverage and where it doesn't.

What we’re looking for

You have solid AWS and Terraform experience, and you're comfortable writing Python or Go to solve operational problems. You think in distributed systems (failure modes, observability, blast radius), and you take problems end-to-end rather than stopping at the edges of your own work.

You're pragmatic about AI tooling. Not evangelical, not dismissive. You can tell us when you'd reach for an LLM and when you wouldn't, and you'd have a clear reason either way.

You communicate openly and you're comfortable pushing back when you think something could be better. We want to leverage your experience and perspective to grow our platform.

We know not every strong candidate will have every skill on this list. If you're excited about the work and you're close on the experience, we'd encourage you to apply.

Nice to haves

Networking depth. You're comfortable below the load balancer: TCP/IP fundamentals, DNS, VPC design, and what actually happens when a service can't reach another one.

Operational security instincts. You follow the threat landscape with genuine interest: not just CVEs, but shifts in how attacks happen and how the industry is responding. You have a point of view on what actually matters right now.

Linux internals comfort. When something behaves strangely under load, you know where to look.

Communication across technical levels. You can collaborate with your infrastructure teammates and explain the same concepts clearly to a product manager. You've worked alongside colleagues with a wide range of technical backgrounds and adapted naturally.

Not sure you meet every requirement? Studies show that women and other underrepresented groups often hesitate to apply unless they check every box. At Signal AI, diverse perspectives strengthen our teams, drive innovation, and lead to better performance. So even if your background doesn’t align perfectly with each qualification, we encourage you to apply if you’re passionate about this role.

We're dedicated to creating an inclusive environment where every Signaller feels welcomed, valued, and heard—a place where you can truly thrive as yourself.

Compensation Range: £70K - £85K

Compensation

This DevOps / SRE role lists a compensation range of £70K - £85K per year.
