Lead Site Reliability Engineer

Capital One

Ciudad de México, MX | Onsite | Posted Jun 26, 2026

Skills

Kubernetes, Datadog, Docker, Python, Azure, CI/CD, Java, AWS, Go

About the role

WeWork Reforma Latino (97001), Mexico, Ciudad de Mexico, Ciudad de Mexico

We're building a Site Reliability Engineering center in Mexico City, and we're hiring a Manager-level Backend Engineer to own the reliability and operational maturity of our settlement platforms. These are batch-critical systems that process every credit and debit transaction across the network.

This is a foundational role. You'll be one of the first engineers in CDMX responsible for ensuring settlement cycles complete accurately, on time, and in compliance with SOX and PCI-DSS requirements. You'll work across hybrid infrastructure (on-prem data centers and AWS), partner closely with UK-based engineers, and build the automation and observability that allow Mexico City to operate settlement.

What You'll Do

Own reliability for batch settlement systems - ensure cycle completion windows are met, data integrity is maintained, and failures are detected before they reach downstream consumers

Build and improve observability for settlement pipelines - dashboards, alerts, and anomaly detection that make system health legible and reduce reliance on tribal knowledge

Drive automation of operational toil - certificate rotation, environment provisioning, compliance artifact generation, and manual validation steps that currently require human intervention

Partner with UK-based settlement engineers - acquire domain expertise on Durbin compliance windows, cross-border DCI routing, and acquirer/issuer SLA adherence

Participate in incident management - respond to settlement failures, drive root cause analysis, and implement durable fixes that prevent recurrence

Contribute to regulatory readiness - ensure SRE practices produce audit-ready artifacts for SOX and PCI-DSS exams without manual toil

What Success Looks Like

Independently validate and troubleshoot settlement cycle failures

At least two manual settlement operations processes fully automated

Settlement observability coverage sufficient to detect anomalies before cycle deadlines

Documented runbooks and severity criteria for all critical settlement failure modes

The Environment

You'll work with batch processing systems that handle financial transactions across multiple on-prem data centers with active/active and active/passive configurations. The stack includes Java, Python, shell scripting, SQL, AWS, Kubernetes, OpenShift containers, Datadog, Observe, and legacy payment platforms. CI/CD pipelines, API automation, and secret management via HashiCorp Vault are part of daily operations. You'll leverage agentic AI automation (Claude Code or others) to accelerate development and build automation solutions. You'll need strong troubleshooting and debugging skills, and you should be comfortable with both modern cloud-native tooling and traditional enterprise batch systems.

Basic Qualifications

Professional English fluency

Bachelor's degree

At least 6 years of experience in SRE, production operations, or reliability engineering

Experience in DevOps Engineering (internship experience does not apply)

5+ years of experience in at least one of the following: Java, Python, Go

At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)

3+ years of experience with container orchestration services including Docker or Kubernetes

Experience with Shell or Bash scripting

At least 3 years of Unix or Linux system administration experience

Preferred Qualifications

Experience developing automation solutions using agentic AI tools (Claude Code, Copilot CLI)

Troubleshooting and debugging skills across distributed systems

Familiarity with payments, financial services, or other regulated high-availability domains

Knowledge of or experience with networking concepts (TCP, DNS, TLS)

At Capital One, we respect individual differences in culture, religion, and ethnicity. Likewise, we promote equal opportunities and development for all personnel. In the hiring process, we seek to provide equal employment opportunities to candidates, regardless of race, color, religion, gender, sexual orientation, marital or civil status, national origin, disability, or any other situation protected by federal, state, or local laws.

For technical support or questions about Capital One's recruiting process, please send an email to Careers@capitalone.com

Capital One does not provide, endorse nor guarantee and is not liable for third-party products, services, educational tools or other information available through this site.

Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe, any position posted in the Philippines is for Capital One Service Corp (COPSSC), and any position posted in Mexico is for Capital One Technology Labs Mexico.
