Site Reliability Engineer
About the role
Who are Tyk, and what do we do?
The Tyk API Management platform is helping to drive the connected world and power new products and services. We’re changing the way that organisations connect any number of their systems and services. Whether internal, external, public or highly encrypted, Tyk helps businesses drive value across the retail, finance, telecoms, healthcare and media industries (to name just a few!).
If you’ve banked online, used an app to check the news, or perhaps even driven a connected car, APIs (and, by extension, Tyk) make that possible. Founded in 2015, with offices in London (UK), London (Ontario), Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell and T-Mobile to RBS, Capital One and Vinci. We have a varied user base hailing from every continent, even Antarctica.
Our Mission
Tyk is on a mission to connect every system in the world. We’ve started by building an API Management platform.
Total flexibility, default remote, radical responsibility
We offer unlimited paid holidays and remote working from anywhere in the world, for everyone. Why? Tyk was founded on the principle of offering flexibility and autonomy to our employees; we believe this allows them to achieve their best results. It also means we can build the best possible team, because location and working hours are no barrier.
If this sounds like an environment that you believe could work for you then read on to find out more.
The role:
We’re looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature and always looking for ways to improve, as we will look to you for new ideas, solutions and metrics for improving the platform. You will also be our first line of incident management for our clients and will help define our response going forward. This is a great opportunity to become an integral part of Tyk as we continue on our journey.
As a remote-first company, you will have the opportunity to work with an industry-leading distributed team. Having access to expertise from across the globe will give you both the support and the opportunity to help shape not only Tyk’s Cloud platform but also Tyk as a whole as we continue to grow.
Requirements
Here’s what you’ll be responsible for:
Maintaining the global Tyk Cloud platform within SLAs, SLIs and SLOs that you will help to define
Identifying reliability issues and working together with your squad to solve them
Identifying and introducing new metrics and building relevant dashboards
Participating in the on-call rotation
Working with your squad to extend the multi-region and multi-cloud reach of the platform
Documenting operational knowledge
Conducting post-incident analysis
Automating common tasks
Being a key shaper of and contributor to our continuous improvement agenda, whether that concerns the clarity of our user stories, how we estimate, or how we communicate with other teams or customers; we expect this role to be an advocate of continuous improvement
Reliability of our new global Tyk Cloud platform
Automation of operations and support
Writing and maintaining documentation on SRE processes and policies
Recommending and implementing ways of driving operational efficiency and driving down our cost to run, without impacting service
Assisting in penetration testing for Cloud through liaising with our provider, providing technical details, and environment setup
Incident management
Here’s what we’re looking for:
Experience
Strong collaboration skills
Launching and operating production-scale Kubernetes clusters
Designing and operating infrastructure on AWS and other providers
Operating MongoDB (or other document database) clusters
Operating Redis (or other key-value storage) clusters
Administering Linux servers
Maintaining distributed software
Operating Prometheus and Grafana
Operating logging collection and analysis systems
Participating in the on-call rotation (16:00 to 04:00 UTC)
Skills:
Kubernetes & containers (advanced)
AWS / EKS (advanced)
Linux (advanced)
Terraform and IaC in general (proficient)
Helm (proficient)
Go (familiar)
MongoDB (or similar)
Redis (or similar)
Monitoring: Prometheus, Grafana, Thanos (familiar)
Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
Proactive, energetic, innovative and change oriented
Nice to have:
GCP or Azure
Bare metal infrastructure engineering
API management experience
Large scale distributed storage management
Familiarity with Rancher
CKA/CKAD/CKS
Creating and delivering production software in Go
Benefits
Here’s why you should join us:
Everyone has unlimited paid holiday.
We have total flexibility in hours, as we believe creativity flows better when our people are given freedom to decide when they are most productive. Everyone is unique after all.
Employee share scheme
Generous maternity and paternity leave
Company retreats
We all share the same vision: we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals and are willing to challenge the way we do things. Why? To make what we do even better!
Our values tell the story of Tyk. Here’s how:
It’s ok to screw up!
We’ve found that it’s often the ‘stupid’ or unexpected ideas that turn out to be the successful ones, so try it: at least we can say we have!
The only stupid idea is the untested one!
It’s in our DNA: starting a business with founders 12 hours apart, giving our gateway away for free. Sure, we did that, and we’d do it again!
Trust starts with you – make it count!
Trust is a two-way street, so instill it from day one!
Assume best intent!
We have each other’s back; we’re all on the same team. Think before you speak or act.
Make things better!
Always try to leave things better than when you found them. Change is constant, inevitable and embraced! Be the change we want to see.
What’s it like to work here? Check it out: https://tyk.io/worklife/
Tyk is an equal opportunities employer and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status, or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.
You can read more about us here: https://tyk.io