MLOps / DevOps Engineer
About the Role
We are looking for an MLOps/DevOps Engineer to build, deploy, and operate infrastructure for LLM and AI workloads in production. You will work closely with ML and backend engineers to create reliable environments for training/fine-tuning, model serving, and GPU-based compute, ensuring performance, scalability, and high availability.
Key Responsibilities
Design and manage scalable infrastructure for AI/ML workloads (training, fine-tuning, inference).
Deploy, manage, and optimize GPU-enabled environments (drivers, CUDA runtime readiness, GPU monitoring, scheduling).
Build and maintain CI/CD pipelines for backend services (APIs, microservices) and ML/LLM deployments (model versioning, rollout, rollback).
Containerize and orchestrate services using Docker and Kubernetes (EKS/GKE/AKS or self-managed).
Implement best practices for the MLOps lifecycle:
model packaging and artifact management
reproducible deployments
environment management across dev/stage/prod
Set up observability (metrics, logging, alerting, tracing) for infrastructure and model services.
Improve system reliability via SRE practices: incident response, root-cause analysis, SLAs/SLOs, capacity planning.
Collaborate with ML engineers to productionize LLM workflows (LoRA adapters, inference endpoints, batch jobs).
Optimize cost and performance (autoscaling, efficient GPU utilization, job scheduling, caching).
Required Skills & Qualifications (Must Have)
3–5 years of experience in a DevOps, Platform Engineering, or MLOps role.
Strong Linux administration and networking fundamentals.
Hands-on experience with Docker and Kubernetes (deployments, services, ingress, scaling).
Experience building CI/CD pipelines (GitHub Actions / GitLab CI / Jenkins).
Proficiency in scripting/automation using Python (or strong Bash skills plus the ability to work in Python).
Cloud experience with AWS / GCP / Azure (compute, networking, IAM, storage).
Familiarity with infrastructure automation and configuration management (Terraform/Ansible is a plus).
Good to Have (Preferred)
Experience with model serving frameworks: vLLM, Triton Inference Server, TorchServe, Ray Serve.
Exposure to ML lifecycle tools: MLflow, Weights & Biases, DVC.
Understanding of LLM fine-tuning concepts (LoRA/QLoRA) and deployment requirements.
Experience working with distributed systems, job schedulers, or workflow orchestration (Argo, Airflow, Prefect).
Knowledge of vector databases / RAG pipelines (FAISS, Pinecone, Weaviate, pgvector).
Familiarity with GPU performance tuning/monitoring (nvidia-smi, DCGM, Prometheus exporters).
Experience:
LLM: 3 years (Required)
AI architecture: 3 years (Required)
DevOps engineer: 3 years (Required)
Work Location: In person
Company: iApp Technologies