Senior Site Reliability Engineer (SRE)
Job Description
Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: Cloud Computing
We are hiring at Pragmatike to expand our team and drive the growth of our internal projects.
Our focus is on developing cutting-edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies.
If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you!
Responsibilities
- Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
- Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
- Oversee the full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
- Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
- Design and maintain networking architecture, including VLANs, L2/L3 routing, VPNs, and site-to-site connectivity.
- Build automated deployment workflows (PXE boot, Preseed, cloud-init).
- Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
- Lead incident response and escalation activities across the platform.
- Improve system availability and reduce latency at all levels.
- Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
- Optimize alerting and monitoring pipelines to provide actionable insights.
- Establish and maintain on-call schedules to ensure coverage across timezones.
- Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
- Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
- Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
- Help develop and maintain the overall architecture across all products.
- Plan resources for future initiatives, accounting for demand and growth projections.
- Work with development teams to improve overall quality and optimize resource utilization.
- Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).
Requirements
- Expert-level, hands-on experience operating Kubernetes in production environments.
- Strong network engineering skills (VLANs, L2/L3 routing, VPNs, site-to-site connectivity); this is essential for the role.
- Strong proficiency in Linux systems administration (Debian/Ubuntu).
- Solid understanding of networking fundamentals and the ability to design complex network architectures.
- Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
- Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
- Background with virtualization technologies (OpenStack, Proxmox, VMware).
- Experience with bare-metal provisioning and MAAS (Metal as a Service).
- Strong understanding of distributed systems and container orchestration.
- Process-oriented mindset, with the ability to develop SOPs and operational procedures from scratch.
- Experience with incident response, escalation procedures, and on-call rotations.
- Ability to work autonomously in a fast-paced, results-driven environment.
- Strong technical skills combined with alignment to team values.
Nice To Have
- Experience with a service mesh (Istio, Linkerd) or advanced CNI implementations.
- Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
- Experience with GPU infrastructure, node preparation, or resource scheduling.
- Familiarity with security best practices (RBAC, firewalls, network policies).
- Exposure to IT asset management or license tracking workflows.
- Experience working in multi-timezone environments and coordinating across distributed teams.
- Background establishing reliability practices and SRE frameworks in growing organizations.
Why Join Us:
- 100% remote work with flexible hours
- High-impact role with autonomy and ownership
- Collaborative and international engineering team
- Cutting-edge tech stack with strong focus on reliability and automation.
Job Posting Details
Company: PRAGMATIKE
Location: Cluj-Napoca, Cluj County, Romania
Posted: 4 December 2025