Site Reliability Engineer
At Paymentology, we’re redefining what’s possible in the payments space. As the first truly global issuer-processor, we give banks and fintechs the technology and talent to launch and manage Mastercard, Visa, and UnionPay cards at scale - across more than 60 countries.
Our advanced, multi-cloud platform delivers real-time data, unmatched scalability, and the flexibility of shared or dedicated processing instances. It's this global reach and innovation that sets us apart.
We’re looking for a Site Reliability Engineer to ensure the high availability, scalability, and performance of our platform. This role is essential to maintaining reliable systems, reducing operational overhead, and enabling continuous improvement across our global technology landscape. If you're passionate about automation, incident response, and working at the intersection of infrastructure and software, this is your opportunity to help build resilient systems that power financial inclusion worldwide.
What you get to do:
Platform Reliability and Scalability
- Build software that enhances Paymentology services' scalability and reliability.
- Ensure platform services meet required uptime and service quality levels.
- Contribute to the design of reliable cloud infrastructure and implement reusable uptime components as code.
- Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation
- Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track the platform's health, effectiveness, reliability, and scalability, and identify potential issues that can be fed back to product and platform engineering in a continuous improvement loop.
- Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
- Enable product teams to self-serve by participating in the development of a developer platform.
Production Issue Resolution
- Play an active role with the incident response teams, diagnosing and resolving production issues quickly to minimise downtime.
Standards Compliance
- Support product teams in building services that adhere to our security and quality standards.
Cross-team Collaboration
- Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle. We seek to achieve this through advocacy and developing a culture of reliability.
At Paymentology, it’s not just about building great payment technology, it’s about building a company where people feel they belong and their work matters. You’ll be part of a diverse, global team that’s genuinely committed to making a positive impact through what we do. Whether you’re working across time zones or getting involved in initiatives that support local communities, you’ll find real purpose in your work - and the freedom to grow in a supportive, forward-thinking environment.
< 10%
What it takes to succeed:
- Strong understanding of cloud networking principles.
- Proficiency with leading monitoring tools, such as Datadog, Honeycomb.io, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
- Programming expertise, especially in systems programming languages and databases.
- Familiarity with at least one leading CI/CD tool, such as Jenkins, GitHub Actions, GitLab CI, CodePipeline, CircleCI, or Argo CD.
- Proven track record in achieving service-level and end-to-end SLIs, SLOs, and SLAs, and fostering accountability.
- Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
- Knowledge of implementing solutions to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
- Comprehensive understanding of large-scale distributed platform architecture.
- Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and efficiency at scale.
- Understanding of security best practices within cloud environments.
Education and Experience:
- Bachelor’s Degree in Computer Science, Information Technology, or related field.
- Professionals with a verifiable employment history in the role may also be considered.
- 2+ years of experience as a Site Reliability Engineer.
- 2+ years in software development.
- Extensive cloud experience, especially with AWS.
- Proven expertise in infrastructure as code, using at least one of Terraform, CloudFormation, Puppet, or Ansible.
- Hands-on experience with Docker, ECS, EKS, and Kubernetes.
Job details
Company: Paymentology
Location: Bucharest, Romania
Added: 8 August 2025