SRE

Employer: Euro-Testing Software Solutions
Domain:
  • Banks - Financial Institutions
  • IT Software
Job type: full-time
Job level: 1 - 5 years of experience
Location: Bucharest
Updated at: 28.11.2024
Remote work: Hybrid

    Short company description

    Euro-Testing Software Solutions is a privately owned software company specializing in Full-Service Software Testing, Penetration Testing, Vulnerability Identification & Management, Application and Data Security, and Static & Dynamic Code Analysis, as well as DevOps/DevSecOps, Robotic Process Automation, and the implementation and customization of Atlassian and Micro Focus (HPE) products.

    Requirements

    • Experience with Linux, UNIX, and Windows.
    • Database administration and maintenance: Oracle, Cassandra, PostgreSQL, AWS database setups, and caching databases.
    • Familiarity with Git, Jira, Jenkins, and Ansible.
    • Strong knowledge of DevOps practices and CI/CD pipelines (GitHub, Terraform).
    • Knowledge of monitoring solutions: Grafana, Prometheus, Dynatrace
    • Hands-on AWS implementation experience across a broad range of AWS services.
    • Must have AWS development experience (containerization with Docker and Amazon EKS; Lambda, EC2, S3, Amazon DocumentDB, PostgreSQL).
    • Experience with core AWS platform architecture, including AWS Organizations, account design, VPCs, subnets, and network segmentation strategies.
    • Comfortable working with cloud-native infrastructure, such as AWS Lambda, Google App Engine, and Azure Cloud Services.
    • Experience designing backup and disaster recovery approaches.
    • Experience automating environments and applications.
    • Proficiency in programming languages such as Python, Go, or Java
    • Familiarity with encryption, logging, and privacy/security protocols (e.g., TLS 1.2, the ELK stack).
    • Good knowledge of REST/SOAP/JSON web service API implementation.
    • Bachelor's degree in Computer Science, Information Technology, or a related field.
    • Relevant industry certifications, such as the Site Reliability Engineering (SRE) Foundation certification.
    • Strong understanding of cloud-based applications and infrastructure, including AWS, Azure, or Google Cloud.
    • Experience with IT operations best practices such as ITIL, COBIT, or DevOps.
    • Experience with IT service management tools such as ServiceNow or Remedy.
    • Familiarity with banking customer acquisition applications is preferred.

    Responsibilities

    • Monitoring system performance, identifying bottlenecks, and optimizing pipelines.
    • Implementing comprehensive service metrics to track and report on system reliability, performance, and efficiency.
    • Developing and maintaining CI/CD pipelines, enhancing the consistency and speed of software deployment.
    • Automating routine tasks and creating tools to improve team efficiency and system robustness.
    • Collaborating with development teams to integrate operational considerations into the software development life cycle.
    • Managing incident response protocols, including on-call rotations for junior engineers and strategic planning for senior personnel.
    • Conducting post-incident reviews to prevent recurrence and refine the system reliability framework.
    • Contributing to disaster recovery plans and ensuring robust backup systems are in place.
    • Partnering with development teams to improve services through rigorous testing and release procedures.
    • Participating in system design consulting, platform management, and capacity planning.
    • Creating sustainable systems and services through automation and uplifts.
    • Balancing feature development speed against reliability using well-defined service-level objectives (SLOs).
    • Working on-call shifts, with the aim of preventing incidents before they happen.
    • Running our infrastructure with Ansible, Terraform, GitLab CI/CD, and Kubernetes.

    Day to day, you will:
    • Approach operations challenges from a software engineering perspective, applying coding, automation, and engineering principles.
    • Monitor system issues and address them appropriately.
    • Create strategies to detect issues.
    • Design systems that troubleshoot issues automatically.
    • Write and review post-mortems.
    • Collaborate with development teams and other stakeholders to identify potential risks.
    • Analyze identified risks and evaluate their potential impact and likelihood of occurrence.
    • Based on the risk assessment, implement mitigation strategies for operational risks.
    • Continuously monitor and review the effectiveness of your risk mitigation strategies.
    • Study historical performance trends using metrics presented as charts and graphs.
    • Trace problems using system monitoring tools.
    • Monitor log files to manage infrastructure at scale.
    • Minimize MTTR (mean time to recovery) to reduce downtime; as an SRE, you improve this metric by resolving incidents quickly.
    • Maintain internal tooling.

    Other info

    About you
    • Have an enthusiastic, go-for-it attitude.
    • Focus on the quality of your work.
    • Excellent communication skills and a strong team player.
    • Open-minded and flexible.
    • Hard-working and passionate.
    • Demonstrated ability to adapt to new technologies and learn quickly.
    • Work well under pressure and meet deadlines.
    • Ability to problem solve in a fast-paced, high-stakes environment.
    • Proven ability to collaborate with multi-disciplinary teams of business analysts, developers, data scientists, and subject matter experts.
