Trust In Soda
Dublin, Dublin
23/04/2025
Contractor
Cloud Site Reliability Engineer - HIRING ASAP
Start date: ASAP
Duration: 6 months
Location: 1 week per month in Dublin, 3 weeks working from home
Rate: €375 per day
Summary: We are looking for a self-driven engineer to help scale our growing public cloud presence. The Cloud Site Reliability Engineer delivers reliable runtimes for our clients' business-critical workloads. The team is responsible for cross-cutting cloud management capabilities, and its members are the experts on the state of our clients' cloud platforms at any moment. The team comes from diverse technical backgrounds, and the responsibilities offer a variety of challenges spanning both software and systems. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience as an SRE. As a Site Reliability Engineer in the Cloud domain, you will have the opportunity to shape the cloud operations that support over 30 million investors and their financial futures.
Responsibilities:
Support and execute events and experiences for our client.
Design, develop and scale the delivery of Product Experiences in support of internal and external stakeholders.
Provide creative production and logistics expertise for events, including experience consultation, demo design, staffing, tech, and on-site support.
Create partnerships with marketing, product, and creative teams.
Support the development of processes and playbooks that streamline and scale our operations.
Partner with agencies to plan and execute experiential.
Availability and willingness to travel up to 25%.
Establish and support strategic measurement and marketing plans for events.
Support post-reporting processes and data collection.
Help define and execute a comprehensive cloud reliability and observability strategy, ensuring that our clients' cloud systems are always available when our customers need them.
Bring together technical, procedural, and financial data to reduce toil and increase efficiency.
Execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.
Troubleshoot stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers.
Take part in peer code reviews, providing qualitative feedback and facilitating a learning environment through an equitable exchange of ideas.
Coach peer SREs and development teams on how to build highly available cloud systems.
Key Skills:
Bachelor's degree or higher in a technology-related field (e.g. Engineering, Computer Science) required; master's degree a plus.
4+ years of hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
Hands-on experience with Public Cloud environments, preferably AWS and Azure. Certifications a plus.
Experience with container orchestration, preferably with Kubernetes.
Collaboration and Relationships: ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner, and to build and maintain effective relationships.
Overall 8+ years of working experience is required.
Experience with enabling and managing cloud services, usage, and optimization.
4+ years' experience designing, implementing, and managing Kubernetes (EKS, AKS).
Experience with programming languages such as Python and Go.
Experience designing, implementing, and hosting solutions based in AWS and/or Azure.
AWS, Azure and/or Kubernetes certifications.
Bachelor's degree in Computer Science, Mathematics, or related sciences, or equivalent work experience.
8+ years IT experience.
Hands-on experience with observability and resiliency setup for platforms and applications.
Solid understanding and experience of networking, virtualization, storage, containers, and serverless.
Experience with Linux systems, ideally including systems administration.
Ability to automate with various scripting languages (Python, Shell Scripting, etc.).
Experience managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef).
Demonstrated ability to use modern monitoring tools (DataDog, Prometheus, Splunk).
Proficiency with CI/CD tools, especially Jenkins.
Ability to triage, execute root cause analysis, and be decisive under pressure.
Experience managing and interpreting large datasets using query languages and visualization tools.
Ability to think in systems and apply technical and non-technical problem solving.
Experience with Enterprise IT asset management or other related practices.
Proficient communication skills, with an ability to reach both technical and non-technical audiences.
Desire to call yourself a Site Reliability Engineer and a commitment to reducing toil.