
.NET Site Reliability Engineer

Salary undisclosed


Project Description:

  • As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to design, build, and maintain scalable infrastructure, automate operational processes, and respond swiftly to incidents. The ideal candidate is passionate about automation, has a deep understanding of system architecture, and is dedicated to delivering high-quality, reliable services.

Responsibilities:

• System and Service Reliability: Ensure overall reliability and performance, monitor system health, and perform root cause analysis.

• Incident/Ticket Management: Manage alerts, respond to incidents, triage, investigate, and mitigate incidents, and coordinate with cross-functional teams.

• Automation and Tooling: Drive automation and process improvements; develop automation tools, scripts, and infrastructure; identify and automate repetitive tasks to reduce manual work.

• Capacity Planning and Scalability: Collaborate with development and infrastructure teams to conduct capacity planning exercises, forecast resource requirements, and optimize system scalability to handle increased workloads.

• Performance and Optimization: Monitor and analyze performance metrics, identify bottlenecks and recommend optimizations, and collaborate on optimizing application code, database queries, and system configurations.

• Reliability Engineering Practices: Advocate for and implement reliability engineering practices, including error budgeting and reviews, and conduct blameless postmortems.

• Continuous Improvement: Analyze incident trends and monitor system metrics, gather feedback from DevOps engineers, application developers, and customers, and identify areas for improvement in collaboration with development teams.

• Collaboration and Communication: Foster collaboration with development, operations, and cross-functional teams; act as a bridge between teams; share knowledge and promote effective communication; create and contribute to documentation and share best practices.

Mandatory Skills Description:

• 5-7 years of experience

• Bachelor's or higher degree in Computer Science, Information Technology, or related field.

• Proven experience as a Site Reliability Engineer, DevOps Engineer, or similar role.

• Strong problem-solving skills and the ability to troubleshoot complex issues.

• Excellent communication and collaboration skills.

• Strong programming/scripting skills (e.g., Python, Go, Shell) for automation and tooling.

• Ability to troubleshoot and make simple code fixes in .NET.

• Proficiency in cloud computing platforms (e.g., AWS, Azure, GCP; Azure preferred).

• Familiarity with CI/CD pipelines and version control systems (e.g., Jenkins, Git, GitHub Actions).

• Proficiency in setting up and integrating monitoring tools (e.g., Dynatrace, Moogsoft, Azure Monitor).

• Experience with service management processes (incident, change, problem, and alert management).

• Schedule: Willing to work a US-hours shift following the client's business hours (Central Time).

Nice-to-Have Skills Description:

• Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.

• Knowledge of Linux/Unix systems and networking is a plus.

• Knowledge of configuration management tools (e.g., Ansible, Puppet, Chef) is a plus.

Languages:

  • English: Advanced