Senior SRE (FinOps & Performance oriented) – LATAM

LATAM

About Distillery

Distillery accelerates innovation through an unyielding approach to nearshore software development. The world’s most innovative technology teams choose Distillery to help accelerate strategic innovation, fill a pressing technology gap, and hit mission-critical deadlines. We support essential applications, mobile apps, websites, and eCommerce platforms by placing senior, strategic technical leaders and deploying fully managed technology teams that work closely alongside our clients’ in-house development teams. At Distillery, we’re not here to reinvent nearshore software development; we’re on a mission to perfect it.

Distillery is committed to diversity and inclusion. We actively seek to cultivate a workforce that reflects the rich tapestry of perspectives, backgrounds, and experiences present in our society. Our recruitment efforts are dedicated to promoting equal opportunities for all candidates, regardless of race, ethnicity, gender, sexual orientation, disability, age, or any other dimension of diversity.


About the Role

We are looking for a systems-minded Site Reliability Engineer (SRE) to join our e-commerce DevOps engineering team. In this role, you will bridge the gap between application development (Java, TypeScript, Next.js) and infrastructure (AWS, EKS, EC2).

You are a curious, high-ownership engineer who thrives in multicultural, remote-first environments. Your mission will be to ensure our platform is resilient under peak load, observable by default, and cost-optimized by design.

This is a hands-on role for someone who enjoys solving complex infrastructure challenges, partnering closely with developers, and building reliable systems at scale.



Key Responsibilities

Cloud & Systems Troubleshooting

  • Solve complex performance bottlenecks across the entire stack, from Linux kernel tuning to cloud-native AWS architectures
  • Debug networking issues including TCP/IP, DNS, and distributed system behaviors
  • Go beyond surface-level fixes to identify true root causes and drive long-term reliability improvements

Performance Engineering

  • Partner with software engineers to design and execute load and stress testing strategies
  • Take ownership of performance results, helping teams refactor infrastructure or application code
  • Ensure platform stability during traffic spikes and peak e-commerce events

Infrastructure as Code

  • Maintain and scale cloud infrastructure using Terraform
  • Contribute to automation and platform consistency across environments
  • Experience with Pulumi is a plus

FinOps as a Product

  • Treat cloud cost as a primary engineering metric alongside performance and uptime
  • Lead cost optimization efforts to ensure efficient and sustainable infrastructure spending
  • Ensure every dollar spent contributes directly to reliability and scalability

Modern Observability

  • Drive best-in-class observability practices across services and infrastructure
  • Ensure deep, actionable visibility regardless of tooling (Datadog, OpenTelemetry, etc.)
  • Build monitoring and alerting systems that enable proactive incident prevention

Engineering Culture & Enablement

  • Act as a force multiplier by building paved roads and self-service reliability tools
  • Empower developers to own service reliability with minimal friction
  • Collaborate closely with Engineering, Product, Project Management, and CX teams during production incidents
  • Mentor peers and share knowledge to raise operational excellence across the organization

Required Qualifications

Core Experience

  • 4+ years of experience in SRE/DevOps roles with a strong software development background
  • Proven ability to build, operate, and scale production systems in business-critical environments

Systems & Cloud Expertise

  • Deep experience managing large-scale Linux environments on AWS
  • Strong debugging skills when abstraction layers fail
  • Solid cloud foundations across networking, IAM, storage, compute, databases, and caching services
  • Hands-on expertise with Kubernetes fundamentals, especially AWS EKS

Programming & Automation

  • High proficiency in Python, Go, and Bash scripting
  • Experience with Java or TypeScript/Next.js is a strong plus
  • Strong bias toward automation, tooling, and operational efficiency

Security Awareness

  • Familiarity with access boundaries, secrets management, encryption, and guardrails
  • Understanding of reliability and security best practices in cloud environments

Domain Knowledge (Preferred)

  • Experience with e-commerce platforms, billing/fulfillment systems, or the Shopify ecosystem is highly valued

Essential Skills and Abilities

  • Analytical mindset, treating latency and cost as equally important engineering metrics
  • Strong communication skills to translate infrastructure needs into actionable developer goals
  • Pragmatic approach focused on automation, documentation, and continuous improvement
  • Ownership mentality with the ability to thrive in fast-paced, production-focused environments

Why You Should Work at Distillery

Join a global team committed to Distillery's core values: Unyielding Commitment, Relentless Pursuit, Courageous Ambition, and Authentic Connection.

  • 100% Remote Work: Enjoy the freedom to work from anywhere while collaborating with a diverse, multinational team.
  • Competitive Compensation: Generous and competitive package in USD, along with a comprehensive benefits plan.
  • Flexible Hours: Create a schedule that aligns with your life and priorities.
  • Home Office Setup: Receive all the hardware and software needed to succeed from home.
  • Innovative Workplace: Collaborate with the global Top 1% of talent in a multicultural and dynamic environment.
  • Focus on Growth: Pursue professional and personal development while contributing your unique talents to a team where you can truly shine!