Senior SRE (FinOps & Performance oriented) – LATAM

About Distillery

Distillery is a global technology consulting firm that partners with innovative companies to build high-quality software solutions. We specialize in assembling elite, distributed engineering teams that work closely with our clients to solve complex business challenges.

At Distillery, we value craftsmanship, ownership, and continuous learning. Our teams are empowered to make technical decisions, collaborate openly, and deliver real impact. We work with modern technologies, cloud-native architectures, and data-driven organizations across multiple industries.


About the Role

We are looking for a systems-minded Site Reliability Engineer (SRE) to join our e-commerce DevOps engineering team. In this role, you will bridge the gap between application development (Java, TypeScript, Next.js) and infrastructure (AWS, EKS, EC2).

You are a curious, high-ownership engineer who thrives in multicultural, remote-first environments. Your mission will be to ensure our platform is resilient under peak load, observable by default, and cost-optimized by design.

This is a hands-on role for someone who enjoys solving complex infrastructure challenges, partnering closely with developers, and building reliable systems at scale.



Key Responsibilities

Cloud & Systems Troubleshooting

  • Solve complex performance bottlenecks across the entire stack, from Linux kernel tuning to cloud-native AWS architectures
  • Debug networking issues including TCP/IP, DNS, and distributed system behaviors
  • Go beyond surface-level fixes to identify true root causes and drive long-term reliability improvements

Performance Engineering

  • Partner with software engineers to design and execute load and stress testing strategies
  • Take ownership of performance results and help teams refactor infrastructure or application code where those results call for it
  • Ensure platform stability during traffic spikes and peak e-commerce events

Infrastructure as Code

  • Maintain and scale cloud infrastructure using Terraform
  • Contribute to automation and platform consistency across environments
  • Experience with Pulumi is a plus

FinOps as a Product

  • Treat cloud cost as a primary engineering metric alongside performance and uptime
  • Lead cost optimization efforts to ensure efficient and sustainable infrastructure spending
  • Ensure every dollar spent contributes directly to reliability and scalability

Modern Observability

  • Drive best-in-class observability practices across services and infrastructure
  • Ensure deep, actionable visibility regardless of tooling (Datadog, OpenTelemetry, etc.)
  • Build monitoring and alerting systems that enable proactive incident prevention

Engineering Culture & Enablement

  • Act as a force multiplier by building paved roads and self-service reliability tools
  • Empower developers to own service reliability with minimal friction
  • Collaborate closely with Engineering, Product, Project Management, and CX teams during production incidents
  • Mentor peers and share knowledge to raise operational excellence across the organization

Required Qualifications

Core Experience

  • 4+ years of experience in SRE/DevOps roles with a strong software development background
  • Proven ability to build, operate, and scale production systems in business-critical environments

Systems & Cloud Expertise

  • Deep experience managing large-scale Linux environments on AWS
  • Strong debugging skills, especially when abstraction layers fail
  • Solid cloud foundations across networking, IAM, storage, compute, databases, and caching services
  • Hands-on expertise with Kubernetes fundamentals, especially AWS EKS

Programming & Automation

  • High proficiency in Python, Go, and Bash scripting
  • Experience with Java or TypeScript/Next.js is a strong plus
  • Strong bias toward automation, tooling, and operational efficiency

Security Awareness

  • Familiarity with access boundaries, secrets management, encryption, and guardrails
  • Understanding of reliability and security best practices in cloud environments

Domain Knowledge (Preferred)

  • Experience with e-commerce platforms, billing/fulfillment systems, or the Shopify ecosystem is highly valued

Essential Skills and Abilities

  • Analytical mindset, treating latency and cost as equally important engineering metrics
  • Strong communication skills to translate infrastructure needs into actionable developer goals
  • Pragmatic approach focused on automation, documentation, and continuous improvement
  • Ownership mentality with the ability to thrive in fast-paced, production-focused environments

Why You Should Work at Distillery

  • Work on high-scale, mission-critical e-commerce infrastructure supporting real-world peak traffic events
  • Collaborate with talented engineers in a culture that values craftsmanship and ownership
  • Drive meaningful impact across performance, cost optimization, and system reliability
  • Grow your career through modern cloud-native technologies and distributed engineering practices
  • Join a remote-first environment that emphasizes trust, autonomy, and work-life balance
  • Be part of a company that invests in long-term partnerships and technical excellence