Data Engineer – AWS-Based Data Lake and Analytics Platform

LATAM

Data Engineer

Summary:

We are seeking an experienced Data Engineer to support the re-implementation of a large-scale AWS-based data lake and analytics platform. This is a lift-and-shift effort to recreate ingestion pipelines, EMR/Glue workflows, Redshift loading logic, Databricks integration, and SageMaker ML orchestration in a new AWS account using Terraform. You’ll play a key role in rebuilding core data workflows, validating source integrations, and ensuring data accuracy across the stack.

Responsibilities:

● Rebuild and validate data ingestion pipelines using AWS services (Lambda, Kinesis Firehose, MSK, S3).

● Migrate and reconfigure processing jobs in Glue, EMR, and Amazon Managed Workflows for Apache Airflow (MWAA).

● Recreate and validate table definitions in Glue Data Catalog for downstream Athena queries.

● Support data ingestion from third-party APIs (e.g., Revature API, eCommerce Affiliates) using Lambda or Airflow DAGs.

● Collaborate with ML engineers to ensure SageMaker/Personalize workflows are rebuilt and operational.

● Work with the DevOps team to align Terraform-managed resources with data pipeline needs.

● Conduct data validation across the migration: object counts, schema consistency, source-to-target QA.

● Document data flow logic and maintain lineage across ingestion, transformation, and analytics layers.

Required Skills:

● 5+ years of experience in data engineering or analytics engineering roles.

● Strong AWS experience: Lambda, Kinesis (Data Streams & Firehose), MSK, S3, Glue, Athena, EMR, Redshift.

● Experience with Airflow, either self-managed or MWAA.

● Proficiency in Python, especially for Lambda functions and ETL logic.

● Experience building or rebuilding Glue Data Catalog schemas and maintaining partitioning strategies.

● Understanding of JSON, Parquet, or Avro file formats and versioning strategies in S3 data lakes.

● Experience integrating with and authenticating to external APIs and managing secrets securely.

Nice to Have:

● Familiarity with Sailthru, Zephr, Databricks, or other MarTech tools.

● Experience with SageMaker pipelines, endpoint deployment, or feature store management.

● Familiarity with cross-account data ingestion strategies in AWS.

● Hands-on experience working with Terraform to define data infrastructure.

● Knowledge of Redshift Spectrum or federated queries from Redshift to S3.