As a member of the data engineering team, you will be the key technical expert developing and overseeing PepsiCo's data product build and operations, and you will drive a strong vision for how data engineering can proactively create a positive impact on the business. You'll be an empowered member of a team of data engineers who build data pipelines into various source systems, rest data on the PepsiCo Data Lake, and enable exploration and access for analytics, visualization, machine learning, and product development efforts across the company. You will help lead the development of very large, complex data applications in public cloud environments, directly impacting the design, architecture, and implementation of PepsiCo's flagship data products in areas such as revenue management, supply chain, manufacturing, and logistics. You will work closely with process owners, product owners, and business users, in a hybrid environment with in-house, on-premises data sources as well as cloud and remote systems.
Responsibilities:
- Be a founding member of the data engineering team. Help attract talent to the team by networking with your peers, representing PepsiCo HBS at conferences and other events, and discussing our values and best practices when interviewing candidates.
- Own data pipeline development end-to-end, spanning data modeling, testing, scalability, operability and ongoing metrics.
- Ensure that we build high quality software by reviewing peer code check-ins.
- Define best practices for product development, engineering, and coding as part of a world class engineering team.
- Collaborate in architecture discussions and architectural decision making as part of continually improving and expanding these platforms.
- Lead feature development in collaboration with other engineers; validate requirements / stories, assess current system capabilities, and decompose feature requirements into engineering tasks.
- Focus on delivering high quality data pipelines and tools through careful analysis of system capabilities and feature requests, peer reviews, test automation, and collaboration with other engineers.
- Develop software in short iterations to quickly add business value.
- Introduce new tools / practices to improve data and code quality; this includes researching / sourcing 3rd party tools and libraries, as well as developing tools in-house to improve workflow and quality for all data engineers.
- Support data pipelines developed by your team through good exception handling, monitoring, and, when needed, debugging of production issues.
Qualifications:
- 6-9 years of overall technology experience, including at least 5 years of hands-on software development, data engineering, and systems architecture.
- 4+ years of experience in SQL optimization and performance tuning.
- Experience with data modeling, data warehousing, and building high-volume ETL/ELT pipelines.
- Experience building/operating highly available, distributed systems of data extraction, ingestion, and processing of large data sets.
- Experience with data profiling and data quality tools like Apache Griffin, Deequ, or Great Expectations.
Current skills in the following technologies:
- Python
- Orchestration platforms: Airflow, Luigi, Databricks, or similar
- Relational databases: Postgres, MySQL, or equivalents
- MPP data systems: Snowflake, Redshift, Synapse, or similar
- Cloud platforms: AWS, Azure, or similar
- Version control (e.g., Git/GitHub) and familiarity with deployment and CI/CD tools.
- Fluency with Agile processes and tools such as Jira or Pivotal Tracker.
- Experience running and scaling applications on cloud infrastructure and containerized services like Kubernetes is a plus.
- Understanding of metadata management, data lineage, and data glossaries is a plus.