We are seeking a skilled Associate Manager – AIOps & MLOps Operations to support and enhance the automation, scalability, and reliability of AI/ML operations across the enterprise. This role requires a solid understanding of AI-driven observability, machine learning pipeline automation, cloud-based AI/ML platforms, and operational excellence. The ideal candidate will assist in deploying AI/ML models, ensuring continuous monitoring, and implementing self-healing automation to improve system performance, minimize downtime, and enhance decision-making with real-time AI-driven insights.
- Support and maintain AIOps and MLOps programs, ensuring alignment with business objectives, data governance standards, and enterprise data strategy.
- Assist in implementing real-time data observability, monitoring, and automation frameworks to enhance data reliability, quality, and operational efficiency.
- Contribute to developing governance models and execution roadmaps to drive efficiency across data platforms, including Azure, AWS, GCP, and on-prem environments.
- Ensure seamless integration of CI/CD pipelines, data pipeline automation, and self-healing capabilities across the enterprise.
- Collaborate with cross-functional teams to support the development and enhancement of next-generation Data & Analytics (D&A) platforms.
- Assist in managing the people, processes, and technology involved in sustaining Data & Analytics platforms, driving operational excellence and continuous improvement.
- Support Data & Analytics Technology Transformations by ensuring proactive issue identification and the automation of self-healing capabilities across the PepsiCo Data Estate.
- Support the implementation of AIOps strategies for automating IT operations using Azure Monitor, Azure Log Analytics, and AI-driven alerting.
- Assist in deploying Azure-based observability solutions (Azure Monitor, Application Insights, Azure Synapse for log analytics, and Azure Data Explorer) to enhance real-time system performance monitoring.
- Enable AI-driven anomaly detection and root cause analysis (RCA) by collaborating with data science teams using Azure Machine Learning (Azure ML) and AI-powered log analytics.
- Contribute to developing self-healing and auto-remediation mechanisms using Azure Logic Apps, Azure Functions, and Power Automate to proactively resolve system issues.
- Support ML lifecycle automation using Azure ML, Azure DevOps, and Azure Pipelines for CI/CD of ML models.
- Assist in deploying scalable ML models with Azure Kubernetes Service (AKS), Azure Machine Learning Compute, and Azure Container Instances.
- Automate feature engineering, model versioning, and drift detection using Azure ML Pipelines and MLflow.
- Optimize ML workflows with Azure Data Factory, Azure Databricks, and Azure Synapse Analytics for data preparation and ETL/ELT automation.
- Implement basic monitoring and explainability for ML models using Azure Responsible AI Dashboard and InterpretML.
- Collaborate with Data Science, DevOps, CloudOps, and SRE teams to align AIOps/MLOps strategies with enterprise IT goals.
- Work closely with business stakeholders and IT leadership to implement AI-driven insights and automation to enhance operational decision-making.
- Track and report AI/ML operational KPIs, such as model accuracy, latency, and infrastructure efficiency.
- Assist in coordinating with cross-functional teams to maintain system performance and ensure operational resilience.
- Support the implementation of AI ethics, bias mitigation, and responsible AI practices using Azure Responsible AI Toolkits.
- Ensure adherence to Azure Information Protection (AIP), Role-Based Access Control (RBAC), and data security policies.
- Assist in developing risk management strategies for AI-driven operational automation in Azure environments.
- Prepare and present program updates, risk assessments, and AIOps/MLOps maturity progress to stakeholders as needed.
- Support efforts to attract and build a diverse, high-performing team to meet current and future business objectives.
- Help remove barriers to agility and enable the team to adapt quickly to shifting priorities without losing productivity.
- Contribute to developing the appropriate organizational structure, resource plans, and culture to support business goals.
- Leverage technical and operational expertise in cloud and high-performance computing to understand business requirements and earn trust with stakeholders.
- 5+ years of technology work experience in a global organization, preferably in CPG or a similar industry.
- 5+ years of experience in the Data & Analytics field, with exposure to AI/ML operations and cloud-based platforms.
- 5+ years of experience working within cross-functional IT or data operations teams.
- 2+ years of experience in a leadership or team coordination role within an operational or support environment.
- Experience in AI/ML pipeline operations, observability, and automation across platforms such as Azure, AWS, and GCP.
- Excellent Communication: Ability to convey technical concepts clearly and confidently to diverse audiences while showing empathy for stakeholders.
- Customer-Centric Approach: Strong focus on delivering the right customer experience by advocating for customer needs and ensuring issue resolution.
- Problem Ownership & Accountability: Proactive mindset to take ownership, drive outcomes, and ensure customer satisfaction.
- Growth Mindset: Willingness and ability to adapt and learn new technologies and methodologies in a fast-paced, evolving environment.
- Operational Excellence: Experience in managing and improving large-scale operational services with a focus on scalability and reliability.
- Site Reliability & Automation: Understanding of SRE principles, automated remediation, and operational efficiencies.
- Cross-Functional Collaboration: Ability to build strong relationships with internal and external stakeholders through trust and collaboration.
- Familiarity with CI/CD processes, data pipeline management, and self-healing automation frameworks.
- Strong understanding of data acquisition, data catalogs, data standards, and data management tools.
- Knowledge of master data management concepts, data governance, and analytics.