Vinay Nair

Hands On Product Owner/Lead SRE/Infrastructure Architect
Zaandam

Summary

With 15 years of IT experience, I have a strong background in Unix, storage, and backups, as well as expertise in DevOps and cloud platforms such as AWS and Azure. I am well-versed in CI/CD tools, DevOps security and monitoring tools, and technologies including Kubernetes, Terraform, and Packer. I have led SRE teams and have trained and coached engineers in SRE and DevOps principles. IT is more than a job to me, and I am dedicated to bringing SRE ways of working to teams. I am a logical thinker who believes in fixing the basics before making big changes, and I quickly grasp new concepts in fast-paced environments.

Overview

15 years of professional experience
4 years of post-secondary education
2 Certifications
5 Languages

Work History

Lead Site Reliability Engineer & Product Owner

Shell
The Hague
08.2023 - Current

• Led a newly formed team as Product Owner and Lead SRE to manage cloud costs, with a goal of saving millions of dollars

• Collaborated with a small team of engineers and business leadership to automate the handling of AWS Trusted Advisor and Azure Advisor findings for cost optimization

• Worked on various cost-saving initiatives such as resizing VMs, archiving abandoned disks, and enabling intelligent tiering in object storage

• Developed dashboards to visualize potential cost savings and automated solutions for account/subscription owners to opt in and customize

• Defined, refined, and prioritized features with the team, and implemented solutions using Python, Terraform, and CI/CD pipelines

Product Owner and SRE Coach, AWS Platform Team

Shell
Amsterdam
08.2022 - 07.2023

• Led the AWS Account Foundations team at Shell as Product Owner & SRE Coach

• Stabilized a newly merged team of 5 developers and 20 operations members

• Implemented SRE best practices and resolved ops vs dev challenges

• Managed over 600 business-critical AWS accounts

• Oversaw the Landing Zone and all aspects of the AWS foundation

• Member of the Technical Review Board for Exceptions and Deviations

• Drove the ops transition to automated DevOps practices

• Introduced proper Scrum teams in Azure DevOps

• Committed and adhered to SAFe Programme Increment goals

• Improved the refinement process and established maintenance processes

• Worked on Account Vending Machines, StackSets, Control Tower, Lambdas, API Gateways, ServiceNow integrations, Security Hub remediations, Transit Gateways, Organizations, and SCPs

Lead Site Reliability Engineer

Shell
Amsterdam
09.2021 - 07.2022

Worked as Lead SRE for the team responsible for Data Science and Observability.

  • Specialised in AWS SageMaker, AzureML, Prometheus, Grafana, Dynatrace, Jaeger, OpenTelemetry, etc.
  • Started as the first MLOps engineer, allow-listing and providing AzureML and AWS SageMaker as central services in Shell.
  • Worked in a combined team with AWS engineers, Shell business and management, data scientists, and others to demo and create reusable pipelines for spinning up new SageMaker instances for specific business cases.
  • Implemented data virtualization with Dremio. Built Terraform-like functionality using Python, APIs, and GitHub Actions to render YAML project definitions into Dremio objects. Before that, set up the whole Dremio infrastructure using Terraform Enterprise and GitHub Actions.
  • Implemented Cloud Security Posture Management, first using Aqua and later starting the migration to Wiz.
  • Used Prometheus and Grafana with Thanos to observe 26 EKS/AKS clusters centrally.
  • Later worked on extending observability to all of Shell (all Azure and AWS accounts).
  • Built a proof of concept and designed a solution with OpenTelemetry and Jaeger to add tracing. Dynatrace was chosen as the new platform for central observability, but the project was ultimately halted because of contractual issues.
  • Also implemented event-to-incident conversion using Moogsoft: Prometheus alerts were automatically converted to ServiceNow incidents.

Cloud DevOps Engineer

Nationale Nederlanden
The Hague
09.2020 - 09.2021

Worked on the core AWS team that provisioned over 500 AWS accounts. Worked on AWS Landing Zone provisioning with a major focus on:

  • Security measures using AWS Config Rules, Security Hub, GuardDuty, Shield, WAF, etc.
  • AWS Organizations, SCPs, and SSO.
  • Transit Gateway setup with Direct Connect and AWS Firewall.
  • DNS setup across all accounts and on-prem.
  • Logging with CloudTrail/flow logs and Splunk.

A few key achievements:

  • Redesigned Concourse pipelines to be many times faster and safer to deploy.
  • Implemented AWS Firewall as a replacement for the on-premises proxy system for egress traffic.
  • Implemented a UI using AWS S3 to visualize and understand the SCP policies applied at each OU/account level, without giving anyone access to the core accounts.

Lead Site Reliability Engineer

Shell
Amsterdam
06.2019 - 08.2020

Worked as Lead SRE for a new Run and Build team.

  • Implemented Jenkins integrated with Kubernetes for agents. Each team had agents that spun up in a namespace of its own. The agents came pre-integrated with HashiCorp Vault for secrets management, so teams could retrieve Terraform tokens and other secrets.
  • Set up Terraform Enterprise for Shell, architecting and designing the whole solution.
  • Set up a sample app called 'hello-cloud' that could be deployed using Lambdas, Terraform, or Helm as the deployment mechanism. Teams simply had to clone the app and rename pipeline variables such as the team name to get started.
  • Hosted weekly webinars for a developer community of over 200 to demo the roadmap and migration paths to Jenkins and Terraform Enterprise.

Senior Site Reliability Engineer

Shell
Bengaluru
05.2017 - 05.2019

A program of 200 Shell staff building and using the latest technologies to find petroleum.

  • Among the first 3 engineers who set up the whole idea, starting a journey in Shell that still continues. The platform has grown to over 100 SREs and a consumer community of hundreds.
  • Part of the core team that facilitated and directed how the entire program worked. The project was driven by Infrastructure as Code principles using Terraform, Packer, etc. Each commit went through a release management process and was tested via CI/CD using Jenkins running on Kubernetes. Most applications ran on Kubernetes, with over 600 Lambdas in the environment and many other AWS services in use. Worked on end-to-end provisioning in code, from creating a blank AWS account to setting up everything, including Kubernetes and Jenkins.
  • Also involved in architecting the entire environment, covering cost, security, compliance, cloud agnosticism, entitlements, authentication, authorisation, storage, etc. Served as a key member, owing to broad infrastructure experience combined with strong coding skills.

Tools and practices used:

  • Terraform: used at the core of the entire move to achieve Infrastructure as Code (IaC).
  • Packer: used to create AMIs and images.
  • Git/GitHub: for code development and automation.
  • Jenkins: CI/CD, used to achieve a fully CI/CD-compliant environment.
  • Kubernetes & Docker: all development happened here; the majority of the work revolved around Kubernetes.
  • AWS: the entire move was to AWS.
  • Prometheus/Grafana: used for monitoring.
  • Terratest: for testing Terraform.
  • Qumulo: cloud NAS storage solution.
  • Lots of Python scripts.
  • Agile methodology.
  • Modern DevOps tools such as Clair, WhiteSource, Selenium, etc.
  • Lots of strategising and planning.

Duty Manager

Oracle India
Bengaluru
09.2014 - 04.2017

Oracle Managed Cloud Services provides cloud services to over 500 clients across the world.

  • A highly automated and reliable environment that demands constant knowledge updates. Handled core Storage/Unix escalations in an environment of 900+ NAS filers and about 30 SAN arrays, along with about 20,000 servers.
  • Wrote extensive scripts and automation to make managing such an environment possible.
  • Oracle ZFS Storage administration (Subject Matter Expert), AXIOM Pillar & AXIOM FS1 administration (L3), Sun StorageTek 6780 administration, performance analysis, on-the-floor escalations as Duty Manager, Severity 1 issue handling, Linux server administration (OEL), scripting and automation, root cause analysis, team handling, storage migrations, storage upgrades for SAN/NAS and switches, Rundeck, Git, Ansible, Jenkins, Terraform, etc.

Freelance Consultant

Reader's Digest
Bengaluru
09.2014 - 05.2016

Reader's Digest was the client I had worked for during my earlier assignment at HCL. When I left HCL, I was offered a freelance role directly with them. I freelanced and helped stabilize the handover.

Associate Consultant

HCL Technologies
Noida
09.2008 - 09.2014

• Worked as UNIX/Storage/Backup SME and Track Lead at Reader's Digest Association

• Jointly responsible for Storage and Backup escalations with onsite counterpart

• Promoted from Graduate Engineering Trainee to Team Lead and then Manager at Reader's Digest Association

Education

Bachelor of Technology - Electrical, Electronics And Communications Engineering

University of Calicut
Kerala
06.2004 - 08.2008

Skills

    Python

Certification

CKA

Timeline

Lead Site Reliability Engineer & Product Owner

Shell
08.2023 - Current

CKA

02-2023

CKAD

02-2023

Product Owner and SRE Coach, AWS Platform Team

Shell
08.2022 - 07.2023

Lead Site Reliability Engineer

Shell
09.2021 - 07.2022

Cloud DevOps Engineer

Nationale Nederlanden
09.2020 - 09.2021

Lead Site Reliability Engineer

Shell
06.2019 - 08.2020

Senior Site Reliability Engineer

Shell
05.2017 - 05.2019

Duty Manager

Oracle India
09.2014 - 04.2017

Freelance Consultant

Reader's Digest
09.2014 - 05.2016

Associate Consultant

HCL Technologies
09.2008 - 09.2014

Bachelor of Technology - Electrical, Electronics And Communications Engineering

University of Calicut
06.2004 - 08.2008