Career Profile
Seasoned DevOps Engineer with a deep understanding of modern infrastructure and software architectures, now specializing in Site Reliability Engineering. Adept at applying SRE principles to optimize system performance and ensure high availability. Recent projects include developing strategic dashboards aligned with Google's four golden signals for enhanced observability. Extensive experience managing Kubernetes across cloud and on-premises platforms, coupled with strong Python and Go development skills. Proven ability to collaborate effectively across teams and deliver impactful solutions.
Skills & Proficiency
APM
Python
AWS & Azure
Docker
Kubernetes
Unix & Linux Scripting
Go
CI/CD Pipelines
C/C++
PowerShell Scripting
Experience
- Modernized application performance monitoring by migrating from AppDynamics to Dynatrace, increasing visibility into application behavior.
- Developed strategic KPI dashboards aligned with the Google SRE golden signals (latency, traffic, errors, and saturation), providing actionable insights into system health and performance.
- Created comprehensive SRE documentation, standardizing monitoring and observability practices for applications and platforms.
- Designed and implemented Azure dashboards and alerts, empowering application and DevOps teams with proactive monitoring capabilities.
- Collaborated with application teams to define and refine critical metrics, driving data-driven decision-making and business optimization.
- Delivered critical observability and monitoring insights during major incidents, facilitating rapid resolution and minimizing business impact.
- Performed customer installations of Kubernetes, the Anodot application, and a monitoring application; troubleshot issues in customer environments and validated data ingested into the platform.
- Partnered with Sales and Customer Success to discover and analyze data from its point of origin, through pipelines or APIs, and into the platform.
- Developed documentation defining core business KPIs for Anodot's SaaS and on-premises platforms, covering monitoring dashboards, alerting, effective ranges, and impact analysis.
- Performed load and scale testing of the platform in isolated AWS environments, scaling out a Kubernetes cluster and increasing ingest rates and payload volumes.
- Developed a weighted scoring framework to compare enterprise backup vendor products, including ROI and cost analysis, for a project with a $1-2M budget.
- Created wiki-based documentation and architecture diagrams for the corporate data protection environment, including RACI charts, an escalation matrix, and complex decision trees, resulting in a 100% satisfactory finding from the compliance review board.
- Automated protection of AWS-hosted applications using snapshots and lifecycle management, raising the overall protection compliance rate from 20% to 99.9% within three months.
- Designed Nutanix local and remote protection domains to protect critical data from more than 25 cities in over 15 countries, replicating it to data centers in four global regions.
- Developed an API-driven script to protect and isolate a tertiary copy of critical application backup data, fulfilling an executive leadership mandate for a near-immutable layer of data protection.
- Worked with architects on application and infrastructure design and documentation, improving regulatory compliance and reducing operational oversights.
- Assisted in developing the organization's first data protection policies and procedures addressing malicious data destruction.
- Redesigned the entire corporate backup infrastructure around disk-based backups and asynchronous replication.
- Architected and implemented a complete overhaul of an isolated backup SAN fabric.
- Oversaw a NAS infrastructure redesign incorporating both disaster recovery (DR) and NDMP backup components.
- Integrated monitoring and alerting into the NAS and backup infrastructure environments.
- Developed Unix scripts giving the development division independent control over the production, development, and QA life cycles of its configuration management environment.