Professional Summary

DevOps Engineer specializing in Infrastructure-as-Code (IaC) and Kubernetes orchestration. I bridge the gap between legacy enterprise systems (Active Directory, on-prem) and modern cloud-native stacks (Azure AKS, AWS EKS). I focus on using code to eliminate manual toil, scale infrastructure, and improve system reliability. Microsoft Certified: Azure Administrator Associate (AZ-104).

Technical Skills

DevOps & Automation

  • Terraform: used in IDP, Multi-Cloud K8s
  • Infrastructure as Code (IaC): used in IDP, Multi-Cloud K8s
  • Docker: used in IDP, Multi-Cloud K8s
  • GitHub Actions: used in IDP (CI/CD Pipeline)
  • Jenkins: used in previous enterprise projects
  • CI/CD Pipelines: used in IDP, Multi-Cloud K8s

Cloud & Infrastructure

  • Microsoft Azure (VMs, VNets, NSGs): used in Ohio DRC, IDP, Multi-Cloud K8s
  • Azure Load Balancers: used in IDP (AKS Infrastructure)
  • Azure Monitor & Entra ID: used in Ohio DRC, Infosys
  • Amazon Web Services (AWS): used in Multi-Cloud K8s (EKS)

Operations & Reliability

  • ServiceNow (ITSM): used in Ohio DRC (99.8% SLA)
  • Incident Management: used in Ohio DRC, Infosys
  • Monitoring & Alerting: used in Observability Stack, Ohio DRC
  • Root Cause Analysis: used in Ohio DRC, Infosys

Scripting & OS

  • Python: used in IDP (Orchestration), Automation
  • Bash: used in CI/CD, Infrastructure Scripts
  • PowerShell: used in Ohio DRC (Windows Server)
  • Linux: used in IDP, Multi-Cloud K8s, Observability
  • Windows Server: used in Ohio DRC (Enterprise IT)

Professional Experience

Information Technologist I – Infrastructure & Operations
Ohio Department of Rehabilitation and Correction
Aug 2024 - Present
  • Built Python automation that picks up and routes ServiceNow tickets, so the team no longer sorts every VPN and access request by hand
    Created a Python script that polls the ServiceNow API, categorizes tickets based on keywords, and auto-assigns them to the correct queue. Reduced manual ticket sorting time by ~2 hours/day for the team.
  • Maintained 99.8% system availability by resolving 100 tickets monthly related to network outages, internet connectivity, and server performance issues
    Handled incidents ranging from VPN failures to server performance degradation. Maintained 99.8% uptime across supported systems through rapid incident response and root-cause fixes.
  • Owned the identity and access management flow for the department, using MIM and Active Directory to manage daily permissions for 10,000+ users
    Responsible for user provisioning, deprovisioning, and permission changes across 10,000+ users. Used Microsoft Identity Manager (MIM) to sync identity changes between on-prem AD and cloud systems.
  • Fixed synchronization errors between on-prem servers and Microsoft Entra ID by debugging logs and working with cloud leads to keep SSO stable
    Diagnosed sync failures in Azure AD Connect by analyzing event logs and connector space errors. Worked with the Azure team to resolve authentication issues preventing SSO login for 500+ users.
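
The keyword-based ticket routing described in the first bullet above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production script: the queue names, keyword lists, `Ticket` class, and `route_ticket` helper are all hypothetical, and the ServiceNow API polling step is left out so the example stays self-contained.

```python
from dataclasses import dataclass

# Hypothetical keyword-to-queue rules; real queue names and keywords would differ.
ROUTING_RULES = {
    "Network": ("vpn", "wifi", "internet", "network"),
    "Access": ("password", "access", "permission", "account"),
    "Server": ("server", "cpu", "disk", "memory"),
}
DEFAULT_QUEUE = "Triage"  # fallback when no keyword matches


@dataclass
class Ticket:
    number: str
    short_description: str


def route_ticket(ticket: Ticket) -> str:
    """Return the first queue whose keywords appear in the ticket description."""
    text = ticket.short_description.lower()
    for queue, keywords in ROUTING_RULES.items():
        if any(keyword in text for keyword in keywords):
            return queue
    return DEFAULT_QUEUE
```

Rules are checked in dictionary order, so a ticket that matches several queues lands in the first one listed; anything unmatched falls back to a manual triage queue rather than being guessed at.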
AI Operations & QA Intern
WelSpot
Mar 2024 - Aug 2024
  • Cut the chatbot error rate by 10 percentage points (15% → 5%) by designing 50-line prompt chains and building a validation system to verify LLM output quality
    Created structured prompt templates with validation logic to catch hallucinations and format errors. Reduced error rate from 15% → 5% by implementing output verification before responses were sent to users.
  • Plugged AI logic into the web application by working with the dev team to integrate LangChain into a Flask-based backend
    Integrated the LangChain framework into existing Flask API endpoints. Worked with the frontend team to handle async LLM responses and implement proper error handling for API timeouts.
  • Identified and fixed 20+ specific failure modes in the AI logic before the code was deployed to production
    Created test cases covering edge cases like empty inputs, special characters, and context overflow. Found 20+ bugs including prompt injection vulnerabilities and memory leaks in the LLM wrapper code.
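
The output verification step mentioned above could look something like the sketch below. It assumes, purely for illustration, that the chatbot returns a JSON object with an `answer` field; the field name, the length limit, and the `validate_llm_output` helper are hypothetical. It covers format checks only; catching hallucinations would need additional logic not shown here.

```python
import json
from typing import Tuple

MAX_ANSWER_CHARS = 1000  # hypothetical limit for a single chatbot reply


def validate_llm_output(raw: str) -> Tuple[bool, str]:
    """Check that raw LLM output is a JSON object with a usable 'answer' string."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(payload, dict):
        return False, "expected a JSON object"
    answer = payload.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        return False, "missing or empty answer"
    if len(answer) > MAX_ANSWER_CHARS:
        return False, "answer too long"
    return True, "ok"
```

Returning a reason string alongside the pass/fail flag makes it easy to log why a response was rejected before a fallback message is sent to the user.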
System Engineer – Application Support & QA
Infosys
Nov 2020 - Apr 2022
  • Managed nightly data processing cycles using BMC Control-M to ensure all critical batch jobs finished before business-start SLAs
    Monitored batch job execution in Control-M, handling job failures and dependencies. Ensured all ETL processes completed before 6 AM business cutoff, maintaining SLA compliance for month-end reporting.
  • Sped up bug detection by using nmon and Informatica to map out resource telemetry, identifying the exact timing of CPU and memory spikes
    Used nmon to capture real-time server metrics during batch processing. Correlated CPU/memory spikes with Informatica job logs to identify which ETL transformations were causing performance bottlenecks, reducing debug time from hours to minutes.
  • Identified application breaking points by running heavy load and stress tests using JMeter and LoadRunner
    Designed load test scenarios simulating peak user traffic. Identified database connection pool exhaustion at 500 concurrent users, leading to infrastructure scaling recommendations.
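
The spike-to-job correlation described in the nmon bullet above can be illustrated with a short sketch. The data shapes are assumptions: CPU samples are taken to be `(timestamp, cpu_percent)` pairs and job runs `(name, start, end)` tuples, already parsed from the nmon capture and the Informatica logs. The function names and the 90% threshold are hypothetical.

```python
from datetime import datetime


def jobs_running_at(moment, job_runs):
    """Return names of jobs whose [start, end] window contains the given moment."""
    return [name for name, start, end in job_runs if start <= moment <= end]


def correlate_spikes(samples, job_runs, cpu_threshold=90.0):
    """Map each CPU sample at or above the threshold to the jobs running then."""
    return {
        timestamp: jobs_running_at(timestamp, job_runs)
        for timestamp, cpu in samples
        if cpu >= cpu_threshold
    }


# Example with made-up data: one spike during a hypothetical "load_orders" job.
samples = [
    (datetime(2021, 1, 1, 1, 0), 35.0),
    (datetime(2021, 1, 1, 1, 30), 96.5),
]
job_runs = [("load_orders", datetime(2021, 1, 1, 1, 15), datetime(2021, 1, 1, 1, 45))]
spikes = correlate_spikes(samples, job_runs)
```

Here `spikes` maps the 01:30 sample to `["load_orders"]`, which is the kind of lookup that narrows a bottleneck down to a specific ETL job.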

Project Deep Dives

Internal Developer Platform (IDP)
Kubernetes • Terraform • GitHub Actions • Python

Cut environment setup time from 2 hours to 15 minutes by giving developers a Terraform-based "push button" to launch their own infrastructure. Ended the "works on my machine" problem by building a GitHub Actions pipeline that automatically catches configuration errors before deployment.

  • Deployment Time: 2h → 15min
  • Success Rate: 99.95%
  • Automation: 100%

Multi-Cloud Kubernetes Platform
EKS • AKS • Helm • ArgoCD • Istio

Managed 50+ microservices across AWS and Azure, using ArgoCD to keep cloud environments in sync with a GitHub-based source of truth. Built real-time Grafana dashboards to monitor cluster health, linking pod restarts to resource spikes to reduce troubleshooting time for the team.

  • Microservices: 50+
  • Uptime SLA: 99.9%
  • Deployment Model: GitOps

Observability Stack Implementation
Prometheus • Grafana • Loki • Jaeger

Built a monitoring, logging, and tracing stack to catch production issues faster: Prometheus for metrics, Grafana for dashboards, Loki for logs, and Jaeger for distributed tracing.

  • MTTD: 70% faster
  • Dashboards: 100+
  • Tracing: Full Stack