• Experience with Hadoop, HDFS, Spark (PySpark & Scala), Flink, Druid, Trino, Iceberg, Hive Metastore
• ETL & CDC: 20+ batch/streaming pipelines (Kafka → S3/Kafka), Airflow DAG orchestration, Flink jobs
• Migration & Optimization: Ambari → CDP, CDP → Kubernetes; improved Lakehouse read performance by ~45%
• Dashboarding: Superset, Pivot; made 10+ data sources accessible to multiple teams
DevOps & CI/CD
• Kubernetes cluster setup, maintenance, scaling, security, GPU node integration
• CI/CD: GitLab, ArgoCD, Kaniko; end-to-end pipeline design for Python & Java projects
• Containerization: Docker, Harbor (security scanning, backup, LDAP), Nexus, Helm (4+ charts written from scratch)
• Monitoring: Prometheus, Grafana, Loki; managed logs & metrics for 20+ services
• Secrets & Security: Sealed Secrets (kubeseal), LDAP/AD integration
Platform Engineering
• Managed Kafka, Flink, Druid, Trino clusters (HA, scaling, monitoring, ingestion)
• Scalable storage & replication with MinIO (S3-compatible)
• PostgreSQL, MySQL, MSSQL installation & management
• Enabled 60+ users to query Iceberg tables on S3 via CloudBeaver
• Deployed data science models as production-ready APIs using FastAPI (tested for millions of users)
• Built crawler & brute-force detection systems
• Supported LLM/RAG projects (architecture & deployment)
• Dockerized data science projects & automated their Kubernetes deployment via CI/CD