- Primary Maintainer of an internal FastAPI resource backend used by an in-house collection framework (32 endpoints / 8 routers; MongoDB → PostgreSQL dual-stack rearchitecture).
- Co-maintainer of the in-house scraping framework — introduced a Job domain model that became the basis for company-wide monitoring (Grafana plus custom data-quality CLIs).
- Sole owner of a 28-backend egress platform on Kubernetes with drain-rotate + synthetic-check self-healing and a daily Slack health report.
- End-to-end designer of a vertical AI pipeline for an industrial-security customer (Slack → Argo Workflow → scraper → in-house LLM → report → Slack thread).
Career profile
Hiring-oriented profile for Sungju Kim — Data & AI Systems Engineer.
Data Platform / Software Engineer — 5 years across Korea, France, and the US.
I design and operate large-scale data collection pipelines, and the cloud / Kubernetes infrastructure they run on, end-to-end. Most recently: Data Platform / Software Engineer at S2W (Seongnam, KR), as Primary Maintainer of an internal FastAPI resource backend and sole owner of the egress / monitoring / data-quality tooling. Before that: Data Engineer (remote, freelance) at Quantum Analytica (Boston, USA) on PySpark / Snowflake ETLs, and Python Developer on the Scraping team at Data Impact by NielsenIQ (Paris, FR), where I moved from an internship to a CDI (French permanent contract).
- Led the cannabis market analytics ETL framework on PySpark: 8 ETLs writing Zstd-compressed Parquet to S3, with PostgreSQL as a second target.
- Led a 4-person real estate ETL project to production on Databricks, powering a commercialized dashboard.
- Built a Snowflake data warehouse from multi-source ingestion that helped close a client contract.
- Internship to full-time (CDI) on a 25-person scraping team supporting global e-commerce data clients (L'Oréal, Coca-Cola, Unilever).
- Built and maintained spiders across 57 countries on a custom Scrapy framework — 553 web + 108 mobile spider directories.
- Contributed anti-bot middlewares (Cloudflare/Akamai bypass, Redis cookie management, multi-mode pipelines) and a 15-month monitoring stack on InfluxDB + Grafana + Slack with ML-based anomaly alerts.
I work across the stack required to ship a small data / AI system end-to-end.
Production-grade systems I owned end-to-end or led.
Vertical AI Pipeline for an Industrial-Security Customer (S2W)
Sole owner end-to-end: Slack command → Argo Workflow → site-specific scrapers → in-house LLM field extraction + report draft → results posted back to the same Slack thread. Offered to the customer as Slack self-service.
Self-Healing Egress Platform (S2W)
Sole owner across three stages: vendor selection + contract negotiation → Kubernetes IP-rotation platform → Postgres event ledger + drain-rotate cron + synthetic-CONNECT health checker. 28 backends across 5 providers, behind HAProxy.
Concurrency-Safe Resource Backend for a Collection Framework (S2W)
Primary Maintainer of an internal FastAPI backend: 8 routers / 32 v2 endpoints over a MongoDB + PostgreSQL dual stack. Features include atomic account acquire/release, Playwright session merging, dynamic seed scheduling, and Airflow runs triggered by captures. Led the MongoDB → PostgreSQL relational-path migration and a v3.0.0 architecture refresh.
Cannabis Market ETL Framework (Quantum Analytica)
Led an 8-ETL PySpark framework with dual output targets (Zstd-compressed Parquet on S3, plus PostgreSQL), and built a Snowflake data warehouse that helped close a client contract.
Global E-commerce Data Collection — 57 countries (Data Impact by NielsenIQ)
Built and maintained spiders across 553 web + 108 mobile retailer directories on a custom Scrapy framework. Contributed anti-bot middlewares (Cloudflare / Akamai), a 15-month InfluxDB / Grafana monitoring stack with ML-based anomaly alerts, and a Bitbucket-Pipelines → ScrapingHub auto-deploy.
Preferred channel for hiring conversations.
For formal applications, please refer to the resume submitted through the application channel.