Career profile

Hiring-oriented profile for Sungju Kim — Data & AI Systems Engineer.

Profile

Data Platform / Software Engineer — 5 years across Korea, France, and the US.

I design and operate large-scale data collection pipelines and the cloud / Kubernetes infrastructure they run on, end-to-end. Most recent role: Data Platform / Software Engineer at S2W (Seongnam, KR), as Primary Maintainer of an internal FastAPI resource backend and sole owner of the egress, monitoring, and data-quality tooling. Before that: Data Engineer (remote, freelance) at Quantum Analytica (Boston, USA) on PySpark / Snowflake ETLs, and Python Developer on the Scraping team at Data Impact by NielsenIQ (Paris, FR), where an internship converted to a CDI (French permanent contract).

Experience

Data Platform / Software Engineer

S2W · Seongnam, Korea

Jan 2024 – Jun 2026

Primary Maintainer of an internal FastAPI resource backend used by an in-house collection framework (32 endpoints / 8 routers; MongoDB → PostgreSQL dual-stack rearchitecture).
Co-maintainer of the in-house scraping framework — introduced a Job domain model that became the basis for company-wide monitoring (Grafana plus custom data-quality CLIs).
Sole owner of a 28-backend egress platform on Kubernetes with drain-rotate + synthetic-check self-healing and a daily Slack health report.
End-to-end designer of a vertical AI pipeline for an industrial-security customer (Slack → Argo Workflow → scraper → in-house LLM → report → Slack thread).

Data Engineer (Freelance, Remote)

Quantum Analytica · Boston, USA (remote)

Oct 2021 – Aug 2023

Led the cannabis market analytics ETL framework on PySpark: 8 ETLs writing Zstd-compressed Parquet to S3, with PostgreSQL as a second target.
Led a 4-person real estate ETL project to production on Databricks, powering a commercialized dashboard.
Built a Snowflake data warehouse from multi-source ingestion that helped close a client contract.

Python Developer — Scraping Team (Internship → CDI)

Data Impact by NielsenIQ · Paris, France

Feb 2021 – Mar 2023

Moved from internship to a full-time CDI on a 25-person scraping team supporting global e-commerce data clients (L'Oréal, Coca-Cola, Unilever).
Built and maintained spiders across 57 countries on a custom Scrapy framework — 553 web + 108 mobile spider directories.
Contributed anti-bot middlewares (Cloudflare/Akamai bypass, Redis cookie management, multi-mode pipelines) and a 15-month monitoring stack on InfluxDB + Grafana + Slack with ML-based anomaly alerts.

Skills, by layer

I work across the stack required to ship a small data / AI system end-to-end.

Data Layer

Python · SQL · PySpark · Snowflake · Databricks · PostgreSQL · MongoDB · Elasticsearch · Redis · InfluxDB

Platform Layer

Docker · Kubernetes · Linux · ArgoCD · Argo Workflows · Airflow · HAProxy · Grafana · Loki · GitLab CI

AI Application Layer

LLM Agents · Prompt Workflows · RAG concepts · Evaluation · Human-in-the-loop · Automated Reports · LangChain · Vision LLMs

Product / FDE Layer

Problem Framing · Workflow Mapping · Technical Communication · Iteration · Requirements Discovery · Cross-functional Delivery

Selected work

Production-grade systems I owned end-to-end or led.

Vertical AI Pipeline for an Industrial-Security Customer (S2W)

Sole owner end-to-end: Slack command → Argo Workflow → site-specific scrapers → in-house LLM field extraction + report draft → results posted back to the same Slack thread. Offered to the customer as Slack self-service.

Python · Argo Workflows · FastAPI · Slack Bolt · In-house LLM

Self-Healing Egress Platform (S2W)

Sole owner across three stages: vendor selection + contract negotiation → Kubernetes IP-rotation platform → Postgres event ledger + drain-rotate cron + synthetic-CONNECT health checker. 28 backends across 5 providers, behind HAProxy.

Kubernetes · HAProxy · Gluetun · PostgreSQL · GitLab CI

Concurrency-Safe Resource Backend for a Collection Framework (S2W)

Primary Maintainer of an internal FastAPI backend: 8 routers / 32 v2 endpoints over MongoDB + PostgreSQL dual-stack. Atomic account acquire/release, Playwright session merge, dynamic seed scheduling, captures → Airflow trigger. Led the MongoDB → PostgreSQL relational-path migration and a v3.0.0 architecture refresh.

FastAPI · MongoDB (Beanie) · PostgreSQL (Tortoise) · Playwright

Cannabis Market ETL Framework (Quantum Analytica)

Led an 8-ETL PySpark framework with dual-target output (Zstd-compressed Parquet on S3, plus PostgreSQL), and a Snowflake data warehouse build that helped close a client contract.

PySpark · AWS S3 · Snowflake · Databricks · PostgreSQL

Global E-commerce Data Collection — 57 countries (Data Impact by NielsenIQ)

Built and maintained spiders across 553 web + 108 mobile spider directories on a custom Scrapy framework. Contributed anti-bot middlewares (Cloudflare / Akamai), a 15-month InfluxDB / Grafana monitoring stack with ML-based anomaly alerts, and a Bitbucket-Pipelines → ScrapingHub auto-deploy.

Python 3.8 · Scrapy · Selenium · ScrapingHub · GCS · InfluxDB

Contact

Preferred channel for hiring conversations.

Email: sungjukim906@gmail.com
GitHub: github.com/sungjuu
LinkedIn: linkedin.com/in/sungju-kim

For formal applications, please refer to the resume submitted through the application channel.
