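---
title: "Scraping TikTok Shop Data with Scrapy + Playwright"
type: source
tags: [playwright, scrapy, tiktok, 跨境电商]
date: 2026-04-18
---

## Source File

- [[raw/跨境电商/Scrapy + Playwright 抓取TikTok Shop Data.md]]

## Summary

- Core topic: a guide to configuring an environment for scraping TikTok Shop data
- Problem domain: cross-border e-commerce data collection; Docker environment configuration
- Method/mechanism: a crawler architecture that combines a Python virtual environment, Scrapy, and Playwright
- Conclusion/value: a complete configuration recipe for running a Python crawler inside a Docker container

## Key Claims

- The Scrapy + Playwright combination is the best approach for scraping dynamic, JavaScript-rendered pages such as TikTok Shop
- Running the crawler inside a Docker container only works after an additional virtual environment (venv) is set up
- A virtual environment isolates dependencies and keeps them out of the global (system) Python installation

## Key Concepts

- [[虚拟环境 (venv)|Virtual environment (venv)]]: Python's dependency-isolation mechanism, created with `python3 -m venv venv`
- [[Scrapy]]: a Python crawling framework suited to extracting structured data from web pages
- [[Playwright]]: Microsoft's browser automation tool; it drives Chromium (including Chrome), Firefox, and WebKit, so it can render dynamic content
- [[scrapy-playwright]]: a Scrapy download handler that integrates Playwright, letting Scrapy fetch pages that need JavaScript rendering

## Key Entities

- [[Docker]]: container platform; the deployment environment in this scenario
- [[TikTok Shop]]: ByteDance's e-commerce platform; the scraping target of this source

## Connections

- [[Scrapy]] ← uses ← [[scrapy-playwright]]
- [[Playwright]] ← provides ← [[浏览器自动化|Browser automation]]
- [[Docker]] ← requires ← [[虚拟环境 (venv)|Virtual environment (venv)]]

## Contradictions

- (None yet)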
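## Sketch: venv Isolation

This is a minimal sketch of the venv mechanism listed under Key Concepts. It uses only the standard-library `venv` module, which is the programmatic counterpart of `python3 -m venv venv`. The helper names `venv_python`, `create_isolated_env`, and `prefixes` are hypothetical. Passing `with_pip=False` is an assumption made so that the sketch runs offline; a real crawler image would keep pip and install `scrapy` and `scrapy-playwright` into the venv. The source does not explain why the container needed a venv. One plausible reason, and this is an assumption, is a Debian-based image whose system Python is marked "externally managed" under PEP 668, which makes a system-wide `pip install` fail.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (Windows uses Scripts/python.exe)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_isolated_env(env_dir: Path) -> Path:
    """Programmatic equivalent of `python3 -m venv <env_dir>`; returns its interpreter."""
    # Assumption: with_pip=False skips ensurepip so the sketch runs offline.
    # symlinks mirrors the CLI default (symlink on POSIX, copy on Windows).
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return venv_python(env_dir)


def prefixes(python: Path) -> tuple[str, str]:
    """Return (sys.prefix, sys.base_prefix) as reported by the given interpreter."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix); print(sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix, base_prefix = result.stdout.splitlines()
    return prefix, base_prefix


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        python = create_isolated_env(env_dir)
        prefix, base_prefix = prefixes(python)
        print("venv prefix:", prefix)
        print("base prefix:", base_prefix)
        # True means the venv interpreter has its own prefix, separate from the base install.
        print("isolated:", prefix != base_prefix)
```

Inside the venv, `sys.prefix` points at the venv directory, while `sys.base_prefix` still points at the interpreter the venv was created from. Packages installed with the venv's own pip land under that separate prefix, so the system site-packages is left untouched.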
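## Sketch: scrapy-playwright Settings

This is a minimal sketch of how the scrapy-playwright download handler is switched on in a Scrapy project's `settings.py`. It follows the setting names documented in the scrapy-playwright README. The values chosen for `PLAYWRIGHT_BROWSER_TYPE` and `PLAYWRIGHT_LAUNCH_OPTIONS` are assumptions for a headless container and are not taken from the source.

```python
# settings.py (fragment) for a Scrapy project that uses scrapy-playwright.

# Route both URL schemes through the Playwright-backed download handler.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumption: headless Chromium, since a container usually has no display.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = {"headless": True}
```

Only requests that opt in are rendered in the browser, for example `scrapy.Request(url, meta={"playwright": True})`. All other requests go through Scrapy's regular HTTP download path. The browser binary must also be installed in the same venv and container, for example with `playwright install --with-deps chromium`.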