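---
title: "Scrapy + Playwright 抓取TikTok Shop Data"
type: source
tags: [scrapy, playwright, tiktok-shop, python, docker, web-scraping]
date: 2026-05-02
---

## Source File

- [[raw/AI/Scrapy + Playwright 抓取TikTok Shop Data.md]]

## Summary

- Core topic: scraping TikTok Shop store data with a Scrapy + Playwright stack.
- Problem domain: collecting TikTok Shop e-commerce data, and setting up the environment needed to scrape web pages.
- Approach/mechanism: isolate the environment with a Python venv, install `scrapy` and `scrapy-playwright`, and install Playwright's Chromium browser so that dynamic pages are rendered before their data is scraped.
- Conclusion/value: provides a complete setup for running a TikTok Shop scraper inside a Docker environment.

## Key Claims

- The developer creates a Python venv to isolate the project's dependencies and avoid conflicts with the system Python environment.
- The scrapy-playwright plugin combines Playwright (browser automation) with Scrapy (crawling framework), so Scrapy can scrape pages whose content is rendered by JavaScript.
- Running inside a Docker container requires extra configuration of the venv and the `PATH` environment variable.

## Key Quotes

> `source venv/bin/activate`: enters the Python virtual environment; a `(venv)` prefix on the terminal prompt means activation succeeded.

> `scrapy runspider tiktok_shop_spider.py -a shop_url="https://www.tiktok.com/shop/store/aopuro/7495894041403296077"`: example command for running the TikTok Shop spider.

> `python -c "from playwright.sync_api import sync_playwright; print('Playwright OK')"`: command for checking the Playwright installation (it confirms that the Python package imports; it does not launch a browser).

## Key Concepts

- [[Scrapy]]: an open-source Python crawling framework with asynchronous HTTP requests and item pipelines.
- [[Playwright]]: a browser automation tool developed by Microsoft that supports Chromium, Firefox, and WebKit and can load JavaScript-rendered pages for scraping.
- [[scrapy-playwright]]: a Scrapy download handler that connects Scrapy and Playwright, letting a Scrapy spider fetch pages through a Playwright-controlled browser.
- [[Python venv]]: Python's virtual environment tool, used to isolate project dependencies and avoid package version conflicts.

## Key Entities

- [[TikTok Shop]]: ByteDance's e-commerce platform and the scraping target of this document.

## Connections

- [[TikTok Shop - Apache Superset Dashboard设计思路|TikTok Shop - Apache Superset Dashboard Design Approach]] ← related to ← [[Scrapy + Playwright 抓取TikTok Shop Data|Scrapy + Playwright: Scraping TikTok Shop Data]]
- [[可自动化、可扩展、AI增强的电商数据采集与处理系统|Automated, Scalable, AI-Enhanced E-commerce Data Collection and Processing System]] ← belongs to ← [[Scrapy + Playwright 抓取TikTok Shop Data|Scrapy + Playwright: Scraping TikTok Shop Data]]

## Contradictions

- (No conflicting content found yet)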
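
## Code Sketch

This note does not reproduce the spider's source, so the following is a hypothetical sketch and not the author's `tiktok_shop_spider.py`. It shows the settings that the scrapy-playwright README documents for routing Scrapy downloads through Playwright, plus a made-up helper that validates the `shop_url` spider argument from the example command. The names `PLAYWRIGHT_SETTINGS`, `ALLOWED_HOSTS`, and `parse_shop_url`, and the assumed `/shop/store/<slug>/<id>` URL layout, are illustrative assumptions; TikTok may use other URL shapes.

```python
from urllib.parse import urlparse

# Settings documented by scrapy-playwright for routing Scrapy downloads
# through Playwright. They typically go in settings.py, or in a single-file
# spider's `custom_settings` class attribute.
PLAYWRIGHT_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    # Use the Chromium build installed with `playwright install chromium`.
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    # Headless mode, since a Docker container usually has no display.
    "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
}

# Hypothetical: host names this sketch accepts as TikTok.
ALLOWED_HOSTS = {"www.tiktok.com", "tiktok.com"}


def parse_shop_url(shop_url: str) -> tuple[str, str]:
    """Hypothetical helper: split a store URL into (store_slug, store_id).

    Assumes the path layout of the example command,
    /shop/store/<slug>/<numeric id>, and raises ValueError otherwise.
    """
    parsed = urlparse(shop_url)
    if parsed.scheme != "https" or parsed.netloc not in ALLOWED_HOSTS:
        raise ValueError(f"not a TikTok https URL: {shop_url!r}")
    # Drop empty segments produced by leading/trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 4 or parts[:2] != ["shop", "store"] or not parts[3].isdigit():
        raise ValueError(f"unexpected store path: {parsed.path!r}")
    return parts[2], parts[3]


if __name__ == "__main__":
    example = "https://www.tiktok.com/shop/store/aopuro/7495894041403296077"
    print(parse_shop_url(example))  # ('aopuro', '7495894041403296077')
```

With `scrapy runspider`, the `-a shop_url=...` argument becomes the spider attribute `self.shop_url`, which could be checked with `parse_shop_url` before yielding `scrapy.Request(self.shop_url, meta={"playwright": True})`. The `playwright` meta key is what tells scrapy-playwright to render that request in the browser. Whether `TWISTED_REACTOR` is honored from `custom_settings` depends on the Scrapy version, so placing these settings in `settings.py` is the safer assumption.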