---
title: "Scraping TikTok Shop Data with Scrapy + Playwright"
type: source
tags: [playwright, scrapy, tiktok-shop, python, docker, 爬虫]
date: 2026-04-24
---
## Source File
- [[跨境电商/Scrapy + Playwright 抓取TikTok Shop Data.md]]
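## Summary
- Core topic: an environment setup and run guide for scraping TikTok Shop merchant data with the Scrapy + Playwright stack
- Problem domain: the engineering side of data collection for TikTok Shop cross-border e-commerce
- Method/mechanism: isolate dependencies in a Python venv, use the scrapy-playwright integration package to drive a Chromium browser that renders dynamic pages, then deploy the result as a Docker container
- Conclusion/value: provides a complete development environment setup workflow and a production-grade Docker deployment configuration, serving as the technical foundation for cross-border e-commerce data collection projects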
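## Key Claims
- **A virtual environment is the preferred approach**: create an isolated environment with `python3 -m venv` and install Scrapy and scrapy-playwright into it; compared with installing directly inside Docker, this is better suited to development and debugging
- **Playwright's Chromium is the rendering engine**: `playwright install chromium` installs the browser that Playwright runs headlessly to handle TikTok Shop's JavaScript-loaded dynamic content
- **Docker deployment must put the venv on `PATH`**: add `RUN python3 -m venv /app/venv` to the Dockerfile, followed on its own line by the separate instruction `ENV PATH="/app/venv/bin:$PATH"`, so that Python commands inside the container resolve to the virtual environment
- **The target shop can be specified on the command line**: pass the TikTok Shop store URL with `scrapy runspider tiktok_shop_spider.py -a shop_url="..."`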
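
The command-line claim above can be sketched in Python. This is a minimal illustration, not the author's script: the helper name `build_runspider_command` and the placeholder URL are hypothetical, and only the spider file name `tiktok_shop_spider.py` and the `shop_url` argument come from the source. Using `sys.executable` with `-m scrapy` is one way to make sure the Scrapy that runs is the one installed in the active venv.

```python
import sys


def build_runspider_command(shop_url, spider_file="tiktok_shop_spider.py"):
    """Return the argv list equivalent to
    `scrapy runspider <spider_file> -a shop_url=<shop_url>`.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if not shop_url.startswith(("http://", "https://")):
        raise ValueError(f"shop_url must be an http(s) URL, got: {shop_url!r}")
    return [
        sys.executable,  # interpreter of the currently active environment
        "-m", "scrapy",
        "runspider", spider_file,
        "-a", f"shop_url={shop_url}",  # Scrapy passes -a pairs to the spider as arguments
    ]


if __name__ == "__main__":
    # Placeholder URL for illustration only; substitute a real store URL.
    print(build_runspider_command("https://example.com/placeholder-shop"))
```

The resulting list can be handed to `subprocess.run(...)` as-is; passing a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.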
## Key Quotes
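> "Most recommended: create a virtual environment (venv) and install Scrapy + Playwright" (the best-practice approach recommended by the document author)
> "source venv/bin/activate" (the venv activation command)
> "RUN python3 -m venv /app/venv" and "ENV PATH=\"/app/venv/bin:$PATH\"" (two separate Dockerfile instructions; the standard way to configure a Python venv in Docker)
> "python -c \"from playwright.sync_api import sync_playwright; print('Playwright OK')\"" (the Playwright verification command)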
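
The one-line verification quoted above can be extended into a small standard-library check. This is a sketch under assumptions: the function names are hypothetical, and the module names `scrapy`, `playwright`, and `scrapy_playwright` are the import names of the packages the note mentions. It reports whether a venv is active and whether each package is importable, without launching a browser.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def check_environment(modules=("scrapy", "playwright", "scrapy_playwright")):
    """Return a dict mapping each check name to True/False.

    find_spec() only locates a top-level module; it does not import it.
    """
    report = {"venv_active": in_virtualenv()}
    for name in modules:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'missing'}")
```

This check does not confirm that the Chromium binary was downloaded; that still requires `playwright install chromium` and an actual browser launch.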
## Key Concepts
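- [[Scrapy]]: Python crawling framework responsible for request scheduling, data parsing, and pipeline storage
- [[Playwright]]: browser automation tool developed by Microsoft; supports the Chromium, Firefox, and WebKit engines, can run headless, and is used here to render JavaScript-driven pages
- [[scrapy-playwright]]: integration package that connects Scrapy and Playwright so that a Scrapy spider can perform browser automation
- [[venv]]: Python's built-in virtual environment tool, used to isolate project dependencies and avoid version conflicts
- [[Docker]]: containerization platform, used for production deployment
- [[Chromium]]: open-source browser project led by Google; the default browser type in scrapy-playwright and the rendering engine used here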
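
How these pieces fit together can be sketched as the Scrapy settings that scrapy-playwright reads. This is an illustrative configuration rather than the one from the source note: the function name `playwright_settings` is hypothetical, while the setting keys and handler paths are those documented by scrapy-playwright.

```python
def playwright_settings(headless=True):
    """Return Scrapy settings that route HTTP(S) downloads through Playwright.

    Hypothetical helper; the returned dict could be used as a spider's
    `custom_settings` or merged into a project's settings.py.
    """
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        # Hand both schemes to the Playwright-backed download handler.
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        # Chromium is the engine installed via `playwright install chromium`.
        "PLAYWRIGHT_BROWSER_TYPE": "chromium",
        # Options forwarded to the browser launch call.
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": headless},
    }


if __name__ == "__main__":
    for key, value in playwright_settings().items():
        print(key, "=", value)
```

With these settings in place, only requests that carry `meta={"playwright": True}` are rendered in the browser; all other requests go through Scrapy's regular downloader.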
## Key Entities
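- [[TikTok Shop]]: e-commerce platform owned by ByteDance; the data collection target of this document
- shenwei: document author; provided the hands-on operational notes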
## Connections
- [[TikTok Shop Apache Superset Dashboard]] ← uses ← [[Scrapy-Playwright-TikTok-Shop-Data]]
- [[做tk跨境思路不对努力白费]] ← related_to ← [[Scrapy-Playwright-TikTok-Shop-Data]]
## Contradictions
- No known conflicting content