---
title: "Scrapy + Playwright 抓取TikTok Shop Data"
type: source
tags: [playwright, scrapy, tiktok-shop, python, docker]
date: 2026-04-28
---
## Source File
- [[Scrapy + Playwright 抓取TikTok Shop Data]]
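## Summary
- Core topic: a technical setup and practice guide for scraping TikTok Shop store data with Scrapy + Playwright
- Problem domain: Python environment setup and dependency installation for TikTok Shop cross-border e-commerce data collection
- Methods/mechanisms:
  - Create and activate a Python virtual environment (venv)
  - Install `scrapy` and `scrapy-playwright` inside the virtual environment
  - Install the Playwright Chromium browser
  - Run the spider from the command line with `scrapy runspider`, passing the store URL as an argument
  - For Docker, preconfigure the Python virtual environment path in the Dockerfile
  - Verify the Playwright installation with a test script
- Conclusion/value: provides a complete development environment setup workflow covering both local development and Docker container deployment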
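
The setup steps listed in the summary can be sketched as an ordered list of commands. This is a minimal sketch, not the source's own script: the function name `setup_commands` and the POSIX `bin/` layout of the venv are assumptions, and nothing is executed until the caller runs the commands.

```python
import sys
from pathlib import Path


def setup_commands(venv_dir, spider_file, shop_url):
    """Return the argv list for each setup step, in order. Nothing is executed."""
    # Assumption: POSIX venv layout (executables under bin/).
    bin_dir = Path(venv_dir) / "bin"
    return [
        # 1. Create the virtual environment with the current interpreter.
        [sys.executable, "-m", "venv", venv_dir],
        # 2. Install the two core dependencies inside the venv.
        [str(bin_dir / "python"), "-m", "pip", "install", "scrapy", "scrapy-playwright"],
        # 3. Download the Chromium build that Playwright drives.
        [str(bin_dir / "playwright"), "install", "chromium"],
        # 4. Run the spider, passing the store URL as a spider argument.
        [str(bin_dir / "scrapy"), "runspider", spider_file, "-a", f"shop_url={shop_url}"],
    ]
```

Each entry can be passed to `subprocess.run(cmd, check=True)` in turn, so a failing step stops the sequence before the spider runs.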
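## Key Claims
- The scrapy-playwright plugin lets Scrapy spiders work together with Playwright browser automation
- Running in a Docker container requires the Dockerfile to preconfigure the Python venv environment variables
- Playwright Chromium is the core dependency that drives rendering of dynamic pages
- `python -c "from playwright.sync_api import sync_playwright; print('Playwright OK')"` verifies that the Playwright package is installed (it checks only that the module imports, not that Chromium launches)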
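
The one-line import check can be extended to all three packages. The sketch below is an assumption-labeled variant, not the source's test script: the helper name `missing_modules` is hypothetical, and it only looks up whether each module can be found, without importing it or launching a browser.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names differ from pip names: scrapy-playwright installs scrapy_playwright.
    missing = missing_modules(["scrapy", "scrapy_playwright", "playwright"])
    print("missing:", missing or "none")
```

Run it with the venv's interpreter; an empty result means all three packages are visible to that interpreter.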
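## Key Quotes
> "pip install scrapy scrapy-playwright": core dependency installation command
> "scrapy runspider tiktok_shop_spider.py -a shop_url=\"https://www.tiktok.com/shop/store/xxxx/xxxxxxxxxxxx\"": example spider run command
> "RUN python3 -m venv /app/venv" and "ENV PATH=\"/app/venv/bin:$PATH\"": Docker virtual environment configuration (these are two separate Dockerfile instructions; `ENV` cannot be chained after `RUN` with `&&`)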
## Key Concepts
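- [[WebScraping]]: scraping dynamic TikTok Shop pages with the Scrapy + Playwright combination
- [[BrowserAutomation]]: Playwright provides browser automation, used to render JavaScript-driven dynamic content
- [[VirtualEnvironment]]: a Python venv isolates project dependencies and avoids package conflicts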
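
To make the Scrapy + Playwright combination concrete, the fragment below lists the settings that scrapy-playwright documents for routing requests through a browser. The source note does not show its own settings, so this is an assumed minimal set; the constant names `PLAYWRIGHT_SETTINGS` and `PLAYWRIGHT_META` are hypothetical.

```python
# Assumed minimal Scrapy settings for scrapy-playwright (not taken from the source note).
PLAYWRIGHT_SETTINGS = {
    # Route both HTTP and HTTPS downloads through the Playwright handler.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    # Chromium matches the browser installed in the setup steps.
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
}

# Per-request opt-in: only requests carrying this meta key are rendered in the browser.
PLAYWRIGHT_META = {"playwright": True}
```

In a spider, these would typically be applied as `custom_settings = PLAYWRIGHT_SETTINGS` and `scrapy.Request(url, meta=PLAYWRIGHT_META)`.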
## Key Entities
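- [[TikTok Shop]]: the target e-commerce platform for data collection
- [[Scrapy]]: Python crawling framework that provides the web scraping infrastructure
- [[Playwright]]: Microsoft's open-source browser automation tool, supporting Chromium, Firefox, and WebKit
- [[Docker]]: containerized deployment platform; the source covers Dockerfile configuration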
## Connections
- [[可自动化、可扩展、AI增强的电商数据采集与处理系统]] ← related_to ← [[Scrapy + Playwright 抓取TikTok Shop Data]]
- [[TikTok Shop - Apache Superset Dashboard设计思路]] ← related_to ← [[Scrapy + Playwright 抓取TikTok Shop Data]]
## Contradictions
- No apparent content conflicts