Auto-sync: 2026-04-18 12:03

2026-04-18 12:03:11 +08:00
parent 1a82750f1c
commit 7d361490b2
85 changed files with 2857 additions and 7 deletions

@@ -0,0 +1,38 @@
---
title: "Scraping TikTok Shop Data with Scrapy + Playwright"
type: source
tags: [playwright, scrapy, tiktok, 跨境电商]
date: 2026-04-18
---
## Source File
- [[raw/跨境电商/Scrapy + Playwright 抓取TikTok Shop Data.md]]
## Summary
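- Core topic: environment setup guide for scraping TikTok Shop data
- Problem domain: cross-border e-commerce data collection; Docker environment configuration
- Method/mechanism: a combined crawler architecture of Python virtual environment + Scrapy + Playwright
- Conclusion/value: provides a complete configuration for running a Python crawler inside a Docker container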
## Key Claims
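- The Scrapy + Playwright combination is the best approach for scraping dynamic pages (TikTok Shop)
- Running inside a Docker container requires additionally configuring a virtual environment (venv) to work properly
- A virtual environment isolates dependencies and avoids polluting the global installation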
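
The venv claims above can be illustrated with Python's standard-library `venv` module, which is the programmatic counterpart of `python3 -m venv venv`. This is a minimal sketch, not the source's setup script: the helper names `create_venv` and `in_virtualenv` are hypothetical, and only the directory name `venv` comes from the note.

```python
import sys
import venv
from pathlib import Path


def create_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create an isolated environment at env_dir and return the path of its pyvenv.cfg.

    Hypothetical helper; roughly what `python3 -m venv <env_dir>` does.
    """
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    cfg = create_venv("venv")
    print(f"created {cfg}; running inside a venv: {in_virtualenv()}")
```

Packages installed with the environment's own `pip` land under `venv/` rather than in the global site-packages, which is the isolation the claim refers to.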
## Key Concepts
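- [[虚拟环境 (venv)]]: virtual environment, Python's dependency-isolation mechanism, created with `python3 -m venv venv`
- [[Scrapy]]: Python crawling framework, well suited to scraping structured web pages
- [[Playwright]]: Microsoft's browser automation tool; renders dynamic content with Chromium/Chrome
- [[scrapy-playwright]]: integration layer between Scrapy and Playwright (implemented as a Scrapy download handler) that lets Scrapy fetch JavaScript-rendered pages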
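
To show how the concepts above fit together, here is a sketch of the Scrapy settings that scrapy-playwright documents for routing requests through Playwright. The setting keys and the handler path are the library's documented ones. The dictionary name `SCRAPY_PLAYWRIGHT_SETTINGS`, the helper `playwright_meta`, and the chosen values (Chromium, headless) are assumptions for illustration, not taken from the source.

```python
# Settings that would normally live in a Scrapy project's settings.py
# (or a spider's custom_settings); collected in a dict here for illustration.
SCRAPY_PLAYWRIGHT_SETTINGS = {
    # Route HTTP(S) downloads through the scrapy-playwright handler.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    # Assumed choices: Chromium, headless (typical inside a container).
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
}


def playwright_meta(extra=None):
    """Build request meta that asks scrapy-playwright to render the page.

    Hypothetical helper: only requests whose meta has "playwright": True
    are rendered in the browser; all others use Scrapy's normal downloader path.
    """
    meta = {"playwright": True}
    if extra:
        meta.update(extra)
    return meta
```

In a spider, the meta would be attached per request, for example `scrapy.Request(url, meta=playwright_meta())`, so that only pages needing JavaScript rendering pay the cost of a browser.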
## Key Entities
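- [[Docker]]: containerization platform; the deployment environment in this scenario
- [[TikTok Shop]]: ByteDance's e-commerce platform; the scraping target of this article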
## Connections
- [[Scrapy]] ← uses ← [[scrapy-playwright]]
- [[Playwright]] ← provides ← [[浏览器自动化]]
- [[Docker]] ← requires ← [[虚拟环境 (venv)]]
## Contradictions
- (None yet)