---
title: "Scraping TikTok Shop Data with Scrapy + Playwright"
type: source
tags: [scrapy, playwright, tiktok, data-collection, python]
date: 2025-09-29
---
## Source File
- [[raw/跨境电商/Scrapy + Playwright 抓取TikTok Shop Data.md]]
## Summary
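- Core topic: using Scrapy + scrapy-playwright to scrape TikTok Shop store data
- Problem domain: TikTok Shop pages are rendered dynamically, so plain HTTP requests cannot retrieve the data
- Method/mechanism: a Python venv isolates the dependencies; scrapy-playwright drives Chromium to render dynamic content; the `scrapy runspider` CLI runs the spider
- Conclusion/value: provides a Docker containerized deployment configuration (venv + PATH environment variable); Playwright-driven Chromium replaces the requests + Selenium combination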
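
The venv, install, and `scrapy runspider` steps summarized above can be sketched as a small Python helper. This is a minimal sketch, not the setup from the source note: the `.venv` directory, the spider filename `tiktok_shop_spider.py`, and all function names are hypothetical, and the package list assumes only `scrapy` and `scrapy-playwright` are needed.

```python
import os
import subprocess
import sys
from pathlib import Path


def venv_bin_dir(venv_dir: Path) -> Path:
    """Return the directory that holds the venv's executables."""
    # Windows venvs use "Scripts"; POSIX venvs (e.g. inside Docker) use "bin".
    return venv_dir / ("Scripts" if os.name == "nt" else "bin")


def setup_commands(venv_dir: Path, spider_file: str) -> list[list[str]]:
    """Build the commands in the order they should run; nothing runs here."""
    bin_dir = venv_bin_dir(venv_dir)
    python = str(bin_dir / "python")
    return [
        # 1. Create the isolated environment.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. Install the crawler stack into the venv, not the global site.
        [python, "-m", "pip", "install", "scrapy", "scrapy-playwright"],
        # 3. Download the Chromium build that Playwright drives.
        [python, "-m", "playwright", "install", "chromium"],
        # 4. Run a single-file spider without a full Scrapy project.
        [str(bin_dir / "scrapy"), "runspider", spider_file],
    ]


def env_with_venv_on_path(venv_dir: Path) -> dict[str, str]:
    """Copy the current environment with the venv's bin dir first on PATH."""
    env = dict(os.environ)
    env["PATH"] = str(venv_bin_dir(venv_dir)) + os.pathsep + env.get("PATH", "")
    return env


def run_all(venv_dir: Path, spider_file: str) -> None:
    """Execute every step, stopping at the first failure."""
    env = env_with_venv_on_path(venv_dir)
    for cmd in setup_commands(venv_dir, spider_file):
        subprocess.run(cmd, check=True, env=env)
```

Calling `run_all(Path(".venv"), "tiktok_shop_spider.py")` would perform the four steps in sequence. Prepending the venv's bin directory to `PATH` is the same idea a Dockerfile expresses with an `ENV PATH=...` line, so the `scrapy` and `playwright` executables resolve to the venv copies.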
## Key Claims
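- A Python venv is the recommended way to manage Scrapy/Playwright dependencies because it keeps them out of the global environment
- The `scrapy-playwright` integration package plugs Playwright-driven browsers into Scrapy as a download handler (registered under `DOWNLOAD_HANDLERS`), not as a downloader middleware
- `playwright install chromium` downloads the Chromium build that Playwright drives (headless by default), which provides JavaScript rendering
- Docker deployment requires setting up the venv in the Dockerfile ahead of time and adding it to PATH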
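
As a hedged illustration of how scrapy-playwright is wired into Scrapy: its documented setup registers a download handler for `http` and `https` and switches Twisted to the asyncio reactor, and individual requests opt in to browser rendering through the `playwright` meta key. The sketch below holds those settings in a plain dict; the names `PLAYWRIGHT_SETTINGS` and `playwright_meta` are hypothetical helpers, not part of either library.

```python
# Settings that enable scrapy-playwright. They can live in settings.py or be
# assigned to a spider's `custom_settings` attribute.
PLAYWRIGHT_SETTINGS = {
    # Route both schemes through the Playwright-backed download handler.
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    # scrapy-playwright requires the asyncio-based Twisted reactor.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    # Chromium matches the browser installed by `playwright install chromium`.
    "PLAYWRIGHT_BROWSER_TYPE": "chromium",
    "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": True},
}


def playwright_meta(extra=None):
    """Return request meta that asks for this request to be browser-rendered.

    Requests without the "playwright" key still go through Scrapy's regular
    HTTP downloader, so rendering can be limited to pages that need it.
    """
    meta = dict(extra or {})
    meta["playwright"] = True
    return meta


# Inside a spider (requires scrapy to be installed), usage would look like:
#
#     class ShopSpider(scrapy.Spider):          # hypothetical spider
#         name = "shop"
#         custom_settings = PLAYWRIGHT_SETTINGS
#
#         def start_requests(self):
#             yield scrapy.Request(url, meta=playwright_meta())
```

Keeping the opt-in per request means static endpoints avoid the cost of launching a browser page, while dynamically rendered shop pages get full JavaScript execution.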
## Key Concepts
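- [[Scrapy]]: open-source Python crawling framework; asynchronous, structured scraping with Item Pipeline support
- [[Playwright]]: Microsoft's browser automation tool; supports Chromium/Firefox/WebKit
- [[电商数据采集|E-commerce data collection]]: the technology stack for collecting TikTok Shop data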
## Key Entities
- [[TikTok Shop]]: ByteDance's e-commerce platform; the data collection target
## Connections
- [[Scrapy]] ← download handler integration ← [[Playwright]]
- [[Scrapy]] → outputs structured data → [[电商数据采集|E-commerce data collection]]
## Contradictions
-
## Metadata
- Source: personal practice notes
- Tags: scrapy, playwright, tiktok