Auto-sync: 2026-04-16 21:08

2026-04-16 21:08:55 +08:00
parent be7e39a4d0
commit 0dc7e71539
37 changed files with 846 additions and 3 deletions
--- a/wiki/sources/ai-enhanced-ecommerce-data-collection-processing-system.md
+++ b/wiki/sources/ai-enhanced-ecommerce-data-collection-processing-system.md
@@ -0,0 +1,48 @@
+---
+title: "An Automatable, Scalable, AI-Enhanced E-commerce Data Collection and Processing System"
+type: source
+tags: [e-commerce, data-collection, automation, AI, n8n, Scrapy, Playwright]
+date: 2025-11-11
+---
+
+## Source File
+- [[raw/Home Office/可自动化、可扩展、AI增强的电商数据采集与处理系统.md]]
+
+## Summary
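+- Core topic: design of an e-commerce data collection and processing system built on Docker + Ubuntu + n8n
+- Problem domain: automated collection, cleaning, AI processing, and visualization of product information from e-commerce sites
+- Method/mechanism: Scrapy + Playwright crawler layer → n8n automation pipeline → LLM-based AI processing → PostgreSQL storage and Grafana dashboards
+- Conclusion/value: an automatable, scalable e-commerce data pipeline that supports scheduled collection, AI summarization and classification, anomaly detection, and report notifications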
+
+## Key Claims
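+- The Scrapy + Playwright combination suits e-commerce crawling (Scrapy for static fetching, Playwright for dynamic rendering)
+- n8n workflows can automate the entire pipeline
+- Local models run with Ollama can replace external APIs for offline AI processing
+- Distributed scheduling for scale-out can be handled with Scrapyd or Archetype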
+
+## Key Quotes
+> "What you want is an automatable, scalable, AI-enhanced data collection and processing system built on Docker + Ubuntu + n8n." (opening line of the source)
+
+## Key Concepts
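+- [[Scrapy]]: Python crawling framework, suited to static pages and structured scraping
+- [[Playwright]]: Microsoft's browser automation tool, able to render dynamic pages
+- [[n8n]]: open-source workflow automation tool that can orchestrate crawlers, AI processing, and data storage
+- [[Ollama]]: local LLM runtime that enables offline AI processing
+- [[Docker Compose]]: multi-container orchestration tool used to define the crawler service architecture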
+
+## Key Entities
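+- [[Docker]]: containerization platform
+- [[PostgreSQL]]: relational database
+- [[Grafana]]: data visualization tool
+- [[MinIO]]: S3-compatible object storage
+- [[FastAPI]]: Python web framework, usable as a service layer that exposes APIs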
+
+## Connections
+- [[Scrapy]] ← depends_on ← [[Playwright]]
+- [[n8n]] ← orchestrates ← [[Scrapy]]
+- [[n8n]] ← calls ← [[Ollama]]
+- [[PostgreSQL]] ← stores ← AI processing results
+- [[Grafana]] ← visualizes ← PostgreSQL data
+
+## Contradictions
+- (none yet)
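The pipeline summarized in the page above (crawler output → cleaning → AI processing → storage) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the source's implementation: `sqlite3` stands in for PostgreSQL, `summarize()` is a placeholder for the LLM step, and the record fields and function names (`clean`, `summarize`, `run_pipeline`, `make_ollama_request`) are invented for illustration. The only real external interface referenced is Ollama's local `/api/generate` REST endpoint (default port 11434), and the request is only built, never sent.

```python
# Hypothetical sketch of the collect -> clean -> AI-process -> store pipeline.
# Assumptions: sqlite3 stands in for PostgreSQL, summarize() is a placeholder
# for the LLM step, and the record fields and function names are illustrative.
import json
import sqlite3
import urllib.request
from typing import Iterable, Optional

# Ollama's default local REST endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def make_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean(record: dict) -> Optional[dict]:
    """Normalize one scraped record; return None if the name or price is unusable."""
    name = (record.get("name") or "").strip()
    # Drop thousands separators and surrounding currency symbols/spaces.
    raw_price = str(record.get("price", "")).replace(",", "").strip("¥$ ")
    try:
        price = float(raw_price)
    except ValueError:
        return None
    if not name:
        return None
    description = (record.get("description") or "").strip()
    return {"name": name, "price": price, "description": description}


def summarize(record: dict) -> str:
    """Placeholder for the LLM step. A real pipeline would send
    make_ollama_request(...) via urllib.request.urlopen and read the
    "response" field of the returned JSON."""
    return (record["description"] or record["name"])[:40]


def run_pipeline(raw_records: Iterable[dict], conn: sqlite3.Connection) -> int:
    """Clean each crawler record, attach a summary, store it; return rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (name TEXT, price REAL, summary TEXT)"
    )
    stored = 0
    for record in raw_records:
        cleaned = clean(record)
        if cleaned is None:
            continue  # skip records that fail cleaning
        conn.execute(
            "INSERT INTO products (name, price, summary) VALUES (?, ?, ?)",
            (cleaned["name"], cleaned["price"], summarize(cleaned)),
        )
        stored += 1
    conn.commit()
    return stored
```

In the architecture the page describes, n8n would trigger a step like `run_pipeline` on a schedule and Scrapy + Playwright would supply `raw_records`; neither is modeled in this sketch.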