Auto-sync
@@ -1,37 +0,0 @@
---
id: token
title: "Token"
type: concept
tags: [LLM, tokenization, input-unit]
sources:
- "[[LLM Terms Framework]]"
last_updated: 2025-12-20
---

## Definition

A token is the basic input unit of a large language model: the smallest unit of text the model processes.

## Tokenization Rules
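
These ratios are rough rules of thumb; the exact count depends on the model's tokenizer.

- 1 English character ≈ 0.3 tokens
- 1 Chinese character ≈ 0.6 tokens
- Punctuation and spaces also consume tokens
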
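As a rough illustration of the ratios above, the sketch below estimates a token count from character counts alone. The function names and the choice to treat only the main CJK Unified Ideographs block as "Chinese" are assumptions made for this example; a real tokenizer will give different numbers, so treat the result as a ballpark figure only.

```python
import math

# Rule-of-thumb ratios from this note; real tokenizers will differ.
TOKENS_PER_ENGLISH_CHAR = 0.3
TOKENS_PER_CHINESE_CHAR = 0.6


def is_chinese_char(ch: str) -> bool:
    """Return True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from character counts alone."""
    chinese = sum(1 for ch in text if is_chinese_char(ch))
    # Everything else (letters, digits, punctuation, spaces) uses the
    # English ratio, since punctuation and spaces also consume tokens.
    other = len(text) - chinese
    return math.ceil(chinese * TOKENS_PER_CHINESE_CHAR
                     + other * TOKENS_PER_ENGLISH_CHAR)


if __name__ == "__main__":
    for sample in ["hello world", "你好", "你好, world"]:
        print(repr(sample), estimate_tokens(sample))
```

Rounding up keeps the estimate on the conservative side, which is usually what you want when budgeting against a limit.
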
## Why It Matters

- Affects API cost, since usage is typically billed per token
- Determines the context length limit
- Affects generation speed, since output is produced token by token

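To make the cost point concrete, here is a minimal sketch of per-token billing arithmetic. It assumes input and output tokens are priced separately per million tokens, which is a common scheme but not one this note specifies; the function name and the prices in the usage line are hypothetical placeholders, not real rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the estimated cost of one call, in the same currency as the prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: 2.0 per million input tokens, 8.0 per million output tokens.
    print(estimate_cost(500_000, 250_000, 2.0, 8.0))
```
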
## Context Window

The context window is the maximum number of tokens a model can accept. Typical ranges:
- Short-context models: 4K-8K tokens
- Mid-range models: 32K-128K tokens
- Long-context models: 1M+ tokens

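The tiers above can be turned into a simple budget check. The sketch below assumes the prompt and the generated output share one window, which holds for many models but should be confirmed per model. The tier names, the reading of "K" as 1024, and both function names are assumptions made for illustration.

```python
from typing import Optional

# Upper bound of each tier from this note; "K" is read as 1024 (an assumption).
CONTEXT_TIERS = [
    ("short", 8 * 1024),
    ("medium", 128 * 1024),
    ("long", 1_000_000),
]


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """Check whether prompt plus reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def smallest_fitting_tier(prompt_tokens: int,
                          max_output_tokens: int) -> Optional[str]:
    """Return the name of the smallest tier that fits, or None if none does."""
    for name, limit in CONTEXT_TIERS:
        if fits_in_context(prompt_tokens, max_output_tokens, limit):
            return name
    return None


if __name__ == "__main__":
    print(smallest_fitting_tier(3_000, 1_000))
    print(smallest_fitting_tier(100_000, 4_000))
```

Reserving room for the output up front avoids a request that fits on input but is cut off during generation.
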
## Connections
- [[LLM]] ← uses ← [[Token]]
- [[Token]] → affects → [[成本计算|Cost Calculation]]
- [[Token]] → affects → [[上下文限制|Context Limit]]