---
id: token
title: "Token"
type: concept
tags: [LLM, tokenization, input-unit]
sources:
- "[[LLM Terms Framework]]"
last_updated: 2025-12-20
---
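
## Definition

A token is the basic input unit of a large language model and the smallest unit of text it processes. A tokenizer splits text into tokens, which may be whole words, pieces of words, single characters, or punctuation marks.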

## Tokenization Rules

Rough rules of thumb (actual ratios vary by tokenizer):

- 1 English character ≈ 0.3 tokens
- 1 Chinese character ≈ 0.6 tokens
- Punctuation and whitespace also consume tokens
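
The ratios above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the names `estimate_tokens` and `is_chinese` are hypothetical, and it assumes that every non-Chinese character (including punctuation and spaces) is counted at the English rate.

```python
import math

# Rule-of-thumb ratios taken from the list above; real tokenizers will differ.
TOKENS_PER_ENGLISH_CHAR = 0.3
TOKENS_PER_CHINESE_CHAR = 0.6


def is_chinese(ch: str) -> bool:
    """Return True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` using the heuristic ratios."""
    chinese = sum(1 for ch in text if is_chinese(ch))
    # Assumption: letters, digits, punctuation, and spaces all use the English rate.
    other = len(text) - chinese
    estimate = chinese * TOKENS_PER_CHINESE_CHAR + other * TOKENS_PER_ENGLISH_CHAR
    # Round away floating-point noise before taking the ceiling.
    return math.ceil(round(estimate, 6))


if __name__ == "__main__":
    print(estimate_tokens("hello world"))  # 11 characters * 0.3 = 3.3, rounded up to 4
    print(estimate_tokens("你好"))          # 2 characters * 0.6 = 1.2, rounded up to 2
```

For exact counts, use the tokenizer that ships with the specific model, since the estimate can be off by a wide margin on code, numbers, or rare words.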
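
## Why It Matters

- API cost: usage is typically billed per token
- Context length: the limit on how much text a model can take in is measured in tokens
- Generation speed: output is produced token by token, so longer outputs take longer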
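
To make the cost and speed points concrete, here is a small sketch of the arithmetic. The function names, the per-million-token prices, and the throughput figure are illustrative assumptions, not the rates of any real provider.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost of one call, in the currency of the prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def estimate_generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Return the approximate time to generate `output_tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical prices: 1.0 per million input tokens, 2.0 per million output tokens.
    print(estimate_cost(1_000_000, 500_000, 1.0, 2.0))
    # Hypothetical throughput: 50 tokens per second.
    print(estimate_generation_seconds(500, 50.0))
```

Input and output prices are kept as separate parameters because many providers bill them at different rates.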

## Context Window

The context window is the maximum number of tokens a model can accept in a single request:
- Short-context models: 4K-8K tokens
- Mid-range models: 32K-128K tokens
- Long-context models: 1M+ tokens
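
A request can be checked against a window size before it is sent. The sketch below assumes the window is shared between the prompt and the generated output, which is common but not universal; the function names and the window sizes (rounded to 8,000 and 128,000) are illustrative.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window


def remaining_prompt_budget(
    prompt_tokens: int, max_output_tokens: int, context_window: int
) -> int:
    """Return how many more prompt tokens can be added, never less than zero."""
    return max(0, context_window - max_output_tokens - prompt_tokens)


if __name__ == "__main__":
    # Illustrative sizes for a short-context and a mid-range model.
    for window in (8_000, 128_000):
        ok = fits_context(7_000, 1_500, window)
        left = remaining_prompt_budget(7_000, 1_500, window)
        print(window, ok, left)
```

Reserving room for the output up front avoids a response that is cut off because the prompt used the whole window.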

## Connections
- [[LLM]] ← uses ← [[Token]]
- [[Token]] → affects → [[成本计算|Cost Calculation]]
- [[Token]] → affects → [[上下文限制|Context Limit]]