---
id: token
title: "Token"
type: concept
tags: [LLM, tokenization, input-unit]
sources:
- "[[LLM Terms Framework]]"
last_updated: 2025-12-20
---
## Definition
A token is the basic input unit of a large language model (LLM) and the smallest unit in which the model processes text.
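To make "text is processed as tokens" concrete, here is a toy sketch that splits a string into token IDs by greedy longest-match against a tiny vocabulary. The vocabulary `TOY_VOCAB` and the function `toy_tokenize` are hypothetical and for illustration only; real LLM tokenizers (for example BPE-based ones) learn much larger vocabularies from data and differ in detail.

```python
# Hypothetical toy vocabulary: maps text pieces to token IDs (illustration only).
TOY_VOCAB = {"token": 0, "ization": 1, "s": 2, " ": 3, "un": 4, "it": 5}


def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Split text into token IDs by repeatedly taking the longest vocabulary match."""
    ids = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate piece first, shrinking until one is in the vocabulary.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            # No vocabulary entry starts at this position.
            raise ValueError(f"no vocabulary entry matches {text[i]!r}")
    return ids


print(toy_tokenize("tokenization units", TOY_VOCAB))  # [0, 1, 3, 4, 5, 2]
```

The 18-character string becomes 6 tokens ("token", "ization", " ", "un", "it", "s"), which is why token counts rather than character counts are what the model actually sees.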
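## Tokenization Rules
Rough estimates; actual counts depend on each model's tokenizer:
- 1 English character ≈ 0.3 tokens
- 1 Chinese character ≈ 0.6 tokens
- Punctuation marks and spaces also consume tokens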
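The per-character ratios above can be turned into a quick estimator. This is a minimal sketch under stated assumptions: characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) count as Chinese at 0.6 tokens, and every other character (English letters, digits, spaces, punctuation) counts at 0.3 tokens. The function name `estimate_tokens` is hypothetical; for exact counts, use the target model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from per-character ratios (heuristic, not a real tokenizer)."""
    # Assumption: only CJK Unified Ideographs count as Chinese characters.
    zh_chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    other_chars = len(text) - zh_chars
    # Work in tenths of a token (0.6 -> 6, 0.3 -> 3) to avoid float rounding,
    # then round up so a partial token still counts as one.
    tenths = zh_chars * 6 + other_chars * 3
    return -(-tenths // 10)


print(estimate_tokens("Hello, world!"))  # 13 chars * 0.3 = 3.9 -> 4
print(estimate_tokens("大模型"))          # 3 chars * 0.6 = 1.8 -> 2
```

Rounding up is a deliberate choice: when budgeting cost or context space, overestimating slightly is safer than underestimating.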
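## Why It Matters
- Affects API call cost (usage is typically billed per token)
- Determines how much text fits within the context length limit
- Affects generation speed (output is produced token by token)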
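The cost and speed effects listed above reduce to simple arithmetic on token counts. The sketch below assumes separate per-million-token prices for input and output and a constant generation throughput; all prices and the throughput figure are hypothetical placeholders, not any provider's actual rates. It also ignores the time spent processing the prompt before the first output token appears.

```python
# Hypothetical prices, for illustration only (currency units per 1M tokens).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call when input and output tokens are billed separately."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate generation time: output tokens are produced roughly one after another."""
    return output_tokens / tokens_per_second


print(request_cost(2000, 500))        # 0.00175
print(generation_seconds(500, 50.0))  # 10.0
```

Both functions grow linearly with token count, which is why trimming prompts and capping output length are the usual levers for reducing cost and latency.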
## Context Window
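The maximum number of tokens a model can process at once, typically covering both the input prompt and the generated output.
- Short-context models: 4K-8K tokens
- Medium-context models: 32K-128K tokens
- Long-context models: 1M+ tokens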
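A common practical check is whether a request fits in the window. This minimal sketch assumes the prompt and the reserved output share a single context window and that "8K" means 8192 tokens; the function names are hypothetical.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved output budget fits in the context window."""
    # Assumption: input and output share one window.
    return prompt_tokens + max_output_tokens <= context_window


def max_prompt_tokens(context_window: int, max_output_tokens: int) -> int:
    """Largest prompt that still leaves room for the reserved output budget."""
    return max(context_window - max_output_tokens, 0)


print(fits_context(7000, 1000, 8192))  # True  (8000 <= 8192)
print(fits_context(7500, 1000, 8192))  # False (8500 > 8192)
print(max_prompt_tokens(8192, 1000))   # 7192
```

Reserving the output budget up front matters because a prompt that exactly fills the window leaves no room for the model to answer.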
## Connections
- [[LLM]] ← uses ← [[Token]]
- [[Token]] → affects → [[成本计算|Cost Calculation]]
- [[Token]] → affects → [[上下文限制|Context Limit]]