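---
title: "Defect Prediction"
type: concept
tags: [testing, machine-learning, quality]
sources: [testing-test-results-analyzer]
last_updated: 2026-04-28
---

## Definition

Defect prediction uses machine learning models, trained on code metrics and historical defect data, to predict which code regions are most likely to contain defects, so that testing resources can be directed where they matter most.

## Approach

### Feature Engineering (from TestResultsAnalyzer)

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# NOTE: extract_code_metrics() and load_historical_defect_data() are not
# defined in this note; they are assumed to come from TestResultsAnalyzer.

# Features: code metrics (cyclomatic complexity, lines of code, change frequency, etc.)
features = extract_code_metrics()
# Labels: historical defect data, one label per code region
historical_defects = load_historical_defect_data()

# Train/test split: hold out 20% of the regions for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    features, historical_defects, test_size=0.2, random_state=42
)

# Random Forest classifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Class probabilities for every region, used as the confidence score.
# This includes training rows, so their probabilities are optimistic.
predictions = model.predict_proba(features)
# Relative contribution of each code metric to the model's decisions
feature_importance = model.feature_importances_
# Accuracy measured on the held-out test set only
accuracy = model.score(X_test, y_test)
```

### Key Metrics

- **Prediction Accuracy**: the model's accuracy on the test set; target ≥ 85%.
- **Feature Importance**: which code metrics (cyclomatic complexity, change frequency, lines of code, etc.) have the most predictive power for defects.
- **Confidence Score**: every prediction carries a confidence score.

## Connections

- [[Statistical-Analysis]]: model validation requires statistical significance testing.
- [[Test-Coverage-Analysis]]: predicted high-risk regions get priority for additional test coverage.
- [[Release-Readiness-Assessment]]: defect prediction results feed into the overall release readiness assessment.
- [[Quality-Metrics]]: defect density is the target variable of the prediction model.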
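
## Example: Prioritizing High-Risk Regions

The Key Metrics above can be tied together in a few lines. This is a minimal sketch under stated assumptions, not the TestResultsAnalyzer implementation: it takes per-region defect probabilities (such as the positive-class column of `predict_proba`), checks accuracy against the 85% target, and ranks regions so that the riskiest ones get test coverage first. The `RegionPrediction` record, the helper names, the file paths, and all numbers are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass

ACCURACY_TARGET = 0.85  # target stated under Key Metrics


@dataclass
class RegionPrediction:
    """Hypothetical record: one code region with its predicted defect probability."""
    region: str
    defect_probability: float  # confidence score for the "defective" class
    actual_defect: bool        # ground-truth label from held-out data


def accuracy(preds, threshold=0.5):
    """Fraction of regions whose thresholded prediction matches the label."""
    correct = sum(
        (p.defect_probability >= threshold) == p.actual_defect for p in preds
    )
    return correct / len(preds)


def top_risk_regions(preds, k):
    """Return the k regions with the highest predicted defect probability."""
    return sorted(preds, key=lambda p: p.defect_probability, reverse=True)[:k]


# Made-up sample data, not results from any real project.
sample = [
    RegionPrediction("billing/invoice.py", 0.91, True),
    RegionPrediction("auth/session.py", 0.72, True),
    RegionPrediction("utils/strings.py", 0.08, False),
    RegionPrediction("api/routes.py", 0.55, False),
    RegionPrediction("db/migrations.py", 0.30, False),
]

acc = accuracy(sample)
meets_target = acc >= ACCURACY_TARGET
priority = [p.region for p in top_risk_regions(sample, 2)]
```

With this sample, four of the five thresholded predictions match their labels, so `acc` is 0.8 and `meets_target` is `False`. `priority` lists `billing/invoice.py` and `auth/session.py` as the first candidates for extra coverage. Because defective regions are usually a small minority, accuracy alone can look good while most defects are missed, so checking precision and recall alongside the accuracy target is a reasonable addition.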