F1 score = harmonic mean of Precision and Recall
The harmonic mean is used so that when either of the two values is small, the overall score is pulled down as well
Precision : of the samples predicted True, the fraction that are actually True
Recall : of the samples that are actually True, the fraction that are also predicted True
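The definitions above can be checked by hand on a small case. This is a minimal sketch in plain Python; the labels `y_true` and `y_pred` are made-up toy data (an assumption, not from any real model), chosen so that Precision is high and Recall is low.

```python
# Hypothetical toy labels: 4 actual positives, 4 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# The model predicts True only once, and that one prediction is correct.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)  # predicted True that are actually True
recall = tp / (tp + fn)     # actually True that are predicted True

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# Arithmetic mean, only for comparison.
arithmetic_mean = (precision + recall) / 2

print(f"precision={precision:.3f} recall={recall:.3f}")
# precision=1.000 recall=0.250
print(f"f1={f1:.3f} arithmetic_mean={arithmetic_mean:.3f}")
# f1=0.400 arithmetic_mean=0.625
```

The F1 score (0.4) sits much closer to the weak Recall than the arithmetic mean (0.625) does, which is the reason for using the harmonic mean.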
AUPRC : a model evaluation metric defined as the area under the curve drawn with Recall on the x-axis and Precision on the y-axis
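One possible way to compute this area with scikit-learn is sketched below. The labels and scores are made-up toy values (an assumption). Two estimates are shown: the trapezoidal area under the precision-recall curve via `precision_recall_curve` and `auc`, and `average_precision_score`, which summarizes the same curve as a step-wise sum. The two generally differ slightly, and which one a given report calls "AUPRC" depends on the author.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Hypothetical toy data: true binary labels and predicted scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Precision and recall at each score threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Trapezoidal area with recall on the x-axis and precision on the y-axis.
pr_auc = auc(recall, precision)

# Step-wise summary of the same curve (no linear interpolation).
ap = average_precision_score(y_true, y_score)

print("trapezoidal PR AUC:", round(pr_auc, 4))
print("average precision :", round(ap, 4))
```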
Stratified K-Fold
sklearn.model_selection.StratifiedKFold
class sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)
Parameters :
- n_splits int, default=5
Number of folds
- shuffle bool, default=False
Whether to shuffle each class's samples before splitting into batches
Samples within each split are not shuffled
- random_state int, RandomState instance or None, default=None
When shuffle=True, random_state affects the ordering of the indices, which controls the randomness of each fold
Otherwise, leave random_state as None.
Methods:
1. get_n_splits(X=None, y=None, groups=None)
Parameters
X object
Always ignored, exists for compatibility.
y object
Always ignored, exists for compatibility.
groups object
Always ignored, exists for compatibility.
Returns
n_splits int
Returns the number of splitting iterations in the cross-validator.
2. split(X, y, groups=None)
Parameters
X array-like of shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.
y array-like of shape (n_samples,)
The target variable for supervised learning problems. Stratification is done based on the y labels.
groups object
Always ignored, exists for compatibility.
Yields:
train ndarray
The training set indices for that split.
test ndarray
The testing set indices for that split.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
skf = StratifiedKFold(n_splits=2)
skf.get_n_splits(X, y)
# 2
print(skf)
# StratifiedKFold(n_splits=2, random_state=None, shuffle=False)
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
# TRAIN: [1 3] TEST: [0 2]
# TRAIN: [0 2] TEST: [1 3]
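To see what stratification buys on imbalanced labels, and how shuffle and random_state interact, here is a small sketch. The 8-to-4 label array is made-up toy data (an assumption), and X is a zero placeholder since only y is needed to generate the splits.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 8 samples of class 0, 4 samples of class 1.
y = np.array([0] * 8 + [1] * 4)
# Placeholder features; the splits depend only on y.
X = np.zeros((len(y), 1))

# shuffle=True shuffles each class's samples before assigning them to folds;
# random_state makes that shuffling reproducible.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

fold_counts = []
for train_index, test_index in skf.split(X, y):
    # Count how many samples of each class land in this test fold.
    counts = np.bincount(y[test_index], minlength=2)
    fold_counts.append(counts.tolist())
    print("TEST:", test_index, "class counts:", counts)
```

Every test fold ends up with 2 samples of class 0 and 1 sample of class 1, the same 2:1 ratio as the full label array. Which indices land in which fold depends on random_state.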