[Week 9] F1 Score, Stratified K-Fold

by newnu 2021. 10. 1.

F1 score = the harmonic mean of Precision and Recall

The harmonic mean is used so that a low value in either Precision or Recall pulls the score down.


Precision : of the samples predicted True, the fraction that are actually True

Recall : of the samples that are actually True, the fraction that are also predicted True
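
The definitions above can be sketched in a few lines of plain Python. This is only an illustration: the helper name precision_recall_f1 and the sample labels are made up for this note, and labels are assumed to be binary (1 = True, 0 = False).

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = True, 0 = False)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted True, actually True
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted True, actually False
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted False, actually True

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Made-up example: the model finds only one of the four positives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
arithmetic_mean = (precision + recall) / 2

print("precision:", precision)              # 1.0
print("recall:", recall)                    # 0.25
print("f1:", round(f1, 3))                  # 0.4
print("arithmetic mean:", arithmetic_mean)  # 0.625
```

Precision is perfect here but recall is low, and the F1 score (0.4) sits much closer to the low value than the arithmetic mean (0.625) does.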


AUPRC : a model evaluation metric equal to the area under the curve drawn with Recall on the x-axis and Precision on the y-axis
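
One possible way to compute this with scikit-learn is sketched below. The labels and scores are made-up values, and the two estimates shown (trapezoidal area over the curve, and average_precision_score, which sums precision over recall steps) generally give slightly different numbers, so which one a given task calls "AUPRC" is an assumption to check.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Made-up binary labels and predicted scores (higher score = more likely True)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Precision and recall at every score threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Area under the curve with recall on the x-axis and precision on the y-axis
auprc_trapezoid = auc(recall, precision)

# Step-wise summary of the same curve (no interpolation between points)
auprc_average_precision = average_precision_score(y_true, y_score)

print("AUPRC (trapezoid):", auprc_trapezoid)
print("AUPRC (average precision):", auprc_average_precision)
```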

Stratified K-Fold


class sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)


Parameters :

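- n_splits int, default=5

   Number of folds.

- shuffle bool, default=False

   Whether to shuffle each class's samples before splitting into batches.

   The samples within each split are not shuffled.

- random_state int, RandomState instance or None, default=None

  When shuffle=True, random_state controls the ordering of the indices, and so the randomness of each fold.

  Otherwise, leave random_state as None.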
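
A small sketch of how shuffle and random_state interact. The labels and the helper fold_test_indices are made up for illustration, and X is only a placeholder array.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up labels: 6 samples of class 0 followed by 6 samples of class 1
y = np.array([0] * 6 + [1] * 6)
X = np.zeros((len(y), 1))  # placeholder features; only y drives the stratification


def fold_test_indices(shuffle, random_state):
    """Return the test indices of every fold as plain lists (hypothetical helper)."""
    skf = StratifiedKFold(n_splits=3, shuffle=shuffle, random_state=random_state)
    return [test_index.tolist() for _, test_index in skf.split(X, y)]


# Same random_state with shuffle=True -> identical folds on every run
folds_a = fold_test_indices(shuffle=True, random_state=42)
folds_b = fold_test_indices(shuffle=True, random_state=42)
print(folds_a == folds_b)  # True

# shuffle=False: random_state is left as None
folds_plain = fold_test_indices(shuffle=False, random_state=None)
print(folds_plain)
```

With or without shuffling, every test fold still holds the same number of samples from each class.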



1. get_n_splits(X=None, y=None, groups=None)

Parameters :

X object

Always ignored, exists for compatibility.

y object

Always ignored, exists for compatibility.

groups object

Always ignored, exists for compatibility.

Returns :

n_splits int

Returns the number of splitting iterations in the cross-validator.


2. split(X, y, groups=None)

Parameters :

X array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

y array-like of shape (n_samples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.

groups object

Always ignored, exists for compatibility.

Yields :

train ndarray

The training set indices for that split.

test ndarray

The testing set indices for that split.


import numpy as np
from sklearn.model_selection import StratifiedKFold

# Four samples, two per class
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

skf = StratifiedKFold(n_splits=2)
print(skf.get_n_splits(X, y))
# 2

print(skf)
# StratifiedKFold(n_splits=2, random_state=None, shuffle=False)

# Each fold keeps one sample of each class in both the train and test sets
for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# TRAIN: [1 3] TEST: [0 2]
# TRAIN: [0 2] TEST: [1 3]
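
Stratification matters most with imbalanced labels. The sketch below uses a made-up 90/10 label split and a placeholder X, as the split documentation allows, to check that each test fold keeps roughly the same share of positives as the full data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up imbalanced labels: 90 negatives and 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros(len(y))  # placeholder; y alone is enough to generate the splits

skf = StratifiedKFold(n_splits=5)

positive_ratios = []
for fold, (train_index, test_index) in enumerate(skf.split(X, y)):
    # Share of positive samples in this fold's test set
    ratio = float(y[test_index].mean())
    positive_ratios.append(ratio)
    print(f"fold {fold}: test size={len(test_index)}, positive ratio={ratio:.2f}")

print("overall positive ratio:", float(y.mean()))
```

Each of the five test folds ends up with 18 negatives and 2 positives, matching the overall 10% positive rate.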


