
[Week 9] F1 Score, Stratified K Fold

by newnu 2021. 10. 1.

F1 score = the harmonic mean of Precision and Recall

The harmonic mean is used so that when either of the two values is small, the score drops as well.
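Written out as a formula, with P for Precision and R for Recall. The numbers in the second line are made up purely for illustration, to show how a low Recall pulls F1 far below the arithmetic mean:

```latex
F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R}

% Illustrative values only: P = 0.9, R = 0.1
\frac{P + R}{2} = 0.5, \qquad F_1 = \frac{2 \cdot 0.9 \cdot 0.1}{0.9 + 0.1} = 0.18
```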

 

Precision: of the samples predicted as True, the fraction that are actually True

Recall: of the samples that are actually True, the fraction that were also predicted as True
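The definitions above can be sketched in plain Python as follows. The helper name `precision_recall_f1` and the toy labels are assumptions made up for this example, not something from the lecture; in practice `sklearn.metrics` provides equivalent functions.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = True, 0 = False)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted True, actually True
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted True, actually False
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted False, actually True

    # Guard against division by zero when there are no positive predictions/labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 4 positives, 4 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here TP = 2, FP = 1 and FN = 2, so Precision is 2/3, Recall is 1/2, and F1 is 4/7.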

 

AUPRC: a model evaluation metric equal to the area under the curve drawn with Recall on the x-axis and Precision on the y-axis
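One way to compute this area, as a minimal sketch: build the curve with scikit-learn's `precision_recall_curve` and integrate it with `auc` (trapezoidal rule). The labels and scores below are toy values chosen for illustration.

```python
from sklearn.metrics import auc, precision_recall_curve

# Toy data (hypothetical): true binary labels and predicted scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (precision, recall) point per score threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Area under the curve with Recall on the x-axis and Precision on the y-axis.
auprc = auc(recall, precision)
print(f"AUPRC = {auprc:.3f}")
```

`sklearn.metrics.average_precision_score` summarizes the same curve with a step-wise sum instead of trapezoids, so the two numbers can differ slightly.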

Stratified K-Fold

sklearn.model_selection.StratifiedKFold

class sklearn.model_selection.StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)

 

Parameters :

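- n_splits int, default=5

   Number of folds. Must be at least 2.

- shuffle bool, default=False

   Whether to shuffle each class's samples before splitting into batches.

   The samples within each split are not shuffled.

- random_state int, RandomState instance or None, default=None

  When shuffle=True, random_state controls the ordering of the indices, and therefore the randomness of each fold for each class.

  Otherwise, leave random_state as None.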
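The effect of `shuffle` and `random_state` can be checked with a small sketch like the one below. The data and the helper `collect_test_folds` are assumptions made up for this example.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (hypothetical): 6 samples of class 0 followed by 6 of class 1.
y = np.array([0] * 6 + [1] * 6)
X = np.zeros((len(y), 1))  # placeholder features; only y drives the split


def collect_test_folds(shuffle, random_state=None):
    """Return the test indices of every fold as plain Python lists."""
    skf = StratifiedKFold(n_splits=3, shuffle=shuffle, random_state=random_state)
    return [test_index.tolist() for _, test_index in skf.split(X, y)]


unshuffled = collect_test_folds(shuffle=False)
seeded_a = collect_test_folds(shuffle=True, random_state=42)
seeded_b = collect_test_folds(shuffle=True, random_state=42)

print("shuffle=False:", unshuffled)
print("shuffle=True :", seeded_a)
print("same seed gives same folds:", seeded_a == seeded_b)
```

With the same `random_state` the shuffled folds are reproducible, and in every case each test fold still holds two samples of each class, with the indices inside a fold in ascending order.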

 

Methods:

1. get_n_splits(X=None, y=None, groups=None)

 

Parameters

X object

Always ignored, exists for compatibility.

y object

Always ignored, exists for compatibility.

groups object

Always ignored, exists for compatibility.

 

Returns

n_splits int

Returns the number of splitting iterations in the cross-validator.

 

2. split(X, y, groups=None)

Parameters

X array-like of shape (n_samples, n_features)

Training data, where n_samples is the number of samples and n_features is the number of features.

Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

y array-like of shape (n_samples,)

The target variable for supervised learning problems. Stratification is done based on the y labels.

groups object

Always ignored, exists for compatibility.

 

Yields:

train ndarray

The training set indices for that split.

test ndarray

The testing set indices for that split.

 

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

skf = StratifiedKFold(n_splits=2)
skf.get_n_splits(X, y)
# 2

print(skf)
# StratifiedKFold(n_splits=2, random_state=None, shuffle=False)

for train_index, test_index in skf.split(X, y):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

# TRAIN: [1 3] TEST: [0 2]
# TRAIN: [0 2] TEST: [1 3]
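To see the stratification itself, here is a minimal sketch on an imbalanced toy dataset (the 8:2 class counts are made up for illustration). It also uses `np.zeros(n_samples)` as a placeholder for X, as the `split` notes above say is sufficient.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (hypothetical): 8 samples of class 0 and 2 of class 1 (80% / 20%).
y = np.array([0] * 8 + [1] * 2)
X = np.zeros(len(y))  # placeholder; the split depends only on y

skf = StratifiedKFold(n_splits=2)

fold_counts = []
for train_index, test_index in skf.split(X, y):
    # Count how many samples of each class landed in this test fold.
    counts = np.bincount(y[test_index], minlength=2).tolist()
    fold_counts.append(counts)
    print("TEST:", test_index, "class counts:", counts)
```

Each test fold receives four samples of class 0 and one of class 1, so the 80/20 ratio of the full dataset is preserved in every fold.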

 
