Handling blank (empty or whitespace-only) string values in pandas
import numpy as np
# Turn empty or whitespace-only strings into NaN, then drop the rows that contain NaN
all_datass = all_datass.replace(to_replace=r'^\s*$', value=np.nan, regex=True)
all_datass = all_datass.dropna()
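To see the effect on concrete data, here is a minimal sketch. The DataFrame `df` and its columns `name` and `age` are made up for illustration and are not from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "name" holds one empty and one whitespace-only string.
df = pd.DataFrame({"name": ["alice", "", "   ", "bob"], "age": [30, 25, 41, 38]})

# The regex ^\s*$ matches strings that are empty or contain only whitespace.
# Those cells become NaN; dropna() then removes the rows that contain them.
cleaned = df.replace(to_replace=r"^\s*$", value=np.nan, regex=True).dropna()

print(cleaned)
```

Only string cells are affected by the regex replacement, so the numeric `age` column is left as it is.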
Random, stratified, and over-sampling with pandas
Random sampling: pandas sample
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
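A short sketch of the most commonly used parameters, assuming a made-up ten-row DataFrame `df`. `n` and `frac` are mutually exclusive, and `random_state` makes the draw reproducible.

```python
import pandas as pd

# Hypothetical toy data with ten rows.
df = pd.DataFrame({"x": range(10)})

# Draw exactly 3 rows without replacement.
by_count = df.sample(n=3, random_state=0)

# Draw half of the rows (5 of 10) without replacement.
by_fraction = df.sample(frac=0.5, random_state=0)

# Drawing more rows than the frame contains requires replace=True.
with_replacement = df.sample(n=15, replace=True, random_state=0)

print(len(by_count), len(by_fraction), len(with_replacement))
```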
Stratified sampling: randomly sample each specific category separately, then combine the results
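import pandas as pd
# sample() returns one random row by default; pass n= or frac= to draw more
a = df[df["s"] == "a"].sample()
b = df[df["s"] == "b"].sample()
c = df[df["s"] == "c"].sample()
stratified = pd.concat([a, b, c])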
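As an alternative to filtering each category by hand, `groupby(...).sample(...)` (available in pandas 1.1 and later) draws from every group in one call. The sketch below assumes a made-up DataFrame with a class column `s`; the class sizes are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data: class "a" has 6 rows, "b" has 4, "c" has 2.
df = pd.DataFrame({"s": ["a"] * 6 + ["b"] * 4 + ["c"] * 2, "v": range(12)})

# Fixed number of rows per class.
fixed = df.groupby("s").sample(n=2, random_state=0)

# Same fraction of each class, which preserves the class proportions.
proportional = df.groupby("s").sample(frac=0.5, random_state=0)

print(fixed["s"].value_counts().to_dict())
print(proportional["s"].value_counts().to_dict())
```

Using `frac` keeps the original class balance, while `n` yields an equal number of rows per class.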
Over-sampling (up-sampling and down-sampling):
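from collections import Counter
from imblearn.over_sampling import SMOTE
# A float sampling_strategy is the desired minority/majority ratio after resampling (binary targets only)
# fit_resample replaces fit_sample, which newer imbalanced-learn releases removed
X_resampled_smote, y_resampled_smote = SMOTE(sampling_strategy=0.05).fit_resample(X, y)
sorted(Counter(y_resampled_smote).items())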
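For comparison, up-sampling and down-sampling can also be sketched with pandas alone. Note that this is plain random resampling (duplicating or discarding existing rows), not SMOTE, which synthesizes new minority samples. The DataFrame, the `label` column, and `target_ratio` are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical imbalanced data: 100 rows of class 0, 4 rows of class 1.
df = pd.DataFrame({"label": [0] * 100 + [1] * 4, "feature": range(104)})

target_ratio = 0.25  # assumed desired minority/majority ratio
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]

# Up-sampling: duplicate random minority rows until the ratio is reached.
n_minority_target = int(len(majority) * target_ratio)
extra = minority.sample(n=n_minority_target - len(minority), replace=True, random_state=0)
upsampled = pd.concat([majority, minority, extra], ignore_index=True)

# Down-sampling: keep only a random subset of the majority rows instead.
n_majority_target = int(len(minority) / target_ratio)
downsampled = pd.concat(
    [majority.sample(n=n_majority_target, random_state=0), minority],
    ignore_index=True,
)

print(upsampled["label"].value_counts().to_dict())
print(downsampled["label"].value_counts().to_dict())
```

Up-sampling keeps all the original data at the cost of duplicated rows; down-sampling avoids duplicates but throws away majority-class information.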