Author: 生活趣图分享 | Source: Internet | 2023-06-03 18:10
I'm new to machine learning and am working on a project using Python (3.6), pandas, NumPy, and scikit-learn.
My DataFrame is:
discount  tax  total  subtotal  productid
3         0    20     13        002
10        3    106    94        003
46.49     6    21     20        004
Here's how I performed the classification:
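# Imports used by the snippets in this question
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score, classification_report

df_full = pd.read_excel('input/Potential_Learning_Patterns.xlsx', sheet_name=0)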
df_full.head()
# Convert columns to numeric
df_full['discount'] = pd.to_numeric(df_full['discount'], errors='coerce')
df_full['productdiscount'] = pd.to_numeric(df_full['productdiscount'], errors='coerce')
# Label a row as Class 1 only when all four conditions hold
df_full['Class'] = ((df_full['discount'] > 20) &
                    (df_full['tax'] == 0) &
                    (df_full['productdiscount'] > 20) &
                    (df_full['total'] > 100)).astype(int)
print(df_full)
# Get some sample data from entire dataset
data = df_full.sample(frac = 0.1, random_state = 1)
print(data.shape)
data.isnull().sum()
# Convert excel data into matrix
columns = "invoiceid locationid timestamp customerid discount tax total subtotal productid quantity productprice productdiscount invoice_products_id producttax invoice_payments_id paymentmethod paymentdetails amount Class(0/1) Class".split()
X = data[columns].values  # DataFrame.as_matrix() was removed in pandas 1.0
Y = data.Class
# temp = np.array(temp).reshape((len(temp), 1)
Y = Y.values.reshape(Y.shape[0], 1)
X.shape
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.06)
X_test, X_dev, Y_test, Y_dev = train_test_split(X_test, Y_test, test_size = .5)
# Check that positive examples (Class == 1) appear in the training, test, and dev sets
np.where(Y_train == 1)
np.where(Y_test == 1)
np.where(Y_dev == 1)
# Determine the number of fraud cases in the dataset
Fraud = data[data['Class'] == 1]
Valid = data[data['Class'] == 0]
# Ratio of fraud cases to valid cases
outlier_fraction = len(Fraud) / float(len(Valid))
print(outlier_fraction)
print('Fraud Cases : {}'.format(len(Fraud)))
print('Valid Cases : {}'.format(len(Valid)))
# Correlation matrix
corrmat = data.corr()
fig = plt.figure( figsize = (12, 9))
sns.heatmap(corrmat, vmax = .8, square = True)
plt.show()
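A side note on the ratio computed above: scikit-learn documents the contamination parameter as the proportion of outliers in the whole data set, whereas len(Fraud) / float(len(Valid)) is the ratio of fraud rows to valid rows. The sketch below uses made-up counts (5 fraud and 95 valid rows, both hypothetical) only to show how the two numbers differ.

```python
# Hypothetical counts, chosen only for illustration.
n_fraud = 5
n_valid = 95

# What the question computes: fraud rows relative to valid rows.
ratio_to_valid = n_fraud / float(n_valid)

# Share of fraud rows among all rows.
share_of_all = n_fraud / float(n_fraud + n_valid)

print(round(ratio_to_valid, 4))  # 0.0526
print(round(share_of_all, 4))    # 0.05
```

With few fraud rows the two values are close, so this is unlikely to be the cause of the error asked about here.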
Here's how I applied the reshaping:
# Get all the columns from dataframe
columns = data.columns.tolist()
# Filter the columns to remove data we don't want
columns = [c for c in columns if c not in ["Class"] ]
# Store the variable we want to predict
target = "Class"
for column in data.columns:
    if data[column].dtype == type(object):
        le = LabelEncoder()
        data[column] = le.fit_transform(data[column])
        X = data[column]
X = data[column]
Y = data[target]
# Print the shapes of X & Y
print(X.shape)
print(Y.shape)
# define a random state
state = 1
# define the outlier detection method
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20,
        contamination=outlier_fraction)
}
# fit the model
n_outliers = len(Fraud)
for i, (clf_name, clf) in enumerate(classifiers.items()):
    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)
    # Map the prediction values to 0 for valid and 1 for fraudulent
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()
    # run classification metrics
    print('{}:{}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
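The loop above relies on the scikit-learn convention that outlier detectors return +1 for inliers and -1 for outliers. The following is a minimal sketch of that convention and of the 0/1 remapping, run on synthetic data rather than on the spreadsheet from the question; the array sizes, the shift applied to five rows, and the contamination value are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic 2D feature matrix: 200 samples, 4 features (illustrative only).
rng = np.random.RandomState(1)
X_demo = rng.normal(size=(200, 4))
X_demo[:5] += 8  # push a few rows far from the rest

clf = IsolationForest(contamination=0.05, random_state=1)
raw = clf.fit_predict(X_demo)  # +1 for inliers, -1 for outliers

# Map to the 0 (valid) / 1 (fraud) encoding used for the Class column.
y_demo = np.where(raw == -1, 1, 0)

print(X_demo.shape)
print(np.unique(raw))
print(y_demo.sum(), "rows flagged as outliers")
```

Using np.where builds a new array instead of overwriting the predictions in place, which keeps the raw +1/-1 output available for inspection.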
The code works fine up to the point where I reshape the samples and target. But when I call the fit method on my classifiers, it raises an error like this:
ValueError: Expected 2D array, got 1D array instead: array=[1 0]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
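For reference, the shape difference that the error message describes can be reproduced on a toy frame. scikit-learn estimators expect X with shape (n_samples, n_features); selecting one column by a single label gives a 1D Series, while selecting with a list of labels gives a 2D DataFrame. The frame below is hypothetical and only mirrors a few column names from the question; it is a sketch of the shapes involved, not a confirmed fix for the original data.

```python
import pandas as pd

# Hypothetical toy frame mirroring some columns shown in the question.
df = pd.DataFrame({
    "discount": [3.0, 10.0, 46.49],
    "tax": [0, 3, 6],
    "total": [20, 106, 21],
    "subtotal": [13, 94, 20],
    "Class": [0, 0, 1],
})

feature_columns = [c for c in df.columns if c != "Class"]

single = df["discount"]             # single label -> Series, 1D
one_feature = df[["discount"]]      # list with one label -> DataFrame, 2D
all_features = df[feature_columns]  # list of labels -> DataFrame, 2D

print(single.shape)        # (3,)
print(one_feature.shape)   # (3, 1)
print(all_features.shape)  # (3, 4)

# The reshape suggested by the error message, for the single-feature case.
reshaped = single.values.reshape(-1, 1)
print(reshaped.shape)      # (3, 1)
```

With several features, selecting with the list of feature columns keeps every feature and already has the 2D shape, so no reshape is needed in that case.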
I'm new to machine learning; what did I do wrong here? I have multiple features, so how can I correctly reshape my sample arrays?
Help me, please! Thanks in advance!
1 solution