
# Calculating AUC for Unsupervised LOF in sklearn

By : Colin Watt
Date : October 18 2020, 08:10 AM
You don't have to convert model.negative_outlier_factor_ into probabilities for the ROC AUC calculation; a relative score is good enough.
code :
``````samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]

from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

lof = LocalOutlierFactor(n_neighbors=3, novelty=True)
lof.fit(samples)

# roc_auc_score takes the true labels first, then the scores;
# X_test and y_test are from the question
roc_auc_score(y_test, 1 / lof.score_samples(X_test))
``````
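A fuller, self-contained sketch of the same idea, under assumed data (the `X_test`/`y_test` in the snippet above come from the question): train on synthetic inliers, inject obvious outliers into the test set, and negate `score_samples` so that larger means more outlier-like.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.RandomState(0)
X_train = rng.normal(size=(100, 2))                        # inliers only
X_test = np.vstack([rng.normal(size=(40, 2)),              # 40 inliers
                    rng.uniform(4, 6, size=(10, 2))])      # 10 clear outliers
y_test = np.array([0] * 40 + [1] * 10)                     # 1 = outlier

lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

# score_samples is high for inliers, so negate it to rank outliers highest;
# AUC only cares about the ranking, not the scale of the scores
auc = roc_auc_score(y_test, -lof.score_samples(X_test))
print(auc)
```

Negating the score and taking its reciprocal both preserve the ranking of the (negative) LOF values, which is all AUC needs.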


## Correctly calculating the F1 score in Sklearn

Date : March 29 2020, 07:55 AM
The F1 score is the harmonic mean of the precision and recall of your predictions, i.e. what portion of your predictions were true positives and what portion of the actual positives you predicted: https://en.wikipedia.org/wiki/F1_score
Sklearn's function expects arrays of labels for y_true and y_pred, where y_true[i] is the actual label of the i-th element and y_pred[i] is its predicted label. The order of the two must match! That ordering is what lets sklearn compute the F-score over all predictions instead of just a single value.
code :
``````y_pred = [True, False, True, False, False]
``````
``````y_true = [False, False, True, False, False]
``````
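As a quick sanity check of the element-wise matching described above, a minimal sketch with those two arrays and sklearn's `f1_score`:

```python
from sklearn.metrics import f1_score

y_true = [False, False, True, False, False]
y_pred = [True, False, True, False, False]

# TP = 1, FP = 1, FN = 0 -> precision = 0.5, recall = 1.0
# F1 = 2 * (0.5 * 1.0) / (0.5 + 1.0) = 2/3
score = f1_score(y_true, y_pred)
print(score)
```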

## sklearn: calculating accuracy score of k-means on the test data set

By : Jerry
Date : March 29 2020, 07:55 AM
In terms of evaluation: remember that k-means is not a classification tool, so analyzing accuracy is not a very good idea. You can do it, but that is not what k-means is for. It is supposed to find a grouping of the data that maximizes between-cluster distances; it does not use your labels to train. Consequently, methods like k-means are usually evaluated with clustering metrics such as the Rand index. If you want to maximize accuracy, fit an actual classifier: kNN, logistic regression, SVM, etc.
In terms of the code itself, k_means.predict(X_test) returns the labeling; it does not update the internal labels_ field, so you should do
code :
``````print(k_means.predict(X_test))
``````
``````print(k_means.labels_)
print(y_test)
``````
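A minimal sketch of that advice, scoring k-means with a clustering metric instead of accuracy (the iris dataset and parameters here are illustrative, not from the question):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# The adjusted Rand index is invariant to permutations of the cluster
# labels, unlike accuracy, so it suits unsupervised groupings
ari = adjusted_rand_score(y, k_means.predict(X))
print(ari)
```

Accuracy would punish a perfect clustering whose arbitrary cluster IDs happen not to match the label encoding; the Rand index does not.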

## Problem while calculating roc curve using sklearn for an array of binaries and an array of float scores

By : Terry Benight
Date : March 29 2020, 07:55 AM
It looks like you switched the target scores and the binary labels. I had to remove dtype=object from your arrays to make it work; the working solution follows. As per the official page here, the first argument to roc_curve is the binary labels in {0, 1} and the second argument is the target scores. You were passing probas as the target scores and y_test as the binary labels.
code :
``````import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_test = np.asarray([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11, -9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9, -8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74, -8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5, -8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26, -8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94, -7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71, -7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27, -7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83, -6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16, -6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92])
probas = np.asarray([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
fpr, tpr, thresholds = roc_curve(probas,y_test)
plt.plot(fpr, label = 'fpr')
plt.plot(tpr, label = 'tpr')
plt.legend(fontsize=16)
plt.show()
``````
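If you then want a single number summarising the curve, `sklearn.metrics.auc` integrates it; a minimal sketch with toy labels and scores (not the question's data):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

labels = np.array([0, 0, 1, 1])          # binary labels first
scores = np.array([0.1, 0.4, 0.35, 0.8]) # target scores second

fpr, tpr, thresholds = roc_curve(labels, scores)

# auc applies the trapezoidal rule to the (fpr, tpr) points
roc_auc = auc(fpr, tpr)
print(roc_auc)
```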

## Calculating TF-IDF from a Pandas DataFrame (without sklearn)

By : user3296595
Date : March 29 2020, 07:55 AM
I made a small modification to your implementation; I assume you've already calculated the IDF DataFrame. Let's create a dummy one of uniform values:
code :
``````IDF = pd.DataFrame([1.0/len(df.index)]*len(df.index), index = df.index)
print(IDF)

0
11-0.txt     0.166667
1342-0.txt   0.166667
1661-0.txt   0.166667
1952-0.txt   0.166667
84-0.txt     0.166667
pg16328.txt  0.166667
``````
``````TF = df.copy()

def choice(term, TF, impute_val=0.000001):
    TF = TF.fillna(impute_val)

    # Based on the formula provided, calculate the TFIDF score for all documents of this term
    tfidf_score = TF[term].values.ravel() * IDF.values.ravel()

    doc_names = TF.index.tolist()
    # sort by TFIDF score and return the doc name that has max tfidf value
    return sorted(zip(doc_names, tfidf_score), key=lambda x: x[1])[-1][0]

print(choice(term='accept', TF=TF))

'1661-0.txt'
``````
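Since the `df` above comes from the question, here is a self-contained sketch of the same approach with a made-up TF table (document names and frequencies are hypothetical):

```python
import pandas as pd

# rows = documents, columns = terms; values are raw term frequencies
TF = pd.DataFrame({'accept': [3, None, 7], 'refuse': [1, 2, 0]},
                  index=['a.txt', 'b.txt', 'c.txt'])
# uniform dummy IDF, mirroring the answer above
IDF = pd.DataFrame([1.0 / len(TF.index)] * len(TF.index), index=TF.index)

def choice(term, TF, impute_val=0.000001):
    TF = TF.fillna(impute_val)  # missing counts get a tiny value, not zero
    tfidf_score = TF[term].values.ravel() * IDF.values.ravel()
    doc_names = TF.index.tolist()
    # return the document with the largest TF-IDF for this term
    return sorted(zip(doc_names, tfidf_score), key=lambda x: x[1])[-1][0]

print(choice('accept', TF))
```

With a uniform IDF the winner is simply the document with the highest raw frequency of the term.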