Calculating AUC for Unsupervised LOF in sklearn


By : Colin Watt
Date : October 18 2020, 08:10 AM
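You don't have to convert model.negative_outlier_factor_ into probabilities to calculate ROC AUC; a relative score is good enough, because ROC AUC depends only on how the samples are ranked. With novelty=True you can score unseen data with score_samples. Note that roc_auc_score takes the true labels first and the scores second, and that score_samples is higher for inliers, so flip it (negating works as well as taking the reciprocal) if outliers are labeled 1.
code :
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]

lof = LocalOutlierFactor(n_neighbors=3, novelty=True)
lof.fit(samples)

# X_test and y_test are your own held-out data; y_test is assumed to mark outliers as 1.
# score_samples is higher for inliers, so negate it to rank outliers highest.
roc_auc_score(y_test, -lof.score_samples(X_test))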
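For a version you can run end to end, here is a minimal sketch on synthetic data. The data, the seed, and n_neighbors=20 are assumptions made for illustration and are not from the question; only LocalOutlierFactor, score_samples and roc_auc_score are the real sklearn calls.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumption): inliers around the origin, outliers far away.
rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))
X_inliers = rng.normal(0.0, 1.0, size=(50, 2))
X_outliers = rng.uniform(6.0, 8.0, size=(10, 2))
X_test = np.vstack([X_inliers, X_outliers])
y_test = np.r_[np.zeros(50), np.ones(10)]  # 1 marks an outlier

# novelty=True is required to score data that was not seen during fit.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

# score_samples is higher for inliers, so negate it to get an outlier score.
outlier_scores = -lof.score_samples(X_test)

# True labels go first, scores second.
auc = roc_auc_score(y_test, outlier_scores)
print(auc)
```

If your labels mark inliers as 1 instead, pass lof.score_samples(X_test) without the minus sign.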



Correctly calculating the F1 score in Sklearn


By : Marlies van Klink
Date : March 29 2020, 07:55 AM
The F-score is the harmonic mean of the precision and recall of your predictions, i.e. it combines what portion of your positive predictions were correct (precision) and what portion of the actual positives you predicted (recall): https://en.wikipedia.org/wiki/F1_score
Sklearn's function expects an array of labels for y_true and for y_pred, where y_true[i] is the actual label of the i-th element and y_pred[i] is the predicted label of the i-th element. The order of the two must match! Matching the order is what lets sklearn compare each prediction with its true label and compute one F-score over all predictions.
code :
y_pred = [True, False, True, False, False]
y_true = [False, False, True, False, False]
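To make the ordering point concrete, here is a small sketch that feeds those two lists to sklearn.metrics.f1_score and checks the result by hand. The by-hand arithmetic is added for illustration and is not part of the original answer.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_pred = [True, False, True, False, False]
y_true = [False, False, True, False, False]

# y_true goes first, y_pred second; element i of each refers to the same sample.
precision = precision_score(y_true, y_pred)  # 1 true positive / 2 predicted positives
recall = recall_score(y_true, y_pred)        # 1 true positive / 1 actual positive
f1 = f1_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall.
f1_by_hand = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```

Here precision is 0.5 and recall is 1.0, so F1 comes out at 2/3.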

sklearn: calculating accuracy score of k-means on the test data set


By : Jerry
Date : March 29 2020, 07:55 AM
In terms of evaluating accuracy: remember that k-means is not a classification tool, so analyzing accuracy is not a very good idea. You can do it, but it is not what k-means is for. It is supposed to find a grouping of the data that keeps points close to their own cluster center (which amounts to maximizing between-cluster separation), and it does not use your labels to train. Consequently, k-means and similar methods are usually evaluated with the Rand index and other clustering metrics. To maximize accuracy you should fit an actual classifier, such as kNN, logistic regression, or an SVM.
In terms of the code itself, k_means.predict(X_test) returns the cluster assignments for X_test; it does not update the internal labels_ field, which still describes the training data. Compare the three outputs:
code :
print(k_means.predict(X_test))
print(k_means.labels_)
print(y_test)
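As a sketch of the clustering-metric route, the snippet below scores k-means on a held-out split with the adjusted Rand index (sklearn.metrics.adjusted_rand_score), which ignores how the cluster ids happen to be numbered. The make_blobs data, the cluster centers and the seeds are assumptions chosen for illustration, not the asker's data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import train_test_split

# Synthetic, well-separated blobs (assumption) standing in for the real data.
X, y = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X_train)  # labels are not used for fitting

# predict returns cluster ids for the test set; labels_ still refers to X_train.
test_clusters = k_means.predict(X_test)

# The adjusted Rand index compares groupings, not label values.
ari = adjusted_rand_score(y_test, test_clusters)
print(ari)
```

A plain accuracy_score on the same arrays could be near zero even for a perfect grouping, because cluster 0 need not correspond to class 0.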

Problem while calculating roc curve using sklearn for an array of binaries and an array of float scores


By : Terry Benight
Date : March 29 2020, 07:55 AM
It looks like you switched the target scores and the binary labels. I also had to remove dtype=object from your arrays to make it work. The working solution follows. According to the official documentation, the first argument of roc_curve is the array of binary labels in {0, 1} and the second argument is the array of target scores. You were passing probas as the target scores and y_test as the binary labels, but in your data y_test holds the float scores and probas holds the 0/1 labels.
code :
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve

y_test = np.asarray([-10.54, -9.49, -9.4, -9.37, -9.36, -9.31, -9.28, -9.14, -9.11, -9.03, -9.01, -9.0, -8.99, -8.98, -8.96, -8.91, -8.9, -8.9, -8.9, -8.89, -8.88, -8.86, -8.86, -8.84, -8.83, -8.78, -8.76, -8.74, -8.74, -8.69, -8.69, -8.69, -8.67, -8.64, -8.61, -8.57, -8.51, -8.5, -8.49, -8.48, -8.4, -8.34, -8.33, -8.3, -8.29, -8.29, -8.27, -8.26, -8.25, -8.22, -8.15, -8.12, -8.1, -8.08, -8.04, -8.04, -7.96, -7.94, -7.94, -7.85, -7.83, -7.82, -7.82, -7.81, -7.76, -7.74, -7.71, -7.65, -7.57, -7.54, -7.47, -7.4, -7.39, -7.34, -7.33, -7.32, -7.27, -7.23, -7.16, -7.08, -7.05, -6.92, -6.9, -6.89, -6.86, -6.86, -6.83, -6.78, -6.73, -6.69, -6.59, -6.57, -6.4, -6.37, -6.21, -6.19, -6.16, -6.04, -6.04, -5.57, -5.54, -5.35, -5.24, -5.0, -4.92])
probas = np.asarray([1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Binary labels (probas) first, float scores (y_test) second.
fpr, tpr, thresholds = roc_curve(probas, y_test)
plt.plot(fpr, label='fpr')
plt.plot(tpr, label='tpr')
plt.legend(fontsize=16)
plt.show()
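The argument order is easier to see on a tiny example. This is a sketch with four made-up samples (the values are illustrative and not from the question), using variable names that say what each array holds:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up data: binary ground truth and a float score per sample.
labels = np.asarray([0, 0, 1, 1])
scores = np.asarray([0.1, 0.4, 0.35, 0.8])

# Ground-truth labels first, scores second.
fpr, tpr, thresholds = roc_curve(labels, scores)
auc = roc_auc_score(labels, scores)

print(fpr)
print(tpr)
print(auc)
```

Plotting tpr against fpr, as in plt.plot(fpr, tpr), gives the usual ROC curve, which is often more informative than plotting the two arrays separately.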

Calculating TF-IDF from a Pandas DataFrame (without sklearn)


By : user3296595
Date : March 29 2020, 07:55 AM
I made a small modification to your implementation. I assume you've already calculated the IDF DataFrame, so let's create a dummy one with uniform values:
code :
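import pandas as pd

# df is the term-frequency DataFrame from the question (documents as rows, terms as columns).
IDF = pd.DataFrame([1.0/len(df.index)]*len(df.index), index = df.index)
print(IDF)

# Output:
#                     0
# 11-0.txt     0.166667
# 1342-0.txt   0.166667
# 1661-0.txt   0.166667
# 1952-0.txt   0.166667
# 84-0.txt     0.166667
# pg16328.txt  0.166667

TF = df.copy()
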
def choice(term, TF, impute_val=0.000001):
    TF = TF.fillna(impute_val)

    # Based on the formula provided, calculate the TFIDF score for all documents of this term
    tfidf_score = TF[term].values.ravel() * IDF.values.ravel()

    doc_names = TF.index.tolist()
    # sort by TFIDF score and return the doc name that has max tfidf value
    return sorted(zip(doc_names,tfidf_score),key=lambda x: x[1])[-1][0]

print(choice(term='accept', TF=TF))

# Output reported in the original answer: 1661-0.txt
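The dummy IDF above is only a placeholder. For reference, here is a minimal sketch of computing TF, IDF and TF-IDF with pandas and NumPy only. The tiny counts table and its document names are hypothetical, and the plain log(N / df) form of IDF is one common variant among several, not necessarily the formula the asker was given.

```python
import numpy as np
import pandas as pd

# Hypothetical term counts: rows are documents, columns are terms.
counts = pd.DataFrame(
    {"accept": [3, 0, 1], "whale": [0, 5, 0], "the": [10, 12, 9]},
    index=["a.txt", "b.txt", "c.txt"],
)

# Term frequency: each count divided by the total count of its document.
tf = counts.div(counts.sum(axis=1), axis=0)

# Document frequency: number of documents that contain each term.
doc_freq = (counts > 0).sum(axis=0)

# Inverse document frequency, plain log(N / df) variant (assumption).
idf = np.log(len(counts) / doc_freq)

# Multiplying a DataFrame by a Series aligns the Series with the columns (terms).
tfidf = tf * idf

# Document with the highest TF-IDF for a given term.
best_doc = tfidf["accept"].idxmax()
print(best_doc)
```

A term that occurs in every document, like "the" here, gets an IDF of zero under this variant, so it never decides the choice.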

how to choose parameters in TfidfVectorizer in sklearn during unsupervised clustering


By : Steven Adams
Date : March 29 2020, 07:55 AM
If you are using these vectors in a classification task, for instance, you can vary these parameters (and, of course, the parameters of the classifier as well) and see which values give the best performance.
You can do that easily in sklearn with the GridSearchCV and Pipeline objects.
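A minimal sketch of that setup follows. The toy corpus, the choice of LogisticRegression and the parameter grid are all assumptions made for illustration; swap in your own documents, classifier and candidate values.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy labeled corpus (assumption): 1 = positive, 0 = negative.
docs = [
    "good movie", "great film", "loved the acting",
    "wonderful story", "excellent and fun", "really good plot",
    "bad movie", "terrible film", "hated the acting",
    "boring story", "awful and dull", "really bad plot",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])

# Keys are "<step name>__<parameter name>"; the values are candidates to try.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__sublinear_tf": [False, True],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(docs, labels)

print(search.best_params_)
print(search.best_score_)
```

Because the vectorizer sits inside the pipeline, it is refit on the training folds only, so the cross-validated score is not contaminated by vocabulary from the held-out fold.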