
How to replace the last column of a csv file where the value is greater than 0


By : Trung Trinh
Date : October 14 2020, 08:10 PM
I have a large dataset in which I want to replace the value of the last column with 1 wherever it is greater than 0. Answer: simple, and without any external modules:
code :
# Read every row as a list of floats (blank lines are skipped)
with open('/path/to/data.txt', 'r') as f:
    data = [list(map(float, line.strip().split(','))) for line in f if line.strip()]

# Set the last value to 1 where it is greater than 0; leave other values unchanged
data = [row[:-1] + [1 if row[-1] > 0 else row[-1]] for row in data]

with open('/path/to/file/of/choice.txt', 'w') as f:  # Can be the same file
    for row in data:
        f.write(','.join(map(str, row)) + '\n')
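A variant using the standard-library csv module, sketched under the assumption that only the last field needs to be numeric: the other fields stay as strings (so integers are not rewritten as `2.0`-style floats), and a non-numeric last field such as a header is left alone. The function names `set_positive_last_to_one` and `rewrite_csv` are hypothetical:

```python
import csv


def set_positive_last_to_one(rows):
    """Return rows with the last field set to '1' where it parses as a number > 0."""
    out = []
    for row in rows:
        if row:  # csv.reader yields [] for blank lines; keep them as-is
            try:
                if float(row[-1]) > 0:
                    row = row[:-1] + ['1']
            except ValueError:
                pass  # non-numeric last field (e.g. a header): leave unchanged
        out.append(row)
    return out


def rewrite_csv(src_path, dst_path):
    """Read src_path, apply the replacement, and write the result to dst_path."""
    with open(src_path, newline='') as f:
        rows = list(csv.reader(f))
    with open(dst_path, 'w', newline='') as f:
        csv.writer(f).writerows(set_positive_last_to_one(rows))
```

Because the whole file is read before it is reopened for writing, `src_path` and `dst_path` can be the same file.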



Replace all varchar column declarations with a size greater than 8000


By : xiao.wan
Date : March 29 2020, 07:55 AM
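A regex that matches a number greater than or equal to 8000: /^([89]\d{3}|\d{5,})$/
Here / is the regex delimiter. The pattern anchors at the start of the string (^), then matches either ((...|...)) an 8 or 9 followed by three more digits ([89]\d{3}) or five or more digits (\d{5,}), and finally anchors at the end of the string ($). To find such sizes inside varchar declarations, drop the anchors and wrap the group in varchar\( ... \), allowing optional whitespace around the number. Note that both patterns also match 8000 itself.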
code :
/varchar\(\s*([89]\d{3}|\d{5,})\s*\)/
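If the replacement is done in a script, the numeric comparison does not have to live inside the regex. A minimal Python sketch (`widen_varchars` is a hypothetical name, and `varchar(max)` as the replacement is an assumption, since the question does not say what to replace the size with) captures any size and compares it as an integer in a callback passed to `re.sub`, which makes "greater than 8000" strict:

```python
import re

# Capture the size of any varchar(N) declaration; \b keeps nvarchar(...) from matching
VARCHAR_DECL = re.compile(r'\bvarchar\(\s*(\d+)\s*\)', re.IGNORECASE)


def widen_varchars(sql, limit=8000, replacement='varchar(max)'):
    """Replace varchar(N) with `replacement` wherever N > limit."""
    def repl(match):
        # Compare as an integer instead of encoding the range in the pattern
        return replacement if int(match.group(1)) > limit else match.group(0)
    return VARCHAR_DECL.sub(repl, sql)
```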

How do I replace NA in one column if another column's value is greater than 0?


By : Michael Mann
Date : March 29 2020, 07:55 AM
I have 8 census lines (L1:L8). Some records currently have NA rather than 0 even though they were censused. I would like to replace every NA with 0 in columns L1:L8 whenever the corresponding effort column (EFFORT_L1:EFFORT_L8) is greater than 0, meaning the line was censused.
code :
# Set L1..L8 to 0 where they are NA and the matching EFFORT_L1..EFFORT_L8 value is > 0
df[paste0("L", 1:8)][is.na(df[paste0("L", 1:8)])
                     & df[paste0("EFFORT_L", 1:8)] > 0] <- 0
> df
  KARTA YEAR ART L1 L2 L3 L4 L5 L6 L7 L8 EFFORT_L1 EFFORT_L2
1 02C2H 1997 009  0  0  0  0  0  0  0  0        10        10
2 02C2H 1997 031  0  0  0  0  0  0  0  0        10        10
3 02C2H 1997 012  0  7  0  0  0  0  0  0        10        10
4 02C2H 1997 057  0  0  0  0  0  0  1  0        10        10
5 02C2H 1997 065  2  3  1  1  1  0  0  0        10        10
6 02C2H 1997 073  0  0  0  0  0  0  1  0        10        10
  EFFORT_L3 EFFORT_L4 EFFORT_L5 EFFORT_L6 EFFORT_L7 EFFORT_L8
1     9.625        10     9.125      9.75      9.75        10
2     9.625        10     9.125      9.75      9.75        10
3     9.625        10     9.125      9.75      9.75        10
4     9.625        10     9.125      9.75      9.75        10
5     9.625        10     9.125      9.75      9.75        10
6     9.625        10     9.125      9.75      9.75        10
  Total_Route_Effort
1              78.25
2              78.25
3              78.25
4              78.25
5              78.25
6              78.25
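For readers doing the same in pandas, a rough equivalent of the R answer might look like this. `fill_censused_na` is a hypothetical helper, and it assumes the L1..Ln and EFFORT_L1..EFFORT_Ln columns from the question are numeric. As in the R version, each count column is paired with its effort column by position:

```python
import numpy as np
import pandas as pd


def fill_censused_na(df, n=8):
    """Return a copy with L1..Ln set to 0 where NA and EFFORT_L1..EFFORT_Ln > 0."""
    counts = [f"L{i}" for i in range(1, n + 1)]
    efforts = [f"EFFORT_L{i}" for i in range(1, n + 1)]
    # .to_numpy() drops the column labels so Li lines up with EFFORT_Li by position
    mask = df[counts].isna().to_numpy() & (df[efforts].to_numpy() > 0)
    out = df.copy()
    out[counts] = out[counts].mask(mask, other=0)
    return out
```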

Matlab find values greater than threshold among column 5 of all matrices and replace with NaN


By : Jian Cao
Date : March 29 2020, 07:55 AM
I have an array of n matrices (45x5) and want to replace the values in column 5 of every matrix that exceed a threshold with NaN. Assuming all your matrices are stacked in the 3D matrix YourData (45x5xn):
code :
col5 = YourData(:,5,:);  col5(col5 > 1000) = NaN;  % mask only column 5 of each page
YourData(:,5,:) = col5;                           % write the column back
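The same column-restricted masking in NumPy, as a sketch assuming the matrices are stacked along the last axis of a float array of shape (45, 5, n). `mask_column` is a hypothetical name, MATLAB's column 5 becomes index 4, and the 1000 threshold is taken from the answer. Because `data[:, col, :]` is a view, assigning through it changes the original array:

```python
import numpy as np


def mask_column(data, col=4, threshold=1000):
    """In place: set entries of column `col` above `threshold` to NaN in every matrix."""
    if not np.issubdtype(data.dtype, np.floating):
        raise TypeError("NaN can only be stored in a floating-point array")
    view = data[:, col, :]           # basic slicing returns a view of shape (rows, n)
    view[view > threshold] = np.nan  # the assignment writes through to `data`
    return data
```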

Replace dates in data-frame column greater than reference date


By : itiongo
Date : March 29 2020, 07:55 AM
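I have a data frame df with a date column, DATE_OF_ENTRY, and want to replace every date later than a reference date. Answer: use pd.to_datetime() so that both sides of the comparison are datetimes rather than strings: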
code :
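import pandas as pd

# dayfirst=True because the reference date is written DD/MM/YYYY
referencePeriodEndDate = pd.to_datetime('31/03/2019', dayfirst=True)
# Convert the column to datetimes (add dayfirst=True if it is stored as DD/MM/YYYY)
df['DATE_OF_ENTRY'] = pd.to_datetime(df['DATE_OF_ENTRY'])

# Keep dates on or before the reference date; replace later ones (and NaT) with 'NOT KNOWN'
df['DATE_OF_ENTRY'] = df['DATE_OF_ENTRY'].where(
    df['DATE_OF_ENTRY'] <= referencePeriodEndDate, 'NOT KNOWN'
)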
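Mixing the string 'NOT KNOWN' into the column turns it into an object column. If you would rather keep a datetime column, a minimal sketch (the function name `blank_future_dates` is hypothetical) leaves the `other` argument of `Series.where` at its default, so later dates become NaT:

```python
import pandas as pd


def blank_future_dates(dates, cutoff):
    """Return the dates as datetimes, with anything after `cutoff` replaced by NaT."""
    dates = pd.to_datetime(dates)
    # where() keeps values where the condition holds; others become NaT for datetimes
    return dates.where(dates <= cutoff)
```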

Replace elements in Pandas column which have a distance measure greater than a threshold


By : Sawler
Date : March 29 2020, 07:55 AM
Idea
The idea is to build an adjacency matrix by thresholding the matrix of similarity ratios, build a graph from that adjacency matrix, and take its connected components as the clusters. Getting exactly the desired output can be tricky, but it can be achieved with an (absolute) threshold of 0.49.
code :
from difflib import SequenceMatcher

import networkx as nx
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[['comp1', 'fashion'],
                        ['comp2', 'fashionitem'],
                        ['comp3', 'fashionable'],
                        ['comp4', 'auto'],
                        ['comp5', 'autoindustry'],
                        ['comp6', 'automobile'],
                        ['comp6', 'food'],
                        ['comp7', 'delivery']], columns=['company', 'label'])


def distance(a, b):
    # SequenceMatcher.ratio() is a similarity in [0, 1]: higher means more alike
    return SequenceMatcher(None, a, b).ratio()

# get unique labels
labels = df['label'].unique()

# compute ratios
result = np.array([[distance(li, lj) for lj in labels] for li in labels])

# set diagonal to zero (works for any number of unique labels, not just 8)
np.fill_diagonal(result, 0)

# build adjacency matrix
adjacency_matrix = (result > 0.49).astype(int)

# create graph
dg = nx.from_numpy_array(adjacency_matrix, create_using=nx.Graph)

# create mapping dictionary from connected components
mapping = {}
for component in nx.connected_components(dg):
    group = labels[np.array(list(component))]
    value = min(group, key=len)
    mapping.update({label: value for label in group})

result = df.assign(label=df.label.map(mapping))

print(result)
  company     label
0   comp1   fashion
1   comp2   fashion
2   comp3   fashion
3   comp4      auto
4   comp5      auto
5   comp6      auto
6   comp6      food
7   comp7  delivery
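If you want to avoid the networkx dependency, the connected components can also be found with a small union-find (disjoint-set) structure. This is a swapped-in technique, not the answer's method, sketched with a hypothetical `cluster_labels` function. Since `SequenceMatcher.ratio()` can depend on argument order, a pair is joined if either order passes the threshold, which mirrors building an undirected graph from the full ratio matrix:

```python
from difflib import SequenceMatcher


def cluster_labels(labels, threshold=0.49):
    """Map each label to the shortest label in its connected component."""
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent links to the root, halving the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def similar(a, b):
        # Accept the pair if the ratio exceeds the threshold in either argument order
        return max(SequenceMatcher(None, a, b).ratio(),
                   SequenceMatcher(None, b, a).ratio()) > threshold

    # Union every pair of labels whose similarity exceeds the threshold
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if similar(labels[i], labels[j]):
                parent[find(i)] = find(j)

    # Group labels by root, then map every member to the group's shortest label
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    mapping = {}
    for members in groups.values():
        representative = min(members, key=len)
        for label in members:
            mapping[label] = representative
    return mapping
```

Usage with the data frame above: `df.assign(label=df.label.map(cluster_labels(list(df['label'].unique()))))`.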