Python: Remove the rest of the words and only keep the first word

By : user2175083
Date : October 16 2020, 08:10 AM
You can use split and select the first value of each list by indexing, with drop_duplicates to remove duplicates:
code :
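import pandas as pd

df = pd.DataFrame({'Category':['some way','nice', 'yop yop m', 
                               'be happy', 'nice', 'yop man']})

print (df)
    Category
0   some way
1       nice
2  yop yop m
3   be happy
4       nice
5    yop man

# drop duplicated rows first, then keep the first word of each value
changed_data = df['Category'].drop_duplicates().str.split().str[0]
print (changed_data)
0    some
1    nice
2     yop
3      be
5     yop
Name: Category, dtype: object

# keep the first word first, then drop duplicated first words
changed_data = df['Category'].str.split().str[0].drop_duplicates()
print (changed_data)
0    some
1    nice
2     yop
3      be
Name: Category, dtype: object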
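
For data that is not in a DataFrame, the same idea can be sketched in plain Python. The helper name `first_words` and its `unique` flag are illustrative and not part of the answer above; `dict.fromkeys` is used here as one order-preserving way to drop duplicates.

```python
def first_words(values, unique=False):
    """Return the first whitespace-separated word of each non-empty string."""
    # str.split() with no argument splits on any run of whitespace
    words = [v.split()[0] for v in values if v.strip()]
    if unique:
        # dict keys keep insertion order, so later duplicates are dropped
        words = list(dict.fromkeys(words))
    return words


categories = ['some way', 'nice', 'yop yop m', 'be happy', 'nice', 'yop man']
print(first_words(categories))
# ['some', 'nice', 'yop', 'be', 'nice', 'yop']
print(first_words(categories, unique=True))
# ['some', 'nice', 'yop', 'be']
```

Blank or whitespace-only strings are skipped in this sketch, since they have no first word.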

Remove every word after a bracket and append the remaining words to a Python list

By : user2224965
Date : March 29 2020, 07:55 AM
I would like to remove every word after a bracket (including the bracket) in a string and append the remaining words to a Python list. Try this:
code :
import re
s = "[AA BB, CC, DD, EE] [PP QQ, RR] [WW XX, YY, ZZ]"
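# match every word that is not directly preceded by "[" or by another word character
response = re.findall(r'(?<![\[\w])\w+', s)
print(response)
# ['BB', 'CC', 'DD', 'EE', 'QQ', 'RR', 'XX', 'YY', 'ZZ']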
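
As a possible variation, the remaining words can be kept grouped per bracket instead of in one flat list. This is only a sketch; the helper name `words_per_bracket` is hypothetical, and it assumes the brackets are not nested.

```python
import re

s = "[AA BB, CC, DD, EE] [PP QQ, RR] [WW XX, YY, ZZ]"


def words_per_bracket(text):
    """Return one list per [...] group, without the group's first word."""
    # capture the text between each pair of square brackets
    groups = re.findall(r'\[([^\]]*)\]', text)
    # split each group into words and drop the first one
    return [re.findall(r'\w+', group)[1:] for group in groups]


print(words_per_bracket(s))
# [['BB', 'CC', 'DD', 'EE'], ['QQ', 'RR'], ['XX', 'YY', 'ZZ']]
```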

Fix the words and remove the unwanted spaces between split words using Python?

By : mistvan
Date : March 29 2020, 07:55 AM
Here's a simple script that works for your example. Obviously you'd want a bigger corpus of valid words. Also, you'd probably want to have an elif branch that looks back at the previous word if joining the next word fails to fix a non-word.
code :
from string import punctuation

word_list = "big list of words including a programming language is general purpose"
valid_words = set(word_list.split())

bad = "Java is a prog ramming lan guage. C is a gen eral purpose la nguage."
words = bad.split()

out_words = []
i = 0
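while i < len(words):
    word = words[i]
    # a non-word may be the first half of a split word: try joining it with the next token
    if word not in valid_words and i+1 < len(words):
        next_word = words[i+1]
        joined = word + next_word
        if joined.strip(punctuation) in valid_words:
            word = joined
            i += 1  # skip the token that was just merged
    out_words.append(word)
    i += 1

good = " ".join(out_words)
print(good)
# Java is a programming language. C is a general purpose language.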
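
The look-back branch mentioned above could be sketched as follows. This is one possible reading of that suggestion, not the original author's code: the function name `rejoin_split_words`, the extra vocabulary entries and the sample sentence are assumptions made for illustration. It also strips punctuation before the first lookup, which the script above does not do.

```python
from string import punctuation


def rejoin_split_words(text, valid_words):
    """Glue fragments of wrongly split words back together."""
    words = text.split()
    out_words = []
    i = 0
    while i < len(words):
        word = words[i]
        if word.strip(punctuation) not in valid_words:
            # first try to join the fragment with the next token
            if (i + 1 < len(words)
                    and (word + words[i + 1]).strip(punctuation) in valid_words):
                word = word + words[i + 1]
                i += 1  # skip the token that was just merged
            # otherwise look back and try the previously emitted word
            elif out_words and (out_words[-1] + word).strip(punctuation) in valid_words:
                word = out_words.pop() + word
        out_words.append(word)
        i += 1
    return " ".join(out_words)


# hypothetical vocabulary and input
valid_words = set("a programming language is the issue fixed".split())
bad = "Java is a prog ramming lan guage. The is sue is fixed."
print(rejoin_split_words(bad, valid_words))
# Java is a programming language. The issue is fixed.
```

The look-back case matters when the first fragment is itself a valid word ("is" in "is sue"), so the forward join is never attempted for it.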

How to remove all words before a specific word using Python (if there are multiple specific words)?

By : user2655912
Date : March 29 2020, 07:55 AM
I want to remove all words before a specific word, but my sentence contains that word several times. You can use this regex and replace the match with an empty string:
code :
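import re
# option 1: delete everything before the first SELECT (the lookahead keeps SELECT itself)
result = re.sub(r"^.+?(?=SELECT)", "", your_string)
# option 2: match up to and including the first SELECT, then put SELECT back
result = re.sub(r"^.+?SELECT", "SELECT", your_string)
# option 3: no regex, split once at the first SELECT
partitions = your_string.partition("SELECT")
result = partitions[1] + partitions[2]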
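
When the keyword occurs several times, the choice between the first and the last occurrence comes down to lazy versus greedy matching. The sketch below uses a made-up sample string to show both, together with regex-free equivalents.

```python
import re

# hypothetical input with two occurrences of the keyword
text = "junk SELECT a FROM t WHERE x IN (SELECT b FROM u)"

# lazy .+? stops at the first SELECT
first = re.sub(r"^.+?(?=SELECT)", "", text)
# greedy .* backtracks from the end, so it stops at the last SELECT
last = re.sub(r"^.*(?=SELECT)", "", text)

# the same two results without regex
_, sep, tail = text.partition("SELECT")
first_no_re = sep + tail
_, sep, tail = text.rpartition("SELECT")
last_no_re = sep + tail

print(first)  # SELECT a FROM t WHERE x IN (SELECT b FROM u)
print(last)   # SELECT b FROM u)
```

One difference to keep in mind: if the keyword is missing, `re.sub` leaves the string unchanged, while the `partition` version returns an empty string.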

How to clean csv file from non-word characters and remove words that contain them in python?

By : Eric W
Date : March 29 2020, 07:55 AM
You have to test each word of the list against the regex. As the expression will be used more than once, it is better to compile it first:
code :
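import re
# option 1: drop every word that contains a non-word character
reject = re.compile(r'\W+')
[w for w in words if not reject.search(w)]
# option 2: keep only words made entirely of word characters
clean = re.compile(r'\w+$')
[w for w in words if clean.match(w)]
# output:
['set', 'photo', 'recording', 'record', 'belief', 'institution', 'change']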
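
A small self-contained sketch of both filters on a made-up word list (the sample words are assumptions, not the asker's data). It uses `fullmatch` instead of `match` with a trailing `$`, which is a substitution on my part. Note that `\w` also accepts digits and underscores, which may or may not be what you want.

```python
import re

# hypothetical sample, including an empty string
words = ['set', 'pho-to', 'record', "belief's", 'change!', 'x_1', '']

reject = re.compile(r'\W')        # any single non-word character
word_chars = re.compile(r'\w+')   # used with fullmatch below

# drop empty strings and anything containing a non-word character
kept_by_reject = [w for w in words if w and not reject.search(w)]
# keep only strings that consist entirely of word characters
kept_by_fullmatch = [w for w in words if word_chars.fullmatch(w)]

print(kept_by_reject)     # ['set', 'record', 'x_1']
print(kept_by_fullmatch)  # ['set', 'record', 'x_1']
```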

Find the words in a list, then remove the word and any other trailing words in the column

By : user5891407
Date : March 29 2020, 07:55 AM
Use split with all values joined by | (regex OR) and select the first element of each list with str[0]:
code :
remove_words = ['stack', 'over', 'flow']

#for a more general solution with word boundaries;
#the (?:...) group makes \b apply to every alternative
pat = r'\b(?:{})\b'.format('|'.join(remove_words))
df['col'] = df['col'].str.split(pat, n=1).str[0]
print (df)
0  abc test test 
1     cde test12 
2         def123 
3            yup 
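
The effect of grouping the alternatives inside the word boundaries can be checked with the standard `re` module alone. The sketch below is not the pandas code from the answer; the helper name `cut_at_first` and the sample text are hypothetical.

```python
import re

remove_words = ['stack', 'over', 'flow']
# escape each word in case it contains regex metacharacters
alternatives = '|'.join(map(re.escape, remove_words))

ungrouped = r'\b{}\b'.format(alternatives)    # \bstack|over|flow\b
grouped = r'\b(?:{})\b'.format(alternatives)  # \b(?:stack|over|flow)\b


def cut_at_first(pattern, text):
    """Keep the part of text before the first match of pattern."""
    return re.split(pattern, text, maxsplit=1)[0]


text = 'discover test stack more'  # hypothetical sample
print(repr(cut_at_first(ungrouped, text)))  # 'disc'
print(repr(cut_at_first(grouped, text)))    # 'discover test '
```

Without the group, `\b` binds only to the first and last alternative, so "over" matches inside "discover".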
