
Pandas: How to reference columns of structure: ('Name', n) ('Name', n+1)



By : Dineesh A V
Date : October 22 2020, 08:10 AM
It turned out that the column names were tuples of a string and an integer, produced by a pivot. What worked was renaming the columns with a slightly modified version of the last answer to "How to change the columns name from a tuple to string?"
This turns each (str, int) tuple into a single string, e.g. ('Name', 1) -> 'Name_1':
code :
# Map every tuple column label, e.g. ('Name', 1), to a flat string 'Name_1'.
mydic = {}
for var in df.columns:
    if isinstance(var, tuple):
        mydic[var] = '{}_{}'.format(var[0], var[1])
df.rename(columns=mydic, inplace=True)

# Show the resulting column names.
list(df)
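The snippet below is a minimal sketch of the same idea on a small made-up frame (the names long_df, wide and flat are hypothetical and not from the original question). It assumes the pivot produced MultiIndex columns; it shows that a tuple label can be used directly to reference a column, and that assigning a new list of labels flattens them.

```python
import pandas as pd

# Hypothetical long-format data; pivoting with a list of value columns
# produces column labels of the form ('Name', n).
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "n": [1, 2, 1, 2],
    "Name": ["a", "b", "c", "d"],
})
wide = long_df.pivot(index="id", columns="n", values=["Name"])

# A tuple label can be used as-is to select one column.
first = wide[("Name", 1)]

# Flatten the tuple labels to plain strings by assigning a new list of labels.
flat = wide.copy()
flat.columns = [
    "{}_{}".format(*col) if isinstance(col, tuple) else col
    for col in flat.columns
]
print(list(flat.columns))
```

Assigning to .columns is used here instead of rename because it behaves the same whether the labels live in a MultiIndex or in a flat Index of tuples.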


Structure dataset from rows to columns pandas python



By : XiaoLong Yue
Date : March 29 2020, 07:55 AM
You are looking for code to unpack the dataframe. A straightforward way, which handles many features and possibly repeated productids, is:
code :
import functools
import itertools
import random
import sys
import time

import numpy as np
import pandas as pd

def expand(frame):
    df = pd.DataFrame()
    for row in frame.iterrows():
        data = row[1]
        for feature_name, feature_value in zip(data[1::2], data[2::2]):
            if feature_name:
                df.loc[data.productid, feature_name] = feature_value
    return df.replace(np.nan, '')


df = pd.DataFrame([("100001", "weight", "130g", None, None, "price", "$140.50"),
("100002", "weight", "200g", "pieces", "12 pcs", "dimensions", "150X75cm"),
("100003", "dimensions", "70X30cm", "price", "$22.90"),
("100004", "price", "$12.90", "manufacturer", "ABC", "calories", "556Kcal"),
("100005", "calories", "1320Kcal", "dimensions", "20X20cm", "manufacturer", "XYZ")],
                  columns=["productid", "feature1", "value1", "feature2", "value2", "feature3", "value3"])

xdf = expand(df)
print(xdf)
Output:
       weight    price  pieces dimensions manufacturer  calories
100001   130g  $140.50                                          
100002   200g           12 pcs   150X75cm                       
100003          $22.90            70X30cm                       
100004          $12.90                             ABC   556Kcal
100005                            20X20cm          XYZ  1320Kcal
Two alternative implementations collect the features into dictionaries first and build the frame in a single call:
code :
def expand2(frame):
    return pd.DataFrame.from_dict(
        {data.productid: {f: v for f, v in zip(data[1::2], data[2::2]) if f} for _, data in frame.iterrows()},
        orient='index')
def expand3(frame):
    return pd.DataFrame.from_records(
        ({f: v for f, v in itertools.chain((('productid', data.productid),), zip(data[1::2], data[2::2])) if f}
         for _, data
         in frame.iterrows()), index='productid').replace(np.nan, '')
def timeit(f):
    @functools.wraps(f)
    def timed(*args, **kwargs):
        try:
            start_time = time.time()
            return f(*args, **kwargs)
        finally:
            end_time = time.time()
            function_invocation = "x"
            sys.stdout.flush()
            print(f'Function {f.__name__}({function_invocation}), took: {end_time - start_time:2.4f} seconds.',
                  flush=True, file=sys.stderr)

    return timed

def generate_wide_df(n_rows, n_features):
    possible_labels = [f'label_{i}' for i in range(n_features)]
    columns = ['productid']
    for i in range(1, n_features):
        columns.append(f'feature_{i}')
        columns.append(f'value_{i}')

    df = pd.DataFrame(columns=columns)
    for row_n in range(n_rows):
        df.loc[row_n, 'productid'] = int(1000000 + row_n)
        for _ in range(n_features):
            feature_num = random.randint(1, n_features)
            df.loc[row_n, f'feature_{feature_num}'] = random.choice(possible_labels)
            df.loc[row_n, f'value_{feature_num}'] = random.randint(1, 10000)
    return df.where(df.notnull(), None)


df = generate_wide_df(4000, 30)


# Wrap each function with the timing decorator before calling it.
timeit(expand)(df)
timeit(expand3)(df)
timeit(expand2)(df)
Output:
Function expand(x), took: 1.1576 seconds.
Function expand3(x), took: 1.1185 seconds.
Function expand2(x), took: 16.3055 seconds.
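For comparison, here is a sketch of a variant that avoids iterating over rows. It is not part of the original answer: the function name expand_vectorized and the n_pairs parameter are hypothetical, and it assumes the columns are named feature1/value1, feature2/value2 and so on, as in the small example above. It stacks the feature/value pairs into a long frame and pivots once; no timing is claimed for it.

```python
import pandas as pd

df = pd.DataFrame(
    [("100001", "weight", "130g", None, None, "price", "$140.50"),
     ("100002", "weight", "200g", "pieces", "12 pcs", "dimensions", "150X75cm")],
    columns=["productid", "feature1", "value1",
             "feature2", "value2", "feature3", "value3"],
)


def expand_vectorized(frame, n_pairs=3):
    # Stack every (feature_i, value_i) pair under common column names.
    pieces = [
        frame[["productid", f"feature{i}", f"value{i}"]].rename(
            columns={f"feature{i}": "feature", f"value{i}": "value"})
        for i in range(1, n_pairs + 1)
    ]
    long = pd.concat(pieces, ignore_index=True).dropna(subset=["feature"])
    # If a productid repeats a feature, keep the last value seen.
    wide = long.pivot_table(index="productid", columns="feature",
                            values="value", aggfunc="last")
    return wide.fillna("")


wide = expand_vectorized(df)
print(wide)
```

Unlike expand, the columns of the result come out sorted by feature name rather than in order of first appearance.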
Creating a new columns in pandas based on the structure of the other



By : raghav singh
Date : March 29 2020, 07:55 AM
Try groupby.cumcount, which gives each row its position within its F group; you can then prepend a letter such as b to that number:
code :
df['B'] = 'b'+df.groupby('F').cumcount().add(1).astype(str)

df
#    F   M   B
#0  f1  m1  b1
#1  f1  m2  b2
#2  f1  m3  b3
#3  f2  m1  b1
#4  f3  m1  b1
#5  f3  m2  b2
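A self-contained version of the above, with the input frame rebuilt from the F and M columns shown in the output (the construction of df itself is an assumption, since the question's data is not included here):

```python
import pandas as pd

# Input reconstructed from the F and M columns of the printed result.
df = pd.DataFrame({
    "F": ["f1", "f1", "f1", "f2", "f3", "f3"],
    "M": ["m1", "m2", "m3", "m1", "m1", "m2"],
})

# cumcount numbers the rows inside each F group starting at 0, so add 1.
df["B"] = "b" + df.groupby("F").cumcount().add(1).astype(str)
print(df)
```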
How to convert columns values of a csv file to different format structure in pandas?



By : sarcastichippo
Date : March 29 2020, 07:55 AM
First loop over the list of files and build one big DataFrame with concat, then reshape it: use cumcount as a per-Name counter and unstack on it:
code :
import glob

import pandas as pd

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]

df = pd.concat(dfs, ignore_index=True)

df = df.set_index(['Name',df.groupby('Name').cumcount()])['Z'].unstack().reset_index()
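To see what the reshape does without any files on disk, here is a small sketch that uses in-memory frames in place of the CSV contents. The sample values are made up; only the column names Name and Z come from the answer.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by pd.read_csv.
dfs = [
    pd.DataFrame({"Name": ["A", "B"], "Z": [1, 3]}),
    pd.DataFrame({"Name": ["A"], "Z": [2]}),
]
df = pd.concat(dfs, ignore_index=True)

# Number the occurrences of each Name: 0 for the first, 1 for the second, ...
counter = df.groupby("Name").cumcount()

# One row per Name, one column per occurrence number.
wide = df.set_index(["Name", counter])["Z"].unstack().reset_index()
print(wide)
```

Names with fewer occurrences than the maximum get NaN in the remaining columns, which also turns integer values into floats.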
Take unstructured df list in pandas and giving the data structure in two columns



By : Ashish Ranade
Date : March 29 2020, 07:55 AM
You can use str.extract, followed by dropna() and drop_duplicates():
code :
# Raw string so that \w, \s and \d reach the regex engine unchanged.
pattern = r'(?P<Country>[\w\s\.\,]*)\s+\((?P<value>\d+)\)'
(df.stack()
 .str.extract(pattern, expand=True)
 .dropna()
 .drop_duplicates()
)
Output:
            Country value
0  0   United States   105
1  1         Alabama     0
   2       Louisiana     2
   3            Ohio     4
2  1          Alaska     0
   2           Maine     0
   3        Oklahoma     0
3  1         Arizona     0
   2        Maryland     2
   3          Oregon     0
4  1        Arkansas     0
   2   Massachusetts     9
   3    Pennsylvania    28
5  1      California     0
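The following sketch runs the same chain on a tiny made-up frame (the name raw and its cell contents are assumptions modeled on the output above). It shows that cells not matching the pattern become NaN and are dropped, and that repeated entries are removed.

```python
import pandas as pd

pattern = r'(?P<Country>[\w\s\.\,]*)\s+\((?P<value>\d+)\)'

# Hypothetical unstructured frame: "Totals" does not match the pattern,
# and "Alabama (0)" appears twice.
raw = pd.DataFrame([
    ["United States (105)", "Totals"],
    ["Alabama (0)", "Louisiana (2)"],
    ["Alabama (0)", "Ohio (4)"],
])

out = (raw.stack()                        # one long Series of all cells
          .str.extract(pattern, expand=True)
          .dropna()                       # cells that did not match
          .drop_duplicates())             # repeated Country/value pairs
print(out)
```

Note that the extracted value column holds strings; convert it with astype(int) if numbers are needed.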
How should i structure a function that takes a pandas dataframe and writes to its columns?



By : CSeymour
Date : March 29 2020, 07:55 AM
Use Series.map with the dictionary and compare the result with the sales column:
code :
df['met_sales'] = df['sales'] >= df['category'].map(required_sales)
print (df)
  category  sales  met_sales
0    fruit    100      False
1    books    200       True
2    fruit    300       True
The thresholds produced by map:
code :
print (df['category'].map(required_sales))
0    150
1    200
2    150
Name: category, dtype: int64
Wrapped in a function:
code :
def met_sales(df, d):
    df['met_sales'] = df['sales'] >= df['category'].map(d)
    return df

df1 = met_sales(df,required_sales)
print (df1)
  category  sales  met_sales
0    fruit    100      False
1    books    200       True
2    fruit    300       True
If a category is missing from the dictionary, map returns NaN for it:
code :
required_sales = {'fruit':150}

print (df['category'].map(required_sales))
0    150.0
1      NaN
2    150.0
Name: category, dtype: float64
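A self-contained sketch of the above. The frame and the dictionary are reconstructed from the printed output (fruit -> 150, books -> 200), so treat them as assumptions. The function name add_met_sales is hypothetical; it uses assign to return a new frame instead of writing into the caller's frame, which is one possible design choice, not the one in the answer.

```python
import pandas as pd

# Data reconstructed from the printed output above.
df = pd.DataFrame({
    "category": ["fruit", "books", "fruit"],
    "sales": [100, 200, 300],
})
required_sales = {"fruit": 150, "books": 200}


def add_met_sales(frame, thresholds):
    # Look up each row's threshold by category, then compare with sales.
    # assign returns a new DataFrame, so the input frame is left unchanged.
    return frame.assign(
        met_sales=frame["sales"] >= frame["category"].map(thresholds))


result = add_met_sales(df, required_sales)
print(result)

# A category missing from the dictionary maps to NaN, and any comparison
# with NaN is False.
partial = add_met_sales(df, {"fruit": 150})
print(partial)
```

The in-place version from the answer is fine when mutation is intended; the copy-returning version makes it harder to change a frame by accident.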