Converting pandas dataframe to structured arrays

By : Ahmad Ayub
Date : November 20 2020, 03:01 PM
Melt the DataFrame to turn A and B (the column labels) into a column. To get rid of the numeric index, make this new column the index, then call to_records():
code :
import pandas as pd
a = [2.5,3.3]
b = [3.6,3.9]
D = {'A': a, 'B': b}
df = pd.DataFrame(D)
result = (pd.melt(df, var_name='Type', value_name='Value')
            .set_index('Type')
            .to_records())
result
rec.array([('A',  2.5), ('A',  3.3), ('B',  3.6), ('B',  3.9)], 
          dtype=[('Type', 'O'), ('Value', '<f8')])
In [167]: df
     A    B
0  2.5  3.6
1  3.3  3.9

In [168]: pd.melt(df)
  variable  value
0        A    2.5
1        A    3.3
2        B    3.6
3        B    3.9
In [169]: pd.melt(df).to_records()
rec.array([(0, 'A',  2.5), (1, 'A',  3.3), (2, 'B',  3.6), (3, 'B',  3.9)], 
          dtype=[('index', '<i8'), ('variable', 'O'), ('value', '<f8')])
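A slightly shorter route, if the numeric index is the only thing in the way: `to_records(index=False)` drops it at conversion time, so no `set_index` step is needed. A sketch reusing the column names from the melt call above:

```python
import pandas as pd

df = pd.DataFrame({'A': [2.5, 3.3], 'B': [3.6, 3.9]})

# melt to long form, then drop the RangeIndex at record-conversion time
rec = pd.melt(df, var_name='Type', value_name='Value').to_records(index=False)

print(rec)
# rec.array([('A', 2.5), ('A', 3.3), ('B', 3.6), ('B', 3.9)],
#           dtype=[('Type', 'O'), ('Value', '<f8')])
```

This produces the same structured array as the index-based version, minus the extra 'index' field shown above.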

Share : facebook icon twitter icon
If I use python pandas, is there any need for structured arrays?

By : Pratik Gandhi
Date : March 29 2020, 07:55 AM
pandas's DataFrame is a high-level tool, while structured arrays are a very low-level tool: they let you interpret a binary blob of data as a table-like structure. One thing that is hard to do in pandas is nested data types with the same semantics as structured arrays, though this can be imitated with hierarchical indexing (and structured arrays cannot do most of what hierarchical indexing can).
Structured arrays are also amenable to working with massive tabular data sets loaded via memory maps (np.memmap). This is a limitation that will be addressed in pandas eventually, though.
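The memory-map use case mentioned above can be sketched like this: a structured dtype describes one on-disk record, and np.memmap pages rows in lazily instead of loading the whole file. The file path and field names here are invented for illustration:

```python
import numpy as np
import os
import tempfile

# structured dtype: each record is one "row" of the on-disk table
dt = np.dtype([('Type', 'S1'), ('Value', '<f8')])

path = os.path.join(tempfile.mkdtemp(), 'table.dat')

# create a memory-mapped structured array on disk and fill it
mm = np.memmap(path, dtype=dt, mode='w+', shape=(4,))
mm['Type'] = [b'A', b'A', b'B', b'B']
mm['Value'] = [2.5, 3.3, 3.6, 3.9]
mm.flush()

# reopen read-only: the OS pages data in on demand, so files much larger
# than RAM can still be sliced and aggregated cheaply
ro = np.memmap(path, dtype=dt, mode='r', shape=(4,))
print(ro['Value'].sum())  # ~13.3
```

A real data set would of course be written once by some ingest process and memory-mapped read-only afterwards.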
Pandas semi structured JSON data frame to simple Pandas dataframe

By : vjbdn
Date : March 29 2020, 07:55 AM
Taking your input string above as a variable named 'data', this Python + pyparsing code will make some sense of it. Unfortunately, the content to the right of the fourth '|' isn't really JSON. Fortunately, it is well enough formatted that it can be parsed without undue discomfort. See the embedded comments in the program below:
code :
from pyparsing import *
from datetime import datetime

# for the most part, we suppress punctuation - it's important at parse time
# but just gets in the way afterwards
LBRACE, RBRACE, LBRACK, RBRACK, COLON = map(Suppress, '{}[]:')
DBLQ = Suppress('"')    # a single double-quote
DBLQ2 = Suppress('""')  # a doubled (escaped) double-quote

# define some scalar value expressions, including parse-time conversion parse actions
realnum = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
timestamp = Regex(r'""\d{4}-\d{2}-\d{2}T\d{2}:\d{2}""')
timestamp.setParseAction(lambda t: datetime.strptime(t[0][2:-2],'%Y-%m-%dT%H:%M'))
string_value = QuotedString('""')

# define our base key ':' value expression; use a Forward() placeholder
# for now for value, since these things can be recursive
key = Optional(DBLQ2) + Word(alphas, alphanums+'_') + DBLQ2
value = Forward()
key_value = Group(key + COLON + value)

# objects can be values too - use the Dict class to capture keys as field names
obj = Group(Dict(LBRACE + OneOrMore(key_value) + RBRACE))
objlist = (LBRACK + ZeroOrMore(obj) + RBRACK)

# define expression for previously-declared value, using <<= operator
value <<= timestamp | string_value | realnum | integer | obj | Group(objlist)

# the outermost objects are enclosed in "s, and list of them can be given with '|' delims
quotedObj = DBLQ + obj + DBLQ
obsList = delimitedList(quotedObj, delim='|')
fields = data.split('|',4)
result = obsList.parseString(fields[-1])

# we get back a list of objects, dump them out
for r in result:
    print(r.dump())
[['currency', 'EUR'], ['item_id', '143'], ['type', 'FLIGHT'], ['name', 'PAR-FEZ'], ['price', 1111], ['origin', 'PAR'], ['destination', 'FEZ'], ['merchant', 'GOV'], ['flight_type', 'OW'], ['flight_segment', [[['origin', 'ORY'], ['destination', 'FEZ'], ['departure_date_time', datetime.datetime(2015, 8, 2, 7, 20)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 9, 5)], ['carrier', 'AT'], ['f_class', 'ECONOMY']]]]]
- currency: EUR
- destination: FEZ
- flight_segment: 
    [['origin', 'ORY'], ['destination', 'FEZ'], ['departure_date_time', datetime.datetime(2015, 8, 2, 7, 20)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 9, 5)], ['carrier', 'AT'], ['f_class', 'ECONOMY']]
    - arrival_date_time: 2015-08-02 09:05:00
    - carrier: AT
    - departure_date_time: 2015-08-02 07:20:00
    - destination: FEZ
    - f_class: ECONOMY
    - origin: ORY
- flight_type: OW
- item_id: 143
- merchant: GOV
- name: PAR-FEZ
- origin: PAR
- price: 1111
- type: FLIGHT

[['type', 'FLIGHT'], ['name', 'FI_ORY-OUD'], ['item_id', 'FLIGHT'], ['currency', 'EUR'], ['price', 111], ['origin', 'ORY'], ['destination', 'OUD'], ['flight_type', 'OW'], ['flight_segment', [[['origin', 'ORY'], ['destination', 'OUD'], ['departure_date_time', datetime.datetime(2015, 8, 2, 13, 55)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 15, 30)], ['flight_number', 'AT625'], ['carrier', 'AT'], ['f_class', 'ECONOMIC_DISCOUNTED']]]]]
- currency: EUR
- destination: OUD
- flight_segment: 
    [['origin', 'ORY'], ['destination', 'OUD'], ['departure_date_time', datetime.datetime(2015, 8, 2, 13, 55)], ['arrival_date_time', datetime.datetime(2015, 8, 2, 15, 30)], ['flight_number', 'AT625'], ['carrier', 'AT'], ['f_class', 'ECONOMIC_DISCOUNTED']]
    - arrival_date_time: 2015-08-02 15:30:00
    - carrier: AT
    - departure_date_time: 2015-08-02 13:55:00
    - destination: OUD
    - flight_number: AT625
    - origin: ORY
- flight_type: OW
- item_id: FLIGHT
- name: FI_ORY-OUD
- origin: ORY
- price: 111
- type: FLIGHT
len(result[0].flight_segment) # gives how many segments
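If your feed ever delivers proper JSON instead of the ""-quoted variant above, pandas can flatten the nested records directly with `pd.json_normalize`. A sketch using made-up records modeled on the parsed output above:

```python
import pandas as pd

records = [
    {'item_id': '143', 'price': 1111,
     'flight_segment': [{'origin': 'ORY', 'destination': 'FEZ'}]},
    {'item_id': 'FLIGHT', 'price': 111,
     'flight_segment': [{'origin': 'ORY', 'destination': 'OUD'}]},
]

# record_path expands the nested segment list into rows;
# meta carries the parent-level columns down to each row
flat = pd.json_normalize(records, record_path='flight_segment',
                         meta=['item_id', 'price'])
print(flat)
#   origin destination item_id price
# 0    ORY         FEZ     143  1111
# 1    ORY         OUD  FLIGHT   111
```

This skips the hand-rolled grammar entirely, but only works once the input is valid JSON.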
Converting a Dataframe into a Series with cells containing arrays in Pandas

By : daniela
Date : March 29 2020, 07:55 AM
Instantiate a new Series using a dict comprehension (this should be faster than an apply-based solution):
code :
pd.Series({c : df[c].dropna().unique().tolist() for c in df.columns})

asset             [a]
name     [john, dave]
id          [1, 2, 3]
dtype: object
    {c : df[c].dropna().unique().tolist() for c in df.columns}

  asset          name         id
0   [a]  [john, dave]  [1, 2, 3]
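The reverse direction is also a one-liner in pandas 0.25+: `Series.explode` expands each list cell back into one row per element, repeating the index label. A sketch reusing the column names above:

```python
import pandas as pd

s = pd.Series({'asset': ['a'], 'name': ['john', 'dave'], 'id': [1, 2, 3]})

# each list element becomes its own row; the index label repeats per element
long_form = s.explode()
print(long_form)
# asset       a
# name     john
# name     dave
# id          1
# id          2
# id          3
# dtype: object
```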
Converting dataframe to structured list

By : Jagadish Hadimani
Date : March 29 2020, 07:55 AM
For the data frame b in your .RData file, you can do:
code :
setNames(b$Colour, b$CellType)
#>       Macrophages                DC         Microglia      B cells, pro 
#>         "#E9C825"         "#E77800"         "#E3B60B"         "#EE3900" 
#>                NA       Neutrophils         Monocytes        Mast cells 
#>         "#54A6BA"         "#E39700"         "#7EB8BC"         "#6EB2C2" 
#> Endothelial cells         Basophils           B cells        Stem cells 
#>         "#E7C21C"         "#E1B002"         "#3B9AB2"         "#AEC07B" 
#>           T cells               NKT               ILC               Tgd 
#>         "#E5BC13"         "#61ACBE"         "#96BC9C"         "#C6C55A" 
#>          NK cells  Epithelial cells       Fibroblasts     Stromal cells 
#>         "#EA5800"         "#47A0B6"         "#F21A00"         "#DEC93A" 
#> [1] "character"

#> $names
#>  [1] "Macrophages"       "DC"                "Microglia"         "B cells, pro"     
#>  [5] "NA"                "Neutrophils"       "Monocytes"         "Mast cells"       
#>  [9] "Endothelial cells" "Basophils"         "B cells"           "Stem cells"       
#> [13] "T cells"           "NKT"               "ILC"               "Tgd"              
#> [17] "NK cells"          "Epithelial cells"  "Fibroblasts"       "Stromal cells" 
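For readers coming from pandas: R's `setNames(b$Colour, b$CellType)` builds a named vector, which corresponds roughly to a pandas Series indexed by the CellType column. A sketch with a few of the rows above:

```python
import pandas as pd

b = pd.DataFrame({'CellType': ['Macrophages', 'DC', 'Microglia'],
                  'Colour': ['#E9C825', '#E77800', '#E3B60B']})

# index by CellType so each colour is looked up by name, like R's setNames
colours = pd.Series(b['Colour'].values, index=b['CellType'])
print(colours['DC'])  # #E77800
```

Calling `colours.to_dict()` would give the plain-dict form if a structured list/mapping is the end goal.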
Which Pandas dataframe is better: super long dataframe VS badly structured one with lists

By : user3720480
Date : March 29 2020, 07:55 AM
As you mention, having a list inside a column of a dataframe is a bad structure, and a long-format dataframe is preferred. Let me attempt to answer the question from several aspects:
Added complexity for data manipulation, and lack of native support functions for a list-like column
code :

df1['NGRAM'] = df1['NGRAM'].str.capitalize()
# 1000 loops, best of 5: 1.49 ms per loop

df2['NGRAM'] = df2['NGRAM'].explode().str.capitalize().groupby(level=0).apply(list)
# 1000 loops, best of 5: 246 µs per loop
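The two layouts being timed above can be reproduced like this (NGRAM contents invented for illustration): the long frame df1 accepts vectorised string operations directly, while the list-column frame df2 needs an explode/regroup round trip first:

```python
import pandas as pd

# long format: one n-gram per row
df1 = pd.DataFrame({'DOC': [0, 0, 1], 'NGRAM': ['foo', 'bar', 'baz']})

# list-column format: all n-grams of a document packed into one cell
df2 = pd.DataFrame({'NGRAM': [['foo', 'bar'], ['baz']]})

# vectorised string ops apply directly to the long format
df1['NGRAM'] = df1['NGRAM'].str.capitalize()

# the list column must be exploded, transformed, then regrouped by row
df2['NGRAM'] = (df2['NGRAM'].explode().str.capitalize()
                            .groupby(level=0).apply(list))

print(df1['NGRAM'].tolist())  # ['Foo', 'Bar', 'Baz']
print(df2['NGRAM'].tolist())  # [['Foo', 'Bar'], ['Baz']]
```

Every transformation on the list column pays that explode/regroup tax, which is the complexity cost the answer refers to.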