dataframe – pandas apply function with a set of values of two columns

I have a data frame with two important columns, source and target. I would like to detect inverse rows, i.e. for a pair of values (source, target), if a pair (target, source) also exists in the data frame, then set a new column to True for that row.

cols = ['source', 'target']
_cols = ['target', 'source']
sub_edges = edges[cols]
sub_edges['oneway'] = sub_edges.apply(lambda x: True if x[x.isin(x[_cols])] else False, axis=1)
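One possible way to flag inverse rows, sketched under assumptions: the `edges` frame below is made-up sample data, and `has_inverse` is a hypothetical column name. The idea is to collect every (source, target) pair in a set and then check, row by row, whether the swapped pair is in it.

```python
import pandas as pd

# Made-up sample data; the real `edges` frame is not shown in the question.
edges = pd.DataFrame({
    "source": ["a", "b", "c", "d"],
    "target": ["b", "a", "d", "e"],
})

# Every (source, target) pair that occurs in the frame.
pairs = set(zip(edges["source"], edges["target"]))

# A row has an inverse when its swapped pair (target, source) also occurs.
edges["has_inverse"] = [
    (t, s) in pairs for s, t in zip(edges["source"], edges["target"])
]

print(edges)
```

Note that a self-loop such as (a, a) counts as its own inverse here; filter those rows first if that is not wanted.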

python – How to remove East Asian characters (not sure exactly which ones) from a dataframe [closed]

Hi everyone. I am working on an undergraduate NLP assignment and we are building a dataframe from QnA Maker. Some sentences in an East Asian language showed up (I am not sure which one). We tried removing all ASCII and UTF-8 characters, but they are still there. Can anyone shed some light on this?
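A minimal sketch of one approach, assuming the unwanted text falls in the common CJK Unicode blocks (Chinese, Japanese, Korean). The column name `text` and the sample rows are hypothetical. These characters are valid UTF-8, so filtering "by encoding" will not remove them; filtering by code point range can.

```python
import pandas as pd

# Hypothetical sample; the real dataframe is not shown in the question.
df = pd.DataFrame({"text": ["Olá 世界 mundo", "こんにちは", "plain ascii"]})

# Code point ranges assumed to cover the unwanted text:
# CJK punctuation, Hiragana, Katakana, CJK ideographs, Hangul, full-width forms.
cjk_pattern = r"[\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uff00-\uffef]"

df["clean"] = (
    df["text"]
    .str.replace(cjk_pattern, "", regex=True)  # drop the CJK characters
    .str.replace(r"\s+", " ", regex=True)      # collapse leftover whitespace
    .str.strip()
)

print(df)
```

If the text turns out to be in another script, the ranges in `cjk_pattern` would need adjusting; rows that end up empty can then be dropped.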


python – Convert string-encoded list into new dataframe

I have a dataframe in which one column holds a string-encoded list (shown below).
I would like to create a new data frame from it with a few columns, for example:
Customer name, Cast_ID, Character.

[{'cast_id': 14, 'character': 'Woody (voice)', 'credit_id': '52fe4284c3a36847f8024f95', 'gender': 2, 'id': 31, 'name': 'Tom Hanks', 'order': 0, 'profile_path': '/pQFoyx7rp09CJTAb932F2g8Nlho.jpg'}, {'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'credit_id': '52fe4284c3a36847f8024f99', 'gender': 2, 'id': 12898, 'name': 'Tim Allen', 'order': 1, 'profile_path': '/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg'}, {'cast_id': 16, 'character': 'Mr. Potato Head (voice)', 'credit_id': '52fe4284c3a36847f8024f9d', 'gender': 2, 'id': 7167, 'name': 'Don Rickles', 'order': 2, 'profile_path': '/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg'}, {'cast_id': 17, 'character': 'Slinky Dog (voice)', 'credit_id': '52fe4284c3a36847f8024fa1', 'gender': 2, 'id': 12899, 'name': 'Jim Varney', 'order': 3, 'profile_path': '/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg'}, {'cast_id': 18, 'character': 'Rex (voice)', 'credit_id': '52fe4284c3a36847f8024fa5', 'gender': 2, 'id': 12900, 'name': 'Wallace Shawn', 'order': 4, 'profile_path': '/oGE6JqPP2xH4tNORKNqxbNPYi7u.jpg'}, {'cast_id': 19, 'character': 'Hamm (voice)', 'credit_id': '52fe4284c3a36847f8024fa9', 'gender': 2, 'id': 7907, 'name': 'John Ratzenberger', 'order': 5, 'profile_path': '/yGechiKWL6TJDfVE2KPSJYqdMsY.jpg'}, {'cast_id': 20, 'character': 'Bo Peep (voice)', 'credit_id': '52fe4284c3a36847f8024fad', 'gender': 1, 'id': 8873, 'name': 'Annie Potts', 'order': 6, 'profile_path': '/eryXT84RL41jHSJcMy4kS3u9y6w.jpg'}, {'cast_id': 26, 'character': 'Andy (voice)', 'credit_id': '52fe4284c3a36847f8024fc1', 'gender': 0, 'id': 1116442, 'name': 'John Morris', 'order': 7, 'profile_path': '/vYGyvK4LzeaUCoNSHtsuqJUY15M.jpg'}, {'cast_id': 22, 'character': 'Sid (voice)', 'credit_id': '52fe4284c3a36847f8024fb1', 'gender': 2, 'id': 12901, 'name': 'Erik von Detten', 'order': 8, 'profile_path': '/twnF1ZaJ1FUNUuo6xLXwcxjayBE.jpg'}, {'cast_id': 23, 'character': 'Mrs. Davis (voice)', 'credit_id': '52fe4284c3a36847f8024fb5', 'gender': 1, 'id': 12133, 'name': 'Laurie Metcalf', 'order': 9, 'profile_path': '/unMMIT60eoBM2sN2nyR7EZ2BvvD.jpg'}, {'cast_id': 24, 'character': 'Sergeant (voice)', 'credit_id': '52fe4284c3a36847f8024fb9', 'gender': 2, 'id': 8655, 'name': 'R. Lee Ermey', 'order': 10, 'profile_path': '/r8GBqFBjypLUP9VVqDqfZ7wYbSs.jpg'}, {'cast_id': 25, 'character': 'Hannah (voice)', 'credit_id': '52fe4284c3a36847f8024fbd', 'gender': 1, 'id': 12903, 'name': 'Sarah Freeman', 'order': 11, 'profile_path': None}, {'cast_id': 27, 'character': 'TV Announcer (voice)', 'credit_id': '52fe4284c3a36847f8024fc5', 'gender': 2, 'id': 37221, 'name': 'Penn Jillette', 'order': 12, 'profile_path': '/zmAaXUdx12NRsssgHbk1T31j2x9.jpg'}]

I started by splitting:

new_cast = credits['cast'].str.split(',')

but have no idea where to move from there.
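Rather than splitting on commas, one option is to parse each string back into Python objects with `ast.literal_eval` and then expand the dictionaries into rows. This is a sketch under assumptions: the one-row `credits` frame below is a shortened stand-in for the real data, and "Customer name" is taken to mean the `name` key.

```python
import ast
import pandas as pd

# Shortened stand-in for the real `credits` frame.
credits = pd.DataFrame({
    "cast": [
        "[{'cast_id': 14, 'character': 'Woody (voice)', 'name': 'Tom Hanks', 'profile_path': None}, "
        "{'cast_id': 15, 'character': 'Buzz Lightyear (voice)', 'name': 'Tim Allen', 'profile_path': None}]"
    ]
})

# Parse each string into a real list of dicts (handles single quotes and None).
parsed = credits["cast"].apply(ast.literal_eval)

# One dict per row; the index still points at the originating row of `credits`.
# dropna() discards rows whose list was empty.
exploded = parsed.explode().dropna()

# Expand the dicts into columns and keep only the ones of interest.
cast_df = pd.DataFrame(exploded.tolist(), index=exploded.index)
cast_df = cast_df[["name", "cast_id", "character"]]

print(cast_df)
```

`json.loads` would not work directly on this input because the strings use single quotes and `None` rather than JSON syntax.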

How to iterate over a DataFrame subject to a sum condition and a grouping in Python?

I have the following DataFrame "ascend", which I sort in ascending order by the "GEOCOD_SEG" column:

    ascend = MZ_0806.sort_values(["GEOCOD_SEG"])
    ascend.head(10)

        ID  COD_MZA NRO_VIV DOMINIO TOTAL_VIV  COD_BAR_2     GEOCOD_SEG
    579 0   Mz-001  V-022     2         22         1       080601001001N00x001
    390 0   Mz-002  V-013     2         13         1       080601001001N00x002
    389 0   Mz-003  V-011     2         11         1       080601001001N00x003
    658 0   Mz-004  V-007     2         7          1       080601001001N00x004
    388 0   Mz-005  V-001     2         1          1       080601001001N00x005
    659 0   Mz-006  V-004     2         4          1       080601001001N00x006
    704 0   Mz-007  V-015     2         15         1       080601001001N00x007
    580 0   Mz-008  V-005     2         5          1       080601001001N00x008
    582 0   Mz-001  V-000     2         3          2       080601001002N00x001
    583 0   Mz-002  V-001     2         1          2       080601001002N00x002

Next I create a new column "SUM_VIV" holding the sum of the "TOTAL_VIV" values grouped by "COD_BAR_2". Using transform, every row of the original df receives the total of its group.

    ascend['SUM_VIV'] = ascend.groupby('COD_BAR_2')['TOTAL_VIV'].transform('sum')
    ascend.head(10)

        ID  COD_MZA NRO_VIV TOTAL_VIV   COD_BAR_2     GEOCOD_SEG          SUM_VIV
    579 0   Mz-001  V-022       22         1      080601001001N00x001       78
    390 0   Mz-002  V-013       13         1      080601001001N00x002       78
    389 0   Mz-003  V-011       11         1      080601001001N00x003       78
    658 0   Mz-004  V-007       7          1      080601001001N00x004       78
    388 0   Mz-005  V-001       1          1      080601001001N00x005       78
    659 0   Mz-006  V-004       4          1      080601001001N00x006       78
    704 0   Mz-007  V-015       15         1      080601001001N00x007       78
    580 0   Mz-008  V-005       5          1      080601001001N00x008       78
    582 0   Mz-001  V-000       0          2      080601001002N00x001       72
    583 0   Mz-002  V-001       1          2      080601001002N00x002       72

What I am looking for is a way to iterate over the elements of each "COD_BAR_2" group, add up their "TOTAL_VIV" values, and apply the condition sum <= 50. Some groups will have a total greater than 50, so I need to split those groups into sub-groups based on this condition. The result I would expect looks something like this:

    ascend.head(10)

        ID  COD_MZA NRO_VIV TOTAL_VIV   COD_BAR_2     GEOCOD_SEG          SUM_VIV                  
    579 0   Mz-001  V-022       22         1      080601001001N00x001       46
    390 0   Mz-002  V-013       13         1      080601001001N00x002       46
    389 0   Mz-003  V-011       11         1      080601001001N00x003       46
    658 0   Mz-004  V-007       7          1      080601001001N00x004       32
    388 0   Mz-005  V-001       1          1      080601001001N00x005       32
    659 0   Mz-006  V-004       4          1      080601001001N00x006       32
    704 0   Mz-007  V-015       15         1      080601001001N00x007       32
    580 0   Mz-008  V-005       5          1      080601001001N00x008       32
    582 0   Mz-001  V-000       0          2      080601001002N00x001       35
    583 0   Mz-002  V-001       1          2      080601001002N00x002       35

What operation or function do I need to apply to this df? I would greatly appreciate your help, thanks in advance.
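One way to read the requirement is as a greedy split: walk each "COD_BAR_2" group in its sorted order and start a new sub-group whenever adding the next "TOTAL_VIV" would push the running sum above 50. A minimal sketch under that assumption follows; `split_by_cap` and the `SUB_GRP` column are hypothetical names, and the sample frame only reproduces the ten rows shown above.

```python
import pandas as pd

def split_by_cap(values, cap=50):
    """Greedy labels: open a new sub-group when the running sum would exceed cap."""
    labels, running, current = [], 0, 0
    for v in values:
        if running > 0 and running + v > cap:
            current += 1   # start a new sub-group
            running = 0
        running += v
        labels.append(current)
    return labels

# The ten rows shown in the question (only the relevant columns).
ascend = pd.DataFrame({
    "COD_BAR_2": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
    "TOTAL_VIV": [22, 13, 11, 7, 1, 4, 15, 5, 0, 1],
})

# Label sub-groups inside each COD_BAR_2 group, keeping the original row order.
ascend["SUB_GRP"] = ascend.groupby("COD_BAR_2")["TOTAL_VIV"].transform(
    lambda s: pd.Series(split_by_cap(s.tolist()), index=s.index)
)

# Sum per (group, sub-group) broadcast back to every row.
ascend["SUM_VIV"] = ascend.groupby(["COD_BAR_2", "SUB_GRP"])["TOTAL_VIV"].transform("sum")

print(ascend)
```

On these rows the first group splits into sums of 46 and 32, matching the expected table. The second group only has two rows in the sample, so its sum here differs from the 35 shown in the question. A single row larger than the cap still gets its own sub-group.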

Beginner question: Why are the columns on this dataframe not aligned? Date and High are pushed to the left

Why are the Date and High columns pushed to one side? How do I even out the dataframe so that High, Low, etc. through Adj Close are each shifted one column to the right?

python – Swap columns in dataframe in Pandas doesn’t work

    def columnchange(df_old, a=None, b=None):
        x = df_old[a]
        df_old[a] = df_old[b]
        df_old[b] = x
        return df_old

I am wondering why this column swap is not working: it makes both columns of df_old equal. An explanation would be helpful. I am able to swap the columns using the column index, but I don't know why this version fails.
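A likely explanation, offered as an assumption since the pandas version is not stated: in older pandas without Copy-on-Write, `x = df_old[a]` may share memory with the frame instead of being an independent copy, so once column `a` is overwritten, `x` already holds the values of `b`. A sketch that sidesteps this by reading from the untouched original while writing into a copy (the name `swap_columns` is hypothetical):

```python
import pandas as pd

def swap_columns(df, a, b):
    """Return a copy of df with the values of columns a and b exchanged."""
    out = df.copy()
    # Both right-hand sides are read from the original df, so neither
    # assignment can affect the values used by the other.
    out[a], out[b] = df[b], df[a]
    return out

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
swapped = swap_columns(df, "x", "y")
print(swapped)
```

Inside the original function, `x = df_old[a].copy()` should have the same effect if modifying the frame in place is preferred.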

dataframe – Create a data frame from a list of lists in Python

I have the following list of lists:

    test = [['S0004-06142005000500011-1',
             [['E11.9', 'Diabetes Mellitus tipo II'],
              ['F10.20', 'enolismo'],
              ['F17.210', 'fumador'],
              ['', 'pseudodiverticulosis ureteral'],
              ['C07', 'carcinoma']]],
            ['S0004-06142005000900014-1', [['', 'leiomioma vesical']]],
            ['S0004-06142006000500012-1',
             [['', 'neoplasia oculta'],
              ['I82.90', 'trombosis'],
              ['C78.1', 'metástasis'],
              ['C64.9', 'carcinoma de células renales']]]]
       

And I want to obtain a data frame like this:

    Archivo                     CEI10   Antecedente
0   S0004-06142005000500011-1   E11.9   Diabetes Mellitus tipo II
1   S0004-06142005000500011-1   F10.20  enolismo
2   S0004-06142005000500011-1   F17.210 fumador
3   S0004-06142005000500011-1           pseudodiverticulosis ureteral
4   S0004-06142005000500011-1   C07     carcinoma
5   S0004-06142005000900014-1           leiomioma vesical
6   S0004-06142006000500012-1           neoplasia oculta
7   S0004-06142006000500012-1   I82.90  trombosis
8   S0004-06142006000500012-1   C78.1   metástasis
9   S0004-06142006000500012-1   C64.9   carcinoma de células renales

If I use df = pd.DataFrame(test) I do not get the expected result, because the dataframe ends up as:

                  0                         1
0   S0004-06142005000500011-1   [[E11.9, Diabetes Mellitus tipo II], [F10.20, enolismo...
1   S0004-06142005000900014-1   [[, leiomioma vesical]]
2   S0004-06142006000500012-1   [[, neoplasia oculta], [I82.90, trombosis...

Can you help me get the expected data frame?
Thanks.
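A minimal sketch of one approach: flatten the nested structure into one tuple per (file, code, description) before building the frame. The column names are taken from the expected output above; everything else is illustrative.

```python
import pandas as pd

test = [
    ["S0004-06142005000500011-1",
     [["E11.9", "Diabetes Mellitus tipo II"],
      ["F10.20", "enolismo"],
      ["F17.210", "fumador"],
      ["", "pseudodiverticulosis ureteral"],
      ["C07", "carcinoma"]]],
    ["S0004-06142005000900014-1", [["", "leiomioma vesical"]]],
    ["S0004-06142006000500012-1",
     [["", "neoplasia oculta"],
      ["I82.90", "trombosis"],
      ["C78.1", "metástasis"],
      ["C64.9", "carcinoma de células renales"]]],
]

# One output row per inner [code, description] pair, repeating the file name.
rows = [
    (archivo, code, desc)
    for archivo, pairs in test
    for code, desc in pairs
]

df = pd.DataFrame(rows, columns=["Archivo", "CEI10", "Antecedente"])
print(df)
```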

python 3.x – Does hashing a dataframe in pandas maintain the hash when comparing a new row in another dataframe?

Every day, I extract and transform 10 rows of data from 2 API endpoints, iterating through 5 ids. This data is in a pandas dataframe.

I have a table (10k+ rows) in a db that I want to read into a pandas dataframe, and then compare the 10 rows I get every day to the rows in the db.

To do this I came across this link and discovered hashing. I would hash the rows in both dataframes and compare them. If the hash of any of the 10 rows matches a hash from the larger dataframe, that row should not be appended to the final table in our db.

My question: the link does not make clear whether, if I keep appending to the table and reading it from my db into dataframes every day, the hash values would change over time or stay stable.

I plan to use this utility: from pandas.util import hash_pandas_object
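A small sketch of the comparison described above, with made-up `existing` and `incoming` frames standing in for the db table and the daily batch. As far as I understand, `hash_pandas_object` is deterministic for identical values and dtypes (with the default `hash_key`), and passing `index=False` keeps the row position out of the hash, so appending rows should not change the hashes of rows already there.

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Stand-ins for the db table and the daily batch (hypothetical data).
existing = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
incoming = pd.DataFrame({"id": [3, 4], "value": ["c", "d"]})

# One uint64 hash per row; index=False so only the column values matter.
existing_hashes = hash_pandas_object(existing, index=False)
incoming_hashes = hash_pandas_object(incoming, index=False)

# Keep only the incoming rows whose hash is not already in the table.
new_rows = incoming[~incoming_hashes.isin(existing_hashes)]
print(new_rows)
```

One caveat to verify on real data: the same logical value read back from the db with a different dtype (for example an integer column that comes back as float) may hash differently, so it is probably worth aligning dtypes before hashing.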

dataframe – Keep variable labels in R when exporting to Stata

Does anyone have an idea how to do this? When I export, the labels are never saved, only the numbers.

fl = list.files(pattern = "dta", path = "C:/Users/anton/OneDrive/Escritorio/Eglantina/Data/",
full.names = TRUE)

datafiles1 = lapply(fl, rio::import_list)

names(datafiles1) = tools::file_path_sans_ext(fl)
str(datafiles1)

# Convert into data frames

df1 <- plyr::ldply(datafiles1, data.frame)

haven::write_dta(df1, "mydata.dta")

r – Summarizing a dataframe using a reference column

Consider the following dataframe:

df <- data.frame(waypoint = 1:10, pnt1 = NA, pnt2 = NA, pnt3 = NA)

x <- c("A", "B", "C", "D")

df$pnt1 <- as.factor(sample(x, 10, replace = T))
df$pnt2 <- as.factor(sample(x, 10, replace = T))
df$pnt3 <- as.factor(sample(x, 10, replace = T))

df

I’d like to summarize this dataframe so that, for each waypoint, the occurrences of each value in the “pnt1, pnt2, and pnt3” columns of the original df are tallied and placed into new columns “A, B, C, D”. The shell of the result would look like this:

df2 <- data.frame(waypoint = 1:10, A = NA, B = NA, C = NA, D = NA)

How can I code this to produce a result similar to df2 that is filled with the correct values?