python - pandas: Read undelimited text file to DataFrame


I am new to pandas. Until now I've been learning pandas using CSV files and Excel spreadsheets.

Now I am faced with converting a text file to a DataFrame. The text files are what I would call sequential data. The format of the file is:

    state name
    city name
    state name
    city name
    city name
    city name
    ...

All 50 states plus territories are listed, but the number of cities varies. I need to convert this to a DataFrame like

[[state name, city name1],[state name, city name2],...] 

Using the pandas read_table() method, I've been able to at least read the file into a DataFrame, but I'm not sure how to correct the state name / city name format.

I have a dictionary of state names and two-letter state abbreviations available. The format of the dictionary is

{'oh':'ohio', 'ky':'kentucky',...} 

Is there a way I can use the dictionary, loop over the file, and separate state and city? Or is there an easier way to accomplish this?

Thank you.
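For illustration, the loop-and-dictionary idea asked about here could look roughly like the sketch below. The two-entry dictionary and the `lines` list are made-up stand-ins for the real dictionary and file, and the names `state_names`, `current_state` and `pairs` are hypothetical. It also assumes no city shares its name with a state, which may not hold for real data.

```python
import pandas as pd

# Hypothetical subset of the abbreviation -> name dictionary from the question.
state_abbrev = {'oh': 'ohio', 'ky': 'kentucky'}
state_names = set(state_abbrev.values())

# Made-up lines standing in for the contents of the text file.
lines = ['ohio', 'athens', 'columbus', 'kentucky', 'lexington']

pairs = []
current_state = None
for line in lines:
    name = line.strip().lower()
    if name in state_names:
        # A state line starts a new block of cities.
        current_state = name
    elif current_state is not None:
        # Any other line is treated as a city of the most recent state.
        pairs.append([current_state, name])

df_pairs = pd.DataFrame(pairs, columns=['state', 'city'])
```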

Edit: a sample of the text file is listed below. Also, please note that I am unable to alter the file.

    alabama[edit]
    auburn (auburn university)[1]
    florence (university of north alabama)
    jacksonville (jacksonville state university)[2]
    livingston (university of west alabama)[2]
    montevallo (university of montevallo)[2]
    troy (troy university)[2]
    tuscaloosa (university of alabama, stillman college, shelton state)[3][4]
    tuskegee (tuskegee university)[5]
    alaska[edit]
    fairbanks (university of alaska fairbanks)[2]
    arizona[edit]
    flagstaff (northern arizona university)[6]
    tempe (arizona state university)
    tucson (university of arizona)
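As a rough sketch, a file laid out like this can be read into a single-column DataFrame with `read_table()`. Here `io.StringIO` stands in for the real file, the shortened sample is only for illustration, and the column name `a` is an arbitrary choice. It assumes the lines contain no tab characters, since tab is the default separator.

```python
import io
import pandas as pd

# Shortened sample; io.StringIO stands in for the real (unalterable) file.
sample = """alabama[edit]
auburn (auburn university)[1]
florence (university of north alabama)
alaska[edit]
fairbanks (university of alaska fairbanks)[2]
"""

# header=None: the file has no header row; names=['a'] labels the only column.
df = pd.read_table(io.StringIO(sample), header=None, names=['a'])
```

With a real file, the `io.StringIO(sample)` argument would be replaced by the file path.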

Say the column is called a. First, find the state rows like this:

    df.a.str.contains(r'\[edit\]')
    Out[25]:
    0      True
    1     False
    2     False
    3     False
    4     False
    5     False
    6     False
    7     False
    8     False
    9      True
    10    False
    11     True
    12    False
    13    False
    14    False
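The brackets are escaped because `str.contains` treats its pattern as a regular expression by default, and an unescaped `[edit]` is a character class matching any of the letters e, d, i, t. A small sketch of the difference, using a made-up three-row Series `s`:

```python
import pandas as pd

s = pd.Series(['alabama[edit]', 'auburn (auburn university)[1]', 'alaska[edit]'])

# Escaped regex: matches the literal text "[edit]".
escaped = s.str.contains(r'\[edit\]')

# Same result without regex handling at all.
literal = s.str.contains('[edit]', regex=False)

# Unescaped: a character class, so any row containing e, d, i or t matches.
unescaped = s.str.contains('[edit]')
```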

Use cumsum to define a group index for each state and its cities:

    csum = df.a.str.contains(r'\[edit\]').cumsum()
    csum
    Out[26]:
    0     1
    1     1
    2     1
    3     1
    4     1
    5     1
    6     1
    7     1
    8     1
    9     2
    10    2
    11    3
    12    3
    13    3
    14    3
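This works because `True` counts as 1 and `False` as 0, so the running sum goes up by one at each state row and stays flat across that state's cities. A tiny sketch with a made-up boolean Series:

```python
import pandas as pd

# True marks a state row, False marks a city row.
is_state = pd.Series([True, False, False, True, False])

# The running total increments at each True, giving one id per state block.
group_id = is_state.cumsum()
```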

Now you can get the states and cities:

    states = df.groupby(csum).first()
    states
    Out[38]:
                   a
    1  alabama[edit]
    2   alaska[edit]
    3  arizona[edit]

    cities = df.groupby(csum).apply(lambda g: g[1:])
    cities
    Out[39]:
                                                          a
    1 1                       auburn (auburn university)[1]
      2              florence (university of north alabama)
      3     jacksonville (jacksonville state university)[2]
      4          livingston (university of west alabama)[2]
      5            montevallo (university of montevallo)[2]
      6                           troy (troy university)[2]
      7   tuscaloosa (university of alabama, stillman co...
      8                   tuskegee (tuskegee university)[5]
    2 10      fairbanks (university of alaska fairbanks)[2]
    3 12         flagstaff (northern arizona university)[6]
      13                   tempe (arizona state university)
      14                     tucson (university of arizona)

Now join the two DataFrames:

    states.join(cities, rsuffix='_cities')
    Out[49]:
                      a                                           a_cities
    1 1   alabama[edit]                      auburn (auburn university)[1]
      2   alabama[edit]             florence (university of north alabama)
      3   alabama[edit]    jacksonville (jacksonville state university)[2]
      4   alabama[edit]         livingston (university of west alabama)[2]
      5   alabama[edit]           montevallo (university of montevallo)[2]
      6   alabama[edit]                          troy (troy university)[2]
      7   alabama[edit]  tuscaloosa (university of alabama, stillman co...
      8   alabama[edit]                  tuskegee (tuskegee university)[5]
    2 10   alaska[edit]      fairbanks (university of alaska fairbanks)[2]
    3 12  arizona[edit]         flagstaff (northern arizona university)[6]
      13  arizona[edit]                   tempe (arizona state university)
      14  arizona[edit]                     tucson (university of arizona)
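To get all the way to the [[state name, city name], ...] shape from the question, one possible variation is sketched below. It reuses the same cumsum grouping but swaps the join for `groupby(...).transform('first')`, and it adds a cleanup step that is an assumption about what the final output should look like: dropping the `[edit]` marker and everything from the first `(` onward. The shortened `lines` list and the names `is_state`, `result` and `pairs` are illustrative only.

```python
import pandas as pd

# Shortened sample standing in for the DataFrame read from the file.
lines = [
    'alabama[edit]',
    'auburn (auburn university)[1]',
    'florence (university of north alabama)',
    'alaska[edit]',
    'fairbanks (university of alaska fairbanks)[2]',
    'arizona[edit]',
    'tempe (arizona state university)',
]
df = pd.DataFrame({'a': lines})

# Same idea as above: flag state rows, then number the state blocks.
is_state = df['a'].str.contains('[edit]', regex=False)
csum = is_state.cumsum()

# The first row of each block is the state line; copy it onto every row of the block.
state = df['a'].groupby(csum).transform('first')

result = pd.DataFrame({
    # Assumed cleanup: remove the "[edit]" marker from the state name.
    'state': state.str.replace('[edit]', '', regex=False).str.strip(),
    # Assumed cleanup: keep only the text before the first "(".
    'city': df['a'].str.split('(').str[0].str.strip(),
})[~is_state].reset_index(drop=True)  # drop the state header rows

pairs = result.values.tolist()
```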
