how to index or group data of a file which has irregular structure

I have a file containing data like the following:

10 2 3
8 2
9 1
10 1
9.7 2.1 4
6 10
7 -0.5
8 4
9 1
77 21 3
12 5
13 4
14 9 

and I can use pandas to read:

data = pd.read_table('test.dat', delim_whitespace=True, names=['A', 'B', 'C'])

       A     B    C
0   10.0   2.0  3.0
1    8.0   2.0  NaN
2    9.0   1.0  NaN
3   10.0   1.0  NaN
4    9.7   2.1  4.0
5    6.0  10.0  NaN
6    7.0  -0.5  NaN
7    8.0   4.0  NaN
8    9.0   1.0  NaN
9   77.0  21.0  3.0
10  12.0   5.0  NaN
11  13.0   4.0  NaN
12  14.0   9.0  NaN
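As a side note, the same read works with `pd.read_csv` and a regex separator, which is the recommended route in recent pandas versions where `delim_whitespace` is deprecated. A minimal sketch with an inline excerpt of the file:

```python
import pandas as pd
from io import StringIO

# sep=r'\s+' splits on any run of whitespace, same as delim_whitespace=True
raw = "10 2 3\n8 2\n9 1\n"
data = pd.read_csv(StringIO(raw), sep=r'\s+', names=['A', 'B', 'C'])
print(data)
```

Rows with only two values get `NaN` in column `C`, exactly as in the output above.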

The rows with three numbers give a location (lon, lat) (e.g. 9.7, 2.1) and the number of observations at it (e.g. 4). The following rows (the ones with NaN in column C) are the observations for that location at different times: the first value is the time, the second is the observed value, and the third is empty.

If I have a lot of data in this format, I want to index each block of observations by its location and observation count. For example, how can I get the data for the second location?

And if I want to iterate over all of the blocks one by one, is there an efficient way?

For example: key == (location, number of observations), value == data.

1 Answer

First, split into your groups using cumsum and notnull:

s = data.groupby(data.C.notnull().cumsum())
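To see why this grouping works, here is a minimal sketch on a small excerpt of the data: rows where `C` is non-null start a new block, so the running count of non-null `C` values labels every row with its block number.

```python
import pandas as pd
import numpy as np

data = pd.DataFrame({
    'A': [10.0, 8.0, 9.0, 9.7, 6.0],
    'B': [2.0, 2.0, 1.0, 2.1, 10.0],
    'C': [3.0, np.nan, np.nan, 4.0, np.nan],
})

# notnull() marks the header rows; cumsum() turns them into group labels
labels = data.C.notnull().cumsum()
print(labels.tolist())  # -> [1, 1, 1, 2, 2]
```

Each header row and its observation rows share a label, which is exactly what `groupby` needs.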

Now use a dictionary comprehension. I used a tuple of the first row of each group as the key, and the first two columns of the remaining rows as the value.

dct = {tuple(g.iloc[0]): g.iloc[1:, 0:2] for i, g in s}

for k, v in dct.items():
    print(f'{k}\n\n{v}\n')

(10.0, 2.0, 3.0)

      A    B
1   8.0  2.0
2   9.0  1.0
3  10.0  1.0

(9.7, 2.1, 4.0)

     A     B
5  6.0  10.0
6  7.0  -0.5
7  8.0   4.0
8  9.0   1.0

(77.0, 21.0, 3.0)

       A    B
10  12.0  5.0
11  13.0  4.0
12  14.0  9.0
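With `dct` built this way, a single location can be fetched directly by its `(lon, lat, n_obs)` key, which answers the "data of the second location" part of the question. A self-contained sketch using the first two blocks of the sample data:

```python
import pandas as pd
from io import StringIO

raw = """10 2 3
8 2
9 1
10 1
9.7 2.1 4
6 10
7 -0.5
8 4
9 1
"""
data = pd.read_csv(StringIO(raw), sep=r'\s+', names=['A', 'B', 'C'])

# Same recipe as above: header rows (non-null C) delimit the groups
groups = data.groupby(data.C.notnull().cumsum())
dct = {tuple(g.iloc[0]): g.iloc[1:, 0:2] for i, g in groups}

# Look up the second location by its (lon, lat, n_obs) key
second = dct[(9.7, 2.1, 4.0)]
print(second)
```

Iterating `dct.items()` then covers the "use all of the data one by one" case.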
