Dividing a single pandas DataFrame into multiple CSV files with a predefined naming convention


I have to load a huge CSV file, split it into multiple files based on the unique values in some of its columns, and write each part to its own CSV file with a predefined name pattern.
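Because the input is described as huge, one option is to avoid loading it into memory all at once. The sketch below assumes pandas and reads the file in pieces with `pd.read_csv(..., chunksize=...)`, grouping each piece on the four naming columns (place, type, product, zone) used later in the question. `split_csv_in_chunks`, `KEY_COLS` and the chunk size are hypothetical choices, and the demo input is a made-up four-row file in which one combination spans two chunks.

```python
import os
import tempfile

import pandas as pd

# Columns whose values make up the output file name (assumed from the question).
KEY_COLS = ["place", "type", "product", "zone"]


def split_csv_in_chunks(src_path, out_dir, chunksize=100_000):
    """Write one CSV per (place, type, product, zone) combination.

    The source is read chunksize rows at a time, so the whole file never has
    to be in memory. A combination can show up in several chunks, so the
    first write to a file overwrites it (with header) and later writes append.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = set()
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        for key, group in chunk.groupby(KEY_COLS):
            path = os.path.join(out_dir, "_".join(map(str, key)) + ".csv")
            first = path not in written
            group.to_csv(path, mode="w" if first else "a", header=first, index=False)
            written.add(path)
    return sorted(written)


# Demo on a made-up 4-row file where NY/Zo/shirt/1 appears in two chunks.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "input.csv")
pd.DataFrame({
    "date": ["09/10/16"] * 4,
    "place": ["NY", "NY", "CA", "NY"],
    "type": ["Zo", "Mo", "Zo", "Zo"],
    "product": ["shirt", "jeans", "trouser", "shirt"],
    "value": [19, 18, 13, 20],
    "zone": [1, 2, 3, 1],
}).to_csv(src, index=False)

files = split_csv_in_chunks(src, os.path.join(tmp, "out"), chunksize=2)
```

Since the first write to each file in a run uses mode "w", leftover files from an earlier run are overwritten rather than appended to; files for combinations that no longer occur are left untouched, though.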

An example of the original CSV is shown below.

date     place  type    product value   zone
09/10/16 NY     Zo      shirt   19       1
09/10/16 NY     Mo      jeans   18       2
09/10/16 CA     Zo      trouser 13       3
09/10/16 CA     Co      tie     17       4
09/10/16 WA     Wo      bat     11       1
09/10/16 FL     Zo      ball    12       2
09/10/16 NC     Mo      belt    13       3
09/10/16 WA     Zo      buckle  15       4
09/10/16 WA     Co      glass   16       1
09/10/16 FL     Zo      cup     19       2
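To experiment with the code in this thread, the sample above can be loaded directly. This sketch assumes the table is pasted as whitespace-aligned text; the real file is a CSV, so `pd.read_csv(path)` would be used there instead. `SAMPLE` is a hypothetical name.

```python
import io

import pandas as pd

# The sample from the question, copied as whitespace-aligned text.
SAMPLE = """\
date     place  type    product value   zone
09/10/16 NY     Zo      shirt   19       1
09/10/16 NY     Mo      jeans   18       2
09/10/16 CA     Zo      trouser 13       3
09/10/16 CA     Co      tie     17       4
09/10/16 WA     Wo      bat     11       1
09/10/16 FL     Zo      ball    12       2
09/10/16 NC     Mo      belt    13       3
09/10/16 WA     Zo      buckle  15       4
09/10/16 WA     Co      glass   16       1
09/10/16 FL     Zo      cup     19       2
"""

# sep=r"\s+" splits on runs of whitespace; date is kept as a plain string.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
```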

I have to filter this massive pandas DataFrame into multiple DataFrames based on place, type, product and zone, and write each resulting DataFrame to its own CSV file using the naming convention place_type_product_zone.csv.
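The naming convention itself can be computed for every row with vectorized string concatenation. This is a sketch assuming pandas and the sample's column names; `target_file_names` is a hypothetical helper, and zone is converted to `str` because it is numeric.

```python
import pandas as pd


def target_file_names(df):
    """Return a Series with the place_type_product_zone.csv name for every row."""
    return (
        df["place"] + "_" + df["type"] + "_" + df["product"]
        + "_" + df["zone"].astype(str) + ".csv"
    )


# First three rows of the sample data.
df = pd.DataFrame({
    "place": ["NY", "NY", "CA"],
    "type": ["Zo", "Mo", "Zo"],
    "product": ["shirt", "jeans", "trouser"],
    "value": [19, 18, 13],
    "zone": [1, 2, 3],
})
names = target_file_names(df)
```

Rows that share a name belong in the same file, so `df.groupby(names)` could also drive a writing loop. If a value itself contains `_`, two different combinations could map to the same name (for example place "A_B" with type "C" and place "A" with type "B_C").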

Here is the code I have so far:

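def list_of_dataframes(df, col_list):
    # Start with the whole frame and split it once for every column in col_list
    df_list = [df]
    name_list = []
    for col in col_list:
        df_list, names = _split_dataframes(df_list, col)

    # names is never filled in, so the files can only be numbered 1.csv, 2.csv, ...
    for i, part in enumerate(df_list, start=1):
        part.to_csv('{}.csv'.format(i), index=False)

    print("CSV files created")
    return df_list, name_list

def _split_dataframes(df_list, col):
    # Split every frame in df_list into one frame per unique value of col
    names = []
    dfs = []
    for df in df_list:
        for c in df[col].unique():
            dfs.append(df.loc[df[col] == c])
    return dfs, names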


It outputs CSV files named 1.csv, 2.csv, etc. How do I change the loop in the function so the files follow the naming convention instead, e.g. NY_Zo_shirt_1.csv, CA_Zo_trouser_3.csv? Should I create a dictionary that stores all the keys?
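One way to answer the dictionary question while keeping the split-one-column-at-a-time approach is to carry the chosen values along with each sub-frame, so the name is built during the split. This is a sketch rather than the asker's code: `split_with_names` is a hypothetical helper that returns a dict mapping each joined name to its DataFrame, and the demo writes into a temporary folder.

```python
import os
import tempfile

import pandas as pd


def split_with_names(df, col_list):
    """Split df on each column in col_list, remembering the values used.

    Returns a dict mapping names such as 'NY_Zo_shirt_1' to sub-frames.
    """
    # Each entry pairs the values chosen so far with the rows matching them.
    parts = [((), df)]
    for col in col_list:
        next_parts = []
        for values, part in parts:
            for value in part[col].unique():
                next_parts.append((values + (value,), part.loc[part[col] == value]))
        parts = next_parts
    return {"_".join(map(str, values)): part for values, part in parts}


# Four rows of the sample data.
df = pd.DataFrame({
    "place": ["NY", "NY", "CA", "CA"],
    "type": ["Zo", "Mo", "Zo", "Co"],
    "product": ["shirt", "jeans", "trouser", "tie"],
    "value": [19, 18, 13, 17],
    "zone": [1, 2, 3, 4],
})
frames = split_with_names(df, ["place", "type", "product", "zone"])

# The dict keys become the file names.
out_dir = tempfile.mkdtemp()
for name, part in frames.items():
    part.to_csv(os.path.join(out_dir, name + ".csv"), index=False)
```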

Thanks in advance.


Do you have to create a csv for each unique combination of product, type and place?

Yes. I have to create a separate CSV for every combination, using the naming convention above.
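Only combinations that actually occur in the data need a file. A short sketch, assuming pandas and the sample's column names, that lists those combinations with `drop_duplicates` and counts the rows per file with `groupby(...).size()`; `KEY_COLS`, `combos` and `rows_per_file` are illustrative names, and only the four key columns are included because date and value do not affect the file names.

```python
import pandas as pd

KEY_COLS = ["place", "type", "product", "zone"]

# Key columns of the sample data.
df = pd.DataFrame({
    "place": ["NY", "NY", "CA", "CA", "WA", "FL", "NC", "WA", "WA", "FL"],
    "type": ["Zo", "Mo", "Zo", "Co", "Wo", "Zo", "Mo", "Zo", "Co", "Zo"],
    "product": ["shirt", "jeans", "trouser", "tie", "bat",
                "ball", "belt", "buckle", "glass", "cup"],
    "zone": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
})

# Each remaining row is one combination that needs its own CSV file.
combos = df[KEY_COLS].drop_duplicates()
# Number of rows that will go into each output file.
rows_per_file = df.groupby(KEY_COLS).size()
```

For the sample data this gives 10 combinations with one row each, i.e. 10 output files.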

1 Answer


Here it is:

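import itertools
import os

# Assumes df is the DataFrame loaded from the original CSV.

# Part 1: unique values of each column used in the file name
places = df['place'].unique()
types = df['type'].unique()
products = df['product'].unique()
zones = df['zone'].unique()

# Part 2: every possible (place, type, product, zone) combination
combs = list(itertools.product(places, types, products, zones))

# Part 3: write one file per combination that actually has rows
os.makedirs('temp1', exist_ok=True)  # to_csv fails if the folder does not exist
for comb in combs:
    place, type_, prod, zone = comb
    df_subset = df[(df['place'] == place) & (df['type'] == type_) & (df['product'] == prod) & (df['zone'] == zone)]
    if df_subset.shape[0] > 0:  # skip combinations with no matching rows
        df_subset.to_csv('temp1/{}_{}_{}_{}.csv'.format(place, type_, prod, zone), index=False)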
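The answer's Part 2 builds every combination of the unique values, not just the ones present. For the sample that is 5 places × 4 types × 10 products × 4 zones = 800 combinations, each needing a full boolean filter over the DataFrame, while only 10 of them match any rows. A possible alternative, sketched here as an assumption rather than as the answer's method, is `DataFrame.groupby` on the four columns, which makes one pass and yields only the combinations that exist; `write_groups` is a hypothetical helper name.

```python
import os
import tempfile

import pandas as pd

KEY_COLS = ["place", "type", "product", "zone"]


def write_groups(df, out_dir):
    """Write one CSV per (place, type, product, zone) combination in df."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # groupby yields only combinations that occur in df, so no empty-subset
    # check is needed.
    for key, group in df.groupby(KEY_COLS):
        path = os.path.join(out_dir, "{}_{}_{}_{}.csv".format(*key))
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


# Four rows of the sample data.
df = pd.DataFrame({
    "date": ["09/10/16"] * 4,
    "place": ["WA", "FL", "WA", "WA"],
    "type": ["Wo", "Zo", "Zo", "Co"],
    "product": ["bat", "ball", "buckle", "glass"],
    "value": [11, 12, 15, 16],
    "zone": [1, 2, 4, 1],
})
out_dir = os.path.join(tempfile.mkdtemp(), "temp1")
paths = write_groups(df, out_dir)
```

By default `groupby` drops rows where any key column is NaN (`dropna=True`), so such rows would not be written; passing `dropna=False` keeps them as their own group.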



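Each output file contains the header plus the rows for its combination. With the sample data every combination occurs exactly once, so each file holds a single row.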
