I have a CSV file with several fields. One of them is causing me trouble: when I load the file, that column is read in as "factors". It is a time column in 24-hour format and looks like this:
Time
09:39
14:00
I don't know how to "treat" this column. The other columns are item no, receipttext, amount, total, cash, card, and itemgroup, since the data comes from cash register software.
I would like to select a specific number of entries out of more than 20,000, to see at which times of day I sell mostly for cash or by card, and which items. But first I need to know what to do with the time column, since many of the suggestions I found assume an HH:MM:SS format, and mine is only HH:MM. By inspecting the CSV file I found that the first day runs from line 1 to line 579.
Card [1:579] Cash [1:579]
The questions I would like to investigate are listed below. I wouldn't mind any advice on how to approach questions 3, 3.1, and 3.2.
1. A histogram of the amounts (done)
2. A histogram of card/cash payments in general (done)
3. How does the amount of payments change over time?
3.1 How does the payment pattern look from day to day?
3.2 What is the "preferred" payment method at a given time?
Regarding your first question: when I had a similar problem in Python, the solution was to treat the first entry as time t=0 and express the rest of the entries as seconds elapsed from there. This post has code also:
However, I don't have enough experience in time series analysis to answer your other questions.
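For the first question, here is a minimal sketch of the t=0 idea in Python, using only the standard library. The sample values and the function name `seconds_since_first` are made up for illustration. It assumes every value is a valid HH:MM string and that all entries belong to the same day.

```python
from datetime import datetime

# Hypothetical sample values in the HH:MM format from the question
times = ["09:39", "14:00", "09:45"]

def seconds_since_first(time_strings):
    """Parse HH:MM strings and return seconds elapsed since the first entry."""
    # strptime with "%H:%M" accepts times without a seconds part
    parsed = [datetime.strptime(t, "%H:%M") for t in time_strings]
    start = parsed[0]
    return [int((p - start).total_seconds()) for p in parsed]

offsets = seconds_since_first(times)
print(offsets)
```

Since the column holds no date, an entry earlier in the day than the first one would come out negative. With several days in one file you would probably have to split the rows by day first (for example rows 1 to 579 for the first day) and apply this to each day separately.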