summarize_at for multiple variables and multiple functions with different arguments in R?

4

I have 5 variables in my df that I want to summarize with summarize_at, computing the following statistics for each variable:

min, quantile(.25), median, mean, quantile(.75), max, sd

Here is my sample df. Please advise how to do this:

df <- structure(list(user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L), obs_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), scroll_id = c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), timestamp = c(1540202430007839232, 
1540202430009840640, 1540202430010982656, 1540202430010982656, 
1540202430011841792, 1540202430013843200, 1540202430015844608, 
1540202430017846016, 1540202430019847168, 1540202430020992512
), start_time = c(1540202430007839232, 1540202430007839232, 1540202430007839232, 
1540202430007839232, 1540202430007839232, 1540202430007839232, 
1540202430007839232, 1540202430007839232, 1540202430007839232, 
1540202430007839232), end_time = c(1540202430075907328, 1540202430075907328, 
1540202430075907328, 1540202430075907328, 1540202430075907328, 
1540202430075907328, 1540202430075907328, 1540202430075907328, 
1540202430075907328, 1540202430075907328), gra_x = c(0.028428223, 
0.028428223, 0.024191462, 0.024191462, 0.024191462, 0.024191462, 
0.024191462, 0.024191462, 0.024191462, 0.008488427), gra_y = c(3.3407776, 
3.3407776, 3.329257, 3.329257, 3.329257, 3.329257, 3.329257, 
3.329257, 3.329257, 3.2995365), gra_z = c(9.220019, 9.220019, 
9.224198, 9.224198, 9.224198, 9.224198, 9.224198, 9.224198, 9.224198, 
9.234899), lin_acc_mag = c(1.87035263799625, 1.87035263799625, 
1.87035263799625, 1.46659090346921, 1.46659090346921, 1.46659090346921, 
1.46659090346921, 1.46659090346921, 1.46659090346921, 1.46659090346921
), vel_ang_unc_mag = c(0.363288181726866, 0.397720202371128, 
0.397720202371128, 0.397720202371128, 0.462731530212917, 0.493683807731099, 
0.553539962506893, 0.607457519129218, 0.629400228979264, 0.629400228979264
)), .Names = c("user_id", "obs_id", "scroll_id", "timestamp", 
"start_time", "end_time", "gra_x", "gra_y", "gra_z", "lin_acc_mag", 
"vel_ang_unc_mag"), row.names = c(NA, -10L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = c("user_id", "obs_id", 
"scroll_id"), drop = TRUE, indices = list(0:9), group_sizes = 10L, biggest_group_size = 10L, labels = structure(list(
    user_id = 1L, obs_id = 1L, scroll_id = 3L), row.names = c(NA, 
-1L), class = "data.frame", vars = c("user_id", "obs_id", "scroll_id"
), drop = TRUE, .Names = c("user_id", "obs_id", "scroll_id")))


df %>% 
summarize_at(vars(gra_x, gra_y, gra_z, lin_acc_mag, vel_ang_unc_mag), 
             funs(min, max, mean, median, sd, quantile, probs = c(0.25, 0.75)))  

Returns an error:

Error in summarise_impl(.data, dots) : Column gra_x_quantile must be length 1 (a summary value), not 5


2 Answers

13

This works if you call quantile separately for each value of probs, because summarize_at expects every function to return a result of length one.
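As a quick check of the length issue, here is a minimal base R sketch. The vector `x` holds made-up values rather than the asker's data. The "not 5" in the error message is consistent with quantile() running with its default probs, which returns five values.

```r
# Made-up values for illustration only (not the asker's data)
x <- c(0.028, 0.024, 0.024, 0.008)

# Default probs = seq(0, 1, 0.25), so five quantiles come back
n_default <- length(quantile(x))

# Two probabilities give two values, still too many for a summary column
n_two <- length(quantile(x, probs = c(0.25, 0.75)))

# One probability gives exactly one value, which summarize can use
n_one <- length(quantile(x, probs = 0.25))

print(c(default = n_default, two = n_two, one = n_one))
```

The answer's full example follows.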

library(tidyverse)
df <- tibble::tribble(
  ~user_id, ~obs_id, ~scroll_id,          ~timestamp,         ~start_time,           ~end_time,      ~gra_x,    ~gra_y,   ~gra_z,     ~lin_acc_mag,  ~vel_ang_unc_mag,
  1L,      1L,         3L, 1540202430007839232, 1540202430007839232, 1540202430075907328, 0.028428223, 3.3407776, 9.220019, 1.87035263799625, 0.363288181726866,
  1L,      1L,         3L, 1540202430009840640, 1540202430007839232, 1540202430075907328, 0.028428223, 3.3407776, 9.220019, 1.87035263799625, 0.397720202371128,
  1L,      1L,         3L, 1540202430010982656, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.87035263799625, 0.397720202371128,
  1L,      1L,         3L, 1540202430010982656, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.397720202371128,
  1L,      1L,         3L, 1540202430011841792, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.462731530212917,
  1L,      1L,         3L, 1540202430013843200, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.493683807731099,
  1L,      1L,         3L, 1540202430015844608, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.553539962506893,
  1L,      1L,         3L, 1540202430017846016, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.607457519129218,
  1L,      1L,         3L, 1540202430019847168, 1540202430007839232, 1540202430075907328, 0.024191462,  3.329257, 9.224198, 1.46659090346921, 0.629400228979264,
  1L,      1L,         3L, 1540202430020992512, 1540202430007839232, 1540202430075907328, 0.008488427, 3.2995365, 9.234899, 1.46659090346921, 0.629400228979264
)
df %>% 
  summarize_at(vars(gra_x, gra_y, gra_z, lin_acc_mag, vel_ang_unc_mag), 
               funs(min, max, mean, median, sd, quantile(.,probs = c(0.25)), quantile(., probs = 0.75)))  
#> # A tibble: 1 x 30
#>   gra_x_min gra_y_min gra_z_min lin_acc_mag_min vel_ang_unc_mag~ gra_x_max
#>       <dbl>     <dbl>     <dbl>           <dbl>            <dbl>     <dbl>
#> 1   0.00849      3.30      9.22            1.47            0.363    0.0284
#> # ... with 24 more variables: gra_y_max <dbl>, gra_z_max <dbl>,
#> #   lin_acc_mag_max <dbl>, vel_ang_unc_mag_max <dbl>, gra_x_mean <dbl>,
#> #   gra_y_mean <dbl>, gra_z_mean <dbl>, lin_acc_mag_mean <dbl>,
#> #   vel_ang_unc_mag_mean <dbl>, gra_x_median <dbl>, gra_y_median <dbl>,
#> #   gra_z_median <dbl>, lin_acc_mag_median <dbl>,
#> #   vel_ang_unc_mag_median <dbl>, gra_x_sd <dbl>, gra_y_sd <dbl>,
#> #   gra_z_sd <dbl>, lin_acc_mag_sd <dbl>, vel_ang_unc_mag_sd <dbl>,
#> #   gra_x_quantile <dbl>, gra_y_quantile <dbl>, gra_z_quantile <dbl>,
#> #   lin_acc_mag_quantile <dbl>, vel_ang_unc_mag_quantile <dbl>

Created on 2018-11-04 by the reprex package (v0.2.1)
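Note that the output above lists only one `_quantile` column per variable (30 columns rather than 7 x 5 = 35), which suggests the two unnamed quantile calls ended up under the same name. Giving each function an explicit name avoids that. The sketch below assumes dplyr 1.0.0 or later, where `across()` is available and `funs()` is deprecated. The `toy` data frame, the `stats` list, and the `q25`/`q75` names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Small stand-in data set with made-up values (not the asker's df)
toy <- tibble(
  gra_x = c(0.028, 0.024, 0.024, 0.008),
  gra_y = c(3.34, 3.33, 3.33, 3.30)
)

# Named list of summary functions: each one returns a single value,
# and each name becomes the suffix of the resulting column
stats <- list(
  min    = min,
  q25    = ~ quantile(.x, probs = 0.25, names = FALSE),
  median = median,
  mean   = mean,
  q75    = ~ quantile(.x, probs = 0.75, names = FALSE),
  max    = max,
  sd     = sd
)

# across() applies every function to every selected column;
# the default column names follow the pattern {column}_{function}
out <- toy %>%
  summarise(across(c(gra_x, gra_y), stats))

print(names(out))
```

With the asker's grouped df, the same `stats` list could be passed to `across(c(gra_x, gra_y, gra_z, lin_acc_mag, vel_ang_unc_mag), stats)`, which should yield one row per group with the grouping columns kept alongside the summaries.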

10

I tried the advice from https://github.com/ethereum/go-ethereum/issues/16377#issuecomment-430642197

I can say that it worked. Our server has been stable for 3 weeks.

What I did:

1. Allow only a specific IP address to reach the open RPC port on the server where your node is installed:

$ iptables -A INPUT -p tcp --dport [rpc port] -s [ip who is using your node] -j ACCEPT

2. Change the startup options:

$ geth --rpcapi personal,web3,eth --mine --minerthreads 2 --rpccorsdomain ['ip who is using your node:port'] --rpc --networkid 1999 --datadir ./ --rpcvhosts 127.0.0.1 --port 30304 --rpcport [rpc port] --rpcaddr [ip server where your node is installed]

3. For mining, I do not recommend using all processor cores; leave the server some headroom:

--mine --minerthreads 2

4. Servers with a lot of memory are more expensive; for an internal network, 2 GB should be enough. Use a swap file:

How to enable it: https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-18-04

My status after the last error: [image]
