Title: | Data Manipulation using Formula |
---|---|
Description: | A tool for manipulating data using a generic formula. A single formula makes it easy to add, replace, and remove variables before running an analysis. |
Authors: | Alessio Serafini [aut, cre] |
Maintainer: | Alessio Serafini <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2024-11-01 04:08:58 UTC |
Source: | https://github.com/serafinialessio/dformula |
Add new variables by mutating the input variables using a formula.
add(from, formula, as = NULL, position = c("right", "left"), na.remove = FALSE, logic_convert = TRUE,...)
from | a data.frame object with variables |
---|---|
formula | a formula indicating the operation to create new variables. See the details below for an explanation. |
as | a character vector with the names of the new variables. |
position | whether the new variables are placed at the beginning ("left") or at the end ("right") of the data. |
na.remove | a logical value indicating whether NA values should be removed. |
logic_convert | a logical value indicating whether new logical variables are converted to |
... | further arguments |
The formula has the form:
~ new_variables
The right-hand side lists the new variables to add, computed from the existing variables using the I() function.
For example:
~ I(log(column_names1)) + I(column_names2/100)
log(column_names1) and column_names2/100 are added to the data as new variables.
If na.remove is set to TRUE, the new variables are created and added to the input dataset, and then the observations with missing values are removed.
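As a rough illustration of what the example formula describes, the following base R sketch builds two derived columns by hand on the airquality data. It is not dformula's implementation: the column names log_Ozone and Wind_100 are hypothetical, and the names that add() generates itself may differ.

```r
# Base R sketch (an assumption-based analogue, not dformula's code) of
# adding log(Ozone) and Wind/100 as new columns on the right.
dt <- datasets::airquality

added <- dt
added$log_Ozone <- log(dt$Ozone)   # hypothetical name for the first new variable
added$Wind_100  <- dt$Wind / 100   # hypothetical name for the second new variable

# Analogue of na.remove = TRUE: create the variables first,
# then drop every row that still contains a missing value.
added_complete <- added[complete.cases(added), ]

head(added)
```

The original columns are left untouched; only new columns are appended, which matches the description of the returned data.frame.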
Returns a data.frame object with the original and the new variables.
Alessio Serafini
data("airquality")
dt <- airquality

head(add(from = dt, formula = ~ log(Ozone)))
head(add(from = dt, formula = ~ log(Ozone) + log(Wind)))
head(add(from = dt, formula = ~ log(Ozone), as = "Ozone_1"))
head(add(from = dt, formula = Ozone + Wind ~ log()))
head(add(from = dt, formula = ~ log()))
head(add(from = dt, formula = .~ log(), position = "left"))
head(add(from = dt, formula = .~ log(), na.remove = TRUE))

head(add(from = dt, formula = ~ I((Ozone>5))))
head(add(from = dt, formula = ~ I((Ozone>5)), logic_convert = FALSE))

head(add(from = dt, formula = Ozone + Wind ~ C(Ozone-Ozone)))
head(add(from = dt, formula = ~ C(log(Ozone))))
head(add(from = dt, formula = ~ C(5)))
head(add(from = dt, formula = Ozone + Wind ~ C(Ozone-Ozone)))
head(add(from = dt, formula = Ozone + Wind ~ C(log(Ozone))))

foo <- function(x, a = 100){return(x-x + a)}

head(add(from = dt, formula = Ozone + Month~ I(foo(a = 100))))
head(add(from = dt, formula = Ozone + Month~ foo()))
head(add(from = dt, formula = ~ I(foo(Ozone, a = 100))))
World population and country areas.
data("population_data")
A data frame with 159 observations on the following 3 variables.
Country
a character vector with the country names
Population
a numeric vector with the population
Area
a numeric vector with the area of the countries
data(population_data)
str(population_data)
Selects the rows and the variables to remove by specifying a condition using a formula.
remove(from, formula = .~., na.remove = FALSE, ...)
from | a data.frame object with variables |
---|---|
formula | a formula indicating the columns and rows to remove. See the details below for an explanation. |
na.remove | a logical value indicating whether NA values should be removed. |
... | further arguments |
The formula is composed of two parts:
column_names ~ rows_conditions
The left-hand side gives the names of the columns to remove, and the right-hand side gives the conditions that identify the rows to remove, using the I() function.
For example:
column_names1 + column_names2 ~ I(column_names1 == "a") + I(column_names2 > 4)
First, the rows are selected for removal if the observations in column_names1 are equal to "a" and the observations in column_names2 are greater than 4; then column_names1 and column_names2 are removed and the other variables are returned.
If na.remove is set to TRUE, the observations with missing values are removed after the subsetting.
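To make the order of operations concrete, here is a base R sketch of the behaviour described above for a formula such as Ozone + Wind ~ I(Ozone > 10). It is only an approximation written by hand, not dformula's code; in particular, keeping the rows where Ozone is NA is an assumption of this sketch, and remove() may treat them differently.

```r
# Base R sketch (not dformula's implementation): drop the rows matching a
# condition, then drop the listed columns.
dt <- datasets::airquality

# Rows to remove: Ozone > 10. Rows where Ozone is NA are kept here,
# which is an assumption of this sketch.
is_match <- !is.na(dt$Ozone) & dt$Ozone > 10

# Columns to keep: everything except the ones named on the left-hand side.
keep_cols <- setdiff(names(dt), c("Ozone", "Wind"))

reduced <- dt[!is_match, keep_cols]

# Analogue of na.remove = TRUE: drop incomplete rows after the subsetting.
reduced_complete <- reduced[complete.cases(reduced), ]

head(reduced)
```

The row condition is evaluated on the full data before the columns are dropped, which is why Ozone can appear on both sides of the formula.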
Returns a data.frame object without the selected elements.
Alessio Serafini
data("airquality")
dt <- airquality

head(remove(from = dt, formula = .~ I(Ozone > 10)))
head(remove(from = dt, formula = .~ I(Ozone > 10), na.remove = TRUE))

head(remove(from = dt, formula = Ozone ~ .))
head(remove(from = dt, formula = Ozone~ I(Ozone > 10)))
head(remove(from = dt, formula = Ozone + Wind~ I(Ozone > 10)))
head(remove(from = dt, formula = Ozone + . ~ I(Ozone > 10)))
head(remove(from = dt, formula = Ozone + NULL ~ I(Ozone > 10)))
Rename variables using formulas
rename(from, formula, ...)
from | a data.frame object with variables |
---|---|
formula | a formula indicating the columns to rename and their new names. See the details below for an explanation. |
... | further arguments |
The formula is composed of two parts:
column_names ~ new_variables_name
The left-hand side selects the columns to rename, and the right-hand side gives the new names of the selected columns.
For example:
column_names1 + column_names2 ~ new_variables_name1 + new_variables_name2
The names of column 1 and column 2 are changed to new_variables_name1 and new_variables_name2, respectively.
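The positional pairing between the two sides of the formula can be pictured with a short base R sketch. This is a hand-written analogue of Ozone + Wind ~ Ozone_new + Wind_new, not the package's implementation; the helper vectors old_names and new_names are introduced here purely for illustration.

```r
# Base R sketch (not dformula's code): rename columns by pairing old and
# new names position by position.
dt <- datasets::airquality

old_names <- c("Ozone", "Wind")          # left-hand side of the formula
new_names <- c("Ozone_new", "Wind_new")  # right-hand side of the formula

renamed <- dt
names(renamed)[match(old_names, names(renamed))] <- new_names

names(renamed)
```

Only the names change; the column order and the data themselves are left as they were.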
Returns the original data.frame with the changed column names.
Alessio Serafini
data("airquality")
dt <- airquality

head(rename(from = dt, Ozone ~ Ozone1))
head(rename(from = dt, Ozone + Wind ~ Ozone_new + Wind_new))
Selects the rows and the variables by specifying a condition using a formula.
select(from, formula = .~., as = NULL, na.remove = FALSE, na.return = FALSE,...)
from | a data.frame object with variables |
---|---|
formula | a formula indicating the columns and rows to select. See the details below for an explanation. |
as | a character vector with the names of the new variables. |
na.remove | a logical value indicating whether NA values should be removed. |
na.return | a logical value indicating whether only the observations with NA values should be shown. |
... | further arguments |
The formula is composed of two parts:
column_names ~ row_conditions
The left-hand side gives the names of the columns to select, and the right-hand side gives the conditions used to select the rows, using the I() function.
For example:
column_names1 + column_names2 ~ I(column_names1 == "a") + I(column_names2 > 4)
First, the rows are selected if the observations in column_names1 are equal to "a" and the observations in column_names2 are greater than 4; then column_names1 and column_names2 are returned.
If na.remove is set to TRUE, the observations with missing values are removed after the subsetting.
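For comparison, the same kind of selection can be written by hand in base R. The sketch below mirrors a formula such as Ozone + Wind ~ I(Ozone > 10) on the airquality data. It is not dformula's implementation, and excluding the rows where Ozone is NA from the match is an assumption of this sketch.

```r
# Base R sketch (not dformula's code): filter rows by a condition, then
# keep only the listed columns.
dt <- datasets::airquality

# Rows to keep: Ozone > 10. Rows where Ozone is NA are treated as
# non-matching, which is an assumption of this sketch.
row_ok <- !is.na(dt$Ozone) & dt$Ozone > 10

selected <- dt[row_ok, c("Ozone", "Wind")]

# Analogue of na.remove = TRUE: drop incomplete rows after the subsetting.
selected_complete <- selected[complete.cases(selected), ]

head(selected)
```

Selection is the mirror image of removal: the left-hand side lists what stays rather than what goes.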
Returns a data.frame object containing the selected elements.
Alessio Serafini
data("airquality")
dt <- airquality

## Selects columns and filter rows
select(from = dt, formula = .~ I(Ozone > 10 & Wind > 10))
select(from = dt, formula = Ozone ~ I(Wind > 10))
select(from = dt, formula = Ozone + Wind~ I(Ozone > 10))

## All rows and filter columns
select(from = dt, formula = Ozone ~ .)
select(from = dt, formula = Ozone + Wind ~ NULL)
Mutate input variables using a formula.
transform(from, formula, as = NULL, na.remove = FALSE, logic_convert = TRUE, ...)
from | a data.frame object with variables |
---|---|
formula | a formula indicating the columns to transform and the operations to apply. See the details below for an explanation. |
as | a character vector with the names of the new variables. |
na.remove | a logical value indicating whether NA values should be removed. |
logic_convert | a logical value indicating whether new logical variables are converted to |
... | further arguments |
The formula is composed of two parts:
column_names ~ transformed_variables
The left-hand side gives the names of the columns to transform, and the right-hand side gives the operations applied to the selected columns, using the I() function.
For example:
column_names1 + column_names2 ~ I(log(column_names1)) + I(column_names2/100)
column_names1 is replaced by log(column_names1) and column_names2 is divided by 100.
If na.remove is set to TRUE, the variables are mutated and then the observations with missing values are removed.
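The difference from adding variables is that the selected columns are overwritten in place. A base R sketch of the example formula, applied to Ozone and Wind in airquality, might look as follows. It is a hand-written analogue under that reading of the text, not dformula's implementation.

```r
# Base R sketch (not dformula's code): overwrite selected columns with
# their transformed values, keeping names and column order unchanged.
dt <- datasets::airquality

transformed <- dt
transformed$Ozone <- log(dt$Ozone)   # first column replaced by its log
transformed$Wind  <- dt$Wind / 100   # second column divided by 100

# Analogue of na.remove = TRUE: mutate first, then drop incomplete rows.
transformed_complete <- transformed[complete.cases(transformed), ]

head(transformed)
```

Because nothing is appended, the result has the same dimensions and column names as the input.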
Returns the original data.frame object with the mutated variables.
Alessio Serafini
data("airquality")
dt <- airquality

head(transform(from = dt, Ozone ~ I(Ozone-Ozone)))
head(transform(from = dt, Ozone ~ log(Ozone)))
head(transform(from = dt, Ozone ~ I(Ozone>5)))
head(transform(from = dt, Ozone ~ I(Ozone>5), logic_convert = TRUE))

head(transform(from = dt, ~ log()))
head(transform(from = dt, . ~ log()))
head(transform(from = dt, NULL ~ log()))

head(transform(from = dt, Ozone + Day ~ log()))
head(transform(from = dt, Ozone + Day ~ log(Ozone/100) + exp(Day)))
head(transform(from = dt, Ozone ~ log()))

head(transform(from = dt,Ozone + Wind ~ C(log(1))))
head(transform(from = dt,Ozone + Wind ~ log(Ozone) + C(10)))
head(transform(from = dt, Ozone + Wind~ C(log(Ozone))))

foo <- function(x, a = 100){return(x-x + a)}

head(transform(from = dt, Ozone + Wind ~ foo(a = 100)))
head(transform(from = dt, . ~ foo(a = 100)))
head(transform(from = dt, Ozone + Wind ~ log(log(1))))