package ‘vroom’ · vroom() it takes precedence over the one specified in col_types. 4 cols...

26
Package ‘vroom’ June 22, 2021 Title Read and Write Rectangular Text Data Quickly Version 1.5.1 Description The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily , so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting. License MIT + file LICENSE URL https://vroom.r-lib.org, https://github.com/r-lib/vroom BugReports https://github.com/r-lib/vroom/issues Depends R (>= 3.1) Imports bit64, crayon, cli, glue, hms, lifecycle, methods, rlang (>= 0.4.2), stats, tibble (>= 2.0.0), tzdb (>= 0.1.1), vctrs (>= 0.2.0), tidyselect, withr Suggests bench (>= 1.1.0), covr, curl, dplyr, forcats, fs, ggplot2, knitr, patchwork, prettyunits, purrr, rmarkdown, 1

Upload: others

Post on 17-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

  • Package ‘vroom’June 22, 2021

    Title Read and Write Rectangular Text Data QuicklyVersion 1.5.1Description The goal of 'vroom' is to read and write data (like

    'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initialindexing step, then reads the values lazily , so only the data youactually use needs to be read. The writer formats the data inparallel and writes to disk asynchronously from formatting.

    License MIT + file LICENSE

    URL https://vroom.r-lib.org, https://github.com/r-lib/vroom

    BugReports https://github.com/r-lib/vroom/issuesDepends R (>= 3.1)Imports bit64,

    crayon,cli,glue,hms,lifecycle,methods,rlang (>= 0.4.2),stats,tibble (>= 2.0.0),tzdb (>= 0.1.1),vctrs (>= 0.2.0),tidyselect,withr

    Suggests bench (>= 1.1.0),covr,curl,dplyr,forcats,fs,ggplot2,knitr,patchwork,prettyunits,purrr,rmarkdown,

    1

    https://vroom.r-lib.orghttps://github.com/r-lib/vroomhttps://github.com/r-lib/vroom/issues

  • 2 R topics documented:

    rstudioapi,scales,spelling,testthat (>= 2.1.0),tidyr,waldo,xml2

    LinkingTo progress (>= 1.2.1),cpp11 (>= 0.2.0),tzdb (>= 0.1.1)

    VignetteBuilder knitr

    Config/testthat/edition 3

    Config/testthat/parallel false

    Config/Needs/website nycflights13

    Copyright file COPYRIGHTS

    Encoding UTF-8

    Language en-US

    Roxygen list(markdown = TRUE)

    RoxygenNote 7.1.1

    SystemRequirements C++11

    R topics documented:

    cols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3cols_condense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5date_names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6gen_tbl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7guess_type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9locale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11vroom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11vroom_altrep . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15vroom_altrep_opts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16vroom_example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16vroom_format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17vroom_fwf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18vroom_lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21vroom_progress . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22vroom_str . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22vroom_write . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23vroom_write_lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    Index 26

  • cols 3

    cols Create column specification

    Description

    cols() includes all columns in the input data, guessing the column types as the default. cols_only()includes only the columns you explicitly specify, skipping the rest.

    Usage

    cols(..., .default = col_guess(), .delim = NULL)

    cols_only(...)

    col_logical(...)

    col_integer(...)

    col_big_integer(...)

    col_double(...)

    col_character(...)

    col_skip(...)

    col_number(...)

    col_guess(...)

    col_factor(levels = NULL, ordered = FALSE, include_na = FALSE, ...)

    col_datetime(format = "", ...)

    col_date(format = "", ...)

    col_time(format = "", ...)

    Arguments

    ... Either column objects created by col_*(), or their abbreviated character names(as described in the col_types argument of vroom()). If you’re only overridinga few columns, it’s best to refer to columns by name. If not named, the columntypes must match the column names exactly. In col_*() functions these arestored in the object.

    .default Any named columns not explicitly overridden in ... will be read with this col-umn type.

    .delim The delimiter to use when parsing. If the delim argument used in the call tovroom() it takes precedence over the one specified in col_types.

  • 4 cols

    levels Character vector providing set of allowed levels. if NULL, will generate levelsbased on the unique values of x, ordered by order of appearance in x.

    ordered Is it an ordered factor?

    include_na If NA are present, include as an explicit factor to level?

    format A format specification, as described below. If set to "", date times are parsedas ISO8601, dates and times used the date and time formats specified in thelocale().Unlike strptime(), the format specification must match the complete string.

    Details

    The available specifications are: (with string abbreviations in brackets)

    • col_logical() [l], containing only T, F, TRUE or FALSE.

    • col_integer() [i], integers.

    • col_big_integer() [I], Big Integers (64bit), requires the bit64 package.

    • col_double() [d], doubles.

    • col_character() [c], everything else.

    • col_factor(levels,ordered) [f], a fixed set of values.

    • col_date(format = "") [D]: with the locale’s date_format.

    • col_time(format = "") [t]: with the locale’s time_format.

    • col_datetime(format = "") [T]: ISO8601 date times

    • col_number() [n], numbers containing the grouping_mark

    • col_skip() [_, -], don’t import this column.

    • col_guess() [?], parse using the "best" type based on the input.

    Examples

    cols(a = col_integer())cols_only(a = col_integer())

    # You can also use the standard abbreviationscols(a = "i")cols(a = "i", b = "d", c = "_")

    # You can also use multiple sets of column definitions by combining# them like so:

    t1

  • cols_condense 5

    cols_condense Examine the column specifications for a data frame

    Description

    cols_condense() takes a spec object and condenses its definition by setting the default columntype to the most frequent type and only listing columns with a different type.

    spec() extracts the full column specification from a tibble created by readr.

    Usage

    cols_condense(x)

    spec(x)

    Arguments

    x The data frame object to extract from

    Value

    A col_spec object.

    Examples

    df

  • 6 generators

    Arguments

    mon, mon_ab Full and abbreviated month names.

    day, day_ab Full and abbreviated week day names. Starts with Sunday.

    am_pm Names used for AM and PM.

    language A BCP 47 locale, made up of a language and a region, e.g. "en_US" for Ameri-can English. See date_names_langs() for a complete list of available locales.

    Examples

    date_names_lang("en")date_names_lang("ko")date_names_lang("fr")

    generators Generate individual vectors of the types supported by vroom

    Description

    Generate individual vectors of the types supported by vroom

    Usage

    gen_character(n, min = 5, max = 25, values = c(letters, LETTERS, 0:9), ...)

    gen_double(n, f = stats::rnorm, ...)

    gen_number(n, f = stats::rnorm, ...)

    gen_integer(n, min = 1L, max = .Machine$integer.max, prob = NULL, ...)

    gen_factor(n,levels = NULL,ordered = FALSE,num_levels = gen_integer(1L, 1L, 25L),...

    )

    gen_time(n, min = 0, max = hms::hms(days = 1), fractional = FALSE, ...)

    gen_date(n, min = as.Date("2001-01-01"), max = as.Date("2021-01-01"), ...)

    gen_datetime(n,min = as.POSIXct("2001-01-01"),max = as.POSIXct("2021-01-01"),tz = "UTC",...

    )

  • gen_tbl 7

    gen_logical(n, ...)

    gen_name(n)

    Arguments

    n The size of the vector to generate

    min The minimum range for the vector

    max The maximum range for the vector

    values The explicit values to use.

    ... Additional arguments passed to internal generation functions

    f The random function to use.

    prob a vector of probability weights for obtaining the elements of the vector beingsampled.

    levels The explicit levels to use, if NULL random levels are generated using gen_name().

    ordered Should the factors be ordered factors?

    num_levels The number of factor levels to generate

    fractional Whether to generate times with fractional seconds

    tz The timezone to use for dates

    Examples

    # charactersgen_character(4)

    # factorsgen_factor(4)

    # logicalgen_logical(4)

    # numbersgen_double(4)gen_integer(4)

    # temporal datagen_time(4)gen_date(4)gen_datetime(4)

    gen_tbl Generate a random tibble

    Description

    This is useful for benchmarking, but also for bug reports when you cannot share the real dataset.

  • 8 gen_tbl

    Usage

    gen_tbl(rows,cols = NULL,col_types = NULL,locale = default_locale(),missing = 0

    )

    Arguments

    rows Number of rows to generate

    cols Number of columns to generate, if NULL this is derived from col_types.

    col_types One of NULL, a cols() specification, or a string. See vignette("readr") formore details.If NULL, all column types will be imputed from the first 1000 rows on the input.This is convenient (and fast), but not robust. If the imputation fails, you’ll needto increase the guess_max or supply the correct types yourself.Column specifications created by list() or cols() must contain one columnspecification for each column. If you only want to read a subset of the columns,use cols_only().Alternatively, you can use a compact string representation where each characterrepresents one column:

    • c = character• i = integer• n = number• d = double• l = logical• f = factor• D = date• T = date time• t = time• ? = guess• _ or - = skip

    By default, reading a file without a column specification will print a mes-sage showing what readr guessed they were. To remove this message, setshow_col_types = FALSE or set ‘options(readr.show_col_types = FALSE).

    locale The locale controls defaults that vary from place to place. The default locale isUS-centric (like R), but you can use locale() to create your own locale thatcontrols things like the default time zone, encoding, decimal mark, big mark,and day/month names.

    missing The percentage (from 0 to 1) of missing data to use

    Details

    There is also a family of functions to generate individual vectors of each type.

    See Also

    generators to generate individual vectors.

  • guess_type 9

    Examples

    # random 10 x 5 table with random column typesrand_tbl

  • 10 locale

    # ISO 8601 date timesguess_type(c("2010-10-10"))guess_type(c("2010-10-10 01:02:03"))guess_type(c("01:02:03 AM"))

    locale Create locales

    Description

    A locale object tries to capture all the defaults that can vary between countries. You set the localein once, and the details are automatically passed on down to the columns parsers. The defaults havebeen chosen to match R (i.e. US English) as closely as possible. See vignette("locales") formore details.

    Usage

    locale(date_names = "en",date_format = "%AD",time_format = "%AT",decimal_mark = ".",grouping_mark = ",",tz = "UTC",encoding = "UTF-8"

    )

    default_locale()

    Arguments

    date_names Character representations of day and month names. Either the language code asstring (passed on to date_names_lang()) or an object created by date_names().

    date_format, time_format

    Default date and time formats.decimal_mark, grouping_mark

    Symbols used to indicate the decimal place, and to chunk larger numbers. Dec-imal mark can only be , or ..

    tz Default tz. This is used both for input (if the time zone isn’t present in indi-vidual strings), and for output (to control the default display). The default isto use "UTC", a time zone that does not use daylight savings time (DST) andhence is typically most useful for data. The absence of time zones makes itapproximately 50x faster to generate UTC times than any other time zone.Use "" to use the system default time zone, but beware that this will not bereproducible across systems.For a complete list of possible time zones, see OlsonNames(). Americans, notethat "EST" is a Canadian time zone that does not have DST. It is not EasternStandard Time. It’s better to use "US/Eastern", "US/Central" etc.

    encoding Default encoding.

  • problems 11

    Examples

    locale()locale("fr")

    # South American localelocale("es", decimal_mark = ",")

    problems Retrieve parsing problems

    Description

    vroom will only fail to parse a file if the file is invalid in a way that is unrecoverable. However thereare a number of non-fatal problems that you might want to know about. You can retrieve a dataframe of these problems with this function.

    Usage

    problems(x, lazy = FALSE)

    Arguments

    x A data frame from vroom::vroom().

    lazy If TRUE, just the problems found so far are returned. If FALSE (the default) thelazy data is first read completely and all problems are returned.

    Value

    A data frame with one row for each problem and four columns:

    • row,col - Row and column of problem

    • expected - What vroom expected to find

    • actual - What it actually found

    • file - The file with the problem

    vroom Read a delimited file into a tibble

    Description

    Read a delimited file into a tibble

  • 12 vroom

    Usage

    vroom(file,delim = NULL,col_names = TRUE,col_types = NULL,col_select = NULL,id = NULL,skip = 0,n_max = Inf,na = c("", "NA"),quote = "\"",comment = "",skip_empty_rows = TRUE,trim_ws = TRUE,escape_double = TRUE,escape_backslash = FALSE,locale = default_locale(),guess_max = 100,altrep = TRUE,altrep_opts = deprecated(),num_threads = vroom_threads(),progress = vroom_progress(),show_col_types = NULL,.name_repair = "unique"

    )

    Arguments

    file path to a local file.

    delim One or more characters used to delimit fields within a file. If NULL the delimiteris guessed from the set of c(",","\t"," ","|",":",";").

    col_names Either TRUE, FALSE or a character vector of column names.If TRUE, the first row of the input will be used as the column names, and willnot be included in the data frame. If FALSE, column names will be generatedautomatically: X1, X2, X3 etc.If col_names is a character vector, the values will be used as the names of thecolumns, and the first row of the input will be read into the first row of the outputdata frame.Missing (NA) column names will generate a warning, and be filled in with dummynames X1, X2 etc. Duplicate column names will generate a warning and be madeunique, see name_repair to control how this is done.

    col_types One of NULL, a cols() specification, or a string. See vignette("readr") formore details.If NULL, all column types will be imputed from the first 1000 rows on the input.This is convenient (and fast), but not robust. If the imputation fails, you’ll needto increase the guess_max or supply the correct types yourself.Column specifications created by list() or cols() must contain one columnspecification for each column. If you only want to read a subset of the columns,use cols_only().

  • vroom 13

    Alternatively, you can use a compact string representation where each characterrepresents one column:

    • c = character• i = integer• n = number• d = double• l = logical• f = factor• D = date• T = date time• t = time• ? = guess• _ or - = skip

    By default, reading a file without a column specification will print a mes-sage showing what readr guessed they were. To remove this message, setshow_col_types = FALSE or set ‘options(readr.show_col_types = FALSE).

    col_select One or more selection expressions, like in dplyr::select(). Use c() orlist() to use more than one expression. See ?dplyr::select for details onavailable selection options.

    id Either a string or ’NULL’. If a string, the output will contain a variable with thatname with the filename(s) as the value. If ’NULL’, the default, no variable willbe created.

    skip Number of lines to skip before reading data. If comment is supplied any com-mented lines are ignored after skipping.

    n_max Maximum number of lines to read.

    na Character vector of strings to interpret as missing values. Set this option tocharacter() to indicate no missing values.

    quote Single character used to quote strings.

    comment A string used to identify comments. Any text after the comment characters willbe silently ignored.

    skip_empty_rows

    Should blank rows be ignored altogether? i.e. If this option is TRUE then blankrows will not be represented at all. If it is FALSE then they will be representedby NA values in all the columns.

    trim_ws Should leading and trailing whitespace (ASCII spaces and tabs) be trimmedfrom each field before parsing it?

    escape_double Does the file escape quotes by doubling them? i.e. If this option is TRUE, thevalue ’""’ represents a single quote, ’"’.

    escape_backslash

    Does the file use backslashes to escape special characters? This is more gen-eral than escape_double as backslashes can be used to escape the delimitercharacter, the quote character, or to add special characters like \\n.

    locale The locale controls defaults that vary from place to place. The default locale isUS-centric (like R), but you can use locale() to create your own locale thatcontrols things like the default time zone, encoding, decimal mark, big mark,and day/month names.

    guess_max Maximum number of lines to use for guessing column types.

  • 14 vroom

    altrep Control which column types use Altrep representations, either a character vectorof types, TRUE or FALSE. See vroom_altrep() for for full details.

    altrep_opts [Deprecated]

    num_threads Number of threads to use when reading and materializing vectors. If your datacontains newlines within fields the parser will automatically be forced to use asingle thread only.

    progress Display a progress bar? By default it will only display in an interactive sessionand not while knitting a document. The automatic progress bar can be disabledby setting option readr.show_progress to FALSE.

    show_col_types Control showing the column specifications. If TRUE column specifications arealways show, if FALSE they are never shown. If NULL (the default) they are shownonly if an explicit specification is not given to col_types.

    .name_repair Handling of column names. By default, vroom ensures column names are notempty and unique. See .name_repair as documented in tibble::tibble()for additional options including supplying user defined name repair functions.

    Examples

    # get path to example fileinput_file

  • vroom_altrep 15

    vroom(I("x,y\n1,2\n3,4\n"), col_types = "dc")

    # Or with a list of column types:vroom(I("x,y\n1,2\n3,4\n"), col_types = list(col_double(), col_character()))

    # File types ----------------------------------------------------------------# csvvroom(I("a,b\n1.0,2.0\n"), delim = ",")# tsvvroom(I("a\tb\n1.0\t2.0\n"))# Other delimitersvroom(I("a|b\n1.0|2.0\n"), delim = "|")

    # Read datasets across multiple files ---------------------------------------mtcars_by_cyl

  • 16 vroom_example

    • VROOM_USE_ALTREP_NUM

    • VROOM_USE_ALTREP_LGL

    • VROOM_USE_ALTREP_DTTM

    • VROOM_USE_ALTREP_DATE

    • VROOM_USE_ALTREP_TIME

    Examples

    vroom_altrep()vroom_altrep(c("chr", "fct", "int"))vroom_altrep(TRUE)vroom_altrep(FALSE)

    vroom_altrep_opts Show which column types are using Altrep

    Description

    [Deprecated] This function is deprecated in favor of vroom_altrep().

    Usage

    vroom_altrep_opts(which = NULL)

    Arguments

    which A character vector of column types to use Altrep for. Can also take TRUE orFALSE to use Altrep for all possible or none of the types

    vroom_example Get path to vroom examples

    Description

    vroom comes bundled with a number of sample files in its ’inst/extdata’ directory. Use vroom_examples()to list all the available examples and vroom_example() to retrieve the path to one example.

    Usage

    vroom_example(path)

    vroom_examples(pattern = NULL)

    Arguments

    path Name of file.

    pattern A regular expression of filenames to match. If NULL all available files are re-turned. listed.

  • vroom_format 17

    Examples

    # List all available examplesvroom_examples()

    # Get path to one examplevroom_example("mtcars.csv")

    vroom_format Convert a data frame to a delimited string

    Description

    This is equivalent to vroom_write(), but instead of writing to disk, it returns a string. It is primarilyuseful for examples and for testing.

    Usage

    vroom_format(x,delim = "\t",eol = "\n",na = "NA",col_names = TRUE,escape = c("double", "backslash", "none"),quote = c("needed", "all", "none"),bom = FALSE

    )

    Arguments

    x A data frame or tibble to write to disk.delim Delimiter used to separate values. Defaults to \t to write tab separated value

    (TSV) files.eol The end of line character to use. Most commonly either "\n" for Unix style

    newlines, or "\r\n" for Windows style newlines.na String used for missing values. Defaults to ’NA’.col_names If FALSE, column names will not be included at the top of the file. If TRUE, col-

    umn names will be included. If not specified, col_names will take the oppositevalue given to append.

    escape The type of escape to use when quotes are in the data.• double - quotes are escaped by doubling them.• backslash - quotes are escaped by a preceding backslash.• none - quotes are not escaped.

    quote How to handle fields which contain characters that need to be quoted.• needed - Only quote fields which need them.• all - Quote all fields.• none - Never quote fields.

    bom If TRUE add a UTF-8 BOM at the beginning of the file. This is recommendedwhen saving data for consumption by excel, as it will force excel to read the datawith the correct encoding (UTF-8)

  • 18 vroom_fwf

    vroom_fwf Read a fixed width file into a tibble

    Description

    Read a fixed width file into a tibble

    Usage

    vroom_fwf(file,col_positions = fwf_empty(file, skip, n = guess_max),col_types = NULL,col_select = NULL,id = NULL,locale = default_locale(),na = c("", "NA"),comment = "",skip_empty_rows = TRUE,trim_ws = TRUE,skip = 0,n_max = Inf,guess_max = 100,altrep = TRUE,altrep_opts = deprecated(),num_threads = vroom_threads(),progress = vroom_progress(),show_col_types = NULL,.name_repair = "unique"

    )

    fwf_empty(file, skip = 0, col_names = NULL, comment = "", n = 100L)

    fwf_widths(widths, col_names = NULL)

    fwf_positions(start, end = NULL, col_names = NULL)

    fwf_cols(...)

    Arguments

    file Either a path to a file, a connection, or literal data (either a single string or a rawvector).Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed.Files starting with http://, https://, ftp://, or ftps:// will be automatically down-loaded. Remote gz files can also be automatically downloaded and decom-pressed.Literal data is most useful for examples and tests. It must contain at least onenew line to be recognised as data (instead of a path) or be a vector of greaterthan length 1.Using a value of clipboard() will read from the system clipboard.

  • vroom_fwf 19

    col_positions Column positions, as created by fwf_empty(), fwf_widths() or fwf_positions().To read in only selected fields, use fwf_positions(). If the width of the lastcolumn is variable (a ragged fwf file), supply the last end position as NA.

    col_types One of NULL, a cols() specification, or a string. See vignette("readr") formore details.If NULL, all column types will be imputed from the first 1000 rows on the input.This is convenient (and fast), but not robust. If the imputation fails, you’ll needto increase the guess_max or supply the correct types yourself.Column specifications created by list() or cols() must contain one columnspecification for each column. If you only want to read a subset of the columns,use cols_only().Alternatively, you can use a compact string representation where each characterrepresents one column:

    • c = character• i = integer• n = number• d = double• l = logical• f = factor• D = date• T = date time• t = time• ? = guess• _ or - = skip

    By default, reading a file without a column specification will print a mes-sage showing what readr guessed they were. To remove this message, setshow_col_types = FALSE or set ‘options(readr.show_col_types = FALSE).

    col_select Columns to include in the results, either by name or by nu-meric index. Use c() or list() to select with more than one expression and?tidyselect::language for full details on the selection language.

    id The name of a column in which to store the file path. This is useful when readingmultiple input files and there is data in the file paths, such as the data collectiondate. If NULL (the default) no extra column is created.

    locale The locale controls defaults that vary from place to place. The default locale isUS-centric (like R), but you can use locale() to create your own locale thatcontrols things like the default time zone, encoding, decimal mark, big mark,and day/month names.

    na Character vector of strings to interpret as missing values. Set this option tocharacter() to indicate no missing values.

    comment A string used to identify comments. Any text after the comment characters willbe silently ignored.

    skip_empty_rows

    Should blank rows be ignored altogether? i.e. If this option is TRUE then blankrows will not be represented at all. If it is FALSE then they will be representedby NA values in all the columns.

    trim_ws Should leading and trailing whitespace (ASCII spaces and tabs) be trimmedfrom each field before parsing it?

    skip Number of lines to skip before reading data.

  • 20 vroom_fwf

    n_max Maximum number of lines to read.

    guess_max Maximum number of lines to use for guessing column types.

    altrep Control which column types use Altrep representations, either a character vectorof types, TRUE or FALSE. See vroom_altrep() for for full details.

    altrep_opts [Deprecated]num_threads The number of processing threads to use for initial parsing and lazy reading of

    data.

    progress Display a progress bar? By default it will only display in an interactive sessionand not while knitting a document. The automatic progress bar can be disabledby setting option readr.show_progress to FALSE.

    show_col_types If FALSE, do not show the guessed column types. If TRUE always show thecolumn types, even if they are supplied. If NULL (the default) only show thecolumn types if they are not explicitly supplied by the col_types argument.

    .name_repair Treatment of problematic column names:

    • "minimal": No name repair or checks, beyond basic existence of names• "unique": Make sure names are unique and not empty• "check_unique": (default value), no name repair, but check they are unique• "universal": Make the names unique and syntactic• a function: apply custom name repair (e.g., .name_repair = make.names

    for names in the style of base R)• A purrr-style anonymous function, see rlang::as_function()

    This argument is passed on as repair to vctrs::vec_as_names(). See therefor more details on these terms and the strategies used to enforce them.

    col_names Either NULL, or a character vector column names.

    n Number of lines the tokenizer will read to determine file structure. By default itis set to 100.

    widths Width of each field. Use NA as width of last field when reading a ragged fwffile.

    start, end Starting and ending (inclusive) positions of each field. Use NA as last end fieldwhen reading a ragged fwf file.

    ... If the first element is a data frame, then it must have all numeric columns andeither one or two rows. The column names are the variable names. The columnvalues are the variable widths if a length one vector, and if length two, variablestart and end positions. The elements of ... are used to construct a data framewith or or two rows as above.

    Examples

    fwf_sample

  • vroom_lines 21

    # 4. Named arguments with start and end positionsvroom_fwf(fwf_sample, fwf_cols(name = c(1, 20), ssn = c(30, 42)))# 5. Named arguments with column widthsvroom_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))

    vroom_lines Read lines from a file

    Description

    vroom_lines() is similar to readLines(), however it reads the lines lazily like vroom(), so op-erations like length(), head(), tail() and sample() can be done much more efficiently withoutreading all the data into R.

    Usage

    vroom_lines(file,n_max = Inf,skip = 0,na = character(),skip_empty_rows = FALSE,locale = default_locale(),altrep = TRUE,altrep_opts = deprecated(),num_threads = vroom_threads(),progress = vroom_progress()

    )

    Arguments

    file path to a local file.

    n_max Maximum number of lines to read.

    skip Number of lines to skip before reading data. If comment is supplied any com-mented lines are ignored after skipping.

    na Character vector of strings to interpret as missing values. Set this option tocharacter() to indicate no missing values.

    skip_empty_rows

    Should blank rows be ignored altogether? i.e. If this option is TRUE then blankrows will not be represented at all. If it is FALSE then they will be representedby NA values in all the columns.

    locale The locale controls defaults that vary from place to place. The default locale isUS-centric (like R), but you can use locale() to create your own locale thatcontrols things like the default time zone, encoding, decimal mark, big mark,and day/month names.

    altrep Control which column types use Altrep representations, either a character vectorof types, TRUE or FALSE. See vroom_altrep() for for full details.

    altrep_opts [Deprecated]

  • 22 vroom_str

    num_threads Number of threads to use when reading and materializing vectors. If your datacontains newlines within fields the parser will automatically be forced to use asingle thread only.

    progress Display a progress bar? By default it will only display in an interactive sessionand not while knitting a document. The automatic progress bar can be disabledby setting option readr.show_progress to FALSE.

    Examples

    lines

  • vroom_write 23

    Arguments

    x a vector

    Examples

    # when used on non-altrep objects altrep will always be falsevroom_str(mtcars)

    mt

  • 24 vroom_write_lines

    quote How to handle fields which contain characters that need to be quoted.

    • needed - Only quote fields which need them.

    • all - Quote all fields.

    • none - Never quote fields.

    escape The type of escape to use when quotes are in the data.

    • double - quotes are escaped by doubling them.

    • backslash - quotes are escaped by a preceding backslash.

    • none - quotes are not escaped.

    bom If TRUE add a UTF-8 BOM at the beginning of the file. This is recommendedwhen saving data for consumption by excel, as it will force excel to read the datawith the correct encoding (UTF-8)

    num_threads Number of threads to use when reading and materializing vectors. If your datacontains newlines within fields the parser will automatically be forced to use asingle thread only.

    progress Display a progress bar? By default it will only display in an interactive sessionand not while knitting a document. The display is updated every 50,000 valuesand will only display if estimated reading time is 5 seconds or more. The auto-matic progress bar can be disabled by setting option readr.show_progress toFALSE.

    path [Deprecated] is no longer supported, use file instead.

    Examples

    # If you only specify a file name, vroom_write() will write# the file to your current working directory.out_file

  • vroom_write_lines 25

    Usage

    vroom_write_lines(x,file,eol = "\n",na = "NA",append = FALSE,num_threads = vroom_threads()

    )

    Arguments

    x A data frame or tibble to write to disk.

    file File or connection to write to.

    eol The end of line character to use. Most commonly either "\n" for Unix stylenewlines, or "\r\n" for Windows style newlines.

    na String used for missing values. Defaults to ’NA’.

    append If FALSE, will overwrite existing file. If TRUE, will append to existing file. Inboth cases, if the file does not exist a new file is created.

    num_threads Number of threads to use when reading and materializing vectors. If your datacontains newlines within fields the parser will automatically be forced to use asingle thread only.

  • Index

    ∗ parserscols_condense, 5

    ?tidyselect::language, 19

    c(), 19clipboard(), 18col_big_integer (cols), 3col_character (cols), 3col_date (cols), 3col_datetime (cols), 3col_double (cols), 3col_factor (cols), 3col_guess (cols), 3col_integer (cols), 3col_logical (cols), 3col_number (cols), 3col_skip (cols), 3col_time (cols), 3cols, 3cols(), 8, 12, 19cols_condense, 5cols_only (cols), 3cols_only(), 8, 12, 19

    date_names, 5date_names(), 10date_names_lang (date_names), 5date_names_lang(), 10date_names_langs (date_names), 5default_locale (locale), 10

    fwf_cols (vroom_fwf), 18fwf_empty (vroom_fwf), 18fwf_empty(), 19fwf_positions (vroom_fwf), 18fwf_positions(), 19fwf_widths (vroom_fwf), 18fwf_widths(), 19

    gen_character (generators), 6gen_date (generators), 6gen_datetime (generators), 6gen_double (generators), 6gen_factor (generators), 6

    gen_integer (generators), 6gen_logical (generators), 6gen_name (generators), 6gen_name(), 7gen_number (generators), 6gen_tbl, 7gen_time (generators), 6generators, 6, 8guess_type, 9

    list(), 8, 12, 19locale, 10locale(), 4, 8, 9, 13, 19, 21

    OlsonNames(), 10

    problems, 11

    rlang::as_function(), 20

    spec (cols_condense), 5strptime(), 4

    tibble::tibble(), 14

    vctrs::vec_as_names(), 20vroom, 11vroom(), 3, 15, 21vroom_altrep, 15vroom_altrep(), 14, 20, 21vroom_altrep_opts, 16vroom_example, 16vroom_examples (vroom_example), 16vroom_format, 17vroom_fwf, 18vroom_lines, 21vroom_progress, 22vroom_str, 22vroom_write, 23vroom_write(), 17vroom_write_lines, 24

    26

    colscols_condensedate_namesgeneratorsgen_tblguess_typelocaleproblemsvroomvroom_altrepvroom_altrep_optsvroom_examplevroom_formatvroom_fwfvroom_linesvroom_progressvroom_strvroom_writevroom_write_linesIndex