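The GloVe pre-trained word vectors provide word embeddings trained on corpora of different sizes: 6 billion (glove6b), 27 billion (glove27b), 42 billion (glove42b), and 840 billion (glove840b) tokens.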

embedding_glove6b(
  dir = NULL,
  dimensions = c(50, 100, 200, 300),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove27b(
  dir = NULL,
  dimensions = c(25, 50, 100, 200),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove42b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove840b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Arguments

dir

Character, path to the directory where data will be stored. If NULL, user_cache_dir will be used to determine the path.

dimensions

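A number indicating the dimensionality of the vectors (the number of d columns) to include. One of 50, 100, 200, or 300 for embedding_glove6b(), or one of 25, 50, 100, or 200 for embedding_glove27b(). Not available for embedding_glove42b() and embedding_glove840b(), which provide 300-dimensional vectors only.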
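The allowed sizes differ by function, and the 42B and 840B sets come only as 300-dimensional vectors. As a minimal sketch, a caller could check a requested size before starting a large download. The lookup table `glove_dimensions` and the function `check_glove_dimensions()` are hypothetical helpers written for illustration; they are not part of textdata.

```r
# Hypothetical lookup table (NOT part of textdata): allowed vector sizes
# per function, as described for the `dimensions` argument.
glove_dimensions <- list(
  embedding_glove6b   = c(50, 100, 200, 300),
  embedding_glove27b  = c(25, 50, 100, 200),
  embedding_glove42b  = 300,  # no `dimensions` argument; 300-d only
  embedding_glove840b = 300   # no `dimensions` argument; 300-d only
)

# Hypothetical helper: stop with an informative message if the requested
# dimension is not offered by the given function.
check_glove_dimensions <- function(fun_name, dimensions) {
  allowed <- glove_dimensions[[fun_name]]
  if (is.null(allowed)) {
    stop("Unknown GloVe function: ", fun_name)
  }
  if (length(dimensions) != 1 || !(dimensions %in% allowed)) {
    stop(fun_name, "() supports dimensions: ", paste(allowed, collapse = ", "))
  }
  invisible(TRUE)
}

check_glove_dimensions("embedding_glove6b", 50)   # passes silently
check_glove_dimensions("embedding_glove27b", 25)  # passes silently
```

A request such as `check_glove_dimensions("embedding_glove27b", 300)` fails, because the Twitter-based 27B set tops out at 200 dimensions.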

delete

Logical, set TRUE to delete the dataset.

return_path

Logical, set TRUE to return the path of the dataset.

clean

Logical, set TRUE to remove intermediate files. This can greatly reduce the disk space used. Defaults to FALSE.

manual_download

Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.

Source

https://nlp.stanford.edu/projects/glove/

Value

A tibble with one row for each unique token in the vocabulary (about 400k rows for glove6b, 1.2m for glove27b, 1.9m for glove42b, and 2.2m for glove840b) and the following variables:

token

An individual token (usually a word)

d1, d2, etc.

The embedding values for that token, one numeric column per dimension.
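Because every dimension is its own column, the returned tibble is easy to turn into a numeric matrix for similarity queries. The sketch below is an illustration, not package functionality: it builds a tiny made-up data frame with the same token, d1, d2, ... layout (the numbers are invented, not real GloVe values) and computes cosine similarity with base R. `embedding_matrix()` and `cosine_to()` are hypothetical helper names.

```r
# Toy stand-in for the tibble returned by the embedding_glove*() functions:
# a `token` column followed by one numeric column per dimension.
# Values are made up for illustration; they are NOT real GloVe vectors.
glove <- data.frame(
  token = c("king", "queen", "apple"),
  d1 = c(0.50, 0.45, -0.30),
  d2 = c(0.10, 0.20, 0.80),
  d3 = c(0.30, 0.35, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect the d1, d2, ... columns into a numeric
# matrix with tokens as row names.
embedding_matrix <- function(emb) {
  dim_cols <- grep("^d[0-9]+$", names(emb), value = TRUE)
  m <- as.matrix(emb[, dim_cols, drop = FALSE])
  rownames(m) <- emb$token
  m
}

# Hypothetical helper: cosine similarity between one token and every
# token in the vocabulary, sorted from most to least similar.
cosine_to <- function(m, word) {
  v <- m[word, ]
  sims <- as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(m)
  sort(sims, decreasing = TRUE)
}

m <- embedding_matrix(glove)
cosine_to(m, "king")
```

With real data, the toy `glove` object could be replaced by the result of, for example, `embedding_glove6b(dimensions = 50)`. The helpers select columns by the `d` prefix, so they work unchanged for any number of dimensions.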

Details

Citation info:

@InProceedings{pennington2014glove,
author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
title = {GloVe: Global Vectors for Word Representation},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
year = 2014,
pages = {1532--1543},
url = {http://www.aclweb.org/anthology/D14-1162}
}

References

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543.

Examples

# \donttest{
embedding_glove6b(dimensions = 50)
#> Do you want to download:
#> Name: GloVe 6B
#> URL: https://nlp.stanford.edu/projects/glove/
#> License: Public Domain Dedication and License v1.0
#> Size: 822.2 MB (158MB, 311MB, 616MB, and 921MB processed)
#> Download mechanism: https
#> Citation info:
#> inproceedings{pennington2014glove,
#> author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
#> booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
#> title = {GloVe: Global Vectors for Word Representation},
#> year = {2014},
#> pages = {1532--1543},
#> url = {http://www.aclweb.org/anthology/D14-1162},
#> }
#> Error in menu(choices = c("Yes", "No"), title = title): menu() cannot be used non-interactively
# Custom directory
embedding_glove42b(dir = "data/")
#> Do you want to download:
#> Name: GloVe Common Crawl 42B
#> URL: https://nlp.stanford.edu/projects/glove/
#> License: Public Domain Dedication and License v1.0
#> Size: 1.75 GB (4.31GB processed)
#> Download mechanism: https
#> Citation info:
#> inproceedings{pennington2014glove,
#> author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
#> booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
#> title = {GloVe: Global Vectors for Word Representation},
#> year = {2014},
#> pages = {1532--1543},
#> url = {http://www.aclweb.org/anthology/D14-1162},
#> }
#> Error in menu(choices = c("Yes", "No"), title = title): menu() cannot be used non-interactively
# Deleting dataset
embedding_glove6b(delete = TRUE, dimensions = 300)
#> Error: [ENOENT] Failed to search directory '/home/travis/.cache/textdata/glove6b': no such file or directory
# Returning filepath of data
embedding_glove840b(return_path = TRUE)
#> /home/travis/.cache/textdata/glove840b
# }