The GloVe pre-trained word vectors provide word embeddings trained on corpora of different sizes: 6 billion, 27 billion, 42 billion, and 840 billion tokens.
Usage
embedding_glove6b(
  dir = NULL,
  dimensions = c(50, 100, 200, 300),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove27b(
  dir = NULL,
  dimensions = c(25, 50, 100, 200),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove42b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove840b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)
Arguments
- dir
Character, path to the directory where data will be stored. If NULL, user_cache_dir will be used to determine the path.
- dimensions
A number indicating the dimensionality of the word vectors to load. One of 50, 100, 200, or 300 for glove6b, or one of 25, 50, 100, or 200 for glove27b.
- delete
Logical, set TRUE to delete the dataset.
- return_path
Logical, set TRUE to return the path of the dataset.
- clean
Logical, set TRUE to remove intermediate files. This can greatly reduce the size on disk. Defaults to FALSE.
- manual_download
Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.
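The return_path and manual_download arguments are meant to be used together. The following is a minimal sketch of that workflow, assuming these functions are exported by the textdata package (the package name is not stated in this page) and that return_path = TRUE returns the folder without triggering a download. The wrapper load_glove6b_manual is a hypothetical helper introduced only for illustration.

```r
# Assumption: embedding_glove6b() is provided by the textdata package.
if (requireNamespace("textdata", quietly = TRUE)) {
  # Step 1: ask only for the storage folder. With return_path = TRUE the
  # function is expected to return a path rather than the dataset.
  glove_dir <- textdata::embedding_glove6b(dimensions = 50, return_path = TRUE)
  print(glove_dir)
}

# Step 2: after placing the manually downloaded file in that folder,
# load it without attempting a download.
# Hypothetical wrapper; it is defined here but not called, because it
# needs the file to be present on disk.
load_glove6b_manual <- function(dimensions = 50) {
  textdata::embedding_glove6b(
    dimensions = dimensions,
    manual_download = TRUE
  )
}
```

Note that dimensions takes a single value from the allowed set; the vector shown in Usage lists the available choices.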
Value
A tibble with 400k (glove6b), 1.2m (glove27b), 1.9m (glove42b), or 2.2m (glove840b) rows (one row for each unique token in the vocabulary) and the following variables:
- token
An individual token (usually a word)
- d1, d2, etc
The embeddings for that token.
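Because the result has one token column followed by numeric columns d1, d2, and so on, it converts directly to a matrix for similarity computations. The sketch below uses a small made-up data frame in place of the real tibble, so the values are illustrative only and are not actual GloVe vectors; cosine_similarity is a hypothetical helper, not part of any package.

```r
# Toy stand-in for the returned tibble: a `token` column plus d1..d3.
# These numbers are invented for illustration.
glove_toy <- data.frame(
  token = c("king", "queen", "apple"),
  d1 = c(0.5, 0.4, -0.3),
  d2 = c(0.1, 0.2, 0.8),
  d3 = c(-0.2, -0.1, 0.4),
  stringsAsFactors = FALSE
)

# Drop the token column and keep the embeddings as a numeric matrix,
# with one row per token.
embedding_matrix <- as.matrix(glove_toy[, -1])
rownames(embedding_matrix) <- glove_toy$token

# Cosine similarity between the vectors of two tokens.
cosine_similarity <- function(mat, a, b) {
  x <- mat[a, ]
  y <- mat[b, ]
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine_similarity(embedding_matrix, "king", "queen")
cosine_similarity(embedding_matrix, "king", "apple")
```

The same two conversion lines should apply to the real tibble, although for the larger vocabularies the resulting matrix can take a substantial amount of memory.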
Details
Citation info:
@InProceedings{pennington2014glove,
  author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
  title = {GloVe: Global Vectors for Word Representation},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year = 2014,
  pages = {1532--1543},
  url = {http://www.aclweb.org/anthology/D14-1162}
}