
The GloVe pre-trained word vectors provide word embeddings trained on corpora of different sizes: 6 billion, 27 billion, 42 billion, or 840 billion tokens, as indicated by each function's name.

Usage

embedding_glove6b(
  dir = NULL,
  dimensions = c(50, 100, 200, 300),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove27b(
  dir = NULL,
  dimensions = c(25, 50, 100, 200),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove42b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

embedding_glove840b(
  dir = NULL,
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Arguments

dir

Character, path to the directory where data will be stored. If NULL, user_cache_dir will be used to determine the path.

dimensions

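A number indicating the number of dimensions (the length of each embedding vector) to include. One of 50, 100, 200, or 300 for glove6b, or one of 25, 50, 100, or 200 for glove27b. Not used by glove42b or glove840b, which provide 300-dimensional vectors only.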

delete

Logical, set TRUE to delete the dataset.

return_path

Logical, set TRUE to return the path of the folder where the dataset is stored instead of loading it.

clean

Logical, set TRUE to remove intermediate files. This can greatly reduce the size of the stored data on disk. Defaults to FALSE.

manual_download

Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.
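The manual-download workflow described above can be wrapped in a small helper. This is a hypothetical sketch: `load_glove6b_manually()` is not part of textdata, and the "folder is non-empty" check is only a rough heuristic (an assumption) for "the file has been placed"; it assumes nothing about the expected file name. It relies only on the `dir`, `dimensions`, `return_path`, and `manual_download` arguments documented above.

```r
# Hypothetical helper (not part of textdata) for the manual_download workflow.
# Defining it downloads nothing; the textdata calls run only when it is called.
load_glove6b_manually <- function(dimensions = 50, dir = NULL) {
  # Step 1: ask where the manually downloaded file is expected to be placed
  folder <- textdata::embedding_glove6b(
    dir = dir, dimensions = dimensions, return_path = TRUE
  )

  # Step 2: rough heuristic (an assumption): stop if the folder is still empty.
  # list.files() returns character(0) for a folder that does not exist yet.
  if (length(list.files(folder)) == 0) {
    stop("Download the GloVe 6B file manually and place it in: ", folder)
  }

  # Step 3: load the dataset from the manually placed file
  textdata::embedding_glove6b(
    dir = dir, dimensions = dimensions, manual_download = TRUE
  )
}
```

Calling `load_glove6b_manually(dimensions = 100)` before the file is in place stops with the target folder in the error message; calling it again after placing the file loads the tibble.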

Value

A tibble with one row for each unique token in the vocabulary (about 400k rows for glove6b, 1.2m for glove27b, 1.9m for glove42b, and 2.2m for glove840b) and the following variables:

token

An individual token (usually a word)

d1, d2, etc.

The embedding values for that token, one numeric column per dimension.
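Because each row holds one token followed by its d1, d2, etc. columns, the tibble converts directly into a numeric matrix for similarity queries. A minimal sketch, assuming cosine similarity as the measure: the toy table below stands in for a real result (its values are made up), and the helpers `embedding_matrix()` and `cosine_to()` are hypothetical, not textdata functions.

```r
# Toy stand-in for the tibble returned by, e.g., embedding_glove6b(dimensions = 50):
# a `token` column followed by numeric columns d1, d2, ... (values are made up).
emb <- data.frame(
  token = c("king", "queen", "apple", "banana"),
  d1 = c(0.90, 0.85, 0.10, 0.05),
  d2 = c(0.80, 0.75, 0.20, 0.10),
  d3 = c(0.10, 0.20, 0.90, 0.95)
)

# Hypothetical helper: keep only the d* columns as a numeric matrix,
# with tokens as row names so rows can be looked up by word
embedding_matrix <- function(emb) {
  m <- as.matrix(emb[, grepl("^d[0-9]+$", names(emb)), drop = FALSE])
  rownames(m) <- emb$token
  m
}

# Hypothetical helper: cosine similarity between one token and every row
cosine_to <- function(m, word) {
  v <- m[word, ]
  # dot products divided by the product of the vector norms
  drop(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

m <- embedding_matrix(emb)
sims <- cosine_to(m, "king")
sort(sims, decreasing = TRUE)
```

With real data, a glove6b result at 50 dimensions gives a matrix of about 400k × 50 doubles, roughly 160 MB in memory (400,000 × 50 × 8 bytes), so it is worth building the matrix once and reusing it.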

Details

Citation info:

@InProceedings{pennington2014glove,
  author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
  title = {GloVe: Global Vectors for Word Representation},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  year = 2014,
  pages = {1532--1543},
  url = {http://www.aclweb.org/anthology/D14-1162}
}

References

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Examples

if (FALSE) {
# Not run: these calls download large files on first use
library(textdata)

embedding_glove6b(dimensions = 50)

# Custom directory
embedding_glove42b(dir = "data/")

# Deleting a dataset
embedding_glove6b(delete = TRUE, dimensions = 300)

# Returning the folder path of the data
embedding_glove840b(return_path = TRUE)
}