The calculations are done with the text2vec package.
glove(text, tokenizer = text2vec::space_tokenizer, dim = 10L, window = 5L, min_count = 5L, n_iter = 10L, x_max = 10L, stopwords = character(), convergence_tol = -1, threads = 1, composition = c("tibble", "data.frame", "matrix"), verbose = FALSE)
Argument | Description |
---|---|
text | Character string. |
tokenizer | Function, function to perform tokenization. Defaults to text2vec::space_tokenizer. |
dim | Integer, number of dimensions of the resulting word vectors. Defaults to 10. |
window | Integer, skip length between words. Defaults to 5. |
min_count | Integer, number of times a token should appear to be considered in the model. Defaults to 5. |
n_iter | Integer, number of training iterations. Defaults to 10. |
x_max | Integer, maximum number of co-occurrences to use in the weighting function. Defaults to 10. |
stopwords | Character, a vector of stop words to exclude from training. |
convergence_tol | Numeric, value determining the convergence criterion. Defaults to -1. |
threads | Number of CPU threads to use. Defaults to 1. |
composition | Character, either "tibble", "matrix", or "data.frame"; the format of the resulting word vectors. |
verbose | Logical, controls whether progress is reported as operations are executed. |
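To make the role of `x_max` more concrete, the sketch below writes out the weighting function described in the GloVe paper in base R. The helper name `glove_weight` and the exponent `alpha = 0.75` (the value suggested in the paper) are assumptions for illustration only; this is not the code text2vec runs internally.

```r
# Weighting function f(x) from Pennington et al. (2014):
# co-occurrence counts below x_max are down-weighted,
# counts at or above x_max all receive weight 1.
# `glove_weight` is a hypothetical helper, not part of text2vec or this package.
glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

# Example co-occurrence counts, chosen arbitrarily
counts <- c(1, 5, 10, 50)
weights <- glove_weight(counts, x_max = 10)
print(round(weights, 3))
```

Under this reading, lowering `x_max` makes more word pairs reach the maximum weight, so frequent pairs are capped sooner.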
https://nlp.stanford.edu/projects/glove/
A tibble, data.frame or matrix containing the token in the first column and word vectors in the remaining columns.
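Because the tokens sit in the first column and the vector components in the remaining ones, comparing two words comes down to comparing two rows. The following is a minimal sketch in base R; the data frame is a made-up stand-in with the same layout (not real `glove()` output), and `cosine_sim` is a hypothetical helper.

```r
# Toy stand-in for the return value: tokens first, vector components after.
# These numbers are invented for illustration, not produced by glove().
vectors <- data.frame(
  tokens = c("king", "queen", "apple"),
  V1 = c(1, 0.9, -0.2),
  V2 = c(0.5, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Cosine similarity between the vectors of two tokens.
# `cosine_sim` is a hypothetical helper, not part of this package.
cosine_sim <- function(df, a, b) {
  m <- as.matrix(df[, -1])   # drop the token column, keep the numeric part
  rownames(m) <- df[[1]]     # index rows by token
  va <- m[a, ]
  vb <- m[b, ]
  sum(va * vb) / (sqrt(sum(va^2)) * sqrt(sum(vb^2)))
}

sim_kq <- cosine_sim(vectors, "king", "queen")
sim_ka <- cosine_sim(vectors, "king", "apple")
```

The same helper should work on a tibble or data.frame returned with the default `composition`, since both keep the token column first.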
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.
glove(fairy_tales, x_max = 5)
#> # A tibble: 451 x 11
#>    tokens V1     V2      V3      V4     V5    V6       V7      V8      V9
#>    <chr>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl> <dbl>    <dbl>   <dbl>   <dbl>
#>  1 "\"Do" -0.136 -0.319  -0.364  -1.14  0.567 0.376    -0.229  -0.840  -0.0496
#>  2 "\"Go… 0.605  -0.764  -0.887  0.0500 1.34  0.185    0.207   -0.543  -0.453
#>  3 "\"He" -0.483 0.0754  -0.194  -0.838 0.689 0.229    -0.621  0.328   -0.679
#>  4 "\"He… 0.263  -0.265  -0.499  -0.827 0.149 -0.00565 -0.170  1.05    -1.11
#>  5 "\"Oh" 0.927  -0.599  -0.0827 -0.762 0.967 -0.158   -0.319  0.452   0.207
#>  6 "\"Th… 0.433  -0.0803 0.510   -1.33  0.557 -0.0535  -0.127  1.07    -1.56
#>  7 "\"Ye… -0.162 -0.421  0.149   -1.20  0.833 0.132    0.325   -0.457  -0.516
#>  8 "-"    0.217  -0.0786 -0.472  -1.29  0.935 -0.212   -0.0861 0.175   -0.425
#>  9 "All"  0.495  -0.602  -0.0697 -0.301 0.437 0.256    0.571   -0.0976 -0.691
#> 10 "You"  0.435  -0.783  -0.264  -1.29  0.646 -0.987   -0.535  -0.0246 -1.20
#> # … with 441 more rows, and 1 more variable: V10 <dbl>