DBpedia ontology classification dataset. It contains 560,000 training samples and 70,000 testing samples in total, that is, 40,000 training and 5,000 testing samples for each of 14 nonoverlapping classes from DBpedia.
Usage
dataset_dbpedia(
dir = NULL,
split = c("train", "test"),
delete = FALSE,
return_path = FALSE,
clean = FALSE,
manual_download = FALSE
)
Source
https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf
https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz
Arguments
- dir
Character, path to directory where data will be stored. If NULL, user_cache_dir will be used to determine path.
- split
Character. Return training ("train") data or testing ("test") data. Defaults to "train".
- delete
Logical, set TRUE to delete dataset.
- return_path
Logical, set TRUE to return the path of the dataset.
- clean
Logical, set TRUE to remove intermediate files. This can greatly reduce the size. Defaults to FALSE.
- manual_download
Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.
Value
A tibble with 560,000 or 70,000 rows for "train" and "test" respectively and 3 variables:
- class
Character, denoting the class
- title
Character, title of article
- description
Character, description of article
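As a rough illustration of working with data of this shape, the sketch below builds a small stand-in data frame with the same three columns and counts rows per class. The rows are invented for illustration and are not taken from DBpedia; the names toy, class_counts, and n_chars are hypothetical. With the real data, the same steps would apply to the object returned by dataset_dbpedia().

```r
# Stand-in data with the same three columns as the returned tibble.
# These rows are made up for illustration; they are not DBpedia records.
toy <- data.frame(
  class = c("Company", "Film", "Film", "Animal"),
  title = c("Example Corp", "Example Movie", "Another Movie", "Example Bird"),
  description = c(
    "A made-up company description.",
    "A made-up film description.",
    "Another made-up film description.",
    "A made-up animal description."
  ),
  stringsAsFactors = FALSE
)

# Number of rows in each class
class_counts <- table(toy$class)
print(class_counts)

# Number of characters in each description (hypothetical helper column)
toy$n_chars <- nchar(toy$description)
print(toy[, c("class", "n_chars")])
```

Only base R is used here, so the snippet runs without downloading the dataset.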
Details
The classes are
Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork
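If a fixed label order is useful, for example to turn the class column into integer codes, the class names can be stored as factor levels. The sketch below uses the order listed above; whether that order matches any numeric labels in the raw files is an assumption, and the names dbpedia_classes, labels, and codes are hypothetical.

```r
# Class names in the order listed on this page
dbpedia_classes <- c(
  "Company", "EducationalInstitution", "Artist", "Athlete",
  "OfficeHolder", "MeanOfTransportation", "Building", "NaturalPlace",
  "Village", "Animal", "Plant", "Album", "Film", "WrittenWork"
)

# Training rows per class, given 560,000 rows spread evenly over the classes
per_class_train <- 560000 / length(dbpedia_classes)

# Example labels (made up), converted to integer codes via fixed factor levels
labels <- c("Film", "Animal", "Film")
codes <- as.integer(factor(labels, levels = dbpedia_classes))
print(codes)
```

Fixing the levels explicitly keeps the codes stable even when a subset of the data does not contain every class.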
See also
Other topic: dataset_ag_news(), dataset_trec()
Examples
if (FALSE) {
  # Download (if needed) and load the training split, the default
  dataset_dbpedia()
  # Custom directory
  dataset_dbpedia(dir = "data/")
  # Deleting dataset
  dataset_dbpedia(delete = TRUE)
  # Returning filepath of data
  dataset_dbpedia(return_path = TRUE)
  # Access both training and testing datasets
  train <- dataset_dbpedia(split = "train")
  test <- dataset_dbpedia(split = "test")
}