The DBpedia ontology classification dataset. It contains 560,000 training samples and 70,000 testing samples in total: 40,000 training and 5,000 testing samples for each of 14 non-overlapping classes from DBpedia.

Usage

dataset_dbpedia(
  dir = NULL,
  split = c("train", "test"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Arguments

dir

Character, path to the directory where the data will be stored. If NULL, user_cache_dir will be used to determine the path.

split

Character. Return training ("train") data or testing ("test") data. Defaults to "train".

delete

Logical, set TRUE to delete dataset.

return_path

Logical, set TRUE to return the path of the dataset.

clean

Logical, set TRUE to remove intermediate files. This can greatly reduce the size. Defaults to FALSE.

manual_download

Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.

Value

A tibble with 560,000 or 70,000 rows for "train" and "test" respectively and 3 variables:

class

Character, denoting the class

title

Character, title of article

description

Character, description of article

Details

The classes are

  • Company

  • EducationalInstitution

  • Artist

  • Athlete

  • OfficeHolder

  • MeanOfTransportation

  • Building

  • NaturalPlace

  • Village

  • Animal

  • Plant

  • Album

  • Film

  • WrittenWork
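
As a rough sketch of how the class column might be handled after loading, the snippet below converts it to a factor in the order listed above and counts rows per class. It uses base R only and a made-up stand-in data frame (`toy`, a hypothetical name) with the same three columns, so it runs without downloading anything; the rows are placeholders, not real DBpedia records.

```r
# The 14 class names, in the order listed above.
dbpedia_classes <- c(
  "Company", "EducationalInstitution", "Artist", "Athlete", "OfficeHolder",
  "MeanOfTransportation", "Building", "NaturalPlace", "Village", "Animal",
  "Plant", "Album", "Film", "WrittenWork"
)

# Hypothetical stand-in for the result of dataset_dbpedia():
# two placeholder rows per class, with the same three character columns.
toy <- data.frame(
  class = rep(dbpedia_classes, each = 2),
  title = paste("Title", seq_len(28)),
  description = paste("Description", seq_len(28)),
  stringsAsFactors = FALSE
)

# Convert the character class column to a factor with a fixed level order.
toy$class <- factor(toy$class, levels = dbpedia_classes)

# Count rows per class and check that every class has the same count.
class_counts <- table(toy$class)
is_balanced <- length(unique(as.integer(class_counts))) == 1

print(class_counts)
```

On the real data, the same `table()` call on the `class` column of the training split would be expected to show equal counts across the 14 classes, given the row totals stated above.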

See also

Other topic: dataset_ag_news(), dataset_trec()

Examples

if (FALSE) {
dataset_dbpedia()

# Custom directory
dataset_dbpedia(dir = "data/")

# Deleting dataset
dataset_dbpedia(delete = TRUE)

# Returning filepath of data
dataset_dbpedia(return_path = TRUE)

# Access both training and testing dataset
train <- dataset_dbpedia(split = "train")
test <- dataset_dbpedia(split = "test")
}