DBpedia Ontology Dataset

The DBpedia ontology classification dataset, built from DBpedia. It contains 560,000 training samples and 70,000 testing samples in total, drawn evenly from 14 nonoverlapping classes (40,000 training and 5,000 testing samples per class).
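Because the classes are balanced, the per-class sizes follow from dividing the totals by 14. The sketch below does that arithmetic. It also defines a hypothetical helper, is_balanced(), which is not part of textdata. The helper checks whether every class in a data frame with a class column occurs equally often. It is tried on a small made-up data frame rather than on the real download.

```r
# Per-class sizes implied by the totals and the 14 classes
n_classes <- 14
train_per_class <- 560000 / n_classes  # 40000
test_per_class <- 70000 / n_classes    # 5000

# Hypothetical helper (not part of textdata): TRUE when every class
# label in `df$class` appears the same number of times
is_balanced <- function(df) {
  counts <- table(df$class)
  length(unique(as.vector(counts))) == 1
}

# Toy stand-in for the real data: two classes, two rows each
toy <- data.frame(
  class = c("Company", "Company", "Artist", "Artist"),
  title = c("A", "B", "C", "D")
)
is_balanced(toy)
```

On the real data, is_balanced(dataset_dbpedia(split = "train")) should return TRUE if the download matches the description above.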

Usage

dataset_dbpedia(
  dir = NULL,
  split = c("train", "test"),
  delete = FALSE,
  return_path = FALSE,
  clean = FALSE,
  manual_download = FALSE
)

Source

https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf

https://www.dbpedia.org/

https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz

Arguments

dir: Character, path to the directory where the data will be stored. If NULL, user_cache_dir will be used to determine the path.
split: Character. Return training ("train") data or testing ("test") data. Defaults to "train".
delete: Logical, set TRUE to delete the dataset.
return_path: Logical, set TRUE to return the path of the dataset.
clean: Logical, set TRUE to remove intermediate files. This can greatly reduce the disk space the dataset occupies. Defaults to FALSE.
manual_download: Logical, set TRUE if you have manually downloaded the file and placed it in the folder designated by running this function with return_path = TRUE.
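The manual-download workflow described above can be sketched as a small wrapper. The function name manual_dbpedia() is hypothetical and not part of textdata. The sketch assumes two things. First, that the archive must keep its original file name, dbpedia_csv.tar.gz, as in the Source URL. Second, that the folder returned with return_path = TRUE is where dataset_dbpedia() looks for that file. Treat it as an illustration under those assumptions, not as the package's own implementation.

```r
# Hypothetical helper (not part of textdata) sketching the manual-download
# workflow. Assumptions: the archive keeps its original name
# "dbpedia_csv.tar.gz", and the folder returned by return_path = TRUE is
# where dataset_dbpedia() expects to find it.
manual_dbpedia <- function(
    url = "https://github.com/srhrshr/torchDatasets/raw/master/dbpedia_csv.tar.gz",
    split = "train") {
  # 1. Ask textdata which folder the file should be placed in
  folder <- textdata::dataset_dbpedia(return_path = TRUE)
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)

  # 2. Download the archive ourselves (binary mode matters on Windows)
  dest <- file.path(folder, "dbpedia_csv.tar.gz")
  utils::download.file(url, destfile = dest, mode = "wb")

  # 3. Let textdata process the file that is already in place
  textdata::dataset_dbpedia(split = split, manual_download = TRUE)
}
```

Defining the function does not download anything. Calling manual_dbpedia() does, so it needs network access and the textdata package.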

Value

A tibble with 560,000 or 70,000 rows for "train" and "test" respectively and 3 variables:

class: Character, the class label (one of the 14 classes listed under Details)
title: Character, title of article
description: Character, description of article
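Text classifiers usually take a single text field per example. One common preprocessing choice, though not something dataset_dbpedia() does itself, is to join title and description. The helper add_text_column() below is hypothetical. It is shown on a made-up data frame with the same three columns as the returned tibble. The same assignment works on a tibble.

```r
# Hypothetical helper (not part of textdata): add a `text` column that
# joins the title and description of each article
add_text_column <- function(df, sep = ". ") {
  df$text <- paste(df$title, df$description, sep = sep)
  df
}

# Toy stand-in with the same three columns as dataset_dbpedia() output
toy <- data.frame(
  class = c("Company", "Plant"),
  title = c("Acme Corp", "Rosa canina"),
  description = c("A fictional manufacturer.", "A species of wild rose.")
)
add_text_column(toy)$text
# "Acme Corp. A fictional manufacturer." "Rosa canina. A species of wild rose."
```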

Details

The classes are

Company
EducationalInstitution
Artist
Athlete
OfficeHolder
MeanOfTransportation
Building
NaturalPlace
Village
Animal
Plant
Album
Film
WrittenWork
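Since class is returned as a character column, converting it to a factor with all 14 levels keeps every class visible in tables and models, even in a subset that lacks some classes. The vector dbpedia_classes and the helper as_dbpedia_factor() are hypothetical names. The level order simply follows the list above.

```r
# The 14 labels in the order listed above
dbpedia_classes <- c(
  "Company", "EducationalInstitution", "Artist", "Athlete",
  "OfficeHolder", "MeanOfTransportation", "Building", "NaturalPlace",
  "Village", "Animal", "Plant", "Album", "Film", "WrittenWork"
)

# Hypothetical helper (not part of textdata): turn a character vector of
# labels into a factor with all 14 levels; unknown labels become NA
as_dbpedia_factor <- function(x) factor(x, levels = dbpedia_classes)

y <- as_dbpedia_factor(c("Film", "Company", "Film"))
table(y)  # Company = 1, Film = 2, the other 12 classes = 0
```

Applied to real data, this would look like train$class <- as_dbpedia_factor(train$class).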

Examples

if (FALSE) {
dataset_dbpedia()

# Custom directory
dataset_dbpedia(dir = "data/")

# Deleting dataset
dataset_dbpedia(delete = TRUE)

# Returning filepath of data
dataset_dbpedia(return_path = TRUE)

# Access both training and testing dataset
train <- dataset_dbpedia(split = "train")
test <- dataset_dbpedia(split = "test")
}