support for more datasets? #104

Open
tdhock opened this issue Feb 5, 2024 · 7 comments

Comments

@tdhock

tdhock commented Feb 5, 2024

Hi! I see that R torch supports the MNIST and CIFAR data sets, but I was wondering whether there are plans to support the other data sets that are available in PyTorch? https://pytorch.org/vision/stable/datasets.html

@cregouby
Contributor

cregouby commented Aug 31, 2024

Hello @tdhock,

Which one would be a priority for you?

Would you like to try implementing it yourself and open a pull request? Depending on the dataset, you may simply need to extend dataset-mnist.R or dataset_cifar.R, or take inspiration from both files.

@tdhock
Author

tdhock commented Sep 1, 2024

All of them would be great, but perhaps it would be easiest to start with the MNIST variants?
I expected that the R package would provide the same or analogous functionality as the Python module.
More generally, is there documentation of which features of the Python version are not implemented? Which will be implemented at some point? And which will never be implemented?

@cregouby
Contributor

cregouby commented Sep 1, 2024

Hello @tdhock

The MNIST variants are indeed a good first issue to try: you have to duplicate this block of code

#' Kuzushiji-MNIST
#'
#' Prepares the [Kuzushiji-MNIST](https://github.com/rois-codh/kmnist) dataset
#' and optionally downloads it.
#'
#' @param root (string): Root directory of dataset where
#' `KMNIST/processed/training.pt` and `KMNIST/processed/test.pt` exist.
#' @param train (bool, optional): If TRUE, creates dataset from `training.pt`,
#' otherwise from `test.pt`.
#' @param download (bool, optional): If true, downloads the dataset from the
#' internet and puts it in root directory. If dataset is already downloaded,
#' it is not downloaded again.
#' @param transform (callable, optional): A function/transform that takes in an
#' image and returns a transformed version. E.g., `transform_random_crop()`.
#' @param target_transform (callable, optional): A function/transform that takes
#' in the target and transforms it.
#'
#' @export
kmnist_dataset <- dataset(
  name = "kmnist_dataset",
  inherit = mnist_dataset,
  # Each resource is a c(url, md5) pair: train images, train labels,
  # test images, test labels.
  resources = list(
    c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz", "bdb82020997e1d708af4cf47b453dcf7"),
    c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz", "e144d726b3acfaa3e44228e80efcd344"),
    c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-images-idx3-ubyte.gz", "5c965bf0a639b31b8f53240b1b52f4d7"),
    c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/t10k-labels-idx1-ubyte.gz", "7320c461ea6c1c855c0b718fb2a4b134")
  ),
  classes = c('o', 'ki', 'su', 'tsu', 'na', 'ha', 'ma', 'ya', 're', 'wo')
)
and modify the few variables: name, resources, and classes. Do you want to give it a try?
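
When adapting the `resources` list for another MNIST variant, a quick shape check can catch copy-paste slips before any download is attempted. The following is only a sketch in base R: `check_resources()` is a hypothetical helper, not part of torchvision, and it assumes each entry is a `c(url, md5)` pair as in the block above.

```r
# Hypothetical helper (not part of torchvision): check that `resources` is a
# list of c(url, md5) pairs, as used in the kmnist_dataset definition.
check_resources <- function(resources) {
  stopifnot(is.list(resources), length(resources) > 0)
  for (res in resources) {
    stopifnot(
      is.character(res), length(res) == 2,
      grepl("^https?://", res[1]),      # first element: download URL
      grepl("^[0-9a-f]{32}$", res[2])   # second element: 32-character hex MD5
    )
  }
  invisible(TRUE)
}

# Two of the KMNIST entries quoted above, used here as example input.
kmnist_resources <- list(
  c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-images-idx3-ubyte.gz", "bdb82020997e1d708af4cf47b453dcf7"),
  c("http://codh.rois.ac.jp/kmnist/dataset/kmnist/train-labels-idx1-ubyte.gz", "e144d726b3acfaa3e44228e80efcd344")
)
resources_ok <- check_resources(kmnist_resources)
```

The check only validates the structure of the list; it does not confirm that the URLs are reachable or that the checksums match the remote files.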

As for the differences between the implementations, there is no such documentation AFAIK. There is also no implementation plan, nor a plan to "never implement" anything; it only depends on the spare time that you and the community can contribute...

@tdhock
Author

tdhock commented Sep 2, 2024

I don't have time myself, but it sounds like a reasonable target for a GSoC'25 contributor. I wrote up a project at https://github.com/rstats-gsoc/gsoc2025/wiki/torchvision-in-R-improvements. Could you co-mentor next summer? If so, could you please add your name/email under EVALUATING MENTOR on that wiki page?
Are there any other tasks where someone could contribute? If so, please add them to "Details of your coding project".

@Vishwas-10

Vishwas-10 commented Dec 17, 2024

Hello @tdhock and @cregouby,

I modified the existing package to include the FashionMNIST dataset as a starting point. I tested it locally, and it works fine. I have attached the code snippet along with a test case.

#' @param root (string): Root directory of dataset.
#' @param train (bool, optional): If TRUE, creates dataset from `training.rds`, otherwise from `test.rds`.
#' @param download (bool, optional): If TRUE, downloads the dataset from the internet.
#' @param transform (callable, optional): Function to transform input data.
#' @param target_transform (callable, optional): Function to transform target labels.
#'
#' @export
fashion_mnist_dataset <- dataset(
  name = "fashion_mnist",
  # Assumed: the snippet as posted omits this line; the other MNIST variants
  # inherit their methods from mnist_dataset.
  inherit = mnist_dataset,
  resources = list(
    c("http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz", "8d4fb7e6c68d591d4c3dfef9ec88bf0d"),
    c("http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz", "25c81989df183df01b3e8a0aad5dffbe"),
    c("http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz", "bef4ecab320f06d8554ea6380940ec79"),
    c("http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz", "bb300cfdad3c16e7a12a480ee83cd310")
  ),
  training_file = 'training.rds',
  test_file = 'test.rds',
  classes = c('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
              'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')
)

TEST CASE:

train_data <- fashion_mnist_dataset(
  root = "./data",
  train = TRUE,
  transform = function(x) torch_tensor(as.array(x), dtype = torch_float())$div(255),
  target_transform = NULL,
  download = TRUE
)

I am planning to implement similar functionality for other datasets listed in PyTorch. Is there any protocol I should follow before raising a PR?

@cregouby
Contributor

cregouby commented Dec 18, 2024

Hello @Vishwas-10

That's a great move, thanks for the effort!
Please put that into a PR and we'll be happy to review it.
Your test case should be turned into a test in the tests/testthat/ folder, where the existing dataset tests can serve as inspiration.
Don't forget to update NEWS.md and add your name as a contributor in the DESCRIPTION file.
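
As a rough illustration of that step, a testthat test for the proposed dataset might look like the sketch below. It is not taken from the package: it assumes `fashion_mnist_dataset()` is defined as in the comment above, that items are returned as a list with the image first (as for the MNIST variants), and that images are 28 x 28. The test is skipped when torch or the dataset function is unavailable.

```r
library(testthat)

test_that("fashion_mnist_dataset loads the training split", {
  skip_on_cran()                      # the test downloads data
  skip_if_not_installed("torch")
  skip_if_not(exists("fashion_mnist_dataset"),
              "fashion_mnist_dataset is not defined")

  # Use a fresh temporary directory so the download path is exercised.
  root <- tempfile(fileext = "/")
  ds <- fashion_mnist_dataset(root = root, train = TRUE, download = TRUE)

  expect_equal(length(ds), 60000)     # Fashion-MNIST training set size
  item <- ds[1]
  # Assumption: the first element of an item is the 28 x 28 image.
  expect_equal(dim(item[[1]]), c(28, 28))
})
```

Placed in a file under tests/testthat/, a test like this would run with the rest of the suite via `devtools::test()`.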

@Prateek0xeo

Hello @cregouby,

I am trying to add datasets from https://pytorch.org/vision/stable/datasets.html, such as iNaturalist and EuroSAT, but I am getting an "MD5 mismatch" error for both. Could you please help me with this?
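
One way to narrow down an "MD5 mismatch" is to compute the checksum of the downloaded file locally and compare it with the value recorded in `resources`. The sketch below uses base R's `tools::md5sum()`; `verify_md5()` is a hypothetical helper, not part of torchvision, and the demonstration uses an empty temporary file rather than a real dataset archive.

```r
# Hypothetical helper (not part of torchvision): compare a local file's MD5
# checksum with the expected value from a dataset's `resources` list.
verify_md5 <- function(path, expected) {
  actual <- unname(tools::md5sum(path))  # NA when the file does not exist
  if (is.na(actual)) stop("File not found: ", path)
  matches <- identical(tolower(actual), tolower(expected))
  if (!matches) {
    message("MD5 mismatch for ", path,
            "\n  expected: ", expected,
            "\n  actual:   ", actual)
  }
  matches
}

# Demonstration on an empty temporary file, whose MD5 is well known.
tmp <- tempfile()
file.create(tmp)
ok  <- verify_md5(tmp, "d41d8cd98f00b204e9800998ecf8427e")  # matches
bad <- verify_md5(tmp, "00000000000000000000000000000000")  # reports a mismatch
```

If the locally computed checksum differs from the one copied from the Python source, it may be worth checking whether the URL serves the same archive that the checksum was computed for, for example a different file version or an error page instead of the archive.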

@Prateek0xeo Prateek0xeo mentioned this issue Jan 11, 2025

4 participants