Dataset Download

Training dataset
Description: This dataset contains 3909 data points: 809 immuno-positive peptides and 3100 immuno-negative peptides. The immunogenicity training data were collected from the T Cell Assay of IEDB (http://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip), dated before May 27th, 2018. Starting from the 337248 peptide records in the primary dataset, we kept only records for Homo sapiens and MHC-I subtypes, restricted the peptide length to 9, merged identical records, and mapped the peptides to the human reference genome hg19. Note that the cross validation in the paper was performed by further partitioning the training dataset into training and testing datasets.
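The filtering steps described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the record fields (`peptide`, `host`, `mhc_class`, `label`) and the toy sequences are hypothetical and do not match the actual IEDB export columns, "merging identical records" is assumed here to mean dropping exact duplicates, and the hg19 mapping step is left out.

```python
def filter_records(records):
    """Keep human, MHC class I, 9-mer records and drop exact duplicates.

    Each record is a dict with hypothetical keys:
    'peptide', 'host', 'mhc_class', 'label'.
    """
    seen = set()
    kept = []
    for rec in records:
        peptide = rec["peptide"].strip().upper()
        # Keep only records whose host organism is Homo sapiens.
        if rec["host"].strip().lower() != "homo sapiens":
            continue
        # Keep only MHC class I records.
        if rec["mhc_class"].strip().upper() != "I":
            continue
        # Restrict peptide length to 9.
        if len(peptide) != 9:
            continue
        # Assumption: identical records are those sharing peptide and label.
        key = (peptide, rec["label"])
        if key in seen:
            continue
        seen.add(key)
        kept.append({"peptide": peptide, "label": rec["label"]})
    return kept


# Toy records with made-up sequences, for illustration only (not IEDB data).
example = [
    {"peptide": "ACDEFGHIK", "host": "Homo sapiens", "mhc_class": "I", "label": 1},
    {"peptide": "ACDEFGHIK", "host": "Homo sapiens", "mhc_class": "I", "label": 1},
    {"peptide": "ACDEFGHIK", "host": "Mus musculus", "mhc_class": "I", "label": 1},
    {"peptide": "ACDEFGHI", "host": "Homo sapiens", "mhc_class": "I", "label": 0},
    {"peptide": "WYACDEFGH", "host": "Homo sapiens", "mhc_class": "II", "label": 0},
    {"peptide": "LMNPQRSTV", "host": "Homo sapiens", "mhc_class": "I", "label": 0},
]

filtered = filter_records(example)
for rec in filtered:
    print(rec["peptide"], rec["label"])
```

How records with the same peptide but conflicting labels were resolved is not stated in the description, so the sketch simply keeps both.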

Validation dataset
Description: This dataset contains 430 data points: 125 immuno-positive peptides and 305 immuno-negative peptides. The immunogenicity validation data were collected from the T Cell Assay of IEDB after 2018, and we performed the same operations as for the training dataset to obtain the final qualified validation dataset.
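As a quick sanity check after downloading, the counts quoted in the two descriptions can be compared against the files. The snippet below is only a sketch that works from the published counts; the dictionary layout and the `summarize` helper are illustrative names, not part of the released data.

```python
# Counts taken from the dataset descriptions on this page.
datasets = {
    "training": {"positive": 809, "negative": 3100},
    "validation": {"positive": 125, "negative": 305},
}


def summarize(counts):
    """Return (total peptides, fraction that are immuno-positive)."""
    total = counts["positive"] + counts["negative"]
    return total, counts["positive"] / total


summary = {name: summarize(counts) for name, counts in datasets.items()}
for name, (total, fraction) in summary.items():
    print(f"{name}: {total} peptides, {fraction:.1%} immuno-positive")
```

Both datasets are imbalanced toward immuno-negative peptides, which is worth keeping in mind when choosing evaluation metrics.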

Other resources (available upon request)