About


Overview

Welcome to the CornSeeds project! This is an ongoing research effort to provide agricultural researchers around the world with corn seed image data for training seed classification and detection models. The data is available free of charge to researchers for non-commercial use.

Can I download the images?

Yes. For details, visit the link below.
github.com/Naagar/Seeds_Classification

Research Team

Sandeep Nagar, ML Lab, IIIT-Hyderabad

Prateek Pani, ML Lab, IIIT-Hyderabad

Raj Nair, AdTech Corp.

Prof. Girish Varma, CSTAR Lab and ML Lab, IIIT-Hyderabad

For students, advisors, and other contributors to the project, please see the list of publications below.

Publications

Nagar, S., Pani, P., Nair, R., Varma, G.: Automated Seed Quality Testing System using GAN & Active Learning. In: PReMI 2021, ISI Kolkata

Inside dataset folder

Dataset (folder_name)


Train
-broken
-discolored
-pure
-silkcut
Test
-broken
-discolored
-pure
-silkcut
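
As a quick sanity check after downloading, the layout above can be walked to count the images in each class. This is a minimal sketch, not part of the project's own code: it assumes the class folders are named exactly `broken`, `discolored`, `pure`, and `silkcut` under `Train` and `Test`, and that images use `.jpg`, `.jpeg`, or `.png` extensions. The helper name `count_images` is hypothetical.

```python
from pathlib import Path

# Folder names taken from the listing above; exact spelling and case are assumed.
SPLITS = ("Train", "Test")
CLASSES = ("broken", "discolored", "pure", "silkcut")
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def count_images(dataset_root):
    """Return {split: {class_name: image_count}} for the given dataset folder."""
    root = Path(dataset_root)
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for class_name in CLASSES:
            class_dir = root / split / class_name
            if not class_dir.is_dir():
                # A missing folder is reported as zero images rather than an error.
                counts[split][class_name] = 0
                continue
            counts[split][class_name] = sum(
                1
                for path in class_dir.iterdir()
                if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts
```

Calling `count_images("Dataset")` on the unpacked folder returns a nested dictionary that makes any class imbalance easy to see.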

Sponsors

We are grateful for the support of AdTech Corp. and the International Institute of Information Technology (IIIT), Hyderabad, which enabled this project.

What is Corn Seed Dataset?

This dataset consists of images of corn seeds, with the top and bottom views treated independently (two images per seed: top and bottom). There are four classes of corn seed: Broken (B), Discolored (D), Silkcut (S), and Pure (P). 17,802 images were labeled by experts at AdTech Corp. A further 26K images were unlabeled, of which 9K were labeled using active learning (BatchBALD).
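
This page does not describe how BatchBALD was configured, so the following is only a rough illustration of the idea behind it. The sketch computes the simpler per-image BALD score: the entropy of the averaged prediction minus the average entropy of the individual predictions, taken over several stochastic forward passes of a classifier. BatchBALD itself scores a whole batch of images jointly, which this sketch does not do. The function names and the use of repeated stochastic passes (for example, MC dropout) are assumptions for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(passes):
    """Per-image BALD score.

    passes: list of probability vectors, one per stochastic forward pass,
    each summing to 1 over the classes.
    """
    num_passes = len(passes)
    num_classes = len(passes[0])
    # Average the predicted class probabilities over all passes.
    mean_probs = [
        sum(p[c] for p in passes) / num_passes for c in range(num_classes)
    ]
    # Average the entropy of each individual pass.
    expected_entropy = sum(entropy(p) for p in passes) / num_passes
    # High when passes are individually confident but disagree with each other.
    return entropy(mean_probs) - expected_entropy


# Hypothetical predictions over (broken, discolored, pure, silkcut).
agreeing = [[0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1]]
disagreeing = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(bald_score(agreeing), bald_score(disagreeing))
```

Images with the highest scores are the ones a model is most uncertain about in this sense, which makes them natural candidates to send for labeling.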

We have created three different datasets:

(1). Primary dataset: contains the 17,802 images labeled by the experts, 8,901 top-view and 8,901 bottom-view.

(2). Dataset with fake images: We generated fake images using a conditional GAN (BigGAN), with 2,937 broken, 5,823 discolored, 2,937 pure, and 5,823 silkcut instances, and added them to the train set to balance the dataset.

(3). Balanced dataset: 9,000 newly captured images, labeled using the batch active learning method (BatchBALD), are added to the primary dataset. The resulting dataset contains 26,802 images (the 17,802 expert-labeled images plus the 9K actively labeled ones), split into train and validation sets in an 80:20 ratio.
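
An 80:20 train/validation split like the one described for the balanced dataset can be sketched as follows. This is an assumption-laden illustration, not the project's actual procedure: the page does not say whether the split was random or stratified by class, so the sketch uses a plain seeded shuffle, and `split_train_val` is a hypothetical helper name. Integer identifiers stand in for real image paths.

```python
import random


def split_train_val(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for the 17,802 expert-labeled and 9,000 actively labeled images.
image_ids = list(range(17802 + 9000))
train_ids, val_ids = split_train_val(image_ids)
print(len(image_ids), len(train_ids), len(val_ids))
```

If class proportions should be preserved in both parts, the same function can be applied to each class folder separately and the results concatenated.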

Why Corn Seed Dataset?

Machine vision for precision agriculture has attracted research interest in recent years. Existing work addresses plant health monitoring, including weed, insect, and disease detection. With the success of DNNs, different methods have been proposed to tackle corn seed classification. Seeds are fine-grained objects: they look alike at a glance and can be told apart only by details in discriminative local regions. We hope this dataset draws the attention of computer vision and machine learning researchers to problems in automation and agriculture.

Download the dataset
(1). Primary dataset: 17802 labeled images (unbalanced) click here
(2). Dataset with fake images: primary dataset + fake images generated using BigGAN (conditional GAN), 5K images for each class click here
(3). Balanced dataset (26802 images): balanced, with 9K images labeled using active learning (BatchBALD) click here

Python code for image preprocessing: click here

Sample Images