
Bridging the Data Gap for AI Excellence
DataHub KGP, an initiative by IIT Kharagpur, aims to establish a centralized repository of high-quality, curated, and annotated datasets tailored to India’s unique societal needs. AI models in India often struggle due to the scarcity of localized, well-annotated datasets, as most data originates from Western contexts that overlook India’s diverse linguistic, cultural, and socio-economic landscape. DataHub KGP strives to fill this gap, delivering datasets that authentically capture local nuances across critical sectors.
Governance
Public policy datasets, citizen feedback analysis, and digital governance tools.
Education
Lecture videos, transcripts, and contextual question-answer pairs on core engineering subjects.
Law
Transcripts of court proceedings and case law.
Tourism
Tourist attraction datasets, visitor demographics, and travel behavior analytics.
Agriculture
Crop yield history, UAV-captured field imagery, and insect infestation sounds.
Low-Resource Languages
Speech and text data covering marginalized Indian languages such as Santhali, Garo, and Khasi.
Renewable Energy
Solar irradiation and NO2 measurements related to photovoltaic cells.
Responsible AI
Ethical AI datasets, bias detection models, and explainable AI frameworks.
Smart City
Urban mobility data, smart grid analytics, and IoT sensor datasets for city planning.
DataHub in Numbers
No. of Datasets
States Covered
Districts Covered
Households Covered
