Data-Driven Analytics for Exploring PREDICT Network Datasets and Threat Sharing
Datasets are an essential part of most practical research in any scientific area, and improving access to them has a large impact on research quality. PREDICT, a major dataset repository in the network security area, provides developers and evaluators with regularly updated network operations data relevant to cyber defense technology development, with more than 397 datasets totaling 350 TB. While PREDICT provides some metadata as a basic catalog for exploring these datasets, finding the appropriate data can be extremely costly, given that the repository includes hundreds of datasets. This can reduce the usability of PREDICT and limit community participation in sharing datasets and experiences. Hence, the goal of this project is to improve the usability of exploring the PREDICT repository by focusing on metadata extraction, classification, and visualization. Our objectives are to (1) investigate extending the existing metadata and extracting new metadata (keywords) by mining the project description and other available information and (2) create an interactive logical or visual analytic engine to explore and analyze the metadata of large datasets through usable interfaces. The ultimate goal is to offer an automated and highly usable environment for security domain repositories. The analysis will be based on domain-specific data mining of the datasets and their metadata, and on correlating such data with other sources.
Data Driven Intelligence Analytics for Security Data Repositories
Data Driven Intelligence Analytics Presentation