The European Commission has just released the final report of a two-year study, undertaken by Science-Metrix, on applying data mining techniques to inform policy. A primary aim of the study was to develop and apply a methodological framework for conducting data mining projects—with particular focus on key research & innovation policy issues for the private sector—and report findings and recommendations back to the Commission’s Directorate-General for Research and Innovation to guide their future use of data mining approaches. Not only did the study develop a practical tool for designing and implementing data mining projects, it also returned some surprising findings.
To coincide with the release of the report, we’ve been rolling out a series of blog posts to share lessons learned and encourage discussion on this highly topical subject (see details below). To give just a teaser of the extensive study results, the case study findings put several assumptions underlying contemporary R&I policy to the test and found them wanting for quantitative evidence, including the assumptions that research that crosses disciplinary and sectoral boundaries is more likely to lead to innovation, and that innovation drives economic competitiveness and employment. The implications of these findings underpinned the first recommendation of the study: that the reproducibility of the policy-relevant case study findings be tested. In light of the novelty of data mining projects and the field in general, the second recommendation responded to the need to embrace “failed” data mining projects as learning experiences, and the third recommendation called for adopting the report’s proposed two-phase framework to help mitigate the risk of project failure. The fourth recommendation urged support for creating and maintaining an inclusive community of practice for data mining within the field of SciSIP (the science of science and innovation policy).
In practical terms, the study entailed first documenting the uses, best practices, benefits and limitations of data mining and big data in the private and public sectors. A framework was then developed to guide the design and implementation of data mining and big data projects in the R&I policy context. An expert workshop was conducted to validate the framework, and suggested improvements were implemented before the framework was applied to six case studies for testing. After further iterative development, the study team presented the case study findings, four recommendations and the final two-phase framework to a second expert workshop for validation.
The final report and its accompanying application report, which details the processes and results of the six case studies, are substantial in both length and depth. To complement them, we are releasing a series of accessible and granular blog posts on ScienceMetrics.org. The series will document the chronological development of the study, including limitations and issues encountered. Each post will discuss a key finding or practical consideration in developing and applying the study’s data mining framework, building a valuable resource for others working in this field. The study’s principal authors, David Campbell, Chantale Tippett and Brooke Struck, will each present various aspects of the project, and they welcome feedback and discussion via the blog’s comments section.
To read the first blog post in the series, please visit ScienceMetrics.org.
Data Mining. Knowledge and technology flows in priority domains within the private sector and between the public and private sectors. (2017). Prepared by Science-Metrix for the European Commission. ISBN 978-92-79-68029-8; DOI 10.2777/089