The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science. Beijing June 19-22 2016
The 1st International Conference on Data-driven Knowledge Discovery: When Data Science Meets Information Science took place at the National Science Library, Chinese Academy of Sciences in Beijing from 19 June till 21 June 2016. The conference was opened by Director Huang of the NSL/CAS, who placed the event within the goals of the Library and lauded the spirit of international collaboration in the area of data science and knowledge discovery. The whole event was an encouraging success, with over 370 registered participants and highly enlightening presentations. The conference was organized by the Journal of Data and Information Science (JDIS) to bring the journal to the attention of an international and local audience.
As it is impossible to discuss each presentation one by one, we just mention some highlights and general trends.
Over the decades computers have become more and more powerful, and as such they have had a huge influence on research: they have made investigations possible that in the past were impossible or nearly so. Yet, users (scientists, the general public) still have many demands. Examples of questions that are hard to answer nowadays include: automatically collecting the method(s) used in a study, detecting the main finding(s) (assuming there is one) of a study, identifying implicit information, automatically constructing training sets for automated learning, finding all capitalized words in a text, detecting the structure of papers and automatic function recognition. Merging heterogeneous data was also mentioned, but here a possible answer was provided, in the sense that using ontologies may be a step towards a solution.
Big data and data science
What is data science? One possible answer provided by a speaker is: a collection of computational techniques and decision-making approaches applicable to massive amounts of data. It was stated that Big Data innovated the approach to state governance. It should be mentioned, though, that data for the future are, by definition, not available.
Other colleagues observed that Big Data encompass among other things: human activities, mobility and shifts of research interests. The availability of massive data has further led to new methods for data governance and new techniques for decision making.
Data driven applications
The conference provided several examples of practical use of big data: developing agricultural sciences; applications in bio-informatics; and data management for precision medicine being the most striking ones.
Other colleagues discussed data-driven decision making, while data facilitating research, for instance in data-intensive scientific discovery, received quite some attention. In terms of storage of big (and other) data, a call was made to create a “commons” such as the NCI Genomic Data Commons.
When studying data driven applications one should distinguish between inputs, activities, outputs and outcomes. Among the outcomes the following were mentioned: skilled employees, social change, health benefits, policy papers, ecological benefits, and influence on legislation.
If big data are to be used efficiently, cooperation rather than competition should be the main approach to science.
Metrics – the structure of science
Among the metrics-related presentations we mention the ‘eternal’ question of field delineation (what is a field?), and the problem of how to collect relevant data in or about a field. Other aspects discussed during the conference were: labelling and updating clusters; combining the macro, meso and micro level; and describing the structure of science. It was stated that when trying to find the structure of science this should be based on what researchers actually do, not on a theoretical framework. Yet, others emphasized the role of theory.
Only domain experts can answer the question: “How to use available data?” Likewise, in all practical applications, e.g. when dealing with data related to research evaluation and derived metrics, these data must be interpreted by experts. Colleagues from CWTS (Leiden, the Netherlands) presented VOSviewer and CitNetExplorer, two software tools developed for network and data analysis. An example of the use of these tools in the study of the biomarker HER2 was provided by another participant.
A presentation on research fronts included the essential question: What is a significant research front? It was suggested that a significant research front is a hot, fast-moving research front with high growth potential. Presenters also paid attention to the issue of modelling.
In a typical example of citation analysis it was noted that only one quarter of all citations are essential. Essential citations are often re-citations in the same publication. Bridging the relation between metrics and social implications (see further), the question “Does more team work lead to more retractions?” was given as food for thought to the audience. Combining different aspects, a participant introduced the notion of convergence. Here convergence is meant in the sense of the coming together of insights and approaches from originally distinct fields. This led to a discussion of the question: what is the relationship between “data driven” discovery and convergence?
Points made about digital innovation include:
Queries can result in too much or too little information;
How do we find the origin of ideas or of disciplines?
The path from digital library to digital librarian;
Smart cities (characterized, among other things, by many digital innovations);
Going from systems to cloud services;
Can digital innovations help to battle data (information?) overload? On this point it was suggested that constructing representative subsets may be an important step forward.
Other contributors discussed the assessment of indicators for the economic impact of universities, science-industry linkages in patents, and various types of non-patent linkages besides scientific articles, such as books, handbooks, webpages and reports. The following topics also received due attention: Tech Mining, data mining, foresight and technology roadmapping.
An interesting observation made in this context was the difference between a professional referent (the patent examiner) and a non-professional one (the scientist drawing up a reference list for an article).
When studying networks, a multi-layer approach should in many cases be preferred. One should, moreover, include indirect connections, such as those provided by neighbors of neighbors.
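The idea of including neighbors of neighbors can be illustrated with a minimal sketch. It is not taken from any presentation at the conference: the function name `neighbors_within`, the adjacency-list representation and the small co-authorship network are all hypothetical, chosen only to show how a breadth-first search collects direct and indirect connections up to a given number of hops.

```python
from collections import deque


def neighbors_within(adjacency, source, max_hops):
    """Return {node: hop distance} for all nodes reachable from
    `source` in at most `max_hops` steps (the source itself is excluded)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        # Do not expand nodes that already sit at the hop limit.
        if distances[current] == max_hops:
            continue
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    del distances[source]
    return distances


# Hypothetical undirected co-authorship network (illustration only).
coauthors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}

direct = neighbors_within(coauthors, "A", 1)   # direct neighbors only
two_hop = neighbors_within(coauthors, "A", 2)  # adds neighbors of neighbors
```

In this toy network, author D is not a direct collaborator of A but becomes visible once the search is extended to two hops, which is the kind of indirect connection the paragraph above refers to.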
Another contribution showed how to find key nodes in collaborative networks using amplitude and intensity.
Next to informetrics, bibliometrics, and altmetrics, attention was given to the notion of entitymetrics, referring to the measurement of the impact of knowledge units. Such entities are embedded units in knowledge databases. Network features of such entities may lead to the detection of outstanding interactions between these entities.
A further question raised was: how did scientists move from the Semantic Web to the Knowledge Web?
Finally, it was correctly observed that, when it comes to practical social networks and social media, there are often no costs associated with use and non-use. So how can brand loyalty be enhanced? The example of KAIXIN001 provided a case in China where, obviously, brand loyalty was very low.
Information and data operate among and between humans. Hence they involve social aspects of different kinds. Among these the following were explicitly mentioned:
Social aspects vs. technical aspects: knowledge science vs. knowledge engineers;
Any technology has some weakness, which can be exploited by those looking for it (cyber criminals);
It was stated that “Everyone you meet knows something you do not know”, illustrating the importance of life-long learning;
Data is not (just) a road to commercialization;
Importance of data science to society;
The network society is a society of data: big data and big noises;
Data handling involves ethical issues, e.g. for those dealing with medical and other sensitive data;
Retractions vs. integrity.
In short, participants of the conference emphasized the convergence of data science, computer science, and information science, enabling data-driven knowledge discovery to support research, learning, governance, and social and economic development in a big data environment. Yet, they also placed this in the context of future libraries.