
Jun 18 2014 - 08:00 PM
Wikipedia And The Library

Title: Towards Linking Libraries and Wikipedia: Automatic Subject Indexing of Library Records With Wikipedia Concepts (2014)

Authors: Arash Joorabchi & Abdulhussain E. Mahdi

Source: Journal of Information Science

Research Question: Is library-Wikipedia integration possible?

Study Design: When you have a question on a particular topic, where do you go to find the answer? Although libraries have historically been one of the top resources for information seekers, the rise of search engines like Google and information repositories like Wikipedia has pushed library resources to the back of most people's minds. Fewer than 1% of information searches start on a library website, while the large majority (84%) begin on search engines. Of the searches that begin on search engines, 62% start on Google, and 60% of informational queries return relevant Wikipedia articles in high-ranking positions. The authors of this study believe that integrating catalog records, like the ones on WorldCat, with Wikipedia articles could be an important step in ensuring the library’s continued use as a local, reliable information source.

To implement a mass library-Wikipedia integration, the authors proposed automatic subject indexing of library records with Wikipedia concepts. To test its feasibility, they used an open-source toolkit called Wikipedia-Miner to detect Wikipedia concepts, on a topic the researchers were familiar with, that also occur in the content of library metadata records. Next, they needed to distinguish the concepts that were "key" in terms of reflecting the core subject(s) of the item each record represents. They made this distinction using 15 statistical, positional, and semantic features, including position, frequency, length, lexical diversity, average link probability, maximum link probability, average disambiguation confidence, maximum disambiguation confidence, link-based relatedness to other concepts, link-based relatedness to context, category-based relatedness to other concepts, generality, in-links, and translation count. With these features defined, the authors manually tested a collection of library metadata records against the detected Wikipedia concepts. They randomly selected 100 records (on the same topic as the Wikipedia concepts) from the WorldCat-Million dataset and set to work.
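To make the feature idea concrete, here is a minimal sketch of how three of the simpler features (frequency, position, and length) might be computed for candidate concepts in a record. It is an illustration under stated assumptions, not the authors' implementation and not the Wikipedia-Miner API: the names `CandidateFeatures` and `simple_features`, the plain substring matching, and the sample record text are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CandidateFeatures:
    """Hypothetical container for three simple per-concept features."""
    concept: str
    frequency: int         # how many times the concept label occurs in the record
    first_position: float  # relative offset of the first occurrence (0.0 = start)
    length: int            # number of words in the concept label


def simple_features(record_text: str, labels: list[str]) -> list[CandidateFeatures]:
    """Compute frequency, position, and length for labels found in the text.

    Assumption: concepts are matched by case-insensitive substring search,
    which is far cruder than the detection a real toolkit performs.
    """
    text = record_text.lower()
    features = []
    for label in labels:
        needle = label.lower()
        count = text.count(needle)
        if count == 0:
            continue  # label does not occur, so it is not a candidate
        first = text.find(needle) / len(text)
        features.append(
            CandidateFeatures(label, count, first, len(needle.split()))
        )
    return features


# Made-up record text, used only to exercise the function.
record = "Machine learning for libraries: applying machine learning to subject indexing"
feats = simple_features(record, ["machine learning", "subject indexing", "physics"])
for f in feats:
    print(f)
```

In a fuller pipeline, each candidate's feature values would form one row of training data, with the manually assigned "key" or "non-key" label as the class to predict.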

Findings: After the 100 records were run through Wikipedia-Miner, a total of 1,762 candidate concepts were identified, with a minimum of 2 and a maximum of 69 per record. All the candidate concepts were manually examined and labeled "key" or "non-key" with respect to their corresponding records. In total, 469 concepts were deemed "key" and 1,293 "non-key". On average, each record in the dataset was indexed by 4.7 key Wikipedia concepts. The data were then stored in ARFF (Attribute-Relation File Format) files and imported into Weka, an open-source collection of machine learning algorithms geared toward data mining tasks, for further experiments.
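The reported counts can be checked with simple arithmetic, and the ARFF step can be sketched in a few lines. The snippet below is a rough illustration only: the helper `to_arff`, the relation name, the two attribute names, and the feature values in `rows` are placeholders invented for the example, not data from the study. Only the totals (469, 1,293, and 100 records) come from the summary above.

```python
# Totals reported in the study summary.
key_total = 469
non_key_total = 1293
records = 100

candidate_total = key_total + non_key_total  # total candidate concepts
avg_key_per_record = key_total / records     # average key concepts per record
print(candidate_total, round(avg_key_per_record, 1))


def to_arff(relation, numeric_attributes, rows):
    """Build a minimal ARFF document as a string.

    `rows` is a list of (feature_values, label) pairs, where label is
    "key" or "non-key". This is a hypothetical helper, not part of Weka.
    """
    lines = [f"@relation {relation}", ""]
    for name in numeric_attributes:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {key,non-key}")
    lines += ["", "@data"]
    for values, label in rows:
        lines.append(",".join(str(v) for v in values) + f",{label}")
    return "\n".join(lines) + "\n"


# Placeholder rows with made-up feature values, for illustration only.
rows = [
    ((2, 0.0), "key"),
    ((1, 0.78), "non-key"),
]
arff = to_arff("candidate_concepts", ["frequency", "first_position"], rows)
print(arff)
```

A file written this way has a header that declares each attribute, followed by an `@data` section with one comma-separated row per candidate concept, which is the layout Weka reads.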

Moving Forward: Integrating library metadata records with Wikipedia concepts could bring substantial benefits to libraries across the world. For those who use Wikipedia as an information source, the ability to automatically see and link to relevant materials at a local library would be highly valuable in their research. The results of this small experiment suggest that, with proper indexing and analysis, Wikipedia-library integration is possible. Further development of the concept analysis process is critical, as the current manual implementation would be extremely labor-intensive at scale. The authors eventually envision employing a Wikipedia bot to analyze and edit articles on a large scale.

Image: Giulia Forsythe, Flickr

By: Dana Haugh