Why Wikidata? Over the past three weeks, I’ve had the pleasure of participating in the November Wikidata Institute, a three-week online course designed to introduce participants to Wikidata. In preparing to blog about my experience (Q164359), I struggled with how to clearly communicate all that I’ve gained from working with Wikidata, and why I sought the financial aid generously provided by Wiki Education to pursue the course. The importance and overwhelming potential of an open repository of linked data on nearly everything in the universe (Q1) was already fairly clear to me. My goals for the course centered on learning how best to communicate my enthusiasm for Wikidata and the reasons behind it, in hopes of integrating Wikidata projects into my library work alongside colleagues from around the world engaged in similar efforts.
For those not immediately excited by the idea of a unique identifier for everything, from planet Earth (Q2) to Grace Hoadley Dodge (Q5591212), this article by Tom Simonite in Wired eloquently outlines the Wikidata behind Wikipedia and Amazon’s Alexa services, and provides a fun introduction. A blog post written by Will Kent, instructor for our Wikidata Institute cohort, notes that “you may not know it yet, but Wikidata is very important to you,” before detailing the different ways Wikidata impacts research, libraries, and society while drawing parallels to the global impact of Wikipedia and other Wikimedia initiatives.
What is Wikidata? Wikidata is an open, multilingual structured knowledge base that can be read and edited by both humans (Q5) and machines (Q11012). In fact, you can download the entire database here. All of the content in Wikidata is in the public domain under a CC0 license, which means that you may freely use, share, and remix data. Wikidata is multilingual, and data can be entered in any language. The relationships between statements in the knowledge base make it possible to search for information in a structured way. Every item is known by its Q-number as a unique identifier, and labeled in different languages and variations. Every statement consists of a three-part structure that connects an item, property, and value.
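To make the three-part structure concrete, here is a minimal Python sketch of a single statement, using identifiers mentioned in this post plus P31 ("instance of"). The `Statement` class, the `LABELS` table, and the `describe` helper are illustrative names of my own, not part of any Wikidata library, and real Wikidata values can also be strings, dates, or quantities rather than only other items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One Wikidata-style statement: item -> property -> value."""
    item: str   # Q-number of the item being described
    prop: str   # P-number of the property
    value: str  # here another Q-number (an assumption made for simplicity)


# Hypothetical, hand-typed label table; Wikidata itself stores labels per language.
LABELS = {
    "Q5591212": {"en": "Grace Hoadley Dodge"},
    "Q5": {"en": "human"},
    "P31": {"en": "instance of"},
}


def describe(statement: Statement, lang: str = "en") -> str:
    """Render a statement with labels, falling back to the raw identifier."""
    def label(identifier: str) -> str:
        return LABELS.get(identifier, {}).get(lang, identifier)

    parts = (statement.item, statement.prop, statement.value)
    return " -> ".join(label(p) for p in parts)


example = Statement(item="Q5591212", prop="P31", value="Q5")
print(describe(example))             # labels in English
print(describe(example, lang="fr"))  # no French labels in the table, so IDs are shown
```

The fallback to raw identifiers shows why Q-numbers matter: the statement stays the same no matter which language's labels happen to be available.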
Wikidata & Libraries In the OCLC report Transitioning to the Next Generation of Metadata, Karen Smith-Yoshimura outlines six years of research forecasting metadata practices, stemming from the OCLC Metadata Managers Focus Group. In the introduction, Smith-Yoshimura notes that “format-specific metadata management based on curated text strings in bibliographic records understood only by library systems is nearing obsolescence, both conceptually and technically”. In libraries, Wikidata has opened up new ways of unlocking the MARC (Q722609) record and bringing library collections out of traditional library systems.
Smith-Yoshimura goes on to explain that transitioning to identity management, and away from authority control, “poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them”. Using Wikidata for unique identifiers opens up more possibilities for items not represented by authority files, and for embedding multilingual labels. The Program for Cooperative Cataloging (PCC) Wikidata Pilot Project aims to “gather together, organize, and expand resources for librarians interested in editing Wikidata, and to provide a space to develop shared data models and best practices” for “publishing, linking, and enriching library linked data”. I plan on continuing to follow their work to learn more about how these librarians are using Wikidata with their collections.
Weaving together lessons learned at this year’s DLF Forum and thinking about library instruction, I see Wikidata as an easy introduction to linked data and an excellent resource for teaching data justice. Metadata Librarian Karly Wildenhaus, Cataloging Librarian Sarah Hammerman, and David Hecht of the Cybernetics Library recently hosted a Wikidata Meetup aimed at adding and editing Wikidata items for women, trans, and non-binary people in computing. Wikidata, like most databases, contains disparities in race, sexuality, and gender representation. Classification systems, like libraries, are not neutral, and carry the political, ethical, and societal values of their creators. In a keynote address for WikidataCon 2019, Os Keyes discusses how Wikidata “risks further enabling already-damaging systems of power”. Contributing to Wikidata, and teaching others how to contribute, can help address these disparities.
Wikidata & Research Projects like WikiCite and Scholia offer open citations, linked bibliographic data, and scholarly profiles, all built with Wikidata and by the Wikidata community. Ruth Kitchin Tillman, Cataloging Systems and Linked Data Strategist, created a workshop for cataloging librarians on how to add faculty to Wikidata. When faculty are included in Wikidata, their items can aggregate identifiers like ORCID, Google Scholar, VIAF, and Library of Congress, while enhancing Google search results and Wikipedia infoboxes. Having research and publication data in Wikidata allows for enhanced visibility on Scholia, with researcher and Wikidata founder Denny Vrandečić linked here as an example. Because WikiCite and Scholia do not require a subscription or membership, these tools build new connections across people and research.
The research community has developed unique project pages with data models and sample SPARQL queries. A strong example is the Wikidata WikiProject on COVID-19. On the day that I started writing this post, COVID-19 deaths surpassed 250,000 in the United States. The Wikidata item for the COVID-19 pandemic in the United States (Q83873577) did not yet include this figure, so I added it with a reference URL. I also added the number of recovered cases, which had reached 4,294,743. The group has developed a SARS-CoV-2-Queries book, written by Addshore, Daniel Mietchen, and Egon Willighagen, in response to a tweet by Dr. Maulik Kamdar, Senior Data Scientist at Elsevier Health, who asked about SPARQL queries for retrieving information on pandemics from Wikidata. If you ever wanted to query Wikidata for all genes and proteins of all SARSr viruses, you could use this query. You can also identify vaccines in development using this query.
What’s next? During the Wikidata Institute, I’ve been able to practice working with different data models and get a feel for how I can best contribute to the community and weave Wikidata into Gottesman Libraries' (Q85852060) workflows. After our system migration, integrating URIs into our catalog records will be an essential part of rehabilitating our bibliographic data. Adding pilot collections to Wikidata is also an interesting way to increase the discoverability of select collections, and potentially to connect our collections with related collections elsewhere. I’d like to consider adding faculty profiles to Wikidata for better discoverability of faculty research in Scholia and WikiCite, and for increasing representation of Teachers College’s scholarly output.
A major Wikidata ah-ha moment arrived after conducting a simple Wikidata query for women educated at Teachers College (Q7691246). I was happy to find Shirley Chisholm. Because of my previous work at Boys and Girls High School with MyLibraryNYC, I knew that Chisholm also attended Girls’ High School, which eventually merged into Boys and Girls High School in the 1970s. Her portrait is included in a collage of notable alumni proudly hanging in the opening foyer to Boys and Girls. In navigating to the Wikidata item for Chisholm, I noticed that Girls’ High School was not included as a value for the educated at property (P69). I added this value with a reference, and noticed how sparse the information for Girls’ High School was in Wikidata. This led me to add 20 references to the item for Girls’ High School (Q5564567), and start thinking about the visibility of New York City Public Schools (Q408230) in Wikidata.
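A query along these lines can be sketched in Python. This is not the exact query I ran, only a reconstruction under stated assumptions: it treats "women" as items with sex or gender (P21) set to female (Q6581072), which misses people recorded with other gender values, and the function names `build_alumnae_query` and `build_request_url` are hypothetical helpers. The code only builds the SPARQL text and a request URL for the public Wikidata Query Service endpoint; it does not send anything over the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_alumnae_query(school_qid: str, limit: int = 50) -> str:
    """Build a SPARQL query for women educated at the given school item."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;            # instance of: human
          wdt:P21 wd:Q6581072 ;      # sex or gender: female (an assumption, see above)
          wdt:P69 wd:{school_qid} .  # educated at: the given school
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def build_request_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Teachers College is Q7691246, as cited in this post.
query = build_alumnae_query("Q7691246")
print(query)
print(build_request_url(query))
```

Pasting the printed query into the Wikidata Query Service editor is the simplest way to try it, and swapping in another school's Q-number, such as Q5564567 for Girls’ High School, reuses the same pattern.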
As a result of these explorations, I created a data model in a tool called Cradle for creating new items for schools in Wikidata. I’ve also started cleaning NYC Open Data sets for batch uploading of public schools to Wikidata, and am excited by the possibilities. You can check in on my progress here. A recurring question that stemmed from this three-week intensive is what will happen to K-12 school libraries in the shift to linked data. A handful of queries suggested that few, if any, New York City elementary or middle schools are in Wikidata, though some notable high schools are included. What can Wikidata offer us in terms of tracking K-12 school mergers and closures? It was through Wikidata that I learned that the high school my cousins attended in Chicago closed this year due to a lack of funding amid the coronavirus pandemic. Will K-12 schools be left behind, or eclipsed by higher education’s presence in Wikidata? How can librarians help, and what would it look like for middle and high school students to edit Wikidata for their own schools and communities? Rivka Genesen’s blog post on her experiences as a middle and high school teacher using Wikidata offers some helpful insights. What would Wikidata professional development sessions for K-12 teachers look like? What other city portal data can be integrated into Wikidata? I’m interested in finding out.