Module 3: Using Open Data and Open Tools – Part 2
Introduction
Open scholarship and open knowledge are only possible when information, in the form of data, is available to everyone. Data can refer to numerical information like statistics or to textual information like social media posts and survey results, as well as the metadata describing that information. According to the Open Data Institute, data becomes open when it is licensed in a way that makes it freely available to anyone to use, study, and share. Licences may require attribution, which means giving credit to the data’s publishers when you use it. They may also require that publications based on the data be share-alike, meaning that they too are open and freely available.
Why Open Data
We live in the age of big data, but much of that data is locked behind paywalls or otherwise inaccessible to researchers and anyone else interested in accessing and using it, including policy-makers and the public.
Many discussions of open data refer to government data because it is used to inform public policy and is public by law, yet it is not always easy for citizens to access. Open data is important in many other contexts as well, including scientific research, where it allows researchers to replicate studies and verify findings while reducing duplication of work. Making data of all kinds open also creates new ways of understanding and using the data, and therefore leads to innovations and discoveries.
Using Open Data
This video continues the discussion from Module 2 and shows an example of how open data can be used for textual analysis. It walks through the process of encoding a plain text file in XML using the lxml library for Python.
Video: Writing XML with Python
By Luis Meneses
The files Luis refers to in this video are available to download and use in your own Jupyter Notebook. Contact Luis: luis.meneses@viu.ca
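To give a sense of what encoding plain text in XML can look like, here is a minimal sketch. It is not the code from the video: the element names (text, p, l), the title attribute, and the encode_plain_text function are assumptions made for illustration, and the sample text is invented. The sketch uses lxml when it is installed and otherwise falls back to Python's built-in xml.etree.ElementTree, which provides the same calls used here.

```python
# Sketch only: the element names (<text>, <p>, <l>) are assumptions,
# not the schema used in the video.
try:
    from lxml import etree  # the library named in this module
except ImportError:
    # Fallback (an assumption for portability): the standard library
    # offers the same Element/SubElement/tostring calls used below.
    import xml.etree.ElementTree as etree


def encode_plain_text(plain_text, title):
    """Wrap blank-line-separated paragraphs of plain text in XML elements."""
    root = etree.Element("text")
    root.set("title", title)
    # Paragraphs are assumed to be separated by one blank line.
    blocks = plain_text.strip().split("\n\n")
    for number, block in enumerate(blocks, start=1):
        paragraph = etree.SubElement(root, "p")
        paragraph.set("n", str(number))
        # Each line of the paragraph becomes its own <l> element.
        for line in block.splitlines():
            line_element = etree.SubElement(paragraph, "l")
            line_element.text = line.strip()
    return root


# Hypothetical sample text standing in for a downloaded plain text file.
sample = """First line of a poem
Second line of a poem

A new stanza begins here"""

root = encode_plain_text(sample, "Sample")
xml_string = etree.tostring(root, encoding="unicode")
print(xml_string)
```

In a Jupyter Notebook, you could replace the sample string with the contents of one of your own text files. Characters such as & and < in the text are escaped automatically when the XML is written out.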
Activity
Open data is available on many topics and from many sources. Browse the resources below to find open data that’s relevant to your research:
- Wikidata: data from all Wikimedia Foundation projects, including Wikipedia
- Google Dataset Search: a search tool for finding datasets
- Dataverse: a collection of open-source repositories for scholarly data
- Government of Canada Open Data: data collected by the Canadian government
- Internet Archive eBooks and Texts (includes Project Gutenberg): text and literature
- The Art Institute of Chicago Collection: visual art images