Module 2: Using Open Data and Open Tools – Part 1


Analyzing open data and sharing findings online require digital tools, and many such tools are freely available for just about any application you can think of. Open scholarship can also include developing open source tools for other researchers to use.

Many of these open tools depend on open source code, another critical element of the Open movement. According to the Open Source Initiative, open source code is similar to open data in that it can be reused and redistributed for free, must be available in a form that is easy to use and modify, and cannot restrict who can use it or for what purpose. Its license should also be technology-neutral, meaning it cannot require the code to be used with a particular technology.

The video in this module discusses a few widely used open tools and shows an example of how they can be used in a research context. It describes how to use Python and a Python library called Beautiful Soup, all within a Jupyter Notebook, to parse, or analyze, XML documents created in Visual Studio Code (a free code editor built on open source code).
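The video uses Beautiful Soup; the minimal sketch below instead uses Python's built-in xml.etree.ElementTree module so that it runs without installing anything. It writes a small, made-up XML letter to a temporary file (the file name letter.xml and the tags sender, recipient, date, and body are hypothetical, standing in for a document you might write in a text editor), then parses the file and pulls out the text of each element.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# A small, made-up XML document standing in for a file you might
# write by hand in a text editor such as VS Code
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<letter>
  <sender>Jane Doe</sender>
  <recipient>John Roe</recipient>
  <date>1901-05-14</date>
  <body>Thank you for the books you sent last month.</body>
</letter>
"""

def parse_letter(path):
    """Parse an XML letter file and return its fields as a dict."""
    tree = ET.parse(path)   # read and parse the whole file
    root = tree.getroot()   # the outermost <letter> element
    # findtext() returns the text of the first matching child, or None
    return {field: root.findtext(field)
            for field in ("sender", "recipient", "date", "body")}

# Write the sample to a temporary file, then parse it back,
# much as you might in one cell of a Jupyter Notebook
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "letter.xml"  # hypothetical file name
    path.write_text(SAMPLE_XML, encoding="utf-8")
    letter = parse_letter(str(path))

print(letter["sender"], "wrote to", letter["recipient"], "on", letter["date"])
```

Beautiful Soup's find and find_all methods play a similar role to findtext here; its documentation covers the details, including the separate parser it needs for XML.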


Python is one of the most popular programming languages in the world and is used for everything from web development to game development to research and education. It is widely used among humanities researchers because it is open source, readable, and flexible, and because it has a strong community of practice. Another advantage of Python is its many libraries and packages (collections of functions for completing particular tasks), such as the Natural Language Toolkit (NLTK) for linguistic and textual analysis.
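As a small illustration of what a library provides, the sketch below uses Counter from Python's standard collections module to count word frequencies in a short passage. It stands in for the richer tools of a package such as NLTK, which has to be installed separately; the sample sentence and the simple tokenizing rule are assumptions made for illustration only.

```python
import re
from collections import Counter

# A short, made-up passage used only for illustration
TEXT = "The sea was calm. The sky was grey, and the sea was quiet."

def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    # A deliberately simple tokenizer: runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

freqs = word_frequencies(TEXT)
print(freqs.most_common(3))
```

A library such as NLTK offers more careful tokenizers and many other functions, but the pattern is the same: import the library, then call its functions on your text.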



Jupyter Notebook is an open source platform that lets you keep different types of information together in one document, including code, text, notes and documentation, data, and visualizations. Notebooks are shareable and interactive: readers can run the code themselves and see the results next to the explanation, which allows researchers to tell stories about their work.



XML (eXtensible Markup Language) is a tool for marking the structural features of a text. It is called a descriptive markup language because it describes the text it marks up rather than indicating how that text should be displayed, as HTML does, for example.

Because XML is extensible (you can define your own tags to suit your material) and stores data as plain text, it is highly interoperable. This means it can be exchanged between computer systems with different hardware and software, and it can be read by both humans and machines.
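To see why descriptive markup is useful, the sketch below marks up one made-up sentence with tags that say what each phrase is (a person, a place) rather than how it should look. Because the tags carry meaning, a few lines of Python can list every person and place mentioned, and the plain text can still be recovered. The tag names persName and placeName are borrowed from the Text Encoding Initiative (TEI) conventions purely for illustration; any names you define would work, which is what makes XML extensible.

```python
import xml.etree.ElementTree as ET

# A made-up sentence with descriptive tags: they record what each
# phrase is, not how it should be displayed
SENTENCE = """<p>
  <persName>Mary</persName> wrote to <persName>Thomas</persName>
  from <placeName>Halifax</placeName> before sailing to
  <placeName>Liverpool</placeName>.
</p>"""

root = ET.fromstring(SENTENCE)

# iter() walks every element with the given tag name, in document order
people = [el.text for el in root.iter("persName")]
places = [el.text for el in root.iter("placeName")]

print("People:", people)
print("Places:", places)

# itertext() yields every piece of text with the tags removed;
# collapsing whitespace gives back the human-readable sentence
plain = " ".join("".join(root.itertext()).split())
print(plain)
```

If the names and places had only been marked for display, say in italics, a program could not tell them apart.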

Video: Parsing XML with Python

By Luis Meneses

The files Luis refers to in this video are available to download and use in your own Jupyter Notebook. Contact Luis:


Think about (and discuss, if you are completing this module with others) the tools you use in your research. Are any of them open source? What are some advantages and disadvantages of using open source tools in your research?

Think about some potential applications of the tools discussed in the video. What kind of research questions could you answer using these tools?