This post is the corresponding write-up for a WDSS project in which a pair of society members collaborated to produce a web app for visualizing university league table data. You can view the final product at this link.
The power of data manifests itself not only in accessibility, but in usability. Without the ability to present trends and patterns in an understandable form, we cannot answer the questions we care about, and so the data provides less value than it could.
University ranking data is a case in point. Rankings vary considerably from year to year; take, for example, the overall ranking of the University of Warwick, which has ranged from 8th to 11th over the last five years. This change over time is often overlooked when choosing a university, even though it is likely to have a significant impact on how one’s degree is viewed in the future.
The reason for this oversight is that all the most popular ranking websites display only the current year’s data by default. Any long-term comparison has to be done by manually accessing data for different years, requiring considerable time and effort. This led us to question whether we could collate this data and present it in a more usable form, capable of displaying these long-term trends.
This project demonstrates the usefulness of web scraping in obtaining larger datasets for comparative purposes than we would be able to otherwise. It also highlights the importance of effective visualizations in data science and how they make data more interpretable and accessible. Two members of WDSS collaborated on this project: Tim Hargreaves, who developed the app and scraped the ranking data, and I (Janique Krasnowska), who performed the initial data exploration and communicated the findings.
The source code for the web app and the scraping scripts has been open-sourced in this repository.
The entire project was developed in R. We began by scraping data from the Complete University Guide using the rvest package. This gave us data for 187 universities and 70 courses over the past 13 years: 45,580 individual observations in total. Needless to say, trying to squeeze the insights from this much data into a single graph would be counterproductive. Instead, we opted for an interactive web app that lets users choose the universities and courses they are interested in.
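As a rough illustration of the scraping step, the sketch below uses rvest to pull a ranking table out of HTML. To keep it runnable offline, the page is a small inline document built with minimal_html(); its table id, column names, and values are made-up placeholders rather than the Complete University Guide's real markup or data, and the project's actual scripts in the repository may be organised quite differently.

```r
library(rvest)

# Hypothetical stand-in for one year's league table page.
# Real pages will use different markup; inspect them before choosing selectors.
page <- minimal_html('
  <table id="league-table">
    <tr><th>Rank</th><th>University</th><th>Score</th></tr>
    <tr><td>1</td><td>University A</td><td>1000</td></tr>
    <tr><td>2</td><td>University B</td><td>987</td></tr>
  </table>
')

# Select the table node with a CSS selector and convert it to a data frame.
rankings <- page %>%
  html_element("#league-table") %>%
  html_table()

# Tag the rows with the (placeholder) year they belong to, so that
# tables for several years can later be stacked into one dataset.
rankings$Year <- 2020L

print(rankings)
```

For a live site, one would swap minimal_html() for read_html() on each year's URL and stack the resulting tables, taking care to respect the site's terms of use and to pause between requests.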
There are two types of comparisons a user can make: the same course at multiple universities or multiple courses at the same university. More complex comparisons would also be possible with more complicated code, but one might question the usefulness of comparing Archaeology at Oxford with Economics at Warwick, for example.
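The two comparison types amount to two different filters on the same long-format table. The sketch below shows one way this could look in base R; the column names, helper functions, and values are hypothetical and chosen only for illustration, not taken from the app's source.

```r
# Placeholder long-format data: one row per university, course and year.
rankings <- data.frame(
  university = c("Uni A", "Uni A", "Uni B", "Uni B"),
  course     = c("Mathematics", "Economics", "Mathematics", "Economics"),
  year       = c(2020, 2020, 2020, 2020),
  rank       = c(3, 5, 4, 2)
)

# Comparison 1: the same course at multiple universities.
compare_universities <- function(data, course_name, universities) {
  data[data$course == course_name & data$university %in% universities, ]
}

# Comparison 2: multiple courses at the same university.
compare_courses <- function(data, university_name, courses) {
  data[data$university == university_name & data$course %in% courses, ]
}

by_uni    <- compare_universities(rankings, "Mathematics", c("Uni A", "Uni B"))
by_course <- compare_courses(rankings, "Uni A", c("Mathematics", "Economics"))

print(by_uni)
print(by_course)
```

Fixing one dimension and varying the other keeps every line on the resulting plot directly comparable, which is what makes these two views the natural choices.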
To make discovering trends easier, we included a LOESS smoothing feature in the app. LOESS is a non-parametric method that fits a weighted regression to the points in a moving window around each observation, building up a smooth curve of best fit. It helps us to filter out noise and focus instead on general trends. For example, we can see that in recent years Warwick’s mathematical prowess has been declining while Durham has overtaken it. By choosing to show extra features when hovering, we can see that Durham now outperforms Warwick across all the Complete University Guide metrics (entry standards, student satisfaction, research intensity, and graduate prospects).
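To give a feel for what the smoothing does, here is a minimal sketch using the loess() function that ships with base R. The series is synthetic (a gentle trend plus random noise), and the span value is simply R's default written out explicitly; the app's own smoothing settings may well differ.

```r
# Synthetic ranks for one university-course pair over 13 years (not real data).
set.seed(1)
years  <- 2008:2020
ranks  <- 10 - 0.3 * (years - 2008) + rnorm(length(years), sd = 1)
series <- data.frame(year = years, rank = ranks)

# Fit a local regression. span sets the fraction of the data used in
# each local fit: larger values give a smoother, less wiggly curve.
fit <- loess(rank ~ year, data = series, span = 0.75)

# Smoothed values at the observed years.
series$smoothed <- predict(fit)

print(series)
```

Plotting the smoothed column alongside the raw ranks shows the underlying drift without the year-to-year jitter, which is the effect the smoothing toggle in the app is after.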
This small coding project demonstrates some of the core values of data science: obtaining data of interest programmatically and presenting the resulting insights in a way that is accessible to a lay person. These skills can be applied to any problem where data is poorly stored or spread across multiple sources and has to be transformed to answer questions from people without a coding background.
The final product wouldn’t have been possible without WDSS resources, including its Shiny server and blogging platform.
Thank you for reading. We hope you find other interesting trends in higher education rankings with our app.