July 5th, 2017, by RAW Graphs Team
Wikidata is a massive database of information encoded as linked open data. In our opinion, this is an incredible project that will surely have a strong influence on the field of data visualisation, and beyond. During the Wikicite Hackathon conference we had the chance to meet some of the great people behind the project, and right after the event we started experimenting with ways to visualise this data.
What do we mean by “a massive database of information encoded as linked open data”? Basically, each item represents a concept identified by a unique ID, such as “Earth” identified by “Q2”, “Human being” by “Q5” or “Nelson Mandela” by “Q8023”. Items are connected to each other through their properties: the item Q8023 “Nelson Mandela” is an instance of the item “Human being”, and its place of birth is “Mvezo”, which is part of “South Africa”, which is in turn connected to other items through other properties.
To put it in simple words, Wikidata is nothing but a huge network of connected items that we are able to query in order to extract data of different kinds and with different structures.
Wikidata provides a graphical interface for running queries written in the SPARQL language, which is rather peculiar. Although searching for something with this language is not exactly easy, just keep in mind the basic concept introduced above: items on Wikidata are linked by properties.
subject → property → object
Let’s look at an example:
Nelson Mandela → place of birth → Mvezo
(subject → property → object)
In this case, two items (the person Nelson Mandela on the left and the place Mvezo on the right) are linked by the property “place of birth”.
Furthermore, the SPARQL language lets you turn parts of such a statement into variables, for example:
?people → place of birth → Mvezo
(variable → property → object)
Do you notice the word “people” preceded by the question mark? This is a variable that Wikidata will “fill in” with the values it finds. With this pattern we can ask for any item in Wikidata that was born in the village of Mvezo, and Nelson Mandela seems to be the only one born there, at least according to the data available.
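As a minimal sketch of how this pattern might look in actual SPARQL, the Python snippet below builds the query text. It assumes the Wikidata property ID P19 for “place of birth” and the wd:/wdt: prefixes and label service offered by the Wikidata Query Service. The helper name born_in_query is our own invention, and the ID of the place has to be looked up on Wikidata first (we do not hard-code the ID of Mvezo here).

```python
def born_in_query(place_qid):
    """Build a SPARQL query listing every item whose place of birth is `place_qid`.

    `place_qid` is a Wikidata item ID such as "Q490"; look it up on Wikidata first.
    """
    if not (place_qid.startswith("Q") and place_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {place_qid!r}")
    return (
        "SELECT ?people ?peopleLabel WHERE {\n"
        # ?people is the variable; P19 ("place of birth") and the place are fixed.
        f"  ?people wdt:P19 wd:{place_qid} .\n"
        # Ask the query service for English labels next to the raw IDs.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


# Usage: print(born_in_query("Q490"))  # Q490 is the item ID of Milan
```

The resulting text can be pasted into the query interface as it is.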
On Wikidata you can also formulate more complex queries, asking for a list of items that depends on several properties of your choice, such as all the politicians born in Milan, together with their date of birth. Something like this:
?person → place of birth → Milan
?person → occupation → politician
?person → date of birth → ?date
Again, look at the query to see the data.
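For readers who prefer to run such a query from a script, here is a rough sketch in Python. It assumes the IDs Q490 (Milan), Q82955 (politician), P19 (place of birth), P106 (occupation) and P569 (date of birth), and the public endpoint https://query.wikidata.org/sparql. The function names and the User-Agent string are placeholders of ours, and the query is not necessarily identical to the one linked above.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

MILAN_POLITICIANS = """
SELECT ?person ?personLabel ?date WHERE {
  ?person wdt:P19 wd:Q490 .       # place of birth: Milan
  ?person wdt:P106 wd:Q82955 .    # occupation: politician
  ?person wdt:P569 ?date .        # date of birth, left as a variable
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def flatten(payload):
    """Turn SPARQL JSON results into a list of {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(sparql):
    """Send the query to the endpoint (needs network access) and return flat rows."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})
    # The User-Agent value is a placeholder; use one that identifies your own script.
    request = urllib.request.Request(url, headers={"User-Agent": "wikidata-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return flatten(json.load(response))


# Usage (performs a real HTTP request):
# for row in run_query(MILAN_POLITICIANS):
#     print(row["personLabel"], row["date"])
```

Each of the three lines in the pattern becomes one line of the query, and the two variables become the columns of the result.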
Excited by the possibilities offered by Wikidata, we wrote a first query asking for all the bands available on Wikidata (query). How many bands are there on Wikidata, and how are they distributed over time?
After a first attempt, as usually happens in these situations, we found errors and inconsistencies, such as bands founded in 196 BC or even millennia earlier. Interestingly, the correct data can often be found with a quick search on the web, and Wikidata lets you go and fix those mistakes almost instantly. So if you are excited by the idea of contributing to this giant database, you really should look into it.
Apart from this, we decided to focus our exploration on contemporary music, so we restricted the data to the last century. As you can probably imagine, there is a rising trend (apart from 1974: what happened that year?) followed by a steady decrease after 2005. We believe this is partly because the most recently formed bands still need to make their way to success, and therefore to Wikidata.
The next question we asked ourselves was: to which genres do these bands belong? This is easy: instead of the year of foundation, we can query Wikidata for the genre property. Here is the query, followed by the bubble chart representing the results.
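The cleaning step described above could be sketched roughly as follows. We assume here that a band is an instance (P31) of “musical group” (Q215380) and that its founding date is stored in the “inception” property (P571); the query we actually used may differ. The function names are ours, and the timestamps in the usage note are made up for illustration.

```python
from collections import Counter

BANDS_QUERY = """
SELECT ?band ?inception WHERE {
  ?band wdt:P31 wd:Q215380 .     # instance of: musical group (assumed class)
  ?band wdt:P571 ?inception .    # inception, i.e. the founding date
}
"""


def inception_year(timestamp):
    """Extract the (possibly negative) year from a timestamp such as '1974-05-01T00:00:00Z'."""
    sign = -1 if timestamp.startswith("-") else 1
    digits = timestamp.lstrip("+-").split("-", 1)[0]
    return sign * int(digits)


def bands_per_year(timestamps, first_year, last_year):
    """Count bands per founding year, dropping everything outside the chosen range."""
    years = (inception_year(t) for t in timestamps)
    counts = Counter(y for y in years if first_year <= y <= last_year)
    return dict(sorted(counts.items()))
```

Called with a list that contains two timestamps from 1974, one from 2005 and one with a negative year, `bands_per_year(timestamps, 1917, 2017)` keeps only the first three and returns `{1974: 2, 2005: 1}`.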
793 results in less than 10 seconds. Without this collaborative miracle, gathering this data could have taken years of research through magazines, webzines and other kinds of publications. Awesome. Of course, genres such as Alternative Rock, Punk Rock, Rock and Pop music dominate the scene, but we can still see minor genres such as J-pop, Techno, Technical Death Metal, Cowpunk, Afrobeat or Reggaeton.
Since there are so many genres, we had to narrow down to the main ones to continue the exploration, so we kept only the top 20, ranked by the total number of bands belonging to each. With this selection made, we went on with our exploration: how do genres evolve over time? For each band it is possible to query several pieces of data; in this case we wanted the year and the genre of each band. Here is the query, followed by the visualisation:
What we can see is that most genres seem to have had their hot moment in the history of music. New wave, for example, is concentrated in the ’80s, while indie rock peaked around 2005. However, we have to keep in mind that we are looking at when bands were founded, not at their activity: some lasted more than 40 years and produced many songs and albums. The data underlying this first area graph doesn’t give us a good perspective on the activity within each genre, so we wrote a new query to answer this question.
In Wikidata it is possible to follow properties in order to add details to our dataset: we can ask for the list of albums made by those bands and for the year in which they were produced. Here is the query.
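The selection of the main genres can be done after the query, on the rows it returns. The sketch below assumes each row is a (band, genre, year) tuple and that the genre comes from the “genre” property (P136); the function names are hypothetical and the tie-breaking rule (alphabetical order) is our own choice, not something Wikidata prescribes.

```python
from collections import Counter

BANDS_GENRE_YEAR_QUERY = """
SELECT ?band ?genreLabel ?inception WHERE {
  ?band wdt:P31 wd:Q215380 .     # instance of: musical group (assumed class)
  ?band wdt:P136 ?genre .        # genre
  ?band wdt:P571 ?inception .    # inception, i.e. the founding date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def top_genres(rows, n=20):
    """Return the n genres with the most distinct bands, largest first."""
    pairs = {(band, genre) for band, genre, _year in rows}  # count each band once per genre
    counts = Counter(genre for _band, genre in pairs)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [genre for genre, _count in ranked[:n]]


def keep_top_genres(rows, n=20):
    """Drop every row whose genre is not among the n largest ones."""
    keep = set(top_genres(rows, n))
    return [row for row in rows if row[1] in keep]
```

A band with several genres appears in several rows, so it counts once for each of its genres.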
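To give an idea of what “following properties” means in practice, here is a hedged sketch of such a query together with the counting step. It assumes that albums are instances of Q482994 (album), linked to the band through “performer” (P175) and dated with “publication date” (P577). On the live service a query this broad may need a LIMIT or a narrower filter to finish in time, and the counting function is our own illustration that expects modern, positive years.

```python
from collections import Counter

ALBUMS_QUERY = """
SELECT ?album ?genreLabel ?date WHERE {
  ?band wdt:P31 wd:Q215380 .     # instance of: musical group (assumed class)
  ?band wdt:P136 ?genre .        # genre of the band
  ?album wdt:P31 wd:Q482994 .    # instance of: album
  ?album wdt:P175 ?band .        # performer: links the album back to the band
  ?album wdt:P577 ?date .        # publication date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def albums_per_genre_year(rows):
    """Count albums per (genre, year); rows are dicts with 'genreLabel' and 'date' keys."""
    counts = Counter()
    for row in rows:
        year = int(row["date"][:4])  # assumes a four-digit, positive year
        counts[(row["genreLabel"], year)] += 1
    return counts
```

The band variable appears in both halves of the query, which is how the genre of the band ends up attached to each album.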
Using albums as units gives us a different perspective: Punk Rock is now less relevant, while Prog-Rock seems to be more active, especially in the early ’70s.
Finally, since there was a huge number of genres, we were curious to see whether a hierarchy could be found among them. Following the property “subclass of”, we could see how genres can be aggregated (here is the query).
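A possible way to turn “subclass of” statements into a hierarchy is sketched below. It assumes the property ID P279 for “subclass of” and works on (genre, parent genre) pairs; the function names are ours, and the pairs in the usage note are illustrative rather than actual Wikidata output.

```python
from collections import defaultdict

GENRE_HIERARCHY_QUERY = """
SELECT DISTINCT ?genreLabel ?parentLabel WHERE {
  ?band wdt:P31 wd:Q215380 .     # instance of: musical group (assumed class)
  ?band wdt:P136 ?genre .        # genre
  ?genre wdt:P279 ?parent .      # subclass of
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def build_tree(pairs):
    """Map each parent genre to the sorted list of its direct sub-genres."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return {parent: sorted(kids) for parent, kids in children.items()}


def ancestors(genre, pairs):
    """Follow 'subclass of' links upwards and return every genre reached."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    seen, stack = set(), [genre]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # also protects against cycles in the data
                seen.add(parent)
                stack.append(parent)
    return seen
```

With the made-up pairs ("technical death metal", "death metal") and ("death metal", "heavy metal"), `ancestors("technical death metal", pairs)` returns both "death metal" and "heavy metal", so small genres can be aggregated under their broader parents.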
This example of exploration should give an idea of the potential of the database. As we said at the beginning, learning how to query the database correctly is not an entirely easy task, but once you manage to do so you can go and explore many different fields: cinema, politics, literature, history and so on. You could even drill down directly to the things that matter to you, depending on your interests: airports around Berlin, the most beautiful villages in France, birthplaces of astronauts or where the Eiffel Tower shows up in famous paintings.
We have just started to explore Wikidata, and what we discovered is that it is far bigger than we expected. In the future, more and more things could have a corresponding record in the database, bringing it closer and closer to a digital representation of human knowledge.
To enable people to use and visualise this data more easily and efficiently, we’re working on an interesting new feature for RAWGraphs, so stay tuned to discover more about it!