Visualisation of the LAK dataset using Unfolding – Geert

Posted on March 15, 2013

The LAK dataset

The dataset is available in RDF format, which I had no experience with whatsoever, so it took me some time to get acquainted with it. To use the dataset in Java, I used the Jena library to search the RDF model for the data I needed. The data I processed is the name of the country each author is linked to.

Retrieving geodata

The LAK dataset contains no geodata apart from the name of the country each author is based in and the names of the organisations an author is linked to, so another method was needed to retrieve geodata that can be visualized. I used SPARQL queries against DBpedia to find the coordinates of the countries. Unfortunately, country names can be written in many different ways; for example, the United States of America appears in the dataset as “USA”. DBpedia returned no coordinates for “USA” because the page found under that name only redirects to the page for “United_States”, which is the name DBpedia actually uses. For countries like this, where I found no coordinates through DBpedia, I looked up the country name from the dataset in Geonames instead.
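The redirect problem described above can be approached in two ways: normalise known spellings before querying, or let the SPARQL query itself follow redirects. The sketch below illustrates both without making any network call; it only builds the query text. The class name `CountryGeoQuery`, the alias table, and the helper names are hypothetical and not taken from the original code. It assumes DBpedia exposes redirects through `dbo:wikiPageRedirects` and coordinates through the WGS84 `geo:lat` and `geo:long` properties, and that the endpoint accepts SPARQL 1.1 property paths.

```java
import java.util.Locale;
import java.util.Map;

final class CountryGeoQuery {

    // Hypothetical alias table: spellings seen in a dataset mapped to DBpedia resource names.
    private static final Map<String, String> ALIASES = Map.of(
            "usa", "United_States",
            "uk", "United_Kingdom",
            "the netherlands", "Netherlands");

    // Turns a raw country string into a DBpedia-style resource name.
    // Names with characters that need escaping in an IRI are not handled here.
    static String toResourceName(String rawCountry) {
        String trimmed = rawCountry.trim();
        String alias = ALIASES.get(trimmed.toLowerCase(Locale.ROOT));
        return alias != null ? alias : trimmed.replaceAll("\\s+", "_");
    }

    // Builds a SPARQL query that follows zero or more redirects before reading coordinates,
    // so a redirect-only resource can still resolve to the resource that carries them.
    static String buildCoordinateQuery(String rawCountry) {
        String resource = "http://dbpedia.org/resource/" + toResourceName(rawCountry);
        return String.join("\n",
                "PREFIX dbo: <http://dbpedia.org/ontology/>",
                "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>",
                "SELECT ?lat ?long WHERE {",
                "  <" + resource + "> dbo:wikiPageRedirects* ?country .",
                "  ?country geo:lat ?lat ;",
                "           geo:long ?long .",
                "} LIMIT 1");
    }

    public static void main(String[] args) {
        // Prints the query text that would be sent to a SPARQL endpoint.
        System.out.println(buildCoordinateQuery("USA"));
    }
}
```

The resulting string could be handed to whichever SPARQL client is in use, such as Jena. Names that still return no coordinates would fall through to a second geocoder, as the post does with Geonames.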

The visualisation

The visualisation itself is very simple: it just shows the number of authors linked to each country. As the number of authors for a country increases, the color of the marker turns from green to yellow to red and its size grows. I also experimented with hovering over markers, but this is not visible in the following screenshot of my visualisation.

[Image: lakvisualisation]
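The encoding described above, with author counts mapped to a green-to-yellow-to-red color and a growing marker size, can be sketched in plain Java as follows. This is an illustration under assumptions, not the code behind the screenshot: the class `AuthorMarkerStyle`, its method names, the linear interpolation, and the size range are all hypothetical. The resulting RGB values and sizes would then be applied to whatever marker class the map library provides.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AuthorMarkerStyle {

    // Counts how many authors are linked to each country name.
    static Map<String, Integer> countByCountry(List<String> authorCountries) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String country : authorCountries) {
            counts.merge(country, 1, Integer::sum);
        }
        return counts;
    }

    // Maps a count to {r, g, b}: green at the lowest count, yellow halfway, red at the highest.
    static int[] colorFor(int count, int maxCount) {
        double t = fraction(count, maxCount);
        if (t <= 0.5) {
            // Green to yellow: raise the red channel.
            return new int[] {(int) Math.round(255 * (t / 0.5)), 255, 0};
        }
        // Yellow to red: lower the green channel.
        return new int[] {255, (int) Math.round(255 * (1.0 - (t - 0.5) / 0.5)), 0};
    }

    // Maps a count linearly onto a marker size between minSize and maxSize.
    static float sizeFor(int count, int maxCount, float minSize, float maxSize) {
        return (float) (minSize + fraction(count, maxCount) * (maxSize - minSize));
    }

    // Position of count within [1, maxCount], clamped to [0, 1].
    private static double fraction(int count, int maxCount) {
        if (maxCount <= 1) {
            return 0.0;
        }
        double t = (count - 1) / (double) (maxCount - 1);
        return Math.max(0.0, Math.min(1.0, t));
    }
}
```

With a maximum of 11 authors, a country with 1 author gets pure green at the minimum size, one with 6 authors gets yellow, and one with 11 gets red at the maximum size.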

2 Comments
  1. Nice experiment! Two comments. First, regarding your geocoding of place names (countries of affiliation): as you have experienced, authors tend to use various spellings. To be able to look these up in more variations than Geonames supports, you might want to check out the Google Maps API (https://developers.google.com/maps/documentation/geocoding/), which is much more tolerant.

    Second, I like your idea of aggregating the data, i.e. showing attendees by country. While it can be fine to double encode a value (color and size, in your case) for clarity, I would recommend not using a qualitative color scheme for continuous values. I also did not fully understand your usage of rings (e.g. why is the green ring thinner than the thick, small red ring?).

    Again, I like your country aggregation, and would suggest you keep digging in this direction. Maybe have a country-based marker, but when users zoom in, de-cluster and show single markers for each affiliation and/or state or such?

  2. Thank you for these ideas. I definitely have to look into the Google Maps API for more tolerant lookups.
    The double encoding of the value had no specific reason other than to try out the visual features of Markers. This will definitely not become our final way of visualizing markers; it was only a test.
    At the moment we are combining the three visualizations, and we are still deciding whether to give the user the choice to view the data in different ways (per country or per university) or to use an idea similar to yours, giving users a view that differs depending on the zoom level of the map.
