Getting started

Web API Overview

If you just landed here without having read the main page, you should definitely check it out, especially the "examples" section. It offers a quick overview of the API (whether in the browser or not). Once you are done with that little getting-started tutorial, you can come back here and learn in more depth how to use the API.

If you have already read the main page, you can go on and pick up more details about the API and how it works. Read on !


Compute keys

The main purpose of the melinda keys extraction tool API is to compute subject identification keys on RDF graphs. To do so, you input the graph's web-based URI (there is no direct file input) via an HTTP POST request, specifying at least the required "dataset" keyword with the URI as its value. The POST request returns a keys extraction instance, which gives you all the details about the extraction, such as its identification number (ID).

The ID is indispensable because a keys extraction is a resource, so it has a unique location : keyextraction/{ID}/, where you can find all the extraction parameters as well as the URI of the actual keys (which will be keyextraction/{ID}/keys/). What lies at this location is the same instance as the one returned directly by the POST request, but permanent.

As shown in the getting started section, the request can be made via the web browser interface or any other HTTP tool. Each keys extraction comes with a set of parameters, described below. The POST request can only be made at /keyextraction/.


Parameters

A keys extraction instance has quite a lot of parameters. "id", "runtime_date" and "keys_URI" cannot be set manually : the first is generated automatically, the second is always the date on which the computation was launched, and the third is the permanent keys URI. All the other parameters can be set manually. Here are the details about each parameter :

  • key_type : either "simple" (values are checked one by one during computation) or "ads" (default ; values are checked as a set).
  • support_treshold : percentage of individuals concerned by the key ; value is between 0 and 1, default is 0.0.
  • discriminability_treshold : percentage of individuals not concerned by the key (this can be considered as an error margin) ; value is between 0 and 1, default is 1.0.
  • undefined : whether or not to take into account subjects having no value for the current property ; default is false.
  • rdf_types : list of types (RDF type URIs) to filter the computation by. The computation will be done only for those types ; default is none (the type is not taken into account).
  • rdf_properties : list of properties (RDF property URIs) to filter the computation by. The computation will be done only for those properties (which can be seen as reducing the graph's size) ; default is none (compute for the whole graph).
  • each_type : boolean indicating (if no type is provided in rdf_types) whether to compute keys for each RDF type one by one or directly for the whole graph.
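
Since the API rejects unknown parameters, it can help to validate the payload on the client side before sending the POST request. The helper below is a hypothetical sketch, not part of the API :

```python
# Hypothetical client-side helper: build and check a payload for a POST to
# /keyextraction/. Only "dataset" is required; the other keys are the
# manually settable parameters listed above.
SETTABLE = {"key_type", "support_treshold", "discriminability_treshold",
            "undefined", "rdf_types", "rdf_properties", "each_type"}

def build_payload(dataset_uri, **params):
    unknown = sorted(set(params) - SETTABLE)
    if unknown:
        # fail early instead of letting the server reject the request
        raise ValueError("unknown parameter(s): " + ", ".join(unknown))
    payload = {"dataset": dataset_uri}
    payload.update(params)
    return payload
```

For instance, build_payload("http://example.org/data.rdf", key_type="simple") (with a made-up dataset URI) yields a dict ready to be sent as JSON or as an urlencoded form.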

Input format

To compute keys, you input your dataset's URI and any manually set parameters at /keyextraction/ using a POST request. This input data can be encoded either as JSON or as a urlencoded form. Here are some input samples :

Json :

     {
       "dataset": "",
       "key_type": "simple",
       "support_treshold": 0.8,
       "undefined": true,
       "each_type": true
     }

Urlencoded :

     dataset=&key_type=simple&support_treshold=0.8&undefined=true&each_type=true

Asynchronousness

The keys extraction process is mainly asynchronous, because the computation can take quite a long time depending on the size of the graph. This means that the keys will not be available at the returned keys URI immediately after the POST request. The computation can take up to several hours, but less than one day. If you try to get the keys back after 24 hours and they are still unavailable, the keys extraction instance will be deleted and you will have to re-run the computation : it means that the server encountered an unknown error, went down for maintenance, or the like.
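
In practice, this means a client polls the keys URI until the keys appear. Here is a minimal, hypothetical polling sketch ; the fetch callable is left abstract because how "not ready yet" is signalled by the server (e.g. which error status) is not specified here :

```python
import time

def wait_for_keys(fetch, keys_uri, interval=60.0, timeout=24 * 3600,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll keys_uri until fetch returns the keys, or give up after timeout.

    fetch(uri) is any callable returning the keys body, or None while the
    computation is still running. clock and sleep are injectable for testing.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        keys = fetch(keys_uri)
        if keys is not None:
            return keys
        sleep(interval)
    # after 24 hours the server deletes the extraction instance, so give up
    return None
```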


Errors

Some errors may come up when trying to compute keys. Here are some errors you may encounter and how to correct them :

  1. Malformed data input : always check that your JSON or urlencoding is properly written.
  2. Unknown parameter : do not add parameters that do not exist.
  3. Bad URI : the dataset argument is required and must be an existing URI.


Keys extraction

You can get the keys extraction instance returned by the POST request at any time by making a simple GET request to /keyextraction/{ID}/. This returns the instance serialized as JSON, providing information on all the parameters used to compute the keys (or the defaults) and some extra information (computation date, ID and keys URI). Using any HTTP tool, you can get the keys extraction as HTML by appending .api to the keys extraction URI ; you can make sure to get only JSON by appending .json to it as well. A keys extraction is immutable.

RDF keys

When the computation of the RDF keys is done, you can actually get them by making a GET request to the keys URI, which will be /keyextraction/{ID}/keys/. The keys may not be available right away, because the process is mostly asynchronous (read the asynchronousness section). All the information about the RDF keys themselves, including context and vocabulary, is provided in the keys as RDF statements. As with keys extraction instances, the RDF keys are serialized as JSON(-LD) by default : you can get them as pure JSON by appending .json to the URI, or wrapped in the HTML interface by appending .api. RDF keys are immutable as well.

RDF format

By default, RDF keys are serialized as JSON-LD, but you can get them in basically any RDF format you want. You can specify the RDF format you want in two ways :

  1. directly in the URL, by appending .my_chosen_rdf_format to the keys URI
  2. in the HTTP Accept header of the GET request to the keys URI :
    Accept: application/rdf+my_chosen_rdf_format

Here are the supported formats (they should be written exactly as follows) :

  • json (default, json-ld)
  • n3
  • nt
  • turtle
  • xml
  • rdf (same as xml)

Note that "rdf" is only available via the URL extension. In the HTTP Accept header you should always use application/rdf+xml, as application/rdf is not recommended by the W3C and is therefore not available.
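
These rules, including the two special cases, can be captured in a small helper. This is a hypothetical sketch, not part of the API ; in particular, it appends the URL extension to the keys URI exactly as the wording above describes :

```python
# Hypothetical helper encoding the format-selection rules described above.
SUPPORTED = {"json", "n3", "nt", "turtle", "xml", "rdf"}

def keys_request(keys_uri, rdf_format, via_header=False):
    """Return a (uri, headers) pair for fetching the keys in rdf_format."""
    if rdf_format not in SUPPORTED:
        raise ValueError("unsupported RDF format: " + rdf_format)
    if via_header:
        if rdf_format == "rdf":
            # "rdf" exists only as a URL extension; the Accept header must
            # say application/rdf+xml, never bare application/rdf
            rdf_format = "xml"
        return keys_uri, {"Accept": "application/rdf+" + rdf_format}
    return keys_uri + "." + rdf_format, {}
```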


Delete

You can delete any keys extraction instance just by making a DELETE HTTP request at its URI. When you delete a keys extraction instance, you delete its associated keys as well ; keys cannot be deleted directly, so if you want to delete them, delete the keys extraction. On the browser side, there is a delete button at each keys extraction URI.
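
As a sketch, deleting an extraction (and, implicitly, its keys) is a single request. The function below is hypothetical, not part of the API, and takes any HTTP client exposing a delete method, such as a requests.Session :

```python
def delete_extraction(client, base_uri, extraction_id):
    """Delete the keys extraction instance; its keys are deleted with it.

    client is any object with a .delete(uri) method (e.g. requests.Session()).
    Hypothetical helper, not part of the API.
    """
    return client.delete("%skeyextraction/%s/" % (base_uri, extraction_id))
```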

Other stuff

Put and patch ?

HTTP PUT and PATCH requests are not supported, because all the resources are immutable. A keys extraction instance is a record of a keys computation that happened ; since it already happened, it cannot be changed. The same goes for the keys : the computed keys are specific to a certain set of parameters and a particular dataset, so for the same set of parameters and the same dataset the keys will always be the same. That is why keys are immutable and cannot be put or patched either.

User restrictions

There are no user restrictions at all, because there is no user authentication. The web interface uses sessions, but that is about it. There is no authentication because this API exists to improve the semantic web, help people link data, and share information, so everything is public. This means that everybody can get or delete whatever they want ; but this API is built for developers who trust each other, so please, do not delete everyone's work.

Code samples

# compute keys
curl -X POST -H "Content-Type: application/json" \
-d '{"dataset": ""}' \

# we now have the keys extraction ID, so get the instance

# we can also get the keys, if we waited long enough

# if we want the keys as turtle
curl -H "Accept: application/rdf+turtle"

# or by appending .turtle to the keys URI

# now we are done, so we can delete the instance
curl -X DELETE
>>> import requests
>>> url = ''
>>> dataset = {'dataset': ''}
>>> response = requests.post(url, data=dataset)
>>> response.content
'{"id": 21, "dataset": "",
"keys_URI": "",
"runtime_date": "2013-07-19", "key_type": "ads", "support_treshold": 0.0,
"discriminability_treshold": 1.0, "undefined": false, "rdf_types": [],
"rdf_properties": [], "each_type": false}'
>>> response.status_code
>>> url += '21/keys/'
>>> response = requests.get(url)
>>> response.content
'[{"": [{"@value": "melinda:Key"}],
, {"@id": ""},
{"@id": ""}]'
>>> response.json()
[{u'melinda:discriminability': [{u'@value': 1.0}], 
u'': [{u'@value': u'melinda:Key'}],
{u'@id': u''},
{u'@id': u''}]
require 'net/http'
require 'uri'

url       = ''
dataset   = { dataset: '' }

# POST the dataset as an urlencoded form
response  = Net::HTTP.post_form(URI(url), dataset)
# response.body =>
# '{"id": 21, "dataset": "",
# "keys_URI": "",
# "runtime_date": "2013-07-19", "key_type": "ads", "support_treshold": 0.0,
# "discriminability_treshold": 1.0, "undefined": false,
# "rdf_types": [], "rdf_properties": [], "each_type": false}'

# then GET the keys
url       += '21/keys/'
response  = Net::HTTP.get_response(URI(url))
# ...


Content policy

As said before (read the delete section), all keys extraction instances and sets of RDF keys are stored publicly, and these are the only things we store. Since there is no authentication system, there is no need to store anything about the users, not even the IP addresses the requests come from. This means that all the available data are anonymous.


If you have questions, comments or anything related to the keys extraction tool itself, please send an e-mail to jerome.david#inria:fr

If you have questions, comments or anything related to the keys extraction tool web API, you can send an e-mail to contact#anthonydelaby:me

We sincerely hope you enjoy our work ! Now let's compute some keys.