Planet RDF

Preprint: Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports

Thu, 07/17/2014 - 10:38

Categories:

RDF

Varish Mulwad, Tim Finin and Anupam Joshi, Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports, 15th IEEE Int. Conf. on Information Reuse and Integration, Aug 2014.

Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF makes the automatic extraction, integration and reuse of data from multiple studies easier, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.

Notes on public talks

Wed, 07/16/2014 - 17:35

Categories:

RDF

Massimo and I worked together on two posters about automatic provenance capturing for research publications, and we won the ESIP FUNding Friday award. What was unforgettable for me, however, was the great lesson I learned from giving the 2-minute pitch in front of the ESIP folks.

During the 2-minute talk, I just could not help staring at the two posters, which we had made and printed the day before and that morning. Now I know the reason: I had only practiced my speech with one of the posters displayed on my laptop, and I had no chance to practice talking about the other poster at all. I became dependent on having the posters in front of me and could not give the talk to the people instead of to the posters.

Possible solutions to move my eyes away from the posters when talking? The best I could think of is to get REALLY familiar with the topic I’m gonna present: at least so familiar that I don’t need to look at any auxiliary aid such as a poster to remind myself what to say, and ideally familiar enough to save some spare attention for the audience, so that I can receive their feedback and adjust accordingly in real time. Needing to ignore the audience for a while to concentrate on “what should I say here?” indicates that I’m not familiar enough with the topic.

In addition to the content, presenters also need to get familiar with the way of presenting the content. This could include scrutinizing the practice talk sentence by sentence to make sure “I said what I meant and I meant what I said”. Not until such clarity and confidence are reached can one start thinking about all the fancy stuff like speaking pace, volume variation and eye contact with the audience. Well, those are fancy for me, not necessarily for good speakers.

So there is really a lot to work on for a public talk, especially if it’s the first time the presenter talks about the idea. There is so much work that it cannot be done the night before the talk. We need to work on the familiarity, clarity and confidence of our ideas on a daily basis. It helps to write down what we mean and talk about it often.

 

:BaseKB offered as a better Freebase version

Tue, 07/15/2014 - 19:49

Categories:

RDF

In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.

:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB lets users speed up their development cycles dramatically compared to working with the source Freebase dumps.

:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at US$3.00 per download. There are also BaseKB Gold releases, which are periodic :BaseKB Now snapshots. These can be downloaded free via BitTorrent or purchased on a Blu-ray disc.

It looks like it’s worth checking out!

From Taxonomies over Ontologies to Knowledge Graphs

Tue, 07/15/2014 - 08:57

Categories:

RDF

With the rise of linked data and the semantic web, concepts and terms like ‘ontology’, ‘vocabulary’, ‘thesaurus’ or ‘taxonomy’ are being picked up frequently by information managers, search engine specialists or data engineers to describe ‘knowledge models’ in general. In many cases the terms are used without any specific meaning, which leads a lot of people to the basic question:

What are the differences between a taxonomy, a thesaurus, an ontology and a knowledge graph?

This article aims to shed light on this discussion by guiding you through an example which starts off from a taxonomy, introduces an ontology and finally exposes a knowledge graph (linked data graph) to be used as the basis for semantic applications.

1. Taxonomies and thesauri

Taxonomies and thesauri are closely related species of controlled vocabularies to describe relations between concepts and their labels including synonyms, most often in various languages. Such structures can be used as a basis for domain-specific entity extraction or text categorization services. Here is an example of a taxonomy created with PoolParty Thesaurus Server which is about the Apollo programme:

The nodes of a taxonomy represent various types of ‘things’ (so-called ‘resources’): the topmost level (orange) is the root node of the taxonomy, purple nodes are so-called ‘concept schemes’, followed by ‘top concepts’ (dark green) and ordinary ‘concepts’ (light green). In 2009 the W3C introduced the Simple Knowledge Organization System (SKOS) as a standard for the creation and publication of taxonomies and thesauri. The SKOS ontology comprises only a few classes and properties. The most important types of resources are: Concept, ConceptScheme and Collection. Hierarchical relations between concepts are ‘broader’ and its inverse ‘narrower’. Thesauri also often cover non-hierarchical relations between concepts, like the symmetric property ‘related’. Every concept has at least one ‘preferred label’ and can have numerous synonyms (‘alternative labels’). Whereas a taxonomy could be envisaged as a tree, thesauri most often have polyhierarchies: a concept can be the child node of more than one node. Because it includes polyhierarchical and non-hierarchical relations between concepts, a thesaurus is better envisaged as a network (graph) of nodes than as a simple tree.
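To make the SKOS structure above a little more concrete, here is a small Clojure sketch (Clojure being the language used for the code later on this page). It stores a thesaurus fragment as [subject predicate object] triples, gives one concept two broader concepts (a polyhierarchy), and derives skos:narrower as the inverse of skos:broader and the symmetric counterpart of skos:related. The concept keywords and the keyword encoding of SKOS terms are assumptions for illustration; this is not how PoolParty stores its data.

```clojure
;; A thesaurus fragment as [subject predicate object] triples.
;; Concept keywords are illustrative, not the PoolParty example data.
(def skos-triples
  [[:apollo-11 :skos/broader :apollo-missions]
   [:apollo-11 :skos/broader :moon-landings]     ; polyhierarchy: two parents
   [:apollo-11 :skos/related :neil-armstrong]
   [:apollo-11 :skos/prefLabel "Apollo 11"]])

;; skos:narrower is the inverse of skos:broader.
(def inverse-of {:skos/broader :skos/narrower})

(defn infer-inverses
  "Return the set of triples plus, for each triple whose predicate has a
   declared inverse, the inverse triple, and for each skos:related triple,
   its symmetric counterpart."
  [triples]
  (set (concat triples
               (for [[s p o] triples :when (inverse-of p)]
                 [o (inverse-of p) s])
               (for [[s p o] triples :when (= p :skos/related)]
                 [o p s]))))

(defn objects
  "All objects o such that [s p o] is one of the triples."
  [triples s p]
  (set (for [[s2 p2 o] triples :when (and (= s s2) (= p p2))] o)))

;; The concept has two broader concepts, and each parent now lists it as narrower.
(objects skos-triples :apollo-11 :skos/broader)
(objects (infer-inverses skos-triples) :moon-landings :skos/narrower)
```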

2. Ontologies

Ontologies are perceived as being complex in contrast to the rather simple taxonomies and thesauri. Limitations of taxonomies and SKOS-based vocabularies in general become obvious as soon as one tries to describe a specific relation between two concepts: ‘Neil Armstrong’ is not only unspecifically ‘related’ to ‘Apollo 11’, he was ‘commander of’ this particular Apollo mission. Therefore we have to extend the SKOS ontology by two classes (‘Astronaut’ and ‘Mission’) and the property ‘commander of’, which is the inverse of ‘commanded by’.

The SKOS concept with the preferred label ‘Buzz Aldrin’ has to be classified as an ‘Astronaut’ in order to be described by specific relations and attributes like ‘is lunar module pilot of’ or ‘birthDate’. Introducing additional ontologies to expand the expressivity of SKOS-based vocabularies follows the ‘pay-as-you-go’ strategy of the linked data community. The PoolParty knowledge modelling approach suggests starting with SKOS and then extending this simple knowledge model with other knowledge graphs, ontologies, annotated documents and legacy data. This paradigm can be summed up by the rule ‘Start SKOS, grow big’.

3. Knowledge Graphs

Knowledge graphs are all around (e.g. DBpedia, Freebase, etc.). Based on W3C’s Semantic Web standards, such graphs can be used to further enrich your SKOS knowledge models. In combination with an ontology, specific knowledge about a certain resource can be obtained with a simple SPARQL query. As an example, the fact that Neil Armstrong was born on August 5th, 1930 can be retrieved from DBpedia. Watch this YouTube video which demonstrates how ‘linked data harvesting’ works with PoolParty.
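A SPARQL query such as the one mentioned above essentially matches triple patterns containing variables against the graph. The Clojure sketch below is a rough in-memory analogue of a single triple pattern; it is not a SPARQL engine and it does not contact DBpedia. The dbr:/dbo: identifiers follow DBpedia naming, while ex:Astronaut and ex:commanderOf are hypothetical stand-ins for the ‘Astronaut’ class and ‘commander of’ property from the ontology section.

```clojure
;; A tiny hand-written graph. The birth date is the fact cited above;
;; ex:Astronaut and ex:commanderOf are hypothetical ontology terms.
(def kg
  #{["dbr:Neil_Armstrong" "rdf:type"       "ex:Astronaut"]
    ["dbr:Neil_Armstrong" "dbo:birthDate"  "1930-08-05"]
    ["dbr:Neil_Armstrong" "ex:commanderOf" "dbr:Apollo_11"]
    ["dbr:Buzz_Aldrin"    "rdf:type"       "ex:Astronaut"]})

(defn variable?
  "Pattern terms starting with ? are variables, as in SPARQL."
  [term]
  (and (string? term) (.startsWith ^String term "?")))

(defn- bind
  "Extend bindings by matching pattern term p against value v.
   Returns nil as soon as a constant or an already-bound variable disagrees."
  [bindings [p v]]
  (cond
    (nil? bindings)        nil
    (not (variable? p))    (when (= p v) bindings)
    (contains? bindings p) (when (= (bindings p) v) bindings)
    :else                  (assoc bindings p v)))

(defn match-pattern
  "Return one variable-binding map per triple of graph that matches
   the single [s p o] pattern."
  [graph pattern]
  (keep (fn [triple] (reduce bind {} (map vector pattern triple))) graph))

;; Roughly: SELECT ?date WHERE { dbr:Neil_Armstrong dbo:birthDate ?date }
(match-pattern kg ["dbr:Neil_Armstrong" "dbo:birthDate" "?date"])
;=> ({"?date" "1930-08-05"})
```

A real query would join several such patterns on their shared variables; the binding map returned here is the piece such a join would build on.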

Knowledge graphs could be envisaged as a network of all kinds of things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.

Why should I transform my content and data into a large knowledge graph?

The answer is simple: to be able to run complex queries over the entirety of all kinds of information. By breaking up the data silos, there is a high probability that query results become more valid.

With PoolParty Semantic Integrator, content and documents from SharePoint, Confluence, Drupal etc. can be transformed automatically to integrate them into enterprise knowledge graphs.

Taxonomies, thesauri, ontologies, linked data graphs including enterprise content and legacy data: all kinds of information could become part of an enterprise knowledge graph, which can be stored in a linked data warehouse. Based on technologies like Virtuoso, such data warehouses can serve as complex question answering systems with excellent performance and scalability.

4. Conclusion

In the early days of the semantic web, we constantly discussed whether taxonomies, ontologies or linked data graphs would be part of the solution. Again and again, discussions like ‘Did the current data-driven world kill ontologies?’ are being led. My proposal is: try to combine all of those. Embrace every method which makes meaningful information out of data. Stop denouncing communities which don’t follow one or the other aspect of the semantic web (e.g. reasoning or SKOS). Let’s put the pieces together – together!

 

[CfP] Semantic Web Journal: Special Issue on Question Answering over Linked Data

Mon, 07/14/2014 - 14:58

Categories:

RDF

Dear all,

The Semantic Web Journal is launching a special issue on Question Answering over Linked Data, soliciting original papers that:

  • address the challenges involved in question answering over linked data,
  • present resources and tools to support question answering over linked data, or
  • describe question answering systems and applications.

The submission deadline is November 30th, 2014. For more detailed information please visit: http://www.semantic-web-journal.net/blog/call-papers-special-issue-question-answering-over-linked-data

With kind regards,
Axel Ngonga and Christina Unger

New Version of FOX

Mon, 07/14/2014 - 14:57

Categories:

RDF

Dear all,

We are very pleased to announce a new version of FOX [1]. Several improvements have been carried out:

  1. We have fixed minor issues in the code. In addition, we have updated several libraries.
  2. As a result, the FOX output parameters have changed minimally. An exact specification of the parameters with examples is available at the demo page [2].
  3. Moreover, we now make bindings available for Java [3] and Python [4] to use FOX’s web service within your application.

Enjoy and cheers,
The FOX team

[1] https://github.com/AKSW/FOX/releases/tag/v2.2.0
[2] http://fox.aksw.org
[3] https://github.com/renespeck/fox-java
[4] https://github.com/earthquakesan/fox-py

New UMBEL Concept Tagger Web Service

Mon, 07/14/2014 - 14:44

Categories:

RDF
We just released a new UMBEL web service endpoint and online tool: the Concept Tagger Plain.

This plain tagger uses UMBEL reference concepts to tag an input text. The OBIE (Ontology-Based Information Extraction) method is used, driven by the UMBEL reference concept ontology. By plain we mean that the words (tokens) of the input text are matched to either the preferred labels or alternative labels of the reference concepts. The simple tagger is merely making string matches to the possible UMBEL reference concepts.

This tagger uses the plain labels of the reference concepts as matches against the input text. With this tagger, no manipulations (like stemming) are performed on either the reference concept labels or the input text. Also, there is NO disambiguation performed by the tagger if multiple concepts are tagged for a given keyword.

Intended Users

This tool is intended for those who want to focus on UMBEL and do not care about more complicated matches. The output of the tagger can be used as-is, but it is intended to be the initial input to more sophisticated reference concept matching and disambiguation methods. Expect additional tagging methods to follow (see conclusion).

The Web Service Endpoint

The web service endpoint is freely available. It can return its resultset in JSON, Clojure code or EDN (Extensible Data Notation).

This endpoint will return a list of matches on the preferred and alternative labels of the UMBEL reference concepts that match the tokens of an input text. It will also return the number of matches and the position of the tokens that match the concepts.
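To illustrate the plain matching described above, here is a minimal Clojure sketch; it is not the UMBEL implementation. The two-concept dictionary, the tokenization on non-alphanumeric characters and the use of the token index as the "position" are all assumptions. Labels are compared exactly (no stemming, no case folding) and no disambiguation is done, so a single token can yield several matches.

```clojure
(require '[clojure.string :as str])

;; Hypothetical reference-concept dictionary: concept -> labels.
(def concepts
  {"umbel-rc:Person"     {:pref "person"     :alt ["human" "individual"]}
   "umbel-rc:Individual" {:pref "individual" :alt []}})

(defn tokenize
  "Split text into [token position] pairs, where position is the token index.
   Tokens are neither stemmed nor lower-cased."
  [text]
  (->> (str/split text #"[^\p{L}\p{N}]+")
       (remove str/blank?)
       (map-indexed (fn [i token] [token i]))))

(defn plain-tag
  "Match every token exactly against the preferred and alternative labels.
   No disambiguation: a token matching several concepts yields several matches."
  [text]
  (let [matches (for [[token pos] (tokenize text)
                      [concept {:keys [pref alt]}] concepts
                      label-type [:pref :alt]
                      :when (if (= label-type :pref)
                              (= token pref)
                              (some #{token} alt))]
                  {:token token :position pos
                   :concept concept :label-type label-type})]
    {:matches (vec matches) :count (count matches)}))

;; "individual" matches one concept's alternative label and another's preferred label.
(plain-tag "an individual person")
```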

The Online Tool

We also provide an online tagging tool that people can use to experience interacting with the web service.

The results are presented in two sections depending on whether the preferred or alternative label(s) were matched. Multiple matches, either by concept or label type, are coded by color. Source words with matches and multiple source occurrences are ranked first; thereafter, all source words are presented alphabetically.

Tagged concepts can be clicked to access their full description.

EDN and ClojureScript

An interesting thing about this user interface is that it has been implemented in ClojureScript, and the data exchanged between the user interface and the tagger web service endpoint is serialized in EDN. This means that when the UI receives the resultset from the endpoint, it only has to read the EDN string using the ClojureScript reader (cljs.reader/read-string) to turn the output of the web service endpoint into data native to the application.

No parser for a non-native data format is necessary, which makes the code of the UI simpler and makes the data manipulation much more natural to the developer since no external API is necessary.
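On the JVM side, the analogous reader function is clojure.edn/read-string. The sketch below reads a hypothetical resultset (its field names are assumptions, not the endpoint's documented format) and then applies ordinary Clojure functions to it. Reading EDN only parses data; it does not evaluate any code.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical EDN resultset as it could arrive over the wire (a string).
(def payload
  "{:count 1
    :matches [{:token \"person\" :position 2 :concept \"umbel-rc:Person\"}]}")

;; Reading turns the string into native maps, vectors, keywords and numbers.
(def resultset (edn/read-string payload))

;; Plain Clojure functions now work on the result directly.
(mapv :concept (:matches resultset))
;=> ["umbel-rc:Person"]
```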

What is Next?

This is the first of a series of tagging web service endpoints that will be released. Our intent is to release UMBEL tagging services that have different levels of sophistication. Depending on how someone wants to use UMBEL, they will have access to different tagging services that they can use and supplement with their own techniques to end up with their desired results.

The next taggers (not in order) that are planned to be released are:

  • Plain tagger – no weighting or classification except by occurrence count
    • Entity plain tagger (using the Wikidata dictionary)
    • Scones plain tagger – concept + entity
  • Noun tagger – with POS, only tags the nouns; generally, the preferred, simplest baseline tagger
    • Concept noun tagger
    • Entity noun tagger
    • Scones noun tagger
  • N-gram tagger – a phrase-based tagger
    • Concept n-gram tagger
    • Entity n-gram tagger
    • Scones n-gram tagger
  • Complete tagger – combinations of the above with different machine learning techniques
    • Concept complete tagger
    • Entity complete tagger
    • Scones complete tagger

So, we welcome you to try out the system online and we welcome your comments and suggestions.

Validating RDF Data by Evaluating RDF/Clojure Code

Mon, 07/07/2014 - 18:27

Categories:

RDF

I recently started to investigate different ways to serialize RDF triples using Clojure code [1] [2] [3]. I had at least two goals in mind. The first was to end up with an RDF serialization format that is valid Clojure code and that can easily be manipulated using core Clojure functions. The second was to be able to “execute” the code to validate the data according to the semantics of the ontologies used to define the data.

This blog post focuses on showing how the second goal can be implemented.

Before doing so, let’s take some time to explore what the sayings ‘Code as Data’ and ‘Data as Code’ may mean in that context.

Code as Data, Data as Code

What is Code as Data? It means that the program code you write is also data that can be manipulated by a program. In other words, the code you are writing can be used as input [to a macro], which can then be transformed and then evaluated. The code is considered to be data to be manipulated by a macro system to output executable code. The code itself becomes data that can be manipulated with some internal mechanism in the language. But the result of these manipulations is still executable code.

What is Data as Code? It means that you can use a programming language’s code to embed (serialize) data. It means that you can specify your own sublanguage (DSL), translate it into code (using macros) and execute the resulting code.
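A small Clojure illustration of both ideas: a quoted form is manipulated as a list and then evaluated, and a hypothetical triple-dsl macro turns a tiny triple-writing sublanguage into plain Clojure data at macroexpansion time. The macro name and its syntax are invented for this example; they are not part of the RDF/Clojure serialization.

```clojure
;; Code as data: a quoted form is an ordinary list that can be transformed ...
(def form '(+ 1 2 3))
(def rewritten (cons '* (rest form)))   ; the list (* 1 2 3)
;; ... and the transformed data can be evaluated as code again.
(eval rewritten)
;=> 6

;; Data as code: a hypothetical mini-DSL for writing triples. The macro runs
;; at macroexpansion time and emits a plain vector of [s p o] vectors.
(defmacro triple-dsl
  [subject & pairs]
  (vec (for [[p o] (partition 2 pairs)]
         [subject (keyword p) o])))

(triple-dsl "http://foo-bar.com/test/"
            label "Test record"
            phone "1-421-353-9057")
;=> [["http://foo-bar.com/test/" :label "Test record"]
;    ["http://foo-bar.com/test/" :phone "1-421-353-9057"]]
```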

The initial goal of an RDF/Clojure serialization is to specify a way to write RDF triples (data) as Clojure (code). That code is data that can be manipulated by macros to produce executable code. The evaluation of the resulting code is the validation of the data structures (the graph defined by the triples) according to the semantics defined in the ontologies. This means that validating the graph may also occur by evaluating the resulting code (and running the functions).

Ontology Creation

In my previous blog posts about serializing RDF data as Clojure code, I noted that the properties, classes and datatypes that I was referring to in those blog posts were to be defined elsewhere in the Clojure application and that I would cover it in another blog post. Here it is.

All of the ontology properties, classes and datatypes that we are using to serialize the RDF data are defined as Clojure code. They can be defined in a library, directly in your application’s code or even as data that gets emitted by a web service endpoint that you evaluate at runtime (for data that has not yet been evaluated).

In the tests I am doing, I define RDF properties as Clojure functions; the RDF classes and datatypes are normal records that comply with the same RDF serialization rules as defined for the instance records.

Some users may wonder: why is everything defined as a map except the properties? Though each property’s RDF description is available as a map, we use it as the Clojure metadata of that function. We consider properties to be functions, not maps. As you will see below, these functions are used to validate the RDF data serialized in Clojure code. That is the reason why they are represented as Clojure functions and not as maps like everything else.
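As a minimal sketch of the idea that a property is a function whose RDF description lives in its metadata, the hypothetical foo-phone below carries its description as var metadata under plain namespaced keywords (the post uses vars such as #'rdf.core/uri as keys instead) and performs a deliberately trivial validation when called.

```clojure
;; Hypothetical property. Its RDF description is attached as var metadata
;; (plain keywords here; the post keys it with vars like #'rdf.core/uri).
(defn ^{:rdf/uri         "http://purl.org/ontology/foo#phone"
        :rdfs/label      "phone number"
        :owl/cardinality 1}
  foo-phone
  "Validate a single phone value. Sketch only: it just checks for a string."
  [value]
  (when-not (string? value)
    (throw (IllegalArgumentException. (str "Not a string literal: " value))))
  value)

;; The description is still available as a map ...
(select-keys (meta #'foo-phone) [:rdf/uri :owl/cardinality])
;; ... while the property itself is a function that validates its value.
(foo-phone "1-421-353-9057")
```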

Someone could easily leverage the RDF/Clojure serialization without worrying about the ontologies. He could get the triples that describe the records without worrying about the semantics of the data as represented by the ontologies. However, if that same person would like to reason over the data that is presented to him (if he wants to make sure the data is valid and coherent), then he will require the ontologies’ descriptions.

Now let’s see how these ontologies are being generated.

Creating OWL Classes

As I said above, an OWL class is nothing but another record. It is described using the same rules as previously defined [4]. However, it is described using the OWL language and refers to specific semantics. Creating such a class is really easy: we just have to follow the semantics of the OWL language and the rules of the RDF/Clojure serialization. For example, this code creates a simple FOAF person class:

(def foaf:+person
  "The class of all the persons."
  {#'uri "http://xmlns.com/foaf/0.1/Person"
   #'rdf:type #'owl:+class
   #'rdfs:label "Person"
   #'rdfs:comment "The class of all the persons."})

As you can see, we are describing the class the same way we were defining normal instance records. However, we are doing it using the OWL language.

Creating OWL Datatypes

Datatypes are also serialized like normal RDF/Clojure records; that is, just like classes. However, since the datatypes are fairly static in the way we define them, I created a simple macro called gen-datatype that can be used to generate datatypes:

(defmacro gen-datatype
  "Create a new datatype that represents an OWL datatype class.
   [name] is the name of the datatype to create.
   Optional parameters are:
     [:uri] this is the URI of the datatype to create
     [:base] this is the URI of the base XSD datatype of this new datatype
     [:pattern] this is a regex pattern used to validate that
                a given string represents a value that belongs to that datatype
     [:docstring] the docstring to use when creating this datatype"
  [name & {:keys [uri base pattern docstring]}]
  `(def ~name
     ~(str docstring)
     (merge {#'rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}
            (if ~uri {#'rdf.core/uri ~uri})
            (if ~pattern {#'xsp:pattern ~pattern})
            (if ~base {#'xsp:base ~base}))))

You can use this macro like this:

(gen-datatype *full-us-phone-number
              :uri "http://purl.org/ontology/foo#phone-number"
              :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
              :base "http://www.w3.org/2001/XMLSchema#string"
              :docstring "Datatype representing a full US phone number")

And it will generate a datatype like this:

{#'ontologies.core/xsp:base "http://www.w3.org/2001/XMLSchema#string"
 #'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"
 #'rdf.core/uri "http://purl.org/ontology/foo#phone-number"
 #'ontologies.core/rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}

What this datatype defines is a class of literals that represents the full version of a US phone number. I will explain how such a datatype is used to validate RDF data records below.
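Independently of that explanation, here is a hedged sketch of one way a pattern-bearing datatype could check a literal. It uses plain keywords instead of the #'xsp:pattern and #'rdf.core/uri vars, and valid-literal? is a hypothetical helper, not necessarily how this post implements datatype validation.

```clojure
;; Plain-keyword stand-in for the generated *full-us-phone-number datatype.
(def full-us-phone-number
  {:uri     "http://purl.org/ontology/foo#phone-number"
   :base    "http://www.w3.org/2001/XMLSchema#string"
   :pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"})

(defn valid-literal?
  "True when value is a string entirely matched by the datatype's pattern.
   A datatype without a pattern accepts any string."
  [datatype value]
  (and (string? value)
       (if-let [p (:pattern datatype)]
         (boolean (re-matches (re-pattern p) value))
         true)))

(valid-literal? full-us-phone-number "1-421-353-9057")      ; true
(valid-literal? full-us-phone-number "(1)-(412)-342-3246")  ; false
```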

Creating OWL Properties

Properties are different from classes and datatypes. They are represented as functions in the RDF/Clojure serialization. I created another simple macro called gen-property to generate these OWL properties:

(defmacro gen-property
  "Create a new property that represents an OWL property.
     [name] is the name of the property/function to create. This is the name that will be
            used in your Clojure code.
     [:uri] this is the URI of the property to create
     [:label] this is the preferred label of the property to create
     [:description] this is the description of the property to create
     [:domain] this is the domain of the property to create. The domain is represented by one or multiple
               classes. If there is more than one class that represents the domain,
               you can specify the ^intersection-of or the ^union-of meta-data to specify if the classes
               should be interpreted as a union or an intersection of the set of classes.
     [:range] this is the range of the property to create. The range is represented by one or multiple
               classes. If there is more than one class that represents the range,
               you can specify the ^intersection-of or the ^union-of meta-data to specify if the classes
               should be interpreted as a union or an intersection of the set of classes.
     [:sub-property-of] one or multiple properties that are super-properties of this property
     [:equivalent-property] one or multiple properties that are equivalent to this property
     [:is-object-property] true if the property being created is an object property
     [:is-datatype-property] true if the property being created is a datatype property
     [:is-annotation-property] true if the property being created is an annotation property
     [:cardinality] exact cardinality of the property"
  [name & {:keys [uri
                  label
                  description
                  domain
                  range
                  sub-property-of
                  equivalent-property
                  is-object-property
                  is-datatype-property
                  is-annotation-property
                  cardinality]}]
  (let [vals (gensym "label-")
        docstring (if description
                    (str description ".\n [" vals "] is the preferred label to specify.")
                    (str ""))
        type (if is-object-property
               #'owl:+object-property
               (if is-annotation-property
                 #'owl:+annotation-property
                 #'owl:+datatype-property))
        metadata (merge (if uri {#'rdf.core/uri uri})
                        (if type {#'rdf:type type})
                        (if label {#'iron:pref-label label})
                        (if description {#'iron:description description})
                        (if range {#'rdfs:range range})
                        (if domain {#'rdfs:domain domain})
                        (if cardinality {#'owl:cardinality cardinality}))]
     `(defn ~(with-meta name metadata)
        ~(str docstring)
        [~vals]
        (rdf.property/validate-property #'~name ~vals))))

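Note that this macro currently only accommodates a subset of the OWL language. For example, the :sub-property-of and :equivalent-property parameters are accepted but not yet written to the property’s metadata, and only exact cardinality (owl:cardinality) can be specified. I only created what was required for writing this blog post.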

You can then use this macro to create new properties like this:

(gen-property foo:phone
              :is-datatype-property true
              :label "phone number"
              :uri "http://purl.org/ontology/foo#phone"
              :range *full-us-phone-number
              :domain #'owl:+thing
              :cardinality 1)

(gen-property foo:knows
              :is-object-property true
              :label "a person that knows another person"
              :uri "http://purl.org/ontology/foo#knows"
              :range #'umbel.ref/umbel-rc:+person
              :domain #'umbel.ref/umbel-rc:+person)

Some other Classes, Datatypes and Properties

So, here is the list of classes, datatypes and properties that will be used later in this blog post for demonstrating how validation occurs in such a framework:

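(in-ns 'rdf.core)

(defn uri
  "Return a java.net.URI for [s], or throw an IllegalStateException
   if [s] is not a syntactically valid URI."
  [s]
  (try
    (java.net.URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))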

(defn datatype
  [s]
  (if (var? s)
    (if (not= (get @s #'ontologies.core/rdf:type) "http://www.w3.org/TR/rdf-schema#Datatype")
      (throw (IllegalStateException. (str "Provided value for datatype is not a datatype: \"" s "\""))))
    (throw (IllegalStateException. (str "Provided value for datatype is not a datatype: \"" s "\"")))))

(in-ns 'ontologies.core)

; owl:+thing must be defined before it is referenced as #'owl:+thing below
(def owl:+thing
  "The class of OWL individuals."
  {#'uri "http://www.w3.org/2002/07/owl#Thing"
   #'rdf:type #'rdfs:+class
   #'rdfs:label "Thing"
   #'rdfs:comment "The class of OWL individuals."})

(gen-property iron:pref-label
              :uri "http://purl.org/ontology/iron#prefLabel"
              :label "Preferred label"
              :description "Preferred label for describing a resource"
              :domain #'owl:+thing
              :range #'rdfs:*literal
              :is-datatype-property true)

(gen-datatype xsd:*string
              :uri "http://www.w3.org/2001/XMLSchema#string"
              :docstring "Datatype that represents all the XSD strings")

Concluding with Ontologies

Ontologies are easy to write in RDF/Clojure. There is a simple set of macros that can be used to help create the ontology classes, properties and datatypes. However, in the future I anticipate creating a library that would use the OWLAPI to take any OWL ontology and serialize it using these rules. The output could be Clojure code like this, or JAR libraries. Additionally, I will investigate using more idiomatic Clojure projects like Phil Lord’s Tawny-OWL project.

RDF Data Instantiation Using Clojure Code

Now that we have the classes, datatypes and properties defined in our Clojure application, we can start defining data records like this:

(def valid-record (r {uri "http://foo-bar.com/test/"
                      rdf:type owl:+thing
                      foo:phone ["1-421-353-9057"]
                      iron:pref-label {value "Test cardinality validation"
                                       lang "en"
                                       datatype xsd:*string}}))

Data Validation

The valid-record record defined above describes something with a phone number and a preferred label. The data is there and available to you. Now, what if I would like to do a bit more than this; what if I would like to validate it?

Validating such a record is as easy as evaluating it. What does that mean? Each key of the map that describes the record refers to a property function, so validating a triple means calling that function with the value specified in the record description. We then iterate over the whole map to validate all of the triples.

To perform this kind of process, we can create a validate-resource function that looks like:

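(defn validate-resource
  "Validate every triple of [resource] by calling each property function
   with its value(s). Throws an exception on the first invalid triple."
  [resource]
  (doseq [[property value] resource]
    (println (str "validating resource property: " property))
    ; Only keys whose var holds a function are validated
    (when (fn? @property)
      (@property value))))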

You can use it like this:

(validate-resource valid-record)

If no exceptions are thrown, then the record is considered valid according to the ontology specifications. Easy, no? Now let’s take a look at how this works.

If you check the gen-property macro, you will notice that every time a property function is called, the #'rdf.property/validate-property function is called as well. This function validates the property given the specified value(s), according to the description of the property in the ontology specification. The validate-property function looks like:

(defn validate-property
  "Validate that the values of the property are valid according to the description of that property
   [property] should be the reference to the function, like #'foo:phone
   [values] are the actual values of that property"
  [property values]
  (validate-owl-cardinality property values)
  (validate-rdfs-range property values))

It runs a series of other functions to validate different characteristics of a property. For this blog post, we demonstrate how the following characteristics are validated:

  1. Cardinality of a property
  2. URI validation
  3. Datatype validation
  4. Range validation when the range is a class.

Cardinality Validation

Validating the cardinality of a property means that we check if the number of values of a given property is as specified in the ontology. In this example, we validate the exact cardinality of a property. It could be extended to validate the maximum and minimum cardinalities as well.

The function that validates the cardinality is the validate-owl-cardinality function that is defined as:

(defn validate-owl-cardinality
  [property values]
  (doseq [[meta-key meta-val] (seq (meta property))]
    ; Only validate if there is an owl:cardinality property defined in the metadata
    (if (= meta-key #'ontologies.core/owl:cardinality)
      ; If the value is a string, a var or a map, we check if the cardinality is 1
      (if (or (string? values) (map? values) (var? values))
        (if (not= meta-val 1)
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has 1 values and was expecting %d values" property meta-val))))
        ; If the value is an array, we validate the expected cardinality
        (if (not= (count values) meta-val )
          (throw (IllegalStateException.
                  (format "CARDINALITY VALIDATION ERROR: property %s has %d values and was expecting %d values" property (count values) meta-val))))))))

For the given property, the function checks whether owl:cardinality is defined in the property's metadata. If it is, the function makes sure that the number of values for that property matches what the ontology defines. If there is a mismatch, the function throws an exception and the validation process stops.
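The following Python sketch shows the exact-cardinality rule outside of Clojure. It assumes, hypothetically, that a property's ontology description is available as a plain dict with an "owl:cardinality" entry. It mirrors the counting rule of validate-owl-cardinality: a single string or map counts as one value, and a list (the Clojure vector) counts each of its elements. CardinalityError and count_values are illustrative names and are not part of RDF/Clojure.

```python
from typing import Any, Mapping


class CardinalityError(ValueError):
    """Raised when a property has a different number of values than owl:cardinality."""


def count_values(values: Any) -> int:
    # A single string or map counts as one value; a list counts its elements.
    if isinstance(values, (str, dict)):
        return 1
    return len(values)


def validate_owl_cardinality(prop: str, metadata: Mapping[str, Any], values: Any) -> None:
    # Only validate when the (hypothetical) metadata declares owl:cardinality.
    expected = metadata.get("owl:cardinality")
    if expected is None:
        return
    actual = count_values(values)
    if actual != expected:
        raise CardinalityError(
            f"CARDINALITY VALIDATION ERROR: property {prop} has {actual} "
            f"value(s) and was expecting {expected}"
        )


# Usage: foo:phone described with an exact cardinality of 1.
foo_phone = {"owl:cardinality": 1}
validate_owl_cardinality("foo:phone", foo_phone, "1-421-353-9057")  # passes
```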

Here is an example of a record that has a cardinality validation error. The foo:phone property is defined with an exact cardinality of 1, but the record gives it two values:

(def card-validation-test (r {uri "http://foo-bar.com/test/"
                              rdf:type owl:+thing
                              foo:phone ["1-421-353-9057" "(1)-(412)-342-3246"]
                              iron:pref-label {value "Test cardinality validation"
                                               lang "en"
                                               datatype xsd:*string}}))

user> (validate-resource card-validation-test)
IllegalStateException CARDINALITY VALIDATION ERROR: property #'dataset-test.core/foo:phone has 2 values and was expecting 1 values  rdf.property/validate-owl-cardinality (property.clj:36)

URI Validation

Everything you define in RDF/Clojure has a URI. However, not every string is a valid URI, so every URI you define can be validated as well. When you define a URI, you use the #'rdf.core/uri function to specify it. That function is defined as:

(import 'java.net.URI)

(defn uri
  [s]
  (try
    (URI. ^String s)
    (catch Exception e
      (throw (IllegalStateException. (str "Invalid URI: \"" s "\""))))))

As you can see, we use the java.net.URI constructor to validate the URI you define for your records, classes, properties and datatypes. If a URI is malformed, an exception is thrown and the validation process stops.
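For readers without a JVM at hand, the failing case can be reproduced with a much narrower Python sketch. It is a stand-in resting on assumptions and is not equivalent to java.net.URI: it only requires an absolute URI and checks its scheme against the RFC 3986 grammar (a letter followed by letters, digits, "+", "-" or "."). That is exactly the rule "-http://..." violates. validate_uri is a hypothetical name.

```python
import re

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def validate_uri(s: str) -> str:
    """Return s if it looks like an absolute URI, else raise ValueError.

    Only the scheme is checked (plus a non-empty remainder), which is far
    less thorough than java.net.URI; it illustrates the failing example only.
    """
    scheme, sep, rest = s.partition(":")
    if not sep or not rest or SCHEME_RE.fullmatch(scheme) is None:
        raise ValueError(f'Invalid URI: "{s}"')
    return s
```

Unlike java.net.URI, this sketch also rejects relative references, which is an extra restriction chosen only to keep the example small.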

Here is an example of a record that has an invalid URI:

(def uri-validation-test (r {uri "-http://foo-bar.com/test/"
                             rdf:type owl:+thing
                             foo:phone "1-421-353-9057"
                             iron:pref-label {value "Test URI validation"
                                              lang "en"
                                              datatype xsd:*string}}))

user> (validate-resource uri-validation-test)
IllegalStateException Invalid URI: "-http://foo-bar.com/test/"  rdf.core/uri (core.clj:16)

Datatype Validation

In OWL, a datatype property is used to refer to literal values that belong to classes of literals (datatype classes). A datatype class represents all the literals that belong to it, as defined by the datatype. For example, the *full-us-phone-number datatype we described above defines the class of all the literals that are full US phone numbers.

Validating the value of a property according to its datatype means making sure that the literal value(s) belong to that datatype. Most of the time, people use the XSD datatypes. When custom datatypes are created, they are based on one of the XSD datatypes, and a regex pattern is defined to specify how the literal should be constructed. Datatypes are validated by the validate-rdfs-range function and its helpers, defined as:

(declare validate-map-properties validate-range-pattern validate-range-object)

(defn validate-rdfs-range
  [property values]
  (do
    ; If the value is a map, then validate the "value", "lang" and "datatype" assertions
    (if (map? values)
      (validate-map-properties values))
    (doseq [[meta-key ranges] (seq (meta property))]
      ; make sure a range is defined for this property
      (if (= meta-key #'ontologies.core/rdfs:range)
        (let [ranges (if (vector? ranges)
                       ranges
                       ^:intersection-of [ranges])]
          (if (true? (:intersection-of (meta ranges)))
            ; consider that all the values of the range are an intersection-of
            (doseq [range ranges]
              (if (is-datatype-property? property)
                ; we are checking the range of a datatype property
                ; @TODO here we have to change that portion to call a function that will do the validation
                ;       according to the existing XSD types, or any custom datatype based on these core
                ;       XSD datatypes. Just like the DVT (Dataset Validation Tool)
                ;
                ;       For now, we simply test using a datatype that has a pattern defined.
                (let [pattern (get range #'ontologies.core/xsp:pattern)]
                  (if pattern
                    ; a validation pattern has been defined for this value
                    (if (vector? values)
                      ; Validate all the values of the property according to this Datatype
                      (doseq [v values]
                        (validate-range-pattern v pattern ranges))
                      ; Validate the value according to the datatype
                      (validate-range-pattern values pattern ranges))))
                ; we are checking the range of an object property
                (if (vector? values)
                  (doseq [v values]
                    (validate-range-object v range property))
                  (validate-range-object values range property))))
            ; consider that all the values of the range are a union-of
            (println "@TODO Ranges union validation")))))))

(defn- validate-range-pattern
  [v pattern range]
  (if (string? v)
    (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) v))
      (throw (IllegalStateException.
              (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range))))
    (if (and (map? v) (nil? (validate-map-properties v)))
      (if (nil? (re-seq (java.util.regex.Pattern/compile pattern) (get v 'value)))
        (throw (IllegalStateException.
                (format "Value \"%s\" invalid according to the definition of the datatype \"%s\""  v range)))))))

(defn- validate-map-properties
  [m]
  (doseq [[p v] m]
    (if (fn? @p)
      (@p v))))

validate-rdfs-range validates the range of a property. It first checks what kind of value the property has according to the RDF/Clojure specification (a string, a map, a vector, a var, etc.). If the value is a map, its value, lang and datatype entries are validated. It then checks whether a range is defined for the property, and whether the property is a datatype property or an object property. For a datatype property whose range defines an xsp:pattern, each value is validated against that pattern. For an object property, each referenced record is checked against the class defined in the range.
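The pattern-based part of the datatype check can be sketched in Python as follows. The regex is the xsp:pattern of the phone-number datatype that appears in the error output of the first example below. Treating a literal as either a plain string or a dict with a "value" entry is an assumption modeled on the RDF/Clojure maps. As in the Clojure code, the pattern is searched for (like re-seq), so it relies on its own ^ and $ anchors to cover the whole literal.

```python
import re
from typing import Any

# xsp:pattern of the foo phone-number datatype, as printed in the error output.
PHONE_PATTERN = r"^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$"


def literal_value(v: Any) -> str:
    # A literal is a plain string or a map with a "value" entry
    # (optionally "lang" and "datatype").
    return v["value"] if isinstance(v, dict) else v


def validate_range_pattern(values: Any, pattern: str, datatype: str) -> None:
    """Raise ValueError if any literal does not match the datatype's pattern."""
    compiled = re.compile(pattern)
    items = values if isinstance(values, list) else [values]
    for v in items:
        text = literal_value(v)
        if compiled.search(text) is None:
            raise ValueError(
                f'Value "{text}" invalid according to the definition '
                f'of the datatype "{datatype}"'
            )
```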

Here are a few records that have different datatype validation errors:

(def datatype-validation-test (r {uri "http://foo-bar.com/test/"
                                  rdf:type owl:+thing
                                  foo:phone "1-421-353-90573"
                                  iron:pref-label {value "Test datatype validation"
                                                   lang "en"
                                                   datatype xsd:*string}}))

(def datatype-validation-test-2 (r {uri "http://foo-bar.com/test/"
                                    rdf:type owl:+thing
                                    foo:phone "1-421-353-9057"
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype "not-a-datatype"}}))

(def xsd:*string-not-a-datatype)

(def datatype-validation-test-3 (r {uri "http://foo-bar.com/test/"
                                    rdf:type owl:+thing
                                    foo:phone "1-421-353-9057"
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string-not-a-datatype}}))

(def datatype-validation-test-4 (r {uri "http://foo-bar.com/test/"
                                    rdf:type owl:+thing
                                    foo:phone [{value "1-421-353-9057"
                                                datatype xsd:*string-not-a-datatype}]
                                    iron:pref-label {value "Test datatype validation"
                                                     lang "en"
                                                     datatype xsd:*string}}))

user> (validate-resource datatype-validation-test)
IllegalStateException Value "1-421-353-90573" invalid according to the definition of the datatype "[{#'ontologies.core/xsp:pattern "^[0-9]{1}-[0-9]{3}-[0-9]{3}-[0-9]{4}$", #'rdf.core/uri "http://purl.org/ontology/foo#phone-number", #'ontologies.core/rdf:type "http://www.w3.org/TR/rdf-schema#Datatype"}]"  rdf.property/validate-range-pattern (property.clj:150)

user> (validate-resource datatype-validation-test-2)
IllegalStateException Provided value for datatype is not a datatype: "not-a-datatype"  rdf.core/datatype (core.clj:31)

user> (validate-resource datatype-validation-test-3)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

user> (validate-resource datatype-validation-test-4)
IllegalStateException Provided value for datatype is not a datatype: "#'dataset-test.core/xsd:*string-not-a-datatype"  rdf.core/datatype (core.clj:30)

As you can see, the validate-rdfs-range function is incomplete regarding datatype validation. I am still updating it so that all the existing XSD datatypes are validated. The custom datatypes also need better validation, for example by taking their xsp:base type into account. The code to write is similar to the code I created for the Data Validation Tool (which is written in PHP).

Range validation when the range is a class

Finally, let’s show how the range of an object property can be validated. Validating the range of an object property means making sure that the record referenced by the object property belongs to the class defined as the range of the property.

For example, consider a property foo:knows whose range specifies that all the values of foo:knows must belong to the class umbel-rc:+person. This means that every value of the foo:knows property, in any record, must refer to a record of type umbel-rc:+person. If that is not the case, there is a validation error.

Here is an example of a record where the foo:knows property is not properly used:

(def wrench (r {uri "http://foo-bar.com/test/wrench"
               rdf:type umbel.ref/umbel-rc:+product
               iron:pref-label "The biggest wrench ever"}))

(def object-range-validation-test (r {uri "http://foo-bar.com/test/bob"
                                      rdf:type umbel.ref/umbel-rc:+person
                                      foo:knows wrench
                                      iron:pref-label {value "Test object range validation"
                                                       lang "en"
                                                       datatype xsd:*string}}))

Remember we defined the foo:knows property with the range of umbel-rc:+person. However, in the example, the reference is to a wrench record that is of type umbel-rc:+product. Thus, we get a validation error:

user> (validate-resource object-range-validation-test)
IllegalStateException The resource "http://umbel.org/umbel/rc/Product" referenced by the property "#'dataset-test.core/foo:knows" does not belong to the class "#'umbel.ref/umbel-rc:+person" as defined by the range of the property  rdf.property/validate-range-object (property.clj:142)

The function that validates the ranges of the object properties is defined as:

(require '[clj-http.client :as http])

(defn- validate-range-object
  [r range property]
  (let [r (cond
            (var? r) (deref r)
            (map? r) r
            ; @TODO get the resource's description from a dataset index
            (string? r) {})
        uri (get (deref (get r #'ontologies.core/rdf:type)) #'rdf.core/uri)
        uri-ending (let [i (.lastIndexOf ^String uri "/")]
                     (if (> i -1)
                       (subs uri (inc i))
                       ""))
        super-classes (try
                        (read-string (:body (http/get (str "http://umbel.org/ws/super-classes/" uri-ending)
                                                      {:headers {"Accept" "application/clojure"}
                                                       :throw-exceptions false})))
                        (catch Exception e
                          nil))
        range-uri (get @range #'rdf.core/uri)]
    ; Valid if the type is the range class itself or one of its sub-classes
    (if-not (or (= uri range-uri)
                (some #{range-uri} super-classes))
      (throw (IllegalStateException. (str "The resource \"" uri "\" referenced by the property \"" property "\" does not belong to the class \"" range "\" as defined by the range of the property"))))))

Normally, this kind of validation should be done using the descriptions of the loaded ontologies. However, for the purpose of this blog post, I used a different approach. I purposefully used some UMBEL Reference Concepts as the types of the records I described. The object range validation function then leverages the UMBEL super-classes web service endpoint to get the super-classes of a given class.

So this function looks up the type of each record referenced by the foo:knows property. It then validates whether that type is the same as, or is a sub-class of, the class defined in the range of the foo:knows property.

In our example, the range is #'umbel-rc:+person. This means that the foo:knows property can only refer to umbel-rc:+person records. In the example with the validation error, the type of the wrench record is umbel-rc:+product. The validation function gets the list of all the super-classes of the umbel-rc:+product class and checks whether umbel-rc:+person is one of them, that is, whether umbel-rc:+product is a sub-class of umbel-rc:+person. It is not, so an error is thrown.
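The super-class check can also be sketched without the web service. In this Python sketch, the hierarchy is a small hypothetical table of direct super-class links. The entries ex:Artifact, ex:LivingThing and ex:Scientist are made up for illustration and are not real UMBEL concepts. The transitive super-classes are computed with a breadth-first walk, and a referenced type passes if it is the range class itself or one of its sub-classes.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical direct super-class links; the Clojure code asks the UMBEL
# super-classes web service instead of reading a local table.
HIERARCHY: Dict[str, List[str]] = {
    "umbel-rc:Product": ["ex:Artifact"],
    "ex:Artifact": ["owl:Thing"],
    "umbel-rc:Person": ["ex:LivingThing"],
    "ex:LivingThing": ["owl:Thing"],
    "ex:Scientist": ["umbel-rc:Person"],
}


def all_super_classes(cls: str, hierarchy: Dict[str, List[str]]) -> Set[str]:
    """Transitive super-classes of cls, found with a breadth-first walk."""
    seen: Set[str] = set()
    queue = deque(hierarchy.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(hierarchy.get(parent, []))
    return seen


def validate_range_object(resource_type: str, range_class: str, prop: str,
                          hierarchy: Dict[str, List[str]]) -> None:
    """Raise ValueError unless resource_type is range_class or one of its sub-classes."""
    if resource_type == range_class:
        return
    if range_class not in all_super_classes(resource_type, hierarchy):
        raise ValueError(
            f'The resource type "{resource_type}" referenced by the property '
            f'"{prop}" does not belong to the class "{range_class}"'
        )
```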

What is interesting about this example is that the UMBEL super-classes web service endpoint returns the list of super-classes as Clojure code (the request asks for application/clojure). We then use the read-string function to read that list into a Clojure data structure and manipulate it as if it were part of the application’s code.

Conclusion

What is elegant about this kind of RDF/Clojure serialization is that validating the RDF data is the same as evaluating the underlying code (Data as Code). If the data is invalid, exceptions are thrown and the validation process aborts.

One thing I have yet to investigate with such an RDF/Clojure serialization is how the semantics of the properties, classes and datatypes could be embedded into the RDF/Clojure records. We would then end up with stateful RDF records that embed their own semantics at a specific point in time. This would mean that even if an ontology changes in the future, the records would still be valid according to the original ontology used to describe them at that point in time (when they were written, when they were emitted by a web service endpoint, etc.).

Also, as some of my readers pointed out about my previous blog post on this subject, the fact that I use vars to serialize the RDF triples means that the serialization won’t produce valid ClojureScript code, since vars don’t exist in ClojureScript. Paul Gearon proposed using keywords as keys instead of vars, and then using a lookup index to call the functions to get the same effect as with vars. This avenue will be investigated as well and should be the topic of a future blog post about this RDF/Clojure serialization.

  1. Data as Code. Code as Data: Tighter Semantic Web Development Using Clojure
  2. Investigating Options to Serialize RDF data as Clojure Code
  3. Revision of Serializing RDF Data as Clojure Code Specification
  4. Revision of Serializing RDF Data as Clojure Code Specification

"Fonds & Bonds" DC-2014 pre-conference full day archives workshop

Tue, 07/01/2014 - 23:59

Categories:

RDF
2014-07-01, "Fonds & Bonds: Archival Metadata, Tools and Identity Management" is a full-day pre-conference workshop at DC-2014 on archival metadata, tools, and standards. The workshop is being hosted by and at the Harry Ransom Center, University of Texas, Austin and will feature top experts in the archives field. The program includes presentations on archival data in leading discovery services (e.g., Europeana), an update on revisions in process to EAD, the latest on tools popular with working archivists including ArchivesSpace, RAMP, and xEAC, and leading developments addressing archives-related identities (Find & Connect, ISNI, SNAC). More information on the workshop can be found at http://dcevents.dublincore.org/IntConf/index/pages/view/2014-archives.

Registration for DC-2014 now open!

Tue, 07/01/2014 - 23:59

Categories:

RDF
2014-07-01, Online registration for DC-2014 is now open at http://purl.org/dcevents/dc-2014/register. The conference and DCMI Annual Meeting is scheduled for 8-11 October in Austin, Texas. This year's theme is "Metadata Intersections: Bridging the Archipelago of Cultural Memory". Metadata is fundamental in enabling ubiquitous access to cultural and scientific resources through galleries, libraries, archives and museums (GLAM). While fundamental, GLAM traditions in documentation and organization lead to significant differences in both their languages of description and domain practices. DC-2014 will explore the role of metadata in spanning the archipelago of siloed cultural memory in an emerging context of linked access to data repositories as well as repositories of cultural artifacts. More information about the conference can be found at http://purl.org/dcevents/dc-2014.

Geodata 2014

Tue, 07/01/2014 - 00:39

Categories:

RDF

A few weeks ago I attended the 2014 Geodata Workshop. Like the previous Geodata workshop in 2011, this workshop was focused on discussing policies and techniques to improve inter-agency geographic data integration and data citation. While there have been advances in recommendations for data citation and geodata integration since the last Geodata workshop, I felt the mood of the attendees indicated that we are now in much the same place we were in 2011. There was strong consensus as to the importance of data citation and integration, but a feeling that no one is really doing it at scale, the tools aren’t where we need them to be, and the agency policies are not yet at a state to successfully drive widespread adoption. Despite these hurdles this is a community that is clearly excited and willing to take the first steps towards making widespread data integration and data citation a reality in the geodata community.

Meanwhile, in the trenches…

I had several conversations with attendees who represent publishers of oceanographic vocabularies. Many of these vocabularies have been publicly available for several years, but have traditionally been 3-star open data (publicly available in a non-proprietary machine-readable format, no links to external vocabularies). These publishers are excited about upgrading their vocabulary services to 5-star open data (use open W3C standards such as RDF/SPARQL, identify things with resolvable URIs, link to other people’s data). They see a major benefit in being able to refer to the authoritative source for a term or identified resource that is related to their vocabulary but for which they are not the authoritative source. This is a great example of a group that has already identified a specific real-world need for, and benefit from, integration, and that is actively laying the groundwork for that integration to succeed. This group was enthusiastic about cross-linking their vocabularies, and I have no doubt their efforts will be viewed as a data integration success at the next Geodata workshop.

Where we can help…

As a result of these discussions, our lab is starting a Linked Vocabulary API effort whose goal is to provide a Linked Data API configuration specialized for publishing SKOS vocabularies. We aim to develop a configuration that makes bootstrapping a RESTful linked data API for a SKOS vocabulary simple and accessible for the broad scientific community. This effort is based on work we previously did for the CMSPV project.

In conclusion

What I will remember most from Geodata 2014 is the excitement members of the community had towards adopting new technologies and techniques and making widespread data integration and citation a reality. Where conventions have yet to be established, the community is willing to take the first steps and establish best practices. Where policies have yet to be formalized, the community is ready to work with policy makers to ensure that clear and helpful policies are established. Whenever the next Geodata workshop is held, I am confident that its narrative will be full of success stories that began at the 2014 workshop.

Read Write Web — Q2 Summary — 2014

Tue, 07/01/2014 - 00:26

Categories:

RDF
Summary

WWW 2014 kicked off in Korea with some interesting material presented. Two talks that caught my eye were “Trust in Social Computing” and “The Mobile Semantic Web”. The first one may be particularly related to our merger with the Trust CG last year. Feel free to browse the slides and much more, all available online.

Linked data have announced a Data on the Web Best Practices Working Group, with the publication of the Use Case & Requirements document as a first milestone. The MIT Decentralized Information Group has announced an interesting project, HTTPa (HTTP with Accountability), details here.

The RWW CG has had some interesting discussions, with a slight uptick in mailing list activity from about 1 post per day last quarter to 2 posts per day this quarter. I’ll try to summarize some of the more interesting topics below.

Communications and Outreach

Work has continued with the Linked Data Platform Working Group, who are getting close to publishing their final spec. Of particular interest is some work that is starting on Access Control, with the outline of a charter for an Access Control Working Group.

The Web Payments Community Group have done some work on tying credentials to identity and allowing reading and writing to those documents.  A blog post describing the pros and cons with a full demo is available here.

 

Community Group

A collection of very interesting work was announced by Thomas Bergwinkl, codenamed LDApp. It is a full LD stack in JavaScript, including Universal Access Control, an extension to RDF Interfaces, JSON-LD support, and much more!

Apart from this, lots of libraries received work, including, ldp4j (java), GOLD (go) and node-rdf (node-js).

Applications

Decentralized identity and authentication was showcased, using a single access controlled image, which enabled you to login with a huge variety of web 1.0/2.0/3.0 methods.  See the demo here.

A Chrome extension to webizen was released. YouID for Android and iOS also went live, and a boilerplate for creating RWW apps was built this quarter. Additionally, I have done some more work on creating test currency for the RWW.

 

 

Last but not Least…

Best wishes to Tim Berners-Lee, the person who started the whole Read Write Web idea, in his marriage to Rosemary Leith. Congratulations!

GrowJSON

Mon, 06/30/2014 - 21:17

Categories:

RDF

I have an idea that I think is very important but I haven’t yet polished to the point where I’m comfortable sharing it. I’m going to share it anyway, unpolished, because I think it’s that useful.

So here I am, handing you a dull, gray stone, and I’m saying there’s a diamond inside. Maybe even a dilithium crystal. My hope is that a few experts will see what I see and help me safely extract it. Or maybe someone has already extracted it, and they can just show me.

The problem I’m trying to solve is at the core of decentralized (or loosely-coupled) systems. When you have an overall system (like the Web) composed of many subsystems which are managed on their own authority (websites), how can you add new features to the system without someone coordinating the changes?

RDF offers a solution to this, but it turns out to be pretty hard to put into practice. As I was thinking about how to make that easier, I realized my solution works independently of the rest of RDF. It can be applied to JSON, XML, or whatever. For now, I’m going to start with JSON.

Consider two on-the-web temperature sensors:

> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
{"temp":35.2}

> GET /temp HTTP/1.1
> Host: berkeley.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
{"temp":35.2}

The careful human reader will immediately wonder whether these temperatures are in Celsius or Fahrenheit, or whether the first is in Celsius and the second in Fahrenheit. This is a trivial example of a much deeper problem.

Here’s the first sketch of my solution:

> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
{"GrowJSONVersion": 0.1,
"defs": {
"temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
}},
{"temp":35.2}
]
> GET /temp HTTP/1.1
> Host: berkeley.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
{"GrowJSONVersion": 0.1,
"defs": {
"temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
}},
{"temp":35.2}
]

I know it looks ugly, but now it’s clear that both readings are in Fahrenheit.

My proposal is that much like some data-consuming systems do schema validation now, GrowJSON data-consuming systems would actually look for that exact definition string.

This way, if a third sensor came on line:

> GET /temp HTTP/1.1
> Host: doha.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
{"GrowJSONVersion": 0.1,
"defs": {
"temp": "The temperature in degrees Celsius as measured by a sensor and expressed as a JSON number"
}},
{"temp":35.2}
]

the software could automatically determine that it does not contain data in the format it was expecting. In this case, a human could easily read the definition and make the software handle both formats.

That’s the essence of the idea. Anywhere you might have ambiguity or a naming collision in your JSON, use natural language definitions instead. They should be detailed enough that (1) two people are very unlikely to choose the same text, (2) if they did, they’re extremely likely to have meant the same thing, and, while we’re at it, (3) they help people implement code to handle the data.
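A data-consumer following this idea might be sketched in Python as shown below. The sketch assumes the inline form of the examples above: a JSON array whose first element holds "GrowJSONVersion" and "defs", followed by data objects. It ignores imports. parse_growjson and read_fahrenheit are hypothetical names. Values are indexed by their definition string rather than by their key, so a sensor that renamed temp to tmp would still be understood, while the Doha reading, with its different definition, is rejected.

```python
import json
from typing import Any, Dict

# The definition this consumer was written against, copied verbatim.
FAHRENHEIT_DEF = ("The temperature in degrees Fahrenheit as measured by a sensor "
                  "and expressed as a JSON number")


def parse_growjson(text: str) -> Dict[str, Any]:
    """Map each definition string to its value in an inline GrowJSON 0.1 document."""
    header, *records = json.loads(text)
    if header.get("GrowJSONVersion") != 0.1:
        raise ValueError("unsupported GrowJSON version")
    defs: Dict[str, str] = header.get("defs", {})
    by_definition: Dict[str, Any] = {}
    for record in records:
        for key, value in record.items():
            # The key is only a local name; the definition carries the meaning.
            if key in defs:
                by_definition[defs[key]] = value
    return by_definition


def read_fahrenheit(text: str) -> float:
    """Return the reading whose definition matches FAHRENHEIT_DEF exactly."""
    data = parse_growjson(text)
    if FAHRENHEIT_DEF not in data:
        raise ValueError("no value with the expected definition")
    return data[FAHRENHEIT_DEF]
```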

I see you shaking your head in disbelief, confusion, or possibly disgust. Let me try answering a few questions:

Question: Are you really suggesting every JSON document would include complete documentation of all the fields used in that JSON document?

Conceptually, yes, but in practice we’d want to have an “import” mechanism, allowing those definitions to be in another file or Web Resource. That might look something like:

> GET /temp HTTP/1.1
> Host: paris.example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
{"GrowJSONVersion": 0.1},
{"import": "http://example.org/schema",
"requireSHA256": "7998bb7d2ff3cfa2666016ea0cd7a379b42eb5b0cebbb1142d8f086efaccfbc6"
},
{"temp":35.2}
]

> GET /schema HTTP/1.1
> Host: example.org
> Accept: text/json
>
< HTTP/1.1 200 OK
< Content-Type: text/json
<
[
{"GrowJSONVersion": 0.1,
"defs": {
"temp": "The temperature in degrees Fahrenheit as measured by a sensor and expressed as a JSON number"
}}
]

Question: Would that break if you didn’t have a working Internet connection?

No. By including the SHA-256 hash, we make it clear that the bytes aren’t allowed to change, so the data-consumer can hard-code the result of a retrieval performed at build time.
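The hash check that makes the import safe to freeze can be sketched with Python's hashlib. This assumes the consumer compares the SHA-256 of the retrieved, or build-time cached, schema bytes with the requireSHA256 value given as a hex string. check_import is a hypothetical helper.

```python
import hashlib


def check_import(schema_bytes: bytes, required_sha256: str) -> bytes:
    """Return schema_bytes only if their SHA-256 matches requireSHA256."""
    digest = hashlib.sha256(schema_bytes).hexdigest()
    if digest != required_sha256.lower():
        raise ValueError(f"imported schema does not match requireSHA256 (got {digest})")
    return schema_bytes
```

Because any change to the bytes changes the digest, a consumer can keep the copy it checked at build time and skip the network entirely at runtime.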

Question: Would the data-consumer have to copy the definition without changing one letter?

Yes, because the machines don’t know which letters might be important. In practice the person programming the data-consumer could do the same kind of import, referring to the same frozen schema on the Web, if they want to. Or they can just cut-and-paste the definitions they are using.

Question: Would the object keys still have to match?

No, only the definitions. If the Berkeley sensor used tmp instead of temp, the consumer would still be able to understand it just the same.

Question: Is that documentation string just plaintext?

I’m not sure yet. I wish markdown were properly standardized, but it’s not. The main kind of formatting I want in the definitions is links to other terms defined in the same document. Something like these [[term]] expressions:

{"GrowJSONVersion": 0.1,
"defs": {
"temp": "The temperature in degrees Fahrenheit as measured by a sensor at the current [[location]] and expressed as a JSON number",
"location": "The place where the temperature reading [[temp]] was taken, expressed as a JSON array of two JSON numbers, being the longitude and latitude respectively, expressed as per GRS80 (as adopted by the IUGG in Canberra, December 1979)"
}
}

As I’ve been playing around with this, I keep finding that good documentation strings include links to related object keys (properties). I want to move the names of the keys outside the normal text, since they’re supposed to be able to change without changing the meaning.
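If definitions link to each other with [[term]] expressions, a consumer or authoring tool could at least check that every link points at a key defined in the same document. Below is a minimal Python sketch that assumes the [[term]] syntax above and a plain defs dict. undefined_links is a hypothetical name.

```python
import re
from typing import Dict, List

# [[term]] links between definitions, as proposed above.
LINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def undefined_links(defs: Dict[str, str]) -> List[str]:
    """Return the [[term]] targets that are not keys of the same defs object."""
    missing: List[str] = []
    for text in defs.values():
        for term in LINK_RE.findall(text):
            if term not in defs and term not in missing:
                missing.append(term)
    return missing
```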

Question: Can I fix the wording in some definition I wrote?

Yes, clearly that has to be supported. It would be done by keeping around the older text as an old version. As long as the meaning didn’t change, that’s okay.

Question: Does this have to be in English?

No. There can be multiple languages available, just like having old versions available. If any one of them matches, it counts as a match.
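Matching against older wordings and translations could work by treating each definition as a set of equivalent texts. The post does not fix an encoding for this. The sketch below therefore assumes, hypothetically, that a defs value is either a single string or a list of strings (the current text, old versions and translations). find_key returns the local key whose definition matches any text the consumer accepts.

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical encoding: one string, or a list of equivalent strings
# (current wording, older wordings, translations).
Definition = Union[str, List[str]]


def variants(defn: Definition) -> List[str]:
    return [defn] if isinstance(defn, str) else list(defn)


def find_key(defs: Dict[str, Definition], accepted: Iterable[str]) -> Optional[str]:
    """Return the key whose definition shares any text with `accepted`, else None."""
    wanted = set(accepted)
    for key, defn in defs.items():
        if wanted.intersection(variants(defn)):
            return key
    return None
```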

 


New UMBEL Web Services

Mon, 06/30/2014 - 18:46

Categories:

RDF

I am happy to announce the immediate availability of a brand new UMBEL website and a new set of eight UMBEL web services.

UMBEL (Upper Mapping and Binding Exchange Layer) is a general reference structure of 28,000 concepts, which provides a scaffolding to link and interoperate other datasets and domain vocabularies. This project is now six years old.

I recommend that you read Mike’s blog post about this new release if you want more background information about UMBEL and a better understanding of how it can help you integrate, manage, publish and reason over your data.

In this blog post, I will focus on the technical aspects of this new web site and the new set of web service endpoints.

Toward a Better Web Experience

The Web is changing fast. Techniques for developing websites are constantly and quickly evolving. People use all kinds of devices, with different screen sizes, to consume Web content. Websites are increasingly responsive, thanks to clever architecture design and simpler user interfaces. This is the kind of website we wanted the new UMBEL website to be.

Clojure Web Service Endpoints at the Core

The new web services are at the core of the new UMBEL website. Whenever you perform a search, or look at the description of a reference concept or a super type, your browser makes a series of asynchronous queries to the UMBEL web service endpoints.

The average query time is about 60 milliseconds for any web service query. This means that a web page is fully loaded within 300 to 500 milliseconds, most of which is spent downloading the web files (the JavaScript, CSS, HTML and image files) rather than querying the web service endpoints. Bearing in mind that the website currently runs on a small server with a single core and 1.8 GB of RAM, these are really good performance figures.

We are initially releasing 8 web service endpoints (with more to follow). They have been created to help developers quickly start using the reference structure without having to download and deploy the entire structure on their own infrastructure. The 8 web services are:

  1. Search concept
  2. Get concept
  3. Get super type
  4. Get narrower concepts
  5. Get broader concepts
  6. Get sub-classes
  7. Get super-classes
  8. Degree

All these web services calculate their results at runtime. For example, if you want to find the degree between two reference concepts, the degree is calculated at runtime. The same is true for all the web services that perform inferencing, like the Get narrower concepts or Get broader concepts web service endpoints.

To get these excellent performance figures, we used Clojure as the programming language and framework to develop the new web service endpoints. We then defined the UMBEL structure itself as Clojure code.

Each web service endpoint is composed of simple pure functions that perform calculations on the UMBEL graph of 28,000 nodes. None of the functions is more than 30 lines of code (per endpoint), which greatly simplifies their creation, debugging, maintenance and optimization. We then use contributed libraries such as Ring and Compojure to manage the creation of the web service endpoints, and Clucy/Lucene for the search engine.
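The actual implementation is Clojure; purely to illustrate the "structure as data, endpoints as pure functions" pattern, here is a Python sketch. The toy graph, its concept ids and the `get_concept` function are hypothetical stand-ins, not the real UMBEL data or code:

```python
from types import MappingProxyType

# Hypothetical toy structure: each concept maps to its preferred label and
# its direct broader concepts. MappingProxyType makes the top level read-only.
GRAPH = MappingProxyType({
    "Instrument": {"label": "Musical instrument", "broader": ()},
    "Guitar": {"label": "Guitar", "broader": ("Instrument",)},
    "ElectricGuitar": {"label": "Electric guitar", "broader": ("Guitar",)},
})


def get_concept(graph, concept_id):
    """Pure function: the same graph and id always give the same result,
    and nothing outside the function is read or modified."""
    node = graph[concept_id]
    # Direct narrower concepts are derived by scanning for reverse links.
    narrower = sorted(c for c, n in graph.items() if concept_id in n["broader"])
    return {
        "id": concept_id,
        "prefLabel": node["label"],
        "broader": list(node["broader"]),
        "narrower": narrower,
    }
```

Because such a function only reads its arguments, it can be unit-tested in isolation and called concurrently by many request handlers without locking.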

The web services can easily be scaled horizontally, since everything is self-contained in a single WAR file that can be deployed on new servers in a few clicks. The new servers can then participate in a cluster of UMBEL web service servers.

Another advantage of using this technology stack for creating the UMBEL web service endpoints is that UMBEL is not just a reference structure or a set of web service endpoints. It is also a programming API that could be used in any Clojure or Java application. The UMBEL reference structure, along with all the functions that use it, will be available as a JAR file. That way, UMBEL becomes portable: it could be used as a library in any JVM application without having to send queries to external web services, or to create complex stacks to deploy and use the UMBEL reference structure in different applications.

Bootstrap as the HTML/CSS/JavaScript Framework

The previous UMBEL website used Drupal 6. For those who used it, it was sometimes clunky, less responsive and more heavyweight. The problem was that we did not need a full CMS to develop a simple, purely informational UMBEL website.

We wanted a responsive experience for UMBEL users. We wanted the fastest experience possible, and we wanted it on any kind of device: desktop computers, tablets, mobile phones, etc.

This is why we chose to develop the new UMBEL website using Twitter’s Bootstrap HTML, CSS and JavaScript framework. Anybody can use this framework to quickly create simple, beautiful and modern websites. It uses a grid system to create user interfaces that adapt to any kind of device (screen size). That way, UMBEL users have the same kind of experience whether they are using a normal desktop screen, a tablet or their mobile phone.

This choice enabled us to create a simple, modern, nice looking and responsive website for UMBEL.

Introduction to the UMBEL Web Services

Now let’s take the time to introduce each of the UMBEL web service endpoints. The first thing to know is that the UMBEL web service endpoints are free to use, with no usage limits and no throttling.

Search Concept Web Service

The Search Web service is used to find UMBEL reference concepts that match a search string. This is the primary tool for finding available concepts in the reference structure. It supports the Lucene query syntax, and search queries can be restricted to specific fields such as the preferred label, alternative labels, descriptions and URI.
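Since the service accepts Lucene query syntax, a client mostly needs to build a well-formed `field:value` query string. A small Python sketch follows; the field names `prefLabel` and `altLabel` are hypothetical placeholders (check the service documentation for the real ones), while the escaped characters are the ones Lucene's classic query parser treats as special:

```python
# Characters with special meaning in Lucene's classic query syntax.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Backslash-escape characters that Lucene would otherwise interpret."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def field_query(terms_by_field: dict, operator: str = "OR") -> str:
    """Build a field-constrained query such as:
    prefLabel:"rock music" OR altLabel:guitar
    Field names passed in are assumptions, not confirmed UMBEL fields."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    clauses = []
    for fld, text in terms_by_field.items():
        escaped = escape_lucene(text)
        if len(text.split()) > 1:
            # Quote multi-word values so they are matched as a phrase.
            clauses.append(f'{fld}:"{escaped}"')
        else:
            clauses.append(f"{fld}:{escaped}")
    return f" {operator} ".join(clauses)
```

Escaping user input this way keeps a search for something like `C++` from being parsed as Lucene operators.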

Get Concept Web Service

The Get Concept Web service is used to get the full description of a UMBEL Reference Concept. By querying this Web service endpoint, you will get the preferred label, all the alternative labels (namely, the items in the semset), the sub/super classes of the concept, the broader/narrower concepts and the description of that concept.

This is the Web service endpoint that should be used to get the direct relationships with any other reference concept.

Reference concept descriptions are available as N-Triples, RDF+XML, structJSON or Clojure code.
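To give a feel for the N-Triples form, here is a Python sketch that serializes a concept description. The concept URIs under `example.org` are hypothetical, and using the SKOS `prefLabel`, `altLabel` and `broader` properties is an assumption for illustration; the actual UMBEL output may use other or additional properties:

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def literal(text: str, lang: str = "en") -> str:
    """An N-Triples language-tagged literal, escaping \\, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def concept_to_ntriples(uri, pref_label, alt_labels, broader):
    """One triple per line, each terminated by ' .', as N-Triples requires."""
    lines = [f"<{uri}> <{SKOS}prefLabel> {literal(pref_label)} ."]
    lines += [f"<{uri}> <{SKOS}altLabel> {literal(a)} ." for a in alt_labels]
    lines += [f"<{uri}> <{SKOS}broader> <{b}> ." for b in broader]
    return "\n".join(lines) + "\n"


print(concept_to_ntriples("http://example.org/rc/Guitar", "Guitar",
                          ["guitars"], ["http://example.org/rc/Instrument"]))
```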

Get Super Type Web Service

The Get Super Type Web service is used to get the full description of a UMBEL Super Type. By querying this Web service endpoint, you will get the preferred label, all of the alternative labels, the description, and the disjoint super types of a target super type.

Get Narrower Concept Web Service

The Get Narrower Concept Web service is used to get the list of all the narrower concepts of a given reference concept. This processing is done by inference, which means that if A -> B -> C are narrower concepts, then the narrower concepts of A are both B and C, which is what will be returned by the endpoint.
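The inference described above amounts to a transitive closure over direct narrower links. A minimal breadth-first Python sketch, using the A -> B -> C chain from the text as hypothetical data (this is not the Clojure code behind the endpoint):

```python
from collections import deque

# Hypothetical data: each concept maps to its *direct* narrower concepts.
# A -> B -> C means B is narrower than A, and C is narrower than B.
DIRECT_NARROWER = {"A": ["B"], "B": ["C"], "C": []}


def all_narrower(direct, concept):
    """Every concept reachable through narrower links, computed on each call
    by breadth-first search rather than stored in advance."""
    seen = set()
    queue = deque(direct.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        queue.extend(direct.get(current, []))
    return seen
```

The `seen` set keeps the search from revisiting a concept reached by two different paths.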

Get Broader Concept Web Service

The Get Broader Concept Web service is used to get the list of all the broader concepts for a given reference concept. This processing is done by inference, which means that if A -> B -> C are broader concepts, then the broader concepts of C are both A and B, which is what will be returned by the endpoint.

The broader reference concepts do not include the super type as their top concept (use the Get Super Classes web service endpoint for that).
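Broader concepts are the same closure walked in the opposite direction. The Python sketch below stores only hypothetical narrower links, inverts them, and walks upward; the optional `exclude` parameter is an illustrative way to stop at nodes such as super types, which the text says are not part of the broader results:

```python
from collections import defaultdict, deque

# Hypothetical data: direct narrower links, A -> B -> C.
DIRECT_NARROWER = {"A": ["B"], "B": ["C"], "C": []}


def invert(direct):
    """Turn 'X has narrower Y' links into 'Y has broader X' links."""
    inverted = defaultdict(list)
    for parent, children in direct.items():
        for child in children:
            inverted[child].append(parent)
    return dict(inverted)


def all_broader(direct_narrower, concept, exclude=frozenset()):
    """All broader concepts of `concept`; excluded nodes are neither
    returned nor traversed past."""
    direct_broader = invert(direct_narrower)
    seen = set()
    queue = deque(direct_broader.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in seen or current in exclude:
            continue
        seen.add(current)
        queue.extend(direct_broader.get(current, []))
    return seen
```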

Get Sub Classes Web Service

The Get Sub Classes Web service is used to get the list of all the sub classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are sub classes, then the sub classes of A are both B and C, which is what will be returned by the endpoint.

Get Super Classes Web Service

The Get Super Classes Web service is used to get the list of all the super classes of a given reference concept. This processing is done by inference, which means that if A -> B -> C are super classes, then the super classes of C are both A and B, which is what will be returned by the endpoint.

The super classes do include the super types as their top concept.
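Class hierarchies often have multiple inheritance, so a class can reach the same ancestor along several paths. The Python sketch below computes sub and super classes over such a hierarchy, memoizing each class's ancestors so shared ancestors are computed once. All class names, including the stand-in top super type `SuperTypeX`, are hypothetical, and the sketch assumes the hierarchy has no cycles:

```python
from functools import lru_cache

# Hypothetical hierarchy: each class maps to its direct super classes.
DIRECT_SUPER = {
    "Guitar": ("StringInstrument", "Artifact"),
    "StringInstrument": ("Instrument",),
    "Instrument": ("Artifact",),
    "Artifact": ("SuperTypeX",),
    "SuperTypeX": (),
}


@lru_cache(maxsize=None)
def super_classes(cls):
    """All super classes of cls, up to and including the top super type.
    Memoized: safe only because DIRECT_SUPER never changes."""
    result = set()
    for parent in DIRECT_SUPER.get(cls, ()):
        result.add(parent)
        result |= super_classes(parent)
    return frozenset(result)


def sub_classes(cls):
    """All sub classes of cls: every class whose super classes include it."""
    return {c for c in DIRECT_SUPER if cls in super_classes(c)}
```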

Degree Web Service

The Degree Web service is used to get the degree (measure of distance) between two UMBEL reference concepts by following the path of a transitive property.
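One plausible reading of "degree" is the number of hops from one concept to another along a transitive property, which a breadth-first search finds directly. The Python sketch below assumes that interpretation and a hypothetical set of direct super-class links; whether UMBEL counts hops exactly this way, and in which direction, is an assumption:

```python
from collections import deque

# Hypothetical direct links along a transitive property (here: super class).
DIRECT_SUPER = {
    "ElectricGuitar": ["Guitar"],
    "Guitar": ["StringInstrument"],
    "StringInstrument": ["Instrument"],
    "Instrument": [],
}


def degree(edges, source, target):
    """Fewest hops from source to target following the links, or None
    when target cannot be reached from source."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

Breadth-first order guarantees the first time the target is found is along a shortest path.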

Conclusion

This new website, along with these new web service endpoints, still uses version 1.05 of the UMBEL reference structure. However, a new version of the reference structure should be released in the coming month or two. The structure itself won’t change much, except for the introduction of a few new reference concepts, but new mechanisms (mostly related to attributes) will be introduced. It will also come with brand new mappings to external data schemas and data sources such as Schema.org, Wikipedia, etc.

In the coming months, I will write more about UMBEL. New web service endpoints will be released over time, and the API available to use, manage and leverage the structure will keep expanding.

I will also write about how the UMBEL reference structure can be used: how it can be leveraged to integrate data sources, to expand search queries, etc.

AKSW Colloquium “Combination of Topic Modeling and Semantic Web” on Monday, June 30

Thu, 06/26/2014 - 08:12

Categories:

RDF
Combination of Topic Modeling and Semantic Web

On Monday, June 30, at 3.00 p.m. in room Paulinum 702, Michael Röder will present his yearly PhD progress report, “Combination of Topic Modeling and Semantic Web”. The presentation addresses the use of topic modeling in the area of the Semantic Web. We will focus on a use case in which topic models are used to recommend similar RDF datasets for a given dataset.
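As a generic illustration of the use case (not the method presented in the talk), once a topic model has assigned each dataset a distribution over topics, similar datasets can be ranked by comparing those distributions. The Python sketch below uses cosine similarity as a simple stand-in; measures designed for probability distributions, such as Hellinger distance or Jensen-Shannon divergence, are common alternatives. The dataset names and topic weights are hypothetical:

```python
import math


def cosine(p, q):
    """Cosine similarity of two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def recommend(topic_mix, query, k=2):
    """Names of the k datasets whose topic mix is closest to the query's."""
    scores = [(name, cosine(topic_mix[query], mix))
              for name, mix in topic_mix.items() if name != query]
    scores.sort(key=lambda s: s[1], reverse=True)
    return [name for name, _ in scores[:k]]


# Hypothetical per-dataset topic distributions over three topics.
TOPIC_MIX = {
    "geo-dataset": [0.8, 0.1, 0.1],
    "places-dataset": [0.7, 0.2, 0.1],
    "music-dataset": [0.1, 0.1, 0.8],
}
```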

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

AKSW Colloquium Guest Talk “SWeeT Web of Heritage” on Wednesday, July 2

Thu, 06/26/2014 - 08:03

Categories:

RDF
SWeeT Web of Heritage

On Wednesday, July 2, at 1.30 p.m. in room P702, T B Dinesh from Janastu, a non-profit organisation providing free open source software, will present the SWeeT Web of Heritage.

Dinesh has a computer science background (Univ Iowa and CWI, NL) and is a founder member of Janastu and International Institute of Art Culture and Democracy in Bangalore, India. Janastu engages in technology research and support for other non-profit, issue based, organizations. IIACD engages in digital humanities, living heritage and community health. We network with pastoral communities developing frameworks for community-managed knowledge, supporting AIDS advocacy programs, post-tsunami rehabilitation, biodiversity and environmental groups, craft communities and open source support for non-profit organizations.

Abstract

Our journey into SWeeTs (“Semantic Web tweeTs”) started with the work on the Re-narration Web, which attempts to bridge the gap in web accessibility discourses by addressing the needs of non-literate web users. We will also discuss the Heritage Knowledge Bank, an application of this work that is being developed for the Indian Digital Heritage project.
