Planet RDF


Future of Text Analytics: A report from Text Analytics World

Wed, 04/15/2015 - 17:00

Categories:

RDF

When I went to Text Analytics World in San Francisco earlier this month, I was struck by how many of the presenters, particularly consultants, ended their talks describing future directions of text analytics as something that sounded so familiar. They described what would be possible once there's advanced maturity in ontologies, the breaking down of silos, entity and relationship resolution by multiple methods, and automated linking of it all together into semantic network models of knowledge: flexible exploration of the relevant. They made it sound like a bit of a stretch, almost pie in the sky, but what they briefly described as this destination was curiously similar to what was shown concretely in the last presentation of the conference, my own.

BGP Statement Pattern Counts

Wed, 04/15/2015 - 14:06

Categories:

RDF

A recent developer mailing list inquiry caught our attention a few days ago. The author was wondering, “[can] anyone … give me a recommendation for ‘how big a sane query can get’?” His use case was to apply a SPARQL query against the LOD dataset in order to test for the presence of “patterns for human associations”. There was mention of subgraph sizes in excess of 400K triples, but his efforts appeared to be thwarted by the much lower limits which his query processor set for the number of statement patterns in a BGP.
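
For orientation, a basic graph pattern (BGP) is simply a group of triple (statement) patterns evaluated together, and the limits in question cap how many patterns one such group may contain. Below is a minimal sketch of a single BGP with four statement patterns, using illustrative prefixes and properties rather than anything from the original inquiry:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?a ?b
    WHERE {
      ?a a foaf:Person .     # statement pattern 1
      ?a foaf:knows ?b .     # statement pattern 2
      ?b a foaf:Person .     # statement pattern 3
      ?b foaf:knows ?a .     # statement pattern 4 -- all four form one BGP
    }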

Support Issue Workflow

Wed, 04/15/2015 - 14:06

Categories:

RDF

Since Dydra appears, in public, as a strictly remote service, the question arises —

“If one cannot walk down the hall and knock on a door – or pound on a table – how does one get issues resolved in a timely manner?”

Notwithstanding any claims that our response completion statistics are well over three nines, a better answer describes how we resolve those issues which do appear, in a manner which ensures that 24/7 customer operations remain in service.

One aspect is a transparent process to record and track issues. The second is a well-defined response and resolution process.

Open data and diabetes

Tue, 04/14/2015 - 20:16

Categories:

RDF

In December my daughter was diagnosed with Type 1 diabetes. It was a pretty rough time.

AKSW Colloquium, 13-04-2015, Effective Caching Techniques for Accelerating Pattern Matching Queries

Mon, 04/13/2015 - 09:51

Categories:

RDF

In this colloquium, Claus Stadler will present the paper Effective Caching Techniques for Accelerating Pattern Matching Queries

Abstract: Using caching techniques to improve response time of queries is a proven approach in many contexts. However, it is not well explored for subgraph pattern matching queries, mainly because of subtleties enforced by traditional pattern matching models. Indeed, efficient caching can greatly impact the query answering performance for massive graphs in any query engine, whether it is centralized or distributed. This paper investigates the capabilities of the newly introduced pattern matching models in the graph simulation family for this purpose. We propose a novel caching technique, and show how the results of a query can be used to answer new similar queries according to the similarity measure that is introduced. Using large real-world graphs, we experimentally verify the efficiency of the proposed technique in answering subgraph pattern matching queries.

Link to PDF

Walls Have Eyes goes to the Design Museum

Fri, 04/10/2015 - 16:07

Categories:

RDF

We made this initially as a post for a presentation at work, but it doesn’t seem quite right for a work blogpost (though we will do one of those too), and it seems a shame for it not to be public.

The context is

Getting to grips with “semantic” interoperability

Fri, 04/10/2015 - 12:44

Categories:

RDF

Enabling and managing interoperability at the data and the service level is one of the strategic key issues in networked knowledge organization systems (KOSs) and a growing issue in effective data management. But why do we need “semantic” interoperability and how can we achieve it?

Interoperability vs. Integration

The concept of (data) interoperability can best be understood in contrast to (data) integration. While integration refers to a process in which formerly distinct data sources and their representation models are merged into one newly consolidated data source, interoperability keeps knowledge sources and their representation models structurally separate, but provides connectivity and interactivity between these sources through deliberately defined overlaps in the representation models. Under interoperability, data sources are designed to provide interfaces for connectivity, so that data can be shared and integrated on top of a common data model while the original principles of data and knowledge representation are left intact. Thus, interoperability is an efficient means to improve and ease the integration of data and knowledge sources.

Three levels of interoperability

When designing interoperable KOSs it is important to distinguish between structural, syntactic and semantic interoperability (Galinski 2006):

  • Structural interoperability is achieved by representing metadata using a shared data model like the Dublin Core Abstract Model or RDF (Resource Description Framework).
  • Syntactic interoperability is achieved by serializing data in a shared mark-up language like XML, Turtle or N3.
  • Semantic interoperability is achieved by using a shared terminology or controlled vocabulary to label and classify metadata terms and relations.

Given that metadata standards carry a lot of intrinsic legacy, it is sometimes very difficult to achieve interoperability at all three levels mentioned above. Metadata formats and models have grown historically; they are usually the result of community decision processes, often highly formalized for specific functional purposes, and frequently deliberately rigid and difficult to change. Hence it is important to have a clear understanding and documentation of the application profile of a metadata format as a precondition for enabling interoperability at all three levels. Semantic Web standards do a really good job in this respect!
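
To make the semantic level concrete, here is a minimal sketch of a query that joins two independently maintained datasets solely because both label their records with the same controlled vocabulary. The graph names are hypothetical, and the shared SKOS vocabulary is assumed to be loaded alongside the data:

    PREFIX dct:  <http://purl.org/dc/terms/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # Two separately curated graphs stay interoperable because both use
    # dct:subject with concepts from a shared SKOS vocabulary.
    SELECT ?docA ?docB ?label
    WHERE {
      GRAPH <http://example.org/repository-a> { ?docA dct:subject ?topic . }
      GRAPH <http://example.org/repository-b> { ?docB dct:subject ?topic . }
      ?topic skos:prefLabel ?label .    # shared vocabulary, assumed in the default graph
    }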

Transforming music data into a PoolParty project

Thu, 04/09/2015 - 08:37

Categories:

RDF
Goal

For the Nolde project we were asked to build a knowledge graph containing detailed information about the Austrian music scene: artists, bands and their music releases. We decided to use PoolParty, since these entities should be accessible in an editorial workflow. More details about the implementation will be provided in a later blog post.

In this first round I want to share my experiences with mapping the music data into SKOS. Obviously, LinkedBrainz was the perfect source to collect and transform such data, since it is available as RDF/N-Triples dumps and even provides a SPARQL endpoint! LinkedBrainz data is modeled using the Music Ontology.

For example, you can select all mo:MusicArtist instances with a relation to Austria.
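
A minimal sketch of such a selection (foaf:based_near and the DBpedia resource for Austria are illustrative assumptions; the actual LinkedBrainz predicate linking an artist to a country may differ):

    PREFIX mo:   <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # Select all music artists related to Austria.
    SELECT ?artist ?name
    WHERE {
      ?artist a mo:MusicArtist ;
              foaf:name ?name ;
              foaf:based_near <http://dbpedia.org/resource/Austria> .
    }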

I took the LinkedBrainz dump files and imported them into a triple store, together with DBpedia dumps.

With two CONSTRUCT queries, I was able to collect the required data and transform it into SKOS, in a PoolParty-compatible format:

Construct Artists

Every matching MusicArtist results in a SKOS concept. The foaf:name is mapped to skos:prefLabel (in German).

In the query, I used Custom Schema features to provide self-describing metadata on top of pure SKOS features: a MusicBrainz link, a MusicBrainz Id, a DBpedia link, homepage…

In addition, the query also collects data from DBpedia: if an owl:sameAs relationship to DBpedia exists, the abstract is retrieved, and when a DBpedia abstract is available it is mapped to skos:definition.
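
The query itself is not included in this excerpt; the following is only a rough sketch of what such a CONSTRUCT might look like, based on the description above. The concept scheme URI and the ex: custom-schema properties are hypothetical placeholders, and the exact LinkedBrainz property names may differ:

    PREFIX mo:   <http://purl.org/ontology/mo/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX ex:   <http://example.org/custom-schema/>   # hypothetical custom schema

    CONSTRUCT {
      ?artist a skos:Concept ;
              skos:inScheme <http://example.org/nolde/artists> ;   # placeholder scheme
              skos:prefLabel ?label ;
              ex:musicbrainzId ?mbid ;                             # custom-schema metadata
              ex:dbpediaLink ?dbp ;
              skos:definition ?abstract .
    }
    WHERE {
      ?artist a mo:MusicArtist ;
              foaf:name ?name .
      BIND (STRLANG(STR(?name), "de") AS ?label)      # foaf:name -> skos:prefLabel (German)
      OPTIONAL { ?artist mo:musicbrainz_guid ?mbid }  # property name is an assumption
      OPTIONAL {
        ?artist owl:sameAs ?dbp .
        FILTER (STRSTARTS(STR(?dbp), "http://dbpedia.org/resource/"))
        OPTIONAL { ?dbp dbo:abstract ?abstract . FILTER (lang(?abstract) = "de") }
      }
    }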

Construct Releases (mo:SignalGroups) with relations to Artists

Similar to the Artists, a matching SignalGroup results in a SKOS Concept. A skos:related relationship is defined between an Artist and his Releases.
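
Again, only a rough sketch of what this second CONSTRUCT might look like; the scheme URI is a placeholder, and dct:title and foaf:maker are assumptions about how LinkedBrainz attaches titles and artists to release groups:

    PREFIX mo:   <http://purl.org/ontology/mo/>
    PREFIX dct:  <http://purl.org/dc/terms/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    CONSTRUCT {
      ?release a skos:Concept ;
               skos:inScheme <http://example.org/nolde/releases> ;  # placeholder scheme
               skos:prefLabel ?title ;
               skos:related ?artist .                               # link Release to Artist
    }
    WHERE {
      ?release a mo:SignalGroup ;
               dct:title ?title ;
               foaf:maker ?artist .   # assumed artist link; actual predicate may differ
      ?artist a mo:MusicArtist .
    }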

Outcome

The SPARQL CONSTRUCT queries produced Turtle (.ttl) files that could be imported directly into PoolParty, resulting in a project containing nearly 1,000 Artists and 10,000 Releases:

Special talk: Linked Data Quality Assessment and its Application to Societal Progress Measurement

Tue, 04/07/2015 - 16:13

Categories:

RDF
Linked Data Quality Assessment and its Application to Societal Progress Measurement

Abstract: In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration where both documents and data are linked. Many different communities on the Internet such as geographic, media, life sciences and government have already adopted these LD principles. In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, thus making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. A key challenge is to assess the quality of datasets published on the Web and make this quality information explicit. Assessing data quality is particularly a challenge in LD as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes assessing the quality crucial to measure the accuracy of representing the real-world data. In this thesis, we first unify 18 data quality dimensions and provide a total of 69 metrics for assessment of LD. Next, three different methodologies for linked data quality assessment are evaluated, namely (i) user-driven, (ii) crowdsourcing, and (iii) semi-automated use-case driven. Finally, we take into account a domain-specific use case that consumes LD and leverages data quality. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.

Join us!

  • Thursday, 9 April at 1pm, Room P702

Read Write Web — Q1 Summary — 2015

Tue, 03/31/2015 - 19:50

Categories:

RDF
Summary

2015 is shaping up to be the year that standards for reading and writing, and the web in general, start to be put together into next-generation systems and applications. A quite comprehensive review post contains much of what is being looked forward to.

The Spatial Data on the Web working group was announced and the EU funded Aligned project also kicked off.

Congratulations to the Linked Data Platform working group, who achieved REC status this quarter, after several years of hard work. Having spent most of the last three months testing various implementations, I’m happy to say it has greatly exceeded my already high expectations.

Communications and Outreach

A number of read write web standards and apps were demoed at the W3C Social Web Working group F2F, hosted by MIT.  This seems to have gone quite well and resulted in the coining of a new term “SoLiD” — Social Linked Data!  Apps based on the Linked Data Platform have been considered as part of the work of this group.

 

Community Group

A relatively quiet quarter in the community group, though still around 60 posts on our mailing list. There is much interest in the next round of work that will be done with the LDP working group. Some work has been done on login and signup web components for WebID, websockets, and a relaunch of WebIDRealm.

Applications

Lots of activity on the apps front. Personally I’ve been working with GOLD, but also announced was the release of Virtuoso 7.2, for those that like a feature-rich enterprise solution.

Making use of the experimental pub/sub work with websockets, I’ve started to work on a chat application. A profile reader and editor allows you to create and change your profile. I’ve continued to work on a decentralized virtual wallet, and props go out to timbl who, in his vanishingly small amounts of spare time, has been working on a scheduler app.

Last but not Least…

For those of you that like the web, like documentation, like specs and like academic papers, all four have been wrapped into one neat package with the announcement of linked open research.  It’s a great way to document work and create templates for upstream delivery.  Hover over the menu in the top right and see many more options.  I’m looking forward to using this to try to bridge the gap between the worlds of documentation, the web, and research.

DCMI Webinar: "From 0 to 60 on SPARQL queries in 50 minutes" (Redux)

Mon, 03/30/2015 - 23:59

Categories:

RDF
2015-03-30, This webinar with Ethan Gruber on 13 May provides an introduction to SPARQL, a query language for RDF. Users will gain hands-on experience crafting queries, starting simply but evolving in complexity. These queries will focus on coinage data in the SPARQL endpoint hosted by http://nomisma.org: numismatic concepts defined in a SKOS-based thesaurus and physical specimens from three major museum collections (American Numismatic Society, British Museum, and Münzkabinett of the Staatliche Museen zu Berlin) linked to these concepts. Results generated from these queries in the form of CSV may be imported directly into Google Fusion Tables for immediate visualization in the form of charts and maps. Additional information and free registration is available at http://dublincore.org/resources/training/#2015gruber. Redux: This webinar was first presented as a training session in the LODLAM Training Day at SemTech2014.
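
For a taste of the "starting simply" end of the webinar, a minimal query over a SKOS-based thesaurus might look like the sketch below. It assumes only that concepts are typed as skos:Concept and carry English skos:prefLabel values, and makes no assumptions about nomisma-specific properties:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # List ten concepts and their English labels from a SKOS-based thesaurus.
    SELECT ?concept ?label
    WHERE {
      ?concept a skos:Concept ;
               skos:prefLabel ?label .
      FILTER (lang(?label) = "en")
    }
    LIMIT 10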

National Diet Library of Japan publishes translations of key DCMI specifications

Mon, 03/30/2015 - 23:59

Categories:

RDF
2015-03-30, DCMI is pleased to announce that the National Diet Library, the sole national library in Japan, has translated the DCMI Metadata Terms and the Singapore Framework for Dublin Core Application Profiles into Japanese. The links to the new Japanese translations, as well as others, are available on the DCMI Documents Translation page at http://dublincore.org/resources/translations/index.shtml.

A little stepper motor

Sun, 03/29/2015 - 19:08

Categories:

RDF

I want to make a rotating 3D-printed head-on-a-spring for my

Two AKSW Papers at ESWC 2015

Tue, 03/24/2015 - 12:38

Categories:

RDF

We are very pleased to announce that two of our papers were accepted for presentation as full research papers at ESWC 2015.

Automating RDF Dataset Transformation and Enrichment (Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann)

With the adoption of RDF across several domains come growing requirements pertaining to the completeness and quality of RDF datasets. Currently, this problem is most commonly addressed by manually devising means of enriching an input dataset. The few tools that aim at supporting this endeavour usually focus on supporting the manual definition of enrichment pipelines. In this paper, we present a supervised learning approach based on a refinement operator for enriching RDF datasets. We show how we can use exemplary descriptions of enriched resources to generate accurate enrichment pipelines. We evaluate our approach against eight manually defined enrichment pipelines and show that our approach can learn accurate pipelines even when provided with a small number of training examples.

HAWK – Hybrid Question Answering using Linked Data (Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Lorenz Bühmann, and Christina Unger)

The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often requires combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach uses predicate-argument representations of questions to derive equivalent combinations of SPARQL query fragments and text queries. These are executed so as to integrate the results of the text queries into SPARQL and thus generate a formal interpretation of the query. We present a thorough evaluation of the framework, including an analysis of the influence of entity annotation tools on the generation process of the hybrid queries and a study of the overall accuracy of the system. Our results show that HAWK achieves F-measures of 0.68 and 0.61 on the training and test phases, respectively, of the Question Answering over Linked Data (QALD-4) hybrid query benchmark.

Come over to ESWC and enjoy the talks. Best regards, Sherif on behalf of AKSW

AKSW Colloquium, 03-23-2015, Git Triple Store and From CPU bringup to IBM Watson

Mon, 03/23/2015 - 09:58

Categories:

RDF
From CPU bring up to IBM Watson by Kay Müller, visiting researcher, IBM Ireland

Working in a corporate environment like IBM offers many different opportunities to work on the bleeding edge of research and development. In this presentation Kay Müller, who is currently a Software Engineer in the IBM Watson Group, is going to give a brief overview of some of the projects he has been working on in IBM. These projects range from a CPU bring up using VHDL to the design and development of a semantic search framework for the IBM Watson system.

Git Triple Store by Natanael Arndt

In a setup of distributed clients or applications, with different actors writing to the same knowledge base (KB), three things are needed: synchronization of distributed copies of the KB, an edit history with provenance information, and management of different versions of the KB in parallel. The aim is to design and construct a triple store back end which records every change at the triple level and enables distributed curation of RDF graphs. This should be achieved by using a distributed revision control system to hold a serialization of the RDF graph. Natanael Arndt will present the paper “R&Wbase: Git for triples” by Miel Vander Sande et al., published at LDOW2013, as related work. Additionally, he will present his ideas towards a collaboration infrastructure using DVCS for triples.

 

About the AKSW Colloquium

This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.

 
