We are happy to announce DL-Learner 1.0.
DL-Learner is a framework containing algorithms for supervised machine learning in RDF and OWL. DL-Learner can use various RDF and OWL serialization formats as well as SPARQL endpoints as input, can connect to most popular OWL reasoners and is easily and flexibly configurable. It extends concepts of Inductive Logic Programming and Relational Learning to the Semantic Web in order to allow powerful data analysis.
DL-Learner is used for data analysis in other tools such as ORE and RDFUnit. Technically, it uses refinement-operator-based, pattern-based and evolutionary techniques for learning on structured data. For a practical example, see http://dl-learner.org/community/carcinogenesis/. It also offers a plugin for Protégé, which can suggest axioms to add. DL-Learner is part of the Linked Data Stack, a repository for Linked Data management tools.
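To give an idea of how a supervised learning problem is set up in DL-Learner, here is a minimal configuration-file sketch. The file name, example IRIs and time limit are placeholders, not from any shipped example:

```
// knowledge source: a local OWL file (path is a placeholder)
ks.type = "OWL File"
ks.fileName = "family.owl"

// closed-world reasoning on top of an OWL reasoner
reasoner.type = "closed world reasoner"
reasoner.sources = { ks }

// supervised learning problem: positive and negative examples
lp.type = "posNegStandard"
lp.positiveExamples = { "ex:stefan", "ex:markus" }
lp.negativeExamples = { "ex:heinz", "ex:anna" }

// CELOE, a refinement-operator-based class expression learner
alg.type = "celoe"
alg.maxExecutionTimeInSeconds = 10
```

Running the DL-Learner CLI on such a file searches for class expressions that cover the positive examples while excluding the negative ones.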
We want to thank everyone who helped to create this release, in particular (alphabetically) An Tran, Chris Shellenbarger, Christoph Haase, Daniel Fleischhacker, Didier Cherix, Johanna Völker, Konrad Höffner, Robert Höhndorf, Sebastian Hellmann and Simon Bin. We also acknowledge support by the recently started SAKE project, in which DL-Learner will be applied to event analysis in manufacturing use cases, as well as the GeoKnow and Big Data Europe projects where it is part of the respective platforms.
Lorenz Bühmann, Jens Lehmann and Patrick Westphal
A survey, or systematic literature review, is a scholarly text that summarizes the current knowledge on a particular topic, including substantive findings as well as theoretical and methodological contributions. Literature reviews use secondary sources and do not report new or original experimental work.
A systematic review is a literature review focused on a research question, trying to identify, appraise, select and synthesize all high-quality research evidence and arguments relevant to that question. Moreover, a systematic review is comprehensive, exhaustive and repeatable; that is, readers can replicate or verify the review.

Steps to perform a survey
Select two independent reviewers
Look for related/existing surveys
If one exists, check how long ago it was conducted. If it was, say, 10 years ago, you can go ahead and update it.
Formulate research questions
Devise eligibility criteria
Define search strategy – keywords, journals, conferences, workshops to search in
Retrieve further potential articles using the search strategy and by directly contacting top researchers in the field
Compare the chosen articles among reviewers and decide on a core set of papers to be included in the survey
Perform qualitative and quantitative analyses on the selected set of papers
Report on the results
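Because two reviewers screen the candidate articles independently, it is common to quantify how well their include/exclude decisions agree before reconciling the core set. A minimal sketch using Cohen's kappa (the function and the toy decisions below are our own illustration, not prescribed by any review guideline):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-rater agreement between two reviewers' include/exclude decisions."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # observed agreement: fraction of papers where both reviewers decided alike
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement from each reviewer's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[l] * freq_b[l] for l in freq_a.keys() | freq_b.keys()) / n**2
    return (observed - expected) / (1 - expected)

# Screening decisions of two independent reviewers on eight candidate papers
a = ["in", "in", "out", "out", "in", "out", "in", "out"]
b = ["in", "out", "out", "out", "in", "out", "in", "in"]
print(round(cohens_kappa(a, b), 2))  # prints 0.5 (moderate agreement)
```

A kappa well below 1 signals that the eligibility criteria are being interpreted differently and should be discussed before fixing the core set.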
There are several benefits/advantages of conducting a survey, such as:
A survey is the best way to get an idea of the state-of-the-art technologies, algorithms, tools, etc. in a particular field
One can get a clear bird's-eye overview of the current state of that field
It can serve as a great starting point for a student or any researcher thinking of venturing into that particular field/area of research
One can easily acquire up-to-date information on a subject by referring to a review
It gives researchers the opportunity to formalize different concepts of a particular field
It allows one to identify challenges and gaps that are unanswered and crucial for that subject
However, there are a few limitations that must be considered before undertaking a survey, such as:
Surveys can be biased; it is therefore necessary to have two researchers who perform the systematic search for articles independently
It is quite challenging to unify concepts, especially when there are different ideas referring to the same concepts developed over several years
Conducting a survey and getting the resulting article published is a long process
In our group, three students conducted comprehensive literature reviews on three different topics:
Linked Data Quality: The survey covers 30 core papers, which focus on providing quality assessment methodologies specifically for Linked Data. A total of 18 data quality dimensions along with their definitions and 69 metrics are provided. Additionally, the survey contributes a comparison of 12 tools that perform quality assessment of Linked Data.
Ubiquitous Semantic Applications: The survey presents a thorough analysis of 48 primary studies out of 172 initially retrieved papers. The results consist of a comprehensive set of quality attributes for Ubiquitous Semantic Applications together with corresponding application features suggested for their realization. The quality attributes include aspects such as mobility, usability, heterogeneity, collaboration, customizability and evolvability. The proposed quality attributes facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive Ubiquitous Semantic Applications.
User interfaces for semantic authoring of textual content: The survey covers a thorough analysis of 31 primary studies out of 175 initially retrieved papers. The results consist of a comprehensive set of quality attributes for SCA systems together with corresponding user interface features suggested for their realization. The quality attributes include aspects such as usability, automation, generalizability, collaboration, customizability and evolvability. The proposed quality attributes and UI features facilitate the evaluation of existing approaches and the development of novel, more effective and intuitive semantic authoring interfaces.
Also, here is a presentation on “Systematic Literature Reviews”: http://slidewiki.org/deck/57_systematic-literature-review

References
Lisa A. Baglione (2012). Writing a Research Paper in Political Science. Thousand Oaks: CQ Press.
Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann and Sören Auer (2015). ‘Quality Assessment for Linked Data: A Survey’. Semantic Web Journal. http://www.semantic-web-journal.net/content/quality-assessment-linked-data-survey
Timofey Ermilov, Ali Khalili and Sören Auer (2014). ‘Ubiquitous Semantic Applications: A Systematic Literature Review’. Int. J. Semant. Web Inf. Syst. 10(1), 66-99. http://dx.doi.org/10.4018/ijswis.2014010103
Ali Khalili and Sören Auer (2013). ‘User interfaces for semantic authoring of textual content: A systematic literature review’. Web Semantics: Science, Services and Agents on the World Wide Web, 22, 1-18. http://www.sciencedirect.com/science/article/pii/S1570826813000498
In the last couple of weeks at work we’ve been making “radios” in order to test the
This release also introduces new vocabulary for describing visual artworks: a new VisualArtwork type alongside the supporting properties artEdition, artform, material and surface. Many thanks to Paul Watson for leading that work. See also Paul's blog posts about the schema, its mapping to VRA Core 4, and its use with Getty's Art and Architecture Thesaurus (AAT) via Linked Data.
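For illustration, here is a hypothetical snippet of JSON-LD markup using the new type and properties, built as a Python dict; the artwork and all its values are invented, only the type and property names come from the release:

```python
import json

# Hypothetical VisualArtwork markup; the artwork itself is made up.
artwork = {
    "@context": "http://schema.org",
    "@type": "VisualArtwork",
    "name": "Example Etching",
    "artform": "Etching",      # the kind of artwork
    "artEdition": "20",        # e.g. one of an edition of 20 prints
    "material": "Ink",         # the medium the work is created in
    "surface": "Paper",        # the support the work is applied to
}
print(json.dumps(artwork, indent=2))
```

Embedding such a block in a page's script tag (type `application/ld+json`) is the usual way to publish schema.org descriptions alongside HTML.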
Invoices and bills also now have dedicated vocabulary in schema.org, see the new Invoice type for details. This addresses situations when an invoice is received that is not directly attached to an Order, for example utility bills.
As usual, the release notes page has full details. In recent weeks we have been taking care to document the status of all schema.org open issues and proposals in our issue tracker on the GitHub site. As always, thanks are due to everyone who contributed to this release and to the ongoing discussions on GitHub and at W3C.
One of AKSW’s Big Data projects, SAKE (Semantische Analyse Komplexer Ereignisse, i.e. Semantic Analysis of Complex Events), kicked off in Karlsruhe. SAKE is one of the winners of the Smart Data Challenge, is funded by the German BMWi (Bundesministerium für Wirtschaft und Energie) and has a duration of three years. Within this project, AKSW will develop powerful methods for real-time analysis of industrial-scale Big Linked Data. To this end, the team will extend existing frameworks such as LIMES, DL-Learner, QUETSAL and FOX. Together with USU AG, Heidelberger Druckmaschinen, Fraunhofer IAIS and AviComp Controls, novel methods for tackling Business Intelligence challenges will be devised.
More info to come soon!
Axel on behalf of the SAKE team
The need to bridge between the unstructured data on the document Web and the structured data on the Data Web has led to the development of a considerable number of annotation tools. Those tools are hard to compare since published results are calculated on diverse datasets and measured in different units.
We present GERBIL, a general entity annotation system based on the BAT-Framework. GERBIL offers an easy-to-use web-based platform for the agile comparison of annotators using multiple datasets and uniform measuring approaches. To add a tool to GERBIL, the end user only has to provide a URL to a REST interface for the tool that abides by a given specification. The integration and benchmarking of the tool against user-specified datasets is then carried out automatically by the GERBIL platform. Currently, our platform provides results for 9 annotators and 11 datasets, with more to come. Internally, GERBIL is based on the NLP Interchange Format (NIF) and provides Java classes for implementing NIF APIs for datasets and annotators. For the paper, see here.

Towards Efficient and Effective Semantic Table Interpretation by Ziqi Zhang, presented by Ivan Ermilov

Abstract
Ivan will present a paper describing TableMiner by Ziqi Zhang, the first semantic Table Interpretation method that adopts an incremental, mutually recursive and bootstrapping learning approach seeded by automatically selected ‘partial’ data from a table. TableMiner labels columns containing named entity mentions with the semantic concepts that best describe the data in those columns, and disambiguates entity content cells in these columns. TableMiner is able to use various types of contextual information outside tables for Table Interpretation, including semantic markups (e.g., RDFa/microdata annotations) that, to the best of our knowledge, have never been used in Natural Language Processing tasks. Evaluation on two datasets shows that, compared to two baselines, TableMiner consistently obtains the best performance. In the classification task, it achieves significant improvements of between 0.08 and 0.38 F1 depending on the baseline method; in the disambiguation task, it outperforms both baselines by between 0.19 and 0.37 in precision on one dataset, and by between 0.02 and 0.03 F1 on the other. Observations also show that the bootstrapping learning approach adopted by TableMiner can deliver computational savings of between 24% and 60% against classic methods that ‘exhaustively’ process the entire table content to build features for interpretation.

About the AKSW Colloquium
This event is part of a series of events about Semantic Web technology. Please see http://wiki.aksw.org/Colloquium for further information about previous and future events. As always, Bachelor and Master students are able to get points for attendance and there is complimentary coffee and cake after the session.
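The uniform measuring approaches that benchmarks like GERBIL apply across annotators, and the F1 numbers reported for TableMiner above, boil down to precision, recall and F1 over predicted versus gold-standard annotations. A simplified micro-averaged sketch (exact-match only; the documents and URIs below are invented, and real benchmark matching rules are richer):

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision/recall/F1 over sets of
    (doc_id, start, end, uri) entity annotations."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # annotations that match the gold standard exactly
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gold standard and annotator output
gold = {("d1", 0, 5, "dbpedia:Berlin"), ("d1", 10, 17, "dbpedia:Germany"),
        ("d2", 3, 8, "dbpedia:Paris")}
pred = {("d1", 0, 5, "dbpedia:Berlin"), ("d2", 3, 8, "dbpedia:London")}
p, r, f = micro_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # prints 0.5 0.33 0.4
```

Computing the same metric for every annotator over the same datasets is what makes published results directly comparable.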
Thinking about the purpose of prototypes:
Make new and upcoming technologies and standards tangible enough to help people think through the consequences of them.
Technology is moving fast, but it is also unevenly distributed, and the consequences – good and bad – of emerging technologies may only become apparent as they move into the mainstream. By making these consequences tangible early we can choose between possible futures.

Links
This is my view only, and there’s a certain amount of thinking out loud / lack of checking / potentially high bullshit level.
Yesterday I was asked to comment on a Radiodan doc and this popped out:
I made one of these a few months ago – they’re super simple – but Chris Lynas asked me about it, so I thought I should write it up quickly.
It’s an internet radio that turns itself on for
A round up of recent industry news on the topics of Big Data and Enterprise Data Management
When Radiodan can’t access the web, it throws up an access point (AP) created by the Pi: you connect directly to that, and it displays the available wifi points in a webpage as a captive portal, asking you to add the password for the one you want. It’s not easy to get wifi credentials onto objects with no user interface, and this is the best approach we’ve found so far (
Driving business value from your data often requires integration across many sources. These integration projects can be time-consuming, expensive and difficult to manage, and any short cuts can compromise quality and reuse. In many industries, non-compliance with data governance rules can put your firm's reputation at risk and expose you to large fines.
Traditional data integration methods require point-to-point mapping of source and target systems. This effort typically requires a team of both business SMEs and technology professionals. The mappings are time-consuming to create and code, and errors in the ETL (Extract, Transform, and Load) process force iterative passes through the whole process.
- ROCKER – A Refinement Operator for Key Discovery (Tommaso Soru, Edgard Marx, Axel-Cyrille Ngonga Ngomo)
- GERBIL – General Entity Annotation Benchmark Framework (Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis and Lars Wesemann)
Happy New Year!
The theme of the 2015 Ontology Summit is Internet of Things: Toward Smart Networked Systems and Societies. The Ontology Summit is an annual series of events (first started by Ontolog and NIST in 2006) that involve the ontology community and communities related to each year’s theme.
The 2015 Summit will hold a virtual discourse over the next three months via mailing lists and online panel sessions with augmented conference calls. The Summit will culminate in a two-day face-to-face workshop on 13-14 April 2015 in Arlington, VA. The Summit’s goal is to explore how ontologies can play a significant role in the realization of smart networked systems and societies in the Internet of Things.
The Summit’s initial launch session will take place from 12:30pm to 2:00pm EDT on Thursday, January 15th and will include overview presentations from each of the four technical tracks. See the 2015 Ontology Summit for more information, the schedule and details on how to participate in these free and open events.