Planet RDF


DCMI's first Governing Board officer transition

Fri, 10/24/2014 - 23:59

Categories:

RDF
2014-10-24, In the closing ceremony of DC-2014, DCMI exercised the first Governing Board officer transition under the Initiative's new governance structure. Michael Crandall stepped into the role of Immediate Past Chair as Eric Childress assumed the roles of Chair of DCMI and the Governing Board. Joseph Tennis became the new Chair Elect of the Board and will succeed as Chair at DC-2015 in São Paulo, Brazil. Information about the new DCMI governance structure can be found in the DCMI Handbook at http://wiki.dublincore.org/index.php/DCMI_Handbook/orgStructure

DCMI/ASIS&T Webinar - The Learning Resource Metadata Initiative, describing learning resources with schema.org, and more?

Fri, 10/24/2014 - 23:59

Categories:

RDF
2014-10-24, The Learning Resource Metadata Initiative (LRMI) is a collaborative initiative that aims to make it easier for teachers and learners to find educational materials through major search engines and specialized resource discovery services. The approach taken by LRMI is to extend the schema.org ontology so that educationally significant characteristics and relationships can be expressed. In this webinar, Phil Barker and Lorna M. Campbell of Cetis will introduce schema.org and present the background to LRMI, its aims and objectives, and who is involved in achieving them. The webinar will outline the technical aspects of the LRMI specification, describe some example implementations and demonstrate how the discoverability of learning resources may be enhanced. Phil and Lorna will present the latest developments in LRMI implementation, drawing on an analysis of its use by a range of open educational resource repositories and aggregators, and will report on the potential of LRMI to enhance education search and discovery services. While the development of LRMI has been inspired by schema.org, the webinar will also include discussion of whether LRMI has applications beyond those of schema.org. Registration at http://bit.ly/dcmiWebinar-LRMI. The webinar is free to DCMI Individual & Organizational Members and to ASIS&T Members, and available at a modest fee to non-members.

Tutorials, presentations and proceedings of DC-2014 published

Fri, 10/24/2014 - 23:59

Categories:

RDF
2014-10-24, Over 236 people from 17 countries attended DC-2014 in Austin, Texas from 8 through 11 October 2014. Pre- and post-Conference tutorials and workshops were presented in the AT&T Executive Education and Conference Center and at the Harry Ransom Center with 90 people attending in each of the two venues on the University of Texas at Austin campus. Over 190 people attended the 2-day conference. Presentation slides of the keynote by Eric Miller of Zepheira LLC as well as presentations from the special sessions and the tutorials/workshops are available online at the conference website at http://bit.ly/dc2014-presentations. The full text of the peer reviewed papers, project reports, extended poster abstracts and poster images are also available. Additional assets from the conference will be added to the online proceedings as they become available over the next few weeks.

Stewardship of LRMI specification transferred to DCMI

Fri, 10/24/2014 - 23:59

Categories:

RDF
2014-10-24, After lengthy deliberations, the leadership of the Learning Resource Metadata Initiative (LRMI) has determined that the stewardship of the LRMI 1.1 specification, as well as management of future LRMI development, will be passed to DCMI and its long-standing Education Community. The LRMI specification on which schema.org educational properties and classes are based was created by the Association of Educational Publishers (AEP) and Creative Commons (CC) with support from the Bill & Melinda Gates Foundation. The three-phased development cycle of the LRMI 1.1 specification included closing processes for the orderly passing of stewardship to a recognized organization in the metadata sector with commitments to transparency and community involvement. Stewardship of the LRMI specification within DCMI will be a function of a DCMI/LRMI Task Group. The LRMI specification as well as links to the Task Group and community communications channels can be found on the DCMI website at http://dublincore.org/dcx/lrmi-terms/. For the full announcement of the transfer, see the AEP press release at http://www.lrmi.net/lrmi-transfers-stewardship.

AKSW internal group meeting @ Dessau

Thu, 10/23/2014 - 00:07

Categories:

RDF

Recently, AKSW members traveled to the city of Dessau for an internal group meeting.

The meeting took place between the 8th and 10th of October at the Bauhaus, the modern school of architecture where we were also hosted. The Bauhaus is located in the city of Dessau, about one hour from Leipzig. It operated from 1919 to 1933 and was famous for an approach to design that combined crafts and the fine arts. At that time, the German term Bauhaus – literally “house of construction” – was understood as meaning “School of Building”. It seemed to be a perfect getaway and an awesome location for AKSWers to meet and “build” together the future steps of the group.

Wednesday was spent mostly in smaller group discussions on various ongoing projects. Over the next two days, the AKSW PhD students presented the achievements, current status, and future plans of their PhD projects. During the meeting, we had the pleasure of welcoming several AKSW leaders and project managers, such as Prof. Dr. Sören Auer, Dr. Jens Lehmann, Prof. Dr. Thomas Riechert and Dr. Michael Martin. The heads of AKSW gave their input and suggestions to the students to help them improve, continue, and/or complete their PhDs. In addition, the current projects were discussed in order to find possible synergies between them and to identify further improvements and ideas.

We also found some time to enjoy the beautiful city of Dessau and to learn a little more about the history of this wonderful city.

Overall, it was a productive and recreational trip, useful not only for keeping track of each student's progress but also for helping them improve their work. We are all thankful to Prof. Dr. Thomas Riechert, who was chiefly responsible for organizing this amazing meeting.

On Universality and Core Competence

Wed, 10/22/2014 - 17:23

Categories:

RDF

I will here develop some ideas building on Peter Boncz's inaugural lecture, mentioned in the previous post. This is a high-level look at where the leading edge of analytics will be, now that the column store is mainstream.

Peter's description of his domain was roughly as follows, summarized from memory:

The new chair is for data analysis and engines for this purpose. The data analysis engine includes the analytical DBMS but is a broader category. For example, the diverse parts of the big data chain (including preprocessing, noise elimination, feature extraction, natural language extraction, graph analytics, and so forth) fall under this category, and most of these things are usually not done in a DBMS. For anything that is big, the main challenge remains one of performance and time to solution. These things are being done, and will increasingly be done, on a platform with heterogeneous features, e.g., CPU/GPU clusters, possibly custom hardware like FPGAs, etc. This is driven by factors of cost and energy efficiency. Different processing stages will sometimes be distributed over a wide area, as for example in instrument networks and any network infrastructure, which is wide area by definition.

The design space of databases and everything around them is huge, and any exhaustive exploration is impossible. Development times are long, and a platform might take ten years to mature. This fits poorly with academic funding cycles. However, we should not leave all the research in this area to industry, as industry maximizes profit, not innovation or absolute performance. Architecting data systems has aspects of an art. Consider the parallel with the architecture of buildings: There are considerations of function, compatibility with environment, cost, restrictions arising from the materials at hand, and so forth. How a specific design will work cannot be known without experiment. The experiments themselves must be designed to make sense. This is not an exact science with clear-cut procedures and exact metrics of success.

This is the gist of Peter's description of our art. Peter's successes, best exemplified by MonetDB and Vectorwise, arise from focus on a special problem area and from developing and systematically applying specific insights to a specific problem. This process led to the emergence of the column store, which is now a mainstream thing. The DBMS that does not do columns is by now behind the times.

Needless to say, I am a great believer in core competence. Not every core competence is exactly the same. But a core competence needs to be broad enough so that its integral mastery and consistent application can produce a unit of value valuable in itself. What and how broad this is varies a great deal. Typically such a unit of value is something that is behind a "natural interface." This defies exhaustive definition but the examples below may give a hint. Looking at value chains and all diverse things in them that have a price tag may be another guideline.

There is a sort of Hegelian dialectic to technology trends: At the start, it was generally believed that a DBMS would be universal like the operating system itself, with a few products with very similar functionality covering the whole field. The antithesis came with Michael Stonebraker declaring that one size no longer fit all. Since then the transactional (OLTP) and analytical (OLAP) sides are clearly divided. The eventual synthesis may be in the air, with pioneering work like HyPer, led by Thomas Neumann of TU München. Peter, following his Humboldt prize, has spent a couple of days a week in Thomas's group, and I have joined him there a few times. The key to eventually bridging the gap would be compilation and adaptivity. If the workload is compiled on demand, then the right data structures could always be at hand.

This might be the start of a shift similar to the column store turning the DBMS on its side, so to say.

In the mainstream of software engineering, objects, abstractions, and interfaces are held to be of value almost in and of themselves. Our science, that of performance, stands in apparent opposition to at least any naive application of the paradigm of objects and interfaces. Interfaces have a cost, and boxes limit transparency into performance. So inlining and merging processing phases that are in principle distinct is necessary for performance. Vectoring is one take on this: An interface that is crossed just a few times is much less harmful than one crossed a billion times. Using compilation, or at least type- and data-structure-specific variants of operators and switching their application based on run-time observed behaviors, is another aspect of this.
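
To make the crossing count concrete, here is a minimal C++ sketch (a generic illustration with invented names, not code from Vectorwise or Virtuoso) contrasting a tuple-at-a-time operator interface with a vectored one that is crossed once per batch of about a thousand values:

#include <cstddef>
#include <cstdint>
#include <vector>

// Tuple-at-a-time: the operator interface is crossed once per value.
// For a billion-row scan, the virtual call (and the optimization barrier
// it implies) is paid a billion times.
struct NextTuple {
    virtual bool next(int64_t &out) = 0;
    virtual ~NextTuple() = default;
};

int64_t sum_tuple_at_a_time(NextTuple &input) {
    int64_t total = 0, v = 0;
    while (input.next(v))                 // one interface crossing per value
        total += v;
    return total;
}

// Vectored: the same interface is crossed once per batch, so its cost is
// amortized and the inner loop stays tight and compiler-friendly.
struct NextBatch {
    // fills out with up to out.size() values, returns how many were produced
    virtual std::size_t next(std::vector<int64_t> &out) = 0;
    virtual ~NextBatch() = default;
};

int64_t sum_vectored(NextBatch &input) {
    std::vector<int64_t> batch(1024);
    int64_t total = 0;
    std::size_t n = 0;
    while ((n = input.next(batch)) > 0)   // one crossing per 1024 values
        for (std::size_t i = 0; i < n; i++)
            total += batch[i];
    return total;
}

The point is not the arithmetic but the crossing count: the second form keeps the interface while paying for it a thousand times less often.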

Information systems thus take on more attributes of nature, i.e., more interconnectedness and adaptive behaviors.

Something quite universal might emerge from the highly problem-specific technology of the column store. The big scan, selective hash join plus aggregation, has been explored in slightly different ways by all of HyPer, Vectorwise, and Virtuoso.

Interfaces are not good or bad, in and of themselves. Well-intentioned naïveté in their use is bad. As in nature, there are natural borders in the "technosphere"; declarative query languages, processor instruction sets, and network protocols are good examples. Behind a relatively narrow interface lies a world of complexity of which the unsuspecting have no idea. In biology, the cell membrane might be an analogy, but this is in all likelihood more permeable and diverse in function than the techno examples mentioned.

With the experience of Vectorwise and later Virtuoso, it turns out that vectorization without compilation is good enough for TPC-H. Indeed, I see a few percent of gain at best from further breaking of interfaces and "biology-style" merging of operators and adding inter-stage communication and self-balancing. But TPC-H is not the end of all things, even though it is a sort of rite of passage: Jazz players will do their take on Green Dolphin Street and Summertime.

Science is drawn towards a grand unification of all which is. Nature, on the other hand, discloses more and more diversity and special cases, the closer one looks. This may be true of physical things, but also of abstractions such as software systems or mathematics.

So, let us look at the generalized DBMS, or the data analysis engine, as Peter put it. The use of DBMS technology is hampered by its interface, i.e., the declarative query language. The well-known counter-reactions to this are the NoSQL, MapReduce, and graph DB memes, which expose lower-level interfaces. But then the interface gets put in entirely the wrong place, denying most of the things that make the analytics DBMS extremely good at what it does.

We need better and smarter building blocks and interfaces at zero cost. We continue to need blocks of some sort, since algorithms would stop being understandable without any data/procedural abstraction. At run time, the blocks must overlap and interpenetrate: Scan plus hash plus reduction in one loop, for example. Inter-thread, inter-process status sharing for things like top-k for faster convergence, for another. Vectorized execution of the same algorithm over many data items for things like graph traversals. There are very good single blocks, like GPU graph algorithms, but interface and composability are ever the problem.
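
As a rough sketch of what scan plus hash plus reduction in one loop can mean (again a generic C++ illustration with assumed names, not the actual Virtuoso, Vectorwise, or HyPer code), the three logical operators interpenetrate in a single pass over the data instead of handing materialized intermediates across interfaces:

#include <cstdint>
#include <unordered_map>
#include <vector>

struct FactRow { int64_t dim_key; int64_t amount; };

// Probe side of a selective hash join fused with scan and aggregation:
// one pass over the fact rows filters on the join, looks up the group of
// each surviving row, and accumulates per-group totals, with no
// intermediate result ever materialized.
std::vector<int64_t> scan_join_aggregate(
    const std::vector<FactRow> &facts,
    const std::unordered_map<int64_t, int> &dim_to_group,  // build side, already built
    int n_groups)
{
    std::vector<int64_t> group_totals(n_groups, 0);
    for (const FactRow &r : facts) {                 // scan
        auto hit = dim_to_group.find(r.dim_key);     // selective hash probe
        if (hit != dim_to_group.end())
            group_totals[hit->second] += r.amount;   // aggregation, same loop
    }
    return group_totals;
}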

So, we must unravel the package that encapsulates the wonders of the analytical DBMS. These consist of scan, hash/index lookup, partitioning, aggregation, expression evaluation, scheduling, and message passing and related flow control for scale-out systems, just to mention a few. The complete list would be under 30 items long, with blocks parameterized by data payload and specific computation.
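
To hint at what parameterization by data payload and specific computation could look like, here is a C++ template sketch used purely as a static stand-in; a just-in-time compiler could perform the same specialization at run time, and none of this is drawn from an existing engine:

#include <cstdint>
#include <vector>

// One reusable building block: a scan parameterized by the payload type T
// and by the specific computation (Consume) applied to each value.
template <typename T, typename Consume>
void scan_block(const std::vector<T> &column, Consume consume) {
    for (const T &v : column)
        consume(v);
}

int main() {
    std::vector<int64_t> prices = {3, 7, 11};

    // The same block serves a filtering aggregation...
    int64_t total = 0;
    scan_block(prices, [&](int64_t v) { if (v > 5) total += v; });

    // ...or feeds a different downstream step, without a new operator class.
    std::vector<int64_t> doubled;
    scan_block(prices, [&](int64_t v) { doubled.push_back(2 * v); });
    return 0;
}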

By putting these together in a few new ways, we will cover much more of the big data pipeline. Just-in-time compilation may well be the way to deliver these components in an application/environment tailored composition. Yes, keep talking about block diagrams, but never once believe that this represents how things work or ought to work. The algorithms are expressed as distinct things, but at the level of the physical manifestation, things are parallel and interleaved.

The core skill for architecting the future of data analytics is correct discernment of abstraction and interface. What is generic enough to be broadly applicable yet concise enough to be usable? When should the computation move, and when should the data move? What are easy ways of talking about data location? How can the application developer be protected from various inevitable stupidities?

Make no mistake about it: there are at present very few people with the background for formulating the blueprint for the generalized data pipeline. These will mostly be drawn from the architects of DBMSs. The prospective user is any present-day user of an analytics DBMS, Hadoop, or the like. By and large, SQL has worked well within its area of applicability. If there had never been an anti-SQL rebel faction, SQL would not have been successful. Now that a broader workload definition calls for redefinition of interfaces, so as to use the best where it fits, there is a need for re-evaluation of the imperative vs. declarative question.

T. S. Eliot once wrote that humankind cannot bear very much reality. It seems that we can, in reality, deconstruct the DBMS and redeploy the state of the art to serve novel purposes across a broader set of problems. This is a cross-over that slightly readjusts the mental frame of the DBMS expert but leaves the core precepts intact. In other words, this is a straightforward extension of core competence with no slide into the dilettantism of doing a little bit of everything.

People like MapReduce and stand-alone graph programming frameworks because these do one specific thing and are readily understood. By and large, these are orders of magnitude simpler than the DBMS. Even when the DBMS provides in-process Java or CLR, these are rarely used. The single-purpose framework is a much narrower core competence, and thus less exclusive, than the high art of the DBMS, plus it has a faster platform development cycle.

In the short term, we will look at opening the SQL internal toolbox for graph analytics applications. I was discussing this idea with Thomas Neumann at Peter Boncz's party. He asked who would be the user. I answered that doing good parallel algorithms, even with powerful shorthands, was an expert task, so the people doing new types of analytics would mostly be on the system vendor side. However, modifying such algorithms for input selection and statistics gathering would be no harder than doing the same with ready-made SQL reports.

There is significant potential for generalization of the leading edge of database technology. How will this fare against single-model frameworks? We hope to shed some light on this in the final phase of LDBC and beyond.

Inaugural Lecture of Prof. Boncz at VU Amsterdam

Wed, 10/22/2014 - 17:21

Categories:

RDF

Last Friday, I attended the inaugural lecture of Professor Peter Boncz at the VU University Amsterdam. As the reader is likely to know, Peter is one of the database luminaries of the 21st century, known among other things for architecting MonetDB and Actian Vector (Vectorwise) and publishing a stellar succession of core database papers.

The lecture touched on the fact of the data economy and the possibilities of E-science. Peter proceeded to address issues of ethics of cyberspace and the fact of legal and regulatory practice trailing far behind the factual dynamics of cyberspace. In conclusion, Peter gave some pointers to his research agenda; for example, use of just-in-time compilation for fusing problem-specific logic with infrastructure software like databases for both performance and architecture adaptivity.

There was later a party in Amsterdam with many of the local database people as well as some from further away, e.g., Thomas Neumann of Munich, and Marcin Zukowski, Vectorwise founder and initial CEO.

I should have had the presence of mind to prepare a speech for Peter. Stefan Manegold of CWI did give a short address at the party, while presenting the gifts from Peter's CWI colleagues. To this I will add my belated part here, as follows:

If I were to describe Prof. Boncz, our friend, co-worker, and mentor, in one word, this would be man of knowledge. If physicists define energy as that which can do work, then knowledge would be that which can do meaningful work. A schematic in itself does nothing. Knowledge is needed to bring this to life. Yet this is more than an outstanding specialist skill, as this implies discerning the right means in the right context and includes the will and ability to go through with this. As Peter now takes on the mantle of professor, the best students will, I am sure, not fail to recognize excellence and be accordingly inspired to strive for the sort of industry changing accomplishments we have come to associate with Peter's career so far. This is what our world needs. A big cheer for Prof. Boncz!

I did talk to many at the party, especially Pham Minh Duc, who is doing schema-aware RDF in MonetDB, and many others among the excellent team at CWI. Stefan Manegold told me about Rethink Big, an FP7 project for big data policy recommendations. I was meant to be an advisor and still hope to go to one of their meetings for some networking about policy. On the other hand, the EU agenda and priorities, as discussed with, for example, Stefano Bertolo, are, as far as I am concerned, on the right track: The science of performance must meet with real, or at least realistic, data. Peter did not fail to mention this same truth in his lecture: Spinoffs play a key part in research, and exposure to the world out there gives research both focus and credibility. As René Char put it in his poem L'Allumette (The Matchstick), "La tête seule a pouvoir de prendre feu au contact d'une réalité dure." ("The head alone has power to catch fire at the touch of hard reality.") Great deeds need great challenges, and there is nothing like reality to exceed man's imagination.

For my part, I was advertising the imminent advances in the Virtuoso RDF and graph functionality. Now that the SQL part, which is anyway the necessary foundation for all this, is really very competent, it is time to deploy these same things in slightly new ways. This will produce graph analytics and structure-aware RDF to match relational performance while keeping schema-last-ness. Anyway, the claim has been made; we will see how it is delivered during the final phase of LDBC and GeoKnow.

hostapd debugging

Wed, 10/22/2014 - 16:50

Categories:

RDF

sudo hostapd -dd /etc/hostapd/hostapd.conf

tells you if your config file is broken. Which helps.

Sample hostapd.conf for a WPA-password-protected access point:

ssid=myssid
interface=wlan0
driver=nl80211
hw_mode=g
channel=1
wpa=2
wpa_passphrase=mypass
wpa_key_mgmt=WPA-PSK
# makes the SSID visible and broadcast
ignore_broadcast_ssid=0


The Importance of Use Cases Documents

Tue, 10/21/2014 - 11:22

Categories:

RDF
The Data on the Web Best Practices WG is among those who will be meeting at this year’s TPAC in Santa Clara. As well as a chance for working group members to meet and make good progress, it’s a great …

AKSW at #ISWC2014. Come and join, talk and discuss with us!

Thu, 10/16/2014 - 12:00

Categories:

RDF
Hello AKSW Follower! We are very pleased to announce that nine of our papers were accepted for presentation at ISWC 2014. In the main track of the conference we will present the following papers:

This year, the Replication, Benchmark, Data and Software Track started, and we had two papers accepted there!

Additionally, four of our papers will be presented at different workshops:

You can also find us at the posters and demo session, where we are going to present:

  • AGDISTIS – Multilingual Disambiguation of Named Entities Using Linked Data, Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Wencan Luo and Lars Wesemann
  • Named Entity Recognition using FOX, René Speck and Axel-Cyrille Ngonga Ngomo
  • AMSL – Creating a Linked Data Infrastructure for Managing Electronic Resources in Libraries, Natanael Arndt, Sebastian Nuck, Andreas Nareike, Norman Radtke, Leander Seige and Thomas Riechert.
  • Xodx – A node for the Distributed Semantic Social Network, Natanael Arndt and Sebastian Tramp.

We are especially looking forward to seeing you at the full-day tutorial:

Come to ISWC at Riva del Garda, talk to us and enjoy the talks. More information on various publications can be found at http://aksw.org/Publications. Cheers, Ricardo on behalf of AKSW

Tarot scoring

Wed, 10/15/2014 - 02:16

Categories:

RDF
Score keeping for the French card game Tarot is way too difficult, especially after a couple of cocktails. Here's my attempt to fix that problem.

Wikidata article in CACM

Mon, 10/13/2014 - 00:51

Categories:

RDF

I just noticed that Denny Vrandecic and Markus Krötzsch have an article on Wikidata in the latest CACM. Good work! Even better, it’s available without subscription.

Wikidata: a free collaborative knowledgebase, Denny Vrandecic and Markus Krötzsch, Communications of the ACM, vol. 57, no. 10 (2014), pp. 78-85.

“This collaboratively edited knowledgebase provides a common source of data for Wikipedia, and everyone else.

Unnoticed by most of its readers, Wikipedia continues to undergo dramatic changes, as its sister project Wikidata introduces a new multilingual “Wikipedia for data” (http://www.wikidata.org) to manage the factual information of the popular online encyclopedia. With Wikipedia’s data becoming cleaned and integrated in a single location, opportunities arise for many new applications.”

Cross origin in Sinatra

Thu, 10/09/2014 - 12:04

Categories:

RDF

I keep looking this up, so:

require 'sinatra/base'
require 'sinatra/cross_origin'   # provided by the sinatra-cross_origin gem

class MyApp < Sinatra::Base
  # register the extension so the cross_origin helper is available
  register Sinatra::CrossOrigin

  get '/' do
    cross_origin   # sets the CORS response headers for this route
    # ... route body ...
  end

end


LIMES Version 0.6 RC4

Mon, 10/06/2014 - 22:49

Categories:

RDF

It has been a while, but that moment has arrived again. We are happy to announce a new release of the LIMES framework. This version implements novel geo-spatial measures (e.g., geographic mean) as well as string similarity measures (Jaro, Jaro-Winkler, etc.). Moreover, we fixed some minor bugs (thanks for the bug reports). The final release (i.e., version 0.7) will soon be available, so stay tuned!

Link on,
Axel