RDF

Three Last Call Working Drafts published by the RDFa Working Group

Planet RDF, Thu, 02/02/2012 - 15:13

Categories:

RDF

The RDF Web Applications Working Group has published three Last Call Working Drafts:
* RDFa Core 1.1,
* RDFa Lite 1.1 and
* XHTML+RDFa 1.1.

Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents. RDFa Lite 1.1 provides a simple subset of RDFa for novice Web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language.

Public reviews due by 21 February.

Official announcement at W3C.

Automatic text analytics using DBpedia and PoolParty – A Live Demo

Planet RDF, Thu, 02/02/2012 - 11:22

Let me show you which steps have to be taken to generate a high-quality text mining application, ready to be used to annotate and categorize any kind of text or document covering nearly any domain. With our approach of thesaurus-based text mining, your documents can also be linked to the world of linked (open) data: enrich your documents with data from the LOD cloud!

Step 1. Generate a thesaurus by using a linked data source like DBpedia

As recently reported, SWC has developed a tool called SKOSsy, which can be used to extract seed thesauri from DBpedia. In our example I will generate a knowledge model describing the domain of “digital photography”. This step took around 15 minutes.

Step 2. Load the thesaurus into PoolParty and improve it to your needs

After the seed thesaurus has been loaded into PoolParty Thesaurus Manager, you have many possibilities to enhance the knowledge model further: add more categories, synonyms, relations etc. In this example I use the seed thesaurus without any further improvements. This step took approximately 2 minutes.

Step 3. Generate an automatic text extractor on top of your thesaurus

This step took a couple of seconds and produced a fast and reliable text mining application on top of PoolParty Extractor, ready to be used to enrich your documents with data from the LOD cloud.
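The idea behind thesaurus-based extraction can be sketched in a few lines of Python. This is only an illustration of the general technique, not how PoolParty Extractor works internally: the `Concept` class, the `example.org` URIs and the sample labels are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                      # hypothetical concept URI
    pref_label: str               # preferred label
    alt_labels: list = field(default_factory=list)  # synonyms


def build_label_index(concepts):
    """Map every lower-cased label (preferred or alternative) to its concept."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index[label.lower()] = concept
    return index


def annotate(text, index):
    """Return (matched text, concept URI, offset) for each label found in text."""
    hits = []
    taken = []  # character spans already claimed by a longer label
    # Longer labels go first so "digital camera" wins over "camera".
    for label in sorted(index, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue
            taken.append((m.start(), m.end()))
            hits.append((m.group(0), index[label].uri, m.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up seed thesaurus for the "digital photography" domain.
CONCEPTS = [
    Concept("http://example.org/concept/digital-camera", "Digital camera", ["digicam"]),
    Concept("http://example.org/concept/camera", "Camera"),
    Concept("http://example.org/concept/image-sensor", "Image sensor", ["CCD"]),
]

SAMPLE = "A digital camera records light on an image sensor; any camera needs a lens."
ANNOTATIONS = annotate(SAMPLE, build_label_index(CONCEPTS))
```

Each annotation carries a concept URI, which is the hook that lets a document be linked to further data about that concept.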

You can try it out here: PPX Live-Demo

To try the extractor on your own, please take a look at the image above, which shows a proper configuration. You have to insert the following UUID in the form: d35d4ddb-adc3-4ea5-b027-deacac03e391

Since our example is all about ‘digital photography’, we recommend using text samples (or some fragments) like these to test the quality of PPX-based text analytics:

Let us know what you think about this straightforward approach and your opinion about the quality of the results. We believe that thesaurus-based text mining is in many cases an alternative to some other approaches, especially if you want to enrich your content with information from the upcoming web of data.

Of course we would be happy to generate other demos in the areas of your interest! Just get in contact with us by using our contact form.

2 PhD and 1 PostDoc Position at Business School of Bern University of Applied Sciences

Planet RDF, Thu, 02/02/2012 - 10:44

For collaborative international research projects in the area of intelligent information management, the Business School of Bern University of Applied Sciences (BUAS) in cooperation with research group Agile Knowledge Engineering and Semantic Web (AKSW) at Universität Leipzig opens positions for:

2 PhD and 1 PostDoc Position in Knowledge Engineering / Semantic Web

The positions are primarily based at BUAS (Switzerland) and funded by European FP7 projects and possibly Swiss national research grants. A close collaboration and ca. 4 research visits per year at AKSW research group at Universität Leipzig (Germany) are envisaged for the PhD students to complete their PhD program.

We offer

  • The stimulating environment of two research institutes in the fields of Business Informatics, Semantic Web, Ontology Engineering, Linked Data Web, Knowledge Management, Data integration and Service-Oriented Architectures;
  • Long-term collaboration with well-known academic institutions and major companies around the world;
  • A multicultural working place with state-of-the-art infrastructure, a competitive salary and resources including funding for attending international conferences, PhD symposia, summer schools, etc.;
  • competitively funded PhD positions close to the rate of the Swiss National Science Foundation (currently ca. CHF 41’000);
  • competitively paid Postdoc positions commensurate with the pay scale of BUAS (starting at CHF 80’000 depending on experience).

We expect

  • A strong background in Computer Science or related disciplines;
  • Excellent software engineering skills with demonstrated proficiency in modern software development;
  • The willingness to work in an international environment and combine formal scientific work with application-oriented research in order to solve real-world problems;
  • Research interest and expertise in at least one of the following: knowledge representation and ontology languages, natural language processing, data management and integration, Semantic Web standards, business aspects of semantic systems;
  • Prospective PhD students should fulfill the doctorate entrance requirements of Universität Leipzig (i.e. master's degree or equivalent);
  • Proficiency in English and the willingness to learn one of the official Swiss languages (e.g. German, French, Italian).

To apply
Applicants should send a cover letter, curriculum vitae incl. list of publications, a research statement and the names and addresses of two referees via email (PDF only) to ksm1@bfh.ch (Dr. Michael Kaschewsky, Head of Research Group, Bern University of Applied Sciences, Business School). Positions are open until filled, but candidates are advised to apply by 1 March 2012. In addition, qualified postdoctoral researchers have the opportunity to get funding for their position and additionally for a doctoral position that they supervise independently, but they must apply by 15 February 2012. If you are interested, please contact us asap.

About us
Bern University of Applied Sciences (BUAS) is the regional leader in applied science and research with seven departments across three cities. Research at the Business School in Bern is nationally leading and internationally renowned in the field of e-government and applied informatics in the public sector.
AKSW research group at the Universität Leipzig is establishing theoretical results and scalable implementations for the Semantic Data Web (e.g. DBpedia, OntoWiki, DL-Learner). Particular emphasis is given to areas such as ontology creation and manipulation, knowledge extraction, ontology learning and information & data integration on the Semantic Data Web.
Additional information regarding our research and projects as well as further information concerning these positions is available at http://bfh.ch and http://aksw.org.

What makes a good leader?

Planet RDF, Wed, 02/01/2012 - 17:33

Abstract

This article is a short summary of my experiences with leadership. Although these points are very simple to write down, every new leader should learn them one by one to gain a deep understanding.

Most important things

  1. Always be prepared
    If you have a meeting with your team, you should always spend time to prepare for it
  2. You are the Hero of your team
    Whether you want it or not, you will be the one who is followed, so you should always show the best of yourself (if you are unstable, your team will be unstable; if you are focused, your team will be focused)
    You should perform at your best; if you do an extraordinary job, your team will follow you
  3. Give strict but honest evaluations
    Saying the hard truth is not always pleasant, but it is a must
    Always focus on improvement
  4. A "Thank you" is more important than money
  5. Listen to your team
    Most of the time, the team can solve every problem you are facing together with you, because your problem is your team's problem, and your team's problem is yours
  6. Give vision

Communication

  • Keep eye contact
    You should always look into your colleague's eyes
  • Listen carefully
    A good leader is a good listener
  • Use the word "We" instead of "I" or "You"
  • Your body tells the rest of the story
    Your gestures are as important as what you say in person

Human snippets

  • You cannot change people; you can only show them a better way

Get comments on the Water Quality Portal from AGU 2011

Planet RDF, Wed, 02/01/2012 - 17:01

AGU 2011 was the first conference I attended after I came to RPI. There were many interesting activities at the conference and I feel it was a very rewarding experience. I presented the poster for the semantic water quality portal and also helped a little bit with the RPI table in the academic section (by just being there). I went to several talks and visited some exhibit booths. We had a group lunch at Chevy’s, which was very fun!

Preparing the poster helped me to rethink the water quality portal. Thanks to Evan for his poster for ISWC 2011, which was a very good starting point for my AGU poster! Having the poster session was even more fun and rewarding! I presented the portal to researchers from various fields and countries. Most of the people I talked to said that the portal is a nice and interesting project. Some researchers gave me very helpful comments like:
1. bring in crowd sourcing, e.g. let users report problems
2. help farmers to identify polluted wells
3. we should have an approach for pulling new data from USGS and EPA, e.g. some subscription
4. regulation management for users (insert/upload/delete)
5. consider allergies as a use case; possible conditions for an allergy alert: wind + time, a combination of pollutants

I went to several talks during AGU and got to know the cool projects that researchers from different organizations (EPA, Stanford, UMD, Google, NASA) have been doing. It was impressive to see how widely and deeply computer science has been used in geophysical research. And I felt that scientists from geophysical fields expect more cooperation with people from computer science.

I went to the exhibits twice and spent quite some time there. I used the wired network provided by Google to do my assignments for the AI course. I also listened to the talk about Google Earth Engine, a very cool platform for geophysical scientists!

Attending a conference as huge as AGU indeed requires some energy, but after all it was worthwhile.

Tips about travel reimbursement that Carol gave me today:

1. keep your boarding pass to show that you sat in economy class
2. check out at the hotel and get the folio to show how many nights you actually stayed at the hotel
3. get itemized receipts at restaurants

Thanks, Carol!

New RDFa Drafts Published

Planet RDF, Tue, 01/31/2012 - 16:35

The W3C RDF Web Applications Working Group has published three Last Call Working Drafts today:
* RDFa Core 1.1,
* RDFa Lite 1.1 and
* XHTML+RDFa 1.1.

Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents or in HTML. RDFa Lite 1.1 provides a simple subset of RDFa for novice Web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language.

A number of improvements have been made to RDFa 1.1 over the past year by working closely with Google, Microsoft, Yahoo! and the other search engine developers. Public review and comments have resulted in a number of further refinements to the language that ease the learning curve for beginner Web authors.

The release of these documents as Last Call Working Drafts is a signal to the public that the Working Group believes that all of the technical requirements, public comments and reported issues have been addressed. It is also an open invitation to the general public to review and provide feedback on the finalization of this technology via the RDF Web Applications Working Group mailing list, by 21 February.

DCMI-UK Regional Meeting: Mark your calendars for 26-27 April 2012

Planet RDF, Mon, 01/30/2012 - 23:59

2012-01-30, The DCMI Bibliographic Metadata Task Group will be holding its inaugural Meeting in London on 26 April 2012 in conjunction with the DCMI Vocabulary Management Community. The meetings will be followed on 27 April by a seminar examining the impact of the London Meeting of 2007, which brought DCMI and the Joint Steering Committee for Development of RDA (JSC) together to build the RDA Vocabularies. The DCMI meetings on the 26th will be free and open to all. The full announcement of the meetings is available and the agenda for the meeting of this community will be developed on the wiki and the Task Group discussion list over the coming months.

My First AGU Experience

Planet RDF, Mon, 01/30/2012 - 15:14

(This post was supposed to be posted a month ago. But I had some trouble accessing the TW weblog website when I was in China, so I have to post it now after I came back to Troy.)

The AGU 2011 Fall Meeting was the first time I went to an academic conference. I was very excited when I learned I had such an opportunity. My goals were to present our poster, to find out what such a conference is like, and to get an idea of what other people are doing in the Informatics area.

My poster was about the work with Eric Rozell on the temporal metadata modeling in VSTO. I presented its motivation and methodology to several people, and it certainly drew some interest. Our approach has been viewed as an effective way to deal with a large amount of data and to improve reasoning and searching capacities. It was suggested that a similar technique (in the sense of including the temporal range for a dataset to a granularity of days using time:DateTimeInterval) has been used for data indexing in relational databases at NASA. In terms of the presentation, I think putting our posters, publications, and demos onto flash drives and distributing them to people was a very good idea. It greatly helped interested audience members to understand our work better afterwards.
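To make the idea of a day-granularity temporal range concrete, here is a minimal Python sketch. It is not the VSTO model or the NASA indexing scheme; the `DatasetInterval` class, the dataset identifiers and the dates are hypothetical and only illustrate how such ranges support searching.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetInterval:
    dataset_id: str   # hypothetical identifier
    start: date       # first day covered (inclusive)
    end: date         # last day covered (inclusive)


def overlapping(intervals, query_start, query_end):
    """Return ids of datasets whose day range overlaps the queried range."""
    return [
        item.dataset_id
        for item in intervals
        if item.start <= query_end and query_start <= item.end
    ]


# Made-up catalogue entries.
CATALOGUE = [
    DatasetInterval("dataset-a", date(2010, 1, 1), date(2010, 3, 31)),
    DatasetInterval("dataset-b", date(2010, 3, 15), date(2010, 6, 30)),
    DatasetInterval("dataset-c", date(2011, 1, 1), date(2011, 1, 31)),
]

MARCH_HITS = overlapping(CATALOGUE, date(2010, 3, 20), date(2010, 3, 25))
```

Storing the range once per dataset means a search only compares two dates per dataset instead of inspecting every record.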

There was much other interesting work across a couple of sessions. For example, Nicholas Del Rios et al. from University of Texas at El Paso presented a semantic and provenance-aware visualization framework (VisKo) that links data with visualization processes. It has been used to visualize data on behalf of Giovanni. It is able to capture data processing provenance and visualization provenance in PML. Besides posters, I also went to several talks from different sessions. Though I failed to connect most of them to my research work, I thought it was nice to hear about what other people have been working on.

Another outcome for me was meeting people in the Earth Science and Informatics areas. Although the names I could remember were limited, what I saw was a group of people who show enthusiasm about their work. They believe in what they are doing and have confidence in what their work will accomplish. I really look forward to working with many of them.

To sum up, this was a great experience for me in the beginning stage of my Ph.D. career. Next time I will try to meet and talk to more people, and get more feedback about my own work.

 

Open Data – a virtual natural resource

Planet RDF, Mon, 01/30/2012 - 09:13

A virtual natural resource? Doesn’t make sense, does it?

Let me explain.

Natural resources are derived from the environment. Many of them are essential for our survival while others are used for satisfying our wants.

… is with Wikipedia

Small Data

Planet RDF, Mon, 01/30/2012 - 09:04

I'd just like to plant a little flag in the sand. Big Data seems to be the flavour of the month (and is undeniably extremely useful and interesting), but I've a gut feeling that might be symptomatic of not seeing the wood for the trees (or maybe vice versa).

I've not thought this through much, but surely any trends/correlations/relationships that are important enough to be of interest should be detectable without having to build a terabyte+ store? Rather than trying to capture as much raw data as possible up front, I suspect a more productive approach long-term will be to work with (maybe federated) crawler farms, with lots and lots of algorithms running in parallel over what they see. If there are appropriate training feedback loops in place, the shape of algorithms themselves could be treated as the results of the analysis.

It could be argued that once you have accumulated a corpus of raw data you can subsequently throw whatever you like at it without having to get the raw data again. But that corpus will never be complete or truly fresh - as new data appears on the Web all the time. More critically, under normal circumstances you can never be sure you've got a dataset that contains a good sample representation covering whatever unknowns you're exploring. But crawlers can be directed to favour slices of the Web that contain information relevant to your hypotheses.

So, in the context of the Web, the Web itself should be the only big data needed. Which gives a neat parallel in the other sciences: reality itself is the only database you'll ever need :)

Ok, in the same way that Big Sites (like Wikipedia/dbPedia) add big value to the Web alongside lots of small pieces, loosely joined, the same no doubt goes for Big Data. But let's not forget the vice versa, a complementary Small Data approach.

Somewhat orthogonal to this, one way in which the Web is a game changer for data is that here the relationship between pieces of data (/documents) is at least as significant as those pieces of data stacked on top of each other. Link Rank is a special case, an aggregated, flattened view of link value. If topics and entities (i.e. things in general, people, places, concepts etc) and their interrelationships are inferred and/or explicitly named, it should expose some interesting facets of how human knowledge works.

Comment to G+ please.

… you end up with a graph

Planet RDF, Sun, 01/29/2012 - 21:23

Quite often I hear people coming up with rather strange explanations why we use graphs, or to be more specific for the Web case, RDF. Some think that the reason is to make the developer’s life harder. Right. It’s so much easier to understand a key-value structure. And there are the ones who claim that we use graphs because the W3C says so (RDF is a W3C standard). Some others say graphs are used because they are the most generic, powerful data structure and you can represent any other simpler data structure, such as a tree (think XML) with it.

The real reason why graphs are in use is much simpler: when you have data sources and you start connecting data items across them, you end up with a graph. You can like it or not, but it’s inevitable.
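A small Python sketch can illustrate the point. The three key-value sources and their field names are made up for the example; the only assumption is that some values in one source refer to keys in another. Once those references are followed, the result is a set of nodes and edges, in other words a graph.

```python
# Three independent key-value sources (hypothetical example data).
people = {"alice": {"name": "Alice", "employer": "acme"}}
companies = {"acme": {"name": "ACME", "city": "berlin"}}
cities = {"berlin": {"name": "Berlin"}}


def to_triples(kind, records, links):
    """Turn key-value records into (subject, predicate, object) triples.

    `links` maps a field name to the kind of record it points at, so that
    the value becomes a node identifier instead of a plain literal.
    """
    triples = []
    for key, fields in records.items():
        subject = f"{kind}:{key}"
        for name, value in fields.items():
            obj = f"{links[name]}:{value}" if name in links else value
            triples.append((subject, name, obj))
    return triples


def reachable(triples, start):
    """Collect everything reachable from `start` by following edges."""
    edges = {}
    for subject, _, obj in triples:
        edges.setdefault(subject, []).append(obj)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return seen


# Connecting the sources: each one alone is flat, together they form a graph.
graph = (
    to_triples("person", people, {"employer": "company"})
    + to_triples("company", companies, {"city": "city"})
    + to_triples("city", cities, {})
)
```

Starting from `person:alice`, `reachable(graph, "person:alice")` walks through the company to the city, a path that exists in none of the individual sources.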

Now, graphs have a number of desirable properties, including ‘

True Knowledge launches Evi question answering mobile app

Planet RDF, Sun, 01/29/2012 - 17:43

UK semantic technology company True Knowledge has released Evi, a mobile app that competes with Siri.

The mobile app is available on the Android Market and on iTunes. You can pose queries to either by speaking or typing. The Android app uses Google’s ASR speech technology and the iTunes app uses Nuance.

True Knowledge has been developing a natural language question answering system since 2007. You can query True Knowledge online via a Web interface. Try the following links for some examples:

The Evi app has a number of additional features beyond the Web-based True Knowledge QA system and these will probably be expanded on in the months to come.

See the Technology Review story, New Virtual Helper Challenges Siri, for more information.

Something Dry: Change Notifications

Planet RDF, Sat, 01/28/2012 - 13:50

Ignoring the fact that I have not blogged in nearly two months, I will simply get some developer information out there. Getting notified about changes in the Nepomuk database has always been a problem. All we had for a long time were the ugly

Search plus Your World - fool's gold

Planet RDF, Sat, 01/28/2012 - 11:59

For quite a while I've held the view that most current approaches to Web search are fundamentally flawed, because the best way to find something is not to lose it in the first place. But as the companies invested in search gradually get smarter in their use of person- and (to a lesser extent) thing-oriented data, rather than just word association (football) search results seem increasingly more focused. Google's approach in particular has grown increasingly like the model put forward in the Semantic Web initiative. Recently with G+ we see a big push to capture and exploit data associated with personal profiles (the FOAF domain) and brands (the GoodRelations domain, although maybe there's a role for an additional brand- rather than product-oriented vocab). With Rich Snippets and Schema.org there's a direct use of semweb technology (in a slightly mangled form - One True Ontology is a well-known antipattern to anyone that bothers to look at the literature).

In fact the "Your World" part of Search plus Your World (SPYW) can be seen as a reinvention of the most important part of Semantic Web technology, that of giving everything of significance a URL: people, places, things, concepts. Given that, you can start describing and leveraging relationships between those resources. To use a phrase I think originated around microformats, it's lower-case semantic web. Ok, behind the quality glitz of G+ profiles and pages this seems to have been done in a rather sloppy, ad hoc fashion, but that in itself is fine - whatever it takes. But where Google get it very wrong is by putting themselves at the heart of their system. Not only is semantic in lower-case, so is web. If you do a search with SPYW enabled, you're pointed straight back into the Google Empire. They are making themselves gatekeepers of the Web. Although there aren't any concrete entry barriers to this walled garden, by only signposting Google's footpaths in search results it's creating a system with the same characteristics as say AOL around 2000. From Google search being a vital accessory on the open Web, it's increasingly becoming a portal.

There is already a visible cost in practice to Google's echo chamber - if you want to re-find something one of your colleagues said the other day, sure, SPYW is helpful. But if you're trying to do some original research, you don't want to be searching with Your World blinkers on - an engine without those preconceptions such as DuckDuckGo will be more useful.

This strategy I'd assert is doomed to failure for the same reason AOL's walled garden collapsed, to use another phrase I like to repeat, because no matter how big any single entity becomes, the rest of the Web will always be bigger. The focus on the user/Don't Be Evil thing is absolutely right to highlight the value of non-Google resources, although it does fall short by suggesting that the rest of the Web is just a handful of other companies [G+ link] i.e. Twitter, Facebook etc. Google's own long-term survival as a market leader is absolutely dependent on their respect of the Web at large.

So what should Google do? Re-read Steve Yegge's awesome rant [G+ link] for starters. Especially the bits about Platforms. G+ and Your World should be considered in this context - as a semantic (any case) Web (upper case) Platform. For example, Google's pages appear to be aimed at providing the canonical URLs for concepts (...lower-case). But there's already an excellent source of such URLs: Wikipedia. In itself Wikipedia only provides URLs of documents whose primary topic is the thing in question, but dbPedia is a well-established mapping, based on best practices, from thing identifiers to Wikipedia pages (e.g. <http://dbpedia.org/resource/Berlin> foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Berlin> . ). If a handful of students from obscure north-European universities (heh, sorry, just for the sake of contrast), with a little community support can create and maintain - give the world - a service supporting all the concepts/things covered by Wikipedia, imagine what the mighty Google could achieve...
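The mapping between a thing identifier and its Wikipedia page, as in the Berlin example above, can be sketched in Python. The function names are hypothetical, and the sketch assumes the simple case where the identifier reuses the English Wikipedia page title unchanged; it ignores encoding differences and redirects.

```python
from urllib.parse import urlparse

# Full IRI behind the foaf:isPrimaryTopicOf shorthand.
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def dbpedia_resource(wikipedia_url):
    """Derive the thing identifier for an English Wikipedia page URL.

    Assumes the identifier reuses the page title as-is.
    """
    parsed = urlparse(wikipedia_url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    title = parsed.path[len("/wiki/"):]
    return f"http://dbpedia.org/resource/{title}"


def primary_topic_triple(wikipedia_url):
    """Return one N-Triples line linking the thing to its Wikipedia page."""
    thing = dbpedia_resource(wikipedia_url)
    return f"<{thing}> <{FOAF_IS_PRIMARY_TOPIC_OF}> <{wikipedia_url}> ."


BERLIN = primary_topic_triple("http://en.wikipedia.org/wiki/Berlin")
```

The separation matters: the thing (the city) and the document about it get different URLs, and the triple states how they relate.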

To give a little example in the context of Personal Profiles, if I publish my definitive personal profile on my own domain (note Google already understands all the elements of this) then for queries for which "me" is the appropriate response, that page should be the first hit, not my G+ profile.

Another factor in the walled nature of G+ is the limited API. I'm sure features will be added to this in the near future, but I hope (probably unrealistically) they will use proper standards and follow known best practices. Going further into over-optimistic territory, I'll quote Tom Gruber (in an interview talking about how Siri works) :

A site that exposes RDF usually has an API that is easy to deal with, which makes our life easier. For instance, we use geonames.org as one of our geospatial information sources. It is a full-on Semantic Web endpoint, and that makes it easy to deal with. The more the API declares its data model, the more automated we can make our coupling to it.

What should we (as users and components of the Web) do? Well, basically what we're already doing...but trying not to be distracted by shiny things and keeping an eye on the long term - standards are good. When we publish data on the Web we need to consider the quality of the data first (i.e. make it 5 Star), seeing it as purely Google-fodder is missing the point.

Comments please [Google+ link, the irony is not lost on me :)]
