News aggregator

Nothing to Hide

Another word for it, Sun, 10/26/2014 - 19:12


Topic Maps

Nothing to Hide: Look out for yourself by Nicky Case.

Greg Linden describes it as:

Brilliantly done, free, open source, web-based puzzle game with wonderfully dark humor about ubiquitous surveillance

First and foremost, I sense there is real potential for this to develop into an enjoyable online game.

Second, this could be a way to educate users about security/surveillance threats.


I first saw this in Greg Linden’s Quick Links for Wednesday, October 01, 2014.

Was all beer sour before Pasteur?

Lars Marius, Sun, 10/26/2014 - 17:01


Topic Maps
It's often said that before Pasteur's work on yeast (and Emil Christian Hansen's introduction of the pure-yeast system) all beer was sour. Various lines of reasoning lie behind this claim. One is that all beer was spontaneously fermented back then, because nobody knew what yeast was. Another is that because brewers had no microbiological control over their yeast, they were effectively using wild yeast, and thus they would necessarily get sour beer. Many people claim there must necessarily be other organisms than pure brewer's yeast in these yeast cultures, and that these would turn the beer sour.

Death of Yahoo Directory

Another word for it, Sun, 10/26/2014 - 15:09


Topic Maps

Progress Report: Continued Product Focus by Jay Rossiter, SVP, Cloud Platform Group.

From the post:

At Yahoo, focus is an important part of accomplishing our mission: to make the world’s daily habits more entertaining and inspiring. To achieve this focus, we have sunset more than 60 products and services over the past two years, and redirected those resources toward products that our users care most about and are aligned with our vision. With even more smart, innovative Yahoos focused on our core products – search, communications, digital magazines, and video – we can deliver the best for our users.

Directory: Yahoo was started nearly 20 years ago as a directory of websites that helped users explore the Internet. While we are still committed to connecting users with the information they’re passionate about, our business has evolved and at the end of 2014 (December 31), we will retire the Yahoo Directory. Advertisers will be upgraded to a new service; more details to be communicated directly.

Understandable but sad. Think of indexing a book that expanded as rapidly as the Internet has over the last twenty (20) years, especially when new content might or might not bear any resemblance to already existing content.

The Internet remains in serious need of a curated means to access quality information. Almost any search returns links ranging from high to questionable quality.

Imagine if Yahoo segregated the top 500 computer science publishers, archives, societies, departments and blogs into a block of searchable content. (The number 500 is wholly arbitrary; it could be some other number.) Users would pre-qualify themselves as interested in computer science materials and create a market segment for advertising purposes.

Users would get less trash in their results and advertisers would have pre-qualified targets.

A pre-curated search set might mean you would miss an important link, but realistically, few people read beyond the first twenty (20) links anyway. An analysis of search logs at PubMed shows that 80% of users choose a link from the first twenty results.

In theory you may have > 10,000 “hits” but queuing all of those up for serving to a user is a waste of time.

I suspect it varies by domain, but twenty (20) high quality “hits” from curated content would be a far cry from average search results now.

I first saw this in Greg Linden’s Quick Links for Wednesday, October 01, 2014.

The Chapman University Survey on American Fears

Another word for it, Sun, 10/26/2014 - 14:04


Topic Maps

The Chapman University Survey on American Fears

From the webpage:

Chapman University has initiated a nationwide poll on what strikes fear in Americans. The Chapman University Survey on American Fears included 1,500 participants from across the nation and all walks of life. The research team leading this effort pared the information down into four basic categories: personal fears, crime, natural disasters and fear factors. According to the Chapman poll, the number one fear in America today is walking alone at night.

A multi-disciplinary team of Chapman faculty and students wanted to capture this information on a year-over-year basis to draw comparisons regarding what items are increasing in fear as well as decreasing. The fears are presented according to fears vs. concerns because that was the necessary phrasing to capture the information correctly.

Your marketing department will find this of interest.

If you are not talking about power, fear or sex, then you aren’t talking about marketing.

IT is no different from any other product or service. Perhaps that’s why the kumbaya approach to selling semantic solutions has done so poorly.

You will need far deeper research than this to integrate fear into your marketing program but at least it is a starting point for discussion.

I first saw this at Full Text Reports as: The Chapman Survey on American Fears

Larry Garfield: On Drupal's Leadership

Planet Drupal, Sun, 10/26/2014 - 02:20



My DrupalCon Amsterdam Core Conversation on Managing Complexity has generated quite a bit of follow-up discussion. That's good; it's a conversation we as a community really need to be having.

There are a few points, though, that I feel bear clarification and further explanation as I fear the point of the talk has gotten lost in the details.

Before continuing, if you haven't yet I urge you to watch the session video as well as the background resources linked from the session page. This is not a new conversation; it's the latest chapter in a very long-running discussion that is larger than the Drupal project, and it behooves us all to be aware of the history and context around it.


Wastebook 2014

Another word for it, Sat, 10/25/2014 - 23:42


Topic Maps

Wastebook 2014: What Washington doesn’t want you to read. (Voodoo Dolls, Gambling Monkeys, Zombies in Love and Paid Vacations for Misbehaving Bureaucrats Top List of the Most Outlandish Government Spending in Wastebook 2014)

From the webpage:

Gambling monkeys, dancing zombies and mountain lions on treadmills are just a few projects exposed in Wastebook 2014 – highlighting $25 billion in Washington’s worst spending of the year.

Wastebook 2014 — the report Washington doesn’t want you to read —reveals the 100 most outlandish government expenditures this year, costing taxpayers billions of dollars.

“With no one watching over the vast bureaucracy, the problem is not just what Washington isn’t doing, but what it is doing.” Dr. Coburn said. “Only someone with too much of someone else’s money and not enough accountability for how it was being spent could come up some of these projects.”

“I have learned from these experiences that Washington will never change itself. But even if the politicians won’t stop stupid spending, taxpayers always have the last word.”

Congress actually forced federal agencies to waste billions of dollars for purely parochial, political purposes.

For example, lawmakers attached a rider to a larger bill requiring NASA to build a $350 million launch pad tower, which was mothballed as soon as it was completed because the rockets it was designed to test were scrapped years ago. Similarly, when USDA attempted to close an unneeded sheep research station costing nearly $2 million every year to operate, politicians in the region stepped in to keep it open.

Examples of wasteful spending highlighted in “Wastebook 2014” include:

  • Coast guard party patrols – $100,000
  • Watching grass grow – $10,000
  • State department tweets @ terrorists – $3 million
  • Swedish massages for rabbits – $387,000
  • Paid vacations for bureaucrats gone wild – $20 million
  • Mountain lions on a treadmill – $856,000
  • Synchronized swimming for sea monkeys – $50,000
  • Pentagon to destroy $16 billion in unused ammunition – $1 billion
  • Scientists hope monkey gambling unlocks secrets of free will – $171,000
  • Rich and famous rent out their luxury pads tax free – $10 million
  • Studying “hangry” spouses stabbing voodoo dolls – $331,000
  • Promoting U.S. culture around the globe with nose flutists – $90 million

Read the full report here.

Watch the Wastebook 2014 videos here and here and here

Wastebook 2014 runs a total of one hundred and ten (110) pages and has 1137 footnotes (with references to data analysis in many cases). It occurs to me to ask whether the lavish graphics, design and research were donated by volunteers or were the work of paid staff of Sen. Coburn.

The other question to ask is what definition of “waste” is Sen. Coburn using?

I suspect the people who were paid monthly salaries for any of the listed projects would disagree that their salaries were “waste.” A sentiment that would be echoed by their landlords, car dealers, grocery stores, etc.

It might be cheaper to simply pay all those staffers and not buy equipment and materials for their projects, but that would have an adverse impact on the vendors for those products and their staffs, who likewise have homes and cars and participate in their local economies.

Not that governments are the sole offenders when it comes to waste, but they are easy targets since, unlike most corporations, more information about their internal operations is public.

The useful role that topic maps could play on questions of “waste” would be to track the associations of people involved in a project to all the other participants in the local economy. I think you will find that the economic damage of cutting some “waste” is far higher than the cost of continuing the “waste.”

Such a project would give you the data on which to make principled arguments to distinguish between waste with little local impact and waste with a large local impact.

I first saw this at Full Text Reports as: Wastebook 2014: What Washington doesn’t want you to read.

Data Visualization with JavaScript

Another word for it, Sat, 10/25/2014 - 23:08


Topic Maps

Data Visualization with JavaScript by Stephen A. Thomas.

From the introduction:

It’s getting hard to ignore the importance of data in our lives. Data is critical to the largest social organizations in human history. It can affect even the least consequential of our everyday decisions. And its collection has widespread geopolitical implications. Yet it also seems to be getting easier to ignore the data itself. One estimate suggests that 99.5% of the data our systems collect goes to waste. No one ever analyzes it effectively.

Data visualization is a tool that addresses this gap.

Effective visualizations clarify; they transform collections of abstract artifacts (otherwise known as numbers) into shapes and forms that viewers quickly grasp and understand. The best visualizations, in fact, impart this understanding subconsciously. Viewers comprehend the data immediately—without thinking. Such presentations free the viewer to more fully consider the implications of the data: the stories it tells, the insights it reveals, or even the warnings it offers. That, of course, defines the best kind of communication.

If you’re developing web sites or web applications today, there’s a good chance you have data to communicate, and that data may be begging for a good visualization. But how do you know what kind of visualization is appropriate? And, even more importantly, how do you actually create one? Answers to those very questions are the core of this book. In the chapters that follow, we explore dozens of different visualizations and visualization techniques and tool kits. Each example discusses the appropriateness of the visualization (and suggests possible alternatives) and provides step-by-step instructions for including the visualization in your own web pages.

To give you a better idea of what to expect from the book, here’s a quick description of what the book is, and what it is not.

The book is a sub-part of the site where Stephen maintains his blog, a listing of talks and a link to his Twitter account.

If you are interested in data visualization with JavaScript, this should be on a short list of bookmarks.

Building Scalable Search from Scratch with ElasticSearch

Another word for it, Sat, 10/25/2014 - 22:46


Topic Maps

Building Scalable Search from Scratch with ElasticSearch by Ram Viswanadha.

From the post:

1 Introduction

Savvy is an online community for the world’s product enthusiasts. Our communities are the product trendsetters that the rest of the world follows. Across the site, our users are able to compare products, ask and answer product questions, share product reviews, and generally share their product interests with one another. boasts a vibrant community that save products on the site at the rate of 1 product every second. We wanted to provide a search bar that can search across various entities in the system – users, products, coupons, collections, etc. – and return the results in a timely fashion.

2 Requirements

The search server should satisfy the following requirements:

  1. Full Text Search: The ability to not only return documents that contain the exact keywords, but also documents that contain words that are related or relevant to the keywords.
  2. Clustering: The ability to distribute data across multiple nodes for load balancing and efficient searching.
  3. Horizontal Scalability: The ability to increase the capacity of the cluster by adding more nodes.
  4. Read and Write Efficiency: Since our application is both read and write heavy, we need a system that allows for high write loads and efficient read times on heavy read loads.
  5. Fault Tolerant: The loss of any node in the cluster should not affect the stability of the cluster.
  6. REST API with JSON: The server should support a REST API using JSON for input and output.

At the time, we looked at Sphinx, Solr and ElasticSearch. The only system that satisfied all of the above requirements was ElasticSearch, and — to sweeten the deal — ElasticSearch provided a way to efficiently ingest and index data in our MongoDB database via the River API so we could get up and running quickly.

If you need an outline for building a basic ElasticSearch system, this is it!

It has the advantage of introducing you to a number of other web technologies that will be handy with ElasticSearch.
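
To make requirements 1 and 6 from the quoted post a little more concrete, here is a minimal sketch of what a full-text search request across several entity types might look like when expressed as a REST path plus a JSON body. It only builds the request and does not contact a server. The index names (users, products) and the field names (name, description) are assumptions made for illustration; they are not taken from the post.

```python
import json


def build_search_request(keywords, entity_types, size=20):
    """Build the path and JSON body for a full-text search request.

    entity_types are hypothetical index names (e.g. "users", "products");
    "name" and "description" are hypothetical fields, not from the post.
    """
    # ElasticSearch accepts a comma-separated list of indices in the path.
    path = "/" + ",".join(entity_types) + "/_search"
    body = {
        "size": size,  # cap the number of hits returned
        "query": {
            "multi_match": {
                "query": keywords,
                "fields": ["name", "description"],
            }
        },
    }
    return path, json.dumps(body)


if __name__ == "__main__":
    path, body = build_search_request("red shoes", ["users", "products"])
    print("POST", path)
    print(body)
```

The default of twenty hits is a design choice for the sketch only; a real deployment would pick the page size to suit its users.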


Károly Négyesi: Drupal 8 critical issues office hours Oct 24, 2014

Planet Drupal, Sat, 10/25/2014 - 20:46



This was our first critical issues office hours. webflo has forward ported a Views SA (it turned out that Twig autoescape made short work of the security hole -- yay! so now it's just a test) and, even past the office hours, followed up with a patch that now passes. I will monitor the issue further and make sure it gets reviewed and committed. ksenzee started on decoupling cache tags from cache bins -- there's no patch yet and I need to follow up on this one; however, from our discussion it was clear she was making a lot of progress. I was trying to help penyaskito with the language.settings config is not scalable issue, but it turned out his problems went away with a fresh install, so that issue is now progressing well even without the office hours. So as far as I am concerned, that's two down and one moving (and as a bonus, webflo rerolled fix common HTML escaped render #key values due to Twig autoescape, which is major; I am not sure why it's not critical). I think critical issues office hours is off to a good start; more people would of course be better. I count 123 critical issues.

Overview App API

Another word for it, Sat, 10/25/2014 - 20:30


Topic Maps

Overview App API

From the webpage:

An Overview App is a program that uses Overview.

You can make one. You know you want to.

Using Overview’s App API you can drive Overview’s document handling engine from your own code, create new visualizations that replace Overview’s default Topic Tree, or write interactive document handling or data extraction apps.

If you don’t remember the Overview Project:

Overview is just what you need to search, analyze and cull huge volumes of text or documents. It was built for investigative journalists who go through thousands of pages of material, but it’s also used by researchers facing huge archives and social media analysts with millions of posts. With advanced search and interactive topic modeling, you can:

  • find what you didn’t even know to look for
  • quickly tag or code documents
  • let the computer organize your documents by topic, automatically

Leveraging the capabilities in Overview is a better use of resources than re-inventing basic file and search capabilities.

Understanding Information Retrieval by Using Apache Lucene and Tika

Another word for it, Sat, 10/25/2014 - 20:15


Topic Maps

Understanding Information Retrieval by Using Apache Lucene and Tika, Part 1

Understanding Information Retrieval by Using Apache Lucene and Tika, Part 2

Understanding Information Retrieval by Using Apache Lucene and Tika, Part 3

by Ana-maria Mihalceanu.

From part 1:

In this tutorial, the Apache Lucene and Apache Tika frameworks will be explained through their core concepts (e.g. parsing, mime detection, content analysis, indexing, scoring, boosting) via illustrative examples that should be applicable to not only seasoned software developers but to beginners to content analysis and programming as well. We assume you have a working knowledge of the Java™ programming language and plenty of content to analyze.

Throughout this tutorial, you will learn:

  • how to use Apache Tika’s API and its most relevant functions
  • how to develop code with Apache Lucene API and its most important modules
  • how to integrate Apache Lucene and Apache Tika in order to build your own piece of software that stores and retrieves information efficiently. (project code is available for download)

Part 1 introduces you to Apache Lucene and Apache Tika and concludes by covering automatic extraction of metadata from files with Apache Tika.

Part 2 covers extracting/indexing of content, along with stemming, boosting and scoring. (If any of that sounds unfamiliar, this isn’t the best tutorial for you.)

Part 3 details the highlighting of fragments when they match a search query.

A good tutorial on the parts of Apache Lucene and Apache Tika that it covers, but there is no coverage of information retrieval. For example, part 3 talks about increasing search “efficiency” without any consideration of what “efficiency” might mean in a particular search context.

Illuminating issues in information retrieval using Apache Lucene and Tika as opposed to coding up an indexing/searching application with no discussion of the potential choices and tradeoffs would make a much better tutorial.
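
As one small illustration of the kind of choice such a discussion could cover, the sketch below shows how a field boost changes which document ranks first. It is a toy in plain Python and an assumption-laden stand-in: the scoring function is deliberately simplistic and is not Lucene's scoring formula, and the two documents and their field names are made up for the example.

```python
from collections import Counter

# Two hypothetical documents with a "title" and a "body" field.
DOCS = {
    "a": {"title": "lucene tutorial", "body": "indexing files with tika"},
    "b": {"title": "tika tutorial", "body": "lucene lucene lucene scoring"},
}


def score(query_terms, doc, boosts):
    """Toy score: for each field, boost * number of query-term occurrences.

    Not Lucene's formula; it only illustrates the effect of boosting.
    """
    total = 0.0
    for field, text in doc.items():
        counts = Counter(text.lower().split())
        hits = sum(counts[term] for term in query_terms)
        total += boosts.get(field, 1.0) * hits
    return total


def rank(query_terms, boosts):
    """Return document ids ordered from highest to lowest toy score."""
    return sorted(
        DOCS,
        key=lambda doc_id: score(query_terms, DOCS[doc_id], boosts),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank(["lucene"], {}))              # no boost: raw counts decide
    print(rank(["lucene"], {"title": 5.0}))  # title matches weighted 5x
```

Whether a title match should outweigh three body matches depends on the collection and the searcher, which is exactly the sort of tradeoff a tutorial on information retrieval could discuss.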

An interactive visualization to teach about the curse of dimensionality

Another word for it, Sat, 10/25/2014 - 19:36


Topic Maps

An interactive visualization to teach about the curse of dimensionality by Jeff Leek.

From the post:

I recently was contacted for an interview about the curse of dimensionality. During the course of the conversation, I realized how hard it is to explain the curse to a general audience. One of the best descriptions I could come up with was trying to describe sampling from a unit line, square, cube, etc. and taking samples with side length fixed. You would capture fewer and fewer points. As I was saying this, I realized it is a pretty bad way to explain the curse of dimensionality in words. But there was potentially a cool data visualization that would illustrate the idea. I went to my student Prasad, our resident interactive viz design expert to see if he could build it for me. He came up with this cool Shiny app where you can simulate a number of points (n) and then fix a side length for 1-D, 2-D, 3-D, and 4-D and see how many points you capture in a cube of that length in that dimension. You can find the full app here or check it out on the blog here:

An excellent visualization of the “curse of dimensionality!”

The full app will take several seconds to redraw the screen when the length of the edge gets to .5 and above (or at least that was my experience).
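
The idea in the quoted post can also be sketched without the Shiny app. The following is a rough Python equivalent, not the app's code: it samples points uniformly in the unit line, square, cube and 4-D hypercube and counts how many land in a sub-cube of fixed side length. The function name, the seed and the choice of anchoring the sub-cube at the origin are assumptions made for this sketch.

```python
import random


def fraction_captured(n_points, side, dims, seed=0):
    """Sample n_points uniformly in the unit hypercube of the given
    dimension and return the fraction inside the sub-cube [0, side]^dims."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dims)]
        if all(coord <= side for coord in point):
            captured += 1
    return captured / n_points


if __name__ == "__main__":
    n, side = 10_000, 0.5
    for dims in (1, 2, 3, 4):
        observed = fraction_captured(n, side, dims)
        expected = side ** dims  # volume of the sub-cube
        print(f"{dims}-D: observed {observed:.3f}, expected {expected:.4f}")
```

With the side length held fixed, the expected fraction is the side length raised to the number of dimensions, so each added dimension captures fewer of the same number of points.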

The 2014 Social Media Glossary: 154 Essential Definitions

Another word for it, Sat, 10/25/2014 - 18:20


Topic Maps

The 2014 Social Media Glossary: 154 Essential Definitions by Matt Foulger.

From the post:

Welcome to the 2014 edition of the Hootsuite Social Media Glossary. This is a living document that will continue to grow as we add more terms and expand our definitions. If there’s a term you would like to see added, let us know in the comments!

I searched but did not find an earlier version of this glossary on the Hootsuite blog. I have posted a comment asking for pointers to the earlier version(s).

In the meantime, you may want to compare: The Ultimate Glossary: 120 Social Media Marketing Terms Explained by Kipp Bodnar. From 2011 but if you don’t know the terms, even a 2011 posting may be helpful.

We all accept the notion that language evolves, but within domains that evolution is gradual and follows shifts in thinking in that domain, making it harder for domain members to see it.

Changes in a rapidly changing vocabulary, such as the one used in social media, might be more apparent.

Google Lines Its Smart Home Nest Again ... With Revolv

Read/Write Web, Sat, 10/25/2014 - 17:34



ReadWriteHome is an ongoing series exploring the implications of living in connected homes.

According to the experts, we may all be living in a smart home before long. Google wants it to be theirs. Forget the fact that it doesn’t actually have a cohesive smart home system yet—it’s working on that, and quickly too. Case in point: Its Nest division just bought Revolv, one of the rising stars of the DIY smart home game. 

Nest itself has only been a part of Google for less than a year. In that time, the smart thermostat maker has picked up two popular smart home companies. Dropcam, purchased last June, was the first. Unlike that previous acquisition, however, the Revolv deal is a talent acquisition, reports The Verge. The new owners will take on the Revolv team, but leave its product behind.

Though the hub is not long for this world, at least the masterminds behind Revolv’s technology seem like a good fit for Google … er, Nest. 

Nest API session at Google I/O

The companies insist Nest operates independently, even though it held talks at Google I/O for developers. The division, which makes a smart thermostat and a carbon monoxide detector, also opened up its APIs (see our API explainer), so other products and companies can work with it. 

The end result: an expanding “eco-system” of “works with Nest” products, a line-up that now includes the Pebble smartwatch, as well as a voice-recognition device, a connected sprinkler system and other products. In other words, Nest has begun realizing its promise of becoming a bona fide platform. Now it appears to be pushing that further, by snapping up other companies.

Revolv may fit nicely into this picture. The company, whose hub sells at places like Best Buy and Home Depot (for now anyway), created a system that other companies and products could tie into. Its hub connected and managed a wide array of devices and appliances, including Yale locks, Philips Hue lightbulbs, Sonos speakers and numerous other products. And as a DIY or do-it-yourself platform, Revolv made it easy for people to install it themselves. 

Revolv’s main competition in the DIY smart home market has been SmartThings, which sold to Samsung earlier this year. Now they operate under the umbrellas of major technology companies, both of which just happen to compete with the same archrival: Apple, another tech giant eyeing the smart home space.

Months after its introduction at the Worldwide Developers Conference last June, Apple’s HomeKit initiative is still somewhat hazy. But it remains a looming figure in the smart home competition, and possibly a catalyst accelerating this race of giants.

As for Revolv, its Boulder team will work out of a new Nest office locally. As of this writing, neither the terms nor the purchase price of the acquisition had been disclosed.

Revolv photo courtesy of Revolv; Nest API photo by Adriana Lee for ReadWrite; Nest image by Bit Boy 

Social Tip – Mastering your Twitter Analytics

AnythingGeospatial, Sat, 10/25/2014 - 14:39


You Tweet But Are You Aware of your Twitter Analytics? Here’s How: Ok you tweet, but do you have any idea how well your tweeting is doing? Indeed using twitter is subjective and it really is...


IXIS: Strengthening our Relationship with the British Council

Planet Drupal, Sat, 10/25/2014 - 10:56



We are delighted to be working with the British Council on a new Drupal hosting and infrastructure support project. The British Council are valued clients, and we have worked with them for more than 6 years managing both the global suite of 150 country sites, and the prestigious suite of Drupal teaching and learning sites.

We will be working to create four individual platforms for hosting key Drupal websites, moving away from just one main infrastructure, to improve resilience and efficiency and to increase the availability of the sites, which generate more than 35 million page impressions per month and are used by more than 65 million people each year.


DCMI's first Governing Board officer transition

Planet RDF, Fri, 10/24/2014 - 23:59


2014-10-24, In the closing ceremony of DC-2014, DCMI exercised the first Governing Board officer transition under the Initiative's new governance structure. Michael Crandall stepped into the role of Immediate Past Chair as Eric Childress assumed the roles of Chair of DCMI and the Governing Board. Joseph Tennis became the new Chair Elect of the Board and will succeed as Chair at DC-2015 in São Paulo, Brazil. Information about the new DCMI governance structure can be found in the DCMI Handbook at

DCMI/ASIS&T Webinar - The Learning Resource Metadata Initiative, describing learning resources with, and more?

Planet RDF, Fri, 10/24/2014 - 23:59


2014-10-24, The Learning Resource Metadata Initiative (LRMI) is a collaborative initiative that aims to make it easier for teachers and learners to find educational materials through major search engines and specialized resource discovery services. The approach taken by LRMI is to extend the ontology so that educationally significant characteristics and relationships can be expressed. In this webinar, Phil Barker and Lorna M. Campbell of Cetis will introduce and present the background to LRMI, its aims and objectives, and who is involved in achieving them. The webinar will outline the technical aspects of the LRMI specification, describe some example implementations and demonstrate how the discoverability of learning resources may be enhanced. Phil and Lorna will present the latest developments in LRMI implementation, drawing on an analysis of its use by a range of open educational resource repositories and aggregators, and will report on the potential of LRMI to enhance education search and discovery services. Whereas the development of LRMI has been inspired by, the webinar will also include discussion of whether LRMI has applications beyond those of Registration at The webinar is free to DCMI Individual & Organizational Members, to ASIS&T Members and at modest fee to non-members.