News aggregator

Eight Years of eagereyes

EagerEyes.org, 9 hours 6 min ago



What is the purpose of blogging about visualization? Is it to make fun of the bad stuff? Is it to point to pretty things? Is it to explain why things are good or bad? Is it to expand the landscape of ideas and break new ground? Or is it to discuss matters at great length that ultimately don’t matter all that much?

I criticize things, and I think it’s important to do that. I don’t regret any of my postings, however strong they may have been, and however mean they may have sounded. It was all done in good faith and with the intent to point out issues and get people to pay attention.

But increasingly, I’m questioning the thinking that some of that criticism is coming from. I’m not arguing against any particular issue people like to bring up, but I am starting to wonder how much of it is simply coming out of narrow-mindedness and stubbornness. How much of it would be obviated by sitting back, taking a deep breath, and trying to see things from a different angle?

This is not just a question of tone and intensity, but one that goes much deeper: how much do we really know? When you start to ask that question in visualization, it becomes clear very quickly how shockingly little we actually really understand. Going on and on about pie charts? Point to a paper that’s actually showing that they’re bad! Yes, such a paper exists. But how many studies have shown the same thing? Not that many. And it gets much worse for things like 3D bar charts, etc. There is very little support for the religious zealotry with which we like to damn these things.

Then there is the question of different goals. There isn’t just one use for visualization, and things created for different purposes need to be judged against different standards. It’s all about trade-offs and making decisions. An audience of readers on the web is going to need a different approach than an audience of experts who know the data really well and have a vested interest in digging deeper. An interactive piece on a news media website will need to be much more compelling than a corporate dashboard if anybody’s going to actually bother doing something with it. There is not just one purpose, or one audience, or one way to do things.

It’s encouraging to see the huge interest in visualization. And it’s even more encouraging to see some of the recent and upcoming work on rhetoric, persuasion, and related questions. Because it matters. Communication matters. Data matters. Visualization matters.

Discussing visualization needs to matter too. But it can only do so if it comes from a place of understanding, respect, and an open mind.

To Avoid Liability, Google Limits German News Content To Headlines

Search Engine Land, 9 hours 42 min ago


German news and magazine publishers are determined, one way or another, to get Google to pay them for their content. They’re not upset about the content appearing in Google News or search. They want it to appear – they just want Google to pay for it. Google doesn’t want to pay....

Please visit Search Engine Land for the full article.

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox

Another word for it, 13 hours 2 min ago


Topic Maps

The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox by Daniel Crankshaw, et al.


To support complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused heavily on the design of systems to support training complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the deployment and serving of models at scale. In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. Velox is a data management system for facilitating the next steps in real-world, large-scale analytics pipelines: online model management, maintenance, and serving. Velox provides end-user applications and services with a low-latency, intuitive interface to models, transforming the raw statistical models currently trained using existing offline large-scale compute frameworks into full-blown, end-to-end data products capable of recommending products, targeting advertisements, and personalizing web content. To provide up-to-date results for these complex models, Velox also facilitates lightweight online model maintenance and selection (i.e., dynamic weighting). In this paper, we describe the challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at “Big Data” scale.

Early Warning: Alpha code drop expected December 2014.

If you want to get ahead of the curve I suggest you start reading this paper soon. Very soon.

Written from the perspective of end-user facing applications but applicable to author-facing applications for real time interaction with subject identification.

Integrating Kafka and Spark Streaming: Code Examples and State of the Game

Another word for it, 13 hours 31 min ago


Topic Maps

Integrating Kafka and Spark Streaming: Code Examples and State of the Game by Michael G. Noll.

From the post:

Spark Streaming has been getting some attention lately as real-time data processing tool, often mentioned alongside Apache Storm. If you ask me, no real-time data processing tool is complete without Kafka integration (smile), hence I added an example Spark Streaming application to kafka-storm-starter that demonstrates how to read from Kafka and write to Kafka, using Avro as the data format and Twitter Bijection for handling the data serialization.

In this post I will explain this Spark Streaming example in further detail and also shed some light on the current state of Kafka integration in Spark Streaming. All this with the disclaimer that this happens to be my first experiment with Spark Streaming.

If mid-week is when you like to brush up on emerging technologies, Michael’s post is a good place to start.

The post is well organized and has enough notes, asides and references to enable you to duplicate the example and to expand your understanding of Kafka and Spark Streaming.

Last Call Media: We sold Drupal to the world

Planet Drupal, 13 hours 43 min ago


We sold Drupal to the world

(Illustration by Colin Panetta)

Much of the world has standardized on Drupal as its Content Management System, for over a million websites. This is not hard to see. For example, Drupal made headlines when organizations like NYSE (before merging with ICE) decided to switch to it.

“Once we had those sites up and running there was a huge pent up demand for other sites in the company, and we launched 37 more. It was a big task, as some of those websites hold tens of thousands of pages - being highly regulated we are required to post everything we do online.”
- Bob Kerner, NYSE SVP & Chief Digital Officer 2010

“The important thing for us is that we are able to keep a relatively small team of 60 developers”
- Bob Kerner, NYSE SVP & Chief Digital Officer 2010

“We have tons of work to do, but we will rely on Drupal to build our social community.”
- Bob Kerner, NYSE SVP & Chief Digital Officer 2010


Another example is NBC Universal.

“[NBC Universal has] 30 to 40 leading brands, such as Bravo, Syfy, Telemundo.”
- Christopher Herring, Director, Publishing Program, NBC Universal

“We continue to push Drupal as our standard across the company.”
- Rob Gill, Director, Operations, NBC Universal


One of the most recent large scale pushes to Drupal is well underway at Pfizer. I asked Mike Lamb, Director of Marketing Technology at Pfizer, a few questions about it.

How many Drupal websites are currently in action at Pfizer?
Approx 500

How many people would you say it takes to support these sites?
Easiest to calculate suggesting a core team of 12 and then approx 1 person for every 15 sites, so approx 45 people. That’s to keep the platform running – projects and enhancements is additional.

How many non-Drupal sites will become Drupal sites over the next few years?
I’d say approx 200 migrations per year. Drupal launches are a combination of site migrations and completely new sites.

This is a serious amount of Drupal for one company, albeit a big one. I gave this info as a talk at a Drupal Camp in Connecticut, MA. In two years, it will take the total attendance of that camp to support Drupal at Pfizer.
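The staffing rule of thumb quoted above (a core team of 12 plus roughly one person per 15 sites) can be worked through as a short calculation. This is only a sketch: the function name is made up for illustration, and it assumes the site count grows by just the quoted 200 migrations per year, ignoring completely new sites.

```python
def staff_needed(sites, core_team=12, sites_per_person=15):
    """Estimate platform support staff: a fixed core team plus one person per N sites."""
    return core_team + round(sites / sites_per_person)


current_sites = 500        # "Approx 500" sites today
migrations_per_year = 200  # "approx 200 migrations per year" (assumed to be the only growth)

for years_ahead in range(3):
    sites = current_sites + migrations_per_year * years_ahead
    print(f"year +{years_ahead}: {sites} sites -> about {staff_needed(sites)} people")
```

Under these assumptions the estimate is about 45 people today and about 72 two years out, which covers platform upkeep only, not projects and enhancements.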

A little closer to home, I asked Gary Parker, Systems Analyst at University of Massachusetts (my alma mater), about it.

How many Drupal websites are currently in action at UMASS?

OIT hosts around 120 production sites. I believe there are probably another two dozen hosted by various departments managing their own servers.

How many will become Drupal over the next few years?

Given the number of sites currently in development and our rate of growth, I'd expect 30-50 additional Drupal sites within the next year.

These numbers are lower but this is still a lot of Drupal. The holy grail of this type of information, however, is perhaps the growing list of Drupal sites in government. The “list includes embassies, parliaments, governmental portals, police, research centers, ministries/departments, monarchies etc. in more than 150 countries.” Check it out if you haven’t yet. It is awe-inspiring.

How did this happen?

A popular answer involves a long list of Drupal’s amazing features. But how did that happen? Drupal is not alone. It is just another shining example of a wildly successful open source project. Drupal is to the Content Management System what Linux was for the Operating System. So how do these things happen?

The reason, I think, takes the following points as its premise:

  • Open Source software is inherently inclusive and collaborative.
  • The vast majority of participation is driven by intrinsic motives for personal growth, relationships, and helping others.
  • Participating is an endeavor that creates actual happiness, dedication, and community.
  • Open Source thrives to the extent it is shared.

It is fairly straightforward to get involved in open source. Despite current issues with tech culture, the code is available, the tools are collaborative, and the standards are, for the most part, objective. Community develops from solving intrinsically interesting programming problems. This is rewarding not only to the individuals involved; open source and the world benefit from this collaboration as well.

Drupal has fostered such a community for itself by being adequately inclusive and collaborative. It is trusted experts from this community who are being asked what they recommend as the solution to the Content Management System question. Across the world, they are saying, “Drupal, hands down.”

It is in this sense that we have effectively sold Drupal to the world. Now, we must stand by our recommendation. We must support it.

With worldwide adoption at the rate and scale we are seeing, there are some challenges that are coming with it. Here are some:

  • Are we supporting our solution efficiently?
  • Seeming talent shortage
  • Team retention
  • Recruiting
  • Community

None of these challenges is unique to Drupal; they are painfully experienced across the entire IT industry. Solutions are many and vary significantly between each challenge. Taken one by one, each tells a familiar story.

Are we supporting our solution efficiently?

Drupal is a powerful system with a lot of complexity. It has an infamous learning curve with nearly every Drupal project needing access to an expert a few times in its existence. Are we able to provide the needed level of Drupal support at a sustainable and affordable rate? The number of new Drupal sites is quickly outpacing the number of new Drupal experts. Salaries and rates have been increasing dramatically over the years. Is there a supply and demand issue with supporting Drupal?

A popular response from Drupal experts is, “Is this a problem? What’s wrong with being in demand and making a lot of money?” During my survey on this topic, I also got responses like this:

We are basically pretty unhappy about that migration - it almost killed
support for Drupal on this campus, and still might. If we could do it all
over again we'd probably still be on 6.
-Name Withheld - VIP, A Five College Institution

The move from Drupal 6 to Drupal 7 has been very painful for many. Affordable Drupal expertise is rare and in demand, but the show must go on even if it ends horribly at times. It is reasonable to believe that, if this experience were to continue, Drupal would be abandoned.

Seeming talent shortage

Facts on this are popular across the entire computing industry. This one is concise and popular:

Some 1.2 million computing jobs will be available in the US in 2022, yet United States universities will produce only 39 percent of the graduates needed to fill them.
-NCWIT “By the numbers”

With a couple hundred million people out of work worldwide, an industry with an apparent talent shortage should give us pause. If you are a professional in the IT industry, consider this question:

How did you get into your field?

Nearly all answers to this question involve an entertaining tale of happenstance abruptly ending in, “...and that’s how I got into IT.” A popular term for this is, “accidental techie.” Since no career path was chosen, nor specific degree given, the person’s resulting career was accidental. For example, it is not unusual to find an English or Math degree in a Senior Programmer position. To go even further, I don’t find it unreasonable to consider Computer Science degrees in a web developer position as “accidental” in this sense. There is no college course that teaches you how to optimize your local development stack or the importance of limiting rounds of revisions.

I don’t fully agree, however, with the widespread use of this term. I’m sure some people truly do accidentally fall into a career in IT, but the rest end up there by following their heart. The issue is that the paths to entry are confusing, intimidating, and just damn hard for seemingly no good reason. It is not so much that there is a talent shortage as that the directions in are mostly undefined.

Drupal, it seems, is no exception.

Team retention

If there is a talent shortage, then retention will be a challenge. Many organizations are finding themselves a stepping stone for their employees to reach greener pastures. The big players, with deeper pockets and bigger promises, are harvesting talent from smaller players, leaving the latter’s quality of work inconsistent as they scramble to find and train new talent.

And then there are statistics like this:

56% of women leave IT by mid career
-Harvard Business Review - #10094

Not only are we not producing enough talent to support this industry, but we are driving a staggering portion of it away.


Recruiting

On the question, “What is the biggest recruiting challenge your organization faces?” a Talent Technology 2012 recruitment survey found “Finding good candidates” way out ahead of the pack, with “Filling positions fast” in second place. Not only can we not find good candidates, but we can’t find them fast enough. There is no surprise here given the discussion so far.


Community

The last challenge to be considered is us, ourselves. What do we do about this? For challenges so closely related, our solutions tend to be astonishingly specific. What can we do?

Hack Talent Shortage? 

We can’t solve this by staying up late and building a website. And what good will it really do to find a way to pump more people into an industry where a substantial portion are going to leave mid career?

Buy more kegs for the office? 

The people who want more kegs aren’t missing from this equation. The issue is that we’ve hired all the people that are excited by this sort of thing.

Get recruiters access to some NSA backdoors?

Obviously not. Allowing recruiters to be more invasive won’t fix this.

“And, what did you do?” 
-Rita (Nana) Albrecht, My Grandmother (1914-2014)

When I was a kid, my grandmother used to do this thing when I would tell on my sister. I would come running to my grandmother, “She’s annoying me, she’s annoying me, make her stop.” My grandmother would always ask, “And, what did you do?” meaning, what had I done to my sister, which of course I would try to answer, “Nothing…”

She may have just been trying to get the full story, but what always stuck with me was that, if I just took a look at myself, I could see I had a role to play in the situation.

So, community, we need to look at ourselves.

Talent Shortage - We need to look at ourselves

Find and support those working to ease entry into this field. Some example organizations (is there a good list somewhere?):

Here are two examples close to my home:

Groups are working hard on this already and they need our support and collaboration. Find and support organizations with goals of increasing student interest in, and preparation for, careers in STEM.

Retention - We need to look at ourselves

Here are some things we can do in our organizations to solve our retention issues:

Manager and Maker schedule distinction (see here)

I’ve seen this change IT company culture drastically for the better. This is a topic all its own, but the basic idea is in recognizing the value in giving your Makers uninterrupted time to complete their work. A Maker is someone who makes something. Writers, Craftsmen, Musicians, Painters, and Programmers are examples of Makers. They need schedules with long stretches of uninterrupted time to focus on doing a good job. With this understanding, Managers work to be a distraction buffer, managing incoming issues in order to optimize the experience of the Makers, whose work quality then excels and personal enjoyment increases. Tasks are delivered with higher quality, resulting in Managers producing overall better projects. Teammates are much less likely to leave a team which works like this.

Consider who your policies and improvements benefit

Team retention means considering everyone. If your policies and improvements tend to focus on a subset of your team, other team members are at risk of increasingly feeling excluded. Not feeling like a good fit, they will start to consider your team as a stepping stone to a better situation. A new ping pong table or keg in the office may seem a quick win for smaller homogeneous teams but will foster fracturing in better evolved and more realistic situations.

Increase inner company dialog and communication

Have regular conversations about how things are going internally. Work to foster feelings of safety in sharing one’s pain points within the company. It is hard at first but invaluable once people become comfortable with sharing without fear of endangering their job and as people learn to listen without getting defensive. Increasing dialog increases accountability and alleviates resentments that would otherwise lead to a breakdown in the team.

Increase inner company transparency

This one is scary for many at first: Work to share more administrative details about the decisions that concern your team. Work to eliminate closed door meetings. Increasing transparency increases trust, feelings of being trusted and feelings of true belonging to a group. It is also a way to share responsibility and, in that sense, ownership. Bad news is easier for a team to bear, and good news has a greater impact and is more intimate, when the decisions leading up to it were shared.

Make a Company Code of Conduct

Your team may be full of people that feel they don’t need something like this. They may think things like, “if people mistreat me, I’ll just tell them off” or, “we don’t need this because we don’t have a conduct problem.” There is nothing wrong with putting in writing what is expected and what isn’t tolerated at your company. In fact, doing so means you take it seriously. It means you recognize that people are fallible and don’t always know how to act, and that putting it in writing is the first step to actually making an effort to be considerate and accepting of each member of your team. You can be sure this is extremely important to at least a few people on your team, even if they haven’t found a way to express it. Do some research on other Codes of Conduct; it is very worthwhile.

Recruiting - We need to look at ourselves

We saw earlier that the biggest challenge recruiters face in an organization is finding good candidates, and fast enough. We can look at ourselves here and ask, “Who are we attracting?”

Does the organization prioritize things like:

  • Beer outings
  • Ping pong/Air hockey
  • Long hours with big one-time rewards

The first two are examples of things that can feel exclusionary to a good candidate looking for a new team to call home. The last one doesn’t work at all for people with families, for example, and is really only a great thing for very specific individuals having certain responsibilities and not others, like children. Your organization may currently feel on top of the world with those example perks above, but your next great candidates are turning and running away.

We can also ask, “How are we attracting talent?” For example, is the classic intimidating job posting involved?

Consider replacing things like this:

If you think you have the drive and positivity to fill these shoes:

  • One
  • Million
  • Bulletpoints

With things like this:

If you have skills in one of these and are excited by the rest:

  • Fewer
  • Bulletpoints

Adjustments to our hiring techniques that make them more inviting and less intimidating are essential changes to make. We must also take this further by asking ourselves, “How hard are we looking?”

Consider this fact:

26% of the computing workforce in 2013 were women. 
-NCWIT “By the numbers”

in the context of how you answered this question earlier:

How did you get into your field?

Most of us are having to find our way into IT accidentally, and many of us aren’t finding our way at all. The path to an IT career is currently pretty intimidating and rather obfuscated. It can be very hard to know whether or not you are going in the right direction or even just wasting your time trying.


Your next Drupal expert could be hiding beneath a rock of self doubt.


Community - We need to look at ourselves

Read the rest here.

Simmons College and Infocom Corporation renew organizational memberships

Planet RDF, Wed, 10/01/2014 - 23:59


2014-10-01, DCMI is very pleased to announce the renewal of two of its organizational members for the coming year. The Graduate School of Library and Information Science at Simmons College in Boston, USA, has renewed as an Institutional Member and Infocom Corporation of Japan has renewed as a Supporting Member. The DCMI Supporting Member Program is open to all private sector companies that want to support DCMI financially in continuing its work to the benefit of a healthy metadata ecosystem. The Institutional Member Program is open to all public sector organizations interested in supporting DCMI while participating actively in DCMI governance. Please see the membership page for more details about DCMI's membership programs.

SMX East 2014 Day Two Live Blog Coverage

Search Engine Land, Wed, 10/01/2014 - 23:11


Day two of Search Marketing Expo East is now complete and below is some of the live blog coverage we found throughout the day. BuzzFeed Founder Jonah @Peretti Talks SEO, Social at #SMX, #SMX Liveblog: 25 Examples of Structured Data You Can Use Now #22a, #SMX Liveblog:...

Please visit Search Engine Land for the full article.

Encrypting Email with Office 365 Exchange Server

ABA's tech feed, Wed, 10/01/2014 - 22:47


First, a Word About Exchange

An Exchange Server hosts mailboxes that contain e-mail, calendar, contacts, tasks, and more. It’s an enterprise-grade system that now, thanks to Office 365, is available to small and solo firms at a reasonable price. You can use your own domain names with Exchange server and have anywhere from one to thousands of mailboxes on the system. You can access your Exchange data from Microsoft Outlook on the PC or Mac or from virtually any kind of modern mobile device: smartphones or tablets predominantly. Outlook Web Access is the web-based client that Exchange server offers so that you can access your data from any device that has a web browser and an Internet connection.

You can have multiple email addresses and multiple domain names on the same Exchange mailbox, and you can easily share your Exchange data, such as your Inbox or your Calendar, with anybody else in your organization.

All of your Exchange data is encrypted between your client (Outlook or mobile) and the Office 365 Exchange server. It’s also encrypted while it’s sitting on the Exchange server. By extension, any mail you send people in your firm—since it’s always on that Exchange server or transiting to or from Outlook—is encrypted. However, you may want to send an encrypted email to an outside party as well. There are several ways to do it, but here are two options for encrypting email.

Exchange Hosted Encryption (Soon to Be Office 365 Message Encryption)

Microsoft offers a server-side, policy-based encryption solution that lets you encrypt any message sent to any party. You create transport rules on the server side that automatically encrypt messages if they meet certain criteria (such as being sent to or from particular people or containing certain key words in the subject line). The person on the other end receives a regular email message indicating that you’ve sent them an encrypted message. The email has an attachment to click so the recipient can read that message. After clicking the attachment, the browser opens, and the recipient is asked to log in with a free Microsoft account. If the recipient doesn’t have one, he or she will be prompted to create one the first time—after that it should be automatic. Once the recipient successfully authenticates, he or she will be able to read the encrypted message. If the recipient replies to the message, the reply is also encrypted.

Since Exchange Hosted Encryption is server-based, it works regardless of what client you send the email message from. You can send from Outlook, OWA, iPad, Android phone…it doesn’t matter. As long as the message meets the policy criteria you specified in the transport rule, the message will be encrypted. It also means that as long as your message meets the rule, the encryption is automatic—you can’t forget to click the Encrypt button. If you have an E-3 or E-4 plan, you get this encryption service for free. With the other Enterprise plans, including Exchange-only and Kiosk plans, you’ll need to buy the Azure Rights Management service for $2/mailbox/month.
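The policy idea described above, where the server checks each outgoing message against rules and encrypts on a match regardless of the sending client, can be sketched in a few lines. This is an illustrative model only: `Message`, `TransportRule`, and `should_encrypt` are hypothetical names, not part of Exchange or Office 365, and real transport rules support many more conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    recipients: list
    subject: str


@dataclass
class TransportRule:
    """Hypothetical stand-in for a server-side rule; not the Exchange API."""
    recipient_domains: set = field(default_factory=set)  # lowercase domains
    subject_keywords: set = field(default_factory=set)

    def matches(self, msg):
        # Match on a keyword anywhere in the subject line (case-insensitive).
        subject = msg.subject.lower()
        if any(keyword.lower() in subject for keyword in self.subject_keywords):
            return True
        # Otherwise match on the domain of any recipient address.
        domains = {r.rsplit("@", 1)[-1].lower() for r in msg.recipients}
        return bool(domains & self.recipient_domains)


def should_encrypt(msg, rules):
    """The client is never consulted: only the message and the rules matter."""
    return any(rule.matches(msg) for rule in rules)


rules = [TransportRule(recipient_domains={"client.example"},
                       subject_keywords={"[secure]"})]
msg = Message("me@firm.example", ["pat@client.example"], "Draft agreement")
print(should_encrypt(msg, rules))
```

Because the decision lives entirely in `should_encrypt`, the same result holds whether the message came from Outlook, OWA, or a phone, which is the point the section makes about server-side encryption.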


S/MIME (Secure/Multipurpose Internet Mail Extensions) is a method to send secure email messages. It has been around since 1995 and made its Outlook debut in Outlook 97. It’s still available, in its updated version, in Outlook 2013. S/MIME uses public-key encryption to securely sign and encrypt your e-mail messages. Once you have a certificate, go to File > Options > Trust Center > Trust Center Settings > E-mail Security in Outlook to open the e-mail security settings.

Click Import/Export to import your digital certificate. Once you’ve completed that process, you can encrypt an email message by starting an e-mail to somebody, then clicking File > Properties in that e-mail message to open the Properties dialog box.

Click the Security Settings button to get the Security Properties dialog box and check the box for Encrypt message contents and attachments. Then OK/Close your way back out, and your message should be set for encryption. One catch…you have to already have the other person’s public key attached to their contact record in your Contacts. Once you’ve got that person’s public key (typically as an attachment or a download), go to his or her contact record in Outlook and click Certificates on the Ribbon. Click the Import button on the right and import their public key file to their contact record. Now you’re ready to send them S/MIME encrypted e-mail.

Additional Encryption Geekery

Public-key encryption uses a combination of two separate keys to encrypt the message:

  1. A public key, which you can publish freely,
  2. A private key, which you keep very secret.

When you want to send an encrypted email to somebody, you encrypt it with the other person’s public key; if you also sign it, the signature is made with your private key. When they receive the message, they decrypt it with their private key and check your signature with your public key. Only the matching private key will decrypt the message. There are tools that will let you generate your own key pairs or, for added security, you can obtain a key pair from one of the well-established Certificate Authorities like Verisign or Thawte.
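The way the two keys of a pair undo each other can be seen with textbook RSA on tiny numbers. This is a toy sketch and not how Outlook does it: the primes are far too small to be secure, the helper names are made up, and real S/MIME encrypts the message body with a symmetric key and signs a hash of the message rather than the message itself.

```python
def make_keypair(p, q, e=17):
    """Build a toy RSA key pair from two small primes (insecure, illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)    # modular inverse of e; needs Python 3.8+
    return (e, n), (d, n)  # (public key, private key)


def apply_key(key, value):
    """Raise value to the key's exponent modulo the key's modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


sender_public, sender_private = make_keypair(47, 59)
recipient_public, recipient_private = make_keypair(61, 53)

message = 65  # a number standing in for the message; must be below both moduli

ciphertext = apply_key(recipient_public, message)  # anyone can encrypt to the recipient
signature = apply_key(sender_private, message)     # only the sender can produce this

# Only the recipient's private key recovers the message.
print(apply_key(recipient_private, ciphertext) == message)
# Anyone holding the sender's public key can check the signature.
print(apply_key(sender_public, signature) == message)
```

Both checks print `True`, while applying the wrong key to the ciphertext yields a different number, which is why the public key can be published freely and the private key cannot.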


Go Further with Office 365
This post was adapted from the Law Practice Division’s publication Microsoft Office 365 for Lawyers. Written by twenty-year legal technology veteran, Ben M. Schorr, this essential guide provides answers to the common questions asked by lawyers when migrating their offices to Office 365.

Learn More

The post Encrypting Email with Office 365 Exchange Server appeared first on Law Technology Today.

The Case for HTML Word Processors

Another word for it, Wed, 10/01/2014 - 22:07


Topic Maps

The Case for HTML Word Processors by Adam Hyde.

From the post:

Making a case for HTML editors as stealth Desktop Word Processors…the strategy has been so stealthy that not even the developers realised what they were building.

We use all these over-complicated softwares to create Desktop documents. Microsoft Word, LibreOffice, whatever you like – we know them. They are one of the core apps in any users operating system. We also know that they are slow, unwieldy and have lots of quirky ways of doing things. However most of us just accept that this is the way it is and we try not to bother ourselves by noticing just how awful these softwares actually are.

So, I think it might be interesting to ask just this simple question – what if we used Desktop HTML Editors instead of Word Processors to do Word Processing? It might sound like an irrational proposition…Word Processors are, after all, for Word Processing. HTML editors are for creating…well, …HTML. But lets just forget that. What if we could allow ourselves to imagine we used an HTML editor for all our word processing needs and HTML replaces .docx and .odt and all those other over-burdened word processing formats. What do we win and what do we lose?

I’m not convinced about HTML word processors but Adam certainly starts with the right question:

What do we win and what do we lose? (emphasis added)

Line your favorite word processing format up alongside HTML + CSS and calculate the wins and losses.

Not that HTML word processors can, should or will replace complex typography when appropriate, but how many documents need the full firepower of a modern word processor?

I would ask a similar question about authoring interfaces for topic maps. What is the least interface that can usefully produce a topic map?

The full bells and whistles versions are common now (I omit naming names) but should those be the only choices?

PS: As far as MS Word goes, I use “open,” “close,” “save,” “copy,” “paste,” “delete,” “hyperlink,” “bold,” and “italic.” What’s that? Nine operations? Your experience may vary.

I use LaTeX and another word processing application for most of my writing off the Web.

I first saw this in a tweet by Ivan Herman.

FOAM (Functional Ontology Assignments for Metagenomes):…

Another word for it, Wed, 10/01/2014 - 21:43


Topic Maps

FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus by Emmanuel Prestat, et al. (Nucl. Acids Res. (2014) doi: 10.1093/nar/gku702)


A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at

Aside from its obvious importance for genomics and bioinformatics, I mention this because the authors point out:

A caveat of this approach is that we did not consider the quality of the tree in the tree-splitting step (i.e. weakly supported branches were equally treated as strongly supported ones), producing models of different qualities. Nevertheless, we decided that the approach of rational classification is better than no classification at all. In the future, the groups could be recomputed, or split more optimally when more data become available (e.g. more KOs). From each cluster related to the KO in process, we extracted the alignment from which HMMs were eventually built.

I take that to mean that this “ontology” represents no unchanging ground truth but rather an attempt to enhance the “…screening of environmental metagenomic and metatranscriptomic sequence datasets for functional genes.”

As more information is gained, the present “ontology” can and will change. Those future changes create the necessity to map those changes and the facts that drove them.

I first saw this in a tweet by Jonathan Eisen

Continuum Analytics Releases Anaconda 2.1

Another word for it, Wed, 10/01/2014 - 21:18


Topic Maps

Continuum Analytics Releases Anaconda 2.1 by Corinna Bahr.

From the post:

Continuum Analytics, the premier provider of Python-based data analytics solutions and services, announced today the release of the latest version of Anaconda, its free, enterprise-ready collection of libraries for Python.

Anaconda enables big data management, analysis, and cross-platform visualization for business intelligence, scientific analysis, engineering, machine learning, and more. The latest release, version 2.1, adds a new version of the Anaconda Launcher and PyOpenSSL, as well as updates NumPy, Blaze, Bokeh, Numba, and 50 other packages.

Available on Windows, Mac OS X and Linux, Anaconda includes more than 195 of the most popular numerical and scientific Python libraries used by scientists, engineers and data analysts, with a single integrated and flexible installer. It also allows for the mixing and matching of different versions of Python (2.6, 2.7, 3.3, 3.4), NumPy, SciPy, etc., and the ability to easily switch between these environments.

See the post for more details, check the change log, or, what the hell, download the most recent version of Anaconda.

Remember, it’s open source so you can see “…where it keeps its brain.” Be wary of results based on software that operates behind a curtain.

BTW, check out the commercial services and products from Continuum Analytics if you need even more firepower for your data processing.

SearchCap: Google Crawling, CTR Study & Focus On The User

Search Engine Land, Wed, 10/01/2014 - 21:00


Below is what happened in search today, as reported on Search Engine Land and from other places across the web.

From Search Engine Land:

  • Up Close @ SMX East: How Ads Influence Organic Click-Through Rate On Google. If the SERPs are a zero-sum game, where drawing a click in one place takes it away...

Please visit Search Engine Land for the full article.

Uncovering Community Structures with Initialized Bayesian Nonnegative Matrix Factorization

Another word for it, Wed, 10/01/2014 - 20:28


Topic Maps

Uncovering Community Structures with Initialized Bayesian Nonnegative Matrix Factorization by Xianchao Tang, Tao Xu, Xia Feng, and Guoqing Yang.


Uncovering community structures is important for understanding networks. Currently, several nonnegative matrix factorization algorithms have been proposed for discovering community structure in complex networks. However, these algorithms exhibit some drawbacks, such as unstable results and inefficient running times. In view of the problems, a novel approach that utilizes an initialized Bayesian nonnegative matrix factorization model for determining community membership is proposed. First, based on singular value decomposition, we obtain simple initialized matrix factorizations from approximate decompositions of the complex network’s adjacency matrix. Then, within a few iterations, the final matrix factorizations are achieved by the Bayesian nonnegative matrix factorization method with the initialized matrix factorizations. Thus, the network’s community structure can be determined by judging the classification of nodes with a final matrix factor. Experimental results show that the proposed method is highly accurate and offers competitive performance to that of the state-of-the-art methods even though it is not designed for the purpose of modularity maximization.
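The pipeline the abstract describes (SVD-based initialization, a few refinement iterations, then reading community membership off a final factor) can be roughly sketched as follows. This is only an illustration under loud assumptions: it swaps in plain multiplicative-update NMF for the paper's Bayesian NMF, it makes the SVD factors nonnegative by taking absolute values (one simple choice, not necessarily the authors'), and the function names and the toy network are hypothetical.

```python
import numpy as np

def svd_init(A, k):
    """Nonnegative starting factors from a truncated SVD (assumed scheme)."""
    U, s, Vt = np.linalg.svd(A)
    root = np.sqrt(s[:k])
    W = np.abs(U[:, :k]) * root           # n x k
    H = root[:, None] * np.abs(Vt[:k])    # k x n
    return W, H

def nmf(A, k, iters=50, eps=1e-9):
    """Refine the SVD start with plain multiplicative updates (not Bayesian)."""
    W, H = svd_init(A, k)
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def communities(A, k):
    """Label each node with the column of W where its weight is largest."""
    W, _ = nmf(A, k)
    return W.argmax(axis=1)

# Toy network (hypothetical): a 4-clique and a 3-clique joined by one edge.
A = np.zeros((7, 7))
for group in (range(0, 4), range(4, 7)):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels = communities(A, 2)
print(labels)
```

Starting from the SVD instead of random factors makes repeated runs give the same result, which is the stability benefit the abstract points to. The quality of the resulting labels depends on the initialization scheme, so treat this version as a starting point only.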

Some titles grab you by the lapels and say, “READ ME!,” don’t they?

I found the first paragraph a much friendlier summary of why you should read this paper (footnotes omitted):

Many complex systems in the real world have the form of networks whose edges are linked by nodes or vertices. Examples include social systems such as personal relationships, collaborative networks of scientists, and networks that model the spread of epidemics; ecosystems such as neuron networks, genetic regulatory networks, and protein-protein interactions; and technology systems such as telephone networks, the Internet and the World Wide Web [1]. In these networks, there are many sub-graphs, called communities or modules, which have a high density of internal links. In contrast, the links between these sub-graphs have a fairly lower density [2]. In community networks, sub-graphs have their own functions and social roles. Furthermore, a community can be thought of as a general description of the whole network to gain more facile visualization and a better understanding of the complex systems. In some cases, a community can reveal the real world network’s properties without releasing the group membership or compromising the members’ privacy. Therefore, community detection has become a fundamental and important research topic in complex networks.

If you think of “the real world network’s properties” as potential properties for identification of a network as a subject or as properties of the network as a subject, the importance of this article becomes clearer.

Being able to speak of sub-graphs as subjects with properties can only improve our ability to compare sub-graphs across complex networks.

BTW, all the data used in this article is available for downloading:

I first saw this in a tweet by Brian Keegan.

Mediacurrent: Draggableviews and Custom Publishing Options, an Alternative to Nodequeue

Planet Drupal, Wed, 10/01/2014 - 20:08



When approaching new Drupal projects, I’m always excited to listen and learn about the project’s requirements. It’s an occasion to create just the right solution. With a recent project, I took the opportunity to rethink the use of Nodequeue to manage front page content, and instead used the Draggableviews and Custom Publishing Options projects. Before diving into that solution, let’s step through other solutions to manage front page content, so we can understand the pros and cons of each.

World's top universities 2014 according to Times Higher Education

Datablog (the Guardian), Wed, 10/01/2014 - 20:01



The California Institute of Technology has been named the world's best institution for the fourth consecutive year. How do other universities compare?

The California Institute of Technology (Caltech) has been named the world's best institution for the fourth consecutive year in Times Higher Education's annual league table.

Harvard and Oxford follow in second and third place respectively. Stanford has retained its fourth place position while the University of Cambridge has climbed two places to fifth and the Massachusetts Institute of Technology has dropped one place to sixth on the rankings.


Big Organizations Have A Love/Hate/Love Affair With Hadoop

Read/Write Web, Wed, 10/01/2014 - 19:49



CIOs are of two minds on Hadoop. They're not convinced that it can help them tackle their data but at the same time they're hiring Hadoop talent in droves. What gives?

The reality is that while enterprises continue to stumble in the Big Data dark, they know that Hadoop—a framework for storing and running local calculations on distributed pools of data—will almost certainly be a core part of the answer, and they're investing heavily to ensure they don't get left behind. The primary investment? Hiring Hadoop talent.

Hadoop's Big Data Gravy Train

Last week I detailed how Big Data fence-sitters are increasing their big data experiments. Companies that had "no plans to invest" in big data technologies in 2012 and 2013 are finally starting to experiment in 2014:

Source: Gartner

At the same time, however, enterprises aren't necessarily convinced that Hadoop—the poster child for Big Data—is the answer to their big data woes. 

According to a recent Barclays survey of 100 CIOs, for example, Hadoop still has a lot to prove. Of the 100 CIOs polled, 72 indicated that it's "still too early to say whether Hadoop would become an important technology in their organization."

In a separate survey of data scientists, 76% found Hadoop too difficult to program, among other complaints, causing 35% to give up on Hadoop altogether.

Of course, sometimes people use Hadoop where they shouldn't. As Facebook's analytics chief Ken Rudin said at Strata in 2013: "Hadoop isn't always the best tool for what we need to do.... In reality, big data should include Hadoop and it should include relational [databases], and it should include any other technology that is suited for the task at hand." 

High Demand For Hadoop Talent

And yet Hadoop is absolutely the right technology for a wide array of Big Data uses, and offers an array of benefits that traditional data technologies fail to deliver. This shows up pretty clearly in the job postings. As reported recently, Hadoop continues to stand out as one of the hottest job skills in the industry, paying a median salary of over $100,000 per year, according to PayScale data.

In terms of absolute jobs, Oracle still rules the roost, though it's fading fast, according to Indeed job data:

Source: Indeed

But if we look at relative job growth, Hadoop has everyone beat, whether traditional systems like Teradata and Oracle or even new data technologies like NoSQL databases:

Source: Indeed

By some measures, Hadoop demand is up 34% since 2013. Though a few years old, McKinsey & Co.'s 2011 report on big data holds true today:

By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

While Hadoop isn't the only big data technology, it's a reasonable proxy for big data, generally. And demand is high. 

True, it may be, as Twitter open source chief Chris Aniszczyk highlights, that we're "still in the early adopter stage and most companies don't generate enough data to warrant Hadoop scale setups," leading CIOs to question the value they'd get from Hadoop. Maybe. But it's also the case that few can afford to wait. Data is increasingly the primary currency for competition, and Hadoop is at the heart of it.

Lead image courtesy of Shutterstock

Drupal core announcements: This Month in Drupal Documentation

Planet Drupal, Wed, 10/01/2014 - 19:30



Here's an update from the Documentation Working Group (DocWG) on what has been happening in Drupal Documentation in the last month or so. Sorry... because this is posted in the Core group as well as Documentation, comments are disabled.

If you have comments or suggestions, please see the DocWG home page for how to contact us. Thanks!

Notable Documentation Updates

Here are some Community Documentation pages that were updated this past month:

  • ruscoe updated several pages of documentation about the Drupal Commerce IATS module. We always love to see contributed module maintainers documenting their modules -- thanks Dan!
  • andrisek updated several pages of documentation about the ERPAL CRM system contributed module. In this case, he's not even an official maintainer of the project -- we always love to see community members updating documentation too -- thanks Daniel!
  • chrischinchilla went through the Installation Guide and made updates for Drupal 8. That was one of our "Priority" tasks -- thanks Chris!
  • Many people updated documentation in preparation for code sprints in Amsterdam, to help new contributors get up to speed quickly. Always a good idea!
  • And there were many more updates... see below.

See the DocWG home page for how to contact us, if you'd like to be listed here in our next post!

Thanks for contributing!

Since September 1 (our previous TMIDD post), 229 contributors have made 629 total documentation page revisions, including 2 people who made more than 20 edits (andrisek and realityloop) -- thanks everyone!

In addition, there were many many commits to Drupal Core and contributed projects that improved documentation -- these are hard to count, because many commits combine code and documentation -- but they are greatly appreciated too!

Documentation Priorities

The Current documentation priorities page is always a good place to look to figure out what to work on, and has been updated recently.

If you're new to contributing to documentation, these projects may seem a bit overwhelming -- so why not try out a New contributor task to get started?

Upcoming Events

  • DrupalCon Amsterdam - THIS FRIDAY, October 3 - sprint!
  • DrupalCon Latin America, Bogotá, Colombia, Feb 10-12, 2015

Report from the Working Group

We're pleased to announce that Antje Lorch (ifrik) has officially joined the Documentation Working Group. She's been a leader of documentation events and has been participating in WG meetings for a while, so it's great to have her officially on board. Welcome Antje!

In our last This Month post, we forgot to report on a couple of our "infrastructure and tools" projects that were completed in August:

We're currently working on a new project: integrating results into the search box -- stay tuned for updates on that!

Finally, our next meeting will be October 22nd. We normally meet using Google Hangouts (although last month we met in IRC due to technical difficulties); if you'd like to join us, contact Boris (batigolix).

LinkedIn's New Tools Help Students Plan Future Careers

Read/Write Web, Wed, 10/01/2014 - 19:27



For some students, finding the right career path starts with finding the right school. LinkedIn announced new tools that use data from members to help students figure out which universities are most likely to help them find their dream jobs.

"Decision Boards" are like Pinterest for universities; they let students create pinboards with cards like post-it notes that contain information about schools or subject areas. They can add their own notes as well.

To help students figure out which schools or fields to add to their decision boards, LinkedIn also has a new university ranking tool. By analyzing member profiles, it determines which careers are most often chosen by graduates of particular schools—making it possible to see which universities are best at turning out, say, software developers.

Students can also search on a number of variables, such as career field, location, and even a specific company they'd like to work at, to see which university was most frequently attended by LinkedIn members who fit that description. So you could search for "fashion merchandiser who works for Nordstrom and lives in New York" to get a list of promising schools.
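The kind of search described above, filtering profiles on a few variables and then counting which schools come up most often, can be sketched in a few lines. This is a toy illustration only: the profile records, the field names, and the `rank_schools` helper are all hypothetical and say nothing about how LinkedIn actually stores or ranks its data.

```python
from collections import Counter

# Hypothetical member profiles; field names are illustrative assumptions.
profiles = [
    {"school": "School A", "career": "software developer", "company": "Acme", "city": "New York"},
    {"school": "School A", "career": "software developer", "company": "Initech", "city": "Seattle"},
    {"school": "School B", "career": "software developer", "company": "Acme", "city": "New York"},
    {"school": "School B", "career": "fashion merchandiser", "company": "Nordstrom", "city": "New York"},
    {"school": "School C", "career": "fashion merchandiser", "company": "Nordstrom", "city": "New York"},
    {"school": "School C", "career": "fashion merchandiser", "company": "Nordstrom", "city": "New York"},
]

def rank_schools(profiles, **criteria):
    """Return (school, count) pairs for profiles matching every criterion,
    most frequently attended school first."""
    matches = (
        p for p in profiles
        if all(p.get(field) == value for field, value in criteria.items())
    )
    return Counter(p["school"] for p in matches).most_common()

# The article's example query: a fashion merchandiser at Nordstrom in New York.
print(rank_schools(profiles, career="fashion merchandiser",
                   company="Nordstrom", city="New York"))
```

A ranking like this is only as good as the profiles behind it: schools whose graduates are under-represented on the network would be undercounted.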

LinkedIn has put increased effort into attracting young people and students to the network. Last year the company introduced university pages and lowered the age limit to 14, and it sometimes works with student groups to help set up LinkedIn profiles and meet professionals in fields they want to pursue.

The company also recently introduced an algorithm that can predict future careers for current professionals on LinkedIn. Now it appears the company wants to expand its fortune-telling capabilities to majors and universities.

Lead photo by Jimmy Thomas on Flickr; screenshot courtesy of LinkedIn
