News aggregator

AverageExplorer:…

Another word for it · Sun, 08/17/2014 - 21:22

Categories:

Topic Maps

AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections, Jun-Yan Zhu, Yong Jae Lee, and Alexei Efros.

Abstract:

This paper proposes an interactive framework that allows a user to rapidly explore and visualize a large image collection using the medium of average images. Average images have been gaining popularity as means of artistic expression and data visualization, but the creation of compelling examples is a surprisingly laborious and manual process. Our interactive, real-time system provides a way to summarize large amounts of visual data by weighted average(s) of an image collection, with the weights reflecting user-indicated importance. The aim is to capture not just the mean of the distribution, but a set of modes discovered via interactive exploration. We pose this exploration in terms of a user interactively “editing” the average image using various types of strokes, brushes and warps, similar to a normal image editor, with each user interaction providing a new constraint to update the average. New weighted averages can be spawned and edited either individually or jointly. Together, these tools allow the user to simultaneously perform two fundamental operations on visual data: user-guided clustering and user-guided alignment, within the same framework. We show that our system is useful for various computer vision and graphics applications.

Applying averaging to images, particularly in an interactive context with users, seems like a very suitable strategy.

What would it look like to have interactive merging of proxies based on data ranges controlled by the user?
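The averaging at the heart of the system is easy to sketch: a normalized weighted blend over an image stack, with the weights standing in for user-indicated importance. The tiny arrays and weights below are invented toy data; the paper's actual contribution is the interactive editing and alignment built on top of this primitive.

```python
import numpy as np

def weighted_average(images, weights):
    """Blend a stack of same-sized grayscale images into one average image.

    weights are the user-indicated importance scores; normalizing them
    keeps the result in the same intensity range as the inputs.
    """
    stack = np.asarray(images, dtype=float)   # shape (n, h, w)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, stack, axes=1)     # shape (h, w)

# Toy 2x2 "images": one dark frame weighted twice as heavily as two bright ones.
imgs = [[[0.0, 0.0], [0.0, 0.0]],
        [[1.0, 1.0], [1.0, 1.0]],
        [[1.0, 1.0], [1.0, 1.0]]]
avg = weighted_average(imgs, [2.0, 1.0, 1.0])
print(avg)  # every pixel is (2*0 + 1*1 + 1*1) / 4 = 0.5
```

Spawning a new average, in these terms, is just re-running the blend with a different weight vector over the same collection.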

Value-Loss Conduits?

Another word for it · Sun, 08/17/2014 - 20:52

Categories:

Topic Maps

Do you remove links from materials that you quote?

I ask because of the following example:

The research, led by Alexei Efros, associate professor of electrical engineering and computer sciences, will be presented today (Thursday, Aug. 14) at the International Conference and Exhibition on Computer Graphics and Interactive Techniques, or SIGGRAPH, in Vancouver, Canada.

“Visual data is among the biggest of Big Data,” said Efros, who is also a member of the UC Berkeley Visual Computing Lab. “We have this enormous collection of images on the Web, but much of it remains unseen by humans because it is so vast. People have called it the dark matter of the Internet. We wanted to figure out a way to quickly visualize this data by systematically ‘averaging’ the images.”

Which is a quote from: New tool makes a single picture worth a thousand – and more – images by Sarah Yang.

Those passages were reprinted by Science Daily as follows:

The research, led by Alexei Efros, associate professor of electrical engineering and computer sciences, was presented Aug. 14 at the International Conference and Exhibition on Computer Graphics and Interactive Techniques, or SIGGRAPH, in Vancouver, Canada.

“Visual data is among the biggest of Big Data,” said Efros, who is also a member of the UC Berkeley Visual Computing Lab. “We have this enormous collection of images on the Web, but much of it remains unseen by humans because it is so vast. People have called it the dark matter of the Internet. We wanted to figure out a way to quickly visualize this data by systematically ‘averaging’ the images.”

Why leave out the hyperlinks for SIGGRAPH and the Visual Computing Laboratory?

Or for that matter, the link to the original paper: AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections (ACM Transactions on Graphics, SIGGRAPH paper, August 2014) which appeared in the news release.

All three hyperlinks enhance your ability to navigate to more information. Isn’t navigation to more information a prime function of the WWW?

If so, we need to clue in ScienceDaily and other content repackagers to include, at the very least, the hyperlinks passed on to them.

If you can’t be a value-add, at least don’t be a value-loss conduit.

TCP Stealth

Another word for it · Sun, 08/17/2014 - 20:31

Categories:

Topic Maps

New “TCP Stealth” tool aims to help sysadmins block spies from exploiting their systems by David Meyer.

From the post:

System administrators who aren’t down with spies commandeering their servers might want to pay attention to this one: A Friday article in German security publication Heise provided technical detail on a GCHQ program called HACIENDA, which the British spy agency apparently uses to port-scan entire countries, and the authors have come up with an Internet Engineering Task Force draft for a new technique to counter this program.

The refreshing aspect of this vulnerability is that the details are being discussed in public, as is a partial solution.

Perhaps this is a step toward transparency for cybersecurity. Keeping only malicious actors and “security researchers” in the loop hasn’t worked out so well.

Whether governments fall into “malicious actors” or “security researchers” I leave to your judgement.
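To give a feel for the countermeasure, here is a sketch of the general TCP Stealth idea: derive the TCP initial sequence number from a secret shared with the server, so a scanner that doesn't know the secret is treated as if the port were closed. This is an illustration of the flavor only, a hypothetical construction rather than the IETF draft's exact algorithm.

```python
import hashlib
import struct

def stealth_isn(secret, src_ip, dst_ip, src_port, dst_port):
    """Derive a 32-bit TCP initial sequence number from a shared secret.

    A server that knows the secret recomputes this from the incoming SYN
    and silently drops mismatches, so port scanners see a closed port.
    Hypothetical construction for illustration, not the draft's exact one.
    """
    material = "|".join([secret, src_ip, dst_ip, str(src_port), str(dst_port)])
    digest = hashlib.md5(material.encode()).digest()
    return struct.unpack(">I", digest[:4])[0]  # first 32 bits as the ISN

# Client and server derive the same ISN; a scanner without the secret cannot.
isn = stealth_isn("shared-secret", "192.0.2.10", "198.51.100.5", 40000, 22)
assert isn == stealth_isn("shared-secret", "192.0.2.10", "198.51.100.5", 40000, 22)
```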

Bizarre Big Data Correlations

Another word for it · Sun, 08/17/2014 - 20:16

Categories:

Topic Maps

Chance News 99 reported the following story:

The online lender ZestFinance Inc. found that people who fill out their loan applications using all capital letters default more often than people who use all lowercase letters, and more often still than people who use uppercase and lowercase letters correctly.

ZestFinance Chief Executive Douglas Merrill says the company looks at tens of thousands of signals when making a loan, and it doesn’t consider the capital-letter factor as significant as some other factors—such as income when linked with expenses and the local cost of living.

So while it may take capital letters into consideration when evaluating an application, it hasn’t held a loan up because of it.

Submitted by Paul Alper

If it weren’t an “online lender,” ZestFinance could take into account applications signed in crayon.

Chance News collects stories with a statistical or probability angle. Some of them can be quite amusing.
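The capital-letters signal is easy to picture as code. The sketch below groups loan applications by how the applicant typed their name and compares default rates per group. The data is entirely made up (arranged to mirror the reported ordering); only the grouping logic is the point.

```python
from collections import defaultdict

def casing_style(text):
    if text.isupper():
        return "ALL CAPS"
    if text.islower():
        return "all lowercase"
    return "Mixed case"

def default_rate_by_casing(applications):
    """applications: (text_as_typed, defaulted) pairs; defaulted is 0 or 1."""
    totals, defaults = defaultdict(int), defaultdict(int)
    for text, defaulted in applications:
        style = casing_style(text)
        totals[style] += 1
        defaults[style] += defaulted
    return {style: defaults[style] / totals[style] for style in totals}

# Entirely synthetic applications.
apps = [("JOHN DOE", 1), ("JANE ROE", 1), ("JIM POE", 0),
        ("john doe", 1), ("jane roe", 0), ("jim poe", 0),
        ("John Doe", 1), ("Jane Roe", 0), ("Jim Poe", 0), ("Jo Moe", 0)]
rates = default_rate_by_casing(apps)
print(rates)  # all caps defaults most often, correct casing least
```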

Storli Gard: Norway's oldest beer?

Lars Marius · Sun, 08/17/2014 - 15:59

Categories:

Topic Maps

Leaving Sunnmøre we drove for many hours along narrow mountain valleys. It was a completely different landscape: spruce forest, bare rocks, some farmland, no fjords. Eventually, we left the highway, then turned back west again, following progressively smaller and smaller roads into the mountains. The whole way the view was almost impossibly gorgeous, as if someone had arranged every mountain, house, and cluster of trees for maximum effect. Eventually, the road ended in front of a cluster of wooden houses. There were no signs, but from the map we assumed this had to be our destination: Storli Gard. (This is part 7 of the Norwegian farmhouse ale trip.)

Victor Kane: Super simple example of local drush alias configuration

Planet Drupal · Sun, 08/17/2014 - 14:12

Categories:

Drupal

So I have a folder for drush scripts _above_ several doc root folders on a dev user's server. And I want to run status or whatever and my own custom drush scripts on _different_ Drupal web app instances. Drush has alias capability for different site instances, so you can do:

$ drush @site1 status

So, how to set up an aliases file?

(I'm on Ubuntu with Drush 6.2.0 installed with PEAR as per this great d.o. doc page Installing Drush on Any Linux Server Out There (Kalamuna people, wouldn't you know it?)).

Careful reading of the excellent drush documentation points you to a Drush Shell Aliases doc page, and from there to the actual example aliases file that comes with every drush installation.

So to be able to run drush commands for a few of my local Drupal instances, I did this:

  • In my Linux user directory, I created the file ~/.drush/aliases.drushrc.php
  • Contents:
<?php
$aliases['site1'] = array(
  'root' => '/home/thevictor/site1/drupal-yii',
  'uri' => 'drupal-yii.example.com',
);
$aliases['site2'] = array(
  'root' => '/home/thevictor/site2',
  'uri' => 'site2.example.com',
);

Then I can do, from anywhere as long as I am logged in as that user:

$ cd /tmp
$ drush @site1 status
...
$ drush @site2 status

and lots of other good stuff. Have a nice weekend.


Titan 0.5 Released!

Another word for it · Sun, 08/17/2014 - 00:30

Categories:

Topic Maps

Titan 0.5 Released!

From the Titan documentation:

1.1. General Titan Benefits

  • Support for very large graphs. Titan graphs scale with the number of machines in the cluster.
  • Support for very many concurrent transactions and operational graph processing. Titan’s transactional capacity scales with the number of machines in the cluster and answers complex traversal queries on huge graphs in milliseconds.
  • Support for global graph analytics and batch graph processing through the Hadoop framework.
  • Support for geo, numeric range, and full text search for vertices and edges on very large graphs.
  • Native support for the popular property graph data model exposed by Blueprints.
  • Native support for the graph traversal language Gremlin.
  • Easy integration with the Rexster graph server for programming language agnostic connectivity.
  • Numerous graph-level configurations provide knobs for tuning performance.
  • Vertex-centric indices provide vertex-level querying to alleviate issues with the infamous super node problem.
  • Provides an optimized disk representation to allow for efficient use of storage and speed of access.
  • Open source under the liberal Apache 2 license.

A major milestone in the development of Titan!

If you are interested in serious graph processing, Titan is one of the systems that should be on your short list.

PS: Matthias Broecheler has posted Titan 0.5.0 GA Release, which has links to upgrade instructions and comments about a future Titan 1.0 release!

Wesley Tanaka: Fast, Low Memory Drupal 6 System Module

Planet Drupal · Sat, 08/16/2014 - 01:24

Categories:

Drupal

A Drupal 5 version of this module is also available.  If you would like this patch to be committed to Drupal core, please do not leave a comment on this page—please instead add your comment to Drupal issue #455092.

This is a drop-in replacement for the system.module of Drupal 6.33 which makes your Drupal 6 site use less memory and may even make it faster. A test I ran in a development environment with a stock Drupal 6 installation suggested that I got:


our new robo-reader overlords

Another word for it · Fri, 08/15/2014 - 23:18

Categories:

Topic Maps

our new robo-reader overlords by Alan Jacobs.

After you read this post by Jacobs, be sure to spend time with Flunk the robo-graders by Les Perelman (quoted by Jacobs).

Both raise the question: what sort of writing can be taught by algorithms that have no understanding of writing?

In a very real sense, the outcome can only be writing that meets but does not exceed what has been programmed into an algorithm.
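As a toy illustration of that ceiling, here is a hypothetical scorer that rewards only surface features: length, long words, transition words. The features and weights are invented, but gibberish stuffed with those features outscores plain honest prose, which is exactly Perelman's complaint about robo-graders.

```python
def robo_grade(essay):
    """Toy essay scorer: rewards surface features, understands nothing.
    Hypothetical features and weights, for illustration only."""
    transitions = {"moreover", "however", "therefore", "furthermore"}
    words = essay.lower().split()
    score = 0.0
    score += min(len(words), 500) * 0.01           # longer is "better"
    score += sum(len(w) > 8 for w in words) * 0.5  # big words impress it
    score += sum(w.strip(".,") in transitions for w in words) * 1.0
    return score

honest = "The test was hard. I tried my best and learned a lot."
gibberish = ("Moreover, multitudinous paradigmatic infrastructures "
             "proliferate; therefore, notwithstanding circumlocutory "
             "exigencies, furthermore ramifications amalgamate.")
assert robo_grade(gibberish) > robo_grade(honest)
```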

That is frightening enough for education, but if you are relying on AI or machine learning for intelligence analysis, your stakes may be far higher.

To be sure, software can recognize “send the atomic bomb triggers by Federal Express to this address…,” or at least I hope that is within the range of current software. But what if the message is: “The destroyer of worlds will arrive next week.” Alert? Yes/No? What if it was written in Sanskrit?

I think computers, along with AI and machine learning, can be valuable tools, but not if they are setting the standard for review. At least not if you don’t want to dumb down writing and national security intelligence to the level of an algorithm.

I first saw this in a tweet by James Schirmer.

One Year

Eric Meyer · Fri, 08/15/2014 - 21:55

Categories:

Web

Exactly one year ago, in the emergency department of Cape Regional Medical Center, Rebecca had the first of her seizures, and our nightmare began.

Now we are back in the same place for our annual family vacation.  The same resort, the same building, even the same floor, though not the same room.  We go to the beach, we swim in the pools, we play games on the boardwalk.  All the things Rebecca loved to do.  In fact, her first wish with Make-A-Wish was not to go to Disney World.  Her wish was that it be summer so she could come back to New Jersey and do all those things.  Disney was a distant runner-up, a sort of consolation prize for not being able to do what she really wanted to do.

Even as we organized for that Disney trip, Kat and I decided to bring the family to New Jersey for an early vacation, if Rebecca was well enough once June finally came.  And then to come again in August, unless Rebecca was still alive but too sick to make the journey.  Neither came to pass.

Instead, we’re here without her.  I had feared this would be too painful for us to bear, but it isn’t.  New memories are being made with our children, and if sometimes Kat and I are drawn up short by a specific memory, or a wish that Rebecca were here to enjoy the trip with us, or just having the instinct to count three heads before realizing that we only have to count two, it is usually a wistful sorrow rather than a sharp agony.  Usually.

Those newly-made memories, of jumping waves and digging holes in the sand and boardwalk ice cream and going to water parks, are the building blocks of healing.  Forming them in the place that Rebecca loved so much is, we hope, the mortar that will glue them together.  It helps that we love it here too, and that love is limned by the memory of her love.

It all still seems unreal.  Our lives were proceeding as lives do, and then, in the middle of our special family time away, we were suddenly confronted with the horror that our middle child, our five-year-old girl, had a tumor in the middle of her brain.

I remember all the shock and terror and anguish, but not like it was yesterday, because it wasn’t.  It was a year ago today.

Applauding The Ends, Not The Means

Another word for it · Fri, 08/15/2014 - 21:25

Categories:

Topic Maps

Microsoft scans email for child abuse images, leads to arrest by Lisa Vaas.

From the post:

It’s not just Google.

Microsoft is also scanning for child-abuse images.

A recent tip-off from Microsoft to the National Center for Missing & Exploited Children (NCMEC) hotline led to the arrest on 31 July 2014 of a 20-year-old Pennsylvanian man in the US.

According to the affidavit of probable cause, posted on Smoking Gun, Tyler James Hoffman has been charged with receiving and sharing child-abuse images.

Shades of the days when Kodak would censor film submitted for development.

Lisa reviews the PhotoDNA techniques used by Microsoft and concludes:

The recent successes of PhotoDNA in leading both Microsoft and Google to ferret out child predators is a tribute to Microsoft’s development efforts in coming up with a good tool in the fight against child abuse.

In this particular instance, given this particular use of hash identifiers, it sounds as though those innocent of this particular type of crime have nothing to fear from automated email scanning.

No sane person supports child abuse, so the outcome of the case doesn’t bother me.

However, the use of PhotoDNA isn’t limited to photos of abused children. The same technique could be applied to photos of police officers abusing protesters (wonder where you would find those?), etc.

Before anyone applauds Microsoft for taking the role of censor (in the Roman sense), remember that corporate policies change. The goals of email scanning may not be so agreeable tomorrow.
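For readers curious how image-hash matching works at all: PhotoDNA itself is proprietary, but a toy "average hash" shows the general idea of fingerprinting images so near-duplicates still match even when the bytes differ. This is only an illustration of the technique's shape, not PhotoDNA's algorithm.

```python
def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if brighter than the mean.
    `pixels` is a small grayscale grid (think: a downscaled thumbnail)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Count of differing bits; small distance means near-duplicate."""
    return sum(x != y for x, y in zip(a, b))

original = [[10, 200], [220, 15]]
recompressed = [[12, 198], [219, 14]]  # slightly altered copy
unrelated = [[200, 10], [15, 220]]

assert hamming(average_hash(original), average_hash(recompressed)) == 0
assert hamming(average_hash(original), average_hash(unrelated)) > 0
```

The same fingerprinting machinery is content-neutral, which is the point above: nothing in it cares whether the reference set is abuse imagery or photos of protests.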

SearchCap: AdWords Close Variants, comScore Search Rankings & Google Now Flights

Search Engine Land · Fri, 08/15/2014 - 19:59

Categories:

Search

Below is what happened in search today, as reported on Search Engine Land and from other places across the web. From Search Engine Land: ComScore: Yahoo Bounces Back From All-Time Low In Search Share After dropping below a 10% market share for search for the first time in June, Yahoo creeped back...

Please visit Search Engine Land for the full article.

ComScore: Yahoo Bounces Back From All-Time Low In Search Share

Search Engine Land · Fri, 08/15/2014 - 18:40

Categories:

Search

After dropping below a 10% market share for search for the first time in June, Yahoo creeped back above that threshold, according to ComScore’s July search engine rankings. Yahoo checked in with a 10.0% share, a 0.2 percentage point gain over its record low in June. Yahoo had 1.8 billion...


XPERT (Xerte Public E-learning ReposiTory)

Another word for it · Fri, 08/15/2014 - 17:43

Categories:

Topic Maps

XPERT (Xerte Public E-learning ReposiTory)

From the about page:

XPERT (Xerte Public E-learning ReposiTory) project is a JISC funded rapid innovation project (summer 2009) to explore the potential of delivering and supporting a distributed repository of e-learning resources created and seamlessly published through the open source e-learning development tool called Xerte Online Toolkits. The aim of XPERT is to progress the vision of a distributed architecture of e-learning resources for sharing and re-use.

Learners and educators can use XPERT to search a growing database of open learning resources suitable for students at all levels of study in a wide range of different subjects.

Creators of learning resources can also contribute to XPERT via RSS feeds created seamlessly through local installations of Xerte Online Toolkits. Xpert has been fully integrated into Xerte Online Toolkits, an open source content authoring tool from The University of Nottingham.

Other useful links:

Xerte Project Toolkits

Xerte Community.

You may want to start with the browse option because the main interface is rather stark.

The Google interface is “stark” in the same sense, but Google has indexed a substantial portion of all online content, so I’m not very likely to draw a blank. With Xpert’s base of 364,979 resources, the odds of my drawing a blank are far higher.

The keywords appear in three distinct alphabetical segments: each starts with a digit or “a” and runs to the end of the alphabet, one segment after another. Hebrew and what appears to be Chinese appear at the end of the keyword list, in no particular order. I don’t know whether that is an artifact of the software or of its use.

The same repeated alphabetical segments occur under Author. Under Type there are some true types, such as “color print,” but the majority of the listing is file sizes in bytes. I’m not sure why a file size would be a “type.” Institution has similar issues.

If you are looking for a volunteer opportunity, helping XPert with alphabetization would enhance the browsing experience for the resources it has collected.
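A sketch of what that volunteer fix might involve: merge the separately alphabetized segments and re-sort once, folding case and accents so one browse list results. Illustrative only; a production fix for a multilingual list would want locale-aware collation (for example via ICU).

```python
import unicodedata

def merge_keyword_segments(segments):
    """Merge separately alphabetized keyword segments into one sorted list.
    Case and accent folding keeps Latin-script entries interleaved; other
    scripts sort after them rather than being scattered mid-list."""
    def fold(s):
        return unicodedata.normalize("NFKD", s).casefold()
    merged = [kw for seg in segments for kw in seg]
    return sorted(merged, key=fold)

# Three segments, each alphabetical on its own, as on the XPERT browse page.
segments = [["aardvark", "Zebra"], ["Apple", "zoology"], ["algebra"]]
print(merge_keyword_segments(segments))
# ['aardvark', 'algebra', 'Apple', 'Zebra', 'zoology']
```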

I first saw this in a tweet by Graham Steel.

Why Your Company Needs To Write More Open Source Software

Read/Write Web · Fri, 08/15/2014 - 16:32

Categories:

Web

The Wall Street Journal thinks it's news that Zulily is developing "more software in-house." It's not. At all. As Eric Raymond wrote years ago, 95% of the world's software is written for use, not for sale. The reasons are many, but one stands out: as Zulily CIO Luke Friang declares, it's "nearly impossible for a [off the shelf] solution to keep up with our pace."

True now, just as it was true 20 years ago.

But one thing is different, and it's something the WSJ completely missed. Historically software developed in-house was zealously kept proprietary because, the reasoning went, it was the source of a firm's competitive advantage. Today, however, companies increasingly realize the opposite: there is far more to be gained by open sourcing in-house software than keeping it closed.

Which is why your company needs to contribute more open-source code. Much more.

A Historical Anomaly

We've gone through an anomalous time these past 20 years. While most software continued to be written for internal use, most of the attention has been focused on vendors like SAP and Microsoft that build solutions that apply to a wide range of companies.

That's the theory, anyway.

In practice, buyers spent a small fortune on license fees, then a 5X multiple on top of that to make the software fit their requirements. For example, a company may spend $100,000 on an ERP system, but they're going to spend another $500,000 making it work. 

One of the reasons open source took off, even in applications, was that companies could get a less functional product for free (or a relatively inexpensive fee) and then spend their implementation dollars tuning it to their needs. Either way, customization was necessary, but the open source approach was less costly and arguably more likely to result in a more tailored result.

Meanwhile, technology vendors doubled down on "sameness," as Redmonk analyst Stephen O'Grady describes:

The mainstream technology industry has, in recent years, eschewed specialization. Virtual appliances, each running a version of the operating system customized for an application or purpose, have entirely failed to dent the sales of general purpose alternatives such as RHEL or Windows. For better than twenty years, the answer to any application data persistence requirement has meant one thing: a relational database. If you were talking about enterprise application development, you were talking about Java. And so on.

Along the way, however, companies discovered that vendors weren't really meeting their needs, even for well-understood product categories like Content Management Systems. They needed different, not same.

So the customers went rogue. They became vendors. Sort of.

Scratching Their Own Itches

As is often the case, O'Grady nails this point. Writing in 2010, O'Grady uncovers an interesting trend: "Software vendors are facing a powerful new market competitor: their customers." 

Think about the most visible technologies today. Most are open source, and nearly all of them were originally written for some company's internal use, or some developer's hobby. Linux, Git, Hadoop, Cassandra, MongoDB, Android, etc. None of these technologies were originally written to be sold as products.

Instead, they were developed by companies—usually Web companies—building software to "scratch their own itches," to use the open source phrase. And unlike previous generations of in-house software developed at banks, hospitals and other organizations, they open sourced the code. 

While some companies eschew developing custom software because they don't want to maintain it, open source (somewhat) mitigates this by letting a community grow up to extend and maintain a project, thereby amortizing the costs of development for the code originators. Yahoo! started Hadoop, but its biggest contributors today are Cloudera and Hortonworks. Facebook kickstarted Cassandra, but DataStax primarily maintains it today. And so on.

Give It Away (Now)

Today real software innovation doesn't happen behind closed doors. Or, if it does, it doesn't stay there. It's open source, and it's upending decades of established software orthodoxy.

Not that it's for the faint of heart. 

The best open-source projects innovate very fast. Which is not the same as saying anyone will care about your open-source code. There are significant pros and cons to open sourcing your code. But one massive "pro" is that the best developers want to work on open code: if you need to hire quality developers, you need to give them an open source outlet for their work. (Just ask Netflix.)

But that's no excuse to sit on the sidelines. It's time to get involved, and not for the good of some ill-defined "community." No, the primary beneficiary of open-source software development is you and your company. Better get started.


21 Metrics For Monitoring SEO Health

Search Engine Land · Fri, 08/15/2014 - 16:00

Categories:

Search

An SEO issue that goes unnoticed, even for a few days, can have a huge impact on business -- so what metrics can you use to detect problems early? The post 21 Metrics For Monitoring SEO Health appeared first on Search Engine Land.

Subscribe to The Universal Pantograph aggregator