Another word for it


The Field Guide to Data Science

Mon, 03/30/2015 - 14:00


Topic Maps

The Field Guide to Data Science by Booz Allen Hamilton.

From “The Story of the Field Guide:”

While there are countless industry and academic publications describing what Data Science is and why we should care, little information is available to explain how to make use of data as a resource. At Booz Allen, we built an industry-leading team of Data Scientists. Over the course of hundreds of analytic challenges for dozens of clients, we’ve unraveled the DNA of Data Science. We mapped the Data Science DNA to unravel the what, the why, the who and the how.

Many people have put forth their thoughts on single aspects of Data Science. We believe we can offer a broad perspective on the conceptual models, tradecraft, processes and culture of Data Science. Companies with strong Data Science teams often focus on a single class of problems – graph algorithms for social network analysis and recommender models for online shopping are two notable examples. Booz Allen is different. In our role as consultants, we support a diverse set of clients across a variety of domains. This allows us to uniquely understand the DNA of Data Science. Our goal in creating The Field Guide to Data Science is to capture what we have learned and to share it broadly. We want this effort to help drive forward the science and art of Data Science.

This is a great example of what can be done with authors, professional editors and graphic artists putting together a publication.

While it is just a “field guide,” it has enough depth to serve as a starting point for exploring data science projects.

Imagine that you have senior staff who have read and have a grasp of the field guide. I can easily imagine taking the appropriate parts of the field guide to serve as “windows” onto further steps for a particular project. That would enable senior staff to remain grounded in what they understand and in how further steps relate back to that understanding. Overall I think this is an excellent field guide/introduction to data science.

BTW, the “Analytic Connection in the Data Lake” graphic on page 28 is similar to topic maps pointing into an infoverse.

I first saw this in a tweet by Kirk Borne.


Hotel Wi-Fi Insecurity – Big Time

Mon, 03/30/2015 - 13:28



Hotel Wi-Fi router security hole: will this be the Ultimate Pwnie Award Winning Bug for 2015? by Paul Ducklin.

Paul has a highly amusing account of the Pwnie awards and his choice for 2015: CVE-2015-0932, Vulnerability Note VU#930956.

The security hole at issue:

Multiple ANTlabs InnGate models allow unauthenticated read/write to filesystem.

Simply put, some versions of a popular hotel internet access server – those portals you interact with to get Wi-Fi access while you’re at a conference centre or staying in a hotel – can be completely drained of data, and then reprogrammed arbitrarily, via the outside (internet-facing) interface.

Without any authentication.

See Paul’s post for all the details, including a very lucid discussion of rsync that is guaranteed to hold your attention.

Paul also has suggestions for avoiding unpatched ANTlabs InnGate hotel internet access servers.

You can even help your local hotel community by finding unpatched servers. Say near law enforcement conferences. The Department of Homeland Security has helpfully made a list of law enforcement meetings for 2015. (I have a copy just in case it disappears.)

OpenTSDB 2.0.1

Sun, 03/29/2015 - 23:52



OpenTSDB 2.0.1

From the homepage:


  • Data is stored exactly as you give it
  • Write with millisecond precision
  • Keep raw data forever


  • Runs on Hadoop and HBase
  • Scales to millions of writes per second
  • Add capacity by adding nodes


  • Generate graphs from the GUI
  • Pull from the HTTP API
  • Choose an open source front-end

If that isn’t impressive enough, check out the features added for the 2.0 release:

OpenTSDB has a thriving community who contributed and requested a number of new features. 2.0 has the following new features:

  • Lock-less UID Assignment – Drastically improves write speed when storing new metrics, tag names, or values
  • Restful API – Provides access to all of OpenTSDB’s features as well as offering new options, defaulting to JSON
  • Cross Origin Resource Sharing – For the API so you can make AJAX calls easily
  • Store Data Via HTTP – Write data points over HTTP as an alternative to Telnet
  • Configuration File – A key/value file shared by the TSD and command line tools
  • Pluggable Serializers – Enable different inputs and outputs for the API
  • Annotations – Record meta data about specific time series or data points
  • Meta Data – Record meta data for each time series, metrics, tag names, or values
  • Trees – Flatten metric and tag combinations into a single name for navigation or usage with different tools
  • Search Plugins – Send meta data to search engines to delve into your data and figure out what’s in your database
  • Real-Time Publishing Plugin – Send data to external systems as they arrive to your TSD
  • Ingest Plugins – Accept data points in different formats
  • Millisecond Resolution – Optionally store data with millisecond precision
  • Variable Length Encoding – Use less storage space for smaller integer values
  • Non-Interpolating Aggregation Functions – For situations where you require raw data
  • Rate Counter Calculations – Handle roll-over and anomaly suppression
  • Additional Statistics – Including the number of UIDs assigned and available
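The “Rate Counter Calculations” item is easier to picture with a small example. What follows is a minimal sketch of the general technique (turning a wrapping counter into a rate), not OpenTSDB's own code. The function name, the `counter_max` and `reset_value` parameters and the sample values are all assumptions made for illustration.

```python
def counter_rates(samples, counter_max=2**32 - 1, reset_value=None):
    """Return per-second rates between consecutive (timestamp, value) samples.

    samples: list of (unix_seconds, counter_value), sorted by time.
    counter_max: largest value the counter holds before wrapping (assumed).
    reset_value: if set, any rate above it is treated as an anomaly and
                 reported as 0.0 (a simple form of suppression).
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        if v1 >= v0:
            delta = v1 - v0
        else:
            # The counter wrapped: count up to the max, then from zero.
            delta = (counter_max - v0) + v1
        rate = delta / elapsed
        if reset_value is not None and rate > reset_value:
            rate = 0.0  # suppress an implausible spike
        rates.append((t1, rate))
    return rates


# Hypothetical 16-bit counter that wraps between the second and third sample.
samples = [(0, 65500), (10, 65530), (20, 25), (30, 55)]
print(counter_rates(samples, counter_max=65535))
```

Without the roll-over branch, the third sample would look like a large negative rate. The `reset_value` check covers the other common case, a counter that restarts from zero and would otherwise show up as a huge spike.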

I suppose traffic patterns (license plates) are a form of time series data. Yes?
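To see what a sighting would look like as a data point, here is a hedged sketch using the “Store Data Via HTTP” feature. It assumes the JSON shape accepted by the 2.0 `/api/put` endpoint (metric, timestamp, value, tags) and a TSD listening on `localhost:4242`. The metric name, tag names and values are hypothetical, and the block only builds and prints the payload rather than contacting a server.

```python
import json
import urllib.request


def make_datapoint(metric, timestamp, value, tags):
    """Build one data point in the JSON shape the 2.0 HTTP API accepts."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}


def put_datapoints(points, base_url="http://localhost:4242"):
    """POST data points to /api/put; base_url is an assumed local TSD."""
    request = urllib.request.Request(
        base_url + "/api/put",
        data=json.dumps(points).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical metric and tag names for a plate-reader sighting count.
point = make_datapoint(
    metric="traffic.plate.sightings",
    timestamp=1427673120,
    value=1,
    tags={"camera": "i85-exit-12", "direction": "north"},
)
print(json.dumps([point], indent=2))
```

Calling `put_datapoints([point])` would send the payload to a running TSD. Note that the tags, not the metric name, carry the identifying detail, which is where the semantics of such data end up living.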

New York Times Confirms: Meaning Is Important!

Sun, 03/29/2015 - 23:08



Kate O’Neill tweeted:

There’s that pesky need for meaning again: “Learning to See Data” via @NYTimes #bigdata

The tweet also included an image.

I’m flogging their content on my dime, despite their firewall. Go figure.

Will the trough of disillusionment with Big Data be mostly due to lack of meaning? Lack of integration (which depends on meaning)? Lack of return (which depends on integration and meaning)?

Or some other cause? Such as visualizing data as a replacement for the meaning of data. None of the better visualization experts would so advise but is your visualization vendor one of them?

The Theory of Relational Databases

Sun, 03/29/2015 - 22:02



The Theory of Relational Databases by David Maier.

From the webpage:

This text has been long out of print, but I still get requests for it. The copyright has reverted to me, and you have permission to reproduce it for personal or academic use, but not for-profit purposes. Please include “Copyright 1983 David Maier, used with permission” on anything you distribute.

Out of date (1983) if you are looking for the latest work, but not if you are interested in where we have been. Sometimes the latter is more important than the former.


The Genetic Programming Bibliography (Hits 10K!)

Sun, 03/29/2015 - 20:35



The Genetic Programming Bibliography by William Langdon.

A truly awesome bibliography collection and tool!

The introduction (10 pages PDF) is a model of clarity and will enhance your use/enjoyment of this bibliography.

You will also find there:Ai/index.html

I first saw this in a tweet by Jason H. Moore, PhD.

I suppose Google may downgrade my search listing because I have included a list of URLs that may be useful to you.

I care more about posting useful data for my readers than about gaming Google. If more of us felt that way, search results might be less the products of SEO gaming.

Collected, Vetted, Forty Visualization Blogs

Sun, 03/29/2015 - 20:07



Blog Radar at VisuaLoop.

You can run a search with your favorite search engine on “visualization blogs” + “data-visualization blogs” and get about 6,500 “hits,” including duplicates. This is the weed yourself option.

Or you can choose Blog Radar and have forty (40) blogs without duplicates with three (3) posts for each one. This is the pre-weeded option.

Whether you want to expand your blog reading or are looking for a good starting point for a crawler on visualization, you will be hard pressed to find a better resource.


Big Data Leaves Money On The Table

Sun, 03/29/2015 - 18:52



Big data hype reminds me of the “He’s Large” song from Popeye.

The recurrent theme is that whatever his other qualities, Bluto is large.

I mention that because, as Anthony Smith illustrates in When it Comes to Data, Small is the New Big, big data is great but it never tells the whole story.

The whole story includes how and why customers buy and use your product. Trivial things like that.

Don’t use big data like the NSA uses phone data:

“There is no other way we know of to connect the dots.” (From NSA & Connecting the Dots.)

Big data can show a return on your investment but it will only show you some of the facts that are available.

Don’t allow a fixation on “big data” to blind you to the value of small data, which isn’t available to big data approaches and tools.

PS: The NSA uses phone data as churn for the sake of their budget. Churn of big data doesn’t add to your bottom line.

Income Inequality – News Analysis

Sun, 03/29/2015 - 16:14



Discussions of increasing income inequality are common but become disconnected from facts once the disparity is established. By disconnected from facts I mean: what facts should guide policies to reduce income inequality?

In general I favor reducing income inequality (substantially) so that gives you an idea of where I fall in the discussion.

I am not entirely sold on the ideas discussed in A 26-year-old MIT graduate is turning heads over his theory that income inequality is actually about housing (in 1 graph) by Greg Ferenstein, but the analysis by Matthew Rognlie is different enough from the usual positions to merit your attention.

From the post:

Wealthy tech founders and the automation of middle-class jobs are often blamed for increasing concentrations of wealth in fewer hands. But, a 26-year-old MIT graduate student, Matthew Rognlie, is making waves for an alternative theory of inequality: the problem is housing [PDF].

Rognlie is attacking the idea that rich capitalists have an unfair ability to turn their current wealth into a lazy dynasty of self-reinforcing investments. This theory, made famous by French economist Thomas Piketty, argues that wealth is concentrating in the 1% because more money can be made by investing in machines and land (capital) than paying people to perform work (wages). Because capital is worth more than wages, those with an advantage to invest now in capital become the source of long-term dynasties of wealth and inequality.

Rognlie’s blockbuster rebuttal to Piketty is that “recent trends in both capital wealth and income are driven almost entirely by housing.” Software, robots, and other modern investments all depreciate in price as fast as the iPod. Technology doesn’t hold value like it used to, so it’s misleading to believe that investments in capital now will give rich folks a long-term advantage.

Housing policy. Now there’s a minefield of conflicting interests. Still, if you want to follow where the “data leads” then it is worth a comparison to other policy alternatives and explanations.

To be sure, you can create topic maps that replicate the fanatical ravings of any economic theorist you care to follow, but my prejudice is in favor of a critical examination of news reports and policy recommendations. My reasoning is that you can further your clients’ goals if your analysis bears some resemblance to reality as experienced by others. (That may explain the long term and persistent failures of U.S. policy in the Middle East but I digress.)

Using Spark DataFrames for large scale data science

Sat, 03/28/2015 - 00:33



Using Spark DataFrames for large scale data science by Reynold Xin.

From the post:

When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API—tasks that used to take thousands of lines of code to express could be reduced to dozens.

As Spark continues to grow, we want to enable wider audiences beyond big data engineers to leverage the power of distributed processing. The new DataFrame API was created with this goal in mind. This API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data science applications. As an extension to the existing RDD API, DataFrames feature:

  • Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster
  • Support for a wide array of data formats and storage systems
  • State-of-the-art optimization and code generation through the Spark SQL Catalyst optimizer
  • Seamless integration with all big data tooling and infrastructure via Spark
  • APIs for Python, Java, Scala, and R (in development via SparkR)

For new users familiar with data frames in other programming languages, this API should make them feel at home. For existing Spark users, this extended API will make Spark easier to program, and at the same time improve performance through intelligent optimizations and code-generation.

If you don’t know Spark DataFrames, you are missing out on important Spark capabilities! This post will have you well on the way to recovery.

Even though the reading of data from other sources is “easy” in many cases and support for more is growing, I am troubled by statements like:

DataFrames’ support for data sources enables applications to easily combine data from disparate sources (known as federated query processing in database systems). For example, the following code snippet joins a site’s textual traffic log stored in S3 with a PostgreSQL database to count the number of times each user has visited the site.

That goes well beyond reading data and introduces the concept of combining data, which isn’t the same thing.
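To make the distinction concrete, here is a minimal sketch in plain Python. It is not Spark and not the snippet from the post; an in-memory SQLite table stands in for the relational database, and every log line, user id and name is made up. Reading each source is one step. Combining them is another, and it only works because both sources happen to identify users the same way.

```python
import sqlite3
from collections import Counter

# Source 1: a textual traffic log (made-up lines: timestamp, user id, path).
TRAFFIC_LOG = """\
2015-03-27T10:00:00 u1 /index.html
2015-03-27T10:05:00 u2 /about.html
2015-03-27T10:09:00 u1 /pricing.html
2015-03-27T10:12:00 u3 /index.html
2015-03-27T10:20:00 u1 /index.html
"""

# Source 2: a relational users table (in-memory SQLite, made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "Ada"), ("u2", "Grace"), ("u4", "Edsger")],
)

# Reading: count visits per user id found in the log.
visits = Counter(line.split()[1] for line in TRAFFIC_LOG.splitlines())

# Combining: the join only works because both sources use the same user ids.
visit_counts = {}
for user_id, name in conn.execute("SELECT id, name FROM users ORDER BY id"):
    visit_counts[name] = visits.get(user_id, 0)

# Log ids with no matching user record fall out of the join silently.
known_ids = {row[0] for row in conn.execute("SELECT id FROM users")}
unmatched = sorted(set(visits) - known_ids)

print(visit_counts)
print(unmatched)
```

The `unmatched` list is the interesting part. Nothing in the join says whether `u3` is a missing user, a typo or a different identifier scheme, and that is a question about meaning rather than about reading data.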

For any two data sets that are trivially transparent to you (caveat: what is transparent to you may or may not be transparent to others), that example works.

That example fails where data scientists spend 50 to 80 percent of their time: “collecting and preparing unruly digital data.” (See: For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.)

If your handlers are content to spend 50 to 80 percent of your time munging data, enjoy. Not that munging data will ever go away, but documenting the semantics of your data can enable you to spend less time munging and more time on enjoyable tasks.

United States Code (from Office of the Law Revision Counsel)

Fri, 03/27/2015 - 22:08



United States Code (from Office of the Law Revision Counsel)

The download page says:

Current Release Point

Public Law 113-296 Except 113-287

Each update of the United States Code is a release point. This page provides downloadable files for the current release point. All files are current through Public Law 113-296 except for 113-287. Titles in bold have been changed since the last release point.

A User Guide and the USLM Schema and stylesheet are provided for the United States Code in XML. A stylesheet is provided for the XHTML. PCC files are text files containing GPO photocomposition codes (i.e., locators).

Information about the currency of United States Code titles is available on the Currency page. Files for prior release points are available on the Prior Release Points page. Older materials are available on the Annual Historical Archives page.

You can download as much or as little of the United States Code in XML, XHTML, PCC or PDF format.
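If you take the XML option, a few lines of Python will list what is inside a title. This is a sketch under assumptions: the namespace URI, the `section`, `num` and `heading` element names and the `identifier` attribute reflect my understanding of the USLM schema and should be checked against the User Guide. The sample below is a heavily simplified fragment, not a real download.

```python
import xml.etree.ElementTree as ET

# Assumed USLM namespace; verify against the schema shipped with the files.
USLM_NS = {"uslm": "http://xml.house.gov/schemas/uslm/1.0"}

# Simplified, hand-written fragment in the general shape of a USLM title.
SAMPLE = """\
<uscDoc xmlns="http://xml.house.gov/schemas/uslm/1.0">
  <main>
    <title identifier="/us/usc/t1">
      <num value="1">Title 1</num>
      <heading>GENERAL PROVISIONS</heading>
      <section identifier="/us/usc/t1/s1">
        <num value="1">§ 1.</num>
        <heading>Words denoting number, gender, and so forth</heading>
      </section>
      <section identifier="/us/usc/t1/s2">
        <num value="2">§ 2.</num>
        <heading>"County" as including "parish", and so forth</heading>
      </section>
    </title>
  </main>
</uscDoc>
"""


def list_sections(xml_text):
    """Return (identifier, heading) pairs for every section element."""
    root = ET.fromstring(xml_text)
    sections = []
    for section in root.iterfind(".//uslm:section", USLM_NS):
        heading = section.find("uslm:heading", USLM_NS)
        text = heading.text.strip() if heading is not None and heading.text else ""
        sections.append((section.get("identifier"), text))
    return sections


for identifier, heading in list_sections(SAMPLE):
    print(identifier, "-", heading)
```

For a downloaded file, `ET.parse(path).getroot()` would replace `ET.fromstring`. The identifiers are what make the XML useful for linking, since they give each section a stable name independent of page layout.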

Oh, yeah, the 113-287 reference does seem rather cryptic. What? You don’t keep up with Public Law numbers?

The short story is that Congress passed a bill to move material on national parks to Title 54 and that hasn’t happened yet. If you need more details, see: Title 54 of the U.S. Code: Background and Guidance by the National Park Service.

You can think of this as the outcome of the sausage making process. Interesting in its own right but not terribly helpful in divining the process that produced it.


PS: On Ubuntu, the site displays well in Chrome and poorly in Firefox. I don’t know about IE.

Data and Goliath – Bruce Schneier – New Book!

Fri, 03/27/2015 - 15:03



In a recent review of Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World by Bruce Schneier, Steven Aftergood writes in Data and Goliath: Confronting the Surveillance Society that:

“More than just being ineffective, the NSA’s surveillance efforts have actually made us less secure,” he says. Indeed, the Privacy and Civil Liberties Oversight Board found the “Section 215” program for bulk collection of telephone metadata to be nearly useless, as well as likely illegal and problematic in other ways. But by contrast, it also reported that the “Section 702” collection program had made a valuable contribution to security. Schneier does not engage on this point.

I’m waiting on my copy of Data and Goliath to arrive but I don’t find it surprising that Bruce overlooked and/or chose to not comment on the Section 702 report.

Starting with the full text, Report on the Surveillance Program Operated Pursuant to Section 702 of the Foreign Intelligence Surveillance Act, at one hundred and ninety-six pages (196), you will be surprised at how few actual facts are recited.

In terms of the efficacy of the 702 program, this is fairly typical:

The Section 702 program has proven valuable in a number of ways to the government’s efforts to combat terrorism. It has helped the United States learn more about the membership, leadership structure, priorities, tactics, and plans of international terrorist organizations. It has enabled the discovery of previously unknown terrorist operatives as well as the locations and movements of suspects already known to the government. It has led to the discovery of previously unknown terrorist plots directed against the United States and foreign countries, enabling the disruption of those plots.

That seems rather short on facts and long on conclusions to me. Yes?

Here’s a case the report singles out as a success:

In one case, for example, the NSA was conducting surveillance under Section 702 of an email address used by an extremist based in Yemen. Through that surveillance, the agency discovered a connection between that extremist and an unknown person in Kansas City, Missouri. The NSA passed this information to the FBI, which identified the unknown person, Khalid Ouazzani, and subsequently discovered that he had connections to U.S.-based Al Qaeda associates, who had previously been part of an abandoned early stage plot to bomb the New York Stock Exchange. All of these individuals eventually pled guilty to providing and attempting to provide material support to Al Qaeda.

Recall that “early stage plot” means a lot of hot talk with no plan for implementation, which accords with pleas to “attempting to provide material support to Al Qaeda.” That’s grotesque.

Oh, another case:

For instance, in September 2009, the NSA monitored under Section 702 the email address of an Al Qaeda courier based in Pakistan. Through that collection, the agency intercepted emails sent to that address from an unknown individual located in the United States. Despite using language designed to mask their true intent, the messages indicated that the sender was urgently seeking advice on the correct mixture of ingredients to use for making explosives. The NSA passed this information to the FBI, which used a national security letter to identify the unknown individual as Najibullah Zazi, located near Denver, Colorado. The FBI then began intense monitoring of Zazi, including physical surveillance and obtaining legal authority to monitor his Internet activity. The Bureau was able to track Zazi as he left Colorado a few days later to drive to New York City, where he and a group of confederates were planning to detonate explosives on subway lines in Manhattan within the week. Once Zazi became aware that law enforcement was tracking him, he returned to Colorado, where he was arrested soon after. Further investigative work identified Zazi’s co-conspirators and located bomb-making components related to the planned attack. Zazi and one of his confederates later pled guilty and cooperated with the government, while another confederate was convicted and sentenced to life imprisonment. Without the initial tip-off about Zazi and his plans, which came about by monitoring an overseas foreigner under Section 702, the subway-bombing plot might have succeeded.

Sorry, that went by rather fast. The unknown sender in the United States did not know how to make explosives? And despite that, the plot is described as “…planning to detonate explosives on subway lines in Manhattan within the week.” Huh? That’s quite a leap from getting advice on explosives to being ready to execute a complex operation.

What’s wrong with the “terrorists” being tracked by the NSA/FBI? Almost without exception, they lack the skills to make bombs. The FBI fills in, supplying bombs in many cases, Cleveland, 2012, Portland, 2010, and that’s two I remember right off hand. (I don’t have a complete list of terror plots where the FBI supplies the bomb or bomb making materials. Do you? It would save me the work of putting one together. Thanks!)

A more general claim rounds out the “facts” claimed by the report:

A rough count of these cases identifies well over one hundred arrests on terrorism-related offenses. In other cases that did not lead to disruption of a plot or apprehension of conspirators, Section 702 appears to have been used to provide warnings about a continuing threat or to assist in investigations that remain ongoing. Approximately fifteen of the cases we reviewed involved some connection to the United States, such as the site of a planned attack or the location of operatives, while approximately forty cases exclusively involved operatives and plots in foreign countries.

Well, we know that “terrorism-related offense” includes “…attempting to provide material support to Al Qaeda.” And that conspiracy to commit a terrorist act can consist of talking about wanting to commit a terrorist act with no ability to put such a plan in action. Like not knowing how to make a bomb. Fairly serious impediment there, at least for a would-be terrorist.

Not to mention that detention has no real relationship to the commission of a crime, as we have stood witness to at Guantanamo Bay (directions).

In Bruce’s defense (like he needs my help! ;-)), no one has an obligation to refute every lie told in support of government surveillance or its highly fictionalized “war on terrorism.” To no small degree, repeating those lies ad nauseam gives them credibility in group think circles, such as inside the beltway in D.C. Especially among agencies whose budgets depend upon those lies and the contractors who profit from them.

Treat yourself to some truth about cybersecurity: order your copy of Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World by Bruce Schneier.

Congressional Influence Model [How To Choose Allies 4 Hackers]

Thu, 03/26/2015 - 18:51



Congressional Influence Model by Westley Hennigh.

From the webpage:

This is a collection of data and code for investigating influence in Congress. Specifically, it uses data generated by MapLight and the Center for Responsive Politics to identify opposing interest groups and analyze their political contributions.

Unfortunately, due to size constraints, not all of the campaign finance data can be included in this repo. But if you’re curious you can download it using this scraper (see further instructions there).

I found this following the data for:

When interest groups disagreed on legislation, who did the 113th Congress vote with?

Sorted to show groups most frequently on opposite sides of legislation

To fully appreciate the graphic, see the original at: Congress is a Game, and We Have the Data to Show Who’s Winning by Westley Hennigh.

Where Westley also notes after the graphic:

Amongst more ideologically focused groups the situation is much the same. Conservative Republican interests were very often at odds with both health and welfare and human rights advocates, but Congress stood firmly with conservatives. They were almost twice as likely to vote against the interests of human rights advocates, and more than twice as likely to vote against health & welfare policy organizations.

The force driving this correlation between support by certain groups and favorable votes in Congress isn’t incalculable or hard to guess at. It’s money. The groups above that come out on top control massive amounts of political campaign spending relative to their opponents. The conservative Republican interests in conflict with health and welfare policy groups spent an average of 26 times as much on candidates that won seats in the 113th Congress. They outspent human rights advocates by even more — 300 times as much on average. The Chambers of Commerce, meanwhile, has spent more on lobbying than any other group every year since 1999.

As Westley points out, this is something we all “knew” at some level and now the data makes the correlation between money and policy undeniable.
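The tally behind a graphic like that is simple enough to sketch. The records below are entirely made up for illustration and are not taken from the MapLight or Center for Responsive Politics data, and the function is my own guess at the basic counting step, not Westley’s code. For each bill where two groups took opposite sides, whoever’s position matched the outcome gets the win.

```python
from collections import defaultdict

# Made-up records for illustration: (bill, group_for, group_against, passed).
RECORDS = [
    ("bill-1", "group_a", "group_b", True),
    ("bill-2", "group_a", "group_b", True),
    ("bill-3", "group_b", "group_a", False),
    ("bill-4", "group_b", "group_a", True),
    ("bill-5", "group_a", "group_b", False),
    ("bill-6", "group_a", "group_b", True),
]


def win_counts(records):
    """Count, per group, how often the outcome matched that group's position."""
    wins = defaultdict(int)
    for _bill, group_for, group_against, passed in records:
        # A passed bill is a win for its supporters, a failed one for opponents.
        winner = group_for if passed else group_against
        wins[winner] += 1
    return dict(wins)


wins = win_counts(RECORDS)
print(wins)
print("ratio:", wins["group_a"] / wins["group_b"])
```

With real data the same counts would be joined to contribution totals per group, which is where the correlation between money and votes would show up.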

My first reaction was Westley’s data is a good start towards: How much is that Representative/Senator in the window? The one with the waggly tail., a website where the minimum contribution for legislative votes, taking your calls, etc., is estimated for each member of the United States House and Senate. Interest groups could avoid overpaying for junior members and embarrassing themselves with paltry contributions to more senior members. Think of it as a public price list for legislation.

A How much is that Representative/Senator in the window? The one with the waggly tail. website would be very amusing, but it wouldn’t help me because I don’t have that sort of money. And it isn’t a straight out purchase, which is how they avoid the quid pro quo issue. Many of these interest groups have been greasing the palms of, sorry, contributing to, politicians for years.

Gaining power by contributions, real power, requires a contribution/issue campaign that spans the political careers of multiple politicians, starting at the state and local level and following those careers into Congress. Which means, of course, that getting upset about this or that outrage isn’t enough to sustain the required degree of organization and contributions. Contributions and reminders of contributions have to flow 7 x 365, in good years and lean years, perhaps even more so in lean (non-election) years.

Not to mention that you will need to make friends fast and enemies, permanent ones anyway, very slowly. Perhaps a member of Congress has too much local opposition to favor your side on a minor bill. They may simply be absent rather than vote. You have to learn to live with the reality that your representative/senator has other pressure points, unless you want to own one outright. Such members exist, I have no doubt, but the asking price would be very high. It is easier to get one-issue representatives elected than senators, but I don’t know how useful that would be in the long term.

After thinking about it for a while, I concluded we know three things for sure:

  • Congress votes with conservatives twice as often as human rights advocates.
  • Conservatives outspend other groups and have for decades.
  • Outspending conservatives would require national/state/local contributions for decades.

Based on those facts, would you choose an ally that:

  • Loses twice as often on their issues as other groups?
  • Doesn’t regularly contribute to campaigns at state/local/federal levels?
  • Has no effective national/state/local organization that has persisted for decades?

How you frame your issues makes a difference in available allies.

Take for example the ACLU and its suit against the NSA to take back the Internet Backbone. See: The NSA Has Taken Over the Internet Backbone. We’re Suing to Get it Back.

The ACLU complaint against the NSA has issues such as:

48. Plaintiffs are educational, legal, human rights, and media organizations. Their work requires them to engage in sensitive and sometimes privileged communications, both international and domestic, with journalists, clients, experts, attorneys, civil society organizations, foreign government officials, and victims of human rights abuses, among others.

49. By intercepting, copying, and reviewing substantially all international text-based communications—and many domestic communications as well—as they transit telecommunications networks inside the United States, the government is seizing and searching Plaintiffs’ communications in violation of the FAA and the Constitution.

Really makes you feel like girding your loins and putting on body armor doesn’t it? Almost fifty (50) pages of such riveting prose.

Don’t get me wrong, I support the ACLU and deeply appreciate their suing the NSA. The NSA needs to be opposed in every venue by everyone who cares about having any semblance of freedom in the United States.

I hope the ACLU is victorious but at best, the NSA will be forced to obey existing laws, assuming you can trust known liars when they say “…now we are obeying the law, but we can’t let you see that we are obeying the law.” Somehow that doesn’t fill me with confidence, assuming the ACLU is successful.

What happens if we re-phrase the issue of NSA surveillance, so we can choose stronger allies to have on our side? Take the mass collection of credit card data for example. See: Sweeping NSA Surveillance Includes Credit-Card Transactions, Top Three Phone Companies’ Records by Ryan Gallagher.

What would credit card data enable? Hmmm, can you say a de facto national gun registry? With purchase records for guns and ammunition? What reason other than ownership would I have for buying .460 Weatherby Magnum ammunition?

By framing the issue of surveillance as a gun registration issue, we find the NRA joining with the ACLU and others in ACLU vs. Clapper, No. 13-cv-03994 (WHP), saying:

For more than 50 years since its decision in Nat’l Ass’n for Advancement of Colored People v. State of Ala. ex rel. Patterson, 357 U.S. 449 (1958), the Supreme Court has recognized that involuntary disclosure of the membership of advocacy groups inhibits the exercise of First Amendment rights by those groups. For nearly as long—since the debates leading up to enactment of the Gun Control Act of 1968—the Congress has recognized that government recordkeeping on gun owners inhibits the exercise of Second Amendment rights. The mass surveillance program raises both issues, potentially providing the government not only with the means of identifying members and others who communicate with the NRA and other advocacy groups, but also with the means of identifying gun owners without their knowledge or consent, contrary to longstanding congressional policy repeatedly reaffirmed and strengthened by Congresses that enacted and reauthorized the legislation at issue in this case. The potential effect on gun owners’ privacy is illustrative of the potential effect of the government’s interpretation of the statute on other statutorily protected privacy rights. The injunction should be issued.

That particular suit was unsuccessful at the district court level but that should give you an idea of how “framing” an issue can enable you to attract allies who are more successful than most.

With support of the ACLU, perhaps, just perhaps the NSA will be told to obey the law. Guesses for grabs on how successful that “telling” will be.

With the support of the NRA and similar groups, the very existence of the NSA data archives will come into question. Not beyond possibility that the NSA will be returned to its former, much smaller footprint of legitimate cryptography work.

And what of other NRA positions? (shrugs) I’m sure that any group you look closely enough at will stand for something you don’t like. As I put it to a theologically diverse group forming to create a Bible encoding, “I’m looking for allies, not soul mates. I already have one of those.”


PS: As of April, 2014, Overview of Constitutional Challenges to NSA Collection Activities and Recent Developments, is a summary of legal challenges to the NSA. Dated but I thought it might be helpful.

2nd Amendment-Summary-4-Hackers

Wed, 03/25/2015 - 21:30


Topic Maps

As promised, not a deeply technical (legal) analysis of District of Columbia vs. Heller but a summary of the major themes in Scalia’s opinion for the majority.

District of Columbia v. Heller, 554 U.S. 570, 128 S. Ct. 2783, 171 L. Ed. 2d 637 (2008) [2008 BL 136680] has the following pagination markers:

* U.S. (official)
** S. Ct. (West Publishing)
*** L. Ed. 2d (Lawyers’ Edition 2nd)
**** BL (Bloomberg Law)

In the text you will see, for example, [*577], which marks the start of page 577 in the official version of the opinion. I use the official pagination herein.
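If you want to pull those markers out of the opinion text mechanically, something like the following Python sketch would do it. It is only an illustration: the names REPORTERS, MARKER, find_page_markers and strip_page_markers are mine, and it assumes markers always look like one to four asterisks followed by a page number inside square brackets. Ranges such as [*581-*582], which I use to cite quotes, are deliberately left alone.

```python
import re

# Number of asterisks -> reporter, following the list above.
REPORTERS = {1: "U.S.", 2: "S. Ct.", 3: "L. Ed. 2d", 4: "BL"}

# Matches single-page markers such as [*577] or [**2792].
# Assumption: a marker is 1-4 asterisks plus digits, nothing else.
MARKER = re.compile(r"\[(\*{1,4})(\d+)\]")


def find_page_markers(text):
    """Return a (reporter, page) pair for each pagination marker in text."""
    return [(REPORTERS[len(stars)], int(page))
            for stars, page in MARKER.findall(text)]


def strip_page_markers(text):
    """Remove markers and collapse the extra whitespace they leave behind."""
    return re.sub(r"\s{2,}", " ", MARKER.sub("", text)).strip()


sample = ("the Second Amendment extends, [**2792] prima facie, "
          "to all instruments [*582]")
print(find_page_markers(sample))
print(strip_page_markers(sample))
```

Stripping the markers is handy when you want to quote a passage cleanly; keeping them is handy when you want to cite it.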

Facts: Heller, a police officer, applied for a handgun permit, which was denied. Without a permit, possession of a handgun was banned in the District of Columbia. Even if a permit were obtained, the handgun had to be disabled and unloaded. Heller sued the District, saying that the Second Amendment protects an individual’s right to possess firearms and that the city’s ban on handguns and the non-functioning requirement, should the handgun be required for self-defense, infringed on that right.

[Observation: When challenging a law on constitutional grounds, get an appropriate plaintiff to bring the suit. I haven’t done the factual background but I rather doubt that Heller was just an ordinary police officer who decided on his own to sue the District of Columbia. Taking a case to the Supreme Court is an expensive proposition. In challenging laws that infringe on hackers, use security researchers, universities, people with clean reputations. Not saying you can’t win with others but in policy debates it’s better to wear your best clothes.]

Law: Second Amendment: “A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.”

Scalia begins by observing:

“[t]he Constitution was written to be understood by the voters; its words and phrases were used in their normal and ordinary as distinguished from [****4] technical meaning.” United States v. Sprague, 282 U. S. 716, 731 (1931); see also Gibbons v. Ogden, 9 Wheat. 1, 188 (1824). [*576]

The crux of Scalia’s argument comes early and is stated quite simply:

The Second Amendment is naturally divided into two parts: its prefatory clause and its operative clause. The former does not limit the latter grammatically, but rather announces a purpose. The Amendment could be rephrased, “Because a well regulated Militia is necessary to the security of a free State, the right of the people to keep and bear Arms shall not be infringed.” [*577]

With that “obvious” construction, Scalia sweeps to one side all arguments that attempt to limit the right to bear arms to a militia context. The prefatory clause is just an observation, not binding in any way on the operative clause. He does retain it for later use, to argue that his interpretation of the operative clause is consistent with that purpose.

Scalia breaks his analysis of the operative clause into the following pieces:

a. “Right of the People.”

b. “Keep and Bear Arms”

c. Meaning of the Operative Clause.

“Right of the People.” In perhaps the strongest part of the opinion, Scalia observes that “right of the people” occurs in the unamended Constitution and Bill of Rights only two other times: in the First Amendment (assembly-and-petition clause) and the Fourth Amendment (search-and-seizure clause). The Fourth Amendment has fallen on hard times of late but the First Amendment is still attractive to many. He leaves little doubt that the right to “keep and bear arms” (the next question) is meant to be an individual right. [*579]

“Keep and Bear Arms” Before turning to “keep” and “bear,” Scalia makes two important points with regard to “arms:”

Before addressing the verbs “keep” and “bear,” we interpret their object: “Arms.” The 18th-century meaning is no different from the meaning today. The 1773 edition of Samuel Johnson’s dictionary defined “arms” as “[w]eapons of offence, or armour of defence.” 1 Dictionary of the English Language 106 (4th ed.) (reprinted 1978) (hereinafter Johnson). Timothy Cunningham'[****6] s important 1771 legal dictionary defined “arms” as “any thing that a man wears for his defence, or takes into his hands, or useth in wrath to cast at or strike another.” 1 A New and Complete Law Dictionary; see also N. Webster, American Dictionary of the English Language (1828) (reprinted 1989) (hereinafter Webster) (similar).

The term was applied, then as now, to weapons that were not specifically designed for military use and were not employed in a military capacity. For instance, Cunningham’s legal dictionary gave as an example of usage: “Servants and labourers shall use bows and arrows on Sundays, & c. and not bear other arms.” See also, e.g., An Act for the trial of Negroes, 1797 Del. Laws ch. XLIII, § 6, in 1 First Laws of the State of Delaware 102, 104 (J. Cushing ed. 1981 (pt. 1)); see generally State v. Duke, 42 Tex. 455, 458 (1874) (citing decisions of state courts construing “arms”). Although one founding-era thesaurus limited “arms” (as opposed to “weapons”) to “instruments of offence generally made use of in war,” even that source stated that all firearms constituted “arms.” 1 J. Trusler, The Distinction Between Words Esteemed [*582] Synonymous in the English Language 37 (3d ed. 1794) (emphasis added).

Some have made the argument, bordering on the frivolous, that only those arms in existence in the 18th century are protected by the Second Amendment. We do not interpret constitutional rights that way. Just as the First Amendment protects modern forms of communications, e. g., Reno v. American Civil Liberties Union, 521 U. S. 844, 849 (1997), and the Fourth Amendment applies to modern forms of search, e.g., Kyllo v. United States, 533 U. S. 27, 35-36 (2001), the Second Amendment extends, [**2792] prima facie, to all instruments that constitute bearable arms, even those that were not in existence at the time of the founding. [*581-*582]

Although he says “The 18th-century meaning is no different from the meaning today.” at the outset, the sources cited make it clear that it is the character of an item as a means of offense or defense, generally used in war, that makes it fall into the category “arms.” Which extends to bows and arrows as well as 18th century firearms as well as modern firearms.

“Arms” not limited to 18th Century “Arms”

The second point, particularly relevant to hackers, is that arms are not limited to those existing in the 18th century. Scalia specifically calls out both First and Fourth Amendment cases where rights have evolved along with modern technology. The adaptation to modern technology under those amendments is particularly relevant to making a hacker’s argument under the Second Amendment.

Possession/Bearing Arms

The meaning of “keep arms” requires only a paragraph or two:

Thus, the most natural reading of “keep Arms” in the Second Amendment is to “have weapons.” [*582]

Which settles the possession of arms question, but what about the right to carry such arms?

The notion of “bear arms” devolves into a lively contest of snipes between Scalia and Stevens. You can read both the majority opinion and the dissent if you are interested but the crucial text reads:

We think that JUSTICE GINSBURG accurately captured the natural meaning of “bear arms.” Although the phrase implies that the carrying of the weapon is for the purpose of “offensive or defensive action,” it in no way connotes participation in a structured military organization.

From our review of founding-era sources, we conclude that this natural meaning was also the meaning that “bear arms” had in the 18th century. In numerous instances, “bear arms” was unambiguously used to refer to the carrying of weapons outside of an organized militia. [*584]

I mention that point just in case some wag argues that cyber weapons should be limited to your local militia or that you don’t have the right to carry such weapons on your laptop, cellphone, USB drive, etc.

Meaning of the Operative Clause

c. Meaning of the Operative Clause. [4] Putting all of these textual elements together, we find that they guarantee the individual right to possess and carry weapons in case of confrontation. This meaning is strongly confirmed by the historical background of the Second Amendment. [5] We look to this because it has always been widely understood that the Second Amendment, like the First and Fourth Amendments, codified a pre-existing right. The very text of the Second Amendment implicitly recognizes the pre-existence of the right and declares only that it “shall not be infringed.” As we said in United [****11] States v. Cruikshank, 92 U. S. 542, 553 (1876), “[t]his is not a right granted by the Constitution. Neither is it in any manner dependent upon that instrument for its existence. The [**2798] second amendment declares [***658] that it shall not be infringed. . . .”[fn16] [*592]

You can’t get much better than a pre-existing right, at least not with the current Supreme Court composition. Certainly sounds like it would extend to defending your computer systems, which the government seems loath to undertake.

Motivation for the Second Amendment

Skipping over the literalist interpretation of the prefatory clause, Scalia returns to the relationship between the prefatory and operative clauses. The opinion goes on for twenty-one (21) pages at this point but an early paragraph captures the gist of the argument if not all of its details:

The debate with respect to the right to keep and bear arms, as with other guarantees in the Bill of Rights, was not over whether it was desirable (all agreed that it was) but over whether it needed to be codified in the Constitution. During the 1788 ratification debates, the fear that the Federal Government would disarm the people in order to impose rule through a standing army or select militia was pervasive in Anti-federalist rhetoric. See, e. g., Letters from The Federal Farmer III (Oct. 10, 1787), in 2 The Complete Anti-Federalist 234, 242 (H. Storing ed. 1981). John Smilie, for example, worried not only that Congress’s “command of the militia” could be used to create a “select militia,” or to have “no militia at all,” but also, as a separate concern, that “[w]hen a select militia is formed; the people in general may be disarmed.” 2 Documentary History of the Ratification of the Constitution 508-509 (M. Jensen ed. 1976) (hereinafter [*599] Documentary Hist.). Federalists responded that because Congress was given no power to abridge the ancient right of individuals to keep and bear arms, such a force could never oppress the people. See, e.g., A Pennsylvanian III (Feb. 20, 1788), in The Origin of the Second Amendment 275, [****15] 276 (D. Young ed., 2d ed. 2001) (hereinafter Young); White, To the Citizens of Virginia (Feb. 22, 1788), in id., at 280, 281; A Citizen of America (Oct. 10, 1787), in id., at 38, 40; Foreign Spectator, Remarks on the Amendments to the Federal Constitution, Nov. 7, 1788, in id., at 556. It was understood across the political spectrum that the right helped to secure the ideal of a citizen militia, which might be necessary to oppose an oppressive military force if the constitutional order broke down.[*598-*599]

Whether you choose to emphasize the disarming of the people by regulation of cyberweapons or the overreaching of the Federal government, the language here is clearly of interest in arguing for cyberweapons under the Second Amendment. The majority opinion on this point is found at pages [*598-*619].

Limitations on “Arms”

The right to possess arms, including cyberweapons, isn’t a slam dunk. The Federal and State governments can place some regulations on the possession of arms. One example that Scalia discusses is United States v. Miller, 307 U. S. 174, 179 (1939). Reading Miller:

…to say only that the Second Amendment [**2816] does not protect those weapons not typically possessed by law-abiding citizens for lawful purposes, such as short-barreled shotguns. That accords with the historical understanding of the scope of the right, see Part III, infra.[fn25] [*625]

So hackers will lose on blue boxes, if you know the reference, but quite possibly win on software, code, etc. So far as I know, no one has challenged the right of computer users to protect themselves.

Is there a balancing test for cyber weapons?

The balance of the opinion is concerned with the case at hand and sparring with Justice Breyer but it does have this jewel when it is suggested that the Second Amendment should be subject to a balancing test (a likely argument about cyber weapons):

We know of no other enumerated constitutional right whose core protection has been subjected to a freestanding “interest-balancing” approach. The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all. [15] Constitutional rights are enshrined with the scope they were understood to have when the people adopted [*635] them, whether or not future legislatures or (yes) even future judges think that scope too broad. We would not apply an “interest-balancing” approach to the prohibition of a peaceful neo-Nazi march through Skokie. See National Socialist Party of America v. Skokie, 432 U. S. 43 (1977) (per curiam). The First Amendment contains the freedom-of-speech guarantee that the people ratified, which included exceptions for obscenity, libel, and disclosure of state secrets, but not for the expression of extremely unpopular and wrongheaded views. The Second Amendment is no different. Like the First, it is the very product of an interest balancing by the people — which JUSTICE BREYER would now conduct for them anew. And whatever else it leaves to future evaluation, it surely elevates above all other interests the right of law-abiding, responsible citizens to use arms in defense of hearth and home. [*634-*635]

I rather like the lines:

The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all.

Is the right to privacy no right at all because the intelligence community lapdog FISA court decides in secret when our right to privacy is unnecessary?

Open Issues

Forests have been depopulated to produce the paper required for all the commentaries on District of Columbia v. Heller. What I have penned above is a highly selective summary in hopes of creating interest in a Second Amendment argument for the possession and discussion of cyber weapons.

Open issues include:

  • Evolution of the notion of “arms” for the Second Amendment.
  • What does it mean to possess a cyber weapon? Is code required? Binary?
  • Defensive purposes of knowledge or cyber weapons.
  • Analogies to disarming the public.
  • Others?

As I suggested in A Well Regulated Militia, a Second Amendment argument to protect our rights to cyber weapons could prove to be more successful than other efforts to date.

Unless you like being disarmed while government funded hackers invade your privacy of course.

Let me know if you are interested in sponsoring research on Second Amendment protection for cyber weapons.

PS: Just so you know, I took my own advice and joined the NRA earlier this week. Fights like this can only be won with allies, strong allies.

Who’s Pissed Off at the United States?

Wed, 03/25/2015 - 00:14


Topic Maps

Instances of Use of United States Armed Forces Abroad, 1798-2015 by Barbara Salazar Torreon (Congressional Research Service).

From the summary:

This report lists hundreds of instances in which the United States has used its Armed Forces abroad in situations of military conflict or potential conflict or for other than normal peacetime purposes. It was compiled in part from various older lists and is intended primarily to provide a rough survey of past U.S. military ventures abroad, without reference to the magnitude of the given instance noted. The listing often contains references, especially from 1980 forward, to continuing military deployments, especially U.S. military participation in multinational operations associated with NATO or the United Nations. Most of these post-1980 instances are summaries based on presidential reports to Congress related to the War Powers Resolution. A comprehensive commentary regarding any of the instances listed is not undertaken here.

One of the first steps in security analysis is an evaluation of potential attackers. Who has a reason (in their eyes) to go to the time and trouble of attacking you?

Such a list for the United States doesn’t narrow the field by much but it may help avoid overlooking some of the less obvious candidates. To be sure the United States will keep China and North Korea as convenient whipping boys for any domestic cyber misadventures, but that’s just PR. Why would our largest creditor want to screw with our ability to pay them back? All of the antics about China are street theater, far away from where real decisions are made.

What amazes me is that, despite centuries of misbehavior by American administration after American administration, places like Vietnam want to have peaceful relations with us. They aren’t carrying a grudge. Hard to say that for the engineers of one U.S. foreign policy disaster after another.

You could also think of the more recent incidents as the starting point of a list of people to hound from public office and/or public service. Either way, I think you will find it useful.

I first saw this in a tweet by the U.S. Dept. of Fear.

Bulk Collection of Signals Intelligence: Technical Options (2015)

Tue, 03/24/2015 - 23:40


Topic Maps

Bulk Collection of Signals Intelligence: Technical Options (2015)

From the webpage:

The Bulk Collection of Signals Intelligence: Technical Options study is a result of an activity called for in Presidential Policy Directive 28 (PPD-28), issued by President Obama in January 2014, to evaluate U.S. signals intelligence practices. The directive instructed the Office of the Director of National Intelligence (ODNI) to produce a report within one year “assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection.” ODNI asked the National Research Council (NRC) — the operating arm of the National Academy of Sciences and National Academy of Engineering — to conduct a study, which began in June 2014, to assist in preparing a response to the President. Over the ensuing months, a committee of experts appointed by the Research Council produced the report.

Useful background information for engaging on the policy side of collecting signals intelligence. Since I don’t share the starting assumption that bulk collection of signals intelligence is ever justified inside the United States, it is only of passing interest to me. I concede that in some limited cases surveillance can be authorized but only under the Fourth Amendment and then only by a constitutional court and not a FISA star chamber.

I would have posted a copy for your downloading so you could avoid registration but the work carries this restriction:

Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press

Despite doubting I am on any list anywhere, it still isn’t smart to give anyone a free shot.

The main difficulty in challenging such reports is that fictions, invented by the intelligence agencies, are taken as facts. Such as the oft-reported fiction that bulk collection/retention helps when a new figure is identified, by enabling the agencies to review that figure’s past activities. Certainly a theoretical possibility, but how many such cases there have been and what that backtracking produced are unknown. Quite possibly to the intelligence agencies themselves.

If you have identified someone as a current credible threat, perhaps even on their way to commit an illegal act, who is going to worry about their phone conversations several years ago? Of course, that’s where their “logic” for immediate action runs counter to the fact they are simply inventing work for themselves. The more data they collect, the larger their IT budget and the more people needed just in case they ever want to search it. Complete and total farce.

That’s the other reason I oppose bulk signals intelligence collection in the United States: it is an incompetent waste of funds. Funds that could be spent on non-manipulative aid to the people of the Middle East (not their governments), which would greatly reduce the odds of anyone being unhappy enough with the United States to commit a terrorist act on its soil. Despite the fact the United States has committed numerous terrorist attacks on theirs.

Sorting [Visualization]

Tue, 03/24/2015 - 15:23


Topic Maps

Carlo Zapponi created a sorting visualization resource that steps through different sorting algorithms. You can choose from four different initial states, six (6) different sizes (5, 10, 20, 50, 75, 100), and six (6) different colors.

The page defaults to Quick Sort and Heap Sort, but under add algorithms you will find:

I added Wikipedia links for the algorithms. For a larger list see:
Sorting algorithm.
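To get a feel for what “stepping through” a sort means, here is a minimal Python sketch of Quick Sort that records a snapshot of the list after every swap. It is my own illustration, not Zapponi’s code: the function name quick_sort_steps is made up, and I assume the Lomuto partition scheme with the last element as pivot, which may differ from what the visualization uses.

```python
def quick_sort_steps(values):
    """Quicksort a copy of values; return (sorted_list, snapshots).

    snapshots holds the starting state plus the list after each swap,
    which is roughly what a step-by-step visualization would draw.
    """
    items = list(values)
    snapshots = [list(items)]

    def sort(lo, hi):
        if lo >= hi:
            return
        pivot = items[hi]  # assumption: Lomuto partition, last element as pivot
        i = lo
        for j in range(lo, hi):
            if items[j] < pivot:
                if i != j:
                    items[i], items[j] = items[j], items[i]
                    snapshots.append(list(items))
                i += 1
        if i != hi:
            # Move the pivot into its final position.
            items[i], items[hi] = items[hi], items[i]
            snapshots.append(list(items))
        sort(lo, i - 1)
        sort(i + 1, hi)

    sort(0, len(items) - 1)
    return items, snapshots


result, steps = quick_sort_steps([3, 1, 2])
for step in steps:
    print(step)
```

Counting the snapshots for different initial states (already sorted, reversed, random) gives a rough, text-only version of what the visualization shows in color.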

I first saw this in a tweet by Eric Christensen.

Bearing Arms – 2nd Amendment and Hackers – The Constitution

Tue, 03/24/2015 - 01:09


Topic Maps

All discussions of the right to bear arms in the United States start with the Second Amendment. But since words can’t interpret themselves for specific cases, our next stop is the United States Supreme Court.

One popular resource, The Constitution of the United States of America: Analysis and Interpretation (popularly known as the Constitution Annotated), covers the Second Amendment in a scant five (5) pages.

There is a vast sea of literature on the Second Amendment but there is one case that established that the right to bear arms is an individual right and not limited to state militias.

In District of Columbia vs. Heller, 554 U.S. 570 (2008), Justice Scalia writing for the majority found that the right to bear arms was an individual right, for the first time in U.S. history.

The unofficial syllabus notes:

The prefatory clause comports with the Court’s interpretation of the operative clause. The “militia” comprised all males physically capable of acting in concert for the common defense. The Antifederalists feared that the Federal Government would disarm the people in order to disable this citizens’ militia, enabling a politicized standing army or a select militia to rule. The response was to deny Congress power to abridge the ancient right of individuals to keep and bear arms, so that the ideal of a citizens’ militia would be preserved. Pp. 22–28.

Interesting yes? Disarm the people in order to enable “…a politicized standing army (read NSA/CIA/FBI/DHS) or a select militia to rule.”

If citizens are prevented from owning hacking software and information, necessary for their own cybersecurity, have they not been disarmed?

Justice Scalia’s opinion is rich in historical detail and I will be teasing out the threads that seem most relevant to an argument that hacking tools and knowledge should fall under the right to bear arms under the Second Amendment.

In the mean time, some resources that you will find interesting/helpful:

District of Columbia v. Heller in Wikipedia is a quick read and a good way to get introduced to the case and the issues it raises. But only as an introduction; you would not perform surgery based on a newspaper report of a surgery. Yes?

A definite step up in analysis is SCOTUSblog, District of Columbia v. Heller. You will find twenty (20) blog posts on Heller, briefs and documents in the case, plus some twenty (20) briefs supporting the petitioner (District of Columbia) and forty-seven (47) briefs supporting the respondent (Heller). Noting that attorneys could be asked questions about any and all of the theories advanced in the various briefs.

Take this as an illustration of why I don’t visit SCOTUSblog as often as I should. I tend to get lost in the analysis and start chasing threads through the opinions and briefs. One of the many joys is that you rarely find anyone with a hand-waving citation “over there, somewhere” as you do in CS literature. Citations are precise or not at all.

No, I don’t propose to drag you through all of the details even of Scalia’s majority opinion but just enough to frame the questions to be answered in making the claim that cyber weapons are the legitimate heirs of arms for purposes of the Second Amendment and entitled to the same protection as firearms.

Do some background reading today and tomorrow. I am re-reading Scalia’s opinion now and will let it soak in for a day or so before posting an outline of it relevant for our purposes. Look for it late on Wednesday, 25 March 2015.

PS: District of Columbia vs. Heller, 554 U.S. 570 (2008), the full opinion plus dissents. A little over one hundred and fifty (150) pages of very precise writing. Enjoy!

Association Rule Mining – Not Your Typical Data Science Algorithm

Tue, 03/24/2015 - 00:00


Topic Maps

Association Rule Mining – Not Your Typical Data Science Algorithm by Dr. Kirk Borne.

From the post:

Many machine learning algorithms that are used for data mining and data science work with numeric data. And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed). But, association rule mining is perfect for categorical (non-numeric) data and it involves little more than simple counting! That’s the kind of algorithm that MapReduce is really good at, and it can also lead to some really interesting discoveries.

Association rule mining is primarily focused on finding frequent co-occurring associations among a collection of items. It is sometimes referred to as “Market Basket Analysis”, since that was the original application area of association mining. The goal is to find associations of items that occur together more often than you would expect from a random sampling of all possibilities. The classic example of this is the famous Beer and Diapers association that is often mentioned in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time. Let us illustrate this with a simple example. Suppose that a store’s retail transactions database includes the following information:

If you aren’t familiar with association rule mining, I think you will find Dr. Borne’s post an entertaining introduction.
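To show how little more than counting is involved, here is a small Python sketch that computes support, confidence and lift for every pair of items. The baskets are invented by me for illustration (they are not the data from Dr. Borne’s post), the function name pair_rules is hypothetical, and I only look at pairs rather than larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def pair_rules(transactions):
    """Return {(lhs, rhs): (support, confidence, lift)} for all item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]    # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # vs. rhs baseline rate
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


rules = pair_rules(transactions)
support, confidence, lift = rules[("diapers", "beer")]
print(f"diapers -> beer: support={support:.2f}, "
      f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two items show up together more often than you would expect if purchases were independent, which is the “more often than a random sampling” idea in the quote.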

I would not go quite as far as Dr. Borne with “explanations” for the pop-tart purchases before hurricanes. For retail purposes, so long as we spot the pattern, they could be building dikes out of them. The same is the case for other purchases. Take advantage of the patterns and try to avoid second guessing consumers. You can read more about testing patterns in Selling Blue Elephants.