Another word for it


A practical introduction to functional programming

Sun, 01/25/2015 - 22:12


Topic Maps

A practical introduction to functional programming by Mary Rose Cook.

From the post:

Many functional programming articles teach abstract functional techniques. That is, composition, pipelining, higher order functions. This one is different. It shows examples of imperative, unfunctional code that people write every day and translates these examples to a functional style.

The first section of the article takes short, data transforming loops and translates them into functional maps and reduces. The second section takes longer loops, breaks them up into units and makes each unit functional. The third section takes a loop that is a long series of successive data transformations and decomposes it into a functional pipeline.

The examples are in Python, because many people find Python easy to read. A number of the examples eschew pythonicity in order to demonstrate functional techniques common to many languages: map, reduce, pipeline.
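Here is a small example of my own (not one of Mary Rose Cook's) that shows the kind of translation the article performs: an imperative loop, its map/reduce equivalent, and a tiny pipeline helper. The data and function names are illustrative assumptions.

```python
from functools import reduce

# Illustrative input data.
numbers = [3, 1, 4, 1, 5]

# Imperative style: mutate an accumulator inside a loop.
def sum_of_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total

# Functional style: map each value to its square, then reduce with addition.
def sum_of_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)

# A pipeline: pass the data through a sequence of small functions,
# each one taking the previous function's output as its input.
def pipeline(data, *steps):
    return reduce(lambda acc, step: step(acc), steps, data)

def drop_ones(values):
    return list(filter(lambda v: v != 1, values))

def double(values):
    return list(map(lambda v: v * 2, values))

print(sum_of_squares_loop(numbers))          # 52
print(sum_of_squares_functional(numbers))    # 52
print(pipeline(numbers, drop_ones, double))  # [6, 8, 10]
```

Both versions compute the same result. The functional version has no mutable accumulator, which makes it easier to break apart and recombine into pipelines.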

After spending most of the day with poor documentation, I found this sort of post a real delight. It clearly took more effort to write than the material I was reading today, but it saves every reader time rather than costing them time.

Perhaps I should create an icon to mark documentation that will cost you more time than searching a discussion list for the answer.


I first saw this in a tweet by Gianluca Fiore.

Comparative Oriental Manuscript Studies: An Introduction

Sun, 01/25/2015 - 21:52


Topic Maps

Comparative Oriental Manuscript Studies: An Introduction, edited by Alessandro Bausi (General editor), et al.

The “homepage” of this work lets you download the entire volume or individual chapters, depending upon your interests. It provides a lengthy introduction to codicology, palaeography, textual criticism and text editing, and, of special interest to library students, to cataloguing as well as conservation and preservation.

Alessandro Bausi writes in the preface:

Thinking more broadly, our project was also a serious attempt to defend and preserve the COMSt-related fields within the academic world. We know that disciplines and fields are often determined and justified by the mere existence of an easily accessible handbook or, in the better cases, sets of handbooks, textbooks, series and journals. The lack of comprehensive introductory works which are reliable, up-to-date, of broad interest and accessible to a wide audience and might be used in teaching, has a direct impact on the survival of the ‘small subjects’ most of the COMSt-related disciplines pertain to. The decision to make the COMSt handbook freely accessible online and printable on demand in a paper version at an affordable price was strategic in this respect, and not just meant to meet the prescriptions of the European Science Foundation. We deliberately declined to produce an extremely expensive work that might be bought only by a few libraries and research institutions; on the other hand, a plain electronic edition only to be accessed and downloaded as a PDF file was not regarded as a desirable solution either. Dealing with two millennia of manuscripts and codices, we did not want to dismiss the possibility of circulating a real book in our turn.

It remains, hopefully, only to say,

Lector intende: laetaberis

John Svarlien says: A rough translation is: “Reader, pay attention. You will be happy you did.”

We are all people of books. It isn’t possible to separate present-day culture, and what came before it, from books. Even people who shun reading books are shaped by forces that can be traced back to books.

But books did not suddenly appear as mass-printed paperbacks in airport lobbies and grocery store checkout lines. Books have a long history before printing, reaching back to the very formation of the codex.

This work is an introduction to the fascinating world of studying manuscripts and codices prior to the invention of printing. When nearly every copy of a work is different from every other copy, you can imagine the debates over which copy is the “best” copy.

Imagine some versions of “Gone with the Wind” ending with:

  • Frankly, my dear, I don’t give a damn. (traditional)
  • Ashley and I don’t give a damn. (variant)
  • Cheat Ashley out of his business I suppose. (variant)
  • (Lacks a last line due to mss. damage.) (variant)

The “text” of yesteryear lacked the uniform sameness of the printed “text” of today.

When you think about your “favorite” reading of a Bible verse, it is likely a “majority” reading, but hardly the only one.

With the advent of the printing press, texts could be produced uniformly and in mass quantities.

With the advent of electronic texts, whether through editing or digital corruption, we are moving back toward non-uniform texts.

Will we see the birth of digital codicology and its allied fields for digital texts?

PS: Please forward the notice of this book to your local librarian.

I first saw this in a tweet by Kirk Lowery.

Crawling the WWW – A $64 Question

Sat, 01/24/2015 - 20:14


Topic Maps

Have you ever wanted to crawl the WWW? To make a really comprehensive search? Waiting for a private power facility and server farm? You need wait no longer!

In WikiReverse data pipeline details, Ross Fairbanks describes the creation of WikiReverse:

WikiReverse is a reverse web-link graph for Wikipedia articles. It consists of approximately 36 million links to 4 million Wikipedia articles from 900,000 websites.

You can browse the data at WikiReverse or download it from S3 as a torrent.
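The core idea of a reverse web-link graph is easy to sketch. Start from forward links (page → Wikipedia article) and invert them so that each article points back to the websites that link to it. The minimal Python below assumes a hypothetical list of (source URL, target article) pairs. WikiReverse itself was built by a data pipeline over Common Crawl data, not with this code.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical forward links: (page containing the link, Wikipedia article it points to).
forward_links = [
    ("http://example.org/maps", "https://en.wikipedia.org/wiki/Topic_Maps"),
    ("http://example.org/xml", "https://en.wikipedia.org/wiki/XML"),
    ("http://example.net/notes", "https://en.wikipedia.org/wiki/Topic_Maps"),
    ("http://example.net/more", "https://en.wikipedia.org/wiki/Topic_Maps"),
]

def reverse_link_graph(links):
    """Map each target article to the set of websites (hostnames) linking to it."""
    graph = defaultdict(set)
    for source, target in links:
        graph[target].add(urlparse(source).netloc)
    return graph

def rank_articles(graph):
    """Order articles by how many distinct websites link to them."""
    return sorted(graph, key=lambda article: len(graph[article]), reverse=True)

graph = reverse_link_graph(forward_links)
for article in rank_articles(graph):
    print(article, sorted(graph[article]))
```

Counting distinct linking sites rather than raw links keeps one prolific site from dominating. That count is one crude signal of how widely an article is used to refer to a subject.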

The first thought that struck me was that the data set would be useful for deciding which Wikipedia links should be the default subject identifiers for particular subjects.

My second thought was what a wonderful starting place to find links with similar content strings, for the creation of topics with multiple subject identifiers.

My third thought was, $64 to search a CommonCrawl data set!

You can do a lot of searches at $64 apiece before you reach the cost of a server farm, much less a server farm plus a private power facility.

True, it won’t be interactive, but then few searches at the NSA are probably interactive.

The real upside is that you are freed from the tyranny of page rank and of the hidden algorithms by which vendors try to guess what is best for them and, secondarily, what is best for you.

Take the time to work through Ross’ post and develop your skills with the CommonCrawl data.

Tooling Up For JSON

Sat, 01/24/2015 - 19:22


Topic Maps

I needed to explore a large (5.7MB) JSON file and my usual command line tools weren’t a good fit.

Casting about, I discovered Jshon: Twice as fast, 1/6th the memory. From the home page for Jshon:

Jshon parses, reads and creates JSON. It is designed to be as usable as possible from within the shell and replaces fragile adhoc parsers made from grep/sed/awk as well as heavyweight one-line parsers made from perl/python. Requires Jansson

Jshon loads json text from stdin, performs actions, then displays the last action on stdout. Some of the options output json, others output plain text meta information. Because Bash has very poor nested datastructures, Jshon does not try to return a native bash datastructure as a typical library would. Instead, Jshon provides a history stack containing all the manipulations.

The big change in the latest release is switching everything from pass-by-value to pass-by-reference. In a typical use case (processing AUR search results for ‘python’) by-ref is twice as fast and uses one sixth the memory. If you are editing json, by-ref also makes your life a lot easier as modifications do not need to be manually inserted through the entire stack.

Jansson is described as: “…a C library for encoding, decoding and manipulating JSON data.” It builds with the usual ./configure, make, make install. Jshon has no configure or install script, so just run make and copy the resulting binary somewhere in your path.

Under Bugs you will read: “Documentation is brief.”

That’s for sure!

Still, it has enough examples that with some practice you will find this a handy way to explore JSON files.
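Jshon is a shell tool. If you would rather stay in Python, the first steps of exploring an unfamiliar JSON file (what type is this node, how many keys does it have, how long is that array) can be sketched with the standard json module. This is not Jshon's interface, only the same idea, and the sample document is hypothetical.

```python
import json

# JSON type names for the Python values that json.loads / json.load produce.
JSON_TYPES = {str: "string", int: "number", float: "number", bool: "boolean", type(None): "null"}

def describe(value, path="$", max_depth=3, depth=0):
    """Return one line per node: its path plus its JSON type, key count or length."""
    if isinstance(value, dict):
        lines = [f"{path}: object, keys={len(value)}"]
        if depth < max_depth:
            for key, child in value.items():
                lines += describe(child, f"{path}.{key}", max_depth, depth + 1)
        return lines
    if isinstance(value, list):
        lines = [f"{path}: array, length={len(value)}"]
        # Peek at the first element only, assuming the array holds similar items.
        if value and depth < max_depth:
            lines += describe(value[0], f"{path}[0]", max_depth, depth + 1)
        return lines
    return [f"{path}: {JSON_TYPES[type(value)]}"]

# Hypothetical sample; for a real file, pass json.load(f) for an opened file f.
sample = json.loads('{"items": [{"name": "jshon", "size": 12}], "count": 1}')
for line in describe(sample):
    print(line)
```

The `max_depth` cap keeps the outline readable on a multi-megabyte file. Raise it once you know which branch you care about.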


History Depends On Who You Ask, And When

Sat, 01/24/2015 - 16:52


Topic Maps

You have probably seen the following graphic but it bears repeating:

The image is from: Who contributed most to the defeat of Nazi Germany in 1945?

From the post:

A survey conducted in May 1945 on the whole French territory now released (confirming a survey in September 1944 with Parisians) showed that interviewees appear well aware of the power relations and the role of allies in the war, despite the censorship and the difficulty to access reliable information under enemy’s occupation.

A clear majority (57%) believed that the USSR is the nation that has contributed most to the defeat of Germany while the United States and England will gather respectively 20% and 12%.

But what is truly astonishing is that this vision of public opinion was reversed very dramatically with time, as shown by two surveys conducted in 1994 and 2004. In 2004, 58% of the population were convinced that USA played the biggest role in the Second World War and only 20% were aware of the leading role of USSR in defeating the Nazi.

This is a very clear example of how the propaganda adjusted the whole nation’s perception of history, the evaluation of the fundamental contribution to the allied victory in the World War II.

Whether this change in attitude was the result of “propaganda” or some less directed social process I cannot say.

What I do find instructive is that over sixty (60) years, less than one lifetime, public perception of the “truth” can change that much.
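To put a number on “that much,” here is the arithmetic using only the figures quoted above (1945: USSR 57%, United States 20%; 2004: United States 58%, USSR 20%). England’s 2004 figure is not quoted, so it is left out.

```python
# Share (%) of respondents naming each country as the main contributor
# to Germany's defeat, as quoted in the post.
survey = {
    1945: {"USSR": 57, "USA": 20},
    2004: {"USSR": 20, "USA": 58},
}

def shift(country, start=1945, end=2004):
    """Change in percentage points between two survey years."""
    return survey[end][country] - survey[start][country]

for country in ("USSR", "USA"):
    print(country, shift(country))
```

The USSR drops 37 points and the United States gains 38. The two answers nearly trade places within about sixty years.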

How much greater, then, are the odds that the “truth” about events one hundred years ago differs from the one we hold now?

To say nothing of the “truth” about events several thousand years ago, which were reported only a handful of times, in reports that have been edited to suit particular agendas.

Or events for which we have only physical relics at a single location, without any contemporaneous documentation, relics we understand not in their ancient context but in ours.

That should not dissuade us from writing histories, but it should make us cautious about taking action based on historical “truths.”

I most recently saw this in a tweet by Anna Pawlicka.

A first look at Spark

Sat, 01/24/2015 - 15:57


Topic Maps

A first look at Spark by Joseph Rickert.

From the post:

Apache Spark, the open-source, cluster computing framework originally developed in the AMPLab at UC Berkeley and now championed by Databricks is rapidly moving from the bleeding edge of data science to the mainstream. Interest in Spark, demand for training and overall hype is on a trajectory to match the frenzy surrounding Hadoop in recent years. Next month's Strata + Hadoop World conference, for example, will offer three serious Spark training sessions: Apache Spark Advanced Training, SparkCamp and Spark developer certification with additional spark related talks on the schedule. It is only a matter of time before Spark becomes a big deal in the R world as well.

If you don't know much about Spark but want to learn more, a good place to start is the video of Reza Zadeh's keynote talk at the ACM Data Science Camp held last October at eBay in San Jose that has been recently posted.

After reviewing the high points of Reza Zadeh's presentation, Joseph points to more than four hours of additional videos on using Spark and R together.

A nice collection for getting started with Spark and seeing how to use a standard tool (R) with an emerging one (Spark).

I first saw this in a tweet by Christophe Lalanne.

DiRT Digital Research Tools

Fri, 01/23/2015 - 19:21


Topic Maps

DiRT Digital Research Tools

From the post:

The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

Interesting concept, but the annotations are too brief to convey much information. Not to mention that within a category, say Conduct linguistic research or Transcribe handwritten or spoken texts, the entries have no apparent order; at least, they are not arranged alphabetically by name. There may be some other order that is escaping me.

Some entries appear in the wrong categories, such as Xalan, which is listed under Transcribe handwritten or spoken texts:

Xalan is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0.

Not what I think of when I think about transcribing handwritten or spoken texts. You?

I didn’t see a process for submitting corrections/comments on resources. I will check and post on this again. It could be a useful tool.

I first saw this in a tweet by Christophe Lalanne.

Digital Cartography [84]

Thu, 01/22/2015 - 23:35


Topic Maps

Digital Cartography [84] by Visual Loop.

From the post:

Welcome to the year’s first edition of Digital Cartography, our weekly column where we feature the most recent interactive maps that came to our way. And being this the first issue of 2015, of course that it’s fully packed with more than 40 new interactive maps and cartographic-based narratives.

That means that you’ll need quite a bit of time to spend exploring these examples, but if that isn’t enough, there’s always the list with our 100 favorite interactive maps of 2014 (part one and two), guaranteed to keep you occupied for the next day or so.

…[M]ore than 40 new interactive maps and cartographic-based narratives.

How very cool!

With a couple of notable exceptions (see the article), these are mostly geography-based mappings. There’s nothing wrong with geography-based mappings, but it makes me wonder why there isn’t more diversity in mapping.

Just as a preliminary thought, could it be that geography gives us a common starting point for making ourselves understood, rather than our having to undertake a burden of persuasion before we can induce someone to use the map?

From what little I have heard (intentionally) about #Gamergate, I would say a mapping of the people, attitudes, expressions of same and the various forums would vary significantly from person to person. If you did a non-geographic mapping of that event(?) (sorry, I don’t have more precise language to use), what would it look like? What major attitudes, factors, positions would you use to lay out the territory?

Personally I don’t find the lack of a common starting point all that troubling. If a map is extensive enough, it will surely intersect some areas of interest, and a reader can start to work outward from that intersection. They may or may not agree with what they find, but the map would have the advantage of not being snippet-sized texts divorced from any overarching context.

A difficult mapping problem to be sure, one that poses far more difficulties than one that uses physical geography as a starting point. Would even an imperfect map be of use to those trying to sort through issues in such a case?

Streaming Big Data with Spark, Spark Streaming, Kafka, Cassandra and Akka

Thu, 01/22/2015 - 20:47


Topic Maps

Webinar: Streaming Big Data with Spark, Spark Streaming, Kafka, Cassandra and Akka by Helena Edelson.

From the post:

On Tuesday, January 13 I gave a webinar on Apache Spark, Spark Streaming and Cassandra. Over 1700 registrants from around the world signed up. This is a follow-up post to that webinar, answering everyone’s questions. In the talk I introduced Spark, Spark Streaming and Cassandra with Kafka and Akka and discussed why these particular technologies are a great fit for lambda architecture due to some key features and strategies they all have in common, and their elegant integration together. We walked through an introduction to implementing each, then showed how to integrate them into one clean streaming data platform for real-time delivery of meaning at high velocity. All this in a highly distributed, asynchronous, parallel, fault-tolerant system.

Video | Slides | Code | Diagram

About The Presenter: Helena Edelson is a committer on several open source projects including the Spark Cassandra Connector, Akka and previously Spring Integration and Spring AMQP. She is a Senior Software Engineer on the Analytics team at DataStax, a Scala and Big Data conference speaker, and has presented at various Scala, Spark and Machine Learning Meetups.

I have long contended that it is possible to have a webinar that has little if any marketing fluff and maximum technical content. Helena’s presentation is an example of that type of webinar.

Very much worth the time to watch.

BTW, because the presentation was so content-full, questions were answered in this follow-up blog post. Technical webinars just don’t get any better organized than this one.

Perhaps technical webinars should be marked TW and others CW (for c-suite webinars), to prevent disorientation in the first case and disappointment in the second.

XPath/XQuery/FO/XDM 3.1 Definitions – Deduped/Sorted/Some Comments! Version 0.1

Mon, 01/19/2015 - 15:11


Topic Maps

My first set of the XPath/XQuery/FO/XDM 3.1 Definitions, deduped, sorted, along with some comments is now online!

XPath, XQuery, XQuery and XPath Functions and Operators, XDM – 3.1 – Sorted Definitions Draft

Let me emphasize this draft is incomplete and more comments are needed on the varying definitions.

I have included all definitions, including those that are unique or uniform. This should help with your review of those definitions as well.

I am continuing to work on this and other work products to assist in your review of these drafts.
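Deduping and sorting definitions can be sketched as grouping them by term and flagging terms whose wording differs across specifications. This is a hypothetical illustration, not the process used to build the draft. The (spec, term, definition) records are invented placeholders, not text from the 3.1 drafts.

```python
from collections import defaultdict

# Hypothetical records extracted from several specifications: (spec, term, definition).
records = [
    ("XPath", "sequence", "definition text A"),
    ("XQuery", "sequence", "definition text A"),
    ("XDM", "sequence", "definition  text A"),   # extra space: same definition
    ("XPath", "atomization", "definition text B"),
    ("FO", "atomization", "definition text C"),
    ("XQuery", "module", "definition text D"),
]

def normalize(text):
    """Collapse runs of whitespace so layout differences don't count as variants."""
    return " ".join(text.split())

def collate(records):
    """Group definitions by term: {term: {definition: [specs using it]}}."""
    terms = defaultdict(lambda: defaultdict(list))
    for spec, term, definition in records:
        terms[term][normalize(definition)].append(spec)
    return {term: dict(variants) for term, variants in terms.items()}

def report(terms):
    """Print terms alphabetically, flagging those defined more than one way."""
    for term in sorted(terms, key=str.lower):
        variants = terms[term]
        status = "uniform" if len(variants) == 1 else f"{len(variants)} differing definitions"
        print(f"{term}: {status}")
        for definition, specs in variants.items():
            print(f"    {', '.join(specs)}: {definition}")

report(collate(records))
```

Listing uniform and unique terms alongside the variants means every definition can be reviewed in one sorted pass.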

Reminder: Tentative deadline for comments at the W3C is 13 February 2015.

TinkerPop is moving to Apache (Incubator)

Mon, 01/19/2015 - 02:10


Topic Maps

TinkerPop is moving to Apache (Incubator) by Marko A. Rodriguez.

From the post:

Over the last (almost) year, we have been working to get TinkerPop into a recognized software foundation — with our eyes primarily on The Apache Software Foundation. This morning, the voting was complete and TinkerPop will become an Apache Incubator project on Tuesday January 16th.

The primary intention of this move to Apache was to:

  1. Further guarantee vendor neutrality and vendor uptake.
  2. Better secure our developers and users legally.
  3. Grow our developer and user base.

I hope people see this as a positive and will bear with us as we go through the process of migrating our infrastructure over the month of February. Note that we will be doing our 3.0.0.M7 release on Monday (Jan 15th) with it being the last TinkerPop release. The next one (M8 or GA) will be an Apache release. Finally, note that we will be keeping this mailing list with a mirror being on Apache’s servers (that was a hard won battle :).

Take care and thank you for using of our software, The TinkerPop.

So long as Marko keeps doing cool graphics, it’s fine by me.

More seriously, the increased visibility can’t help but drive TinkerPop to new heights. Or, for graph software, would that be to new connections?

Learn Statistics and R online from Harvard

Mon, 01/19/2015 - 01:59


Topic Maps

Learn Statistics and R online from Harvard by David Smith.

Starts January 19 (tomorrow)

From the post:

Harvard University is offering a free 5-week on-line course on Statistics and R for the Life Sciences on the edX platform. The course promises you will learn the basics of statistical inference and the basics of using R scripts to conduct reproducible research. You’ll just need a background in basic math and programming to follow along and complete homework in the R language.

As a new course, I haven’t seen any of the content, but the presenters Rafael Irizarry and Michael Love are active contributors to the Bioconductor project, so it should be good. The course begins January 19 and registration is open through 27 April at the link below.

edX: Statistics and R for the Life Sciences

Apologies for the late notice!

Have you given any thought to an R for Voters course? Statistics using R on public data focused on current political issues? Something to think about. The talking heads on TV are already vetting possible candidates for 2016.

Obama backs call for tech backdoors [Government Frontdoors?]

Sun, 01/18/2015 - 01:37


Topic Maps

Obama backs call for tech backdoors

From the post:

President Obama wants a backdoor to track people’s social media messages.

The president on Friday came to the defense of British Prime Minister David Cameron’s call for tech companies to create holes in their technology to allow the government to track suspected terrorists or criminals.

“Social media and the Internet is the primary way in which these terrorist organizations are communicating,” Obama said during a press conference with Cameron on Friday.

“That’s not different from anybody else, but they’re good at it and when we have the ability to track that in a way that is legal, conforms with due process, rule of law and presents oversight, then that’s a capability that we have to preserve,” he said.

While Obama measured his comments, he voiced support for the views expressed by Cameron and FBI Director James Comey, who have worried about tech companies’ increasing trends towards building digital walls around users’ data that no one but them can access.

Rather than argue about tech backdoors someday, why not have government frontdoors?

ISPs can copy and direct all email traffic to and from .gov addresses to a big inbox on one of the cloud providers, so that the public can keep a closer eye on the activities of “our” government. Think of it as citizen oversight.

Surely no sensitive information about citizens finds its way into government email so we won’t need any filtering.

Petition your elected representatives for a government frontdoor, for federal, state and local governments. As taxpayers, we own the accounts, just like a private employer. The owners of those accounts want access to them.

Now that would be open data that could make a real difference!

I first saw this in a tweet by Violet Blue.

PS: We also need phone records for office and cell phones of all government employees. Signals data I think they call it.

Bulk Collection of Signals Intelligence: Technical Options (2015)

Sun, 01/18/2015 - 01:07


Topic Maps

Bulk Collection of Signals Intelligence: Technical Options (2015)


The Bulk Collection of Signals Intelligence: Technical Options study is a result of an activity called for in Presidential Policy Directive 28, issued by President Obama in January 2014, to evaluate U.S. signals intelligence practices. The directive instructed the Office of the Director of National Intelligence (ODNI) to produce a report within one year “assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection.” ODNI asked the National Research Council (NRC) — the operating arm of the National Academy of Sciences and National Academy of Engineering — to conduct a study, which began in June 2014, to assist in preparing a response to the President. Over the ensuing months, a committee of experts appointed by the Research Council produced the report.

Believe it or not, you can’t copy-n-paste from the pre-publication PDF file. Truly irritating.

From the report:

Conclusion 1. There is no software technique that will fully substitute for bulk collection where it is relied on to answer queries about the past after new targets become known.

A key value of bulk collection is its record of past signals intelligence that may be relevant to subsequent investigations. If past events become interesting in the present, because intelligence-gathering priorities change to include detection of new kinds of threats or because of new events such as the discovery that an individual is a terrorist, historical events and the context they provide will be available for analysis only if they were previously collected. (Emphasis in the original)

The report dodges any questions about effectiveness or appropriateness of bulk collection of signals data. However, its number one conclusion provides all the ammunition one needs to establish that bulk signals intelligence gathering is a clear and present danger to the American people and any semblance of a democratic government.

Would deciding that all Muslims from the Middle East represented potential terrorist threats to the United States qualify as a change in intelligence-gathering priorities? So all the bulk signals data from Muslims and their contacts in the United States suddenly becomes fair game for the NSA to investigate?

I don’t think any practicing Muslim is a threat to any government, but you saw how quickly the French backslid into bigotry after Charlie Hebdo. Maybe they didn’t have that far to go, no further than large segments of the U.S. population.

Our National Research Council is too timid to voice an opinion beyond saying that if you don’t preserve signals records, you can’t consult them in the future. Whether there is any danger, or whether this is a good policy choice, are questions they aren’t up for.

The focus on signals intelligence makes you wonder how local and state police have operated all these years without bulk signals intelligence. How have they survived without it? Well, for one thing they are out in the communities they serve, not cooped up in cube farms with other people who have no experience with the communities in question. Simply being members of the community makes them aware of newcomers, changes in local activity, etc.

Traditional law enforcement doesn’t stop crime as a general rule because that would require too much surveillance and resources to be feasible. When a crime has been committed, law enforcement gathers evidence and in a very large (90%+) number of cases, captures the people responsible.

Which is an interesting parallel to the NSA, which has also not stopped any terrorist plots as far as anyone knows. Well, there was that case in the State of Georgia where two aging alcoholics were boasting about producing ricin and driving down I-285 throwing it out the window. The government got a convicted child molester to work as an informant to put those two very dangerous terrorists in jail. And I don’t think the NSA was in on that one anyway.

If the NSA has stopped a major terrorist plot, something that actually was going to be another 9/11, you know it would have been leaked long before now. The absence of such leaks is the best evidence for the lack of any viable terrorist threats in the United States that I can think of.

And what if we stop bulk signals data collection and there is another terrorist attack? So, what is your question? Bulk signals collection hasn’t stopped one so far so if we stop bulk signals collection and there is another terrorist attack, look at all the money we will have saved for the same result. Just as a policy matter, we shouldn’t spend money for no measurable result.

If you really think terrorism is a threat, take the money from bulk signal data collection and fund state and local police hiring, training and paying (long term, not just a grant) more local police officers out in their communities. That will do more to reduce the potential for all types of crimes, including those labeled as terrorism.

To put it another way, bulk signal data collection is a form of wealth sharing, from the public treasury to contractors. Wealth sharing that has been shown to be ineffectual against terrorism. Why continue it?

Facebook open sources tools for bigger, faster deep learning models

Sat, 01/17/2015 - 23:55


Topic Maps

Facebook open sources tools for bigger, faster deep learning models by Derrick Harris.

From the post:

Facebook on Friday open sourced a handful of software libraries that it claims will help users build bigger, faster deep learning models than existing tools allow.

The libraries, which Facebook is calling modules, are alternatives for the default ones in a popular machine learning development environment called Torch, and are optimized to run on Nvidia graphics processing units. Among the modules are those designed to rapidly speed up training for large computer vision systems (nearly 24 times, in some cases), to train systems on potentially millions of different classes (e.g., predicting whether a word will appear across a large number of documents, or whether a picture was taken in any city anywhere), and an optimized method for building language models and word embeddings (e.g., knowing how different words are related to each other).

“[T]here is no way you can use anything existing” to achieve some of these results, said Soumith Chintala, an engineer with Facebook Artificial Intelligence Research.

How very awesome! Keeping abreast of the latest releases and papers on deep learning is turning out to be a real chore. Enjoyable, but a time sink nonetheless.

Derrick’s post and the release from Facebook have more details.

Apologies for the “lite” posting today, but I have been proofing related specifications where one defines a term and the other uses it without citing the first specification’s definition or giving its own. Do the two uses mean the same thing? Probably, but users outside the drafting process may not realize that, particularly in translation.

I first saw this in a tweet by Kirk Borne.

Humanities Open Book: Unlocking Great Books

Sat, 01/17/2015 - 01:38


Topic Maps

Humanities Open Book: Unlocking Great Books

Deadline: June 10, 2015

A new joint grant program by the National Endowment for the Humanities (NEH) and the Andrew W. Mellon Foundation seeks to give a second life to outstanding out-of-print books in the humanities by turning them into freely accessible e-books.

Over the past 100 years, tens of thousands of academic books have been published in the humanities, including many remarkable works on history, literature, philosophy, art, music, law, and the history and philosophy of science. But the majority of these books are currently out of print and largely out of reach for teachers, students, and the public. The Humanities Open Book pilot grant program aims to “unlock” these books by republishing them as high-quality electronic books that anyone in the world can download and read on computers, tablets, or mobile phones at no charge.

The National Endowment for the Humanities (NEH) and the Andrew W. Mellon Foundation are the two largest funders of humanities research in the United States. Working together, NEH and Mellon will give grants to publishers to identify great humanities books, secure all appropriate rights, and make them available for free, forever, under a Creative Commons license.

The new Humanities Open Book grant program is part of the National Endowment for the Humanities’ agency-wide initiative The Common Good: The Humanities in the Public Square, which seeks to demonstrate and enhance the role and significance of the humanities and humanities scholarship in public life.

“The large number of valuable scholarly books in the humanities that have fallen out of print in recent decades represents a huge untapped resource,” said NEH Chairman William Adams. “By placing these works into the hands of the public we hope that the Humanities Open Book program will widen access to the important ideas and information they contain and inspire readers, teachers and students to use these books in exciting new ways.”

“Scholars in the humanities are making increasing use of digital media to access evidence, produce new scholarship, and reach audiences that increasingly rely on such media for information to understand and interpret the world in which they live,” said Earl Lewis, President of the Andrew W. Mellon Foundation. “The Andrew W. Mellon Foundation is delighted to join NEH in helping university presses give new digital life to enduring works of scholarship that are presently unavailable to new generations of students, scholars, and general readers.”

The National Endowment for the Humanities and the Andrew W. Mellon Foundation will jointly provide $1 million to convert out-of-print books into EPUB e-books with a Creative Commons (CC) license, ensuring that the books are freely downloadable with searchable texts and in formats that are compatible with any e-reading device. Books proposed under the Humanities Open Book program must be of demonstrable intellectual significance and broad interest to current readers.

Application guidelines and a list of FAQs for the Humanities Open Book program are available online. The application deadline for the first cycle of Humanities Open Book grants is June 10, 2015.

What great news to start a weekend!

If you decide to apply, remember that topic maps can support indexes within a single book, across books, or across books and other material. You could make a classic work in the humanities into a portal that opens onto work prior to its publication, at the time of its publication, or since. That would set you apart from simply making the text available.

Key Court Victory Closer for IRS Open-Records Activist

Sat, 01/17/2015 - 01:12


Topic Maps

Key Court Victory Closer for IRS Open-Records Activist by Suzanne Perry.

From the post:

The open-records activist Carl Malamud has moved a step closer to winning his legal battle to give the public greater access to the wealth of information on Form 990 tax returns that nonprofits file.

During a hearing in San Francisco on Wednesday, U.S. District Judge William Orrick said he tentatively planned to rule in favor of Mr. Malamud’s group, Public.Resource.Org, which filed a lawsuit to force the Internal Revenue Service to release nonprofit tax forms in a format that computers can read. That would make it easier to conduct online searches for data about organizations’ finances, governance, and programs.

“It looks like a win for Public.Resource and for the people who care about electronic access to public documents,” said Thomas Burke, the group’s lawyer.

The suit asks the IRS to release Forms 990 in machine-readable format for nine nonprofits that had submitted their forms electronically. Under current practice, the IRS converts all Forms 990 to unsearchable image files, even those that have been filed electronically.

That’s a step in the right direction but not all that will be required.

Suzanne goes on to note that the IRS removes donor lists from the 990 forms.

Any number of organizations will object but I think the donor lists should be public information as well.

Making all donors public may discourage some people from donating to unpopular causes, but that’s a hit I would be willing to take to know who owns the political non-profits, and/or who funds the NRA, for example.

Data that isn’t open enough to show who is calling the shots at organizations isn’t open data; it’s an open data tease.

What Counts: Harnessing Data for America’s Communities

Fri, 01/16/2015 - 22:44


Topic Maps

What Counts: Harnessing Data for America’s Communities Senior Editors: Naomi Cytron, Kathryn L.S. Pettit, & G. Thomas Kingsley. (new book, free pdf)

From: A Roadmap: How To Use This Book

This book is a response to the explosive interest in and availability of data, especially for improving America’s communities. It is designed to be useful to practitioners, policymakers, funders, and the data intermediaries and other technical experts who help transform all types of data into useful information. Some of the essays—which draw on experts from community development, population health, education, finance, law, and information systems—address high-level systems-change work. Others are immensely practical, and come close to explaining “how to.” All discuss the incredibly exciting opportunities and challenges that our ever-increasing ability to access and analyze data provide.

As the book’s editors, we of course believe everyone interested in improving outcomes for low-income communities would benefit from reading every essay. But we’re also realists, and know the demands of the day-to-day work of advancing opportunity and promoting well-being for disadvantaged populations. With that in mind, we are providing this roadmap to enable readers with different needs to start with the essays most likely to be of interest to them.

For everyone, but especially those who are relatively new to understanding the promise of today’s data for communities, the opening essay is a useful summary and primer. Similarly, the final essay provides both a synthesis of the book’s primary themes and a focus on the systems challenges ahead.

Section 2, Transforming Data into Policy-Relevant Information (Data for Policy), offers a glimpse into the array of data tools and approaches that advocates, planners, investors, developers and others are currently using to inform and shape local and regional processes.

Section 3, Enhancing Data Access and Transparency (Access and Transparency), should catch the eye of those whose interests are in expanding the range of data that is commonly within reach and finding ways to link data across multiple policy and program domains, all while ensuring that privacy and security are respected.

Section 4, Strengthening the Validity and Use of Data (Strengthening Validity), will be particularly provocative for those concerned about building the capacity of practitioners and policymakers to employ appropriate data for understanding and shaping community change.

The essays in section 5, Adopting More Strategic Practices (Strategic Practices), examine the roles that practitioners, funders, and policymakers all have in improving the ways we capture the multi-faceted nature of community change, communicate about the outcomes and value of our work, and influence policy at the national level.

There are of course interconnections among the essays in each section. We hope that wherever you start reading, you’ll be inspired to dig deeper into the book’s enormous richness, and will join us in an ongoing conversation about how to employ the ideas in this volume to advance policy and practice.

Thirty-one (31) essays by dozens of authors on data and its role in public policy making.

From the acknowledgements:

This book is a joint project of the Federal Reserve Bank of San Francisco and the Urban Institute. The Robert Wood Johnson Foundation provided the Urban Institute with a grant to cover the costs of staff and research that were essential to this project. We also benefited from the field-building work on data from Robert Wood Johnson grantees, many of whom are authors in this volume.

If you are pitching data and/or data projects in venues where the Federal Reserve Bank of San Francisco and the Urban Institute set the tone of policy-making conversations, this is a must read. It is likely to influence other policy discussions as well, adjusted for local concerns and conventions. You could also use it to shape your own local policy discussions.

I first saw this in There is no seamless link between data and transparency by Jennifer Tankard.