Another word for it

How to find bugs in MySQL

Sun, 04/20/2014 - 21:41

Categories:

Topic Maps

How to find bugs in MySQL by Roel Van de Paar.

From the post:

Finding bugs in MySQL is not only fun, it’s also something I have been doing the last four years of my life.

Whether you want to become the next Shane Bester (who is generally considered the most skilled MySQL bug hunter worldwide), or just want to prove you can outsmart some of the world’s best programmers, finding bugs in MySQL is a skill no longer reserved for top QA engineers armed with loads of scripts, expensive flash storage and top-range server hardware. Of course, for professionals that’s still the way to go, but now anyone with an average laptop and a standard HDD can have a lot of fun trying to find that elusive crash…

If you follow this post carefully, you may well be able to find a nice crashing bug (or two) running RQG (an excellent database QA tool). Linux would be the preferred testing OS, but if you are using Windows as your main OS, I would recommend getting VirtualBox and running a Linux guest in a suitably sized (i.e. large) VM. The acronym “RQG” stands for “Random Query Generator,” also named “randgen.”

If you’re not just after finding any bug out there (“bug hunting”), you can tune the RQG grammars (files that define what sort of SQL RQG executes) to more or less match your “issue area.” For example, if you are always running into a situation where the server crashes on a DELETE query (as seen at the end of the mysqld error log for example), you would want an SQL grammar that definitely has a variety of DELETE queries in it. These queries should be closely matched with the actual crashing query – crashes usually happen due to exactly the same, or similar statements with the same clauses, conditions etc.
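To make the idea of a grammar weighted toward an “issue area” concrete, here is a minimal sketch of a toy random query generator. It is not RQG and does not use RQG's grammar syntax; the grammar layout, the rule names, and the table and column names (`t1`, `col_int`, and so on) are all hypothetical, chosen only to show how listing DELETE alternatives more often makes DELETE statements dominate the output.

```python
import random

# Hypothetical toy grammar (NOT RQG's grammar format): each rule name maps to
# a list of alternatives; each alternative is a sequence of rule names or
# literal tokens. Anything that is not a rule name is emitted as-is.
GRAMMAR = {
    "query": [["delete"], ["delete"], ["select"]],  # DELETE listed twice: 2:1 weight
    "delete": [
        ["DELETE FROM", "table", "where"],
        ["DELETE FROM", "table", "where", "ORDER BY", "column", "LIMIT", "number"],
    ],
    "select": [["SELECT", "column", "FROM", "table", "where"]],
    "where": [[], ["WHERE", "column", "op", "number"]],  # empty alternative: no WHERE
    "table": [["t1"], ["t2"]],
    "column": [["id"], ["col_int"]],
    "op": [["="], ["<"], [">"]],
    "number": [["1"], ["10"], ["100"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of literal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]
    tokens = []
    for part in rng.choice(GRAMMAR[symbol]):
        tokens.extend(expand(part, rng))
    return tokens


def generate(n, seed=0):
    """Return n random statements; the same seed reproduces the same run."""
    rng = random.Random(seed)
    return [" ".join(expand("query", rng)) for _ in range(n)]


for statement in generate(5):
    print(statement)
```

Seeding the generator matters for bug hunting: a crash is only useful if the statement sequence that triggered it can be replayed.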

Just in case you feel a bit old for an Easter egg hunt today, consider going on a MySQL bug hunt.

Curious, do you know of RQG-like suites for NoSQL databases?

PS: RQG Documentation (github)

Annotating, Extracting, and Linking Legal Information

Sun, 04/20/2014 - 20:59

Categories:

Topic Maps

Annotating, Extracting, and Linking Legal Information by Adam Wyner. (slides)

Great slides, provided you have enough background in the area to fill in the gaps.

I first saw this at: Wyner: Annotating, Extracting, and Linking Legal Information, which has collected up the links/resources mentioned in the slides.

Despite decades of electronic efforts and several centuries of manual effort before that, legal information retrieval remains an open challenge.

Google Genomics Preview

Sun, 04/20/2014 - 20:40

Categories:

Topic Maps

Google Genomics Preview by Kevin.

From the post:

Welcome to the Google Genomics Preview! You’ve been approved for early access to the API.

The goal of the Genomics API is to encourage interoperability and build a foundation to store, process, search, analyze and share tens of petabytes of genomic data.

We’ve loaded sample data from public BAM files:

  • The complete 1000 Genomes Project
  • Selections from the Personal Genome Project

How to get started:

You will need to obtain an invitation to begin playing.

Don’t be disappointed that Google is moving into genomics.

After all, gathering data and supplying a processing back-end for it is a critical task but not a terribly imaginative one.

The analysis you perform and the uses you enable, that’s the part that takes imagination.

Data Integration: A Proven Need of Big Data

Sun, 04/20/2014 - 20:21

Categories:

Topic Maps

When It Comes to Data Integration Skills, Big Data and Cloud Projects Need the Most Expertise by David Linthicum.

From the post:

Looking for a data integration expert? Join the club. As cloud computing and big data become more desirable within the Global 2000, an abundance of data integration talent is required to make both cloud and big data work properly.

The fact of the matter is that you can’t deploy a cloud-based system without some sort of data integration as part of the solution. Either from on-premise to cloud, cloud-to-cloud, or even intra-company use of private clouds, these projects need someone who knows what they are doing when it comes to data integration.

While many cloud projects were launched without a clear understanding of the role of data integration, most people understand it now. As companies become more familiar with the cloud, they learn that data integration is key to the solution. For this reason, it’s important for teams to have at least some data integration talent.

The same goes for big data projects. Massive amounts of data need to be loaded into massive databases. You can’t do these projects using ad-hoc technologies anymore. The team needs someone with integration knowledge, including what technologies to bring to the project.

Generally speaking, big data systems are built around data integration solutions. Similar to cloud, the use of data integration architectural expertise should be a core part of the project. I see big data projects succeed and fail, and the biggest cause of failure is the lack of data integration expertise.

Even if not exposed to the client, a topic map based integration analysis of internal and external data records should give you a competitive advantage in future bids. After all, you won’t have to re-interpret the data and all its fields, just the new ones or the ones that have changed.

Group Explorer 2.2

Sun, 04/20/2014 - 16:01

Categories:

Topic Maps

From the webpage:

Primary features listed here, or read the version 2.2 release notes.

  • Displays Cayley diagrams, multiplication tables, cycle graphs, and objects with symmetry
  • Many common group-theoretic computations can be done visually
  • Compare groups and subgroups via morphisms (see illustration below)
  • Browsable, searchable group library
  • Integrated help system (which you can preview on the web)
  • Save and print images at any scale and quality
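For readers who want to see what sits behind a multiplication table, the following is a minimal sketch that builds one for the symmetric group S3 and for a cyclic subgroup of it. It has nothing to do with Group Explorer's own code; the function names and the choice to represent permutations as tuples of images are assumptions made for illustration.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p' (tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))


def multiplication_table(elements):
    """Map each ordered pair (a, b) to the product a*b."""
    return {(a, b): compose(a, b) for a in elements for b in elements}


def is_abelian(elements):
    """A group is abelian when a*b == b*a for every pair of elements."""
    table = multiplication_table(elements)
    return all(table[(a, b)] == table[(b, a)] for a in elements for b in elements)


# S3: all six permutations of three points.
S3 = list(permutations(range(3)))

# Z3: the cyclic subgroup generated by the rotation 0 -> 1 -> 2 -> 0.
rotation = (1, 2, 0)
Z3 = [(0, 1, 2), rotation, compose(rotation, rotation)]

print(is_abelian(S3), is_abelian(Z3))  # prints: False True
```

The asymmetry of the S3 table across its diagonal is exactly the kind of structure a visual tool makes obvious at a glance.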

Are there symmetries in your data?

I first saw this in a tweet by Steven Strogatz.

BTW, Steven also points to this example of using Group Explorer: Cayley diagrams of the first five symmetric groups.

The Next Giant List of Digitised Manuscript Hyperlinks

Sun, 04/20/2014 - 15:50

Categories:

Topic Maps

The Next Giant List of Digitised Manuscript Hyperlinks by Sarah J. Biggs.

From the post:

It’s that time of year again, friends – when we inflict our quarterly massive list of manuscript hyperlinks upon an unsuspecting public. As always, this list contains everything that has been digitised up to this point by the Medieval and Earlier Manuscripts department, complete with hyperlinks to each record on our Digitised Manuscripts site. There will be another updated list here on the blog in three months; you can download the current version here: Download BL Medieval and Earlier Digitised Manuscripts Master List 10.04.13. Have fun!

The listing has reached one of my favorites: Yates Thompson MS 36, also known as: Dante Alighieri, Divina commedia. Publication date proposed to be after 1444. (Warning: Do not view with Chrome. Warns of a “redirect loop.” Displays fine with Firefox.)

Great description of the manuscript plus three hundred and ninety-nine (399) images.

But it does seem to just lie there, doesn’t it?

Suggestions?

12 Things TEDx Speakers do that Preachers Don’t.

Sat, 04/19/2014 - 23:55

Categories:

Topic Maps

From the post:

Ever seen a TEDx talk? They’re pretty great. Here’s one I happen to enjoy, and have used in a couple of sermons. I’ve wondered for a long time, “How in the world do each of these talks end up consistently blowing me away?” So I did some research, and found the TEDx talk guidelines for speakers. Some of the advice was basic – but some of it was unexpected. Much of it, I think, is a welcome wake up call to preachers who are communicating in a 21st century postmodern, post-Christian context. Obviously, some of this doesn’t fit with a preacher’s ethos: but much of it does.

That said, here are 12 things TEDx speakers do that preachers usually don’t:

A great retelling of the guidelines for TEDx speakers!

With the conference season (summer) rapidly approaching, now is the time to take this advice to heart!

Imagine a conference presentation without the filler that everyone in the room already knows (or should know, to be attending the conference). I keep longing for papers that don’t repeat largely the same introduction as every other paper in the area.

Yes, graphs have nodes/vertices, edges/arcs and you are g-o-i-n-g t-o l-a-b-e-l t-h-e-m.

The advice for TEDx speakers is equally applicable to webcasts and podcasts.

New trends in sharing data science work

Sat, 04/19/2014 - 23:42

Categories:

Topic Maps

Danny Bickson writes:

I got the following venturebeat article from my colleague Carlos Guestrin.

It seems there is an interesting trend of allowing data scientists to share their work: Imagine if a company’s three highly valued data scientists can happily work together without duplicating each other’s efforts and can easily call up the ingredients and results of each other’s previous work.

That day has come. As the data scientist arms race continues, data scientists might want to join forces. Crazy idea, right? Two San Francisco startups — Domino Data Lab and Sense — have emerged recently with software to let data scientists collaborate on multiple projects. In a way, it’s like code storehouse GitHub for the data science world. A Montreal startup named Plot.ly has been talking about the same themes, but it brings a more social twist. Another startup, Mode Analytics, is building software for data analysts to ask questions of data without duplicating previous efforts. And at least one more mature software vendor, Alpine Data Labs, has been adding features to help many colleagues in a company apply algorithms to code on one central hub.

If you aren’t already registered for GraphLab Conference 2014, notice that Alpine Data Labs, Domino Data Lab, Mode Analytics, Plot.ly, and Sense will all be at the GraphLab Conference.

Go ahead, register for the GraphLab conference. At the very worst you will learn something. If you socialize a little bit, you will meet some of the brightest graph people on the planet.

Plus, when the history of “sharing” in data science is written, you will have attended one of the early conferences on sharing code for data science. After years of hoarding data (where you now see open data), and with code sharing now beginning to appear, data science is developing a different model.

And you were there to cheer them on!

GraphChi Users Survey

Sat, 04/19/2014 - 21:02

Categories:

Topic Maps

From the form:

This survey is used to find out about experiences of users of GraphChi. These results will be used in Aapo Kyrola’s Ph.D. thesis.

If you are using GraphChi, your experiences can help with Aapo Kyrola’s Ph.D. thesis.

Pass this along to anyone you know using GraphChi (and try GraphChi yourself).

Enter, Update, Exit… [D3.js]

Wed, 04/16/2014 - 00:49

Categories:

Topic Maps

Enter, Update, Exit – An Introduction to D3.js, The Web’s Most Popular Visualization Toolkit by Christian Behrens.

From the webpage:

Over the past couple of years, D3, the groundbreaking JavaScript library for data-driven document manipulation developed by Mike Bostock, has become the Swiss Army knife of web-based data visualization. However, talking to other designers or developers who use D3 in their projects, I noticed that one of the core concepts of it remains somewhat obscure and is often referred to as »D3’s magic«: Data joins and selections.

Given a solid command of basic JavaScript, this article should help you to wrap your head around these two fundamental concepts and get you started using D3 for your dataviz projects.
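The data join behind “enter, update, exit” can be sketched without any D3 at all. The following is a conceptual illustration only, not D3's API: `data_join` is a hypothetical helper, the dictionary of strings stands in for existing DOM elements, and the `id` field used as the join key is an assumption.

```python
def data_join(elements, data, key):
    """Split a keyed join into (enter, update, exit).

    elements: dict mapping key -> existing element (a stand-in for DOM nodes)
    data:     list of new data items
    key:      function extracting the join key from a data item
    """
    incoming = {key(d): d for d in data}
    # enter: data items with no matching element yet (elements to create)
    enter = [d for k, d in incoming.items() if k not in elements]
    # update: existing elements paired with their new data
    update = [(elements[k], d) for k, d in incoming.items() if k in elements]
    # exit: existing elements whose data is gone (elements to remove)
    exit_ = [el for k, el in elements.items() if k not in incoming]
    return enter, update, exit_


elements = {"a": "<circle a>", "b": "<circle b>"}
data = [{"id": "b", "r": 5}, {"id": "c", "r": 9}]

enter, update, exit_ = data_join(elements, data, key=lambda d: d["id"])
print(enter)   # [{'id': 'c', 'r': 9}]
print(update)  # [('<circle b>', {'id': 'b', 'r': 5})]
print(exit_)   # ['<circle a>']
```

Seen this way, the “magic” is a three-way set comparison between what is on the page and what is in the data.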

If you encounter anyone not already using D3.js, pass this page along to them.

I first saw this in a tweet by Halftone.

GraphChi-DB [src released]

Wed, 04/16/2014 - 00:33

Categories:

Topic Maps

GraphChi-DB

From the webpage:

GraphChi-DB is a scalable, embedded, single-computer online graph database that can also execute similar large-scale graph computation as GraphChi. It has been developed by Aapo Kyrola as part of his Ph.D. thesis.

GraphChi-DB is written in Scala, with some Java code. Generally, you need to know Scala quite well to be able to use it.

IMPORTANT: GraphChi-DB is early release, research code. It is buggy, it has awful API, and it is provided with no guarantees. DO NOT USE IT FOR ANYTHING IMPORTANT.

GraphChi-DB source code arrives!

Enjoy!

Wandora – New Version [TMQL]

Wed, 04/16/2014 - 00:23

Categories:

Topic Maps

Wandora – New Version

From the webpage:

It is over six months since the last Wandora release. Now we are finally ready to publish a new version with some very interesting new features. Release 2014-04-15 features TMQL support and an embedded HTML browser, for example. TMQL is the topic map query language and Wandora allows the user to search, query and modify topics and associations with TMQL scripts. The embedded HTML browser expands Wandora’s internal visualization repertoire. Wandora embedded HTTP server services are now available inside the Wandora application….

Change Log, Download.

Two of the biggest changes: TMQL support and the embedded HTML browser.

Download your copy today!

I will post a review by mid-May, 2014.

I am interested to hear your comments, questions and suggestions in the meantime.

BTW, the first suggestion I have is that the download file should NOT be wandora.zip but rather wandora-(date).zip if nothing else. Ditto for the source files and javadocs.

The Bw-Tree: A B-tree for New Hardware Platforms

Tue, 04/15/2014 - 21:22

Categories:

Topic Maps

The Bw-Tree: A B-tree for New Hardware Platforms by Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta.

Abstract:

The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B-tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.

With easy availability of multi-core chips, what new algorithms are you going to discover while touring SICP or TAOCP?

Is that going to be an additional incentive to tour one or both of them?

Why and How to Start Your SICP Trek

Tue, 04/15/2014 - 13:35

Categories:

Topic Maps

Why and How to Start Your SICP Trek by Kai Wu.

From the post:

This post was first envisioned for those at Hacker Retreat – or thinking of attending – before it became more general. It’s meant to be a standing answer to the question, “How can I best improve as a coder?”

Because I hear that question from people committed to coding – i.e. professionally for the long haul – the short answer I always give is, “Do SICP!” *

Since that never seems to be convincing enough, here’s the long answer. I’ll give a short overview of SICP’s benefits, then use arguments from (justified) authority and argument by analogy to convince you that working through SICP is worth the time and effort. Then I’ll share some practical tips to help you on your SICP trek.

* Where SICP = The Structure and Interpretation of Computer Programs by Hal Abelson and Gerald Sussman of MIT, aka the Wizard book.

BTW, excuse my enthusiasm for SICP if it comes across at times as monolingual theistic fanaticism. I’m aware that there are many interesting developments in CS and software engineering outside of the Wizard book – and no single book can cover everything. Nevertheless, SICP has been enormously influential as an enduring text on the nature and fundamentals of computing – and tends to pay very solid dividends on your investments of attention.

A great post with lots of suggestions on how to work your way through SICP.

What it can’t supply is the discipline to actually make your way through SICP.

I was at a Unicode Conference some years ago and met Don Knuth. I said something in the course of the conversation about reading some part of TAOCP and Don said, rather wistfully, that he wished he would meet someone who had read it all.

It seems sad that so many of us have dipped into it here or there but not really taken the time to explore it completely. Rather like reading Romeo and Juliet for the sexy parts and ignoring the rest.

Do you have a reading plan for TAOCP after you finish SICP?

I first saw this in a tweet by Computer Science.

SIGBOVIK 2014

Mon, 04/14/2014 - 15:14

Categories:

Topic Maps

SIGBOVIK 2014 (pdf)

From the cover page:

The Association for Computational Heresy

presents

A record of the Proceedings of

SIGBOVIK 2014

The eighth annual intercalary robot dance in celebration of workshop on symposium about Harry Q. Bovik’s 26th birthday.

Just in case news on computer security is as grim this week as last, something to brighten your spirits.

Enjoy!

I first saw this in a tweet by John Regehr.

tagtog: interactive and text-mining-assisted annotation…

Mon, 04/14/2014 - 13:55

Categories:

Topic Maps

tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles by Juan Miguel Cejuela, et al.

Abstract:

The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the ‘tagtog’ system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation.

Database URL: www.tagtog.net, www.flybase.org.

Encouraging because the “tagging” is not wholly automated nor is it wholly hand-authored. Rather the goal is to create an interface that draws on the strengths of automated processing as moderated by human expertise.
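The division of labor between machine and curator can be sketched in a few lines. This is not tagtog's implementation: tagtog uses machine-learned annotation, while the sketch below swaps in a plain dictionary match for the automatic pass, and the function names, span format, and sample sentence are all hypothetical.

```python
import re


def propose_mentions(text, gene_symbols):
    """Automatic pass: propose (start, end, symbol) spans by exact word match."""
    spans = []
    for symbol in gene_symbols:
        pattern = r"\b" + re.escape(symbol) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), symbol))
    return sorted(spans)


def curate(proposals, rejected, added):
    """Human pass: drop rejected spans, then add spans the tagger missed."""
    kept = [span for span in proposals if span not in rejected]
    return sorted(set(kept) | set(added))


text = "Loss of wg and en disrupts patterning; Engrailed stripes are lost."
proposals = propose_mentions(text, ["wg", "en"])

# The curator accepts both proposals and adds the full gene name, which the
# exact-match pass could not see, as another mention of "en".
start = text.index("Engrailed")
missed = (start, start + len("Engrailed"), "en")
curated = curate(proposals, rejected=[], added=[missed])

for begin, end, symbol in curated:
    print(text[begin:end], "->", symbol)
```

The automatic pass does the tedious scanning, and the curator's time goes only to the cases where judgment is needed.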

Annotation remains at a document level, which consigns subsequent users to mining full text, but this is definitely a step in the right direction.

3 Common Time Wasters at Work

Sun, 04/13/2014 - 21:32

Categories:

Topic Maps

3 Common Time Wasters at Work by Randy Krum.

See Randy’s post for the graphic but #2 was:

Non-work related Internet Surfing

It occurred to me that “Non-work related Internet Surfing” is indistinguishable from… search. At least at arm’s length or better.

And so many people search poorly that a lack of useful results is easy to explain.

Yes?

So, what is the strategy to get the rank and file to use more efficient information systems than search?

Their non-use or non-effective use of your system can torpedo a sale just as quickly as any other cause.

Suggestions?