Sunday, October 19, 2014

Interesting dataset from

It was released at Strata NY and contains US family records from 1820 onwards. Only the last names are anonymized; all other details are real.

Here is a link.

Apache Flink

The Stratosphere project is a new system that competes with Spark. It was recently accepted as an Apache Incubator project under the name Flink. My colleague and friend Prof. Seif Haridi from Stockholm is one of the key researchers working on this project. He is also the PhD advisor of Ali Ghodsi, one of the co-founders of Databricks.

Saturday, October 11, 2014

News from Recsys - GraphLab Create based solution wins second place

Yesterday I met Robi Palovicz at ACM Recsys. Robi is a PhD from the Hungarian Academy of Sciences, supervised by my friend and colleague Andras Benczur. Robi led the team that won 2nd place at the Recsys Challenge 2014.

His solution was based on GraphLab Create, and more interestingly, it takes only 30 seconds to compute the winning solution with it!
Robi has promised to share the team's winning solution as an IPython Notebook. Once it is ready we will publish it on our website and I will post a note here.

Friday, October 10, 2014

Microsoft Research Mountain View Shuts Down - What is going on at Microsoft?

I was an intern there in 2006 under Prof. Dahlia Malkhi. It was a great research center with some of the leading CS scientists, like Leslie Lamport. Recently Microsoft has gone crazy and shut down the center.

I hear that Facebook and others are celebrating. Everyone is trying to hire those guys.

Tuesday, September 30, 2014

Datapad - Acquired!

A couple of months ago I invited Wes McKinney, the author of the popular Pandas Python library, to give a talk at the annual GraphLab Conference about his new startup, Datapad. I just heard from my colleague Chris DuBois that Datapad has been acquired. One of the fastest exits ever!

Great job Wes!

Monday, September 22, 2014


I just got an interesting link from Norman He of Samsung SDS America: Tupleware is an academic project from Brown University that significantly improves analytics performance compared with Hadoop and Spark.