Sunday, August 17, 2014

DataRobot raises $21M series A

Just got this from my collaborator Jay Gu: DataRobot, a Boston startup that is trying to automate data science, has raised $21M in a series A round. According to this blog post, the investment was led by NEA, which also invested in Databricks (Spark) as well as GraphLab.

A related company is SparkBeyond, an Israeli startup that raised $4M. They also automate data science, by automatically generating features and evaluating them using multiple algorithms.

Tuesday, August 12, 2014

Interesting paper from Dataiku about WSCD 2014

Dataiku recently won first prize at the Yandex WSCD 2014 competition. Here is a paper describing their methodology. Dataiku was recently present at our GraphLab Conference. They have a visual, Excel-like environment for data manipulation, cleaning, and prediction.

Friday, August 8, 2014

Sparse K-means

I got the following recent paper from my collaborator Jay Gu: A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data, from ICML 2014. Basically it is K-means with an L1 constraint on the cluster centers. The result is sparse cluster centers, which makes sense, for example, when clustering text documents.

A second relevant paper, which I got from my collaborator Yao Wu, is Web-Scale K-Means Clustering by Sculley from Google Pittsburgh. The paper uses mini-batches to speed up computation and achieves sparsity using projected gradient descent.
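To make the idea in both papers a bit more concrete, here is a minimal sketch of mini-batch K-means where each center is projected back onto an L1 ball after every batch, which is what drives small coordinates to exactly zero. This is my own illustration and not the code from either paper: the function names, the `radius` parameter and the toy data are assumptions, and the exact sort-based L1 projection used here is swapped in for simplicity.

```python
import random

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto the L1 ball of the given radius."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    # Sort magnitudes in decreasing order and find the soft-threshold theta.
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - radius) / j
        if uj - t > 0:
            theta = t
    # Shrink every coordinate toward zero by theta; small ones become zero.
    return [(1 if x > 0 else -1) * max(abs(x) - theta, 0.0) for x in v]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, x):
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k], x))

def sparse_minibatch_kmeans(data, k, radius, batch_size=10, iters=100, seed=0):
    """Mini-batch K-means with an L1-ball projection of the centers (sketch)."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, k)]
    counts = [0] * k
    for _ in range(iters):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch first, then update the centers.
        assign = [nearest(centers, x) for x in batch]
        for x, c in zip(batch, assign):
            counts[c] += 1
            eta = 1.0 / counts[c]  # per-center learning rate
            centers[c] = [(1 - eta) * ci + eta * xi
                          for ci, xi in zip(centers[c], x)]
        # Projection step: enforce the L1 constraint on every center.
        centers = [project_l1_ball(c, radius) for c in centers]
    return centers

if __name__ == "__main__":
    # Toy data (hypothetical): two groups, each active on a different coordinate.
    toy = [[5.0, 0.1, 0.0], [4.8, 0.0, 0.2], [0.1, 0.0, 5.1], [0.0, 0.2, 4.9]]
    for center in sparse_minibatch_kmeans(toy, k=2, radius=3.0):
        print(center)
```

A smaller `radius` gives sparser centers at the cost of a worse fit, so in practice it would have to be tuned for the data at hand.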

Tuesday, August 5, 2014

Misc News

Collaborative filtering tutorial by Netflix

I got this from my collaborator Alice Zheng, a lecture about collaborative filtering by Xavier Amatriain from Netflix at the MLSS summer school organized by Alex Smola at CMU:

Deep learning @ Spotify

I got the following from my colleague Zach Nation: an interesting blog post from Spotify about using convolutional neural networks to learn latent factors for collaborative filtering. And here is the related NIPS paper.

Cloud Service @ Databricks

Databricks has recently announced their business model: a cloud service running Apache Spark.
Here is the keynote at the Spark Summit:

MapGraph: First Multi-GPU Graph Analytics System by Systap

I just heard from my colleague Bryan Thompson from Systap that they have recently released MapGraph: the first distributed graph analytics framework which supports GPUs. Here is their blog post giving additional details.

Sunday, July 27, 2014

MapGraph: speeding up graph processing with GPUs

The following paper about MapGraph was recently published at the GRADES workshop. It explains the basic construction Systap is using for speeding up graph processing on GPUs. (Systap is a Bay Area startup working on accelerating graph computation.)