For the GraphLab matrix libraries (linear solvers, matrix factorization and clustering), we recommend using the matrix market format for the input file. When the input is in this format, the output files are written in the same format.

__Sparse matrices__

Matrix market has a very simple format: two header lines (the banner and the matrix size), followed by the entries. Here is an example 3x4 matrix:

    A =
        0.8147   0        0.2785   0
        0.9058   0.6324   0.5469   0.1576
        0.1270   0.0975   0.9575   0

And here is the matching matrix market file:

    %%MatrixMarket matrix coordinate real general
    % Generated 27-Feb-2012
    3 4 9
    1 1 0.8147236863931789
    2 1 0.9057919370756192
    3 1 0.1269868162935061
    2 2 0.6323592462254095
    3 2 0.09754040499940952
    1 3 0.2784982188670484
    2 3 0.5468815192049838
    3 3 0.9575068354342976
    2 4 0.1576130816775483

The first line is the header. coordinate means this is a sparse matrix. Lines starting with % (the second line here) are comments.

The third line gives the matrix size: 3 4 9 means 3 rows, 4 columns and 9 non-zero entries.

The remaining lines list the non-zero entries (the edges, in GraphLab terms). For example, the 4th line is 1 1 0.8147236863931789, which means that the entry in the first row and first column has the value 0.8147236863931789.
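To make the layout concrete, here is a minimal sketch that produces a coordinate file like the one above from a small dense array. The class and method names (MmSparseWriter, toCoordinate) are made up for illustration and are not part of GraphLab. The entries are emitted column by column only to mirror the order of the example file.

```java
public class MmSparseWriter {

    /** Formats a dense array as matrix market coordinate text, skipping zeros. */
    public static String toCoordinate(double[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        StringBuilder body = new StringBuilder();
        int nnz = 0;
        // Walk column by column, as in the example file above.
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                if (a[i][j] != 0.0) {
                    // Row and column indices are written 1-based.
                    body.append(i + 1).append(' ')
                        .append(j + 1).append(' ')
                        .append(a[i][j]).append('\n');
                    nnz++;
                }
            }
        }
        // Banner line, then the size line: rows, columns, non-zero count.
        return "%%MatrixMarket matrix coordinate real general\n"
                + rows + " " + cols + " " + nnz + "\n"
                + body;
    }

    public static void main(String[] args) {
        double[][] a = {
            {0.8147, 0, 0.2785, 0},
            {0.9058, 0.6324, 0.5469, 0.1576},
            {0.1270, 0.0975, 0.9575, 0}
        };
        System.out.print(toCoordinate(a));
    }
}
```

Running main prints the banner, the size line 3 4 9 and nine entry lines, using the rounded values of A rather than the full-precision ones.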

TIP: Sparse matrices should NOT include zero entries!

For example, the row 1 1 0 is not allowed!

TIP: First two numbers in each non-zero entry line should be integers and not double notation. For example 1e+2 is not a valid row/col number. It should be 100 instead.

TIP: __Row/column numbers always start from one__ (and not from zero as in C!)

TIP: It is not legal to include the same non-zero entry twice. In GraphLab this results in a duplicate edge error. Note that GraphLab numbers nodes from zero, so you have to add one to the source and target values in the error to find the edge in the matrix market file.
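The tips above can be checked mechanically before handing a file to GraphLab. The following is a rough sketch of such a check, not GraphLab's own validation; the class name MmEntryCheck and the wording of the messages are hypothetical. It takes the sizes from the size line and the entry lines as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MmEntryCheck {

    /** Returns a description of every problem found (empty list if none). */
    public static List<String> check(int rows, int cols, List<String> entryLines) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String line : entryLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                problems.add("expected 'row col value': " + line);
                continue;
            }
            int r, c;
            double v;
            try {
                // parseInt rejects double notation such as 1e+2.
                r = Integer.parseInt(parts[0]);
                c = Integer.parseInt(parts[1]);
                v = Double.parseDouble(parts[2]);
            } catch (NumberFormatException e) {
                problems.add("row/col must be plain integers, value a number: " + line);
                continue;
            }
            if (r < 1 || r > rows || c < 1 || c > cols) {
                problems.add("index out of range (indices start at 1): " + line);
            }
            if (v == 0.0) {
                problems.add("explicit zero entry: " + line);
            }
            if (!seen.add(r + "," + c)) {
                problems.add("duplicate entry: " + line);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 1 0.5", "1 1 0.7", "1e+2 1 3.0", "2 2 0");
        for (String p : check(3, 4, lines)) {
            System.out.println(p);
        }
    }
}
```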

__Dense matrices:__

Here is an example of how to save the same matrix as a dense matrix. The array format lists every entry, including zeros, one per line and column by column:

    A =
        0.8147   0        0.2785   0
        0.9058   0.6324   0.5469   0.1576
        0.1270   0.0975   0.9575   0

    %%MatrixMarket matrix array real general
    3 4
    0.8147
    0.9058
    0.1270
    0
    0.6324
    0.0975
    0.2785
    0.5469
    0.9575
    0
    0.1576
    0
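The ordering of the dense values is easy to get wrong, so here is a small sketch of a writer for the array format. It assumes the values are written column by column, which is what the example shows; the class name MmDenseWriter is made up for illustration.

```java
public class MmDenseWriter {

    /** Formats a dense array as matrix market array text, column by column. */
    public static String toArray(double[][] a) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        StringBuilder sb = new StringBuilder("%%MatrixMarket matrix array real general\n");
        // The size line has only rows and columns: every entry is stored.
        sb.append(rows).append(' ').append(cols).append('\n');
        for (int j = 0; j < cols; j++) {        // outer loop over columns
            for (int i = 0; i < rows; i++) {    // inner loop over rows
                sb.append(a[i][j]).append('\n'); // zeros are written too
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] a = {
            {0.8147, 0, 0.2785, 0},
            {0.9058, 0.6324, 0.5469, 0.1576},
            {0.1270, 0.0975, 0.9575, 0}
        };
        System.out.print(toArray(a));
    }
}
```

Unlike the sparse case, no row or column indices are written; the position of a value in the file determines its place in the matrix.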

__Symmetric sparse matrices:__

Here is an example of a sparse symmetric matrix:

    B =
        1.5652   0        1.4488
        0        2.0551   2.1969
        1.4488   2.1969   2.7814

And here is the matching matrix market file:

    %%MatrixMarket matrix coordinate real symmetric
    % Generated 27-Feb-2012
    3 3 5
    1 1 1.5652
    3 1 1.4488
    2 2 2.0551
    3 2 2.1969
    3 3 2.7814

Note that each off-diagonal entry is written only once: only the entries on or below the diagonal are stored.
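As a sketch of the symmetric layout, the following writer emits only the entries on or below the diagonal and refuses input that is not symmetric. The class name MmSymmetricWriter is hypothetical, and the column-by-column order is chosen only because it reproduces the order of the example file.

```java
public class MmSymmetricWriter {

    /** Formats a symmetric matrix as coordinate text, lower triangle only. */
    public static String toSymmetricCoordinate(double[][] b) {
        int n = b.length;
        StringBuilder body = new StringBuilder();
        int nnz = 0;
        for (int j = 0; j < n; j++) {
            // Start at i = j: entries above the diagonal are implied by symmetry.
            for (int i = j; i < n; i++) {
                if (b[i].length != n || b[i][j] != b[j][i]) {
                    throw new IllegalArgumentException("matrix is not symmetric");
                }
                if (b[i][j] != 0.0) {
                    body.append(i + 1).append(' ')
                        .append(j + 1).append(' ')
                        .append(b[i][j]).append('\n');
                    nnz++;
                }
            }
        }
        // nnz counts the stored lines, not the non-zeros of the full matrix.
        return "%%MatrixMarket matrix coordinate real symmetric\n"
                + n + " " + n + " " + nnz + "\n"
                + body;
    }

    public static void main(String[] args) {
        double[][] b = {
            {1.5652, 0, 1.4488},
            {0, 2.0551, 2.1969},
            {1.4488, 2.1969, 2.7814}
        };
        System.out.print(toSymmetricCoordinate(b));
    }
}
```

For B this writes five entry lines, although the full matrix has seven non-zeros.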

__Sparse vectors:__

Here is an example of a sparse vector, stored as a 1x4 matrix:

    v = 1 0 0 1

    %%MatrixMarket matrix coordinate real general
    % Generated 27-Feb-2012
    1 4 2
    1 1 1
    1 4 1
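Going the other way, a coordinate file with the general qualifier can be read back with a few lines of code. This is only a sketch under simplifying assumptions: the class name MmCoordinateReader is made up, the result is held as a dense array for easy inspection, the banner is not interpreted (so symmetric files are not mirrored), and no error checking is done.

```java
public class MmCoordinateReader {

    /** Parses matrix market coordinate text (general) into a dense array. */
    public static double[][] parse(String text) {
        double[][] m = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            // Skip blank lines, the %% banner and % comment lines.
            if (line.isEmpty() || line.startsWith("%")) {
                continue;
            }
            String[] p = line.split("\\s+");
            if (m == null) {
                // First data line is the size line: rows, columns, non-zeros.
                m = new double[Integer.parseInt(p[0])][Integer.parseInt(p[1])];
            } else {
                // Entry line: convert 1-based file indices to 0-based array indices.
                int i = Integer.parseInt(p[0]) - 1;
                int j = Integer.parseInt(p[1]) - 1;
                m[i][j] = Double.parseDouble(p[2]);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        String file = "%%MatrixMarket matrix coordinate real general\n"
                + "% Generated 27-Feb-2012\n"
                + "1 4 2\n"
                + "1 1 1\n"
                + "1 4 1\n";
        double[][] v = parse(file);
        System.out.println(java.util.Arrays.toString(v[0]));
    }
}
```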

__Working with matlab:__

Download the files http://graphlab.org/mmwrite.m and http://graphlab.org/mmread.m

In Matlab you can save a dense matrix using:

    >> mmwrite('filename', full(matrixname));

And save a sparse matrix using:

    >> mmwrite('filename', sparse(matrixname));

To read a sparse or dense matrix, use:

    >> A = mmread('filename');

__Writing a conversion function in Java__

This section explains how to convert Mahout 0.4 sequence vectors into matrix market format.

Create a file named Vec2mm.java with the following content:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.util.Iterator;

    import org.apache.mahout.math.SequentialAccessSparseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;

    /**
     * Code for converting Hadoop seq vectors to matrix market format
     * @author Danny Bickson, CMU
     */
    public class Vec2mm {
        public static int Cardinality;

        /**
         * @param args[0] - input svd file
         * @param args[1] - output matrix market file
         * @param args[2] - number of rows
         * @param args[3] - number of columns
         * @param args[4] - number of non zeros
         * @param args[5] - transpose
         */
        public static void main(String[] args) {
            try {
                if (args.length != 6) {
                    System.err.println("Usage: java Vec2mm <input seq vec file> "
                            + "<output matrix market file> <number of rows> "
                            + "<number of cols> <number of non zeros> <transpose output>");
                    System.exit(1);
                }

                final Configuration conf = new Configuration();
                final FileSystem fs = FileSystem.get(conf);
                final SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf);
                BufferedWriter br = new BufferedWriter(new FileWriter(args[1]));
                int rows = Integer.parseInt(args[2]);
                int cols = Integer.parseInt(args[3]);
                int nnz = Integer.parseInt(args[4]);
                boolean transpose = Boolean.parseBoolean(args[5]);
                IntWritable key = new IntWritable();
                VectorWritable vec = new VectorWritable();

                // header line, then the size line (swapped when transposing)
                br.write("%%MatrixMarket matrix coordinate real general\n");
                if (!transpose)
                    br.write(rows + " " + cols + " " + nnz + "\n");
                else
                    br.write(cols + " " + rows + " " + nnz + "\n");

                while (reader.next(key, vec)) {
                    //System.out.println("key " + key);
                    SequentialAccessSparseVector vect = (SequentialAccessSparseVector) vec.get();
                    //System.out.println("key " + key + " value: " + vect);
                    Iterator<Vector.Element> iter = vect.iterateNonZero();

                    while (iter.hasNext()) {
                        Vector.Element element = iter.next();
                        // note: matrix market offsets start from one and not from zero
                        if (!transpose)
                            br.write((element.index() + 1) + " " + (key.get() + 1) + " "
                                    + vect.getQuick(element.index()) + "\n");
                        else
                            br.write((key.get() + 1) + " " + (element.index() + 1) + " "
                                    + vect.getQuick(element.index()) + "\n");
                    }
                }

                reader.close();
                br.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }

Compile this code using the command:

javac -cp /mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/core-3.1.1.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-core-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-math-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-cli-1.2.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/hadoop-0.20.2-core.jar *.java

Now run it using the command:

java -cp .:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/core-3.1.1.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-core-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/mahout-math-0.5-SNAPSHOT.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-cli-1.2.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/hadoop-0.20.2-core.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-logging-1.0.4.jar:/mnt/bigbrofs/usr7/bickson/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar:/mnt/bigbrofs/usr7/bickson/mahout-0.4/taste-web/target/mahout-taste-webapp-0.5-SNAPSHOT/WEB-INF/lib/google-collections-1.0-rc2.jar Vec2mm A.seq A.mm 25 100 2500 false

Here A.seq is the input file, a key/value store of SequentialAccessSparseVector values, and A.mm is the resulting output file in matrix market format. 25 is the number of rows, 100 the number of columns, and 2500 the number of non-zeros. false means the resulting matrix is not transposed.

Depending on the saved vector type, you may want to change my code from SequentialAccessSparseVector to the specific type you need to convert.

In the post, "TIP: Sparse matrices should include zero entries!" may have a typo: should --> should not. Right?

Nice catch! fixed.

For the dense matrix, I think you miss the first two entries: 0.8147 and 0.9058.

Nice catch! Example fixed. Thanks

I didn't get the part about the dense matrix: I suppose (correct me if I'm wrong) that all elements (including zero elements) must follow in order of rows and columns then.

So I don't understand why

1) there are strange indents

2) several elements differ by 0.0001

Hi Alex,

1) Indents were a blog formatting issue - now fixed.

2) I used Matlab print code which rounded the full numbers; I fixed it now to match.

Thanks for detecting the typos!