Voron and the FreeDB Dataset
I got tired of doing arbitrary performance testing, so I decided to take the FreeDB dataset and start working with that. FreeDB is a dataset used to look up CD information based on a nearly unique disk id. It is a good dataset because it contains a lot of data (over three million albums, and over 40 million songs), and it is production data. That means that it is dirty, which makes it perfect for running all sorts of interesting scenarios. The purpose of this post (and maybe the next few) is to show off a few things. First, we want to see how Voron behaves with a realistic data set. Second, we want to show off the way Voron works, its API, etc.

To start with, I ran my FreeDB parser, pointing it at /dev/null. The idea is to measure what the cost of just going through the data is. We are using freedb-complete-20130901.tar.bz2 from September 2013. After 1 minute, we had gone through 342,224 albums, and after 6 minutes we were at 2,066,871 albums. Reading all 3,328,488 albums took a bit over ten minutes, so just the cost of parsing and reading the FreeDB dataset is pretty expensive. The end result is a list of Disk objects, one per album.

Now, let us see how we want to actually use this. We want to be able to:

- Look up an album by its disk ids.
- Look up all the albums by an artist.
- Look up albums by album title.

The artist and title lookups get interesting, because we need to deal with questions such as: given "Pearl Jam", if I search for "pearl", do I get them? Do I get it for "jam"? For now, we are going to go with case insensitive; we won't be doing full text search, but we will allow prefix searches.

We are using the following abstraction for the destination:

```csharp
public abstract class Destination
{
    public abstract void Accept(Disk d);
    public abstract void Done();
}
```

Basically, we read data as fast as we can and shove it into the destination until we are done. Here is the Voron implementation:

```csharp
public class VoronDestination : Destination
{
    private readonly StorageEnvironment _storageEnvironment;
    private WriteBatch _currentBatch;
    private readonly JsonSerializer _serializer = new JsonSerializer();
    private int counter = 1;

    public VoronDestination()
    {
        _storageEnvironment = new StorageEnvironment(StorageEnvironmentOptions.ForPath("freedb"));
        using (var tx = _storageEnvironment.NewTransaction(TransactionFlags.ReadWrite))
        {
            // "albums" holds the documents themselves; the ix_* trees are indexes
            // that point back at the album keys.
            _storageEnvironment.CreateTree(tx, "albums");
            _storageEnvironment.CreateTree(tx, "ix_artists");
            _storageEnvironment.CreateTree(tx, "ix_titles");
            tx.Commit();
        }
        _currentBatch = new WriteBatch();
    }

    public override void Accept(Disk d)
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms);
        _serializer.Serialize(new JsonTextWriter(writer), d);
        writer.Flush(); // make sure the whole JSON document reaches the MemoryStream
        ms.Position = 0;

        var key = new Slice(EndianBitConverter.Big.GetBytes(counter++));
        _currentBatch.Add(key, ms, "albums");
        if (d.Artist != null)
            _currentBatch.MultiAdd(d.Artist.ToLower(), key, "ix_artists");
        if (d.Title != null)
            _currentBatch.MultiAdd(d.Title.ToLower(), key, "ix_titles");

        // Send the accumulated batch to Voron every 1,000 albums.
        if (counter % 1000 == 0)
        {
            _storageEnvironment.Writer.Write(_currentBatch);
            _currentBatch = new WriteBatch();
        }
    }

    public override void Done()
    {
        _storageEnvironment.Writer.Write(_currentBatch);
    }
}
```

Let us go over this in detail, shall we? In the constructor we create a new storage environment. In this case, we want to just import the data, so we can create the storage inline. We then create the relevant trees. You can think about Voron trees in a very similar manner to the way you think about tables: they are a way to separate data into different parts of the storage. Note that this all still resides in a single file, so there isn't a physical separation. We created an albums tree, which will contain the actual data, and the ix_artists and ix_titles trees, which are indexes into the albums tree; you can see them being used just a little lower.

In the Accept method, you can see that we use a WriteBatch, a native Voron notion that allows us to batch multiple operations into a single transaction. In this case, for every album, we are making three writes. First, we write all of the data, as a JSON string, into a stream and put it in the albums tree, keyed by a simple incrementing integer. Then we add the artist and title entries (lower cased, so we don't have to worry about case sensitivity in searches) into the relevant indexes.
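Going back to the "pearl" vs. "jam" question from earlier: because the indexes store the lower-cased artist and title as keys, a prefix search later just means lower-casing the query and scanning the keys that start with it. The read side of Voron isn't shown in this post, so here is the idea sketched with a SortedDictionary standing in for the index tree (an illustration only, not Voron code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PrefixSearchSketch
{
    static void Main()
    {
        // ix_artists-style index: lower-cased artist -> album keys.
        var ixArtists = new SortedDictionary<string, List<int>>(StringComparer.Ordinal)
        {
            ["pearl jam"] = new List<int> { 1, 2 },
        };

        foreach (var query in new[] { "Pearl", "Jam" })
        {
            var q = query.ToLower(); // same normalization we used at write time
            var hits = ixArtists.Where(e => e.Key.StartsWith(q, StringComparison.Ordinal)).ToList();
            Console.WriteLine("{0}: {1} match(es)", query, hits.Count);
        }
        // Prints "Pearl: 1 match(es)" and "Jam: 0 match(es)" -- a prefix search
        // only looks at the start of the key, so "jam" does not find "pearl jam".
    }
}
```

A real B+Tree would seek directly to the prefix and iterate forward instead of filtering every key, but the matching rule is the same.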
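It is also worth calling out the key format in Accept: the running counter is written out as big-endian bytes. Voron, like most key/value stores, orders keys by comparing their raw bytes, and big-endian encoding is what makes that byte order agree with numeric order, so sequentially numbered albums always land at the end of the tree instead of in random positions. Here is a minimal sketch of the idea using only the BCL (the code above uses the external EndianBitConverter helper instead):

```csharp
using System;

static class BigEndianKeys
{
    // Big-endian bytes for an int, regardless of the machine's endianness.
    public static byte[] ToKey(int value)
    {
        var bytes = BitConverter.GetBytes(value);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }

    static void Main()
    {
        // In little-endian form, 256 encodes as 00-01-00-00 and 1 as 01-00-00-00,
        // so a byte-wise comparison would put 256 *before* 1. Big-endian keeps
        // byte order and numeric order in agreement:
        Console.WriteLine(BitConverter.ToString(ToKey(1)));   // 00-00-00-01
        Console.WriteLine(BitConverter.ToString(ToKey(256))); // 00-00-01-00
    }
}
```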
At 60 seconds, we had written 267,998 values to Voron. In fact, I explicitly designed the import so we can see the relevant metrics: at 495 seconds, we had read 1,995,385 entries from the FreeDB file, parsed 1,995,346 of them, and written 1,610,998 to Voron. As you can imagine, each step runs in a dedicated thread, so we can see how each behaves on an individual basis. The good thing about this is that I can physically see the various costs, which is actually pretty cool.

Here is the Voron directory at 60 seconds: we have two journal files active (they haven't been applied to the data file yet), and the db.voron file is at 512 MB. The compression buffer is at 32 MB (this is usually twice as big as the biggest transaction, uncompressed). The scratch buffer is used to hold in-flight transaction information (until we send it to the data file), and you can see it is sitting at 256 MB.

At 15 minutes, we have the following numbers: 3,035,452 entries read from the file, 3,035,426 parsed, and 2,331,998 written to Voron. Note that we are reading the file and writing to Voron on the same disk, so that might impact the read performance. Looking at the disk at that point, note that we increase the size of most of our files by a factor of 2, so some of the space in the db.voron file is probably not used, and that we needed more scratch space to handle the in-flight information.

The entire process took 22 minutes, start to finish, although I have to note that this hasn't been optimized at all, and I know we are doing a lot of stupid stuff throughout. You might have noticed something else: we actually "crash" closed the Voron database. This was done to see what would happen when we open a relatively large database after an unordered shutdown. We'll actually get to play with the data in my next post; so far this has been pretty much just to see how things are behaving.

And… I just realized something: I forgot to actually add an index on disk id, which means that I have to import the data again. But before that, I also wrote the following:

```csharp
public class JsonFileDestination : Destination
{
    private readonly GZipStream _stream;
    private readonly StreamWriter _writer;
    private readonly JsonSerializer _serializer = new JsonSerializer();

    public JsonFileDestination()
    {
        _stream = new GZipStream(
            new FileStream("freedb.json.gzip", FileMode.CreateNew, FileAccess.ReadWrite),
            CompressionLevel.Optimal);
        _writer = new StreamWriter(_stream);
    }

    public override void Accept(Disk d)
    {
        // One JSON document per line, gzip compressed.
        _serializer.Serialize(new JsonTextWriter(_writer), d);
        _writer.WriteLine();
    }

    public override void Done()
    {
        _writer.Flush();
        _stream.Dispose();
    }
}
```

This completed in ten minutes for all 3,328,488 entries, or a rate of about 5,538 per second. The result is an 845MB gzip file. I had twofold reasons for doing this: first, it gave me something to compare ourselves to; and more to the point, I can re-use this gzip file for my next tests, without having to go through the expensive parsing of the FreeDB file.
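The code that feeds the gzip file back in for that second import isn't shown here, but it is presumably something along these lines (a sketch; the ReadEntries name and the counting loop are mine, not from the post):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

static class FreeDbGzipReader
{
    // Stream the line-delimited JSON entries back out of freedb.json.gzip,
    // one string per album, ready to hand to the destination's Accept(string).
    public static IEnumerable<string> ReadEntries(string path)
    {
        using (var file = File.OpenRead(path))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    static void Main()
    {
        var count = 0;
        foreach (var entry in ReadEntries("freedb.json.gzip"))
            count++;
        Console.WriteLine("{0:N0} entries", count);
    }
}
```

Each line is handed over exactly as JsonFileDestination wrote it out, which is what makes the input side of the second import so much cheaper.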
I did just that and ended up with the following:

```csharp
public class VoronEntriesDestination : EntryDestination
{
    private readonly StorageEnvironment _storageEnvironment;
    private WriteBatch _currentBatch;
    private int counter = 1;

    public VoronEntriesDestination()
    {
        _storageEnvironment = new StorageEnvironment(StorageEnvironmentOptions.ForPath("freedb"));
        using (var tx = _storageEnvironment.NewTransaction(TransactionFlags.ReadWrite))
        {
            _storageEnvironment.CreateTree(tx, "albums");
            _storageEnvironment.CreateTree(tx, "ix_diskids");
            _storageEnvironment.CreateTree(tx, "ix_artists");
            _storageEnvironment.CreateTree(tx, "ix_titles");
            tx.Commit();
        }
        _currentBatch = new WriteBatch();
    }

    public override int Accept(string d)
    {
        // The raw JSON line is stored as-is; we only parse it to pull out
        // the fields we index.
        var disk = JObject.Parse(d);

        var ms = new MemoryStream();
        var writer = new StreamWriter(ms);
        writer.Write(d);
        writer.Flush();
        ms.Position = 0;

        var key = new Slice(EndianBitConverter.Big.GetBytes(counter++));
        _currentBatch.Add(key, ms, "albums");
        int count = 1;

        foreach (var diskId in disk.Value<JArray>("DiskIds"))
        {
            count++;
            _currentBatch.MultiAdd(diskId.Value<string>(), key, "ix_diskids");
        }

        var artist = disk.Value<string>("Artist");
        if (artist != null)
        {
            count++;
            _currentBatch.MultiAdd(artist.ToLower(), key, "ix_artists");
        }
        var title = disk.Value<string>("Title");
        if (title != null)
        {
            count++;
            _currentBatch.MultiAdd(title.ToLower(), key, "ix_titles");
        }

        if (counter % 100 == 0)
        {
            _storageEnvironment.Writer.Write(_currentBatch);
            _currentBatch = new WriteBatch();
        }
        return count;
    }

    public override void Done()
    {
        _storageEnvironment.Writer.Write(_currentBatch);
        _storageEnvironment.Dispose();
    }
}
```

Now we are actually properly disposing of things, and I also decreased the size of the batch to see how it would respond. Note that it is now being fed directly from the gzip file, at a greatly reduced cost. I also added tracking not only for how many albums we write, but also for how many entries; by entries I mean Voron entries, which include the values we add to the indexes. I did find a bug where we would just double the file size without due consideration to its current size, so now we are doing smaller file size increases.

Word of warning: I didn't realize until after I was done with all the benchmarks that I had actually run all of them in debug configuration, which basically means they are utterly useless as a performance metric. That is especially true because we have a lot of verifier code that runs in debug mode. So please don't take these numbers as actual performance metrics; they aren't valid.

| Time | # of albums | # of entries |
|------------|-----------|------------|
| 4 minutes | 773,398 | 3,091,146 |
| 6 minutes | 1,126,998 | 4,504,550 |
| 8 minutes | 1,532,858 | 6,126,413 |
| 18 minutes | 2,781,698 | 11,122,799 |
| 24 minutes | 3,328,488 | 13,301,496 |

Looking at the file system midway through the run, you can see that we now increase the file in smaller increments, and that we are using more scratch space, probably because we are under a very heavy write load. After the run, the scratch and compression files are gone: they are only used while the database is running and are deleted on close. The database is 7GB in size, which is quite respectable.

Now, on to working with the data, but I'll save that for my next post; this one is long enough already.
February 20, 2014