Compact Database Feature

IMPORTANT: This post is out-of-date and new documentation is on the way. As of Codebook 4, Compact Database only runs a VACUUM on the local database. Please see this note below.


Over time your Codebook database will grow in size, not only because you add data but also as a result of performing many sync operations.

Codebook stores additional information to track changes to your database, which helps it decide what to exchange when performing a sync operation. Codebook currently can’t determine when to clean out this tracking information because it’s unknown whether these changes have been synced with all instances of Codebook.

We’ve included a new feature in Codebook for macOS and Codebook for Windows called Compact Database (found under the File menu) which cleans out this tracking information and reclaims space in your database.

You have to sync all your changes with each copy of Codebook before your database can be compacted, or your changes will be lost.

Compacting your database may substantially reduce its size and speed up sync operations.

Here are the steps for how to properly compact your database and have it propagate to all your other copies of Codebook:

  1. Sync every copy of Codebook to ensure that all changes are seen (the Compact Database feature isn’t permitted if you haven’t synced in the last 5 minutes).
  2. Go to File → Compact Database…
  3. If you’re using a Cloud sync provider: Perform an Overwrite operation from the database you just compacted, and then a Restore operation from every other device you use Codebook on.
    If you’re using WiFi sync: Perform a Restore operation from every other device you use Codebook on (restoring from the database you just compacted).
    If you’re using Local Folder sync: Perform an Overwrite operation from the database you just compacted. If any other Codebook devices are syncing with the same Local Folder, perform a Restore operation from those devices.

That’s it! Continue syncing as usual afterwards.

Note: There should be no need to compact your database frequently. Once or twice a year is most likely enough.

You did not cover Local Folder sync, which is the method I use. My Local Folder happens to be a folder in the local copy of OneDrive, so it gets synced to OneDrive. I did your steps 1 and 2. I do the same thing on my other devices, and thus they are all actually using the same folder. This is especially nice now that we have the new On Demand feature in the latest Windows 10 update. For your step 3, I do the compaction on my desktop, Overwrite the Local Folder, then go to the other devices and Restore from the Local Folder. With the On Demand feature they are all in sync and compacted.


@lwetzel

Thanks for posting with this additional information, it will definitely help anyone looking for the info. I’ve updated the post to include Local Folder sync as well.

Cheers,
Micah

This information is from before Codebook 4.0. Does the new shard-based synchronization system affect the compact database feature in any way? I’m using WiFi sync.

I am still syncing this way.

Hi @jtrh and @lwetzel,

While the Compact Database option remains at the bottom of the File menu in Codebook for macOS, it isn’t the same feature as it was prior to Codebook 4. Basically, it now runs a VACUUM on the database, which compacts it somewhat and can make database operations a little more efficient (generally not enough that you would notice; if the UI is blocking or unresponsive, definitely let us know!).
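For anyone curious what a VACUUM does to a database file, here is a minimal sketch using plain SQLite through Python's standard `sqlite3` module. It is not Codebook's code, and it does not use the encrypted database format; the table name, file name, and row sizes are made up for illustration. It fills a database, deletes the rows, and measures the file size before and after the VACUUM.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(row_count: int = 2000) -> tuple[int, int]:
    """Return the database file size in bytes (before, after) a VACUUM."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "demo.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        try:
            # Make sure freed pages stay in the file until VACUUM runs.
            conn.execute("PRAGMA auto_vacuum = NONE")
            conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")
            conn.executemany(
                "INSERT INTO entries (body) VALUES (?)",
                [("x" * 1000,) for _ in range(row_count)],
            )
            conn.commit()

            # Deleting rows frees pages inside the file but does not shrink it.
            conn.execute("DELETE FROM entries")
            conn.commit()
            size_before = os.path.getsize(path)

            # VACUUM rebuilds the file without the free pages.
            conn.execute("VACUUM")
            size_after = os.path.getsize(path)
        finally:
            conn.close()
    return size_before, size_after


if __name__ == "__main__":
    before, after = vacuum_demo()
    print(f"before VACUUM: {before} bytes, after VACUUM: {after} bytes")
```

The file only shrinks because rows were deleted first; on a database with little free space inside it, a VACUUM changes very little, which matches the "compact it somewhat" description above.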

This means that Compact Database (perhaps we should rename the menu item to “Vacuum Database”) only affects the local encrypted database. It doesn’t affect any other copies of Codebook on your other devices, or Sync data stored in a remote cloud service (i.e. Dropbox and Google Drive). If you decide to run Compact Database, you can go about your business afterward; there’s no need to Sync the change to other devices anymore (indeed, it would not affect them if you did). Our apologies that we haven’t already updated the confirmation sheet and this post to note the change.

Compact Database was originally put in place to help mitigate an excessive amount of data that could accumulate over time in the database from the previous Sync implementation. Due to the way the old Sync system had to exchange entire copies of the database file, Sync could end up taking a long time to complete. That is no longer an issue with the updated Sync system. You don’t have to worry about the size of the db anymore, or using this to make Sync run faster.

We’ll update the original post soon to note the change in behavior, and update the dialog in Codebook for macOS whose “More Info” button redirects the user here.

Also, in case it’s not clear, the Local Folder option for Sync is still there, and we have no plans to remove it.

Please let us know if you have any other questions!


That raises the question then: What about shards accumulating in the database directory? I’ve got hundreds of them in my cloud storage location (Google Drive). Are they automatically cleaned up by anything, or would that be a good use for the Compact Database command?

Hi @Dan_Danz, thanks for posting your question. Could you clarify for us whether it is the number of shard files in Google Drive that is concerning, or whether it is a matter of the amount of storage taken up? Would you mind also telling us the size of the Zetetic folder in your Google Drive?

Thanks!

It would help me understand if you could point me to some documentation for how shards are used, and the new sync scheme methods.

Hi @Dan_Danz, thanks for that screenshot, it’s helpful to know more about your experience so far.

We are currently expanding the documentation on replication, but we haven’t yet gone into much technical detail about the changeset files (a.k.a. shards). In short, the set of shards you have there and the associated encrypted metadata db file can be used to reconstitute your entire database on a new device from scratch, or from the last time that device checked in.

If you wanted to “clean” those changesets up into one changeset file, you could do that by “re-establishing the remote”, as we originally called it internally: whack the Zetetic folder on Google Drive, then sync with Google Drive fresh. Codebook will treat it like a “new” remote and write out one big changeset file that contains all the changes needed to build your database from scratch.

The next version of Codebook, 4.1, will provide an Overwrite operation you could use to do this within Codebook, without having to delete the files on the service yourself.

I wouldn’t recommend you do this just to reduce the number of files stored, as it would make Sync much less efficient (slower) than it would be otherwise. It means all your other devices will need to download and replay this entire changeset while likely having to ignore most of it, instead of processing just what they need.
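To illustrate why one big changeset costs more to process, here is a toy sketch. It is not how Codebook stores or applies changesets; the `replay` function, the record ids, and the "higher version wins" rule are all assumptions made up for this example. It counts how many entries a device that is already up to date applies versus ignores.

```python
def replay(local: dict[str, int], changeset: list[tuple[str, int]]) -> tuple[int, int]:
    """Apply (record_id, version) entries to `local`; return (applied, ignored)."""
    applied = ignored = 0
    for record_id, version in changeset:
        if local.get(record_id, 0) >= version:
            # Hypothetical rule: the device already has this version or newer.
            ignored += 1
        else:
            local[record_id] = version
            applied += 1
    return applied, ignored


def up_to_date_device() -> dict[str, int]:
    """A device that already holds 100 records, all at version 1."""
    return {f"record-{n}": 1 for n in range(100)}


# A small incremental changeset: only the one record that changed.
incremental = [("record-0", 2)]

# A consolidated changeset: every record, including the one that changed.
consolidated = [(f"record-{n}", 2 if n == 0 else 1) for n in range(100)]

incremental_result = replay(up_to_date_device(), incremental)    # (1, 0)
consolidated_result = replay(up_to_date_device(), consolidated)  # (1, 99)
```

Both runs end with the same data on the device, but the consolidated changeset makes it walk through 99 entries it has no use for.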

Thanks, William…

I’ve concluded that the succinct answer to “What is in the shards directory?” is contained in this sentence in the beta doc, although it never mentions “shards”:

In Codebook 4 we updated Sync to separate user changes into discrete chunks of data that can be passed around on sync (rather than entire databases being exchanged and compared).

And I now understand why I should not flush the shards sub-directory.

However, I think it can grow without bound. As long as a given change has not yet been applied to every one of the many incarnations of the composite database, the shard containing that change should not be deleted. So, is there any way to compact the shards directory? Perhaps the metadata could maintain a last-updated timestamp for each of the replicas; any shards older than the earliest replica timestamp could then be removed. Possible?

Hey @Dan_Danz

Glad to hear the explanation from the documentation was helpful.

I’d like to mention that aside from the first couple of changeset files within the Shards directory (which contain entire replays of your databases), all the changeset files are very small. So even though the file count grows each time you sync, the total size stays small. We like to think of the changesets required to reconstitute your database as growing reasonably and predictably. They allow us to bring any stale device up-to-date while also being able to set up a new device.

We always appreciate hearing suggestions from Codebook users, but unfortunately your suggestion wouldn’t work. Specifically we need to preserve the entire history of the Shards directory in case a new device is added and needs to apply the entirety of its changes. Each Codebook client is already tracking the “High Water Mark” of what it’s seen and what’s been seen by this specific remote, ensuring that it’s as efficient as possible and only downloading/applying/uploading changes that are unseen.
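Here is a rough sketch of the idea, for illustration only. It is not Codebook's implementation: the `Changeset` class, the integer `sequence` numbers, and both functions are hypothetical stand-ins. It shows how a high water mark limits what a device fetches, and why pruning old changesets would leave a new device unable to rebuild the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    sequence: int  # hypothetical: position of this file in the remote's history
    payload: str   # hypothetical: stand-in for the encrypted change data


def unseen_changesets(remote: list[Changeset], high_water_mark: int) -> list[Changeset]:
    """Return the changesets a device has not applied yet, oldest first."""
    pending = [c for c in remote if c.sequence > high_water_mark]
    return sorted(pending, key=lambda c: c.sequence)


def prune_through(remote: list[Changeset], mark: int) -> list[Changeset]:
    """Drop every changeset at or below `mark` (the pruning idea proposed above)."""
    return [c for c in remote if c.sequence > mark]


remote = [Changeset(n, f"change-{n}") for n in range(1, 6)]

# A device that has already applied changesets 1-3 only needs 4 and 5.
stale_device = unseen_changesets(remote, high_water_mark=3)

# A brand-new device starts at 0 and needs the full history.
new_device = unseen_changesets(remote, high_water_mark=0)

# After pruning through the lowest known mark, a new device can no longer
# obtain changesets 1-3, so it cannot rebuild the database from scratch.
new_device_after_prune = unseen_changesets(prune_through(remote, 3), high_water_mark=0)
```

In this toy model the existing devices lose nothing from pruning, but the new device ends up with only changesets 4 and 5, which is the case the full history protects against.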

We may investigate providing an automated way of “consolidating” these changes in the future, but at this time because the file sizes are so small we haven’t seen the need.

That being said, as @wgray mentioned, in the next release of Codebook, 4.1, there will be an easy way for a user to clear out and re-establish their remote by utilizing the overwrite operation if they so desire.

We’ve got some more information in this other post that may help to put the design of the new sync system into perspective.