Plaintext vs SqlCipher DB file size


#1

First of all, let me express my appreciation for the wonderful job the Zetetic team is doing for the Community. You guys are really great and thank you!

As a new user of SQLCIPHER I’m somewhat intrigued by the file size comparison of plain and sqlcipher db (of same schema and data content).

Plain db file size: 384KB; sqlcipher version of the same: 183KB

I carefully followed the use of PRAGMA key and sqlcipher_export() as explained here.

What puzzles me is that encryption is not compression and I didn’t expect the file size to reduce by this much. I inspected the encrypted version with a binary/hex viewer/editor in Visual Studio and sure it looks as random as it should be. The schemas and data contents seem identical although I haven’t carried out 100% verification. Interestingly, converting back to plain db produces the exact file size as the original plain db, so this gave me some confidence that all is fine. Or, is unicode at play here?

Could someone shed some light if encryption reduces the file size significantly (which by the way is good news). Or, is the problem that of the plain db bloating the file size unnecessarily? Viewing in binary/hex viewer indicates various contigous block of all zeros at different locations.

Perhaps the issue I’m having is that I never looked into the internal mechanics of SQLite for plain and encrypted formats (with regard to the impact of various parameters on file size, etc), which I am now keen to explore with the use of sqlcipher. Hence my first question on the topic here at Zetetic.

[UPDATE]

Having looked at things more closely, I think I am discovering something interesting which could explain what I couldn’t a few hours ago.

  • Plain db uses 96 pages of 4096-bytes each.

  • SQLCIPHER db uses 183 pages of 1024-byte each.

So what appears to be a compression (or possibly the opposite) is all dependent on the stored data, i.e., whether the fixed page size is used efficiently or not.

I am sure someone with some experience can explain this better.


#2

Hello @Dave - Thanks for getting in touch about SQLCipher. You are correct that SQLCipher does not perform any data compression. Typically when comparing databases you will actually notice that SQLCipher databases are slightly larger than their plaintext counterparts when the same page size is used. This is because SQLCipher reserves a small amount of each page to store the page Initialization Vector and MAC.

That said, as you noted in the most recent update, when using the latest version of SQLCipher there is a difference in the default page size between encrypted and non-encrypted databases. In the case of the former, the SQLCipher page size will be 1024 bytes (of which 976 will be available for data), vs. the 4096 bytes per page in the latter. The structure of the database and the type of data stored therein can greatly effect how the pages are packed, and thus the resulting database size.

If you’d like to confirm this is the case in your example, change the page size on your plaintext database to match what is used in SQLCipher, 1024 bytes, e.g.

> pragma page_size=1024;
> vacuum;

Alternately, change the SQLCipher page size when you attach the database, e.g:

> attach database 'encrypted.db' as encrypted key 'testkey';
> pragma encrypted.cipher_page_size = 4096;
> select sqlcipher_export('encrypted');
> detach database encrypted;

In either case, the expectation would be that the encrypted database is slightly larger that the plaintext database.


#3

Thank you so much. I can understand clearly now.