We’re in the process of obtaining our commercial license and have a question regarding identification of encrypted SQLite databases. We have an architecture where plugins can be developed to both identify and decide how certain kinds of files are ingested within a processing engine.
We have a proprietary SQLite file format (non-encrypted right now) and have relied on the well-defined database header to handle the identification. I should mention that we set a custom application id to separate this special/proprietary DB from any ordinary SQLite database. We’re already aware that once we go with SQLCipher, that header can no longer be used since it’s fully encrypted.
I know the database header info can be accessed via PRAGMA queries; however, this whole design requires a quick method to identify such database files without having to connecting, query, etc. Is it possible to prevent SQLCipher from encrypting the header? Or can we identify these database files in some other way? Be aware that file extensions are considered untrustworthy in our business domain.
Hi @scott_parillo - thanks for getting in touch about this. Our past answer to this question has always been that it is not possible to identify SQLCipher databases. However, your post prompted us to re-examine this and develop the proposed approach documented here which would use raw key semantics to establish a fixed database header for your application’s encrypted database files:
Can you take a look and let us know if this would work for your requirements?
Sorry for the late response - I didn’t realize somebody had actually taken the time already Let me make sure I understand this because we have only been licensed for a couple of days now.
At this moment, we have a server-side password that’s used to drive the sqlite3_key function. This password uses a random crypto algorithm and has a fixed number of characters. Based on what I just read, it seems that the password could occupy up to the first 64 characters and then apply some well-known string consisting of 32 characters. In doing so, reading the first 16 bytes of the file should reveal a constant value across all encrypted databases.
If I have understood this correctly - This would definitely meet our needs.At the end of the linked document, it suggests that a future version would have a predictable header and metadata. Perhaps I missed it but I read the PDF that was linked but didn’t quite find what I was looking for. Could you possibly shed more light on how this would work? We already spent time recently to restructure our database schema just so we could access non-secured metadata info without requiring a key. Would this proposed v4 change address this need as well?
@scott_parillo I’m really glad to hear that this approach is working for you. I wanted to get back to you regarding the final question in this thread. The proposed v4 changes would implement a fixed header on the database file, which would allow identification. It would also include some metadata, but this metadata is specific to SQLCipher (e.g. how many KDF iterations are required to derive the key to open the database, whether hmac is enabled, etc). The header wouldn’t really be usable to store general-purpose metadata.
That said, since SQLCipher does allow you to attach databases, an alternate option might be to use an unencrypted database for metadata. You could simply open the unencrypted database to get the metadata, then use it as necessary.