We’re using SQLCipher 4.3.0 via Encrypted Core Data and are running into a rare data corruption bug.
We got our hands on 3 corrupted databases from our users and we’re noticing some patterns:
Running PRAGMA cipher_integrity_check results in ~0.5% of pages erroring out with the message "HMAC verification failed for page XYZ".
The encrypted data is all zeros for the corrupted pages. Pages adjacent to the corrupted ones have valid data.
Have you ever run into anything similar?
Could you recommend any steps that would help us debug or fix the issue?
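For anyone wanting to reproduce the all-zero observation on their own files, here is a minimal sketch of a raw scan. It reads the encrypted file directly, so no key is needed. It assumes the SQLCipher 4 default page size of 4096 bytes (adjust it if cipher_page_size was changed), and the function name find_zero_pages is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: SQLCipher 4 default; change if cipher_page_size was customized


def find_zero_pages(path, page_size=PAGE_SIZE):
    """Return the 1-based numbers of full pages whose raw bytes are all zero."""
    zero_block = bytes(page_size)
    zero_pages = []
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(page_size)
            if len(page) < page_size:
                # End of file, or a trailing partial page that is not a real page.
                break
            page_no += 1
            if page == zero_block:
                zero_pages.append(page_no)
    return zero_pages
```

Calling find_zero_pages on a copy of a corrupted database should list the same page numbers that cipher_integrity_check reports, if the failing pages really are zero-filled on disk.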
Hello @tspop - Thanks for reaching out. We have had some reports of similar situations. There was a change made in 4.4.2 that fixed a difficult-to-reproduce bug where an underlying cryptographic provider failure could cause corruption. If a database using WAL journal mode encountered a cryptographic provider error (i.e. the underlying encryption routines failed to operate) during the encryption of a WAL frame, and a checkpoint operation ran anyway despite the error state, then a corrupted page could make its way into the write-ahead log and back into the database. There was also a recent upstream fix in SQLite for a case where applications using SAVEPOINTs could encounter database corruption. Finally, outside of SQLite / SQLCipher, there have also been some reports of write failures on Mac solid state drives resulting in zeroed 4K blocks inside files.
A few questions:
Is the application using WAL journal mode?
Outside of CoreData, does your application ever use SAVEPOINT?
Is the app running on iOS, macOS, or both?
What cryptographic provider are you using, OpenSSL, CoreCrypto, or other?
Hello @tspop - thanks for getting back to us. That is a combination of settings where we haven’t had specific issues reported before. I would still definitely start with upgrading to 4.5.1, since there are issues addressed there. Are there any specific behaviors or actions that seem to precede such a corruption (e.g. application crash, force quitting application, hard power-off)?
In the example cases where you have databases, are all the zero pages contiguous, or are they scattered around different places in the database?
Are there any specific behaviors or actions that seem to precede such a corruption (e.g. application crash, force quitting application, hard power-off)?
It looks like the majority happen when the app gets updated.
In the example cases where you have databases, are all the zero pages contiguous, or are they scattered around different places in the database?
Out of the reports we have so far:
7/8 have fully contiguous all-zero pages.
One of them has a gap (pages 573-580 and 585-588 are all zeros).
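To make the contiguity check repeatable across reports, a small sketch like the following can collapse a list of failing page numbers into runs and convert each run to a byte range in the file. The helper names group_contiguous and byte_range are hypothetical, and the 4096-byte page size is an assumption based on the SQLCipher 4 default.

```python
def group_contiguous(page_numbers):
    """Collapse 1-based page numbers into inclusive (start, end) runs."""
    runs = []
    for n in sorted(set(page_numbers)):
        if runs and n == runs[-1][1] + 1:
            # Extends the current run by one page.
            runs[-1] = (runs[-1][0], n)
        else:
            # Starts a new run.
            runs.append((n, n))
    return runs


def byte_range(run, page_size=4096):
    """Return (start_offset, end_offset_exclusive) in the file for a run of pages."""
    start, end = run
    return ((start - 1) * page_size, end * page_size)


# The report with a gap: pages 573-580 and 585-588 are all zeros.
pages = list(range(573, 581)) + list(range(585, 589))
print(group_contiguous(pages))  # [(573, 580), (585, 588)]
```

The byte offsets may help when comparing the zeroed regions against filesystem or storage block boundaries.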