Rare DB Corruption on macOS

Hello!

We’ve been running into occasional DB corruption, affecting roughly 1 in 500 users over the last month. It shows up as the error "database disk image is malformed (code: 11)", i.e. SQLITE_CORRUPT.

We migrated from regular SQLite to SQLCipher just last month, and we never ran into this with regular SQLite. We tried to keep all our settings identical, but noticed that we accidentally disabled WAL during the switch. (I’ve re-enabled WAL as of today.)

We’re currently:

  • macOS only
  • Using SQLCipher 4.5.3 via SQLite.swift
  • Using CommonCrypto (SQLCIPHER_CRYPTO_CC)

We’ll report back in this thread on how our experiment of going back to WAL turns out, but we’d love any other thoughts on how we could better investigate the source of this corruption. We have some corrupted user DBs and can run additional diagnostics if needed.
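For reference, this is roughly the kind of diagnostic we can run on the corrupted files. It is only a sketch using Python's standard sqlite3 module, so it assumes a plain (unencrypted) database; for our SQLCipher files we would need a SQLCipher-enabled binding and would have to supply the key before anything else. The function name integrity_report and the max_errors parameter are made up for illustration.

```python
import sqlite3


def integrity_report(db_path, max_errors=100):
    """Return the rows from PRAGMA integrity_check as a list of strings.

    A healthy database yields ["ok"]. If SQLite cannot even read the
    file, the error message is returned instead of raising.
    Assumption: db_path is an unencrypted SQLite file.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            f"PRAGMA integrity_check({int(max_errors)})"
        ).fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        # e.g. "database disk image is malformed" or "file is not a database"
        return [f"error: {exc}"]
    finally:
        conn.close()
```

Anything other than a single "ok" row would tell us which pages or indexes are damaged, which might help narrow down when the corruption happened.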

Some more research has made us wonder whether this is a macOS + SQLite issue and not a SQLCipher issue:

  • macOS effectively lies about fsync: it can report that fsync has completed before the data has actually reached permanent storage. You can apparently fix this by using F_FULLFSYNC (via fcntl).

  • Whenever a system lies about fsync, there is some chance of corruption if a crash happens after the write is reported as successful but before it is actually persisted.

  • WAL is much more robust to this than the rollback journal - hence disabling WAL would exacerbate this issue significantly.

  • Our app tends to write much more often than most apps (writing multiple times every ~2s, the entire time the user’s computer is on)
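As a rough sketch of what we plan to try based on the points above: SQLite exposes the F_FULLFSYNC behaviour through the fullfsync and checkpoint_fullfsync pragmas, so the idea is to turn on WAL and both pragmas right after opening the connection and then read the values back to confirm they took effect. The example uses Python's standard sqlite3 module on an unencrypted file purely for illustration; the helper name open_durable is made up, and with SQLCipher the key would have to be set before any of these statements.

```python
import sqlite3


def open_durable(db_path):
    """Open db_path with WAL and full-fsync pragmas enabled.

    Returns the connection plus the settings as SQLite reports them,
    so the caller can verify that WAL is really active.
    Assumption: db_path is a file-backed, unencrypted database.
    """
    conn = sqlite3.connect(db_path)

    # journal_mode returns the mode actually in effect ("wal" on success).
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

    # Ask SQLite to use F_FULLFSYNC for syncs and for WAL checkpoints.
    # These only change behaviour on systems that support F_FULLFSYNC.
    conn.execute("PRAGMA fullfsync=ON")
    conn.execute("PRAGMA checkpoint_fullfsync=ON")

    settings = {
        "journal_mode": mode,
        "fullfsync": conn.execute("PRAGMA fullfsync").fetchone()[0],
        "checkpoint_fullfsync": conn.execute(
            "PRAGMA checkpoint_fullfsync"
        ).fetchone()[0],
    }
    return conn, settings
```

Reading the settings back matters for us in particular, since our accidental loss of WAL went unnoticed for a month. We expect full fsyncs to cost some write throughput given how often we write, so we would measure that before shipping it.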

We’ll post an update in a week or two about whether it turns out this was just a macOS issue, or if it’s still an unresolved issue that’s potentially within SQLCipher.