MongoDB/NoSQL: Keeping Document Change History

A fairly common requirement in database applications is to track changes to one or more specific entities in a database. I've heard this called row versioning, a log table or a history table (I'm sure there are other names for it). There are a number of ways to approach it in an RDBMS--you can write all changes from all source tables to a single table (more of a log) or have a separate history table for each source table. You also have the option to either manage the logging in application code or via database triggers.

I'm trying to think through what a solution to the same problem would look like in a NoSQL/document database (specifically MongoDB), and how it would be solved in a uniform way. Would it be as simple as creating version numbers for documents, and never overwriting them? Creating separate collections for "real" vs. "logged" documents? How would this affect querying and performance?

Anyway, is this a common scenario with NoSQL databases, and if so, is there a common solution?


Good question, I was looking into this myself as well.

Create a new version on each change

I came across the Versioning module of Mongoid, the MongoDB ODM for Ruby. I haven't used it myself, but from what I could find, it adds a version number to each document and embeds older versions in the document itself. The major drawback is that the entire document is duplicated on each change, so a lot of duplicate content gets stored when you're dealing with large documents. This approach is fine, though, when your documents are small and/or you don't update them very often.
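As a rough illustration of the idea (not Mongoid's actual implementation), here is a minimal plain-Python sketch that archives a full copy of the previous state on every change. The function name `save_new_version` and the field names `version` and `versions` are assumptions made for this example.

```python
import copy


def save_new_version(doc, changes):
    """Return a new document with `changes` applied and the old state archived.

    The complete previous state is copied into the embedded `versions` list,
    which is why storage grows quickly for large documents.
    """
    # Snapshot everything except the embedded history itself.
    snapshot = {k: copy.deepcopy(v) for k, v in doc.items() if k != "versions"}
    updated = copy.deepcopy(doc)
    updated.setdefault("versions", []).append(snapshot)
    updated.update(changes)
    updated["version"] = doc.get("version", 1) + 1
    return updated


post = {"_id": "4c6b9456f61f000000007ba6", "version": 1, "title": "Hello world"}
post = save_new_version(post, {"title": "Foo"})
```

After the call, `post` holds the new title at the top level and one complete copy of the old document under `versions`.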

Only store changes in a new version

Another approach would be to store only the changed fields in a new version. Then you can 'flatten' your history to reconstruct any version of the document. This is rather complex though, as you need to track changes in your model and store updates and deletes in a way that your application can reconstruct the up-to-date document. This might be tricky, as you're dealing with structured documents rather than flat SQL tables.
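To make the "flatten" step concrete, here is a small sketch, assuming each stored version is a changeset with a `set` part (new or changed fields) and an `unset` part (removed fields). The names `diff`, `flatten`, `set` and `unset` are made up for this example, and only top-level fields are compared; nested documents would need a recursive diff, which is where the complexity mentioned above comes in.

```python
def diff(old, new):
    """Describe how to get from `old` to `new` (top-level fields only)."""
    changed = {k: v for k, v in new.items() if k not in old or old[k] != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}


def flatten(changesets):
    """Replay changesets in order to rebuild the document."""
    doc = {}
    for change in changesets:
        doc.update(change["set"])
        for key in change["unset"]:
            doc.pop(key, None)
    return doc


v1 = {"title": "Hello world", "body": "Is this thing on?", "draft": True}
v2 = {"title": "Foo", "body": "Is this thing on?"}

# The first changeset is a diff against an empty document.
history = [diff({}, v1), diff(v1, v2)]
latest = flatten(history)
first = flatten(history[:1])
```

Replaying a prefix of `history` yields the document as of that version, so `first` is the original document and `latest` the current one.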

Store changes within the document

Each field can also have an individual history. Reconstructing documents to a given version is much easier this way. In your application you don't have to explicitly track changes, but just create a new version of the property when you change its value. A document could look something like this:

{
  _id: "4c6b9456f61f000000007ba6",
  title: [
    { version: 1, value: "Hello world" },
    { version: 6, value: "Foo" }
  ],
  body: [
    { version: 1, value: "Is this thing on?" },
    { version: 2, value: "What should I write?" },
    { version: 6, value: "This is the new body" }
  ],
  tags: [
    { version: 1, value: [ "test", "trivial" ] },
    { version: 6, value: [ "foo", "test" ] }
  ],
  comments: [
    {
      author: "joe", // Unversioned field
      body: [
        { version: 3, value: "Something cool" }
      ]
    },
    {
      author: "xxx",
      body: [
        { version: 4, value: "Spam" },
        { version: 5, deleted: true }
      ]
    },
    {
      author: "jim",
      body: [
        { version: 7, value: "Not bad" },
        { version: 8, value: "Not bad at all" }
      ]
    }
  ]
}
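Reading a document back at a given version could look roughly like the sketch below. It assumes each field's history is sorted by version and that an entry with `deleted: true` means the field does not exist from that version on; `value_at` and `document_at` are hypothetical helper names.

```python
def value_at(history, version):
    """Return (found, value) for a versioned field as of `version`."""
    latest = None
    for entry in history:  # entries are assumed to be sorted by version
        if entry["version"] <= version:
            latest = entry
    if latest is None or latest.get("deleted"):
        return False, None
    return True, latest["value"]


def document_at(doc, version, versioned_fields):
    """Rebuild a flat document from per-field histories."""
    result = {"_id": doc["_id"]}
    for field in versioned_fields:
        found, value = value_at(doc[field], version)
        if found:
            result[field] = value
    return result


post = {
    "_id": "4c6b9456f61f000000007ba6",
    "title": [
        {"version": 1, "value": "Hello world"},
        {"version": 6, "value": "Foo"},
    ],
    "body": [
        {"version": 1, "value": "Is this thing on?"},
        {"version": 2, "value": "What should I write?"},
        {"version": 6, "value": "This is the new body"},
    ],
}

at_v2 = document_at(post, 2, ["title", "body"])
at_v6 = document_at(post, 6, ["title", "body"])
spam = [{"version": 4, "value": "Spam"}, {"version": 5, "deleted": True}]
```

No replay of changesets is needed here: each field is resolved independently by picking its latest entry at or before the requested version.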

Marking part of the document as deleted in a version is still somewhat awkward though. You could introduce a state field for parts that can be deleted/restored from your application:

{
  author: "xxx",
  body: [
    { version: 4, value: "Spam" }
  ],
  state: [
    { version: 4, deleted: false },
    { version: 5, deleted: true }
  ]
}

With each of these approaches you can store an up-to-date and flattened version in one collection and the history data in a separate collection. This should improve query times if you're only interested in the latest version of a document. But when you need both the latest version and historical data, you'll need to perform two queries, rather than one. So the choice of using a single collection vs. two separate collections should depend on how often your application needs the historical versions.

Most of this answer is just a brain dump of my thoughts; I haven't actually tried any of this yet. Looking back on it, the first option is probably the easiest and best solution, unless the overhead of duplicate data is very significant for your application. The second option is quite complex and probably isn't worth the effort. The third option is basically an optimization of option two and should be easier to implement, but probably isn't worth the implementation effort unless you really can't go with option one.

Looking forward to feedback on this, and other people's solutions to the problem :)


We have partially implemented this on our site, using the "store revisions in a separate document" approach (and a separate database). We wrote a custom function to return the diffs, and we store those. It's not so hard, and it can allow for automated recovery.


Why not a variation on "Store changes within the document"?

Instead of storing versions against each key/value pair, the current key/value pairs in the document always represent the most recent state, and a 'log' of changes is stored in a history array. Only those keys which have changed since creation will have an entry in the log.

{
  _id: "4c6b9456f61f000000007ba6",
  title: "Bar",
  body: "Is this thing on?",
  tags: [ "test", "trivial" ],
  comments: [
    { key: 1, author: "joe", body: "Something cool" },
    { key: 2, author: "xxx", body: "Spam", deleted: true },
    { key: 3, author: "jim", body: "Not bad at all" }
  ],
  history: [
    { 
      who: "joe",
      when: 20160101,
      what: { title: "Foo", body: "What should I write?" }
    },
    { 
      who: "jim",
      when: 20160105,
      what: { tags: ["test", "test2"], comments: { key: 3, body: "Not baaad at all" } }
    }
  ]
}
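A minimal sketch of how such a log could be maintained, in plain Python. It assumes that `what` records the new values of the keys that actually changed (the example above does not say whether old or new values are logged), and `update_with_log` is a hypothetical helper name.

```python
def update_with_log(doc, changes, who, when):
    """Apply `changes` in place and log only the keys whose value changed."""
    what = {k: v for k, v in changes.items() if doc.get(k) != v}
    if not what:
        return doc  # nothing changed, so nothing to log
    doc.update(what)
    doc.setdefault("history", []).append({"who": who, "when": when, "what": what})
    return doc


post = {
    "_id": "4c6b9456f61f000000007ba6",
    "title": "Hello world",
    "tags": ["test", "trivial"],
}

# `tags` is passed but unchanged, so only `title` ends up in the log.
update_with_log(post, {"title": "Foo", "tags": ["test", "trivial"]}, who="joe", when=20160101)
```

Against a real MongoDB collection, the same step could presumably be expressed as a single update combining `$set` for the changed keys with `$push` onto `history`, so the change and its log entry are written together.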


One can have a current NoSQL database and a historical NoSQL database. A nightly ETL job records every value with a timestamp, so instead of plain values the historical database always holds tuples (versioned fields). It only records a new value if the current value has changed, saving space in the process. For example, a document in the historical NoSQL database could look like this:

{
  _id: "4c6b9456f61f000000007ba6",
  title: [
    { date: 20160101, value: "Hello world" },
    { date: 20160202, value: "Foo" }
  ],
  body: [
    { date: 20160101, value: "Is this thing on?" },
    { date: 20160102, value: "What should I write?" },
    { date: 20160202, value: "This is the new body" }
  ],
  tags: [
    { date: 20160101, value: [ "test", "trivial" ] },
    { date: 20160102, value: [ "foo", "test" ] }
  ],
  comments: [
    {
      author: "joe", // Unversioned field
      body: [
        { date: 20160301, value: "Something cool" }
      ]
    },
    {
      author: "xxx",
      body: [
        { date: 20160101, value: "Spam" },
        { date: 20160102, deleted: true }
      ]
    },
    {
      author: "jim",
      body: [
        { date: 20160101, value: "Not bad" },
        { date: 20160102, value: "Not bad at all" }
      ]
    }
  ]
}
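The core of such an ETL step might look like the following sketch, which appends a dated entry only when a field's value differs from the last recorded one. The function name `record_snapshot` is an assumption, and the sketch covers top-level fields only; it does not handle fields that were deleted from the current document.

```python
def record_snapshot(historical, current, date):
    """Append {date, value} entries to `historical` for fields that changed."""
    for field, value in current.items():
        if field == "_id":
            continue
        entries = historical.setdefault(field, [])
        # Record a new tuple only if the value differs from the last one stored.
        if not entries or entries[-1].get("value") != value:
            entries.append({"date": date, "value": value})
    return historical


archive = {"_id": "4c6b9456f61f000000007ba6"}
record_snapshot(archive, {"title": "Hello world"}, 20160101)
record_snapshot(archive, {"title": "Hello world"}, 20160102)  # unchanged, skipped
record_snapshot(archive, {"title": "Foo"}, 20160202)
```

One consequence of a nightly snapshot is that a value changed and reverted within the same day never shows up in the history.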


For Python users (Python 3+), there's HistoricalCollection, an extension of pymongo's Collection object.

Example from the docs:

from historical_collection.historical import HistoricalCollection
from pymongo import MongoClient


class Users(HistoricalCollection):
    PK_FIELDS = ['username', ]  # <<= This is the only requirement

# ...

users = Users(database=db)

users.patch_one({"username": "darth_later", "email": "darthlater@example.com"})
users.patch_one({"username": "darth_later", "email": "darthlater@example.com", "laser_sword_color": "red"})

list(users.revisions({"username": "darth_later"}))

# [{'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
#   'username': 'darth_later',
#   'email': 'darthlater@example.com',
#   '_revision_metadata': None},
#  {'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
#   'username': 'darth_later',
#   'email': 'darthlater@example.com',
#   '_revision_metadata': None,
#   'laser_sword_color': 'red'}]

Full disclosure, I am the package author. :)
