Postgres: How to clear transaction ID for anonymity and data reduction
We're running an evaluation platform where users can comment on certain things. A key feature is that each person can comment only once, and every comment is anonymous.
We're using Postgres for all our data. We want to save a flag in the database recording that a user has commented (so they cannot comment again). In a separate table, but within the same transaction, we want to save the comment itself without any link to the user.
However, Postgres saves the transaction ID of every tuple inserted into the database (xmin, one of the system columns). So now there's a link between the user and their comment, which we have to avoid!
Vacuuming alone does not help, as it does not clear the transaction ID. See the "Note" box in the "24.1.5. Preventing Transaction ID Wraparound Failures" section of the PostgreSQL docs.
Putting those inserts in different transactions doesn't really solve anything, since transaction IDs are consecutive.
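The leak can be shown with a toy Python model. This is not PostgreSQL code. The Row class, the sample IDs, and the max_gap parameter are all hypothetical. The model links rows in a flags table to rows in a comments table using nothing but their xmin values. With max_gap=0 it covers the single-transaction case. With max_gap=1 it shows why back-to-back transactions with consecutive IDs still reveal the link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical stored tuple: its xmin (inserting transaction ID) and payload."""
    xmin: int
    payload: str


def link_by_xmin(flags: list[Row], comments: list[Row], max_gap: int) -> dict[str, list[str]]:
    """For each flag row, list the comments whose xmin lies in
    [flag.xmin, flag.xmin + max_gap]. A single candidate means the
    "anonymous" comment has been re-identified."""
    links: dict[str, list[str]] = {}
    for flag in flags:
        links[flag.payload] = [
            c.payload for c in comments if 0 <= c.xmin - flag.xmin <= max_gap
        ]
    return links


# Case 1: flag and comment written in the same transaction share an xmin.
same_tx_flags = [Row(500, "alice"), Row(501, "bob")]
same_tx_comments = [Row(501, "Great!"), Row(500, "Meh.")]
print(link_by_xmin(same_tx_flags, same_tx_comments, max_gap=0))
# {'alice': ['Meh.'], 'bob': ['Great!']}

# Case 2: separate, back-to-back transactions get consecutive IDs.
split_flags = [Row(600, "alice"), Row(602, "bob")]
split_comments = [Row(601, "Great!"), Row(603, "Meh.")]
print(link_by_xmin(split_flags, split_comments, max_gap=1))
# {'alice': ['Great!'], 'bob': ['Meh.']}
```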
We could aggregate comments from multiple users into one large text field with some separators. But Postgres would keep old versions of this text at least until the next vacuum, so that doesn't seem like a full solution. Also, we'd still be storing the order in which users added their comments, which we would prefer not to store either.
Rewriting all the tuples in those tables periodically (with a dummy UPDATE of all of them), followed by a vacuum, would probably erase the "insert history" sufficiently, but that too seems like a crude hack.
Is there any other way within postgres to make it impossible to reconstruct the insertion history of a table?
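The dummy-UPDATE-plus-vacuum idea can be sketched as a toy model of row versioning. It is simplified and hypothetical: it tracks only xmin and xmax, and it ignores snapshots, visibility rules, and physical row placement. An UPDATE that touches every row in one transaction gives every new row version the same xmin. The old versions keep their original xmins until VACUUM removes them.

```python
from dataclasses import dataclass


@dataclass
class TupleVersion:
    """Hypothetical row version: created by xmin, deleted by xmax (0 = live)."""
    xmin: int
    xmax: int
    text: str


def dummy_update_all(table: list[TupleVersion], xid: int) -> None:
    """Model one transaction (ID xid) running a no-op UPDATE on every live row:
    each live version is marked dead and a copy is appended with xmin = xid."""
    live = [t for t in table if t.xmax == 0]
    for old in live:
        old.xmax = xid
        table.append(TupleVersion(xmin=xid, xmax=0, text=old.text))


def vacuum(table: list[TupleVersion]) -> None:
    """Model VACUUM removing dead versions (assumes no snapshot still needs them)."""
    table[:] = [t for t in table if t.xmax == 0]


comments = [TupleVersion(101, 0, "Great!"), TupleVersion(205, 0, "Meh."), TupleVersion(318, 0, "Okay.")]
dummy_update_all(comments, xid=900)
print(sorted({t.xmin for t in comments}))  # [101, 205, 318, 900]: dead versions still hold old xmins
vacuum(comments)
print(sorted({t.xmin for t in comments}))  # [900]: insertion order no longer visible via xmin
```

The model shows why the vacuum step matters. Until the dead versions are removed, the original xmins are still on disk.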
Perhaps you could use something like postgres_fdw to write to tables using a remote connection (either to the current database or another database), and thereby separate xmin values, even though you as a user think you are doing it all in the "same transaction."
Regarding the concern that sequential xmin values could be reverse-engineered to track users: since dblink can run queries asynchronously, this issue may become moot at scale, when many users are adding comments at the same time. This might not work if you need to be able to roll back after encountering an error. It really depends on how important it is for you to confine the operations to one transaction.
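The "moot at scale" argument can be illustrated by counting how many comment transactions land within a window of IDs after each flag transaction. This is a toy Python model with made-up transaction IDs and a hypothetical window parameter. It does not model dblink or postgres_fdw themselves, and it assumes the observer sees only xmin values.

```python
def candidate_counts(flag_xids: list[int], comment_xids: list[int], window: int) -> list[int]:
    """For each flag transaction ID, count comment transaction IDs in
    (flag_xid, flag_xid + window]. A count of 1 identifies the comment;
    larger counts mean more plausible authors (a bigger anonymity set)."""
    return [sum(1 for c in comment_xids if 0 < c - f <= window) for f in flag_xids]


# One user at a time: each remote comment directly follows its flag.
print(candidate_counts([100, 102, 104], [101, 103, 105], window=1))  # [1, 1, 1]

# Several concurrent users: local and remote transactions interleave, so a
# window wide enough to contain each true comment also contains the others.
print(candidate_counts([100, 101, 102, 103], [104, 105, 106, 107], window=7))  # [4, 4, 4, 4]
```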
I don't think there is a problem.
In your comment you write that you keep a flag with the user (however exactly you store it) that keeps track of which posting the user commented on. To keep that information private, you have to keep that flag private, so that nobody except the user themselves can read it.
If no other user can see that information, then no other user can see the xmin on the corresponding table entries. Then nobody can correlate it with the xmin on the comment, so the problem does not arise.
The difficult part is keeping private the information about which postings a user commented on. I see two ways:
Don't use database techniques to do it, but write the application so that it hides that information from the users.
Use PostgreSQL Row Level Security to do it.
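With Row Level Security, the policy itself is plain DDL. The Python helper below only assembles that DDL as strings. Several things in it are hypothetical: the table name comment_flags, the column user_name, and the assumption that every end user logs in as their own database role, so that current_user identifies them. If the application connects through a single shared role, the policy would have to compare against something else, such as a per-session setting.

```python
def rls_ddl(table: str, owner_column: str) -> list[str]:
    """Return DDL letting each database role see only its own rows of `table`,
    where `owner_column` (hypothetical) stores the owning role's name."""
    # Rough guard against injection: only accept plain, unquoted names.
    if not (table.isidentifier() and owner_column.isidentifier()):
        raise ValueError("expected plain, unquoted identifiers")
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;",
        # Without FORCE, the table owner bypasses the policy.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY {table}_own_rows ON {table} "
        f"USING ({owner_column} = current_user);",
    ]


for stmt in rls_ddl("comment_flags", "user_name"):
    print(stmt)
```

Because the policy has no FOR clause, it applies to all commands, and its USING expression also serves as the check for newly inserted rows. Superusers and roles with the BYPASSRLS attribute still bypass row security, which is consistent with the point that a superuser cannot be kept out.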
There is no way you can keep the information from a superuser. Don't even try.
You could store the users with their flags and the comments on different database clusters (and use distributed transactions); then the xmins would be unrelated.
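One way to sketch the distributed-transaction part is PostgreSQL's two-phase commit: PREPARE TRANSACTION, then COMMIT PREPARED or ROLLBACK PREPARED. The Python helper below only builds the statement lists a coordinator would send to the two clusters. The cluster labels, the example tables, and the choice of a separate random global transaction identifier per cluster are assumptions. Using separate identifiers keeps the identifiers themselves from tying the two sides together. Both servers need max_prepared_transactions set above zero, because the default of 0 disables prepared transactions.

```python
import secrets


def two_phase_plan(flag_sql: str, comment_sql: str) -> dict[str, dict[str, list[str]]]:
    """Build, per cluster, the statements for a two-phase commit.

    "prepare" goes to both clusters first. Only if both PREPARE TRANSACTION
    statements succeed does the coordinator send "commit" to both; otherwise
    it sends "abort" to each cluster that did prepare.
    """
    plan: dict[str, dict[str, list[str]]] = {}
    # Hypothetical cluster labels: one holds the flags, the other the comments.
    for cluster, sql in (("users", flag_sql), ("comments", comment_sql)):
        # Fresh random GID per cluster; token_hex yields only [0-9a-f],
        # so it is safe to embed in a SQL string literal.
        gid = secrets.token_hex(16)
        plan[cluster] = {
            "prepare": ["BEGIN;", sql, f"PREPARE TRANSACTION '{gid}';"],
            "commit": [f"COMMIT PREPARED '{gid}';"],
            "abort": [f"ROLLBACK PREPARED '{gid}';"],
        }
    return plan


# Example with hypothetical tables and values.
example = two_phase_plan(
    "INSERT INTO comment_flags (user_name, posting_id) VALUES ('alice', 42);",
    "INSERT INTO comments (posting_id, body) VALUES (42, 'Great!');",
)
for cluster, phases in example.items():
    print(cluster, phases["prepare"], phases["commit"])
```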
Make sure to disable
To make it impossible to correlate the transactions in the databases, you could issue random
which do nothing but increment the transaction counter.