Split a table randomly 50/50 in PostgreSQL
I have a table with two columns in PostgreSQL: original_id and duplicate_id.
original_id  duplicate_id
1            1
2            2
3            3
4            4
5            5
6            6
I would like to randomly split this table 50/50, so that I can put a specific tag on each half:
original_id  duplicate_id  tag
1            1             control
2            2             treatment
3            3             treatment
4            4             control
5            5             treatment
6            6             control
What is important:
1. The selection has to be random.
2. The split has to be 50/50 (or as close to it as possible if the number of rows is odd).
You can select half of the rows in a random order with this query:
select * from my_table order by random() limit (select count(*)/ 2 from my_table)
Use it to tag the rows:
with control as (
    select *
    from my_table
    order by random()
    limit (select count(*) / 2 from my_table)
)
select *,
    case when t in (select t from control t) then 'control' else 'treatment' end
from my_table t;
Splitting a table into two tables randomly: we could use a random function to split the table into two (almost equal) halves. The snippet is written here with PostgreSQL's random(); the original used rand(), which PostgreSQL does not provide.
-- insert roughly half of the rows into table2
INSERT INTO table2
SELECT * FROM table1
WHERE random() < 0.50;
-- insert the remaining rows into table3
INSERT INTO table3
SELECT t1.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t1.pk = t2.pk
WHERE t2.pk IS NULL;
After that, we can drop table1.
You can use row_number() OVER (ORDER BY random()) to assign a random number to each record. Then use it in a CASE to assign either the tag 'control' or 'treatment', depending on whether the number is less than or equal to half of the count of rows in the table.
That gives a SELECT that looks like this:
SELECT original_id,
       duplicate_id,
       CASE
         WHEN rn <= (SELECT count(*) / 2 FROM elbat) THEN 'control'
         ELSE 'treatment'
       END tag
FROM (SELECT original_id,
             duplicate_id,
             row_number() OVER (ORDER BY random()) rn
      FROM elbat) x;
If you want an UPDATE (I'm not sure on this), and assuming that the pair of original_id and duplicate_id is unique, it could look like this:
UPDATE elbat t
SET tag = CASE
            WHEN rn <= (SELECT count(*) / 2 FROM elbat) THEN 'control'
            ELSE 'treatment'
          END
FROM (SELECT original_id,
             duplicate_id,
             row_number() OVER (ORDER BY random()) rn
      FROM elbat) x
WHERE x.original_id = t.original_id
  AND x.duplicate_id = t.duplicate_id;
(The SELECT result on the Fiddle is a nice example of how the order of the rows returned can be totally different from the physical one, if the optimizer likes it better that way.)
I would use window functions:
select t.*,
       (case when seqnum <= cnt / 2 then 'treatment' else 'control' end) as tag
from (select t.*,
             count(*) over () as cnt,
             row_number() over (order by random()) as seqnum
      from t
     ) t;
Actually, random is random. So, you don't need the count. You can use modulo arithmetic instead:
select t.*, (case when row_number() over (order by random()) % 2 = 1 then 'treatment' else 'control' end) as tag from t;
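To see why the modulo version needs no count, here is a minimal Python sketch of the same idea (not PostgreSQL itself). A shuffled list position stands in for row_number() over (order by random()); the function names `tag_by_parity` and `group_sizes` are made up for this illustration.

```python
import random

def tag_by_parity(rows, rng):
    # Shuffle a copy; list position + 1 plays the role of row_number().
    shuffled = list(rows)
    rng.shuffle(shuffled)
    # Odd positions become 'treatment', even positions 'control'.
    return {row: ('treatment' if rn % 2 == 1 else 'control')
            for rn, row in enumerate(shuffled, start=1)}

def group_sizes(tags):
    # Returns (number of 'treatment' rows, number of 'control' rows).
    values = list(tags.values())
    return values.count('treatment'), values.count('control')

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
even = tag_by_parity(range(1, 7), rng)   # six rows, as in the question
odd = tag_by_parity(range(1, 8), rng)    # seven rows

print(group_sizes(even))  # (3, 3)
print(group_sizes(odd))   # (4, 3)
```

Which rows land in which group changes with the shuffle, but the group sizes do not: with an odd row count the extra row always goes to 'treatment', because position 1 is odd.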
You can make random() generate the values 1 or 2 using the formula (random() + 1)::int:
select t.*, case (random() + 1)::int when 1 then 'treatment' else 'control' end as tag from t;
More generally, (random() * (upper_limit - lower_limit) + lower_limit)::int will generate numbers between lower_limit and upper_limit (inclusive). If the limits are 1 and 2, the multiplication can be removed (because it would be * 1, which doesn't change anything). Note that the cast to int rounds to the nearest integer, so with more than two values the two end values come up only half as often as the ones in between. If you want to generate e.g. four equally likely values, use floor() instead:
select t.*,
       case floor(random() * 4)::int + 1
         when 1 then 'treatment'
         when 2 then 'control'
         when 3 then 'something'
         else 'some other thing'
       end as tag
from t;
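One thing worth checking with a cast-based formula is how rounding spreads the values over the buckets. The sketch below is a Python stand-in, not PostgreSQL. It assumes the ::int cast rounds to the nearest integer, and it sweeps an evenly spaced grid over [0, 1) in place of real random() calls, so the counts are exact rather than sampled. The helper names are hypothetical.

```python
from collections import Counter

N = 12_000  # grid size, chosen so bucket boundaries fall between grid points

# Evenly spaced stand-ins for random(): midpoints of N equal slices of [0, 1).
grid = [(i + 0.5) / N for i in range(N)]

def rounded_bucket(u):
    # Mimics (random() * 3 + 1)::int, assuming the cast rounds to nearest.
    return round(u * 3 + 1)

def floored_bucket(u):
    # Mimics floor(random() * 4)::int + 1.
    return int(u * 4) + 1

rounded_counts = Counter(rounded_bucket(u) for u in grid)
floored_counts = Counter(floored_bucket(u) for u in grid)

print(sorted(rounded_counts.items()))  # [(1, 2000), (2, 4000), (3, 4000), (4, 2000)]
print(sorted(floored_counts.items()))  # [(1, 3000), (2, 3000), (3, 3000), (4, 3000)]
```

With rounding, values 1 and 4 each cover only half as much of the input range as 2 and 3; the floor-based form gives all four values an equal share. For just two values, (random() + 1)::int is unaffected, since both are end values.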
- Maybe you can use the random() function and a CASE expression: if random() < .5, return 'control', else return 'treatment'. Manual: postgresql.org/docs/10/static/index.html
- This would be a great solution, but unfortunately it does not guarantee an even distribution between the groups. You may get e.g. 5/1 (5 control) as well as 1/5 or 4/2 etc.
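To put a number on that objection, here is a small Python calculation. It assumes each row is tagged by an independent fair coin flip, which is what random() < .5 per row amounts to; `split_probability` is a name made up for this sketch.

```python
from math import comb

def split_probability(n_rows, n_control):
    # Chance that exactly n_control of n_rows independent fair coin flips
    # come out as 'control' (binomial distribution with p = 0.5).
    return comb(n_rows, n_control) / 2 ** n_rows

print(split_probability(6, 3))  # 0.3125  -> an exact 3/3 split
print(split_probability(6, 5))  # 0.09375 -> a 5/1 split
print(split_probability(6, 1))  # 0.09375 -> a 1/5 split
```

So for the six-row table in the question, per-row coin flips give an exact 50/50 split less than a third of the time, which is why the answers that rank the rows with row_number() are the ones that meet the 50/50 requirement.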