Random line using sed

I want to select a random line with sed. I know shuf -n and sort -R | head -n do the job, but shuf requires installing coreutils, and the sort solution isn't optimal on large data.
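For reference, the sort-based variant mentioned above looks like this (GNU sort's -R option shuffles the whole input, which is why it gets slow on large data):

echo "$var" | sort -R | head -n 1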

Here is what I tested:

echo "$var" | shuf -n1

This gives the desired result, but I'm worried about its portability; that's why I want to try it with sed.

var="Hi
i am a student
learning scripts"

output:
i am a student

output:
Hi

The selected line must be random.

It depends greatly on what you want your pseudo-random probability distribution to look like. (Don't try for random; be content with pseudo-random. If you do manage to generate a truly random value, go collect your Nobel Prize.) If you just want a uniform distribution (e.g., each line has an equal probability of being selected), then you'll need to know a priori how many lines are in the file. Getting that distribution is not quite as easy as allowing the earlier lines in the file to be slightly more likely to be selected, but since counting the lines first is easy, we'll do that. Assuming that the number of lines is less than 32769 (bash's RANDOM only takes 32768 distinct values), you can simply do:

N=$(wc -l < input-file)                    # count the lines first
sed -n -e $((RANDOM % N + 1))p input-file  # print one pseudo-randomly chosen line

-- edit --

After thinking about it for a bit, I realize you don't need to know the number of lines, so you don't need to read the data twice. I haven't done a rigorous analysis, but I believe that the following gives a uniform distribution:

awk 'BEGIN{srand()} rand() < 1/NR { out=$0 } END { print out }' input-file
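For what it's worth, the standard single-item reservoir-sampling argument supports the uniformity claim: line i overwrites out with probability 1/i (NR equals i when that line is read), and then survives each later line j with probability 1 - 1/j, so the probability that line i is the one printed is

(1/i) * (i/(i+1)) * ((i+1)/(i+2)) * ... * ((N-1)/N) = 1/N

the same for every one of the N lines.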

-- edit -- Ed Morton suggests in the comments that we should be able to invoke rand() only once. That seems like it ought to work, but doesn't seem to. Curious:

$ time for i in $(seq 400); do awk -v seed=$(( $(date +%s) + i)) 'BEGIN{srand(seed); r=rand()} r < 1/NR { out=$0 } END { print out}'  input; done | awk '{a[$0]++} END { for (i in a) print i, a[i]}' | sort
1 205
2 64
3 37
4 21
5 9
6 9
7 9
8 46

real    0m1.862s
user    0m0.689s
sys     0m0.907s
$ time for i in $(seq 400); do awk -v seed=$(( $(date +%s) + i)) 'BEGIN{srand(seed)} rand() < 1/NR { out=$0 } END { print out}'  input; done | awk '{a[$0]++} END { for (i in a) print i, a[i]}' | sort
1 55
2 60
3 37
4 50
5 57
6 45
7 50
8 46

real    0m1.924s
user    0m0.710s
sys     0m0.932s
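A plausible explanation for the skew in the first loop: with a single fixed r, the test r < 1/NR succeeds on a prefix of the lines, so out ends up holding the last line i satisfying r < 1/i. That selects line i with probability 1/i - 1/(i+1) = 1/(i(i+1)) for i < N, and the last line with probability 1/N; line 1 therefore wins about half the time, which matches the 205 out of 400 observed above.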


var="Hi
i am a student
learning scripts"

mapfile -t array <<< "$var"      # create array from $var

echo "${array[RANDOM % ${#array[@]}]}"   # ${#array[@]} is the element count
echo "${array[RANDOM % ${#array[@]}]}"

Output (e.g.):

learning scripts
i am a student

See: help mapfile


This seems to be the best solution for large input files:

awk -v seed="$RANDOM" -v max="$(wc -l < file)" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file

as it uses only standard UNIX tools, is not restricted to files of 32,768 lines or fewer, has no bias towards either end of the input, produces different output even if called twice in the same second, and exits immediately after the target line is printed rather than reading to the end of the input.


Update:

Having said the above, I have no explanation for why a script that calls rand() once per line and reads every line of input is about twice as fast as a script that calls rand() once and exits at the first matching line:

$ seq 100000 > file

$ time for i in $(seq 500); do
    awk -v seed="$RANDOM" -v max="$(wc -l < file)" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file;
done > o3

real    1m0.712s
user    0m8.062s
sys     0m9.340s

$ time for i in $(seq 500); do
    awk -v seed="$RANDOM" 'BEGIN{srand(seed)} rand() < 1/NR{ out=$0 } END { print out}' file;
done > o4

real    0m29.950s
user    0m9.918s
sys     0m2.501s

They both produced very similar distributions; the summaries below show the number of distinct lines selected, the total number of selections, and the minimum and maximum number of times any single line was selected:

$ awk '{a[$0]++} END { for (i in a) print i, a[i]}' o3 | awk '{sum+=$2; max=(NR>1&&max>$2?max:$2); min=(NR>1&&min<$2?min:$2)} END{print NR, sum, min, max}'
498 500 1 2

$ awk '{a[$0]++} END { for (i in a) print i, a[i]}' o4 | awk '{sum+=$2; max=(NR>1&&max>$2?max:$2); min=(NR>1&&min<$2?min:$2)} END{print NR, sum, min, max}'
490 500 1 3

Final Update:

Turns out it was calling wc that (unexpectedly to me at least!) was taking most of the time. Here's the improvement when we take it out of the loop:

$ time { max=$(wc -l < file); for i in $(seq 500); do awk -v seed="$RANDOM" -v max="$max" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file; done } > o3

real    0m24.556s
user    0m5.044s
sys     0m1.565s

so, as expected, the solution that calls wc up front and rand() once is faster than the one that calls rand() for every line.
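Putting the findings together, here is a minimal sketch of that final approach wrapped in a function (random_line is a hypothetical name, not from the thread):

# Count the lines once, pick a line number with a single rand() call,
# and stop reading as soon as that line has been printed.
random_line() {
    local file=$1 max
    max=$(wc -l < "$file") || return
    awk -v seed="$RANDOM" -v max="$max" '
        BEGIN { srand(seed); n = int(rand() * max) + 1 }
        NR == n { print; exit }
    ' "$file"
}

random_line input-file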


On bash, first initialize the RANDOM seed, here to the line count cubed (or a value of your choice):

$ i=0; while read a; do let i++; done <<< "$var"; let RANDOM=i*i*i

$ let l=$RANDOM%$i+1; echo "$var" | sed -n "$l p"

If you move your data to a file, varfile:

$ echo "$var" > varfile
$ i=0; while read a; do let i++; done < varfile; let RANDOM=i*i*i

$ let l=$RANDOM%$i+1; sed -n "$l p" varfile

To draw several lines, put the last command inside a loop, e.g. for ((c=0; c<9; c++)) { ...; }


Using GNU sed and bash; no wc or awk:

f=input-file
sed -n $((RANDOM%($(sed = $f | sed '2~2d' | sed -n '$p')) + 1))p $f

Note: The three sed invocations in $(...) are an inefficient way to fake wc -l < $f. Maybe there's a better way -- using only sed, of course.
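One shorter possibility is sed's = command: applied to the last line ($), it prints the line count, so a single sed call replaces the three (still limited by RANDOM's 32768 possible values):

f=input-file
sed -n "$((RANDOM % $(sed -n '$=' "$f") + 1))p" "$f"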





Comments
  • Welcome to SO. Stack Overflow is a question and answer page for professional and enthusiastic programmers. Add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself.
  • Thanks, yeah, no problem, I will edit.
  • Is the number of lines known?
  • doing this with bash and sed would be even less efficient, what's wrong with installing coreutils?
  • Why do you need to use sed specifically? There are lots of tools available on just about every system that might be a better fit.
  • I like the awk solution and I agree, it seems like it should work but couldn't you just call rand() once in the BEGIN section and use a variable instead of calling it once per line of input? Since srand() by default seeds with the current seconds since epoch value it'll produce the same output if you run it twice within 1 sec - if you care you could change that by awk -v seed="$RANDOM" 'BEGIN{srand(seed)...'
  • Subsequent calls to rand() will produce new values...it's only the same if you re-spawn awk. But you're right! We only need to call it once! That'll speed it up quite a bit.
  • It's the re-spawning awk case I was talking about. It's unlikely you'd call it twice in 1 second, but if you did you'd get the same output (unless you got lucky and crossed a seconds-since-the-epoch boundary). Personally I usually don't care to handle that, but since the OP seems pretty focused on randomness I thought I'd suggest seed="$RANDOM" if it matters.
  • Tried calling rand only once in BEGIN, and the output is heavily skewing to the beginning of the file. Not sure why...
  • @EdMorton I did not. Running the scripts for timing now.
  • @WilliamPursell - would you mind checking the above and trying the timings to see if you get similar results?
  • On my laptop, the two-pass script (the one running wc -l) is actually faster: on a single run of each, 12.864s real time vs 24.625s.
  • But it occurs to me that we are perhaps missing the modern miracle of huge memory! For this sort of thing these days it's probably easier to just read the whole file into memory and then select a line randomly in END!
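
A minimal sketch of that whole-file-in-memory idea, assuming the input fits in memory (this code is not from the thread):

awk -v seed="$RANDOM" '{ line[NR] = $0 } END { if (NR) { srand(seed); print line[int(rand()*NR)+1] } }' file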