Random line using sed
I want to select a random line with sed. I know shuf -n and sort -R | head -n do the job, but for shuf you have to install coreutils, and the sort solution isn't optimal on large data. Here is what I tested:

echo "$var" | shuf -n1

which gives the optimal result, but I'm worried about portability; that's why I want to try it with sed.

var="Hi
i am a student
learning scripts"

output (e.g.): i am a student
output (e.g.): Hi

It must be random.
It depends greatly on what you want your pseudo-random probability distribution to look like. (Don't try for random; be content with pseudo-random. If you do manage to generate a truly random value, go collect your Nobel Prize.) If you just want a uniform distribution (i.e., each line has equal probability of being selected), then you'll need to know a priori how many lines are in the file. Getting a perfectly uniform distribution is not quite so easy as allowing the earlier lines in the file to be slightly more likely to be selected (RANDOM % N is slightly biased toward low values whenever N does not evenly divide 32768), and since the slightly biased version is easy, we'll do that. Assuming that the number of lines is less than 32769, you can simply do:
N=$(wc -l < input-file)
sed -n -e $((RANDOM % N + 1))p input-file
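If the file might exceed 32768 lines, one possible workaround (my own sketch, not part of the original answer) is to combine two $RANDOM draws into a 30-bit value; the modulo bias shrinks correspondingly:

N=$(wc -l < input-file)
# two 15-bit $RANDOM draws give 30 bits, lifting the 32768-line ceiling
R=$(( ((RANDOM << 15) | RANDOM) % N + 1 ))
sed -n "${R}p" input-file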
-- edit --
After thinking about it for a bit, I realize you don't need to know the number of lines, so you don't need to read the data twice. I haven't done a rigorous analysis, but I believe that the following gives a uniform distribution:
awk 'BEGIN{srand()} rand() < 1/NR { out=$0 } END { print out }' input-file
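For what it's worth, this is the classic single-element reservoir sample: line NR overwrites the held line with probability 1/NR, so line k is the final survivor with probability (1/k) * k/(k+1) * ... * (N-1)/N = 1/N. The same one-liner, spelled out with comments:

awk '
  BEGIN { srand() }             # seed the PRNG (defaults to time of day)
  rand() < 1/NR { out = $0 }    # line NR replaces the pick with probability 1/NR
  END { print out }             # after N lines, each line had probability 1/N
' input-file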
-- edit --

Ed Morton suggests in the comments that we should be able to invoke rand() only once. That seems like it ought to work, but doesn't seem to. Curious:
$ time for i in $(seq 400); do awk -v seed=$(( $(date +%s) + i)) 'BEGIN{srand(seed); r=rand()} r < 1/NR { out=$0 } END { print out}' input; done | awk '{a[$0]++} END { for (i in a) print i, a[i]}' | sort
1 205
2 64
3 37
4 21
5 9
6 9
7 9
8 46

real    0m1.862s
user    0m0.689s
sys     0m0.907s

$ time for i in $(seq 400); do awk -v seed=$(( $(date +%s) + i)) 'BEGIN{srand(seed)} rand() < 1/NR { out=$0 } END { print out}' input; done | awk '{a[$0]++} END { for (i in a) print i, a[i]}' | sort
1 55
2 60
3 37
4 50
5 57
6 45
7 50
8 46

real    0m1.924s
user    0m0.710s
sys     0m0.932s
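A possible explanation for the skew (my own reasoning, not from the thread): with a single draw r, the pattern r < 1/NR matches exactly the lines with NR < 1/r, so the last line kept is min(int(1/r), N). A uniform r therefore lands on line 1 whenever r > 1/2, which matches the counts above (205 of 400, roughly half). A quick simulation of that mapping:

awk -v runs=400 -v N=8 'BEGIN {
  srand()
  for (i = 0; i < runs; i++) {
    do { r = rand() } while (r == 0)   # guard against a zero draw
    n = int(1/r); if (n > N) n = N     # the line the single-draw variant keeps
    cnt[n]++
  }
  for (n = 1; n <= N; n++) print n, cnt[n]+0
}'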
var="Hi i am a student learning scripts" mapfile -t array <<< "$var" # create array from $var echo "${array[$RANDOM % (${#array}+1)]}" echo "${array[$RANDOM % (${#array}+1)]}"
Output (e.g.):

learning scripts
i am a student
See: help mapfile
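The same idea works when the lines live in a file instead of a variable (a sketch; "file" is a placeholder name):

mapfile -t array < file                   # one array element per line
echo "${array[RANDOM % ${#array[@]}]}"    # note: $RANDOM tops out at 32767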
This seems to be the best solution for large input files:
awk -v seed="$RANDOM" -v max="$(wc -l < file)" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file
It uses standard UNIX tools; it's not restricted to files of 32,768 lines or less; it doesn't have any bias towards either end of the input; it'll produce different output even if called twice in the same second; and it exits immediately after the target line is printed rather than continuing to the end of the input.
Update:
Having said the above, I have no explanation for why a script that calls rand() once per line and reads every line of input is about twice as fast as a script that calls rand() once and exits at the first matching line:
$ seq 100000 > file

$ time for i in $(seq 500); do awk -v seed="$RANDOM" -v max="$(wc -l < file)" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file; done > o3

real    1m0.712s
user    0m8.062s
sys     0m9.340s

$ time for i in $(seq 500); do awk -v seed="$RANDOM" 'BEGIN{srand(seed)} rand() < 1/NR{ out=$0 } END { print out}' file; done > o4

real    0m29.950s
user    0m9.918s
sys     0m2.501s
They both produced very similar distributions (the four numbers are: distinct lines selected, total runs, and the minimum and maximum number of times any one line was selected):

$ awk '{a[$0]++} END { for (i in a) print i, a[i]}' o3 | awk '{sum+=$2; max=(NR>1&&max>$2?max:$2); min=(NR>1&&min<$2?min:$2)} END{print NR, sum, min, max}'
498 500 1 2

$ awk '{a[$0]++} END { for (i in a) print i, a[i]}' o4 | awk '{sum+=$2; max=(NR>1&&max>$2?max:$2); min=(NR>1&&min<$2?min:$2)} END{print NR, sum, min, max}'
490 500 1 3
Final Update:
Turns out it was calling wc that (unexpectedly, to me at least!) was taking most of the time. Here's the improvement when we take it out of the loop:
$ time { max=$(wc -l < file); for i in $(seq 500); do awk -v seed="$RANDOM" -v max="$max" 'BEGIN{srand(seed); n=int(rand()*max)+1} NR==n{print; exit}' file; done; } > o3

real    0m24.556s
user    0m5.044s
sys     0m1.565s
so the solution where we call wc up front and rand() once is, as expected, faster than calling rand() for every line.
On bash, first initialize the seed to the line count cubed (or a seed of your choice):

$ i=; while read a; do let i++; done <<< "$var"; let RANDOM=i*i*i
$ let l=$RANDOM%$i+1; echo -e "$var" | sed -En "$l p"

If you move your data to a file, say varfile:

$ echo -e "$var" > varfile
$ i=; while read a; do let i++; done < varfile; let RANDOM=i*i*i
$ let l=$RANDOM%$i+1; sed -En "$l p" varfile
Put the last command inside a loop if you need several picks, e.g. for ((c=0; c<9; c++)) { ...; } -- a sketch follows below.
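A sketch of that loop, reusing the line count already stored in i:

for ((c = 0; c < 9; c++)); do
  let l=$RANDOM%$i+1     # a fresh pseudo-random line number each pass
  sed -En "$l p" varfile
done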
Using GNU sed and bash; no wc or awk:

f=input-file
sed -n $((RANDOM % $(sed = $f | sed '2~2d' | sed -n '$p') + 1))p $f
Note: the three seds in $(...) are an inefficient way to fake wc -l < $f. Maybe there's a better way -- using only sed, of course.
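One candidate (my own suggestion, not from the answer): sed's = command prints line numbers, so a single invocation with the $ address can report the line count:

f=input-file
sed -n $((RANDOM % $(sed -n '$=' $f) + 1))p $f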
Comments
- Welcome to SO. Stack Overflow is a question and answer page for professional and enthusiastic programmers. Add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself.
- Thanks, yeah, no problem, I will edit.
- Is the number of lines known?
- doing this with bash and sed would be even less efficient, what's wrong with installing coreutils?
- Why do you need to use sed specifically? There are lots of tools available on just about every system that might be a better fit.
- I like the awk solution and I agree, it seems like it should work, but couldn't you just call rand() once in the BEGIN section and use a variable instead of calling it once per line of input? Since srand() by default seeds with the current seconds-since-the-epoch value, it'll produce the same output if you run it twice within 1 second; if you care, you could change that with awk -v seed="$RANDOM" 'BEGIN{srand(seed)...'.
- Subsequent calls to rand() will produce new values... it's only the same if you re-spawn awk. But you're right! We only need to call it once! That'll speed it up quite a bit.
- It's the re-spawning-awk case I was talking about. It's unlikely you'd call it twice in 1 second, but if you did, you'd get the same output (unless you got lucky and crossed a seconds-since-the-epoch change). Personally I usually don't care to handle that, but since the OP seems pretty focused on randomness I thought I'd suggest seed="$RANDOM" in case it matters.
- Tried calling rand() only once in BEGIN, and the output is heavily skewed toward the beginning of the file. Not sure why...
- @EdMorton I did not. Running the scripts for timing now.
- @WilliamPursell - would you mind checking the above and trying the timings to see if you get similar results?
- On my laptop, the 2-run script (the one that runs wc -l) is actually faster. On a single run of each: 12.864s real time vs 24.625s.
- But it occurs to me that we are perhaps missing the modern miracle of huge memory! For this sort of thing these days it's probably easier to just read the whole file into memory and then select a line randomly in END!
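- A minimal sketch of that idea (my own, assuming the whole file fits in memory):

awk -v seed="$RANDOM" '
  { line[NR] = $0 }                                  # slurp every line into memory
  END { srand(seed); print line[int(rand()*NR)+1] }  # one rand() call, uniform over NR lines
' file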