Mining dictionary for sed search strings

For fun I was mining the dictionary for words that sed could use to modify strings. Example:

sed settee    <<< better
sed statement <<< dated

Outputs:

beer demented

These sed swords must be at least 5 letters long, and begin with s, then another letter, which can appear only 3 times, with at least one other letter between the first and second instances, and with the third instance as the final letter.
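To see why such words work, it may help to spell out how sed parses them. The sketch below assumes GNU sed; the slash-delimited forms are an expansion added here for illustration and are not part of the original post.

```shell
# "settee" parses as: s, delimiter "e", pattern "tt", empty replacement.
printf 'better\n' | sed settee
printf 'better\n' | sed 's/tt//'       # assumed equivalent, with "/" as delimiter

# "statement" parses as: s, delimiter "t", pattern "a", replacement "emen".
printf 'dated\n' | sed statement
printf 'dated\n' | sed 's/a/emen/'     # assumed equivalent, with "/" as delimiter
```

The second letter of the word acts as the delimiter of the `s` command, which is why it has to appear exactly three times and end the word.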

I used sed to generate a word list, and it seems to work:

d=/usr/share/dict/american-english
sed -n '/^s\([a-z]\)\(.*\1\)\{2\}$/{
            /^s\([a-z]\)\(.*\1\)\{3\}$/!{/^s\([a-z]\)\1/!p}}' $d | 
xargs echo

Output:

sanatoria sanitaria sarcomata savanna secede secrete secretive segregate selective selvedge sentence sentience sentimentalize septette sequence serenade serene serpentine serviceable serviette settee severance severe sewerage sextette stateliest statement stealthiest stoutest straightest straightjacket straitjacket strategist streetlight stretchiest strictest structuralist

But that sed code runs three passes through each line, which seems excessively long and kludgy. How can that code be simplified, while still outputting the same word list?

grep or awk answers would also be OK.


awk to the rescue!

The code is cleaner with awk and reads like the spec: split the word on its second character. Three instances of that character split the word into four segments; the second segment must contain at least one character, and the last one must be empty.

$  awk '/^s/{n=split($1,a,substr($1,2,1)); 
             if(n==4 && length(a[2])>0 && a[4]=="") print}' /usr/share/dict/american-english | xargs

sanatoria sanitaria sarcomata savanna secede secrete secretive segregate selective selvedge sentence sentience sentimentalize septette sequence serenade serene serpentine serviceable serviette settee severance severe sewerage sextette stateliest statement stealthiest stoutest straightest straightjacket straitjacket strategist streetlight stretchiest strictest structuralist
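To make the split-based test concrete, here is a small sketch that prints the segments awk produces for a few words. The helper name `show_segments` and the sample words are made up for illustration; the `split` call is the same one used in the answer.

```shell
# Hypothetical helper: print the segments awk sees when a word is split
# on its second character.
show_segments() {
  awk '{ n = split($1, a, substr($1, 2, 1))
         printf "%s: n=%d", $1, n
         for (i = 1; i <= n; i++) printf " [%s]", a[i]
         print "" }'
}

printf '%s\n' settee statement secedes | show_segments
```

For `settee` and `statement` the fourth segment is empty, so they pass the check; for `secedes` the fourth segment is `s`, so it is rejected.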


Very cool idea. I think you're more restrictive than necessary.

sed -nE '/^s(.)[^\1]+\1[^\1]*\1g?$/p'

This seems to work fine. It generated 518 words for me, though I only have the /usr/share/dict/words dictionary file.

sabadilla sabakha sabana sabbatia sabdariffa sacatra saccharilla saccharogalactorrhea saccharorrhea saccharosuria saccharuria sacralgia sacraria sacrcraria sacrocoxalgia sadhaka sadhana sahara saintpaulia salaceta salada salagrama salamandra saltarella salutatoria ... stuntist subbureau sucuriu sucuruju sulphurou surucucu syenite-porphyry symphyseotomy symphysiotomy symphysotomy symphysy symphytically syndactyly synonymity synonymously synonymy syzygetically syzygy

An interesting find is:

$ sed snow-nodding <<< now-or-never
noddior-never
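One caveat that may be worth checking: inside a bracket expression `\1` is not a backreference, so `[^\1]` matches any character other than a backslash or the digit 1. The sketch below uses a small hand-picked word list instead of a dictionary file, and the helper name `loose_match` is hypothetical. It suggests the pattern also accepts words where the delimiter letter occurs more than three times, which may account for part of the larger count.

```shell
# Hypothetical helper wrapping the pattern exactly as posted above.
loose_match() { sed -nE '/^s(.)[^\1]+\1[^\1]*\1g?$/p'; }

# "salamandra" contains four a's, yet the pattern lets it through;
# "sentences" does not end in its second letter and is rejected.
printf '%s\n' salamandra settee snow-nodding sentences | loose_match
```

The trailing `g?` is a deliberate loosening: it admits words such as `snow-nodding`, where the final `g` acts as the global flag of the `s` command.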


A speedy pcregrep method (0.025 seconds user time):

d=/usr/share/dict/american-english
pcregrep '^s(.)((?!\1).)+\1((?!\1).)*\1$' $d | xargs echo

Output:

sanatoria sanitaria sarcomata savanna secede secrete secretive segregate selective selvedge sentence sentience sentimentalize septette sequence serenade serene serpentine serviceable serviette settee severance severe sewerage sextette stateliest statement stealthiest stoutest straightest straightjacket straitjacket strategist streetlight stretchiest strictest structuralist


Code inspired by: Regex: Match everything except backreference
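The construct `((?!\1).)` reads as "one character that is not the captured letter", which is how the pattern enforces exactly three occurrences. As a sketch for systems without pcregrep, the same pattern can be tried with GNU grep, assuming it was built with PCRE support (`-P`). The helper name `pcre_filter` and the sample words are made up for illustration.

```shell
# Hypothetical helper; assumes GNU grep with the -P (PCRE) option available.
pcre_filter() { grep -P '^s(.)((?!\1).)+\1((?!\1).)*\1$'; }

# Only words whose second letter occurs exactly three times, is followed by
# at least one other character, and ends the word should pass.
printf '%s\n' settee statement saaaa secedes salamandra | pcre_filter
```

Here `saaaa` fails because nothing separates the first two a's, `secedes` fails because it does not end in `e`, and `salamandra` fails because it has a fourth `a`.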
