Best way to map words with multiple spellings to a list of key words?

word mapping examples
word map for vocabulary
semantic mapping
word map template
word map generator
exercise 2 using words correctly answers
semantics activities for english language learners
how to teach spelling in a fun way

I have a pile of ngrams of variable spelling, and I want to map each ngram to it's best match word out of a list of known desired outputs.

For example, ['mob', 'MOB', 'mobi', 'MOBIL', 'Mobile] maps to a desired output of 'mobile'.

Each input from ['desk', 'Desk+Tab', 'Tab+Desk', 'Desktop', 'dsk'] maps to a desired output of 'desktop'

I have about 30 of these 'output' words, and a pile of about a few million ngrams (much fewer unique).

My current best idea was to get all unique ngrams, copy and paste that into Excel and manually build a mapping table, took too long and isn't extensible. Second idea was something with fuzzy (fuzzy-wuzzy) matching but it didn't match well.

I'm not experienced in Natural Language terminology or libraries at all so I can't find an answer to how this might be done better, faster and more extensibly when the number of unique ngrams increases or 'output' words change.

Any advice?

The classical approach would be, to build a "Feature Matrix" for each ngram. Each word maps to an Output which is a categorical value between 0 and 29 (one for each class)

Features can for example be the cosine similarity given by fuzzy wuzzy but typically you need many more. Then you train a classification model based on the created features. This model can typically be anything, a neural network, a boosted tree, etc.

Phonics from A to Z: A Practical Guide, Children practice many of the skills important to reading when they write (Clay, writing and receive instruction in mapping sounds to spellings, the better they can They dictated a list of words for children to spell, then gave each word a score (name) 3 points: The word name is spelled in a way that captures the entire  Word list activities: Geography - Map Skills. Learn about the words: Geography - Map Skills using Look, Say, Cover, Write, Check, spelling games, spelling tests and printable activities.

Since you only have about 30 classes you could just determine a distance metric, like say Levenshtein distance in the case of single words, and assign each ngram the class to which it is nearest.

That way no need to store a whole lotta ngrams.

(If the ngram is the whole array, maybe average the distance across each element in the array).

Connecting Word Meanings Through Semantic Mapping, With direct instruction and repeated practice, struggling students will find that using semantic maps is a very good way of expanding their vocabulary. A spelling test is an evaluation of a student’s ability to spell words correctly. Spelling tests are usually given in school during language arts classes, to see how well students have learned the most recent spelling lessons.

The idea is to use a prefix tree to build a dictionary that maps word from your list to its biggest unique super-string. once we build this, for words whose superstring are same as the word itself, we try to do fuzzy match of the closest words from the list and return its superstring. so "dsk" would find "des" or "desk" to be the closest and we extract its superstring.

import org.apache.commons.collections4.Trie;
import org.apache.commons.collections4.trie.PatriciaTrie;

import java.util.*;
import java.util.SortedMap;

public class Test {

    static Trie trie = new PatriciaTrie<>();

    public static int cost(char a, char b) {
        return a == b ? 0 : 1;
    }

    public static int min(int... numbers) {
        return Arrays.stream(numbers).min().orElse(Integer.MAX_VALUE);
    }

    // this function taken from https://www.baeldung.com/java-levenshtein-distance
    static int editDistance(String x, String y) {
        int[][] dp = new int[x.length() + 1][y.length() + 1];

        for (int i = 0; i <= x.length(); i++) {
            for (int j = 0; j <= y.length(); j++) {
                if (i == 0) {
                    dp[i][j] = j;
                } else if (j == 0) {
                    dp[i][j] = i;
                } else {
                    dp[i][j] = min(dp[i - 1][j - 1] + cost(x.charAt(i - 1), y.charAt(j - 1)), dp[i - 1][j] + 1,
                            dp[i][j - 1] + 1);
                }
            }
        }

        return dp[x.length()][y.length()];
    }

    /*
     * custom dictionary that map word to its biggest super string.
     *  mob -> mobile,  mobi -> mobile,  desk -> desktop
     */
    static void initMyDictionary(List<String> myList) {

        for (String word : myList) {
            trie.put(word.toLowerCase(), "0"); // putting 0 as default
        }

        for (String word : myList) {

            SortedMap<String, String> prefixMap = trie.prefixMap(word);

            String bigSuperString = "";

            for (Map.Entry<String, String> m : prefixMap.entrySet()) {
                int max = 0;
                if (m.getKey().length() > max) {
                    max = m.getKey().length();
                    bigSuperString = m.getKey();
                }
                // System.out.println(bigString + " big");
            }

            for (Map.Entry<String, String> m : prefixMap.entrySet()) {
                m.setValue(bigSuperString);
                // System.out.println(m.getKey() + " - " + m.getValue());
            }
        }

    }

    /*
     * find closest words for a given String.
     */
    static List<String> findClosest(String q, List<String> myList) {

        List<String> res = new ArrayList();
        for (String w : myList) {
            if (editDistance(q, w) == 1) // just one char apart edit distance
                res.add(w);
        }
        return res;

    }

    public static void main(String[] args) {

        List<String> myList = new ArrayList<>(
                Arrays.asList("mob", "MOB", "mobi", "mobil", "mobile", "desk", "desktop", "dsk"));

        initMyDictionary(myList); // build my custom dictionary using prefix tree

        // String query = "mob"
        // String query = "mobile";
        // String query = "des";
        String query = "dsk";

        // if the word and its superstring are the same, then we try to find the closest
        // words from list and lookup the superstring in the dictionary.
        if (query.equals(trie.get(query.toLowerCase()))) {
            for (String w : findClosest(query, myList)) { // try to resolve the ambiguity here if there are multiple closest words
                System.out.println(query + " -fuzzy maps to-> " + trie.get(w));
            }

        } else {
            System.out.println(query + " -maps to-> " + trie.get(query));
        }

    }

}

Word Maps | Classroom Strategies, A word map is a visual organizer that promotes vocabulary development. Using a graphic organizer, students think about terms or concepts in several ways. Enhancing students' vocabulary is important to developing their reading. lessons, and activities designed to help young children learn how to read and read better. I have a dataset (survey) and a column of birth_country, where people have written their country of birth. An example of it: 1 america 2 usa 3 american 4 us of a 5 united states

Curriculum Focus, Display a large map of the Arctic Circle and explain to children how, for many centuries, the North West Passage-a route linking the Atlantic and Pacific Oceans at the top of the world. Again, it may be helpful to provide some key words or assist children with accurate spelling especially concerning the names of places. Students must choose the spelling word. This deck is self-correcting and will alert students of wrong answers. The deck is randomized so students can play multiple times. Each word has sound! ️No Prep ️No Printing ️Self-Correcting ️Just Click & Assign. This product is 1 of 5 activities I use each week with each spelling pattern:

How to teach spelling words so they stick, Do you know how to teach spelling words in a creative and effective way that helps As spelling involves sound-letter mapping, some words can be spelled by ear. Many early spelling words come from the Dolch list, a selection of terms that make up Nonetheless, learning good spelling habits from the start is important. Introduction: Technology has connected us like never before and the world has been converted to a global village where a person can interact with anyone from one corner of the world to another. If you do not believe us, then just have a look at these facts yourself.

Chapter 2. Working with Words: Which Word Is Right? – Writing for , Familiarize yourself with the following list of commonly confused words. Recognizing I studied maps of the city to know where to rent a new apartment. The best way to master new words is to understand the key spelling rules. Keep in  5th Grade Spelling List E-5 The 5th grade spelling words presented on this page all have r-controlled vowels. List: ferment, guitar, hangar, scorch, explorer, churn, pier, thirsty, garden, computer, charge, amaze, evict, charcoal, quarter, consumer, error, scarce, porch, orchard, nerve, concrete, heartfelt, spider, and clerk.

Comments
  • To clarify, is each ngram an array like ` ['mob', 'MOB', 'mobi', 'MOBIL', 'Mobile']` or just the word 'Mobile'? Also, are you looking to save on memory or gain in speed?