I need to detect whether a string contains any characters that do not exist in English or French. The string comes from an HTML input field.

I have found a few examples of how to normalize the string, but this is NOT what I need to do (French accents should not be normalized).

So far, I find myself doing this:

if (str.includes(invalidAccents)) {
    // do something
} else {
    // do something else
}

with a list I put together stored in an array:

var invalidAccents = [
    'á', 'ã', 'ä', 'å', 'ą', 'æ',
    'ĉ', 'ć',
    'í', 'ì', 'ł',
    'ñ', 'ń',
    'ó', 'ò', 'õ', 'ö', 'ø', 'œ',
    'ŝ', 'ś',
    'ú', 'ŭ',
    'ÿ', 'ý',
    'ž', 'ź', 'ż',
    'Á', 'Ã', 'Ä', 'Å', 'Ą', 'Æ',
    'Ĉ', 'Ć',
    'Í', 'Ì', 'Ł',
    'Ñ', 'Ń',
    'Ó', 'Ò', 'Õ', 'Ö', 'Ø', 'Œ',
    'Ŝ', 'Ś',
    'Ú', 'Ŭ',
    'Ÿ', 'Ý',
    'Ž', 'Ź', 'Ż'
];

but this is far from efficient and far from exhaustive.

Does anyone have an alternative solution or at least a place where I can find a complete list of accents to complete what I've got going?

Well, the short answer is: besides using lists, you can use Unicode ranges, but that approach requires iterating over the string and checking each character separately.

See charCodeAt and related string methods.

If you look at a Unicode table, you can see that codes 192-214, 217-221, 224-229, etc. correspond to letters with accents (I'd recommend checking it yourself; I'm not sure whether 'ß' counts as an accented letter).
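To check the table for yourself, here is a minimal sketch that prints the characters behind each of the three ranges quoted above. The helper name charsInRange is made up for illustration.

```javascript
// The three code ranges quoted above, as inclusive [start, end] pairs.
const ranges = [[192, 214], [217, 221], [224, 229]];

// Hypothetical helper: build a string of every character in one range.
function charsInRange([start, end]) {
  let out = '';
  for (let code = start; code <= end; code++) {
    out += String.fromCharCode(code);
  }
  return out;
}

for (const range of ranges) {
  console.log(`${range[0]}-${range[1]}: ${charsInRange(range)}`);
}
```

Note that these ranges also contain letters used in French: 'É', for example, is code 201. They cannot be used unchanged to reject only non-French accents.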

A loop that checks those ranges can look like this:

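function containsAccented(str) {
    // Inclusive [start, end] code ranges; extend the list to suit your needs.
    const ranges = [[192, 214], [217, 221] /* , .... */];
    for (const c of str) {
        const code = c.charCodeAt(0);
        for (const range of ranges) {
            if (code >= range[0] && code <= range[1]) {
                return true;
            }
        }
    }
    // No character fell inside any range.
    return false;
}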

If you want to automate creating the ranges themselves, you first need a proper definition of "not found in English or French", and then have to find out whether some service describes those characters; I'm not sure any does.

Create a list of the valid ones instead; that list is known and short.

You can take inspiration from the ISO-8859-15 charset.
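As a rough sketch of the allowlist idea, the check below accepts a string only if every character is an unaccented Latin letter, one of a short set of French accented letters, whitespace, an apostrophe or a hyphen. The letter set and the names FRENCH_ACCENTED and hasOnlyAllowedChars are assumptions made for illustration, not an authoritative list. The question's own array, for instance, treats 'æ', 'œ' and 'ÿ' as invalid, so remove them if that is what you need.

```javascript
// Assumption: an illustrative set of French accented letters; adjust to your needs.
const FRENCH_ACCENTED = 'àâæçéèêëîïôœùûüÿ';

// Letters, the accented set in both cases, whitespace, apostrophe and hyphen.
const allowed = new RegExp(
  `^[A-Za-z${FRENCH_ACCENTED}${FRENCH_ACCENTED.toUpperCase()}\\s'-]*$`
);

function hasOnlyAllowedChars(str) {
  // NFC composes "e" + combining accent into a single "é"; it does not strip accents.
  return allowed.test(str.normalize('NFC'));
}

console.log(hasOnlyAllowedChars('hello'));
console.log(hasOnlyAllowedChars('héllo'));
console.log(hasOnlyAllowedChars('héllö'));
```

A single regex test replaces the loop over an array, and anything not explicitly listed (including digits and punctuation, under these assumptions) is rejected.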

  • It would be a lot simpler to check the string only contains allowed characters rather than check for illegal characters. There are only a limited number of accents allowed in English/French, but an enormous amount of characters in Unicode.
  • Fair point. Still, that's over 70 characters in an array. Is there not a more efficient method?
  • Just use a regex ^[A-Za-z]+$
  • @QuentinVeron this would return false for French accents. This is not acceptable.
  • console.log(containsAccented('hello')); returns undefined <- should be false
  • console.log(containsAccented('héllo')); returns true <- should be false
  • console.log(containsAccented('héllö')); returns true <- should be true
  • @Sweepster, like I said, you have to adjust the ranges yourself according to your needs. Presumably you should also invert the logic, as JacquesB has suggested.
  • I did set up the ranges as needed. Even if I invert the logic, it doesn't explain why I'm getting undefined.
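
On the undefined result: a JavaScript function that finishes without reaching a return statement returns undefined. One possible cause (an assumption, since the adjusted code is not shown) is that return false ended up somewhere it is never reached. The sketch below uses explicit braces and skips French letters before testing the ranges. The ranges and the frenchAllowed set are hypothetical choices for illustration, covering only Latin-1 letters and Latin Extended-A.

```javascript
// Hypothetical ranges: Latin-1 letters (skipping × at 215 and ÷ at 247)
// plus Latin Extended-A (256-383).
const accentedRanges = [[192, 214], [216, 246], [248, 255], [256, 383]];

// Assumption: the accented letters accepted as French; adjust as needed.
const frenchAllowed = new Set('àâçéèêëîïôùûüÀÂÇÉÈÊËÎÏÔÙÛÜ');

function containsAccented(str) {
  for (const c of str) {
    if (frenchAllowed.has(c)) {
      continue; // accepted French letter, nothing to report
    }
    const code = c.codePointAt(0);
    for (const [low, high] of accentedRanges) {
      if (code >= low && code <= high) {
        return true;
      }
    }
  }
  // Reached only when no character matched; without this line the result is undefined.
  return false;
}

console.log(containsAccented('hello')); // false
console.log(containsAccented('héllo')); // false
console.log(containsAccented('héllö')); // true
```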