Corpus of 21st Century Scots Texts - Levenshtein

A Corpus of 21st Century Scots Texts


Levenshtein Distance


Similar words to "hindi" in the Corpus

Levenshtein

The Levenshtein distance is the number of characters you have to replace, insert or delete to transform one word into another. It's useful for detecting typos and alternative spellings.
Time to execute Levenshtein function - 0.609313 milliseconds
Matches, as word (distance) - frequency:
hindi (0) - 2 freq, hindu (1) - 8 freq, hinds (1) - 3 freq, hind (1) - 14 freq, windi (1) - 2 freq, hindit (1) - 1 freq, kinda (2) - 175 freq, bindin (2) - 6 freq, hide (2) - 189 freq, midi (2) - 1 freq, mindo (2) - 1 freq, rinds (2) - 1 freq, hund (2) - 1 freq, handie (2) - 3 freq, dinda (2) - 1 freq, inti (2) - 218 freq, binds (2) - 7 freq, handis (2) - 1 freq, bind (2) - 20 freq, mindin (2) - 156 freq, hands (2) - 175 freq, mind (2) - 2334 freq, hende (2) - 1 freq, windy (2) - 36 freq, finds (2) - 39 freq

Double Levenshtein

In a stroke of genius, this runs the Levenshtein function twice, once on the whole words and once with the vowels removed, and adds the two distances together, giving double weight to consonants.
Time to execute Double Levenshtein function - 1.290787 milliseconds
Matches, as word (distance) - frequency:
hindi (0) - 2 freq, hind (1) - 14 freq, hindu (1) - 8 freq, hnd (2) - 3 freq, hende (2) - 1 freq, hynd (2) - 1 freq, hand (2) - 319 freq, ahind (2) - 11 freq, hunda (2) - 1 freq, hond (2) - 3 freq, handie (2) - 3 freq, hunde (2) - 1 freq, handy (2) - 56 freq, hindit (2) - 1 freq, windi (2) - 2 freq, hinds (2) - 3 freq, hund (2) - 1 freq, sandi (3) - 5 freq, linda (3) - 53 freq, hink (3) - 436 freq, eind (3) - 1 freq, haand (3) - 104 freq, bindie (3) - 1 freq, hanoi (3) - 1 freq, inde (3) - 4 freq

SoundEx

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling.
Time to execute SoundEx function - 0.114351 milliseconds
SoundEx code - H530
Matches, as word - frequency:
haund - 384 freq, hummed - 10 freq, handy - 56 freq, haundie - 1 freq, haunt - 28 freq, hin't - 1 freq, haunit - 2 freq, hint - 78 freq, hoond - 2 freq, hand - 319 freq, hant - 6 freq, hunt - 62 freq, honed - 5 freq, him-it - 1 freq, hent - 6 freq, haundy - 18 freq, hannit - 20 freq, hantie - 4 freq, hamewith - 9 freq, hained - 27 freq, honey-dew - 1 freq, haunmaid - 1 freq, handee - 4 freq, handie - 3 freq, hind - 14 freq, haun't - 4 freq, hemmed - 3 freq, 'haund - 3 freq, hem't - 1 freq, hannet - 2 freq, hummit - 1 freq, haand - 104 freq, hindu - 8 freq, him-hit - 1 freq, honeyed - 1 freq, hunda - 1 freq, haun-med - 1 freq, hainit - 6 freq, him-id - 1 freq, houmit - 1 freq, hunde - 1 freq, hinnied - 1 freq, hamada - 1 freq, haint - 3 freq, haunnit - 2 freq, hende - 1 freq, heymouthe - 1 freq, hynd - 1 freq, hunty - 1 freq, hound - 11 freq, hand - 1 freq, hnd - 3 freq, hindi - 2 freq, hond - 3 freq, henwudie - 1 freq, henwuddie - 3 freq, haun-made - 1 freq, hammett - 1 freq, heymooth - 1 freq, huntie - 1 freq, heynd - 1 freq, hmt - 1 freq, hand - 1 freq, hnuty - 1 freq, hmdt - 1 freq, honeat - 1 freq, hund - 1 freq, hnid - 1 freq, hunt' - 1 freq, handw - 1 freq

MetaPhone

Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English spelling and pronunciation to produce a more accurate encoding, which does a better job of matching words and names that sound similar.
Time to execute MetaPhone function - 0.287568 milliseconds
MetaPhone code - HNT
Matches, as word - frequency:
haund - 384 freq, handy - 56 freq, haundie - 1 freq, haunt - 28 freq, hin't - 1 freq, haunit - 2 freq, hint - 78 freq, hoond - 2 freq, hand - 319 freq, hant - 6 freq, hunt - 62 freq, honed - 5 freq, hent - 6 freq, haundy - 18 freq, hannit - 20 freq, hantie - 4 freq, hained - 27 freq, honey-dew - 1 freq, handee - 4 freq, handie - 3 freq, hind - 14 freq, haun't - 4 freq, 'haund - 3 freq, hannet - 2 freq, haand - 104 freq, hindu - 8 freq, hunda - 1 freq, hainit - 6 freq, hunde - 1 freq, hinnied - 1 freq, haint - 3 freq, haunnit - 2 freq, hende - 1 freq, hunty - 1 freq, hound - 11 freq, hand - 1 freq, hindi - 2 freq, hond - 3 freq, huntie - 1 freq, heynd - 1 freq, hand - 1 freq, honeat - 1 freq, hund - 1 freq, hunt' - 1 freq, handw - 1 freq

Manually curated

Manual curation uses a hand-built lookup table / lexicon that links words to their lemmas and includes obvious typos and spelling variations. Not all words are covered.
Time to execute Manually curated function - 0.000879 milliseconds
HINDI
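The Levenshtein, Double Levenshtein and SoundEx descriptions above can be sketched in a few lines of Python. This is only an illustration, not the code that runs on this site: the function names are made up for the example, counting "y" as a vowel is an assumption (it happens to reproduce the distances listed for hynd and handy), and the Soundex shown is the common American variant, which may differ in small details from the one used here.

```python
# Illustrative sketch only; names and details are assumptions, not the site's code.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions or deletions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


def strip_vowels(word: str, vowels: str = "aeiouy") -> str:
    # Assumption: "y" is treated as a vowel.
    return "".join(c for c in word if c.lower() not in vowels)


def double_levenshtein(a: str, b: str, vowels: str = "aeiouy") -> int:
    """Distance on the full words plus distance on the consonant skeletons."""
    return levenshtein(a, b) + levenshtein(
        strip_vowels(a, vowels), strip_vowels(b, vowels)
    )


# Letter groups of the common American Soundex variant.
_SOUNDEX_DIGITS = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word: str) -> str:
    """First letter plus up to three digits, padded with zeros."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_DIGITS.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


for word in ["hindu", "hynd", "sandi"]:
    print(word, levenshtein("hindi", word), double_levenshtein("hindi", word))
# hindu 1 1
# hynd 2 2
# sandi 2 3
print(soundex("hindi"))  # H530
```

Note how sandi is 2 edits from hindi on plain Levenshtein but 3 on the doubled version, because the changed consonant (h to s) is counted again once the vowels are stripped.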
