In this project, you'll learn how to compare words against a dictionary to find potential typos. Two types of input format will be discussed — plain text and Markdown.
- Save dictionary words as a `set` data type for fast comparison
- Split input text and compare words against the dictionary set
- Scrub punctuation characters from input words and ignore case to reduce false mismatches
- Extract words from a Markdown file after removing code blocks, inline code and hyperlinks
- Handle multiple word files and recursively process all Markdown files from a given path
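As a rough preview of how these steps might fit together, here is a minimal sketch. The function names (`load_words`, `strip_markdown`, `find_mismatches`, `check_path`) are hypothetical and chosen only for illustration. The regular expressions are deliberately simplified, and keeping the link text while dropping the URL is an assumption about how hyperlinks should be handled.

```python
import re
import string
from pathlib import Path

def load_words(lines):
    """Build a lowercase set of dictionary words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

def strip_markdown(text):
    """Simplified cleanup: drop fenced code blocks and inline code,
    and keep only the visible text of [text](url) hyperlinks (assumption)."""
    text = re.sub(r'```.*?```', '', text, flags=re.S)
    text = re.sub(r'`[^`\n]*`', '', text)
    text = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', text)
    return text

def find_mismatches(text, words):
    """Return words missing from the set, in order of first appearance."""
    mismatches = []
    for token in text.split():
        # scrub surrounding punctuation and ignore case
        word = token.strip(string.punctuation).lower()
        if word and word not in words and word not in mismatches:
            mismatches.append(word)
    return mismatches

def check_path(root, words):
    """Recursively check every .md file under root (hypothetical helper)."""
    results = {}
    for md_file in sorted(Path(root).rglob('*.md')):
        text = strip_markdown(md_file.read_text(encoding='utf-8'))
        results[str(md_file)] = find_mismatches(text, words)
    return results

# a tiny in-memory word list stands in for real dictionary files
words = load_words(['hello', 'world', 'this', 'is', 'a', 'test'])
print(find_mismatches('Hello, world! This is a tset.', words))  # ['tset']
```

Membership tests on a `set` take roughly constant time on average, which is why the dictionary words are stored that way instead of in a list.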
The following modules and concepts will be utilized in this project:
While the number of false mismatches ran into hundreds of entries, the time spent crawling through them was well worth it. I found repeated words, hard-to-spot typos in character names, and so on. Creating reference files with series-specific names and words helped reduce the mismatches for sequels.
I used the project for the Markdown files of this ebook too. Found typos like