Finding typos

In this project, you'll learn how to compare words against a dictionary to find potential typos. Two input formats are covered: plain text and Markdown.

Project summary

  • Save dictionary words as a set data type for fast comparison
  • Split input text and compare words against the dictionary set
  • Scrub punctuation characters from input words and ignore case to reduce false mismatches
  • Extract words from a Markdown file after removing code blocks, inline code and hyperlinks
  • Handle multiple word files and recursively process all Markdown files from a given path
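The first four steps above can be sketched roughly as follows. This is only a minimal illustration, not the implementation developed in the project: the function names `load_words`, `strip_markdown` and `find_typos` are hypothetical, the regular expressions cover only simple fenced blocks, inline code spans and `[text](url)` links, and keeping the link text while dropping the URL is an assumption about what "removing hyperlinks" means.

```python
import re
import string


def load_words(lines):
    """Build a lowercase set of dictionary words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def strip_markdown(text):
    """Drop fenced code blocks and inline code; keep link text but not the URL.

    Assumption: fences start at the beginning of a line and links use the
    simple [text](url) form. Real Markdown has many more cases.
    """
    text = re.sub(r'^```.*?^```', '', text, flags=re.M | re.S)
    text = re.sub(r'`[^`\n]+`', '', text)
    text = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', text)
    return text


def find_typos(text, words):
    """Return words not found in the dictionary set, in first-seen order."""
    mismatches = []
    for token in text.split():
        # Scrub surrounding punctuation and ignore case before comparing.
        word = token.strip(string.punctuation).lower()
        if word and word not in words and word not in mismatches:
            mismatches.append(word)
    return mismatches


if __name__ == '__main__':
    # A tiny stand-in dictionary; a real run would read a word-list file.
    words = load_words(['this', 'has', 'a', 'task', 'intro', 'use', 'the', 'docs'])
    md = ('Intro\n```python\nfoo = 1\n```\n'
          'This has a tast. Use `xyz()` the [docs](https://example.com/qqq)!\n')
    print(find_typos(strip_markdown(md), words))
```

Membership tests on a set take roughly constant time on average, which is why the dictionary is stored as a set rather than a list. For plain text input, `find_typos` can be called directly without `strip_markdown`.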

The following modules and concepts will be utilized in this project:

Real world influence

I started this project to help myself as a beta/gamma reader for fantasy books from the Mage Errant and The Legends of the First Empire series.

While the number of false mismatches ran into the hundreds, the time spent crawling through them was well worth it. I found repeated words, hard-to-spot typos in character names, etc. Creating reference files with series-specific names and words helped reduce the mismatches for sequels.

I used the project for the Markdown files of this ebook too, and found typos like entried, accomodated, tast and reponsible.