Exercises

  • Add a function that finds whole words repeated next to each other. For example, "the the" should be caught, but not "his history".
    • The md_files/sample.md example shown in this project already has one such issue.
  • Improve the spell_check() function to also split entries like with/without. Currently it only splits on whitespace characters.
  • The typos.py program hard-codes the input directories and the output filename. Modify the program to accept these values as CLI arguments. Each argument should also have a default value, so that the program stays easy to run on similarly structured projects.
    • You can also use packages like Gooey to create a GUI from this CLI program.
  • Change the typos.py program so that it works for both plain text and Markdown input files based on filename extensions.
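
As a possible starting point for the first three exercises, here is a minimal sketch rather than a reference solution. The names find_repeated_words(), split_words() and build_parser() are hypothetical and not part of the project. The md_files default is taken from the sample path mentioned above, while the typos.txt output name is only an assumed placeholder.

```python
import argparse
import re

# A whole word, some whitespace, then the same whole word again.
# The trailing \b is what stops "his history" from matching.
REPEATED_WORD = re.compile(r'\b(\w+)\s+\1\b', flags=re.IGNORECASE)


def find_repeated_words(text):
    """Return each word that is immediately followed by itself."""
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]


def split_words(text):
    """Split on whitespace and forward slashes, dropping empty pieces."""
    return [word for word in re.split(r'[\s/]+', text) if word]


def build_parser():
    """CLI with defaults, so running without arguments still works."""
    parser = argparse.ArgumentParser(description='Report possible typos.')
    # Assumed default: the md_files directory used by the sample project.
    parser.add_argument('dirs', nargs='*', default=['md_files'],
                        help='input directories to scan')
    # Assumed placeholder name for the report file.
    parser.add_argument('-o', '--output', default='typos.txt',
                        help='file to write the report to')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args()
    print(args.dirs, args.output)
    print(find_repeated_words('the the should be caught, not his history'))
    print(split_words('with/without'))
```

Because \s also matches newlines, the pattern catches a word repeated across a line break as well. Splitting on / will also break up URLs and file paths, so you may want to filter those out before spell checking.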

Further Reading

  • Spell checkers and related:
    • Wikipedia: Spell checker
    • TextBlob — Spelling correction, splitting text into words and sentences, sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more
    • spylls — Pure Python spell-checker, (almost) full port of Hunspell
    • languagetool — Open Source proofreading software for English and other languages
    • proselint — linter for English prose
  • Python-Markdown — A Python implementation of John Gruber's Markdown with Extension support
  • Python re(gex)? — my ebook on Regular Expressions