
Course Description

Every day, 500 million tweets, 6,300 scientific publications, and 800 Wikipedia articles are published.
The ubiquity of language corpora represents one of the most fundamental opportunities in
machine learning: the ability to connect computing systems more directly to human interaction.
In this course, you will learn how to transform, train, and apply models using unstructured
natural language data.


By the end of this course, you should be able to answer questions like: How does the
Washington Post recommend related articles? How does Facebook censor hate speech? How
does Alexa know what you mean?

The target audience is industry-minded data scientists who seek to derive business insights
from natural language data. This course will be a good fit for students who have successfully
completed the Data Science Certificate, and, more generally, data science professionals who
are looking to study advanced topics as part of their continuing professional education.

Course Outline

The course is divided into four 3-hour sessions, each consisting of 1.5 hours of
instruction, a half-hour demonstration, a half-hour workshop, and two 15-minute breaks.


1. Session 1: Introduction to NLP  


a. Unstructured data and machine learning
b. Text corpora and organization: NLTK
c. Preprocessing: segmentation, tokenization, parts of speech
d. Named entity recognition


2. Session 2: Document Classification 


a. Vectorization: one-hot encoding, bag-of-words, TF-IDF
b. Document Classification: Naive Bayes
c. Advanced Classifiers: Logistic Regression (maximum entropy), Gradient Boosting
d. Adding context: n-grams, kNNs and distance metrics


3. Session 3: Document Similarity  


a. RNNs/LSTMs
b. Word embeddings and word similarity (word2vec)
c. Document and sentence similarity (doc2vec)
d. Document clustering and semantic similarity search
e. LDA Topic Modeling


4. Session 4: Chatbot Foundations  


a. Underpinnings in document similarity
b. Utterances, intents, and slots
c. Chat framework and simple implementation

Course Objectives

Upon successful completion of this course, students will know how to:
• Prepare and preprocess text data
• Build classification models on free text documents
• Generate numerical “embedded” representations of text data
• Leverage embeddings for document and word similarity search
• Combine natural language processing methods to create a chatbot