Course Description

Applying machine learning techniques to language has produced impressive gains over the last few decades; tasks like spam detection, entity recognition, and information extraction have become increasingly automated thanks to advances in natural language processing and text analysis. Other language modeling tasks remain challenging. Question answering, automated summarization, and, until fairly recently, machine translation were all considered “unsolved” by the machine learning community.

Recent research advances, including word2vec, long short-term memory networks, and transfer learning, have dramatically improved the potential for automated translation between languages. And yet human contextual understanding is still difficult to fully encode in language data, and machine translations often suffer from both contextual and semantic errors (such as missing sarcasm or incorrectly applying honorifics).

In this course, we will explore the state of the art in open source language modeling tools and investigate their strengths and weaknesses across a range of machine translation tasks.

This course is part of the Data Science and Machine Learning tracks of the Advanced Data Science Certificate.

Course Objectives

Upon successful completion of the course, students will be able to:

  • Work with modern contextual word embeddings, such as BERT and ELMo, in machine translation.

  • Compare and contrast neural architectures for machine translation, such as encoder-decoder networks built from RNNs and LSTMs.

  • Use transfer learning solutions to quickly train initial language translation models.

  • Build an intuition around the types of algorithms and machine learning techniques that are most appropriate for natural language translation.

  • Understand the nuances and sensitivities around human language translation.

  • Identify and assess data sources for use in training machine learning models for translation.

  • Evaluate the effectiveness of language models using statistical and user-evaluation methods.


Enrollment in this course is open to all students, and the course applies credit toward the Data Science and Machine Learning tracks of the Advanced Data Science Certificate.
