Running word2vec in a remote Jupyter notebook

Recent years have produced many promising data sets and algorithms for machine learning. New techniques like deep learning require significant computational power, often more than you have on your desk. These bleeding-edge tools also tend to have very specific dependencies, so maintaining a working development environment for them can take significant effort.

Full-Text Search within Closed Captions

YouTube automatically generates closed captions for videos. crawls these, and lets you search for a phrase within a video and start playback where the phrase occurs.

Machine-generated captions include timestamps, but they also contain many transcription errors. If we can obtain both the captions and a corrected transcript for a speech, we can align the two using the words that do match. Where they differ, we can replace the caption text with the corrected wording from the transcript while keeping the caption timestamps.
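The alignment described above can be sketched with Python's standard-library difflib.SequenceMatcher, which finds matching runs between two sequences and reports the spans that differ. This is a minimal sketch, not the author's implementation: it assumes the captions have already been parsed into (word, start_seconds) pairs and the transcript split into words, and the hypothetical align function gives every word in a corrected span a single borrowed timestamp rather than interpolating between neighbors.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A caption word paired with its start time in seconds. This format is an
# assumption: real caption files must be parsed into this shape first.
Caption = Tuple[str, float]


def normalize(word: str) -> str:
    """Lowercase and strip punctuation so 'Hello,' matches 'hello'."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


def align(captions: List[Caption], transcript: List[str]) -> List[Caption]:
    """Return transcript words, each with a timestamp borrowed from the captions."""
    cap_keys = [normalize(w) for w, _ in captions]
    tr_keys = [normalize(w) for w in transcript]
    # autojunk=False: keep frequent words like "the" usable as anchors.
    matcher = SequenceMatcher(None, cap_keys, tr_keys, autojunk=False)

    result: List[Caption] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching words: keep the caption timestamp, use transcript spelling.
            for k in range(i2 - i1):
                result.append((transcript[j1 + k], captions[i1 + k][1]))
        elif tag in ("replace", "insert"):
            # Differing span: take the corrected wording from the transcript.
            # Assumption: the whole span reuses one timestamp, taken from the
            # first replaced caption word, or from the previous aligned word
            # (0.0 at the very start) when the transcript inserts new words.
            if i1 < i2:
                t = captions[i1][1]
            elif result:
                t = result[-1][1]
            else:
                t = 0.0
            for j in range(j1, j2):
                result.append((transcript[j], t))
        # tag == "delete": caption words absent from the transcript are dropped.
    return result


captions = [("the", 0.0), ("quick", 0.4), ("brown", 0.8), ("fax", 1.1), ("jumps", 1.5)]
transcript = ["The", "quick", "brown", "fox", "jumps."]
print(align(captions, transcript))
```

Here the misheard "fax" is replaced by "fox" and keeps its 1.1-second start time, so a search for "fox" could still jump to the right point in the video. Passing autojunk=False matters for real speeches: with sequences of 200 or more items, SequenceMatcher's default heuristic treats very frequent elements as junk, which would stop common words from serving as alignment anchors.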