Comparing “lecture search” and YouTube

From a discussion on Hacker News:

YouTube already generates automatic transcripts for videos, and it’s not too hard to link those to a topic hierarchy (maybe they already do this). It does seem like a hard problem at their scale, though, since unlike Spotify, the list of genres isn’t knowable in advance.
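As a rough illustration of linking transcripts to a topic hierarchy, here is a minimal sketch that scores each topic path by how many of its keywords appear in a transcript. The taxonomy, the keywords, the `min_hits` threshold, and the `tag_transcript` function are all hypothetical. A real system would use a trained classifier or a tagging service rather than plain substring matching.

```python
# Hypothetical taxonomy: each key is a full topic path (root -> leaf),
# each value is a set of lowercase keywords that suggest that topic.
TAXONOMY = {
    ("Home repair", "Plumbing", "Sinks"): {"sink", "faucet", "drain", "p-trap"},
    ("Home repair", "Electrical"): {"outlet", "breaker", "wiring"},
    ("Games", "Strategy", "Age of Empires"): {"age of empires", "villager", "trebuchet"},
}


def tag_transcript(transcript, taxonomy=TAXONOMY, min_hits=2):
    """Return topic paths whose keywords appear at least `min_hits` times.

    Matching is naive substring search on the lowercased transcript, so
    "sink" also matches "sinking"; this is a sketch, not a real tagger.
    """
    text = transcript.lower()
    scores = {}
    for path, keywords in taxonomy.items():
        hits = sum(1 for kw in keywords if kw in text)
        if hits >= min_hits:
            scores[path] = hits
    # Best-scoring paths first; ties broken alphabetically by path.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    print(tag_transcript("Today we fix a leaky faucet and clear the sink drain"))
```

Storing each topic as a full path means a browsing UI could list a video under every prefix of its path, so a video tagged “Home repair / Plumbing / Sinks” also shows up when browsing “Home repair”.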

I’ve been building a search engine for lectures as a research project. For a small collection of videos, I find that browsing a topic taxonomy is much nicer than using recommenders that try to guess your intent.

There are commercial systems for automatically tagging text (e.g. Watson), but their hierarchies don’t go into niche areas; the Watson taxonomy tagger, for example, has 1,000 tags.

For more niche topics, I’ve explored Watson’s entity recognition system, e.g. to recognize the names of diseases. The advantage is that it picks up terms it hasn’t seen before; the problem is that you can only identify the kinds of entities that someone has trained a system to recognize.

The UI challenges are interesting as well. If Spotify identified 100 genres that interested me, it could pick any arbitrary subset of playlists and I’d be pretty happy. If I used YouTube to find home repair videos and it then showed me videos about repairing parts of my house that aren’t broken, that would get pretty irritating.

2 Replies to “Comparing “lecture search” and YouTube”

  1. So would you say the Netflix recommendation system is so much better because they hand-curate the content through licensing and whatnot? While the method for getting content is different, the style of review is the same: the star ratings are similar to YouTube likes, and both offer the ability to add things to “preferred” lists.

    1. Netflix and Spotify start from better material because people are getting paid in the process. Netflix is obviously also hand-picking content and negotiating agreements, and you can do that when the number of movies/shows is “small” (100s, 1000s, 10000s). People uploading to YouTube can put up whatever they want, so it’s harder to define what a “good” video is without context. For instance, a video about Trump might seem “good” today even if it was utterly terrible and unwatchable when it was made.

      You can see that Netflix struggles a bit because the content is so valuable, and what they don’t control can get yanked away. Compare this to musicians and Spotify: there are way more musicians than people who will pay $10 for their CD, so I think Spotify has a huge power advantage, more like the one Comcast / FIOS have over TV subscriptions. I couldn’t build a search engine where I charged for videos, because I don’t have any negotiating leverage. There are some good examples of how badly this goes: Crackle, etc.

      Movies are also like music in that there are somewhat well-defined categories and lengths. You can say “I’m interested in action movies,” whereas on YouTube you might say “I’m interested in Age of Empires videos” or “I’m interested in videos that help me fix my sink”. YouTube interests form an ever-expanding taxonomy, so it’s harder to teach a computer.

      I also think you are more motivated to rate things accurately on Netflix. On YouTube you can see people just being jerks (downvoting competitors, some hated YouTube star, stuff they disagree with, etc.), so it’s hard to know whether the up/down rating has any meaning at all.
