Introducing a powerful extractive summarization tool for YouTube videos

By Parisa Naraei | Jan 17, 2024

Almost everyone knows about YouTube videos, but far fewer are familiar with the behind-the-scenes tools and tech that make these videos possible. AI researcher and tech expert Parisa Naraei describes in detail how some of these tools work and what users should know about them.

TORONTO, ONTARIO - Videos are a predominant part of the modern internet experience. Hundreds of millions of hours of content are readily available on various online platforms, most notably YouTube. One can easily find content on almost any topic imaginable, but curating it is far more arduous. This article describes the creation of Laconic, a deep learning-based extractive video summarization tool that extracts the essential parts of a video so users can focus on the most important segments.

Origins, purpose of Laconic

Attention spans are shrinking worldwide - especially among youngsters - even as students are expected to learn a seemingly ever-expanding amount of information.¹ With hundreds of hours of video uploaded to platforms like YouTube every minute, anyone has access to a plethora of video resources from which to educate oneself. This is often too much of a good thing: with so much content, how can one efficiently locate what one is looking for? One answer is automatic video summarization, which is the objective of this project, dubbed ‘Laconic.’

Laconic generates summaries of longer videos by selecting their key elements and most interesting material. It does this by generating a transcript of the original video, using a machine learning model to identify the key elements in the text, and creating a clip for each of them. The output is a set of video clips extracted from the original and edited together into one shorter version. By watching a Laconic summary, users can digest the contents of a long video in just a few minutes and decide whether it is useful enough to watch in its entirety.
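
To make the transcript-to-clips idea concrete, here is a minimal sketch of the selection step in Python. It is not Laconic's actual implementation: the article does not specify the model at this point, so a simple word-frequency score stands in for the machine learning model, and the names `Segment`, `score_segments`, and `select_clips` are hypothetical. It assumes the transcript is already available as timestamped segments.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: a tiny illustrative stopword list, not a production one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "from", "for"}


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical structure)."""
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def score_segments(segments):
    """Score each segment by the average transcript-wide frequency of its words.

    This frequency heuristic is a stand-in for the learned model.
    """
    freq = Counter(w for s in segments for w in tokenize(s.text))
    scores = []
    for s in segments:
        words = tokenize(s.text)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores


def select_clips(segments, max_seconds):
    """Greedily pick the highest-scoring segments that fit the time budget.

    Returns (start, end) pairs in chronological order, ready to be cut from
    the source video and edited together by a video tool of choice.
    """
    scores = score_segments(segments)
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen, total = [], 0.0
    for i in ranked:
        duration = segments[i].end - segments[i].start
        if total + duration <= max_seconds:
            chosen.append(i)
            total += duration
    return [(segments[i].start, segments[i].end) for i in sorted(chosen)]
```

Returning the clips in chronological order rather than score order keeps the summary in the same sequence as the original video, which matches the description of the output as a shorter version of the original.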

This report begins by defining what Laconic is and what it comprises. Then, it provides a detailed description of all the com
