Publication

Improving the temporal consistency, user controllability, and semantic correctness of automatic video colourisation

Ward, Rory
Abstract
Automatic video colourisation systems, which render black-and-white films in colour, have recently gained attention for their ability to adapt old movies for today’s entertainment industry. This thesis explores three challenges these systems face: temporal consistency, controllability, and correctness. While current research focuses predominantly on image-based colourisation, video-based colourisation remains relatively unexplored. Many existing video colourisation techniques operate sequentially on individual frames, overlooking the critical aspect of temporal coherence. This can produce inconsistencies across frames, leading to undesirable effects such as abrupt colour transitions, commonly known as flickering. This thesis proposes two methods to alleviate this temporal inconsistency. The first uses exemplar-based automatic video colourisation, where the exemplar is chosen from a collection of inconsistently colourised frames. The second combines the generative capabilities of a fine-tuned latent diffusion model with an autoregressive conditioning mechanism. Beyond temporal consistency, controllability is also required: colourisation is an under-constrained problem, with multiple valid ways to add colour to a black-and-white film. Although users typically have a preference for the colourisation that should be produced, many existing systems do not allow intuitive user interaction. This thesis proposes two solutions to this lack of control: an exemplar-guided automatic video colouriser that leverages facial recognition technology, and a text-guided automatic video colourisation framework. Aside from temporal consistency and controllability, semantic correctness is also a strong requirement of automatic video colourisers. Because automatic video colourisation frameworks are often artificial intelligence systems, they are prone to hallucination, which in this case manifests as unrealistic colour outputs.
This thesis addresses this issue by increasing the semantic correctness of these systems. The prospects, scenarios, and challenges of such an approach are first explored. A concrete application of this methodology is then proposed, which leverages external knowledge in automatic text-guided video colourisation. This framework is further extended with semantic similarity search and probabilistic colour knowledge. By addressing these three challenges, this thesis aims to raise the standard of automatic video colourisation as a whole.
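To make the flickering problem and the idea of choosing an exemplar from inconsistently colourised frames more concrete, the following is a minimal sketch, not the method used in the thesis. It assumes each frame is given as a flat list of per-pixel (a, b) chroma values in CIELAB and that pixels are aligned across frames (no motion). The names `flicker_score`, `mean_chroma`, and `pick_exemplar` are hypothetical, and the selection rule (pick the frame whose mean chroma is closest to the clip average) is an illustrative assumption; the abstract does not state how the exemplar is chosen.

```python
from statistics import fmean

# Hypothetical representation: a frame is a flat list of per-pixel (a, b)
# chroma values in CIELAB. Luminance is omitted because it is fixed by the
# black-and-white source.
Frame = list[tuple[float, float]]


def flicker_score(frames: list[Frame]) -> float:
    """Toy metric: mean absolute per-pixel chroma change between consecutive frames.

    Assumes pixels are aligned across frames, so real camera or object
    motion would also raise the score.
    """
    if len(frames) < 2:
        return 0.0
    per_step = [
        fmean(abs(a1 - a0) + abs(b1 - b0) for (a0, b0), (a1, b1) in zip(prev, curr))
        for prev, curr in zip(frames, frames[1:])
    ]
    return fmean(per_step)


def mean_chroma(frame: Frame) -> tuple[float, float]:
    """Average (a, b) chroma of one frame."""
    return fmean(a for a, _ in frame), fmean(b for _, b in frame)


def pick_exemplar(frames: list[Frame]) -> int:
    """Hypothetical rule: index of the frame whose mean chroma is closest to the clip average."""
    means = [mean_chroma(f) for f in frames]
    centre_a = fmean(a for a, _ in means)
    centre_b = fmean(b for _, b in means)
    return min(
        range(len(frames)),
        key=lambda i: (means[i][0] - centre_a) ** 2 + (means[i][1] - centre_b) ** 2,
    )


# Three two-pixel frames; the middle one is a colour outlier that causes flicker.
frames = [
    [(10, 10), (10, 10)],
    [(40, -5), (40, -5)],
    [(12, 9), (12, 9)],
]
print(flicker_score(frames))  # 43.5
print(pick_exemplar(frames))  # 2
```

Choosing a frame near the clip average is one simple way to avoid propagating an outlier frame's colours to the rest of the clip; a real system would more likely compare frames with perceptual colour differences or learned features and account for motion.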
Publisher
University of Galway
Rights
CC BY-NC-ND