Friday, July 5th, 2024
Advancements in generative AI for text, images, video, sound, and music have enabled us to create impressive content from relatively simple text prompts. However, I’m sometimes asked why text-to-music seems to have lagged behind the others in terms of quality, fidelity, or utility. This disparity is particularly interesting given that music data is often smaller and more easily quantized than image or video data, and AI has been actively used in music production since at least the 1950s.
Of course, several very impressive text-to-music products have recently emerged that provide a valid rebuttal, but none have quite achieved the same level of relative quality or popularity as their counterparts in other media. This perception is probably shaped by several factors, such as the unprecedented demand and pace of innovation in generative text and images, or the challenges of the auditory uncanny valley. But text-to-music also faces a unique, more historical obstacle: taxonomy.
Wednesday, June 12th, 2024
I’ve been experimenting with AI content creation. The goal was to create a simple pipeline that could consistently generate characters following a style and theme, splice and edit videos together, and apply light visual effects and color grading.
The result is Cute and Cuddly, which recently passed 100K subscribers on YouTube.
Virtually all of the process is automated, from pulling character ideas from headlines, to rendering, style matching, outpainting, and publishing the videos. A simple heuristic also determines when to pull down underperforming videos. The main area that’s manual is the music, because it’s fun :-)
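For the curious, here's roughly what a take-down heuristic like that could look like. This is a minimal sketch, not the actual pipeline code; the names and thresholds are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a published video's stats.
@dataclass
class VideoStats:
    views: int
    published_at: datetime

def should_take_down(stats: VideoStats,
                     min_age_days: int = 7,
                     min_views_per_day: float = 50.0) -> bool:
    """Flag a video for removal if, after a grace period, its average
    daily view count falls below a threshold (illustrative values)."""
    age = datetime.now(timezone.utc) - stats.published_at
    if age < timedelta(days=min_age_days):
        return False  # too new to judge fairly
    views_per_day = stats.views / max(age.days, 1)
    return views_per_day < min_views_per_day
```

The grace period matters: without it, every video would be flagged in its first hours, before the recommendation algorithm has had a chance to surface it.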
Wednesday, May 24th, 2023
I’ve been using AI to create art, articles, music, and code. Here are my notes and impressions so far. Let’s see how well this ages.
Large language and generative models are powerful performance enhancers that bridge the gap between natural and formal language, but
they do not diminish the significance of human expertise in tackling novel challenges. The human brain is a pre-trained multimodal neural
network that provides the crucial context and motivation for problem-solving, inasmuch as humans are the ultimate consumers of the output.
In addition to supplying motivation, human expertise reduces AI model complexity. It's expensive and time-consuming to train large models
from scratch, and models are generally constrained by the size and quality of their datasets. There is a finite rate at which new training
data can be generated and sanitized. If you ask ChatGPT (May 2023) for help with the Mojo programming language, it will warn you that no such
language existed as of September 2021, its most recent training snapshot. We can refer ChatGPT to the web and use fine-tuning to incrementally
improve GPT's knowledge, but these approaches have their own trade-offs in terms of cost and effectiveness, and ultimately still rely on the
quality of the dataset.
Tuesday, January 10th, 2023
There’s a pattern with disruptive technologies where a new foundational capability emerges (e.g. the web, smartphone) and it sparks a wave of startups and investment,
followed by a significant market correction. Startups explore new experiences and capabilities enabled by the technology while investors compete
for their attention with sky-high valuations in a rush to avoid missing out. Reality eventually sets in, and the businesses that formed around a
novel experience but lacked a viable business model are
forced to consolidate or fold. Technology that was initially a catalyst for disruption becomes commoditized and broadly incorporated into many products. The winners are few
but the spoils are tremendous.
Saturday, October 16th, 2021
Engineering is fundamentally about invention and discovery – pushing the boundaries of technology in a direction that unlocks new capabilities or experiences.
Innovation is lucrative, and the demand for qualified engineers has historically far outstripped supply. Within virtually every software company you’ll find ambitious product
roadmaps poised to deliver value, but hampered by time-to-market. If AI can significantly shorten the product development cycle, then businesses will need to decide whether to
accelerate their roadmap or maintain their current pace with fewer workers.
Friday, October 23rd, 2015
Over the last decade we've witnessed a transformation in human
interfaces brought on largely by advances in machine learning. Automated phone assistants, voice and text translators, self-driving
cars, and much more all rely on various forms of machine learning.
The randomized decision forest is a machine learning algorithm
that's useful for these kinds of tasks. It has gained significant popularity over the last several years and serves as the heart
of the tracking algorithm within Microsoft's Kinect product.
This article presents a casual introduction to randomized decision forests, follows with a simple example to highlight the
process, and concludes with important development considerations. This is not a rigorous treatment; it's intended
for readers with a general interest in the subject and no prior experience.
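To make the idea concrete before diving in, here's a minimal example using scikit-learn's `RandomForestClassifier`, an off-the-shelf implementation of the same technique (not code from this article): an ensemble of decision trees, each trained on a random subset of the data and features, that predicts by majority vote.

```python
# Toy demonstration of a randomized decision forest: two well-separated
# clusters of 2-D points, labeled 0 ("low") and 1 ("high").
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [9, 9], [9, 8], [8, 9], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Each of the 10 trees sees a bootstrap sample of the data; the forest's
# prediction is the majority vote across trees.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)

print(forest.predict([[0.5, 0.5], [8.5, 8.5]]))  # expect [0 1]
```

A single decision tree could solve this toy problem on its own; the payoff of the forest shows up on noisier, higher-dimensional data, where averaging many randomized trees reduces overfitting.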