Friday, July 5th, 2024
Advancements in generative AI for text, images, video, sound, and music have enabled us to create impressive content from relatively simple text prompts. However, I’m sometimes asked why text-to-music seems to have lagged behind the others in terms of quality, fidelity, or utility. This disparity is particularly interesting given that music data is often smaller and more easily quantized than image or video data, and AI has been actively used in music production since at least the 1950s.
Of course, several very impressive text-to-music products have recently emerged that provide a valid rebuttal, but none have quite achieved the same level of relative quality or popularity as their counterparts in other media. This perception is probably shaped by several factors, such as the unprecedented demand and pace of innovation in generative text and images, or the challenges of the auditory uncanny valley. But text-to-music also faces a unique, more historical obstacle: taxonomy.
Wednesday, June 12th, 2024
I’ve been experimenting with AI content creation. The goal was to create a simple pipeline that could consistently generate characters following a style and theme, splice and edit videos together, and apply light visual effects and color grading.
The result is Cute and Cuddly, which recently passed 100K subscribers on YouTube.
Virtually all of the process is automated, from pulling character ideas from headlines, to rendering, style matching, outpainting, and publishing the videos. A simple heuristic also determines when to pull down underperforming videos. The main area that’s manual is the music, because it’s fun :-)
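For the curious, here's roughly what a take-down heuristic like that could look like. This is a minimal sketch, not the actual pipeline code; the names and thresholds are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a published video's stats.
@dataclass
class VideoStats:
    views: int
    published_at: datetime

def should_take_down(stats: VideoStats,
                     min_age_days: int = 7,
                     min_views_per_day: float = 50.0) -> bool:
    """Flag a video for removal if, after a grace period, its average
    daily view count falls below a threshold (illustrative values)."""
    age = datetime.now(timezone.utc) - stats.published_at
    if age < timedelta(days=min_age_days):
        return False  # too new to judge fairly
    views_per_day = stats.views / max(age.days, 1)
    return views_per_day < min_views_per_day
```

The grace period matters: without it, every video would be flagged in its first hours, before the recommendation algorithm has had a chance to surface it.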
Wednesday, May 24th, 2023
I’ve been using AI to create art, articles, music, and code. Here are my notes and impressions so far. Let’s see how well this ages.
Large language and generative models are powerful performance enhancers that bridge the gap between natural and formal language, but
they do not diminish the significance of human expertise in tackling novel challenges. The human brain is a pre-trained multimodal neural
network that provides the crucial context and motivation for problem-solving, inasmuch as humans are the ultimate consumers of the output.
In addition to supplying motivation, human expertise reduces AI model complexity. It's expensive and time-consuming to train large models
from scratch, and models are generally constrained by the size and quality of their datasets. There is a finite rate at which new training
data can be generated and sanitized. If you ask ChatGPT (May 2023) for help with the Mojo programming language, it will warn you that no such
language existed as of September 2021, its most recent training snapshot. We can refer ChatGPT to the web and use fine-tuning to incrementally
improve GPT's knowledge, but these approaches have their own trade-offs in terms of cost and effectiveness, and ultimately still rely on the
quality of the dataset.
Tuesday, January 10th, 2023
There’s a pattern with disruptive technologies where a new foundational capability emerges (e.g. the web, smartphone) and it sparks a wave of startups and investment,
followed by a significant market correction. Startups explore new experiences and capabilities enabled by the technology while investors compete
for their attention with sky-high valuations in a rush to avoid missing out. Reality eventually sets in, and the businesses that formed around a
novel experience but lacked a viable business model are
forced to consolidate or fold. Technology that was initially a catalyst for disruption becomes commoditized and broadly incorporated into many products. The winners are few
but the spoils are tremendous.
Saturday, October 16th, 2021
Engineering is fundamentally about invention and discovery – pushing the boundaries of technology in a direction that unlocks new capabilities or experiences.
Innovation is lucrative, and the demand for qualified engineers has historically far outstripped supply. Within virtually every software company you’ll find ambitious product
roadmaps poised to deliver value, but hampered by time-to-market. If AI can significantly shorten the product development cycle, then businesses will need to decide whether to
accelerate their roadmap or maintain their current pace with fewer workers.
Friday, October 23rd, 2015
Over the last decade we've witnessed a transformation in human
interfaces brought on largely by advances in machine learning. Automated phone assistants, voice and text translators, self-driving
cars, and much more all rely on various forms of machine learning.
The randomized decision forest is a machine learning algorithm
that's useful for these kinds of tasks. It has gained significant popularity over the last several years and serves as the heart
of the tracking algorithm within Microsoft's Kinect product.
This article presents a casual introduction to randomized decision forests, follows with a simple example to highlight the
process, and concludes with important development considerations. This is not a rigorous treatment; it's intended
for readers with a general interest in the subject and no prior experience.
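To make the idea concrete before diving in, here's a minimal example using scikit-learn's `RandomForestClassifier`, an off-the-shelf implementation of the same technique (not code from this article): an ensemble of decision trees, each trained on a random subset of the data and features, that predicts by majority vote.

```python
# Toy demonstration of a randomized decision forest: two well-separated
# clusters of 2-D points, labeled 0 ("low") and 1 ("high").
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1],
     [9, 9], [9, 8], [8, 9], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Each of the 10 trees sees a bootstrap sample of the data; the forest's
# prediction is the majority vote across trees.
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, y)

print(forest.predict([[0.5, 0.5], [8.5, 8.5]]))  # expect [0 1]
```

A single decision tree could solve this toy problem on its own; the payoff of the forest shows up on noisier, higher-dimensional data, where averaging many randomized trees reduces overfitting.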