On Shared Musical Agency
Interviewing marts~ to learn about his story and artistic vision.
I met him by chance over breakfast in Amsterdam.
We were both there for the AI Song Contest, and he sat down with us after a shy hello. It would have been easy to finish breakfast, exchange polite goodbyes, and go our separate ways. After all, it was early in the morning and our brains were still booting up.
But after complaining about the cold weather and the mediocre buffet coffee, we began a conversation about our shared passion: AI music.
We discussed Pure Data, generative music, and online communities. He described sharing agency and controlling generative AI processes with a constellation of parameters. His unique vocabulary revealed that his background was different from mine. He is neither from academia nor industry, but from Berlin and the Internet.
His name is Martin Heinze (marts~), and I’m glad our paths crossed that morning.
A simple breakfast became an unexpected reminder that ordinary moments often bring the most interesting encounters.
Today, I’m sharing a (written) interview with Martin, starting with the breakfast we had when we first met.
Q: Do you remember what you ate the day we met? I remember having a delicious apple pie.
I had a croissant with jam and a pain au chocolat, then moved on to muesli, but I got so excited talking to you guys that I forgot to finish up.
Q: For those who don’t know you, can you tell us about your artistic background?
I graduated from the University of Media Arts and Design (HfG) Karlsruhe, which is closely entangled with ZKM, one of the key institutions for media art and technology in Germany. During that time I naturally came into contact with a lot of concepts, works, artistic practices and, of course, people. All of this left a lasting impression, even though at the time I found it quite difficult to transform it into something practical.
At the same time I started producing electronic music in the periphery of the Drum & Bass continuum, had releases on various music labels, played DJ gigs internationally and launched my own record label Weevil Neighbourhood.
Over the years I found it more and more difficult to enjoy writing music in my established routines. This was when earlier influences and maybe implicit knowledge from my time in Karlsruhe came into play. I began diving into concepts of generative music and algorithmic composition, changed my technical setup, and basically transitioned from production to development. This happened pretty much at the same time that early practical examples of generative AI in music (not only in the audio domain) started to surface, e.g., Dadabots’ Relentless Doppelganger YouTube livestream.
At that point I had written and produced various albums, EPs, singles and remixes in a traditional way - which makes for a decent audio dataset to train neural nets on - and now there were deep learning algorithms that allowed building novel kinds of instruments from audio data for use in real-time settings. So it became one of my main practical research topics to create these instruments based on my own music and learn how to use them to spawn new audio items.
Q: In your work, you put a strong emphasis on ethics. Can you tell us more about your stance and how it translates to your artistic practice? I’m particularly interested in your actions to compensate for your environmental footprint, as you are the first AI musician I’ve heard talk about that.
I wouldn’t say that my artistic practice is primarily guided by ethics, but I generally feel inclined to reflect on privilege and responsibility. Privilege comes in various forms: sometimes it’s structurally determined, sometimes arbitrary; it can also be temporary. Acting responsibly in view of your own privileges just seems reasonable in the long run, be it with regard to the creative work of other individuals when curating your training datasets or the environment we all live in.
The yearly environmental footprint of one person living in Germany is currently more than 10 tons of CO2 equivalents, while the Paris Agreement’s 1.5°C target requires less than 2 tons per person globally by 2030. I find that reason enough to aim at reducing that footprint, no matter whether you have ever trained a neural net or not. In the EU, the most effective regulatory instrument for additionally compensating for your environmental footprint at the moment is buying (and retiring) fractions of EU Allowances for CO2 emissions, which is what I chose to do.
Q: AI should be used to create art in ways that are both ethical and compelling. Ethics are important, but being compelling is just as important. What’s your compelling reason to use AI in your artistic practice?
Expanding creativity. Working with neural nets has radically changed my practice and perspective on music creation, from writing pieces to “finding” them in, or “spawning” them from, the reproductive amalgamations of generative AI. This is primarily an artist-centric view, but I think it is also noticeable in the results and might change listening expectations over time. For me it also leads away from a tradition of creating recorded artifacts towards a more spontaneous, participatory and ephemeral music creation and consumption practice. My optimistic intuition is that this might actually result in more humans prosuming music rather than machines flooding services with arbitrarily generated music for algorithms to consume.
Q: Could you share an example where AI expanded your creativity? Nao Tokui wrote that once he got goosebumps after his AI DJ picked two tracks that he couldn’t imagine working together. Did you find yourself in similar situations?
Absolutely. This happens to me each time I load a newly trained neural audio model for the first time after a successful training run. First, I usually just feed it some random noise and let it run for a while, listening, observing, learning about its character and the sound information it can reproduce.
That stimulation of creativity comes, for me, from finding a sweet spot in sharing agency when making music with neural audio models without fully giving it up. I learned that when I embrace a certain lack of control, compared to the deterministic production routines I’m used to, I’m rewarded with surprising results I wouldn’t have aimed for in the first place. And you could say that sharing agency already starts in the training process, where you as a human agent determine the training data and the general setup in which the training itself is carried out. However, what exactly happens in the training process is out of your hands.
Q: You also release your tools open source. Can you tell us more about your technical setup?
I train models on cloud services with a set of Jupyter notebooks I’ve created.
For composition, I’m using Pure Data almost exclusively these days. I’ve developed a set of custom abstractions I use to build frameworks for semi-generative or algorithmic use cases. These are basic interfaces to the abstract and obscure latent space of the model types I’m employing (RAVE, vschaos2, AFTER, MSPrior). I inject information that mimics latent embeddings and temporarily conserve parameter constellations that lead to musically interesting results - it’s a bit like tuning and eventually playing an instrument.
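To give a more concrete picture of what such an interface does, here is a minimal Python sketch rather than a Pure Data patch; it is illustrative only, not Martin’s framework. It assumes a RAVE model exported to TorchScript (the path, the latent size, and the number of frames are placeholders) and decodes uniform noise shaped like a latent embedding:

```python
# Minimal sketch (illustrative, not Martin's setup): decode random "latent-like"
# noise with a RAVE model exported to TorchScript. "model.ts" is a placeholder
# path; LATENT_DIM and FRAMES are assumptions about the trained model.
import torch

LATENT_DIM = 8      # assumed latent dimensionality of the model
FRAMES = 512        # number of latent time frames to generate

model = torch.jit.load("model.ts").eval()

# Fake latent embedding: uniform noise in [-1, 1], shaped (batch, latent_dim, frames).
z = torch.rand(1, LATENT_DIM, FRAMES) * 2 - 1

with torch.no_grad():
    audio = model.decode(z)   # -> (batch, channels, samples) of synthesized audio

print(audio.shape)
```

In Martin’s actual setup this role is played by Pure Data abstractions rather than Python code, but the shape of the injected latent-like data is the part that carries over.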
I maintain a YouTube channel where I showcase patches and frameworks built with these abstractions and how I engage in improvisation with the models. I call this “Latent Jamming” because it works a bit like a jam session with multiple agents inside latent space.
Q: Could you share some videos with us and walk us through them?
Certainly. On a baseline technical level, it all revolves around finding the right parameter constellations at the signal level while operating in the latent space of the models.
In all three examples you can see similar approaches at play: create noise (or, more generally, random arrays of values between -1 and 1), scale the values, and add an offset per latent dimension. This is mainly the tuning part: playing around with different value arrays, noise seeds, and offsets helps in understanding the model’s general sonic behaviour. Then the macro scale comes into focus by establishing repetition or rhythmicality; technically, this is done by retriggering noise seeds or value arrays, or by changing offsets gradually or abruptly on certain triggers - think sequencing here.
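As a rough illustration of that tuning-and-sequencing loop, here is a short Python sketch; it is not taken from Martin’s patches, and the latent size, step counts, and values are invented. Random arrays in [-1, 1] are scaled and offset per latent dimension, the noise seed is retriggered every few steps so a figure repeats, and the offset jumps abruptly halfway through the sequence:

```python
# Rough sketch of "tuning" (per-dimension scale and offset) plus "sequencing"
# (retriggered noise seeds, abrupt offset changes). All numbers are illustrative.
import torch

LATENT_DIM = 8        # assumed latent size
FRAMES_PER_STEP = 32  # latent frames generated per sequencer step
STEPS = 16            # a 16-step sequence in latent space

scale = torch.full((LATENT_DIM, 1), 0.4)   # how far each dimension wanders
offset = torch.zeros(LATENT_DIM, 1)        # where each dimension is centred

steps = []
for step in range(STEPS):
    # Retriggering the same seed every 4 steps repeats the same noise burst,
    # which is what makes the figure recognisable as a loop.
    torch.manual_seed(42 + step % 4)
    noise = torch.rand(LATENT_DIM, FRAMES_PER_STEP) * 2 - 1   # values in [-1, 1]

    # Abrupt offset change halfway through the sequence - think sequencing here.
    step_offset = offset + (1.0 if step >= STEPS // 2 else 0.0)

    steps.append(noise * scale + step_offset)

# (1, LATENT_DIM, STEPS * FRAMES_PER_STEP), ready to be fed to a decoder
# such as the one loaded in the previous sketch.
z = torch.cat(steps, dim=-1).unsqueeze(0)
print(z.shape)
```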
The first example shows earlier experiments with this; these resulted in a compilation of recordings called “Spoor”.
I was particularly interested in different lengths of loops or rhythmical figures and how they would still be recognizable when applied in latent space.
In the second example I was already generalizing the earlier experiments into reusable abstractions; I also started working with multiple models in one setup, trying to turn the general sound aesthetics into something less abstract. The documented output of this phase is “Saatgut Proxy” or “Latent Jams”.
The third video shows an example of my most recent setup. I’m using between 3 and 5 models in one framework now and have added another control layer that works either like a randomized automation within a given range of parameter values or as a more sophisticated interaction via latent mimicry. This latest framework is the one I created my contribution to the AI Song Contest 2025 with.
Q: What are the main challenges that you face as an AI musician?
Finding simple words for what the AI part in my music is. Considering the reservations against AI, especially in the creative domain, I tend to use other terminology like deep learning, neural audio synthesis, etc. - but that usually makes it worse, because I then need to go deeper into the technical details, making it even less tangible.
Q: Interesting that you mention language. I see, for example, that you use the word “spawning” introduced by Holly Herndon, or introduce concepts like “parameter constellations” or “latent jamming”. Do you think these new concepts make our music less accessible to wider audiences? Maybe the term generative music, in its classical definition beyond AI, already describes our music.
I think for the time being these terms can help differentiate seemingly novel aspects of music creation from conventional ones, primarily to encourage dialogue and discourse. We might find out along the way that they’re not necessary because existing terminology and concepts are sufficient. There are indicators that this is the case when thinking of concepts of generative music, as you said. I use “spawning”, for example, because I like the implicit sense of arbitrariness in how entities come into being, as opposed to “creating”, “writing”, “producing”, “developing” or other terminology that implies creative intent. For me it also implies a closed system: spawning new audio entities structurally cannot go beyond what the models have seen during training; it requires outside-system interaction or system-bending in order to create something new - pretty much the same as with conventional instruments. In “Latent Jamming”, the improvisation itself is certainly not a new approach, but as of now I think that acting with intent inside an unstable complex system while embracing and enforcing its instability might qualify as new. Does all this terminology make it less accessible to wider audiences? Maybe. Can it help foster discourse and dialogue, with the outcome of a more differentiated view on “AI Music”? Same.
Disclaimer. The views expressed in this content are those of Martin Heinze and do not reflect my own opinions or those of my employer. Only the introduction was written by me.



In 2023, I attended a performance, Mouja by Nicola Privato from the Intelligent Instruments Lab, that delved into very similar ideas with the tools of the time, mainly RAVE. Very fascinating soundscapes.