Spotify’s Plans For AI Generated Music, Podcasts, and Recommendations, According To Its Co-President, CTO, and CPO …



Spotify’s Plans For AI Generated Music, Podcasts, and Recommendations, According To Its Co-President, CTO, and CPO Gustav Söderström

Spotify's Gustav Söderström talks about AI music, Notebook LM podcasts, and the nuance of building better discovery using LLMs.

Alex Kantrowitz

Nov 13


Spotify has its hands full with generative AI. People are using tools like Suno and NotebookLM to generate synthetic music and podcasts that could fill its service. Meanwhile, the company sees the rise of LLMs as an opportunity to put users in dialogue with their recommendations, helping it respond to their feedback and serve the right mix at the right time.

To understand how Spotify plans to address these challenges and opportunities, I sat down with Gustav Söderström — the company’s co-president, chief technology officer, and chief product officer — for a long, deep conversation on Big Technology Podcast.

You can listen to the full conversation today on Apple Podcasts, Spotify, or your app of choice (and we’re launching video podcasts with Spotify this week). But I thought I’d send out the full transcript because I found the conversation fascinating. Here’s my discussion with Söderström, edited lightly for clarity and length.

Alex Kantrowitz: You run product at Spotify. Do you want AI-generated music on your platform?

Gustav Söderström: If you think about music, it's going through a journey of more capable tools. Go way back, even if you were a musical genius like Bach, you literally needed access to an orchestra to be able to realize that genius. Even if you could play multiple instruments yourself, you couldn't play them at the same time. Then we got to recorded music, and you could record one instrument at a time. So you got more and more independent. And then somewhere around the 80s, the synthesizer came along and meant that you didn't have to be able to play all the instruments yourself. You could sort of "fake" the drums using the synthesizer and the guitar and so forth. So I think there's been this progression of more powerful tools that enabled more and more creativity.

Then somewhere in the 90s, the digital audio workstation came along, and (being a Swede, I'm very proud of this) Avicii came along. Avicii wasn't very proficient at any one instrument, and he wasn't a singer. In a previous world, he would not have been considered a very creative person because he couldn't realize that without access to this tool: the digital audio workstation.

Turns out he was one of the most creative people we had, and we are very, very proud of him. For him, the digital audio workstation was what Steve Jobs would call "a bicycle for the mind". It meant that he could get more productive, and he could express his genius. And the big question with this next round of tools is the same: is it amplifying creativity, or is it replacing people?

I think it's amplifying creativity. It's giving more and more people access to be creative. You need even fewer motor skills than on a piano. You need fewer technical skills than with a digital audio workstation. So I think of them as tools.

And there's this interesting question of what is A.I. music? I think people say A.I. music, and they mean something that was prompted with not too much of a prompt and not too much work, like it's one hundred percent A.I. But the truth is that much of music being made today is a combination. I think many of the big artists are using AI for parts of their songs, or parts of the track, or the drums, etc.

So I think there's actually a scale between zero A.I. and one hundred percent A.I. And we're on this progression where it's actually going to be very difficult to say, what is an A.I. song?

Do you welcome this stuff on your platform? Let's say somebody does prompt one hundred percent A.I.; Spotify could fill up with songs that are A.I. prompted. It's very easy to create these songs and then upload them to the internet. How do you feel about those? Do you want them?

So there are two questions there. One is, what is Spotify about? We're a tool for creators, and if creators want to use A.I. to enhance their music, as long as we follow the legislation and copyright laws, we want them to be able to monetize their music and get paid out, right? So for us, we're trying to support creators, and the music catalog has grown tremendously since we started, from tens of millions of tracks to hundreds of millions of tracks, and I think it's going to keep expanding.

But what I think is important for us and the rest of the music industry to figure out is this: if you go back to the years of piracy, there was this technology called peer to peer and file sharing that was amazing. We actually incorporated that technology into Spotify.

But before Spotify, the technology preceded the business model. It was great for consumers. They could now get all of this music for free, but it didn't work for creators. And I think we're in the same period of time now where the technology has preceded the business model.

So I think that the technology is great, but I do think we need to find a way for the creators who have participated in this to be reimbursed. So that's something that we, and the rest of the industry, are thinking about. If we can find that business model, yeah, I think we could unlock a tremendous amount.

And then there's a separate question: with these models and the way they were trained, will that be considered legal or not? For example, in the US, these companies are now being sued. So I think that question will be decided by legislation.

But let's assume that there is one of these models, whether it has to be retrained on other data or not. Is that an interesting tool for us? If it was trained legally? Yes, if creators can participate in it.

Meta has A.I. image generators. The company’s feeds have filled with lots of A.I. generated images. They're engaging, and Meta seems to be okay with this. And now some of the top content on a Meta platform is Shrimp Jesus, which combines two of people's great loves, Jesus and seafood.

I've seen that.

So, from a Spotify perspective: If these songs generated by A.I. music generators become engaging, and they follow the rules, is that good for Spotify?

Well, I think if creators are using these technologies — where they are creating music in a legal way that we reimburse and people listen to them — and are successful, we should let people listen to them.

I think what is different is that I don't think it's our job to generate that music instead of the creators, right? That's a key difference. Are we, as a platform, for creators? Then we can have a discussion on which tools are they allowed to use, like the curious data or the workstation, but not LLM. Maybe we shouldn't decide that for them.

But there is the question, should we generate all the music ourselves? And that's where we're saying, "No, we're not going to generate that music." But maybe other platforms will, because it's cheap content, right?

So that's the key difference. We decided what we want to be in this world, and it's a platform for creators.

Okay, so there's a potential world where one of these tools seems to have violated copyright, and you might ban creators from uploading music that has used that tool?

We have detection systems for whether it's a derivative work of something that already exists. So we have systems to take these creators down. If you're creating something completely new that isn't a derivative of anything, and there isn't a copyright infringement, then the labels tell us. So that's the other question: what are these models trained on? But we're not creating the model, so we're watching what happens there, and we're going to follow the law.

But I think from a high level, this should be a very exciting tool for creators, for musicians, for authors, for podcasters. I think if you look at something like Notebook LM, for example, it was actually created by a journalist and a writer as a tool.

My bet is that these are bicycles for the mind, but sort of bicycles for the mind on steroids, right? And when these shifts happen, there is always tension with the people who don't use these tools. It feels a little bit like cheating, and people are saying, "No, I want to be creative too!" And it's always a difficult transition period.

It's the story of technology. What you're describing is what happens within tech companies when you think you have something figured out, and then, new innovation…

That's what makes it fun, what makes it exciting.

As you think deeper about it, do we go to a place where you can start to prompt music that is going to be better than any song you might listen to that has been created for certain moods? Music touches the heart. And if AI can do that, create the perfect song for whatever mood you’re in, does that become the future of music? Is that something that you can discount?

So I think there's two things. Music is used for many different things, right?

You have, for example, music that you're using to study. The extreme version of that is people listen to white noise. So, would white noise be generated? It's actually already artificially generated.

It's one of the top podcast formats.

Exactly. So there's a scale here, and I think you're right for certain things. Maybe you could create better white noise. Maybe you could create always-varying ambient music for your studying. Maybe for gaming, that music should automatically adjust to what's happening on the screen. So I think we're going to see lots of AI generated music for those use cases.

But there's another use case for that, which I think is very important. A lot of people use music to build their identity, right? Especially when you're a teenager, you go to a concert, you buy the jacket from that concert. Why did you buy that jacket? Well, it's like a pin. You're identifying with this band. You're building your own identity through this band.

I don't think that will work with AI generated music, because there is no one behind it. So I think some music will be generated, and I'm sure this is happening already. I'm sure many publishers are generating music for coffee tables and so forth. That will probably happen.

But I do think there's a human need for having someone to believe in, an actual artist that you care about. I don't think Taylor Swift will be replaced by an AI, not because the music couldn't sound similar, but because the whole point is Taylor Swift and belonging to something. So I think it's not a binary answer. I think both will probably happen.

Two years ago, I might have fully agreed with you that there's always going to be that need for the story and the human connection. Now I'm not so sure, because I do think that this stuff can be good enough.

What tends to happen in these worlds is that the thing that is scarce gets even more valuable. So one bet would be that true human connection gets more valuable than ever when a lot of what you talk to in the future may be LLMs. That would be my bet.

I'm hoping that's the case, because part of the business that I'm running is predicated on the idea that connecting to a human who can dissect and break stuff down is valuable. So I'm hoping that is the case, but I'm also not as sure as I used to be.

It's wise to not be sure of anything right now, given the pace of progress.

I think that brings us right into NotebookLM, this Google product that you can put notes in, which then generates this podcast with two co-hosts that sound ridiculously human.

Yeah, they do.

The AI hosts do a good enough job of breaking things down and I started to see them showing up in the second half of episodes, where people are like, "We're going to do the episode, and in the second half, we're going to give you the A.I. to listen to." But what happens if they end up being the first half? Spotify has made a big move into podcasts. What do you think about the rise of these A.I. podcast hosts?

I think NotebookLM is very impressive. You could predict, given the evolution of voice quality of these things and the understanding of a language model, that this would happen. So I'm not at all surprised that you can generate 'talk' audio that is engaging to listen to.

But what I think was the great innovation of NotebookLM was this: people had been generating monologues, and what humans really respond to are dialogues. And in retrospect, it's pretty obvious. Almost all podcasts are dialogues. Like if I sat here for one hour, it's not that interesting.

So I think the big hack was to go through a piece of material and present it as a dialogue and prompt it the right way. There's also obviously the internal Gemini model at Google that is probably very good, and the voice models have gotten better.

But actually, I think what they found was a product market fit for the actual audio format. And it turned out to be the podcast form — quite literally — it's pretty crazy.

Somebody on Threads tagged me and was like, "The male voice sounds like you." And I listened and it was not the same tone, but it had the same cadence and the type of questions I'd ask. Does that mean that I'm just the blend of the 'unremarkable middle' of podcasters, or did they copy my voice? I'm hoping it's the second one.

It'll be interesting to see if people either get tired of hearing the same two people talk about everything, or the opposite, they get used to the same two people and would prefer them to be the same and build trust.

I don't know. I think humans are very quick and prone to sort of anthropomorphize. It's sort of a hack on our human brain. So you feel like you know these people, because you heard them talk about so many things now. It's hard to predict where we'll go, and as a platform, we view it the same way.

Of course, people are uploading these podcasts to Spotify as well, and I don't know off the top of my head if anyone has super high engagement, but certainly people are listening to them.

So it's the same question: does this turn into a tool for creative people who can write stories but don't want to have the podcast around it, or just have no one interviewing them, so they just do an interview around their own material?

I think you're going to run into the same problem where if you just ask it to talk about something, it's not going to be very good. You need good source material.

So is this a tool for creative people to get even more productive and creative, or is it a replacement for creative people? My bet is that it's another tool.

It's pretty interesting because it sort of broadens out the long tail. And the thing about these podcast generators, NotebookLM in particular, is you can take it and create a podcast for something that's so niche that you would never have a show for it. It's similar with AI code, right?

Yeah.

It’s similar to coding. With generative AI, you can now code things that you would never code before because you can do it faster and cheaper. Maybe it’s the same with podcasts, more will be made that never would’ve previously.

I love that. One useful framing, I think, for these technologies is a financial framing, like how the cost of something goes to zero. The cost of writing code goes to zero, the cost of doing a podcast goes to zero, the cost of prediction goes to zero.

Usually, what happens is that the alternatives to that good get challenged, but the complements to that good grow. Like: "What if the price of coffee goes to zero? Then tea is going to be replaced, but sugar is a complement that is going to explode."

So I think what's going to happen is exactly what you're saying. We're going to have enormous amounts of content around niches where it didn't make sense to produce a podcast. So one way to think about it is just that the cost went to zero.

So I do think that the catalog is going to explode. Then what does that mean? Well, it probably means that the recommendation problem becomes even more important, because now it's even harder to keep track of everything that is uploaded.

I also think that if you have this vast sea of the perfect sort of discussion around any topic, the recommendation problem becomes more valuable to solve, the bigger the catalog is.

But I also think you're going to see the same thing as we see in music. The superstars will actually also get bigger. This is what I find fascinating.

People say, like, is Netflix winning or YouTube? Well, the truth is both. And they're saying, are the indies winning or Taylor Swift? Well, both of them are winning, but Taylor Swift is bigger than ever.

Okay, let's talk about A.I. recommendations. It's a big part of Spotify, and the vision is eventually you want Spotify to be this ambient friend for us that knows the context of the situations we're in. Is that right? Why would you be pursuing that?

Spotify was founded in 2006 which was pretty early on, and it's interesting that this was before machine learning became a thing, and so Spotify was quite focused on social features.

For purposes of recommendation, we needed social features, because that's how most people discover music, through a friend. So we wanted you to connect to people. And then AI came along, or what was called machine learning back then, and we realized that all the playlisting data we had was almost labeling by the user.

They were creating sets for themselves for Spotify, "These tracks go well together. These tracks go well together."

So we got a lot of labeled data, and we said internally, "Now, some people have a musical friend that happens to know their taste and so forth, but most people don't. So now we can build this friend for everyone." That was the A.I.

But the interesting thing is building a friend for everyone that can give music recommendations, like 'Discover Weekly', was always an analogy. I think what's happening now with A.I. is that the analogy is actually becoming reality.

And so you can see us moving a little bit in that direction. You have the A.I. DJ that has started to give Spotify a voice that talks to you. And I think what is going to happen with these LLMs is — at least for some brands — you'll start having literal relationships with them. And I would love for it to be the case that you think of Spotify as actually a friend, not an analogy anymore, but reality.

That this is a 'person', that this is a thing, that knows me well. This is a musical intelligence, a podcast intelligence, a book intelligence, and I actually like hearing it tell me about new things and suggest things I'm interested in.

So I think that's where we're moving. I think other brands are moving there as well. I think if you look at someone like Duolingo, they've actually only communicated through four characters all along. When you get a push notification, if it's not from Duolingo, it's from Lily or Sarah or something.

They really give me a hard time if I'm away for a couple hours.

With AI, you can actually talk to these characters. So I think this is a journey many companies are on, and it's interesting to see it play out. It means that part of what was called 'branding' before is now like 'personality'.

What personality do you want your company to have? Not as an analogy, but literally, what personality should Spotify have? I think that it's a fascinating time to work in tech, and it's something we're thinking a lot about.

You have this AI DJ, and it's okay. The feedback I've heard is people were excited about it initially, and have gradually moved away from it. So what is actually happening with the AI DJ? Are people using it?

So in the numbers, they're not moving away from it. It's actually very successful.

So my friends are just pretty snobby music listeners.

Well, for the people that use it, it's actually their biggest set. Bigger than their Discover Weekly usage. So it's quite a binary experience. I think it's for people who don't know what they want to listen to and just want to put something on.

When we launched the AI DJ, the big innovation there was that we managed to basically digitize the voice of a real person to make it sound very believable. But the things that it said around the music were like, to some extent, heuristics and kind of repetitive after a while.

So what we've done since then is we've recently invested quite a lot in rolling out LLMs that actually tell interesting stories about the music, and we've seen this have very strong effects on the retention of the application.

So whereas the thing used to say, "Here is this, and this song from this and that, I think you'll like it," now we can say things like, "This artist was just in Copenhagen, or has played here." You're starting to get interesting stories. It's starting to feel more personal.

The other thing that I think is missing, that I hope we can do someday, is for it to be able to talk with you: you should be able to just talk to it and say, "No, this was not very good. My 'Discover Weekly' this week was not what I wanted," and give actual feedback. And that is technically very possible now with these LLMs, so that's what I'm hoping will happen. This should not be a one-way relationship, which Spotify has been for technical reasons. It should turn into a two-way relationship.

I want to talk to you about how much we should allow the algorithms to dictate what our music experience and podcast experience is going to be versus how much should be dictated by us? How much agency should we have over our own choices?

Kyle Chayka, a New Yorker reporter, recently wrote about how he's leaving Spotify. He said: "Through Spotify, I can browse many decades of published music more or less instantly; I can freely sample the work of new musicians. Yet it has become aggravatingly difficult to find what I want to listen to."

He says it became clearer than ever that the app has been pushing him to listen to what it suggests, not choose his music on his own. What do you think about that argument?

I'm going to get this person back on Spotify 100%. I think there's an interesting tradeoff here that is real. Some people want less friction. They want to spend less time searching. You want to make things as easy as possible, right? But there's the end of the line where you sit there and you just receive, and you don't give any signal back — maybe a few clicks and so forth. And that's something that we want to avoid.

What's interesting with Spotify, which we are re-emphasizing, is that it was actually a platform where you invested quite a lot in your own playlisting. And there's a tradeoff here in the vision that we should be so good at machine learning that you should never playlist again. That would be the goal, because then you've done the user a great service, supposedly. But then you also receive no signal and the user does no investment.

So, we're actually re-emphasizing playlisting quite a lot. Over the years, we've gone more towards machine learning and algorithms because it works; people listen more and they appreciate the service more. But we need to cater to everyone, including this reporter.

The Spotify user base is divided into many different kinds of people. You have listeners who only listen to playlists. You have the 'hardcore album listeners'. You have the 'artist radio listeners' who only listen to one type of artist. It's a big challenge to build a service that serves everyone when people are very different.

We try our best to make sure that the music aficionados who want their library to be "album, album, album," can have their service but then you have other people who just want their daily mix to play in the air. So we're trying to build and cater for both. You can never please everyone one hundred percent but we're trying to be statistical about it, to make sure that it's vastly better for the majority of people, but our goal is to cater to everyone.

And I do think there's a real point around going to zero user investment — which seems good in the short term — but I don't think it's good in the long term because you actually lose signal from that user. And in the end, I think they feel less participatory in the experience. Even if the engagement looks high, if you've done no feedback, I don't know how much you feel this is actually your service.

I was DMing with Kyle last night and asked him what he thought I should ask you. And one of the things he suggested was, "Should Spotify users be able to tweak their recommendations?"

Yes, absolutely. We're working on these things, both the obvious things, where you can say, "I didn't like this particular thing," but I think the free text element is very interesting. You could talk to it and it would learn much more, but you would probably also get more trust.

Let me ask you one broader question about this, because I thought it was really interesting. Kyle wrote a book called "Filterworld." The main argument is that our world mediated by algorithms has become too bland, and effectively, that algorithms have flattened it out. Do you see that at all?

I think this is a really interesting argument. There are two ways I want to address that, one is for Spotify specifically. We've seen the feedback that people feel that it's great for the kind of stuff they already listened to, but feel like they're in a bubble — "I'm getting more of the same. I'm not getting new stuff." This is sort of a Spotify specific challenge because most of the time your phone is in your pocket when you're listening to a session.

Let's say you're listening to indie folk or something, then it's quite easy for us to say, "Here's another indie folk song," and you're gonna say, "Oh, that's a good recommendation." But if we start playing Metallica there, you're gonna be like, "What is this?"

So most of the recommendation inventory we have is kind of constrained, naturally, to what they are listening to because we can't put in very random things. You would say, "this is a bad recommendation."

It's a challenge for us when we want to show you something completely new. My favorite example is: I love Reggaeton, but you wouldn't have seen that from my listening history. How do we solve that problem?

So we started investing about two years ago in other types of foreground recommendation, sort of like the feeds that you see on social media, but you can literally say, like, "Okay, I'm bored. I want to go wide." Then you can go into these foreground feeds of music where you can swipe through many tracks. And they're very efficient but the hit rate is going to be low because now we're in a territory where the whole point is that we don't know that you like this.

Then I think you need a very efficient UI to target lots of content, right? Because the hit rate may be one in 20. You're not going to listen to 20 songs. That's over an hour of music and you need to go quick. So we're trying to solve that problem for when, for example, Alex is bored and he wants to branch out, we know as soon as he sends the signal. We didn't have tools for that before, so we built that. So that's part of the answer.
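The arithmetic behind that point can be sketched in a few lines. Only the one-in-20 hit rate comes from the conversation; the 3.5-minute average track length and the 10-second preview are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Only HIT_RATE comes from the interview; the other numbers are assumptions.
HIT_RATE = 1 / 20            # roughly one recommended track in 20 lands
AVG_TRACK_SECONDS = 210      # assumption: a 3.5-minute average track
PREVIEW_SECONDS = 10         # assumption: time spent on a track before swiping


def expected_tracks_until_hit(hit_rate: float) -> float:
    """Mean of a geometric distribution: expected tries until the first success."""
    return 1 / hit_rate


def minutes_to_first_hit(hit_rate: float, seconds_per_track: float) -> float:
    """Expected listening time, in minutes, before the first hit."""
    return expected_tracks_until_hit(hit_rate) * seconds_per_track / 60


full_listen = minutes_to_first_hit(HIT_RATE, AVG_TRACK_SECONDS)
swipe_feed = minutes_to_first_hit(HIT_RATE, PREVIEW_SECONDS)
print(f"full tracks: {full_listen:.0f} min, swipe feed: {swipe_feed:.1f} min")
```

Under these assumptions, listening to whole tracks costs about 70 minutes per discovery, while a swipeable preview feed brings that down to a few minutes, which is why the interface matters as much as the ranking.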

But the more philosophical part of this answer is, did the algorithms sort of flatten out? Because they are, to some extent, trying to find statistical patterns and averages.

And I think if you look at recommendation technology (I don't think this is widely known yet), these deep learning based systems had flattened out: if you added more user data or more parameters, they did not get better the way the LLMs do. There were no scaling laws. It was just like "it is what it is", and you could move 0.2%. Something has happened there recently called generative recommendations, where you actually use a large language model instead of these old deep learning models, and you basically think of user actions as a language.

So you have a sequence for each user: they click this, they listen to that, they click this, they listen to that. Then if you turn that into tokens, just as you can turn a language into tokens and try to predict the missing word in a sentence, you can try to predict the missing action in a sequence.

And it turns out that these generative recommendations do scale with more user data and more parameters, just like the LLMs. So this is a long-winded way of saying, I think he's right that the recommendations did flatten out. It's also true that people are changing recommendation stacks, and now it's unclear why they couldn't continuously get better.

So I'm hoping that the recommendations do get more intelligent, because now it's not just a statistical average. They can look at your specific user history going years back, and they could potentially understand that it's actually Christmas again, and last year at Christmas, you did this. I'm hoping it gets more intelligent.
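To make the "user actions as a language" idea concrete, here is a minimal sketch. It is not Spotify's system: a toy bigram counter stands in for the large sequence model Söderström describes, and the action tokens, session data, and function names are all hypothetical. The only point is the framing, where each action becomes a token and the model predicts the next one.

```python
from collections import Counter, defaultdict

# Hypothetical action logs: each string is one user action turned into a token.
sessions = [
    ["play:indie_folk", "save:indie_folk", "play:indie_folk", "skip:metal"],
    ["play:indie_folk", "save:indie_folk", "play:reggaeton", "save:reggaeton"],
    ["play:reggaeton", "save:reggaeton", "play:indie_folk", "save:indie_folk"],
]


def fit_bigram(sequences):
    """Count how often each action token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, last_action):
    """Return the most frequent follower of last_action, or None if unseen."""
    followers = counts.get(last_action)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = fit_bigram(sessions)
print(predict_next(model, "play:indie_folk"))
```

A bigram counter only looks one action back. The appeal of the approach described in the interview is that a larger sequence model can condition on a long history, which a sketch this small does not attempt to show.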

One last question about recommendations that comes from Ranjan Roy, who's on the Friday show with us. He would like there to be a parent mode on Spotify, where if you have kids, you can be like, "I'm on child mode, don't blur my recommendations." What do you think about that?

So we have a bunch of different solutions for this. Obviously, there's a family plan, so hopefully your kid can have their own account.

Doesn't that cost more?

Exactly. The other thing is, you can create a playlist for your kid and then if you click the settings, you can say, "do not include in my recommendations" and then it actually doesn't destroy your recommendations at all. So there are those solutions.

We're also trying to understand that all of this is kids' music. So while this is part of your taste profile, we shouldn't play this in your other sets because this is probably something you're doing for a specific use case. So you probably want a kid's music playlist in there, but you don't want that music to affect your other sets.

There's an algorithmic component. There's a subscription plan component, and then it's back to more user control. You can already say that this playlist should not be considered my taste, and we're going to build more of those controls.
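The user-control piece can be pictured as a simple filter. This is a hypothetical sketch, not Spotify's data model: the `Playlist` class, the `exclude_from_taste` flag, and the genre-count "taste profile" are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Playlist:
    name: str
    genres: list[str] = field(default_factory=list)  # one genre label per track
    exclude_from_taste: bool = False                  # hypothetical user setting


def taste_profile(playlists: list[Playlist]) -> Counter:
    """Count genres across playlists, skipping ones the user has excluded."""
    profile = Counter()
    for playlist in playlists:
        if playlist.exclude_from_taste:
            continue
        profile.update(playlist.genres)
    return profile


library = [
    Playlist("Morning run", ["indie_folk", "indie_folk", "reggaeton"]),
    Playlist("Kids' songs", ["kids", "kids", "kids", "kids"], exclude_from_taste=True),
]
profile = taste_profile(library)
print(profile.most_common())
```

The kids' playlist still exists and still plays in this sketch. It just never contributes to the profile that other recommendations would draw on.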

Okay, Ranjan will be happy to hear that.

Last question about recommendations, I don't know if you have seen this YouTuber, his name is Fontana. He was talking about how we used to hear music on the radio often, and that was the music that was played. There was music that would often be played when we're with other people, with friends, having a good time. And it led to more dance songs, rock album anthems and stuff like this.

Today we're mostly accessing music via streaming platforms, and he says those are much more individualized recommendations, which has kind of shifted the way that music is made, and even the hits in music. What do you think about that argument?

So there is a philosophical question there, which has been researched a few times, which is: do you have an innate taste in your brain, and is our job to search for that and find it? Or does what we play actually affect what you like? And there are all these experiments in colleges where you play different songs to different groups, and then you see what they like. And it seems like it's a bit of both.

You have some sort of innate taste, but you're also affected by what you hear. To this argument, like the radio can change your taste. So I think there's truth to that argument.

What I think is interesting about our music listening is that when we survey users and ask them, "What percentage of your listening is with others?" It's a huge percentage, double digit percentages.

So music is actually still a very social activity, and in some cases we see this. We have this feature called 'Jam' that is taking off like a rocket. For us, it's doing very well. And with 'Jam' we can essentially detect when two phones are close to each other. It's just like, "Hey, do you want to join Alex's jam?"

And now we have a 'joint queue'. So at a party, the way you party right now with Spotify is that you don't have to go and interrupt, you can just bring up your phone, join the queue, and then queue things up.

We have a lot of joint listening, but it just looks to the individual as the individual listening to us. So I think it's actually happening more than people think. It's not one hundred percent individual listening, but because we don't see them as group listenings, we're still treating them as individual listens. But now that we're getting more data on what is good group music, that becomes a different category.

So I think the radio use case is happening. You're hearing songs at parties and with others, and when you're riding in the car and so forth. It just looks to these services as lonely listening, but it's actually quite social, right?

So Spotify is investing heavily in podcasts. This has been going on for a long time, at first largely through an originals strategy, and now also audiobooks. What has gone into the decision to just bring all these formats together in one app? And are they good businesses for you, podcasts and audiobooks?

What happened is that we saw, internally, a lot of our developers hacking podcasts using RSS into the Spotify experience. And we saw it again and again and again at Hack weeks. At first we thought, maybe it's a niche, random need. But we saw it again and again.

You know, Spotify still just has many thousands of employees, so it's not a very representative sample of society, but it is some sample of society. And if you see the same user need many times, you should take it seriously.

So we started looking at that, and then we looked at podcasts and saw that it had a lot of potential and was growing, but we didn't think anyone was doing anything interesting with it. So we decided to just approach it because we saw the user need internally. We saw the market growing, sized it, and then saw that there was no one really investing in it. Apple hadn't invested in it, and they had like 98% of the market. So that's how we came to it.

Then the question is, why is it in the same application? Why not as a separate application? And there are two views on that. One is that it's a strategic decision. The biggest barrier to something new right now, unfortunately, isn't necessarily the quality of the application. It's the user acquisition cost.

Distribution is everything.

Distribution is still everything. And actually, at the beginning of the iPhone era, there was a lot of organic distribution. People went to the app store every day. Now it's like no one goes there anymore, so you almost have to pay for every new user. So user acquisition cost is probably the biggest inhibitor to most business plans.

If we built a separate app, we would have to reacquire our own users again, and that would make it very expensive. And we've seen all of these big, big companies, the American tech companies, launching app after app and nothing worked. Then we look at China, which uses a different strategy of 'super apps', where they double down on their own distribution. And so you can think of podcasts as pre-installed.

So the strategic angle for this made sense. But I actually have a user angle on this where I think it is the better experience. I think in 2024, the user should not have to adapt the software to the content; the software should adapt to the content. If you play a piece of music, there should be skip buttons. If you play a podcast, it's not rocket science to change the skip buttons to a 15-second scrub, and if you play an audiobook, to change them into chapters. Like, come on, it's 2024, why do you have to switch apps for that, right?

We believe that it was both strategically best for us, because then we could double down on our own distribution, but we also think that long term this is the right user experience. It's easiest for the user. Now we have these beautiful connections between the audiobook and the author being interviewed in a podcast on the same thing, where it's seamless. So that's the reason that we do it in the same application.

So talk about discoverability, because that's the biggest issue for podcasts. For instance, if I'm listening to tech shows and I'm not listening to "Big Technology Podcast", I probably want to see that there's a show called that out there.

From what I've heard, both from product people and from podcast producers, discoverability has been the biggest issue. So I'm just curious what you think about this discoverability thing.

I think you're completely right. Short-form formats are easier, because the discovery is the consumption. Like a talk on TikTok: it's not like there's a recommendation for this talk; when you watched it, you consumed it. Music is almost the same, it's three minutes. It's not quite, but it's almost like, if you discover it, you also consumed it.

Podcasts are different. You kind of need a trailer, because it could be an hour of investment. Books are actually even harder, where it could be 15 hours of investment. So I think a lot of the challenge is to create a good representation, a good short form representation of this long form content, to understand if you should invest your time, right? It's something that we are investing quite a lot in.

The podcast world didn't have that for the longest time, right? And I think this is also part of the reason why, if you look at the old Apple podcast world, it's a few shows that have a lot of followers, sort of forever, but it's really hard to break in for a new show.

I think it's changing now with these short-form previews that are happening on TikTok, YouTube, and Spotify, where you can quickly go through and understand what a show is about. I think video is actually helping, and it's the same in music. We see that music video is very important in the discovery moment, and a new release with a music video in an A/B test does much better than a new release without a music video in terms of downstream.

And I think it's the same for podcasts. If you're quickly saying, I'm interested in technology podcasts, it helps a lot to have video for these podcasts. So this is what we built, these 'foreground feeds', where you can go through a lot of material within your interest with lower friction.

So we're investing quite a lot in the "preview problem". And it's the same for books: getting a good recommendation, a good understanding of the book, quickly is hard. You can use LLMs for that to try to summarize them, or you can use the author's own summary. It's something we're investing quite a lot in.

Okay, so you've been introducing video for podcasts, I know this one is going to be on video and I'm hoping to do a lot more video podcasts through Spotify. But are you going to do a short-form video feed like TikTok?

Well, we already have that for the intro. As a creator, you can upload your video podcast. You can also choose the short-form representation you want in the discover feeds. Spotify has discover feeds for music, for podcasts and for books.

But it's important to know that while they look like TikTok, on TikTok or Instagram the item is the consumption itself, right? And they're measuring how long you stay in the feed. We are actually doing something different. It looks the same, but we're doing the complete opposite.

How often do you leave the feed?

How often you save it. So we're trying to get saved for later. We're ranking them on how many things you save, not how long you stay, which drives a very different recommendation, right? We're trying to get people to save your episode into the library to listen to the full thing. That's the ultimate message. We actually don't want you to stay in that feed. We want you to quickly get through and save a bunch of stuff so your library is full of interesting podcasts.

We're here at the end of the podcast and I'm wondering how often do people get to the end of podcasts on Spotify?

I don't know if I can share it, but you can see a curve like this: it starts at one hundred and it goes down. It varies between creators, but you have fall-off in the beginning; then after a certain point, most people just stick to the end, right? Then there's a really big drop-off at ninety-something percent, usually it's end music or something.

Right, and does the end music hurt with discoverability? Is Spotify saying, "Okay, we had sixty-something percent up until the last minute, but then they go down to 30% before they complete"? So should we end abruptly?

Now we control for that. We understand that this is the end credits, and people move on.

So we can take our time coming in for a smooth ending.

Yeah, you can have a good exit song.

Good Exit Song Plays.

Thank you for reading Big Technology! Paid subscribers get our weekly column, breaking news insights from a panel of experts, monthly stories from Amazon vet Kristi Coulter, and plenty more. Please consider signing up here.
