Summary
Podcasts are one of the few mediums in the internet era that are still distributed through an open ecosystem. This has a number of benefits, but it also makes it difficult to find the content that you are looking for. Frustrated by the inability to pick and choose single episodes across various shows for his listening, Wenbin Fang started the Listen Notes project to fulfill his own needs. He ended up turning that project into his full-time business, which has grown into the most full-featured podcast search engine on the market. In this episode he explains how he built the Listen Notes application using Python and Django, his work to turn it into a sustainable business, and the various ways that you can build other applications and experiences on top of his API.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Wenbin Fang about the technology powering the Listen Notes podcast discovery platform.
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Listen Notes is and the story behind it?
- What are some of the main goals that listeners have when searching for a podcast?
- What are the challenges that they commonly encounter when looking for information in a podcast?
- What are the different sources of information that you can use to extract useful details about a podcast?
- How do you identify and prioritize new features or product enhancements?
- Can you describe how the Listen Notes platform is architected?
- How has it changed or evolved since you first began working on it?
- How did you approach the technology selection for the initial version of Listen Notes?
- If you were to start over today, what might you do differently?
- What are the technical challenges that are posed by the ecosystem around podcasts?
- What are the biggest changes that have happened in the methods of production and consumption for podcasts since you first became involved in the space?
- How do you approach the design and contracts of the Listen Notes web API given how core that is to your platform?
- What are the most complex or complicated engineering projects that you have done for Listen Notes?
- What are the pieces of the infrastructure for podcasts that you would like to see improved, changed, or replaced?
- What are some of the kinds of projects that developers can build with the Listen Notes API?
- What, if any, impact have the introduction of podcasts to closed platforms such as Spotify, Amazon Music, etc. had on your business?
- What are some of the most surprising things that you have learned about podcasts and their consumption while building Listen Notes?
- What are the most interesting, innovative, or unexpected ways that you have seen Listen Notes used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Listen Notes?
- What do you have planned for the future of Listen Notes?
Keep In Touch
Picks
- Tobias
- Wenbin
- Superhuman email client
Links
- Listen Notes
- Graphviz
- NextDoor
- PostgreSQL
- Elasticsearch
- Redis
- RabbitMQ
- Celery
- ReactJS
- Django
- Bootstrap CSS
- DigitalOcean
- Tailwind CSS
- Entity Resolution
- ClickHouse
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling, powered by the battle-tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, and worldwide data centers.
Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Wenbin Fang about the technology powering the Listen Notes podcast discovery platform. So, Wenbin, can you start by introducing yourself?
[00:01:08] Unknown:
Sure. First, thanks for having me on. Hello, everyone. I'm Wenbin Fang, founder of Listen Notes, a podcast search engine and database. I grew up in China and came to the United States in 2010 for graduate study in computer science. Now I live in San Francisco, running Listen Notes. Listen Notes is not a conventional company. It has only one full-time employee, just me. I have a few freelancers helping me on and off with content, design, and other stuff.
[00:01:38] Unknown:
Yeah. And do you remember how you first got introduced to Python?
[00:01:44] Unknown:
It was in the summer of 2012, when I was in graduate school. I was working on a research project, and I needed to parse a bunch of log files and do data visualization using Graphviz. I wanted to use a scripting language, but I don't remember why I picked Python. Maybe I had the impression that Python is good at processing strings, textual data, something like that. Then in 2013, I joined a startup called Nextdoor, a local social network. That's when I learned Django and seriously used Python to write production code.
[00:02:25] Unknown:
And so can you describe a bit more about what the Listen Notes project is and some of the story behind how it got started, why you decided that this is what you wanted to spend your time doing, and how you turned it into a full-time business?
[00:02:38] Unknown:
So Listen Notes is a podcast search engine and database. Specifically, it's one database, three UIs: one podcast database, three user interfaces. What do I mean? The one podcast database is easy to understand, but we need UIs to access the database, right? The first UI is listennotes.com. It's a website. You type in keywords, and then you find podcast episodes and podcasts. The second UI is the podcast API. API stands for application programming interface, and it's also a UI. Developers can use the API to build apps and services that access the database and search engine.
Now, for the third UI, there's no good name for it, so I call it bring your own UI. People can come to our website, export podcast metadata into CSV files, and then open them in Excel, Google Sheets, or whatever UI they like. You can write a Python script to parse the CSV: bring your own UI. As for the story, like many software engineers, I've been working on some side projects on and off. Listen Notes was one of my side projects. It was started in January 2017, so four years ago, close to five years ago. Initially, I worked on this project part time for one week and then launched it. It was basically a single page. Then I started to work on it full time in September 2017, nine months after I started it as a side project.
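As an illustration of the "bring your own UI" workflow, here is a minimal Python sketch that parses an exported CSV with the standard library's `csv` module. The column names `title` and `total_episodes` are hypothetical, not the documented export format; check the header row of a real export and adjust them.

```python
import csv
import io


def podcasts_with_min_episodes(csv_text: str, min_episodes: int) -> list[str]:
    """Return titles of podcasts that have at least `min_episodes` episodes.

    Assumes hypothetical columns named "title" and "total_episodes".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    titles = []
    for row in reader:
        # CSV values are always strings, so convert before comparing.
        if int(row["total_episodes"]) >= min_episodes:
            titles.append(row["title"])
    return titles


sample = """title,total_episodes
Podcast A,120
Podcast B,8
"""
print(podcasts_with_min_episodes(sample, 50))  # ['Podcast A']
```

For a real export you would read the file with `open(path, newline="")` instead of `io.StringIO`.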
Why did I start a side project in the first place? Personally, I really wanted this project to exist. I listen to a lot of podcasts in my spare time and while doing boring engineering work, like writing unit tests. Back in 2016 and 2017, almost all the podcast apps required you to subscribe to a podcast first and then listen to individual episodes. This is pretty bad. There are more and more podcasts, and I cannot subscribe to too many of them. Actually, I'm an inbox zero person. What I really wanted was to be able to find individual episodes to listen to without subscribing to podcasts.
Right now, we have Listen Notes, and today I use Listen Notes a lot. I use it to find individual episodes and add them to my Listen Later playlist on listennotes.com, which provides an RSS feed. I subscribe to this RSS feed in Overcast. I don't subscribe to any podcast other than this RSS feed. Since the end of 2017, I've listened to more than 5,000 podcast episodes in this single playlist.
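The Listen Later playlist works because any hand-picked list of episodes can be published as its own RSS feed that a player like Overcast subscribes to. Below is a minimal sketch of generating such a feed with Python's `xml.etree.ElementTree`. It is not Listen Notes' actual implementation; the function name and the episode dict keys are hypothetical, and the enclosure `length` is a `0` placeholder because the file size isn't known here.

```python
import xml.etree.ElementTree as ET


def build_playlist_feed(title: str, link: str, episodes: list[dict]) -> str:
    """Return a minimal RSS 2.0 document for a hand-picked list of episodes.

    Each episode is a dict with hypothetical keys "title" and "audio_url".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Playlist: {title}"
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        # The enclosure is the audio file a podcast player downloads.
        # length should be the size in bytes; "0" is a placeholder here.
        ET.SubElement(
            item, "enclosure", url=episode["audio_url"], length="0", type="audio/mpeg"
        )
        ET.SubElement(item, "guid").text = episode["audio_url"]
    return ET.tostring(rss, encoding="unicode")


feed = build_playlist_feed(
    "Listen Later",
    "https://example.com/playlist",
    [{"title": "Episode 1", "audio_url": "https://example.com/ep1.mp3"}],
)
print(feed)
```

Because the output is an ordinary feed, the player needs no special support for playlists; it simply sees a "show" whose episodes come from many different podcasts.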
[00:05:31] Unknown:
Yeah. There are definitely a lot of interesting usage patterns around podcasts, and I know that there are still a large number of people who have yet to actually dig into that ecosystem and start exploring the content that's available, just because there are a number of user experience quirks in the whole ecosystem around how you find podcasts and how you consume them. As you were mentioning, you don't necessarily want to subscribe to the whole show. You just care about one episode. And one of the things that I'm interested in digging into a little bit is your point about being able to generate your own custom RSS feed of all the episodes that you care about that you can then subscribe to. I know that there are feed aggregator services that are generally used by podcast producers so that if they have multiple different shows, they can have one unified RSS feed across all of them. But this is an interesting, different take on it, where people can curate their own RSS feed of individual episodes. I'm wondering if you can dig into some of the user experience aspects of how that improves the dynamics around consuming podcasts, and why people might want to use this capability in Listen Notes versus just a feed aggregator like FeedBurner or something like that?
[00:06:48] Unknown:
I'm not familiar with FeedBurner. I can only speak for the Listen Later playlist on Listen Notes. I can tell you some use cases for how people use Listen Notes and the playlist feature. One use case is in schools. Teachers curate playlists by topic and give the playlists to students, using our Listen Later feature. We know that in schools, teachers curate reading lists of books and articles, and with the playlist feature, they can aggregate episodes by topic. Also, some email newsletter writers use Listen Notes to find and curate podcast episodes, because nowadays many email newsletters are content curation newsletters. They curate articles and podcast episodes.
Also, some marketing people need to do content research across different types of media. If they want to write a blog post, they need to do research. We provide a tool for them to search podcast episodes by topic and curate them, and this is very good for content research. It could also be useful for podcasters, like you. Before interviewing a person, you might want to find past podcast interviews with that person and binge-listen a little bit to avoid asking the same questions.
[00:08:14] Unknown:
Yeah. I listen to a number of podcasts myself as well, so it's always interesting when I have somebody scheduled for an interview on one of my shows and then happen to come across an episode that they did on a different podcast. Like you said, it gives you some ideas about things that you might want to dig into deeper that were maybe touched on at a surface level in the other podcast. And particularly given that the ubiquitous interface to the Internet as a whole has become search engines, it makes sense that a search engine is the natural interface people are going to gravitate towards as they first start to explore the space around podcasts. I'm wondering if you can talk to some of the discovery capabilities that are present in the different podcast applications. I'm most familiar with Pocket Casts because that's the one that I use, but I know that pretty much every system has some level of discovery capability. How does the way those apps surface shows or episodes compare with how people experience it when they come to Listen Notes?
[00:09:13] Unknown:
When I started Listen Notes, most podcast apps provided some kind of search ability, but they mostly only allowed you to search podcasts, not episodes. What I really wanted was to search individual episodes. Today, I see many podcast apps already provide episode search. The question is whether the search quality is good. Anyone can index a bunch of web pages to provide a search engine, but the key is a good ranking algorithm that surfaces the most relevant content. This is what we are working on right now: continuing to improve search relevance on our search engine. There are some other discovery features on our website that I don't see in other podcast apps. For example, there's a feature on our website called Listen Real Time, which is like Google Analytics real time. You can see which podcast episodes are being listened to right now on our website.
This is one way for people to discover niche podcast episodes that they didn't even know existed. Also, a bunch of apps provide curated lists by human editors, and we have such lists on our website as well.
[00:10:37] Unknown:
And as far as being able to actually build a search engine around podcasts, what are the different sources of information that are available for extracting some of the useful information that you might be looking for as a consumer of podcasts, such as who's on the episode, who created the show, and what the topics are? That includes digging into the actual content of the episode, given that it's generally going to be audio and transcripts are only sporadically available, and just some of the technical data collection aspects of building up the search database?
[00:11:14] Unknown:
Primarily, we index metadata from RSS feeds, like title, description, things like that. If an episode has a transcript, we also index the transcript. For the ranking algorithm, we aggregate multiple signals to determine which episodes or podcasts to surface at the top of the results. As for the signals we use, I can only disclose a little bit; I don't want to disclose everything. For example, if a podcast is mentioned in a New York Times article about the top 10 true crime podcasts, then probably it's a good podcast, right? And if an episode is mentioned in a good email newsletter, then probably it's a good episode.
Something like that. We also look at first-party data. By first-party data, I mean activities from our users on our website. If an episode is added to a bunch of playlists, then probably it's very popular and good.
[00:12:21] Unknown:
As far as the product journey of Listen Notes, what were the initial capabilities that you focused on at launch, saying these are the most important things that I care about and that I think my users are going to care about? And how did the additional capabilities and features follow on as a natural progression from that initial feature set?
[00:12:40] Unknown:
Yes. At the very beginning, in January 2017, as I said, it was a single-page web app: a search bar where you type in a keyword and see some search results on a single page. After I started working on it full time, I added the playlist feature, so you can add individual episodes to playlists. Then at the end of 2017, some people reached out to me asking if we provide an API, because they wanted to add a search feature to their own podcast apps, so we provided an API. The initial version of the API was also simple. There were only three endpoints: search, fetch podcast metadata, and fetch episode metadata.
Later, we kept adding more podcast discovery features, like the real-time feature I mentioned earlier. There are a bunch of discovery features; I won't enumerate them one by one. Pretty much the journey over the past three or four years has been iterating on existing features, mainly incremental improvements. Right now, if you look at our podcast API documentation, you can see a bunch of endpoints. Pretty much these endpoints mirror the existing product features on the website.
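For readers who want to try the three original endpoints, here is a sketch that prepares (but does not send) requests using only the standard library. The base URL `https://listen-api.listennotes.com/api/v2`, the `X-ListenAPI-Key` header, the `type` search parameter, and the `/search`, `/podcasts/{id}`, and `/episodes/{id}` paths reflect our reading of the public v2 documentation and are assumptions to verify there; the key and IDs are placeholders.

```python
import urllib.parse
import urllib.request

# Assumed from the public Listen Notes API v2 docs; verify before use.
BASE_URL = "https://listen-api.listennotes.com/api/v2"


def build_request(path: str, api_key: str, **params: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"X-ListenAPI-Key": api_key})


# One request per original endpoint: search, podcast metadata, episode metadata.
search = build_request("/search", "YOUR_API_KEY", q="django", type="episode")
podcast = build_request("/podcasts/SOME_PODCAST_ID", "YOUR_API_KEY")
episode = build_request("/episodes/SOME_EPISODE_ID", "YOUR_API_KEY")
print(search.full_url)
```

Passing one of these objects to `urllib.request.urlopen()` would actually perform the call and return the JSON response body.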
[00:14:10] Unknown:
Digging into the technical implementation and the architecture of the Listen Notes platform, I'm wondering if you can talk through how you designed the system and some of the technical complexities that have built up as you started to add more capabilities and as you started to scale out to be able to service your growing customer base?
[00:14:30] Unknown:
I can describe what the architecture looks like right now and then compare it with the initial architecture. As of today, I'll primarily describe the Listen Notes website. We have two experimental iOS apps, but they're irrelevant to the core product. For the Listen Notes website, the front end is React JS plus Tailwind CSS. The JS and CSS files are bundled through webpack. We upload the bundles to S3 and serve them through CloudFront. On the back end, it's primarily Django and Python for web servers and API servers. We use Postgres as the main database. We are running three database instances: db1, db2, and db3. db1 is the master DB, and the other two are replica DBs.
We use Elasticsearch for the search engine, with three Elasticsearch instances, and we use Redis for caching, statistics, and similar stuff. We use RabbitMQ as the message queue and Celery for async task workers. We run everything, the whole infrastructure, on AWS. As we are speaking today, December 7, 2021, AWS has a huge outage right now. At this moment, the us-east-1 region is down. Okay. So this is the rough architecture for Listen Notes today. When I started in January 2017, the software I used was roughly the same: React, Django, Python, Postgres, Elasticsearch. These are all the same, except that at the beginning I was using Bootstrap CSS, and I ran the whole infrastructure on three small instances on DigitalOcean.
Each instance was about $10 per month, something like that. Later on, we migrated from DigitalOcean to AWS. In terms of scaling up, I think this architecture is very common, and it's very easy to scale. Right now, we have several API servers and several web servers, and we can scale them horizontally with ease.
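Wenbin doesn't say how Django decides which Postgres instance a query goes to. One common approach is a database router like the sketch below, modeled on the primary/replica example in Django's multiple-databases documentation. The aliases `db1`, `db2`, and `db3` mirror the names mentioned above but are assumptions about the `DATABASES` setting. The router methods are plain Python, so the class itself doesn't import Django.

```python
import random


class PrimaryReplicaRouter:
    """Send writes to the primary and spread reads across replicas.

    Hypothetical aliases: "db1" is the primary (master), "db2"/"db3" are replicas.
    Enable with DATABASE_ROUTERS = ["path.to.PrimaryReplicaRouter"] in settings.
    """

    primary = "db1"
    replicas = ("db2", "db3")

    def db_for_read(self, model, **hints):
        # Replicas may lag slightly behind the primary, so reads can be stale.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # All three databases hold the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas receive schema changes via replication.
        return db == self.primary
```

Adding read capacity then mostly means adding another replica alias to `replicas`, which fits the horizontal scaling described above.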
[00:16:48] Unknown:
It seems that probably the most challenging feature to support in a scalable fashion is the real-time view of what people are listening to. What are some of the architectural capabilities that you've built in to support that near-real-time feed of information as people interact with the website, and to propagate it back out?
[00:17:15] Unknown:
We write all these states, these logs, to Redis, and Redis is powerful enough to support the scale we have right now. So there's no worry about it.
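The transcript only says that listen events are written to Redis. One plausible shape for a "what's playing now" feed is a sorted set scored by timestamp. The sketch below imitates that with a plain dict so it runs without a Redis server. The class name, the five-minute window, and the sorted-set design are assumptions; the comments note the roughly corresponding Redis commands.

```python
import time
from typing import Optional


class RecentListens:
    """In-memory stand-in for a Redis sorted set of recently played episodes.

    Each episode ID is scored by the timestamp of its latest play, similar to
    ZADD recent_listens <timestamp> <episode_id>.
    """

    def __init__(self, window_seconds: float = 300):
        self.window_seconds = window_seconds
        self.last_played: dict[str, float] = {}

    def record(self, episode_id: str, ts: Optional[float] = None) -> None:
        # Re-adding an existing member just updates its score, as ZADD does.
        self.last_played[episode_id] = time.time() if ts is None else ts

    def latest(self, n: int, now: Optional[float] = None) -> list[str]:
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Drop plays older than the window, similar to ZREMRANGEBYSCORE.
        self.last_played = {
            ep: ts for ep, ts in self.last_played.items() if ts >= cutoff
        }
        # Newest first, similar to ZREVRANGE recent_listens 0 n-1.
        return sorted(self.last_played, key=self.last_played.get, reverse=True)[:n]


recent = RecentListens(window_seconds=300)
recent.record("ep1", ts=100)
recent.record("ep2", ts=350)
recent.record("ep3", ts=390)
print(recent.latest(5, now=401))  # ['ep3', 'ep2']
```

Keeping only a bounded window is what keeps such a structure cheap: memory grows with recent activity, not with total listening history.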
[00:17:23] Unknown:
As far as the technical choices, I know you mentioned that you learned Python and Django while you were working at Nextdoor. As engineers, we often gravitate toward what we know. But I'm wondering, as you were building out the initial project and as you have gone through these different iterations of scaling and adding new capabilities, what are some of the decision points you've gone through about what other technologies to add in, or whether to rearchitect and bring in different technology choices? And what are some of the pain points you've run into with the stack that you chose, or some of the benefits you've realized because of how you've designed things?
[00:18:01] Unknown:
Basically, I used all the technologies I already knew at the beginning. I didn't want to spend time learning new things. Those technologies were used at my former employer, Nextdoor. I know they're battle tested. I know this tech stack can support a multibillion-dollar company, so it works fine for Listen Notes. The whole tech stack is pretty similar to Nextdoor's. If you want something new compared to Nextdoor, I use Tailwind CSS, but that's just part of the front end. I don't know if Nextdoor is using Tailwind, because it's quite new. The back-end infrastructure is really the same, so there's no innovation here, to be frank.
[00:18:50] Unknown:
Yeah. Well, there's definitely a lot to be said for choosing boring, battle-tested technologies and not innovating on the pieces of the business that don't actually matter to end users. In terms of the podcast ecosystem, I know that there has been a lot of evolution in the specifications for RSS formats, and there have been new entrants offering podcasts on their platforms, most notably Spotify and Amazon Music, but also companies like Pandora and a number of others that have added capabilities. There are also a bunch of different discovery platforms that make it easier for end users to find podcasts they care about, or platforms for podcast hosts to network with their listeners. I'm wondering what you have seen as some of the most notable changes in the ecosystem surrounding podcasts, particularly in terms of the feed formats and data sources that have become available for you to incorporate as additional signals for people who are trying to find the information that they care about?
[00:19:53] Unknown:
I haven't seen many changes in the RSS format. There are some specification changes, but it's really hard to get all the major platforms to adopt the same standard, especially Apple Podcasts. How do you talk to Apple Podcasts? How do you find a human being at Apple Podcasts and ask them to support a specific new specification, right? If Apple Podcasts doesn't adopt the new specification, then I think it's very challenging.
[00:20:27] Unknown:
Yeah. And as far as the types of information, you mentioned that you pull in the RSS feed metadata, looking at hosts and topics and things like that. I'm wondering if you bring in any more analytical processes to do things like entity resolution, or building out a knowledge graph to make it easier for people to say, okay, I found this person in this one podcast, now list all the other places that they've shown up.
[00:20:58] Unknown:
In the back end, we run NLP entity recognition to extract people's names, locations, events, things like that. If you go to our web page for an individual podcast episode, you'll find this kind of information. When you click a person's name, it brings up a list of search results for that specific person or specific location, yeah, something like that.
[00:21:16] Unknown:
And then as far as the API aspect of it, you mentioned that when you launched, there were three endpoints available, and it was very simple and straightforward. I'm wondering what you have settled on as far as design philosophies, or your overall approach to structuring that API so that it is useful and composable for end users to build different applications on top of?
[00:21:41] Unknown:
It's actually been a very natural evolution. As I said, the API was initially launched at the end of 2017 with three endpoints, and then it evolved naturally. First, I mapped major website features to API endpoints. I also talk to API users frequently, and their feedback is very important in deciding what the API should look like: what kind of response data and what data fields they need to build an app. One principle I stick to is to always be backward compatible. Don't break the existing API endpoints.
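One way to enforce "always be backward compatible" is an automated check that a new response only adds fields. The sketch below compares two example JSON responses (as dicts) and reports removed fields and changed types. It's a hypothetical helper, not something Listen Notes has described, and the field names are illustrative.

```python
def breaking_changes(old: dict, new: dict, path: str = "") -> list[str]:
    """List changes in `new` that would break clients written against `old`.

    Adding fields is allowed; removing a field or changing its type is not.
    Nested dicts are checked recursively; lists are only type-checked.
    """
    problems = []
    for key, old_value in old.items():
        where = f"{path}.{key}" if path else key
        if key not in new:
            problems.append(f"removed: {where}")
        elif type(new[key]) is not type(old_value):
            problems.append(f"type changed: {where}")
        elif isinstance(old_value, dict):
            problems.extend(breaking_changes(old_value, new[key], where))
    return problems


old_response = {"id": "abc", "title": "Episode", "podcast": {"id": "p1"}}
new_response = {
    "id": "abc",
    "title": "Episode",
    "podcast": {"id": "p1"},
    "audio_length_sec": 600,  # a new field: allowed
}
print(breaking_changes(old_response, new_response))  # []
```

Run against saved example responses in CI, a check like this turns the "don't break existing endpoints" rule into a failing test rather than a support ticket.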
[00:22:20] Unknown:
In terms of the technical challenges that you faced, what have been some of the most complex or difficult aspects of building the business and being able to scale to where you are now and being able to sort of collect useful signals and feedback from end users to improve and iterate on the project?
[00:22:39] Unknown:
To be honest, all this technical stuff consists of solved problems; they are not that difficult. Some of them are quite time consuming, though. I'll give you an example: upgrading Postgres from 9.17 to version 11 without significant downtime. It is easy to say, but it's not easy to do. We want to always keep the infrastructure software at the latest version, or if not the latest version, at least close to it, because of bug fixes, security fixes, and new features, right? But upgrading across major versions is always a drama at tech companies, especially for databases.
Last time, when I upgraded Postgres from 9.17 to 11, I managed to achieve between 30 and 60 seconds of downtime for write access and zero downtime for read access. It involved a lot of prep work and a bunch of rehearsals in a staging environment, to emulate all kinds of failure scenarios and practice how to recover the service and how to roll back, things like that. This is challenging. We need to do similar things for Elasticsearch and other pieces of our infrastructure software. Right now we are providing an API, so we need to make sure the API doesn't go down and doesn't have downtime.
So, yeah, it's challenging to manage this kind of infrastructure.
[00:24:13] Unknown:
Going back to the entry of new businesses and interests into the podcast ecosystem, Spotify probably got the most press because it was already such a big player in the music ecosystem. I'm wondering what you've seen as far as shifts in the ways people discover podcasts or get introduced to the idea of podcasts as these bigger companies start to play in the ecosystem, and how that has influenced the way that you think about the value that you're providing to the space.
[00:24:43] Unknown:
As for Spotify's entrance into podcasts, I don't think it affects Listen Notes too much, because it's a walled garden that doesn't participate in the open ecosystem. Arguably, people can say what Spotify provides is not podcasts, because it's not based on RSS, at least under the narrow definition of podcasts as RSS based, right? But a good thing is that Spotify expands the audience for audio consumption, for spoken audio, right? The podcast listeners on Spotify might not be traditional or existing podcast listeners. Existing podcast listeners pretty much use the same open, RSS-based podcast players.
And Spotify may be targeting their existing users, music listeners, and helping them discover that, oh, there's spoken audio there.
[00:25:41] Unknown:
As a podcaster myself, one of the challenges of the ecosystem is understanding who your audience is, because the only way to really say this is how many listeners I have is if you own the whole experience. I'm wondering what you've seen as some of the useful pieces of information that you're able to generate from Listen Notes, whether for your own purposes of seeing what people's listening habits are, or to provide back to podcast producers, given that you have a mechanism for consuming those podcasts?
[00:26:30] Unknown:
I would say I haven't done much to provide such stats back to podcast producers, but in the future, we will certainly do more. Right now, we have the search history for how people discover certain podcasts, and we have the listen history and listen stats for individual episodes on our own platform. Basically, each podcast player has its own listen stats, but it's fragmented. We developed a new metric called Listen Score, which estimates the popularity of a podcast and then provides a global ranking. This is to give people a rough sense of how popular a specific podcast is. It's not 100% accurate, but you get a basic sense.
[00:27:07] Unknown:
Yeah. As I was preparing for this interview, I was looking at my own podcasts on there and was happy to see that they were ranking in the top 1%. I don't know if you can dig a bit more into some of the signals that you're using to determine that popularity metric and what it actually means to be within a given percentile.
[00:27:29] Unknown:
Yeah. Listen Score is also used in our search ranking algorithm. Just now I mentioned a few signals we use to determine search ranking. We use the same signals to calculate Listen Score. Basically, we use first-party data and third-party data. First-party data is the user activity in our own ecosystem, and third-party data is what's on the open web, like the New York Times saying this is one of the top 10 true crime podcasts, something like that, plus some social media activity. We combine all these signals together, and there's a formula.
We assign different weights to each signal and come up with the score. We still need to keep tweaking the score. The global ranking is based on these scores.
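Listen Notes doesn't publish the Listen Score formula, so the following is purely a hypothetical sketch of the two ideas described here: a weighted combination of first-party and third-party signals, and a global ranking expressed as a top-percent figure. The signal names, the weights, and the assumption that each signal is normalized to the range 0 to 1 are all invented for illustration.

```python
def combined_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of signals, each assumed to be pre-normalized to [0, 1].

    A signal missing from `signals` contributes 0.
    """
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def top_percent(score: float, all_scores: list[float]) -> float:
    """Percentage of all podcasts scoring at least `score` (1.0 means top 1%)."""
    at_least = sum(1 for s in all_scores if s >= score)
    return 100.0 * at_least / len(all_scores)


# Hypothetical weights: first-party activity counts most, then press, then social.
WEIGHTS = {"first_party": 0.5, "press_mentions": 0.3, "social": 0.2}
score = combined_score(
    {"first_party": 0.8, "press_mentions": 1.0, "social": 0.5}, WEIGHTS
)
print(round(score, 2))  # 0.8
```

Tweaking the score, as Wenbin describes, would amount to adjusting `WEIGHTS` (or adding signals) and recomputing every podcast's rank.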
[00:28:20] Unknown:
Yeah. It definitely seems like there are a lot of interesting opportunities for adding data processing workflows to bring in some of that additional information. For example, every time you find a podcast that's registered or indexed on your platform, you could set up a Google Trends alert to say, anytime this podcast is mentioned, I'll get a notification that serves as another signal to bump it up in the recommendation algorithm, things like that, or you could do additional entity extraction if this host appears on another podcast. As far as managing all those workflows, are you just running them as Celery tasks within the web application, or do you also have dedicated data infrastructure to pull in these signals, process them, and load them into the database so you have a more pre-aggregated view?
[00:29:13] Unknown:
We use ClickHouse. I forgot to mention ClickHouse in the infrastructure part. We use ClickHouse to store some aggregated data and logs. It's a column-based data warehouse. And yes, we use Celery async tasks to do the heavy lifting of data crunching, and Redis is also used in the pipeline. It's multiple pieces.
[00:29:34] Unknown:
As with everything in technology, there are so many different ways that you can approach it, so it's always interesting to see the pieces that people put together to solve their different problems. Given that you have a few different interfaces on top of this overall trove of data, I'm wondering if you can talk to some of the main use cases that people have built out for Listen Notes, whether it's end users who use the web UI to discover and curate a podcast feed, people who build applications on top of the API, or people doing their own data analysis from the CSV exports, and some of the workflows that people are building around Listen Notes as a platform service.
[00:30:15] Unknown:
Yeah. I'll talk about the use cases for the API first. Basically, if you want to access a podcast database or you want to search podcasts, you have two choices: build your own search engine and your own database, or use some kind of API. Here are some examples. People want to build their own podcast apps. I didn't know that there are so many podcast apps out there; if you search for podcast app in the App Store, you can see a bunch of niche podcast apps. There are also many podcast clipping apps, where people create short clips from podcast episodes and share them to social media. They use our API to search for and find a specific episode.
Basically, it's a better way to onboard users. Also, a bunch of audio apps that want to get into podcasts use our API, like music apps and audiobook apps. Spotify is not the only music app in the world; there are tens of thousands of small music apps. And Audible is not the only audio app; there are tons of audio apps. Podcasts are a very good adjacent market for them to expand into. There are also some social apps that allow people to share and discuss things like movies, games, restaurants, and podcasts.
We provide a search feature so their users can easily find and discuss podcasts. Also, some content curation sites or content discovery sites use our API to help people discover podcasts. One specific example is a website that provides information about specific stocks. They list all the podcast interviews with the CEO or CFO of a public company, so people can listen to those interviews and decide whether or not to buy the stock. This is not investment advice, but somehow you get a signal. These are the rough use cases of our API.
In terms of the CSV buyers, some financial institutions want to find as much data as possible as alternative data for stock picking, like satellite data or podcast data. So they search for a person's name or a topic, something like that. I don't really know how they use our data. They just export it and use it. Also, some PR companies want to pitch to podcasts for guest opportunities, so they need a bunch of podcast information to research which podcasts to pitch to. Yeah. I definitely get lots of those emails.
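As a small illustration of the "search a person's name, export, analyze" workflow described above, here is a sketch that filters exported CSV rows for a name. It assumes the export has `title` and `description` columns; those column names and the helper `episodes_mentioning` are hypothetical and may not match the real export format.

```python
import csv
import io


def episodes_mentioning(
    csv_text: str, name: str, fields: tuple[str, ...] = ("title", "description")
) -> list[dict[str, str]]:
    """Return rows whose chosen text columns mention `name` (case-insensitive).

    Column names are assumptions about the export; adjust them to the real file.
    """
    needle = name.casefold()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        # Missing or empty cells are treated as empty strings.
        if any(needle in (row.get(field) or "").casefold() for field in fields)
    ]
```

Matching on `casefold()` rather than exact case keeps the filter tolerant of how names are capitalized in show notes.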
[00:33:11] Unknown:
And so for people who are using Python and want to consume the Listen Notes API, it seems that you have an SDK available. I'm wondering if you can talk to some of your motivation for building out a dedicated client library versus just telling people to use requests and have at it?
[00:33:31] Unknown:
So it's interesting. Oh, I forgot to mention one important use case of our API. Our API is used in many coding boot camps. I don't think you can expect new programmers to use a REST API right away. They might need language-specific SDKs to make function calls. So we provide a bunch of SDKs in different languages. We have Python, Ruby, Node.js, Swift, Java, Go, basically all the major languages. Yeah.
[00:34:04] Unknown:
What are some of the types of convenience features that you're building into the SDK to make it easier to work with the Listen Notes API, and what are some of the common workflows that people are trying to build around it?
[00:34:18] Unknown:
Actually, nothing exciting right now. It's a thin wrapper.
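To show what a "thin" SDK tends to look like, here is a hypothetical standard-library client; it is not the official Listen Notes SDK, and the class and method names are invented. The base URL `https://listen-api.listennotes.com/api/v2`, the `/search` endpoint with its `q` and `type` parameters, and the `X-ListenAPI-Key` header are assumed from the public API v2 documentation and should be checked against the current docs.

```python
import json
import urllib.parse
import urllib.request


class ThinPodcastClient:
    """Hypothetical thin API wrapper: one method per endpoint, no extra logic."""

    # Assumed from the public Listen Notes API v2 docs; verify before use.
    BASE_URL = "https://listen-api.listennotes.com/api/v2"

    def __init__(self, api_key: str, timeout: float = 10.0) -> None:
        self.api_key = api_key
        self.timeout = timeout

    def build_search_request(
        self, q: str, result_type: str = "episode", **params: str
    ) -> urllib.request.Request:
        """Build (but do not send) a GET request for the search endpoint."""
        query = urllib.parse.urlencode({"q": q, "type": result_type, **params})
        return urllib.request.Request(
            f"{self.BASE_URL}/search?{query}",
            headers={"X-ListenAPI-Key": self.api_key},
        )

    def search(self, q: str, result_type: str = "episode", **params: str) -> dict:
        """Send the search request and decode the JSON response body."""
        request = self.build_search_request(q, result_type, **params)
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.load(response)
```

Usage would be along the lines of `ThinPodcastClient("YOUR_API_KEY").search("python packaging")`. Keeping request building separate from sending makes a wrapper like this easy to test without network access, and it is roughly all a beginner needs to make one function call instead of hand-assembling HTTP requests.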
[00:34:21] Unknown:
And I guess as far as people who are using Listen Notes, either as end users or people who are building applications on top of it, what are some of the most interesting or innovative or unexpected ways that you've seen the API used?
[00:34:34] Unknown:
A lot of people search for promo codes to find deals using the Listen Notes website and API. Actually, there's a company that uses our API to find promo codes because they provide coupons to people. You know, one major monetization strategy for podcasts is to provide promo codes.
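A coupon-finding service like the one mentioned would need to pull candidate codes out of episode descriptions. Here is a toy regex heuristic for that step; the keyword list, the "all caps or digits, 3 to 20 characters" rule, and the helper name `extract_promo_codes` are assumptions for illustration, not how that company or Listen Notes does it.

```python
import re

# Hypothetical heuristic: a keyword ("promo code", "coupon code", or "code"),
# matched case-insensitively, followed by a token of 3-20 uppercase letters/digits.
PROMO_CODE_RE = re.compile(
    r"(?i:\b(?:promo\s+code|coupon\s+code|code))\s*:?\s*([A-Z0-9]{3,20})\b"
)


def extract_promo_codes(description: str) -> list[str]:
    """Return unique candidate promo codes in order of first appearance."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(m.group(1) for m in PROMO_CODE_RE.finditer(description)))
```

Requiring the code itself to be uppercase (only the keyword is case-insensitive) avoids picking up ordinary words in phrases like "code for details".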
[00:34:57] Unknown:
As the podcast ecosystem continues to evolve and mature and more companies and individuals become involved in it, what are some of the missing pieces that you think can and should be filled in to make it more viable as a broad ecosystem so that more people can interact and collaborate and grow the entire community?
[00:35:17] Unknown:
Personally, I really want a unique ID for individual episodes that the major podcast platforms all agree on. Nowadays, if you want to uniquely identify a podcast, you may use the RSS feed or the Apple Podcasts ID. But how about an episode? There's no unique ID for an episode across platforms. So if you want to share an episode to Twitter, what URL do you use? You can always just use the Apple Podcasts URL, but Android users will be pissed off. If you share a Spotify link, well, not everyone is a Spotify user. So what URL do you use? Somehow you need a unique ID so you can deep link and then open whatever app you have on the system. There's a GUID tag in RSS, but it's not reliable.
The GUID is not permanent. It changes a lot, so there's no unique ID.
[00:36:23] Unknown:
Yeah. I've had to regenerate the GUID on a couple of episodes because of publishing errors, so I'm definitely guilty of being somebody who changes it.
[00:36:33] Unknown:
We don't use the GUID in Listen Notes. We don't trust it. With good reason.
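Listen Notes doesn't say here how it identifies episodes without the GUID, so the following is only one hypothetical approach: derive a stable key from fields that tend to survive republishing, namely the feed URL (with scheme and host lowercased and the trailing slash dropped), the normalized title, and the publication date. The function name and the choice of fields are assumptions.

```python
import hashlib
import unicodedata
from datetime import date
from urllib.parse import urlsplit, urlunsplit


def episode_fingerprint(feed_url: str, title: str, published: date) -> str:
    """Hypothetical stable episode key that ignores the RSS <guid>."""
    # Lowercase only scheme and host; URL paths can be case-sensitive.
    parts = urlsplit(feed_url.strip())
    norm_feed = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )
    # Unicode-normalize, casefold, and collapse whitespace in the title.
    norm_title = " ".join(unicodedata.normalize("NFKC", title).casefold().split())
    raw = f"{norm_feed}|{norm_title}|{published.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

The trade-off is that a retitled episode gets a new key, so a real system would likely combine several signals (audio URL, duration, enclosure size) rather than trusting any single field.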
[00:36:38] Unknown:
And then as far as having that kind of universal linking, I know there are a few different platforms and services offering it as a feature: you publish your RSS feed through them, and they give you a single endpoint that you can share with people so that it automatically opens the appropriate podcast player on their platform of choice.
[00:36:59] Unknown:
That's only for the entire podcast, because they uniquely identify a podcast by its RSS feed or iTunes ID. It's not per episode.
[00:37:08] Unknown:
Fair point. And as you have spent a lot of time building the Listen Notes project and digging into the podcast ecosystem, what are some of the most interesting or unexpected things that you've learned about podcasts, the ecosystem around it, and the ways that people are consuming and using podcasts in their daily lives?
[00:37:26] Unknown:
I was surprised to learn that, whatever the topic, there's a podcast for that. Basically, you can learn anything just by listening to podcasts. Podcasts are very good informal learning material. If you want to learn a new topic, you can search and find some experts talking about it. It's informal learning because if you want to dig deeper, you still need to read books or attend lectures.
[00:37:55] Unknown:
As a consumer of podcasts yourself, what are some of the best practices that you've seen from some publishers that you think can and should be more broadly adopted by everybody who's producing podcasts, whether it's in terms of the information they include in the RSS feed, the things they put in the show notes, the availability of transcripts, or just anything that makes it easier for end users to understand what they're going to get out of an episode as a listener?
[00:38:29] Unknown:
There's one thing I wish more podcasts used: chapter markers. Then people can know which segment is about what, and at which timestamp. Nowadays many podcast players support deep links to a specific timestamp, so you click on a timestamp and jump straight to that location.
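For publishers who already write timestamps in their show notes, converting them into machine-readable chapters is mostly parsing. The sketch below turns lines like `[00:29:13] Topic` or `05:30 Topic` into a chapters document, assuming the Podcasting 2.0 JSON chapters layout (`version`, plus `chapters` entries with `startTime` in seconds and `title`); the version string and helper names are assumptions. `deep_link` uses the W3C Media Fragments `#t=` syntax, which browsers honor for direct audio URLs; individual podcast apps may use their own link formats.

```python
import json
import re

# Matches "[HH:MM:SS] Title", "HH:MM:SS Title", or "MM:SS Title".
TIMESTAMP_LINE = re.compile(r"^\[?(?:(\d+):)?(\d{1,2}):(\d{2})\]?\s+(.+)$")


def parse_chapters(show_notes: str) -> list[dict]:
    """Extract {"startTime": seconds, "title": str} entries from show notes."""
    chapters = []
    for line in show_notes.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append({"startTime": start, "title": title.strip()})
    return chapters


def to_chapters_json(chapters: list[dict]) -> str:
    """Serialize in an assumed Podcasting 2.0 JSON chapters layout."""
    return json.dumps({"version": "1.2.0", "chapters": chapters}, indent=2)


def deep_link(audio_url: str, start_seconds: int) -> str:
    """Build a Media Fragments temporal link to a position in the audio."""
    return f"{audio_url}#t={start_seconds}"
```

Lines without a leading timestamp are simply skipped, so free-form show notes can be fed in unchanged.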
[00:38:49] Unknown:
Alright. I will take that as feedback on the way I'm producing my podcast. Thank you. I can't promise anything because it is a bit of a labor-intensive process, but I will take note. And in terms of your own experience of building the Listen Notes platform and growing it as a business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:39:12] Unknown:
Two things. I want to say two things. One is that great customer service is an effective marketing strategy. I personally reply to emails very fast, and I try to be helpful. Some people send emails asking for podcast recommendations, and I reply with specific podcasts. I think if you serve customers well, they will help promote your service and your company. So providing great customer service is very important. That's one thing. The second thing is, I was surprised to find that there are a bunch of fake podcasts created for link-building purposes.
Some black-hat SEO people from India and Pakistan tried to submit fake podcasts containing synthetic audio and a bunch of links, to do link building. So on our end, we need to build a bunch of tools to fight such fake podcasts.
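The interview doesn't describe Listen Notes' anti-spam tools, so this is only a toy illustration of one signal such a tool might compute: how link-heavy a podcast or episode description is. The thresholds (`max_links`, `max_ratio`) and function names are arbitrary assumptions; real detection would combine many signals.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_features(description: str) -> dict:
    """Count links, distinct domains, and the share of words that are links."""
    words = description.split()
    links = URL_RE.findall(description)
    domains = {urlsplit(url).netloc.lower() for url in links}
    ratio = len(links) / len(words) if words else 0.0
    return {"links": len(links), "domains": len(domains), "link_ratio": ratio}


def looks_like_link_spam(
    description: str, max_links: int = 10, max_ratio: float = 0.2
) -> bool:
    """Flag descriptions with too many links or too high a link-to-word ratio."""
    features = link_features(description)
    return features["links"] > max_links or features["link_ratio"] > max_ratio
```

A flag like this would more sensibly route a submission to manual review than reject it outright, since legitimate show notes can also be link-heavy.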
[00:40:09] Unknown:
For people who are looking for a way to discover new podcasts or specific episodes, what are some of the cases where Listen Notes might be the wrong choice and they're better served just using the Spotify app they already have or a general Google search?
[00:40:24] Unknown:
We certainly don't have exclusive content from Spotify. It's illegal for us to provide exclusive content on our platform. So if you want to listen to Spotify's exclusives, go to Spotify. Other than that, I think Listen Notes is a good place to discover podcasts. I certainly have a bias.
[00:40:47] Unknown:
Fair enough. And as you continue to manage the platform, build the project, and add new features and capabilities, what are some of the things you have planned for the near to medium term?
[00:40:57] Unknown:
Instead of giving you very concrete product features or infrastructure improvements, I'd rather say something more vague. Basically, we want to continue to help people find quality ear time, as opposed to screen time. Screen time basically means you watch Netflix, you scroll down your Instagram feed. Your time is underutilized. People need more quality ear time: listening to podcasts, listening to good podcasts, listening to good audiobooks, things like that. Right now we focus on podcasts, but maybe in the future we will expand to other verticals.
[00:41:33] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. This week, I'm going to recommend the Wheel of Time series that Amazon recently launched. I will admit that I have not read the books. It definitely seems to be quite an undertaking to read them, but I've started watching the show, and they seem to have done a really good job, at least for somebody who isn't familiar with the books. I thoroughly enjoy it, so I definitely recommend it to people who are looking for something to watch. And with that, I'll pass it to you, Wenbin. What do you have for a pick this week?
[00:42:09] Unknown:
I will recommend Superhuman, the email client. It's not free; you need to pay $30 per month. But as a heavy email user, I think Superhuman saves me a ton of time, so it's worth paying for. Yeah. I reply to a lot of emails. I spend a lot of time on email.
[00:42:26] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Listen Notes, and for all of the time and effort you've put into making podcasts more discoverable and consumable by more people. I definitely appreciate all the time and energy you put in there, and I hope you enjoy the rest of your day. Thank you again for having me on. It's a great pleasure. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com, for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Wenbin Fang's Journey with Python
Overview of Listen Notes
User Experience and Custom RSS Feeds
Podcast Discovery and Search Engine Features
Technical Architecture of Listen Notes
Technology Choices and Scaling Challenges
Impact of Major Players in the Podcast Ecosystem
Use Cases and API Applications
Challenges and Opportunities in the Podcast Ecosystem
Interesting Insights and Best Practices in Podcasting
Lessons Learned from Building Listen Notes
Future Plans for Listen Notes