
This looks very promising! I've thought of doing something similar and started (very slowly) on the project a month ago.

What data sources are you using for the transcripts behind the API?

As an idea for others who are better able to implement it, I'd love a system that would search every podcast I've ever listened to. It's very difficult for me to remember in which podcast I heard a specific story, maxim, or interview when I want to share it with a friend. Bonus points if the transcript has timestamps lined up.
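A minimal sketch of the lookup such a system would need, assuming each episode you've heard already has a transcript split into timestamped segments (getting those transcripts is the expensive part). `Segment` and `TranscriptIndex` are hypothetical names; this is a plain inverted index with exact word matching, not any existing service's API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    episode_id: str   # hypothetical ID of an episode the user listened to
    start_sec: float  # where this segment starts in the audio
    text: str         # transcript text for this segment


def tokenize(text: str) -> list[str]:
    # Lowercase and keep only word-like tokens; no stemming.
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """Inverted index: word -> positions of the segments containing it."""

    def __init__(self) -> None:
        self.segments: list[Segment] = []
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, segment: Segment) -> None:
        pos = len(self.segments)
        self.segments.append(segment)
        for word in tokenize(segment.text):
            self.postings[word].add(pos)

    def search(self, query: str) -> list[tuple[str, float]]:
        words = tokenize(query)
        if not words:
            return []
        # Segments that contain every query word (AND semantics).
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        # Return (episode, timestamp) pairs so a player can seek straight there.
        return sorted((self.segments[i].episode_id, self.segments[i].start_sec) for i in hits)
```

For example, after indexing a segment at 95.5 s of a hypothetical episode `ep-1` containing "ship early and often", `search("ship early")` returns `[("ep-1", 95.5)]`. Because matching is exact, "shipping" would not match "ship"; stemming or fuzzy matching would be a natural next step.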



I have metadata for podcasts and episodes in my own database, including title, description, publisher, ... But I don't have transcripts.

Everyone says they want transcripts, but after some digging, I decided that transcripts are not very useful for now:

For listeners: those who choose to listen hate reading text.

For podcasters: it's expensive to produce transcripts, and the supposed SEO boost is hard to justify. (1) It takes time for SEO to work; (2) conversational content (most podcasts) doesn't read well as text.

For Listen Notes (a podcast search engine): indexing transcripts introduces more noise than signal.

I did some experiments with transcripts, e.g., https://www.listennotes.com/e/51222de65c2c484e8a47608eac1329... But I decided not to continue for now.

The search results from Listen Notes include UUIDs for episodes and podcasts, so the client side (e.g., a podcast player) can keep track of a user's listening history.
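A minimal sketch of client-side history keyed by those UUIDs. The class, the `episode_id` key, and the result dicts are hypothetical, not the actual Listen Notes API response format; the point is only that storing stable IDs lets a player restrict search results to episodes a user has already heard.

```python
from dataclasses import dataclass, field


@dataclass
class ListenHistory:
    # Hypothetical shape: episode UUID -> podcast UUID, stored on the client.
    episodes: dict[str, str] = field(default_factory=dict)

    def record(self, episode_id: str, podcast_id: str) -> None:
        """Remember that the user played this episode."""
        self.episodes[episode_id] = podcast_id

    def has_heard(self, episode_id: str) -> bool:
        return episode_id in self.episodes

    def podcasts(self) -> set[str]:
        """UUIDs of every podcast the user has listened to at least once."""
        return set(self.episodes.values())

    def filter_results(self, results: list[dict]) -> list[dict]:
        """Keep only search results for episodes this user has listened to."""
        return [r for r in results if r["episode_id"] in self.episodes]
```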


If you're interested in transcript search: we do it as part of our video processing engine, and we work with some podcast creators to provide a search API. Message me if you're interested.

We do the first option you bring up in your later comment.


I've also looked into this problem. It would be very valuable to search podcasts if transcription were accurate. Lots of companies and services get mentioned in passing on podcasts, but they never find out about it. I think quite a few companies would pay to be "alerted" about this, much as they do with Twitter searches now. I'd certainly do it for my own name and business.


No idea whether this is feasible, but I have 3 concepts for how it could work:

1) It's hella expensive to make transcripts of podcasts. Allow users to contribute a set amount toward transcribing the podcasts they're interested in (e.g., 50 cents per episode).

2) Standard subscription model. Give tiered access to n podcasts for a set amount per month.

3) Modified subscription model. Target 5 minutes of transcribed audio per user. Split audio files into small overlapping segments. People can either pay a monthly subscription fee equal to the cost of transcribing 5 minutes of audio, or transcribe 5 minutes of content themselves each month.

Any thoughts on which would work best? Crowdsourced transcriptions would need to be stitched together and edited to flow well, but this might be less obtrusive for people who don't want to pay.
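The splitting and stitching steps of option 3 could look roughly like this. Assumptions, all mine: the segment length and overlap are illustrative; `overlapping_segments` only computes time boundaries (actually cutting the audio would need a tool such as ffmpeg); and `stitch` drops the duplicated overlap by exact word matching, whereas two people transcribing the same overlap would rarely agree word for word, so a real system would need fuzzy alignment plus the human editing mentioned above.

```python
def overlapping_segments(
    duration_sec: float, segment_sec: float, overlap_sec: float
) -> list[tuple[float, float]]:
    """(start, end) boundaries of fixed-length windows that overlap by overlap_sec."""
    if segment_sec <= 0 or not 0 <= overlap_sec < segment_sec:
        raise ValueError("need segment_sec > 0 and 0 <= overlap_sec < segment_sec")
    step = segment_sec - overlap_sec
    segments = []
    start = 0.0
    while start < duration_sec:
        end = min(start + segment_sec, duration_sec)
        segments.append((start, end))
        if end >= duration_sec:
            break  # last window reaches the end of the episode
        start += step
    return segments


def stitch(a: list[str], b: list[str], max_overlap: int = 50) -> list[str]:
    """Join two transcribed chunks, dropping the longest suffix of a that equals a prefix of b."""
    limit = min(len(a), len(b), max_overlap)
    for k in range(limit, 0, -1):
        if a[-k:] == b[:k]:
            return a + b[k:]
    return a + b  # no exact overlap found; a human editor would have to reconcile it
```

For example, a 10-minute episode cut into 5-minute segments with 30 seconds of overlap gives `[(0.0, 300.0), (270.0, 570.0), (540.0, 600)]`, and stitching `"the quick brown fox"` with `"brown fox jumps over"` yields `"the quick brown fox jumps over"`.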


To be honest, I'm hoping for 4): have a computer transcribe and index podcast content with at least 90% accuracy. That appears to be a big ask right now, however, which surprises me given how good things like Alexa are.
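One common way to put a number on "90% accuracy" is word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the machine output, divided by the number of reference words. Loosely, 90% accuracy corresponds to a WER of about 0.10 or less. A minimal sketch, assuming a human-made reference transcript is available to compare against:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev holds the previous row: distances from ref[:i-1] to each prefix hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Note that WER can exceed 1.0 when the output has many inserted words, so "accuracy" as 1 minus WER is only a rough reading.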


A full transcript sounds appealing, but is it really what people want? From all the signals I've gotten so far, there's no strong use case for podcast transcripts.

It may be more realistic to ask podcasters to provide good show notes instead of full transcripts. All the important things (Wikipedia entries, guests' backgrounds, places mentioned, ...) should be in the show notes. Listeners may be interested in contributing to show notes, which are lightweight enough to produce.


My downfall is remembering a sentence or two but having no idea which of the roughly 20 podcasts I listen to it came from. Sometimes the same guest appears on multiple podcasts, topics overlap, etc.

That's a good point I hadn't considered. Personally I don't consume show notes, but perhaps reading those would help me out overall.


FYI, http://audiosear.ch/ just closed its doors after years of struggling to do in-episode search. It's an expensive problem to tackle, and clearly not something people want enough for Audiosearch to survive. It's easy to ask for and imagine these features, but running these services is extraordinarily hard.


For a podcast search engine, you can take two approaches.

1. Depth-first approach. Position yourself as an AI company and do in-audio search, but only index a small set of podcasts. It's as if Google indexed just 1000 web pages and tried to refine its keyword matching techniques: too little data to improve search relevance and ranking.

2. Breadth-first approach. Start with something simple. Index just metadata for as many podcasts and episodes as possible. It's not sexy in terms of raising money or making TechCrunch headlines. It's not AI (for now).

Given the limited resources I have now (i.e., one person, no VC funding, 2 months of full-time work), I have to take the breadth-first approach. In the future, I still have the option to add in-audio search gradually. It's like playing a strategy game; it's all about build order :) Ideally you'd do everything at once, e.g., AI, player app, community, ... But in reality, you have to start with the very first step...


Oh for sure, man, I understand. I'm not criticizing you; I was just trying to respond to the guy above me.



