Summary
You spend a lot of time and energy on building a great application, but do you know how it’s actually being used? Using a product analytics tool lets you gain visibility into what your users find helpful so that you can prioritize feature development and optimize customer experience. In this episode PostHog CTO Tim Glaser shares his experience building an open source product analytics platform to make it easier and more accessible to understand your product. He shares the story of how and why PostHog was created, how to incorporate it into your projects, the benefits of providing it as open source, and how it is implemented. If you are tired of fighting with your user analytics tools, or unwilling to entrust your data to a third party, then have a listen and test out PostHog for yourself.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- You listen to this show because you love Python and want to keep your skills up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to pythonpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.
- Your host as usual is Tobias Macey and today I’m interviewing Tim Glaser about PostHog, an open source platform for product analytics
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what PostHog is and what motivated you to build it?
- What are the goals of PostHog and who are the target audience?
- In the description of PostHog it mentions being a product focused analytics platform, as opposed to session based. What are the meaningful differences between the two?
- Customer analytics is a rather crowded market, with a large number of both commercial and open source offerings (e.g. Google Analytics, Heap, Matomo, Snowplow, etc.). How does PostHog fit in that landscape and what are the differentiating factors that would lead someone to select it over the alternatives?
- For anyone interested in using PostHog, do you offer a migration path from other platforms?
- necessary features for a customer analytics tool
- privacy and security issues around analytics
- How is PostHog implemented and how has its design evolved since you first began building it?
- reason for choosing Python
- benefits of Django
- thoughts on introducing Channels
- option to include it as a pluggable Django app
- integration points
- data lake integration
- challenges of providing understandable statistics and exposing options for detailed analysis
- Having data about how users are interacting with your site or application is interesting, but how does it help in determining the useful actions to drive success?
- business model and project governance
- What are the most complex, complicated, or misunderstood aspects of building a product analytics platform?
- What have you found to be the most interesting, unexpected, or challenging lessons that you have learned in the process of building PostHog?
- When is PostHog the wrong choice?
- What do you have planned for the future of PostHog?
Keep In Touch
Picks
- Tobias
- Tim
- Triumph Of The City by Edward Glaeser
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- PostHog
- MixPanel
- Amplitude
- Heap
- Snowplow
- Looker
- SnowflakeDB
- Tableau
- DOM == Document Object Model for web pages
- Django
- Django Rest Framework
- React.js
- Kea state management for React.js
- Redux
- TypeScript
- Django Stubs
- Django Channels
- Sentry
- Pluggable Django App
- PostgreSQL
- ELT
- Data Lake
- Optimizely
- Feature Flags
- PostHog Roadmap
- PostHog Employee Handbook
- Matomo (formerly Piwik)
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform, including simple pricing, node balancers, 40 gigabit networking, dedicated CPU and GPU instances, S3-compatible object storage, and worldwide data centers. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
You listen to this show because you love Python and want to keep your skills up to date, and machine learning is finding its way into every aspect of software engineering. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their machine learning engineering career track program. In this online project based course, every student is paired with a machine learning expert who provides unlimited 1 to 1 mentorship support throughout the program via video conferences. You'll build up your portfolio of machine learning projects and gain hands on experience in writing machine learning algorithms, deploying models into production, and managing the life cycle of a deep learning prototype.
Springboard offers a job guarantee, meaning that you don't have to pay for the program until you get a job in the space. Podcast.__init__ is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes, and there's no obligation. Go to pythonpodcast.com/springboard and apply today, and make sure to use the code AISPRINGBOARD when you enroll. Your host as usual is Tobias Macey. And today, I'm interviewing Tim Glaser about PostHog, an open source platform for product analytics. So, Tim, can you start by introducing yourself? Yeah. Hey. Thanks, Tobias. Thanks for having me. Yeah. So I'm Tim Glaser,
[00:02:07] Unknown:
CTO and cofounder of PostHog. As you said, we do open source product analytics. So think something like Mixpanel or Amplitude, but completely open source. You can host it yourself and have full control over the data. And do you remember how you first got introduced to Python? I actually don't remember the specific point, but I started programming when I was fairly young, when I was, you know, 10 years old, with HTML. And then I had a couple of jobs throughout high school that were mostly PHP, and then I ended up working for Cloud9, where we used Node. But I think sometime in between there, I started playing around with Python. The first time I touched Python professionally was at a company called Arachnys, where I spent the last 4 years or so before starting PostHog. And so in terms of PostHog itself, can you give a bit of a description about what it is and what motivated you to build it? Sure. So like I said, it's really a platform that you can host yourself, and it allows you to figure out what your users are doing on your app or on your website. It will help you build a better product, basically, by having all that information. So the category of product analytics is fairly well established. Right? You have companies like Mixpanel, Amplitude, Heap. There's a bunch of others that, you know, we can kinda dig into.
But the thing they all kind of lacked, and, you know, we used a bunch of them before starting PostHog, specifically things like Mixpanel, the problem was those tools are all really built for product managers, product owners, those kinds of people. And, yeah, I used to be one myself, so I can kind of relate. But the thing they don't do very well is make it accessible to engineers. You know, if you go around companies that use those kinds of platforms, the only engineer that tends to have access to them is the engineer that installed it, and they probably haven't looked at it since. So that's why we kinda started PostHog. You know, we wanted to build something for engineers, something that they could use and they could look at. Because in the end, engineers tend to make a lot of product decisions de facto, but at the moment tend to do it without any data, without having access to it, etcetera. So that's kind of the overarching philosophy of why we started it. We felt quite strongly that, especially with tools like Mixpanel, you know, they charge you an arm and a leg for getting access to your data. It tends to come with the highest price point plan. And to us that felt really silly, because it's your data, you know, you've kind of generated it. You should be able to do with it what you want. You should be able to do analytics on it. You should be able to run queries against it. So that's why we wanted to open source it. It didn't feel right to us that you should send all this data off to third parties. And also with the introduction of GDPR and CCPA and just the overall increase in awareness of consumer privacy,
[00:04:56] Unknown:
the sort of miasma that has begun to surround those different types of products is even more motivation to have greater control over who's using your data and what they're using it for, rather than handing it off to Google Analytics or whoever else to improve their own products
[00:05:15] Unknown:
just because it happens to be free for you. Obviously, you know, GDPR, etcetera, those are big motivations for it. As engineers, you're always super aware of who's using your data and for what. And it just felt right to us that it should be something that you can control, rather than, you know, like you said, sending it off to whoever. And certainly with things like Mixpanel, you know, they really encourage you to send as much data as possible about your users, email addresses, physical addresses, all that stuff. And that's all hosted on their service, and you can't kinda do anything with it from that point on. And in terms of PostHog itself, you mentioned that one of the goals is to be more engineer focused.
[00:05:53] Unknown:
And I'm wondering who the overall target audience is for PostHog itself and some of the challenges that come along with building an analytics platform with the developer in mind. Yeah. So the target audience is definitely engineers
[00:06:07] Unknown:
over product managers or owners at the moment. You know, we strongly believe that engineers need access to this data as much as those kinds of groups of people. So our target audience really is kind of any engineer in any organization. But at the moment, a lot of the people that are installing it are engineers at smaller organizations. You know, they have a handful of engineers and there's no formal product role yet, but the engineers want to know what's happening in that product and who's using it and how they're using it. That's why they're installing us.
[00:06:37] Unknown:
And in the description of PostHog, and as you mentioned at the beginning, it's described as being a product focused analytics platform, and you draw the contrast to session based analytics in the README of the project. So I'm wondering what the meaningful differences are between the two in terms of the types of data that are collected and the use cases that they benefit from. Yeah. Absolutely. So something like Google Analytics,
[00:07:04] Unknown:
won't, unless you're paying them a lot of money, give you statistics or kind of data on individual users, which, you know, kinda makes sense if you're maybe running something like a media platform. For example, if you're running an online news website, you don't actually care who the individual users are or what they've done. You kinda just care that, okay, this article got this many hits, etcetera. That doesn't work quite as well for products. Right? Because if you have a B2B or even a B2C kind of product, your users are gonna be very different and behave very differently based on who they are. You know, the customer that's paying you a $1,000,000 a year, they use the product very differently than the customer that's paying you $10 a year. So you kinda wanna be able to segment on those things. And that's where the original wave of product analytics came in. Right? Mixpanel, Amplitude, etcetera. And they absolutely nailed that kind of use case, and they've done really well out of it. We're not really innovating there, to be honest. The place where we've innovated is open source. Right? We have taken something that has clearly proved to work very well. Obviously, both Amplitude and Mixpanel are sizable companies with a lot of really dedicated users. But what we've done is just made it work for engineers. And that's, you know, that's the real power of it. Like, I kinda joke about it with some of the people on my team sometimes, but the product development for PostHog wasn't that difficult, because it was very well defined what we had to build, because it kind of already existed. And, you know, there's a bunch of places where we thought we could do better, but the real innovation was just being able to self host it.
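The segmentation he's describing, the $1,000,000-a-year customer behaving differently from the $10-a-year one, boils down to events carrying user properties you can break down on. A minimal Python sketch of the idea (the event shape and property names here are hypothetical, not PostHog's actual schema):

```python
from collections import Counter

# Hypothetical event records: each event carries properties about the user,
# so usage can be segmented by who the user is, not just which page was hit.
events = [
    {"event": "report_exported", "user": "acme", "plan": "enterprise"},
    {"event": "report_exported", "user": "acme", "plan": "enterprise"},
    {"event": "report_exported", "user": "bob", "plan": "hobby"},
    {"event": "$pageview", "user": "bob", "plan": "hobby"},
]

# Break a key event down by plan tier, the kind of per-user segmentation a
# purely session-based tool doesn't give you.
by_plan = Counter(
    e["plan"] for e in events if e["event"] == "report_exported"
)
print(by_plan)  # Counter({'enterprise': 2, 'hobby': 1})
```

The same breakdown works for any property attached to the events, which is why per-user data matters for products in a way it doesn't for a news site.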
And, you know, we've seen a lot of use cases where it's really hard, especially in big organizations, to get any software installed. You have to go through all sorts of processes, etcetera. But as soon as something is MIT licensed, it tends to be really easy. So, yeah, that's just kind of where we've come from. And so
[00:09:05] Unknown:
using open source as the differentiating factor has a lot of benefits in terms of gaining broad based adoption for people who are interested in this kind of thing and want to be able to just pull something off the shelf and experiment with it. But in terms of the actual capabilities of the product that would help to sell it to the people who are actually going to eventually write the check to pay for support or pay for managed hosting, what are some of the capabilities of PostHog that make it stand out in such a crowded marketplace?
[00:09:37] Unknown:
Sure. So part of it, obviously, I guess, is that at the moment, if you're a large organization and you want something like product analytics, you probably have two choices. Number one is you go to something like Mixpanel or Amplitude, and you pay them a small fortune, and you get access to their products, but you don't get access to the underlying data. Sometimes you can pay them to put it in a data warehouse, maybe, but, you know, the data isn't stored on your servers or whatever. The other side of it is using something like, you know, Snowplow and Snowflake, and then using something like Tableau or Looker on the other side to do data analytics. And this is what we've seen. You know, we've talked to a bunch of people who've gone that route. So they started with Amplitude or Mixpanel. That became too expensive, and they wanted to keep more control over the data, so they went to the setup I described: Snowflake, Snowplow, Looker. The issue with that is that you end up having to hire, you know, basically a data science team that the rest of your organization needs to ping to build dashboards. And, you know, it's very hard to build even like a basic funnel, for example. So we think there's, like, a massive gap between the two, for something with the ease and the usability of Mixpanel or Amplitude, but that you can host yourself and have complete control over, as you would with the kind of Snowflake, Snowplow,
[00:10:59] Unknown:
setup. And for people who already have an analytics platform in place, whether it's session based and they're using Google Analytics or Matomo or they're already using a product focused tool such as Heap or Mixpanel and they're interested in using Posthog, do you have any sort of migration path for being able to pull data from those systems to populate your platform with the information that has already been collected, or would it end up having to be just a clean-cut over where they start new with posthog and just use that going forward?
[00:11:31] Unknown:
Yeah. We've mostly seen kind of clean-cut migrations. Part of that is, like I said, and this depends per provider, but unless you're paying them a lot of money, it's actually really hard to get the data out. A lot of the time they won't let you, so there's no way for us to import it anyway. And with something like Google Analytics, again, because it's session based, there's no way of kind of marrying it up, I guess. So a lot of times what we see is just a clean-cut kind of transition. And, you know, it's something I would love to do better, I guess, but it's just something that has kind of been made hard. But the thing with these analytics tools is, like, you know, a lot of them have 90 day retention of data, for example. And the reason is data older than that tends not to be very relevant to what you're doing now, because your product changes, your website changes. So, you know, even if you have PostHog running side by side with those tools, after a couple of weeks or a month, you probably have enough data in PostHog, or, you know, whatever the new service is, that you can
[00:12:30] Unknown:
do the right level of analytics with it. And as I was going through the documentation, one of the things that stood out to me about the way that you're approaching the event collection in PostHog is that you are taking a similar approach to Heap in that you're automatically collecting everything, so that after the fact you can then decide that any particular signal is useful and incorporate it into some sort of dashboard, versus having to spend the engineering time to say that you want to start collecting something new and then have to wait for it to come in. And I'm wondering what you have found to be the utility of that approach and any challenges that come along with it. The utility of it, of course, is it allows you to do kind of what you said. Right? It allows you not to spend engineering resources on adding, you know, dot track functions everywhere.
[00:13:17] Unknown:
And those functions, sometimes it's one engineer who's really enthusiastic, and she'll add it, you know, everywhere in one go, and then it kinda languishes and the rest of the team doesn't take it up. So we've seen that that doesn't tend to be a super reliable way of making sure your tracking is up to date. And, you know, it allows kind of less technical users to go in and define what they wanna look at. And it allows you to do backwards looking stuff. So even if you haven't installed the dot track functionality, you can still go in and, you know, work out what's happened. The downside, or the challenge that we've seen with that, is it tends to basically be the hell that is, you know, the DOM. Right?
You still have to kind of work with the DOM, and sometimes the way things are built gets in the way. For example, CSS-in-JS. Right? It's great, but it creates class names that are, you know, completely meaningless, basically, to anyone else. So that makes it quite challenging to write selectors that work for it. And then, you know, we have, like, a point-and-click interface for defining these events. But if your class names are, like, dynamically generated, that's not gonna work. So that's the challenge. And, like, you definitely get more reliability if you do kind of dot track calls everywhere in your code, but you lose some of the flexibility. So it's a trade-off you work out. And, you know, obviously, we also allow you to do dot track calls if you want. Yeah, it's kind of just a free extra option, and we've seen a lot of use for it. And for somebody who is using PostHog,
[00:14:52] Unknown:
can you talk through just the overall workflow of the data collection and then being able to build useful dashboards out of it and any integration or enrichment of data sources from either additional platforms that you're using, whether it's SaaS or otherwise, or being able to collect information from things like mobile applications?
[00:15:12] Unknown:
Sure. So, you know, you sign up for an account, or you deploy PostHog to Heroku or AWS, whatever it is, and we give you a super simple snippet. You put that in your website, and you start collecting from the word go. That's the super basic kind of version, and that will allow you to do quite a bit of analytics, and we start collecting events straight off. We have libraries for most popular kind of frameworks and languages. So we have, you know, React Native, iOS, Android, we have Python, Ruby, Go, etcetera, etcetera. So you can start collecting events from the back end as well. The thing that's most challenging, and this is challenging with all of the kind of product focused analytics tools, is that you need to marry up what the users are doing in the front end with what those users do in the back end. Maybe you don't need analytics in the back end, in which case it's a lot easier, but otherwise you basically need to send something that uniquely identifies the user, so, you know, a user ID, so you can make sure that all the events that one user does correctly get grouped under that one user.
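That marrying-up step can be sketched in a few lines. This is an illustration of the idea only, not the actual PostHog client API; the function name and event shape are assumptions:

```python
from collections import defaultdict

# Hypothetical sketch: events from any source that carry the same user ID
# land in the same per-user timeline. The real client libraries do far more,
# but this is the core of the grouping described above.
events_by_user = defaultdict(list)

def capture(distinct_id, event, properties=None):
    """Record an event against a single user identity."""
    events_by_user[distinct_id].append(
        {"event": event, "properties": properties or {}}
    )

# A pageview from the JS snippet and a purchase from a Python back end,
# both sent with the same user ID:
capture("user-42", "$pageview", {"url": "/pricing"})
capture("user-42", "purchase", {"plan": "startup"})
assert len(events_by_user["user-42"]) == 2  # grouped under one user
```

The hard part in practice is making sure every library sends that same ID once the user logs in, which is exactly the challenge he calls out.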
So there's a little bit of a challenge there. We've made it as easy as possible, but it's a challenge whether you're using Amplitude or Mixpanel or PostHog. Then the next step in PostHog is you start creating some dashboards. So you've got all this data. We precreate some of the dashboards for you, but, you know, we have kind of like a graph interface that is really powerful and allows you to do all sorts of analytics, you know, things like stickiness, like retention. It allows you to filter by kind of any property that you send, and we send a bunch of properties automatically.
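Conceptually, a trend graph like the ones he describes reduces to filtering events by a property and counting per time bucket. A toy version, with hypothetical event and property names:

```python
from collections import Counter
from datetime import date

# Hypothetical events with a day bucket and some auto-collected properties.
events = [
    {"event": "signup", "day": date(2020, 5, 1), "browser": "Firefox"},
    {"event": "signup", "day": date(2020, 5, 1), "browser": "Chrome"},
    {"event": "signup", "day": date(2020, 5, 2), "browser": "Firefox"},
]

def trend(events, event_name, **filters):
    """Daily counts of one event type, restricted by property filters."""
    return Counter(
        e["day"]
        for e in events
        if e["event"] == event_name
        and all(e.get(k) == v for k, v in filters.items())
    )

firefox_signups = trend(events, "signup", browser="Firefox")
# one Firefox signup on each of the two days
```

Because properties are collected automatically, any of them can become a filter or breakdown after the fact, without new instrumentation.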
You know, so it allows you to create really powerful dashboards really, really quickly. And in terms of
[00:16:56] Unknown:
the collection of the data too, one of the
[00:17:07] Unknown:
in terms of enforcing some sort of default schema or default attributes for being able to then collect the information across multiple different sites or platforms and then join it across them? Yeah. So I guess the upside is, you know, we control all the client libraries as well. So it means that all of the attributes across the libraries are consistent. In terms of marrying up the users, you know, we have kind of identify calls. Basically, the thing you do is you just send us a user ID, and that will mean that everything that one user does, if they're logged in on your app or whatever, gets attributed to that user. So there's a couple of ways we kinda make sure that all of the data is consistent, whether you've got data coming in from your iOS app or your marketing website or, you know, your online app, so that it all kind of works together. And that's another interesting thing too about this sort of product analytics
[00:18:01] Unknown:
is that for things like Google Analytics or Matomo, it's very much that the statistics are collected in this discrete entity of one website, and then there is the possibility of being able to view traffic across different properties, but it's not a first class concern. Whereas as I was looking through the documentation for PostHog, it seems that there is much more built in functionality for being able to say that the information that I'm collecting across these different properties is a single overall experience. What are the benefits of seeing different properties as a single overall experience versus treating everything as a discrete entity in its own right? You know, it's super important. I think,
[00:18:48] Unknown:
you know, talk to any marketer and they love talking about this, and in fact my girlfriend is a senior marketing manager, and, you know, she loves talking about things like first touch, last touch. It's so crucial, especially to kind of marketing, but then that filters down into the product. Right? You wanna know how people first found you. If you're doing paid ads, for example, you wanna know, okay, is my Facebook campaign working? But if your KPI is someone sees an ad, goes to your website, then downloads an app, then signs up on the app, you know, you wanna be able to kind of track people across all of that, ideally, because then you can see which kinds of users convert. For example, Slack knows that if you join, like, a couple of channels, you're basically hooked for life. And Netflix has something similar, where if you watch a couple of movies, you're gonna be hooked for a long time. If that's your KPI, but that lives in your mobile app, and you're spending a ton of money getting people, you know, from Facebook onto your website, you wanna make sure that the types of campaigns that bring people to your website are the types of campaigns that then eventually lead people to do whatever that action is in your app. So, you know, marrying the two up is a massive challenge. But if you get it right, especially in big organizations, it can be so powerful, because it allows you, if you're doing paid marketing, to spend much more effectively, or if you're doing content marketing, to focus on the things that really matter. In terms of PostHog itself, can you talk through how it's implemented and how the overall design and architecture of the system has evolved since you first began building it? Yeah. Sure.
So I sort of consider myself to be, you know, a fairly average developer, and I like libraries and frameworks that are well tested, that are, you know, a little bit boring, but that just do the job. So, you know, I started with just Django and, like, a very basic React app. And that's basically still what it is today. Obviously, we've expanded things massively, but, you know, in essence, it's basically a Django app. We use Django REST Framework for all the API endpoints.
It's a single page application in React, and we use something called Kea for our state management. So we actually hired Marius, who wrote the Kea framework. It's basically just a layer on top of Redux that takes away a lot of the boilerplate of writing kind of Redux code. And it's really helped us structure our code base much better. So that was probably the single biggest, like, architecture change that we made. Because before, and, you know, obviously it's all open source, so you can see exactly how I kind of stuffed this up.
But, you know, we would just have, like, huge classes of React components that did a bunch of magic stuff. It would call APIs left, right, and center. And, you know, we slowly started migrating to using Kea, so, basically, you know, having some kind of state management in the app, and it's made a world of difference. So that's definitely been the biggest challenge, or the biggest change. And using Django, as you said, it's a boring technology. It's something that's been battle tested for a long time.
[00:21:57] Unknown:
And because of the fact that you're building something that is very heavy on the data ingestion and data analytics side of things, I'm wondering what you have found to be its overall capability in terms of performance, and any issues that you're seeing in scaling to larger volumes of data and larger volumes of interactions for larger properties?
[00:22:17] Unknown:
Yeah. So, on the analytics side, the database does most of the heavy lifting. On the event insertion side, we are definitely starting to run up against some issues. So, actually, something that we have been working on this week, and I won't get into too much detail, but the basic problem is every time we insert an event, we have to work out if that event is part of an action. We call them actions. It's basically, you know, a wrapper around something like an auto captured event that makes it human readable. So if you have a call to action on your website that's like a sign up button, you can create an action that's called sign up and then use our point and click interface to, you know, work out which button is the sign up button. Now, every time we do an event insert, we check whether that event is part of an action.
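That per-insert check can be pictured with a toy matcher. One obvious mitigation, building each action's matcher once and reusing it, is sketched here too; the matching logic is a stand-in, not PostHog's real selector query:

```python
import functools

# Hypothetical sketch: each inserted event is checked against every defined
# action. Building a matcher is the expensive part (for PostHog, generating
# the query through the ORM), so build it once per action and reuse it.
@functools.lru_cache(maxsize=None)
def matcher_for(action_selector):
    # Stand-in for compiling a selector into an expensive query.
    return lambda element: element == action_selector

def insert_event(element, actions):
    """Return the actions this event's element matched."""
    return [a for a in actions if matcher_for(a)(element)]

actions = ("button.signup", "a.docs-link")
matched = insert_event("button.signup", actions)  # ['button.signup']
```

The cache would need invalidating whenever an action definition changes, which is the usual cost of moving work off the hot insert path.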
And we create this, like, fairly horrible SQL query to work this out. The database handles it absolutely fine, but the Django ORM is just quite slow in generating it, and it's actually brought down some instances for people that were experiencing, like, high volumes, and we had to, you know, quickly remove those actions to make sure it works. So that's something we're looking at at the moment. So I think, possibly for the event ingestion, you know, we might move away from the Django ORM, or we might just cache the results of that query, or, you know, whatever it is. But apart from that, you know, I absolutely love Django. I think it's one of the best things, you know, certainly in kind of like the Python community, and we could never have built PostHog as quickly as we've done without Django. Since you have gotten
[00:23:48] Unknown:
decently far along in the product, I'm wondering what your thoughts are, reflecting back on the initial decisions of using Python and Django for building this platform out, and anything
[00:23:59] Unknown:
that you might have done differently now that you are further along in the journey? I would absolutely use Python again, and I would absolutely use Django again. Some of the mistakes have definitely been on the front end, as I said, around state management. I sort of wish we'd used TypeScript, and I sort of wish we'd used more testing. But adding typing, with mypy, has been great; the django-stubs library is a little bit funky sometimes, but it's getting better. Having strong typing everywhere, enforced across the code base, has saved us quite a few times. As for what we'd do differently, and this is maybe a little bit niche, we definitely have large functions in our Django REST Framework API. Some of the viewsets and serializers are really, really big, and I think the accepted wisdom these days is to have a middle layer between the models and the API. We still struggle to place things: some of the logic is in the models, some of it is in the API layer, and we're constantly not quite sure where any of it belongs. If we'd had some kind of structured middle layer, that would have been useful. And because this is very IO heavy, I'm wondering if you've put any thought into using Django Channels
[00:25:29] Unknown:
to maybe give you some extra performance while still being able to take advantage of all of the stability that Django offers. And is that mostly for event insertion you're talking about? Yeah. Primarily for the critical path of getting data into and out of the system.
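One common way client libraries lighten this critical insertion path is to batch events into a single request; a hedged sketch, where the flush step stands in for a real HTTP POST:

```python
class EventBatcher:
    """Hedged sketch of client-side batching: buffer events and flush
    them as one request instead of one HTTP call per event. Appending to
    sent_batches stands in for a real HTTP POST to a batch endpoint."""

    def __init__(self, flush_at=10):
        self.flush_at = flush_at
        self.buffer = []
        self.sent_batches = []

    def capture(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.flush_at:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))  # stand-in for the POST
            self.buffer.clear()

batcher = EventBatcher(flush_at=3)
for i in range(7):
    batcher.capture({"event": "click", "n": i})
batcher.flush()  # send whatever is left over
print([len(batch) for batch in batcher.sent_batches])  # [3, 3, 1]
```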
[00:25:44] Unknown:
Yeah. To be perfectly honest, it's something we haven't considered yet. Our JS library is fairly simple. We do do batching of event insertions, for example, and we do some clever stuff on that side. But no, to answer perfectly honestly, we haven't really looked at using Channels for event insertion. It's something we should consider. And one of the interesting things that I've seen from looking at a lot of the different open source analytics offerings, it seems that most of them are implemented in PHP,
[00:26:14] Unknown:
whether because of the success of the LAMP stack, or the success of WordPress and their desire to be able to play within that CMS ecosystem. And I'm wondering what your thoughts are on the trade offs of PHP as the implementation target for these analytics platforms versus the benefits that you're seeing of building it on top of Django. In the same way that, if you go to Sentry's GitHub,
[00:26:35] Unknown:
it doesn't have much text in the readme, but the one thing it says is that the Sentry server is written in Python, but you can use any library or any language to send events to Sentry. That's kind of how I think about it as well. The fact that we're written in Django is kind of irrelevant to how people send us events, because we have a PHP library and we have a Ruby library, and they're all as good as the Python library. So we don't really care what your system is written in, because we will probably have a library for it. So the question of what we're implemented in is almost less interesting. We did have an employee from a large Canadian bank, which will remain unnamed, reach out to us, and they were like, we really love what PostHog is doing, but we wish it was written in Java. I guess the reason they wanted that is maybe that they wanted to extend it and hack on it. But apart from that, I think what libraries you offer is more important than what you write the software itself in. Yeah. It's just interesting to me that it's actually taken so long for there to be a decent
[00:27:45] Unknown:
option in Python for building a web analytics platform or an event analytics platform, versus the plethora of solutions that have cropped up in PHP, probably just because PHP had such a dominance over the web for a while, at least until frameworks like Django and Rails came about. And on the subject of PostHog being implemented in Django, I'm wondering what you see as the viability of potentially offering it as a pluggable app for other Django projects to be able to embed within them, versus running it entirely as its own hosted platform.
[00:28:20] Unknown:
Yeah. That's super interesting to me. We've had quite a lot of people talk to us about wanting to offer PostHog as a thing to their customers. You can imagine, if you're building a website builder, for example, you want to give your customers analytics information: how many pageviews did you get, where did people come from, etcetera. So, using PostHog for that. Specifically for the Django app, I think that is a good idea. In the same way that Sentry, for example, has only a couple of quite specific options for how to deploy it: we require Celery, we require a Redis server, we require quite a bit of config sometimes to make sure that we're up and running. We've abstracted a lot of that away by having an app.json for Heroku, having Docker images and a Docker Compose file, having Terraform and CloudFormation, all that stuff. So I wonder how feasible it is to expect people to set up all that config in their own Django app just so they can use it as a side project. But apart from that, there have been quite a few cases of people serving PostHog to their customers in some way, shape, or form, and it's definitely something that we want to encourage. And as far as the utility of PostHog,
[00:29:40] Unknown:
some of the benefits that come about from having access to the underlying data and having the APIs available are the options for integration and extensibility. So I'm wondering what you envision, and what already exists, in terms of the options for extension
[00:29:58] Unknown:
and pluggability, or integration points, for people who want to use PostHog within the broader context of their systems? Yeah. The thing that we see a lot is people deploying it themselves on their own systems, and all the events get stored in Postgres. What we see a lot of is people then using the data from Postgres to do something else, so really using PostHog as an event collection tool. They use all of our libraries on their websites and in their iOS apps and whatever to send events to the PostHog instance, and then on the back end they don't really use the PostHog analytics; they use the database directly and do it that way. We also have a couple of people using us via the API. Basically the same story: they set PostHog up as an event ingestion tool and then use the APIs, because we have put quite a lot of effort into making the APIs super powerful. Obviously we use them in the front end to allow you to do all sorts of analytics, but if you want to use the API directly, go ahead; that was kind of the point of the exercise. And on the point of scale as well, a lot of people,
[00:31:08] Unknown:
as they get to a certain volume of data, start to look to data lakes, or, if they're more interested in responsive analytics, the data warehouse approach. I'm wondering what your thoughts are on the utility of PostHog within those contexts, or the overall integration path for exposing PostHog as a data source for those different systems, via an ELT integration or things like that? I think the first thing that I would say is it is amazing
[00:31:38] Unknown:
how far Postgres can go. For example, I'm fairly sure Heap is still using Postgres. There are a bunch of companies that have a server with a terabyte of memory to run a massive Postgres database. So I think Postgres can go much further than people give it credit for, and our main focus at the moment is just to support even quite high volume customers with Postgres. However, we absolutely think there will be a point: we've had inbound interest from large, well known B2C companies, and they'll be doing volumes that, even if you pull every trick in the book, you might not reach with Postgres. And they might already have something set up, right? So in those cases we absolutely want to integrate with those data warehouses, but the way we're looking at it at the moment is that we're just going to wait and see until someone approaches us with that question, and then we're going to work with them to implement it. It's not something we think we need to pursue actively. For me, at the moment, what I care about most is the individual developer being able to install it quickly and get something up and running super quickly; it's stable, it works really well, it does everything that, I'm assuming, Mixpanel will do, and more. So that's where most of our focus is right now. And then, if someone comes in and wants to use a data warehouse with us, we'll very happily work with them to make that happen. And
[00:33:07] Unknown:
the other aspect of building an analytics platform, and you mentioned that you have some prebuilt dashboards for people who are getting started with PostHog to get them up and running, is just making sure that the presentation of the data, and of how to query and investigate the data, is accessible and understandable for the people using it. And I'm wondering what you have seen as the challenges of providing an interface that is intuitive, and still expressive enough for power users who want to be able to dig deep and do their own analysis.
[00:33:44] Unknown:
Yeah. This is probably something I underestimated when starting PostHog. You look at some analytics tools and think, oh, it's just a couple of options, but the real challenge is in the permutations of the options. If you log into one of those tools, there are a couple of fields that you can change, and it looks deceptively simple, but it's really, really powerful. And if you do the maths on the number of different settings you could use to create a graph, it's immense. So testing that and making sure all of those permutations work is quite challenging. But what we also found is that some of the other tools have different views for each variation: they have a view for retention, they have a view for aggregates by people, etcetera.
We've actually managed to get all of that into one view that still looks deceptively simple. The way we've done that is really to take the essence of what you really care about seeing, and take that very user centric approach to building it, rather than starting from what our system is technically capable of. And once you have all the information collected and you're viewing the results and the graphs that show you the trend analysis or the behavioral patterns of your users,
[00:35:04] Unknown:
what is the useful next step from that? And what are some of the ways that you can provide constructive feedback, in the sense of: this is the structure of the events, this is the actual benefit that you can gain from having this knowledge, and these are some concrete actions that you can take to improve the growth and viability of your company or your project or your product?
[00:35:29] Unknown:
Certainly, if you've just started using product analytics, it's surprising how little depth you need to find insights that will help you build a better product. My favorite example is that it tends to be really easy to create a funnel in any product analytics platform, and really just by looking at that funnel and seeing where people drop out, you can double your conversion rate. So if you've got a web shop, or a product that takes someone from the home page all the way to paying, to putting their credit card in: if you create that funnel with all of the steps, then just see where people drop off the most and do your best to improve that, you can get a huge uptick in conversion rates just from that.
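The funnel idea described here can be sketched in a few lines: given each user's ordered events, count how many reach each step in sequence. The event names and per-user data below are illustrative, not PostHog's actual data model.

```python
# Hedged sketch of funnel analysis: count, per step, how many users
# reached that step in order. Steps and events are illustrative.
def funnel_counts(steps, events_by_user):
    counts = [0] * len(steps)
    for events in events_by_user.values():
        step = 0
        for event in events:
            if step < len(steps) and event == steps[step]:
                counts[step] += 1
                step += 1
    return counts

steps = ["visit_home", "add_to_cart", "enter_card", "paid"]
events_by_user = {
    "u1": ["visit_home", "add_to_cart", "enter_card", "paid"],
    "u2": ["visit_home", "add_to_cart"],
    "u3": ["visit_home"],
}
counts = funnel_counts(steps, events_by_user)
print(counts)  # [3, 2, 1, 1]
```

Here the biggest drop-off is between add_to_cart and enter_card, which is the step to improve first.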
I guess the other thing that tends to be really insightful is that a lot of people, and especially engineers, don't know how users use their product. Do you really know? Okay, you have a sidebar with seven menu items, but do you know which of those people are actually using? What are the top one or two actions that people do most in your system? It will surprise you. Obviously we're using PostHog to track PostHog, and it always surprises us what people end up using the most. It's screens that we have given no thought and no effort, that we slapped together in half a day, and those tend to be the most popular screens. So being able to divert some focus you would have otherwise spent on a screen that no one looks at, and suddenly spending it on the second most popular screen, those are huge wins. And at the moment, the way our product works, and the way all of the other product analytics products work, is that you have to sit there and create a dashboard, or filter a graph down, to get these insights.
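At its core, the "which screens do people actually use" question is a simple aggregation over collected events; a minimal sketch, with illustrative event and screen names:

```python
from collections import Counter

# Hedged sketch: count pageview events per screen and rank them.
# Event and screen names are illustrative, not PostHog's schema.
events = [
    {"event": "pageview", "screen": "/settings"},
    {"event": "pageview", "screen": "/dashboard"},
    {"event": "pageview", "screen": "/settings"},
    {"event": "click", "screen": "/dashboard"},     # ignored: not a pageview
    {"event": "pageview", "screen": "/half-day-screen"},
    {"event": "pageview", "screen": "/half-day-screen"},
    {"event": "pageview", "screen": "/half-day-screen"},
]

views = Counter(e["screen"] for e in events if e["event"] == "pageview")
top = views.most_common(2)
print(top)  # [('/half-day-screen', 3), ('/settings', 2)]
```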
And that's useful, especially if you're someone who likes spending their time looking at dashboards and playing around with them, as I am. But a lot of engineers, I think, don't want to create a dashboard to work out what's going on in their product. So the next thing that we're working on is basically a toolbar that we, sparing you the technical details, inject into your website. If you're logged into PostHog, it will show you a little icon that you can click to open the toolbar, which will show you exactly what people are doing on your website. It's sort of a heat map, but one that actually gives you a lot more information than that. So when you're developing, when you're looking at your own app, your own product, you can see exactly where people drop off, where they get stuck, where they get confused, and also, surprisingly, where people are clicking the most, which might be something totally different from what you thought. So I think that's the next step of this. With PostHog, we focused first on getting the dashboards and the graphs, etcetera, to get to parity.
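Under the hood, the heat map behind a toolbar like this boils down to grouping autocaptured clicks by the element clicked, so the overlay can rank elements by how often users actually hit them; a hedged sketch with illustrative field names and selectors:

```python
from collections import Counter

# Hedged sketch of the aggregation behind a click heat map overlay:
# count autocaptured click events per element selector. The selector
# strings are illustrative, not PostHog's actual autocapture format.
clicks = [
    {"selector": "nav > a.docs"},
    {"selector": "button.signup"},
    {"selector": "nav > a.docs"},
    {"selector": "nav > a.pricing"},
    {"selector": "nav > a.docs"},
]

heat = Counter(click["selector"] for click in clicks)
print(heat.most_common(1))  # [('nav > a.docs', 3)]
```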
The next step for us is that toolbar, to bring the statistics and the analytics
[00:38:28] Unknown:
to where you're working. Yeah. As I was looking through the roadmap document that you have on your site, and I'll link to it in the show notes, it was interesting to see some of your vision of where you think you can go with PostHog now that you've got this foundational layer of product analytics. It's functional, and you're able to collect events and view them in the development cycle, so that you can surface that information at the time that you're actually building the application and the platform that you're building the analytics for, and see how the analytics interacts. Whereas a lot of times the collection of events and the usage of analytics is a secondary concern, something that you implement because somebody in the marketing or sales department asked for it, not because it's something that you are actually engineering as a first class consideration into the platform. And so I'm wondering what your thoughts are in terms of the capabilities that you can unlock in the overall development of these systems, and the ways that PostHog can be integrated into the overall life cycle of applications.
[00:39:36] Unknown:
Yeah. And I think this is where it gets really interesting. Like you said, we've built a foundational layer now, and we're sort of 80% to parity with the other tools, and I think it's the important 80%. So the next steps for us: one is that toolbar, and that's going to be super crucial. Like you said, bringing the data to where you're developing, right in the moment that you're developing, not as an afterthought; we think that's super crucial. And, talking longer term, in the same way that GitLab went from being a GitHub clone to being an all encompassing DevOps CI/CD life cycle behemoth, we think there's a similar place in analytics. At the moment, we're happily ingesting all these events and gathering some super interesting data. The next obvious place we're going to use that is, like I said, the toolbar. But we're also thinking of things like A/B testing. Why do you have to buy something like Google Analytics, and Optimizely, and LaunchDarkly for feature flags, for A/B testing? We think there should be a single platform to do all of those things, and we think PostHog is going to be that. But that is the future vision of it.
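Feature flags of the kind mentioned here are commonly implemented as stable percentage rollouts: hash the flag and user together into a bucket so each user gets a consistent answer across requests. A hedged sketch of the general idea, not PostHog's or LaunchDarkly's actual algorithm:

```python
import hashlib

def flag_enabled(flag_key, distinct_id, rollout_percentage):
    """Hedged sketch of a percentage-rollout feature flag: hash the
    (flag, user) pair to a stable bucket in [0, 100), so a given user
    always gets the same answer for a given flag."""
    digest = hashlib.sha1(f"{flag_key}.{distinct_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percentage

# A 100% rollout is on for everyone; a 0% rollout is off for everyone,
# and the same user always lands in the same bucket.
print(flag_enabled("new-toolbar", "user-42", 100))  # True
print(flag_enabled("new-toolbar", "user-42", 0))    # False
```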
[00:40:59] Unknown:
And in terms of the project itself, it is an open source platform, but you've also built a business around it. So I'm wondering what your plans are in terms of the business model and your overall approach to project governance to ensure that the open source aspects continue to be useful and attractive to people who are finding it, but at the same time, sustainable
[00:41:21] Unknown:
and something that you're able to run with in the long term? We haven't been around for very long, but at the moment our core focus is our open source product. We want to make sure that the foundational layer is there, that it works great, and that it's really pleasant to use. We want that toolbar we talked about to be amazing and to work really well. And we want an individual developer at any size company, from a two person startup all the way to the Ubers of the world, to be able to pick up our tool and start using it straight away. That's where we're going to start, and that's where our focus is going to be. We have investors who totally understand that. In terms of monetization, we never want to charge individual developers, and we don't want to charge very small teams. We think the people who will be able to pay for what we're building, and who will get the most value out of it in that sense, are the large enterprises. That's why we're talking about an all encompassing platform, where we have things like A/B tests and feature flags; those really only come into play when you're a larger organization.
If you're in your proverbial bedroom coding up your first site, A/B testing is not that relevant to you. So that's how we're thinking of segmenting it. But the core stuff that we're working on now, the analytics capability, the event ingestion, that toolbar, is all going to be free forever, and that's going to be very much open source. And as far as the landscape,
[00:42:54] Unknown:
and the exercise of building a product analytics platform, what are some of the most complex, complicated, or misunderstood aspects that you've encountered? The biggest 1 is probably
[00:43:05] Unknown:
usability. Obviously, inserting millions of events a day is an interesting technical challenge, but in the end that tends to be a little bit of optimization and a little bit of just buying bigger servers. The real challenge is building an analytics tool that everyone can use. That's why we're thinking about things like the toolbar, etcetera, because we want it to be really easy to unlock insights from our product. And the way to do that, we think, is not going to be easy. I'm personally not a great UX designer, for example.
So it's a real, concentrated effort from everyone to make sure that we build a best in class analytics platform that's wonderful to use. And as far as your experience
[00:43:56] Unknown:
of building PostHog, what have you found to be the most interesting or unexpected or challenging lessons that you've learned in the process? So before this, I'd never
[00:44:06] Unknown:
built an open source project. Maybe a contribution here and there, but I hadn't really contributed much to open source or worked much with open source. And open source is amazing. I know I'm about 20 years late to the party, but especially if you're building something that's meant for developers. We basically tell every founder that we meet: try open source, because the quality of the feedback is so much better than if you're doing a SaaS application that's stuck behind a paywall. Developers can just pick it up, and developers are really picky people. They will tear apart what you've done, and they'll give some really honest, direct feedback. And it's great, because that's the only way you can make your product better. We get hundreds of bits of feedback, versus if we had to go out and sell this one on one, we'd maybe have gotten tens of bits of feedback, and it wouldn't be very honest, because they would be trying to drive us down on price or whatever it is. So having something that's open source is a great way to build a product.
And an open source community is great. We're building this company with that ethos as well: our handbook is online in the same way that GitLab's is. We're super transparent about our roadmap, about what we're thinking, about how we work, and we want to keep that up and make sure that we really give back to the open source community. That's been the most amazing thing about building PostHog. And when is PostHog the wrong choice, and somebody would be better served using the Google Analytics of the world, or some of the other open source offerings that are maybe more limited in scope? So, Google Analytics, and a special shout out to Matomo, who are kind of the open source equivalent of that. Those are great options if, like I said, you have a website where you care more about things like sessions and clicks: how long are people spending on my site, what on average is the most popular article on my website, where are my visitors coming from in the world?
They tend to be a lot better at answering those kinds of questions, and PostHog could possibly be overkill for those use cases. So there are a bunch of ways that those tools will be better. And are there any other aspects of your work on PostHog, or the process of building out a product analytics platform, or its utility
[00:46:25] Unknown:
or just anything else about the topic at hand that we didn't discuss that you'd like to cover before we close out the show? I think we covered it. I do want to give
[00:46:34] Unknown:
possibly another shout out to doing something like this open source, even if you are planning on eventually making a paid and closed source product. Open source has just been
[00:46:44] Unknown:
a revelation to us. So that's been the best thing to come out of this. Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, or give it a try and contribute, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week, I'm going to choose The Hitchhiker's Guide to the Galaxy. It's a great book. I've read it a couple of times and just recently started revisiting it with my family as an audiobook. It's always worth going back to, whether you're reading it for the first time or reading it again if it's been a little while. So I definitely recommend it if you're looking for something to keep you entertained. And with that, I'll pass it to you, Tim. Do you have any picks this week? Yeah. So one book I recently read is Triumph of the City,
[00:47:26] Unknown:
by Edward Glaeser. No relation; he spells his surname differently as well. It's about 10 years old, and it talks about how the way to a sustainable, poverty free world is cities. Especially now, with corona and a lot of chatter on Twitter about people moving to barns in Kansas or whatever, it's a really interesting read, reminding us why people go to cities in the first place and why they're such a great way of being really environmentally friendly
[00:47:59] Unknown:
and a great way of lifting people out of poverty. So that was a really good read. Alright. Well, thank you very much for taking the time today to join me and discuss the work that you're doing on PostHog. It's definitely a very interesting product and one that I intend to start experimenting with and using for my own purposes, and probably at my work as well. So thank you for all of your time and effort on that, and I hope you enjoy the rest of your day. Yeah. Thanks very much for having me, Tobias. Have a good day. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Tim Glaser and Posthog
Tim's Journey with Python and Early Career
Target Audience and Challenges for Posthog
Capabilities and Differentiation of Posthog
Data Collection and Dashboard Creation
Technical Implementation and Architecture
Future Vision and Integration in Development Lifecycle
Business Model and Open Source Governance
Complexities and Lessons Learned
Final Thoughts and Recommendations