Summary
In order for an organization to be data driven they need easy access to their data and a simple way of sharing it. Arik Fraimovich built Redash as a way to address that need by connecting to any data source and building attractive dashboards on top of them. In this episode he shares the origin story of the project, his experiences running a business based on open source, and the challenges of working with data effectively.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
- Your host as usual is Tobias Macey and today I’m interviewing Arik Fraimovich about Redash, an open source business intelligence platform that helps you make sense of your data.
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what Redash is and its origin story?
- What are the primary ways that it is used?
- The business intelligence market is quite mature and has many commercial and open source projects to choose from. What are the aspects of Redash that have allowed you to be successful?
- What would you consider to be your closest competitors?
- What was your background with data before starting on Redash?
  - What are some of the most notable lessons that you have learned about business intelligence since starting the project?
  - How has the landscape for business intelligence and data analysis changed since you began the project?
- Beyond just accessing data, Redash focuses on enabling visualization of the results. What types of visualizations do you support and how do you support users in choosing the most effective ways to represent the information?
- What are some of the common challenges that your users and customers encounter when communicating with data?
- One of the critical aspects of enabling data access in an organization is the ability to collaborate on asking and answering questions. How do you approach that challenge in Redash?
- How is Redash implemented and how has the overall design and architecture evolved since you first started working on it?
  - How do you manage the complexity of supporting so many different data sources?
  - If you were to start over today, what would you do differently?
- Beyond the code of Redash, you also have a business around providing it as a hosted service. What are some of the most interesting, challenging, or unexpected lessons that you have learned in the process of building and growing that service?
- How do you approach the direction and governance of the open source project and balance that against the wants and needs of the community?
- What are some of the most interesting, innovative, or unexpected ways that you have seen Redash used?
- When is Redash the wrong platform to use?
- What do you have planned for the future of the Redash business and project?
Keep In Touch
Picks
- Tobias
- Arik
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Redash
- Google App Engine
- EverythingMe
- RedShift
- Metabase
- Apache Superset
- Elasticsearch
- Tableau
- Looker
- PowerBI
- Data Warehouse
- Data Lake
- Athena
- Spark
- Redash Funnel Visualization
- Stephen Few
- Flask
- SQLAlchemy
- Redis
- PostgreSQL
- Celery
- RQ
- Tornado
- Django ORM
- AngularJS
- ReactJS
- NodeJS
- Redash Query Results Data Source
- IBM DB2
- Retool
- Forest Admin
- Grafana
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With 200 gigabit private networking, scalable shared block storage, node balancers, and a 40 gigabit public network, all controlled by a brand new API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. And they also have a new object storage service to make storing data for your apps even easier.
Go to pythonpodcast.com/linode, that's l i n o d e, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. And you listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Corinium Global Intelligence, ODSC, and Data Council.
Upcoming events include the Software Architecture Conference, the Strata Data Conference, and PyCon US. Go to pythonpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
[00:01:40] Unknown:
Your host as usual is Tobias Macey. And today, I'm interviewing Arik Fraimovich about Redash, an open source business intelligence platform that helps you make sense of your data. So, Arik, can you start by introducing yourself? Yeah. Hi. I'm Arik. I created Redash. I'm a software engineer that happened to become a CEO of a small SaaS company.
[00:01:59] Unknown:
I still try to code, so I guess I'm still a software engineer more than a CEO. And do you remember how you first got introduced to Python? Yes. So I've been thinking about that question, and I realized that was when Google App Engine was released. So I guess that's when I picked up Python. So that's 2008. Can you start by describing a bit about what the Redash product is and some of its origin story and why you started building it in the first place? Sure. So Redash is basically, like, you can call it a BI tool, but in simple terms, it's a web interface that you can use to connect to your databases, or actually data sources, because we support more than just databases, like Google Spreadsheets, Jira, Salesforce, or any JSON API. Then once you're connected, you can query those sources and visualize the results in different forms like charts, maps, whatever, or just a plain table is fine as well. Group that in a dashboard and share within your team or company organization.
So Redash was actually born as a hackathon project a bit over 6 years ago at the previous company I was working at, EverythingMe. We were just starting to use Redshift and we needed a tool to share the data from Redshift, and we didn't find anything at the time that would work well with Redshift. And we had a hackathon. So at that hackathon, I created the first iteration of Redash,
[00:03:21] Unknown:
and that's how it started. That's funny that it started as just sort of an accident and now it's become your main source of revenue.
[00:03:28] Unknown:
Yeah. Yeah. I mean, like, it's not like I have any BI background or anything. I basically stumbled into it, and I happen to really like the field. I really enjoy,
[00:03:39] Unknown:
like, the product. And I really enjoy seeing that people use it for, like, actual stuff. It's not like a game. It's like driving their business. So it's really fun. And so in the context of the hackathon, I guess, how long did you have to work on it? And what did the end result look like by the time that you were done with the hackathon? And what was your decision point where you thought that, okay, this is something that I can actually make a business out of or something that I can keep hacking on? You know, what was the story beyond just that point? And,
[00:04:10] Unknown:
how did it lead you to where you are now? The hackathon proved that, yeah, we can build something that will be useful for us. And then I kept working on it as a sort of 20% project at the company. About, I think, 2 months after the hackathon, we open sourced it. And that's when we started seeing adoption from other companies. It was slow at first, but then we saw more and more adoption. And around something like 2 years into the project, EverythingMe was shut down. Like, it was a startup and, like, didn't find a business model and was shut down. And at this time, I saw enough adoption of Redash by real companies that use it for their daily stuff that I figured there is a good opportunity here. So I wanted to make sure that Redash has a sustainable future, and that's how I decided to start the company. So in terms of the product itself, what are some of the primary ways that it's used given that you're able to use it for gaining access to all these different data sources and build dashboards and visualizations around them? Yeah. So I hope that all these things by Windows are not in the recording. So the main use cases, I guess, are BI analytics. And by analytics, it's both, like, usage analytics of different products, but also operational analytics, like understanding the business. Like, from my personal experience, what we are using it for internally at Redash is analyzing our revenues. We basically have our charges table, and we put Redash on top and build different dashboards that show our revenues and churn and stuff like that. Because it connects to so many data sources, the usage is very diverse.
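As an illustration of the revenue use case Arik mentions, here is a minimal sketch of the kind of query one might save in a tool like Redash and chart on a dashboard. The `charges` table, its columns, and the sample rows are hypothetical (the episode does not describe the real schema), and an in-memory SQLite database stands in for the production database:

```python
import sqlite3

# Hypothetical charges table; the real schema is not described in the episode.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charges (customer_id INTEGER, amount_cents INTEGER, charged_at TEXT)"
)
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?)",
    [
        (1, 4900, "2020-01-05"),
        (2, 9900, "2020-01-20"),
        (1, 4900, "2020-02-05"),
    ],
)

# The query itself is what would be saved and visualized as a chart.
MONTHLY_REVENUE_SQL = """
SELECT strftime('%Y-%m', charged_at) AS month,
       SUM(amount_cents) / 100.0 AS revenue
FROM charges
GROUP BY month
ORDER BY month
"""

monthly_revenue = conn.execute(MONTHLY_REVENUE_SQL).fetchall()
for month, revenue in monthly_revenue:
    print(month, revenue)
```

The aggregation happens in the database, so only one row per month comes back to be visualized.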
I guess that the main usage is by business users, but there are some people using it for operations proper, just to analyze their infrastructure.
[00:06:14] Unknown:
And in terms of the business intelligence market, there are a number of products that have been available for a while, both commercial and open source, and they've gone through their own evolutions over the years. But I'm curious how you view Redash in the context of the business intelligence market in particular, and what are the elements of the Redash product specifically that have allowed it to be successful given the maturity of the market and the number of competitors that there are? Yeah. The market is very, like, there are so many solutions. But I think that what helped Redash is a combination
[00:06:49] Unknown:
of 2 things. 1 is the fact that it's very easy to start with. I mean, you deploy Redash and you can either, like, use our SaaS and that takes a few minutes, or you can use 1 of the cloud images we provide and that also takes a few minutes to start. And once you have Redash ready, you can connect it to your database and you can have dashboards ready to share with your team in an hour or even less, sometimes more. It really depends on the kind of questions you are trying to answer. So the fact that it's so easy to use is 1 aspect that helps. The other thing is the fact that we support so many data sources, and especially that we were early on to support things like Redshift, BigQuery, and Amazon Athena, which didn't have that much support from more traditional tools. That really helped with adoption. I think that especially BigQuery.
I'm not sure if it's still the case, but the developer evangelist for BigQuery used to use Redash for his demos. So that definitely helps with people getting to know the project. Yeah. In terms of the
[00:07:50] Unknown:
competitors that I view as being closest to Redash, there are things like Metabase and Apache Superset, which are good projects in their own right. But you're right in that Redash definitely has the edge in terms of the number of data sources that are supported, and that's actually a big reason why I started using it in the first place, particularly for things like Elasticsearch that weren't supported by a number of other projects. So that's definitely 1 thing that continues to be the case, and I'm curious sort of what your approach was in terms of being able to enable connections to so many different data sources as far as how the plugin interface was defined or what's involved in adding data sources to Redash that has allowed you to maintain that velocity as you continually add new data sources with each release? I think it's based on the fact that to add a new data source in Redash, there is a very, very simple API you need to implement. You basically need to implement how to execute a query, and then
[00:08:44] Unknown:
you can add support for implementing how to get the schema of the database that you're querying, but that's optional. So it's basically 2 things that you need to implement. It's quite simple most of the time. Sometimes with things like Elasticsearch, it's actually not that simple because we work with a table concept of results. So you need to, like, mesh the dataset that comes back if it's nested and stuff like that. But usually, it's not really that complex. And I think this model and the fact that we don't have our own query language. Whenever you use Redash, whatever you use it with, that's the query language you will use. I think that really helps us being able to add almost any data source
[00:09:26] Unknown:
very fast and very easy. So I guess that that's the thing. Yeah. The fact that you're using the native syntax is definitely beneficial as far as anybody who's familiar with that data source can pick up Redash and start being effective without having to learn the specific peculiarities of whatever interface is being exposed in the other tools. And a lot of other older business intelligence suites will actually use more of a drag and drop editor for being able to define queries and answer questions. And so I think the fact that you're relying on just the native interface helps, particularly in this day and age where people are more likely to want to use the code interface
[00:10:05] Unknown:
than anything else, at least from a developer perspective. Yeah. Exactly. I mean, the trigger for me to, like, look into developing Redash was trying to use Tableau. And Tableau has this really rich interface and all drag and drop. But when we connected it with Redshift, it was so, so slow. I mean, the queries it was generating were very not performant. It was a really bad experience. I mean, it looked all shiny and nice. But when you try to create something, it was a really terrible experience.
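The two-part data source interface Arik describes a little earlier (a required "execute a query" method plus an optional "get the schema" method, both speaking the source's native query language) can be sketched roughly as follows. This is an illustrative sketch only: `QueryRunner`, `SQLiteRunner`, and the shape of the returned dictionaries are hypothetical names chosen for this example, not Redash's actual classes or result format.

```python
import sqlite3
from abc import ABC, abstractmethod


class QueryRunner(ABC):
    """Hypothetical base class; not Redash's actual interface."""

    @abstractmethod
    def run_query(self, query):
        """Run a query in the source's native language and return a table-shaped result."""

    def get_schema(self):
        """Optional: describe tables and columns. The default reports nothing."""
        return []


class SQLiteRunner(QueryRunner):
    """Example data source backed by SQLite, using plain SQL as the query language."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def run_query(self, query):
        cursor = self.conn.execute(query)
        # Statements that return no rows (CREATE, INSERT) have no description.
        columns = [d[0] for d in cursor.description or []]
        rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
        return {"columns": columns, "rows": rows}

    def get_schema(self):
        tables = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        schema = []
        for (name,) in tables:
            cols = self.conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema.append({"name": name, "columns": [c[1] for c in cols]})
        return schema


runner = SQLiteRunner()
runner.run_query("CREATE TABLE events (id INTEGER, name TEXT)")
runner.run_query("INSERT INTO events VALUES (1, 'signup')")
result = runner.run_query("SELECT id, name FROM events")
schema = runner.get_schema()
```

Because every source returns the same table shape, the visualization layer does not need to know which database produced the rows; sources with nested results, such as Elasticsearch, would need an extra flattening step that this sketch omits.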
[00:10:34] Unknown:
So, yeah, just having a SQL box that I can type in my query was super helpful. And you mentioned that you didn't really have much of a background in business intelligence before you got started on this project. And so I'm curious, what are some of the most notable lessons that you've learned about that landscape and about the market since you first started the project and some of the ways that that learning has reflected in the Redash product? I'm not sure I learned that much about BI.
[00:11:02] Unknown:
I'm just trying to build the product based on our customers' feedback. But something that I did realize at some point is that all the big tools promise self-service BI, basically the idea that the business users can ask questions themselves. But in practice, almost nobody really delivers it. Looker is quite close to that, but it requires a huge effort to set up. And if the organization doesn't invest that effort, they won't get self-service. And I, early on, decided that we are not going to promise self-service BI. We are not going to have a drag and drop interface.
We're just gonna let you use your database. It's probably gonna change at some point. Like, I have a certain roadmap in my head. And I think that at some point, we will be ready to introduce a concept of drag and drop. But I'm pretty sure we will do it differently from how it's usually done. Because usually, they try to stitch a drag and drop interface to a really bad schema, and that doesn't allow you to ask the interesting questions. That's when people go back to writing SQL.
[00:12:12] Unknown:
So that's why we want to give a really great SQL experience and then add on top of that. And in terms of the overall market itself, what are some of the shifts that you have seen as somebody who's working within that market and the types of things that businesses are asking for from their business intelligence suite and the types of data that they're dealing with? I think that there are 3 things I've noticed. 1 is the fact that there is more data that organizations have. So
[00:12:41] Unknown:
And that makes tools that work with extracts, like Tableau or Power BI, a bit obsolete. Now we have really powerful data warehouses or even data lakes like Athena or Spark and stuff like that. And really you want the raw power of these tools and not something to put on top of them. So that's 1 trend that really helped Redash, the fact that people have these massive datasets that they want to analyze using their databases. The other 1 is the rise of open source. Like, there have been some open source solutions in the past, but I don't think that any of them was really open source as in, like, having a big community and stuff like that. And that's something quite new. Like, you mentioned Metabase and Superset. They're probably, like, the biggest examples aside from Redash. But none of us are at the size of the commercial solutions yet. And that will change. I think that in the future, we will see the open source solutions becoming as massive, if not more, as the commercial solutions. And then interesting things will happen.
Another thing that I'm noticing is that more companies want to use data, like, to expose data to their customers. Like, basically, embedded analytics use cases.
[00:14:01] Unknown:
We see more and more people ask about that. And then beyond just the different data sources that are supported, the other core element of Redash is actually visualizing the results of the queries that are being executed. You know, you can just use the tabular results and be able to parse through that visually, but the primary ways that I've used it and that I've seen others use it is by actually creating these visualizations and combining them into a dashboard to try and tell an overall story of the data so that it's accessible to those business users who don't necessarily want to dive into the SQL queries or whatever the particular syntax is. And I'm curious how you support users in choosing effective visualizations for the answers that they're trying to provide and the context that they're trying to create for those different dashboards.
[00:14:47] Unknown:
So unfortunately, I can't say that we are doing too great a job in, like, helping the user choose an effective way to show their data. We have a few experiments, things like our funnel visualization, where, basically, you provide us with funnel data and we show it in a meaningful way. Like, it's a bit beyond just letting you define the visualization. It has some presets of its own. But aside from that, with most of our visualizations, we try just to give you the options you might need to create effective visualizations. But we don't really help you with that. I recently picked up a few books on the topic, actually by Stephen Few, and started just exploring more of what it really means to create meaningful visualizations. Like, things that not just look good, but actually communicate the data better. And I hope that we would start providing better guidance in this area, but it will take time. And
[00:15:47] Unknown:
in terms of the challenges in communicating with those visualizations, what do you see as being some of the common stumbling blocks that your users and customers encounter when they are trying to build these dashboards and use Redash as a communication tool? 1 thing that is not unique to Redash is basically knowing
[00:16:05] Unknown:
what question to ask and what data to look at. But on top of that, there is also some challenge sometimes of really knowing your database schema and, like, different issues that you might have in your data. Like, sometimes you might create a query that shows the data you want. But, apparently, if you dig into it, you realize that you have something that you need to clean up or exclude, like, I don't know, test users or whatever. Another challenge that actually is a bit unique to Redash is the fact that you need to know the query language of your database. And sometimes it might seem that SQL doesn't let you answer the question you want, but, actually, SQL is quite powerful. So there is usually a solution. It might be tricky, but I guess that's the kind of challenges you sometimes see. And then the other element
[00:16:55] Unknown:
of communicating with data in the context of Redash is the ability for people who are familiar with the query language and also people who are using the output of that to be able to collaborate on asking and answering the different questions. And I'm curious what your approach is to that challenge in Redash or what you have seen as being effective
[00:17:16] Unknown:
patterns for people who are leveraging Redash for trying to fulfill those goals? Yeah. That's a great 1. I mean, collaboration around data is something that's really been on my mind since I started with Redash. It's actually reflected in our logo. It's like a speech bubble and a sort of chart, which basically is supposed to communicate collaboration and data. And obviously, we still have a way to go here, but something that really helps is the fact that you have the query itself next to the data that you are looking at. So that allows others who look at what you're doing to really understand how you got the results that you got. And then if they want to dig further, like, dig further. We are going to expand the collaboration features we have. It's probably gonna start with allowing collaboration on the same query.
And that's basically very easy to enable. It's like just change some logic. But I think that to make it effective, we need to make sure that we have good versioning of the queries or the other objects that you can edit in Redash and then commenting and stuff like that. But it takes time. We have lots of things to do. And so digging deeper into the platform itself, can you describe how Redash itself is implemented and some of the overall design and architecture changes that have happened since you first began working on it? Yeah. Sure. So basically, something that I try to follow is to use boring technologies. And that's basically both to make it easy for people to maintain, like those people who deploy Redash for their usage, to make it easy for them to maintain it and to support it. So we try to use things that they probably will be familiar with. And the other side of that is that if people want to contribute to our code base, they will probably find technologies and patterns they are very familiar with. We basically use Python for the back end, which uses Flask, SQLAlchemy, and then Redis and Postgres to support that. And we were also using Celery, but we are actually switching to RQ now. But practically, it's not really an architecture change. It's just swapping a library. So our architecture was quite stable for the past, I think, 3 years at least. At the beginning, I experimented with various stuff, like I used, I think, Tornado and actually Django's ORM at some point. But in the past 3 or so years, there were no big changes in terms of, like, the kind of tools we use. Another big part of the code base is our front end. And there we started with AngularJS 6 years ago.
And about a year ago, we started the transition from Angular to React. We were trying to decide between React and Vue.js, but we picked React mainly because there was a really nice way to keep working, of having, like, a dual or hybrid codebase where we have both AngularJS code and React code side by side. And that's what we've been releasing for the past year. Basically, with every release,
[00:20:23] Unknown:
there was more code that was in React. And now we are really, really close to the finish line. Like, practically, all that is left is to switch the router, and we are Angular free. So that's nice. And because of the fact that Redash was born out of a hackathon, it seems somewhat obvious that Python would be the language that you chose for implementing it as a way of just being able to get something done quickly. But I'm curious if you were to start over today, if there were any design elements or foundational pieces of the code that you would do differently, either choosing different languages or different frameworks or just overall different system architecture?
[00:21:01] Unknown:
That's an [...] start putting semicolons in your Python code when you switch too much. And Node has the benefit of being asynchronous, which is really helpful when what you're doing most of the time is IO. Now, obviously, Python has some async support today, but it's very different when you have some libraries that support asynchronous code and some that don't, whereas in Node it's all asynchronous. But I really like Python, so I'm not sure I would really do something different in that sense today. Yeah. I think that
[00:21:39] Unknown:
Redash has also benefited from being in Python because of the fact that there are so many different libraries to support the various data sources that you're working with. And I think you'd be hard pressed to find that same level of support in other ecosystems, though Node might be a close competitor in that regard.
[00:21:54] Unknown:
Yeah. That's for sure. Although, like, we have, I think, a bit over 40 types of data sources supported today. But I think something like 10 of them are mostly used, and the rest is, like, a long tail. So if that was the only reason, I'm not sure it's a big reason to choose Python over Node. You can always, like, start another process and just delegate to it, and it can be written in any language. But I don't know. Like, I find the libraries and the tools nicer on the Python side. And then in terms of the
[00:22:30] Unknown:
system design of how it's implemented and how the queries are executed, I'm curious what you have seen as far as challenges, particularly when it comes to people trying to execute queries that are returning large volumes of data and being able to represent that back on the front end or being able to handle the data in the query execution?
[00:22:49] Unknown:
Yeah. So that's something that we don't really handle really well. And it's not intentional. Like, in hindsight, I might implement it differently. And it would probably be easier to implement it differently from the get go than now trying to redo it. But we definitely don't support large datasets well. But considering the use cases Redash is used for, it's not a big issue, because most of the time, the data you're going to visualize is not going to be big, because there is no point in having lots of data points when you create a visualization. Like, people don't see in this fidelity. And if a person is going to look at the results, they're also not going to review lots of results. People do want larger datasets usually when they're trying to connect Redash with some other system, like use Redash as an API. Or when they want to download the dataset and crunch it in Excel. So yeah, we don't really support large result sets really well. Like, you can basically just give Redash more memory, and then it will be fine. But the main issue is the fact that we load everything into memory, then convert it to JSON,
[00:23:59] Unknown:
and only then dump it into our local cache. I mean, that's not super great. Yeah. But as you said, the use case that Redash is designed for isn't really 1 where you want to be processing large volumes of data, because you want your queries to be structured in a way that they're actually going to condense the information down into something that's digestible by somebody who's trying to gain some insight from that information rather than just say, here is all of the data for this query of, you know, 10,000,000 rows.
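The memory pattern Arik describes (load the whole result set, convert it to JSON, then write it to the cache) can be contrasted with writing rows out in batches as they are fetched. The sketch below shows only that general streaming idea, assuming a standard Python DB-API cursor; `stream_rows_as_json` and the batch size are hypothetical, and nothing here reflects how Redash is actually implemented.

```python
import io
import json
import sqlite3


def stream_rows_as_json(cursor, out, batch_size=1000):
    """Write a cursor's rows to `out` as a JSON array, holding one batch in memory at a time."""
    columns = [d[0] for d in cursor.description]
    out.write("[")
    first = True
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            if not first:
                out.write(",")
            out.write(json.dumps(dict(zip(columns, row))))
            first = False
    out.write("]")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(5)])

buffer = io.StringIO()  # stands in for a file or cache writer
stream_rows_as_json(
    conn.execute("SELECT n FROM numbers ORDER BY n"), buffer, batch_size=2
)
print(buffer.getvalue())
```

With this shape, peak memory is bounded by the batch size rather than by the size of the result set, at the cost of needing a destination that accepts incremental writes.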
[00:24:29] Unknown:
Yeah. Although 1 thing that happened over time is, for a long time, my message was use your database to crunch your data and use Redash to visualize it. And basically, we don't store your data. We just visualize it. But what happened is that, I'm not sure when exactly it was, but a few years back, we introduced the query results feature. Basically, the ability to run queries on top of other query results. And the idea here was to, like, allow different use cases where you might want to join data between different data sources, or sometimes it's a bit easier to run some query, like another computation on top of an existing query result or whatever. But people are very creative and people tend to abuse the tools you give them. But, obviously, you need to be mindful of that. And, like, it's good when people abuse your product. It means that it brings them value, and you just need to, like, look at what they do and try to, like, give them a better solution. And, basically, once we gave this feature, people started using Redash sometimes as a form of database.
And then they want to have the ability to, like, load larger result sets into Redash. And that's something that I've been looking into recently and, like, trying to figure out
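To make the "queries on top of other query results" idea concrete, here is a minimal sketch in which two saved results, standing in for results from two different data sources, are loaded into a scratch in-memory SQLite database and joined with SQL. The episode does not describe how Redash implements this feature, so the SQLite approach, the `load_result` helper, and the sample data are all assumptions made for illustration.

```python
import sqlite3


def load_result(conn, table, result):
    """Copy one table-shaped query result into an in-memory SQLite table."""
    columns = result["columns"]
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_list})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in result["rows"]],
    )


# Hypothetical cached results, e.g. one from Postgres and one from a JSON API.
users = {
    "columns": ["id", "name"],
    "rows": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}],
}
charges = {
    "columns": ["user_id", "amount"],
    "rows": [
        {"user_id": 1, "amount": 30},
        {"user_id": 1, "amount": 20},
        {"user_id": 2, "amount": 5},
    ],
}

conn = sqlite3.connect(":memory:")
load_result(conn, "users_result", users)
load_result(conn, "charges_result", charges)

joined = conn.execute(
    """
    SELECT u.name, SUM(c.amount) AS total
    FROM users_result u
    JOIN charges_result c ON c.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()
```

A design like this also shows why large result sets become a concern: every joined result has to be copied into the scratch database before the second query can run.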
[00:25:45] Unknown:
what we can do better there. Yeah. It's definitely interesting the ways that people will work around the sharp edges of a tool and make it do things that it was never actually intended to do just because it's the tool that they have rather than seeking outside and looking for the tool that's more well suited to the particular problem that they have because it's only 5% of their use case, and the other 95% is filled by the tool that they have. Yep. Beyond just the open source code base of Redash, there's the business that you've built around it. And so I'm curious what you have seen as the sort of benefits of having a hosted solution in terms of the adoption of the product, and how you balance the needs of the business against the desires and needs of the open source community that are using and contributing to Redash?
[00:26:33] Unknown:
Yeah. So having a hosted solution really helps, because when you have an open source product, it's really hard to know what people use it for and in what ways. So having a hosted solution really gives us a way to look into how people use it, what kind of visualizations they use, what kind of data sources they use. And obviously, it's not a perfect representation of how the general population uses Redash, but it definitely gives you some idea of what's more common and what's less. You need to be mindful because, for example, we support IBM Db2 or whatever, and that's less commonly used in a cloud environment, I guess. So you won't see that on the SaaS, but there might be people who use it with the open source version. But still, you get so much visibility into how people might use the product. And whenever we make a release, it's always after a significant time where we had that code base running on the SaaS and we stumbled on different stupid bugs and mistakes. So it helps us make more stable releases, because it's harder for people to upgrade often. So we try to make sure that when we make a release, it's worth their time to upgrade.
[00:27:47] Unknown:
And that's an interesting point too, because you have this hosted platform, and my guess is that you're deploying the current state of the master branch. And I'm wondering what your decision points are as far as when to say that this particular point of the code is a major release that one of the open source users is going to deploy as an artifact, versus people who want to just deploy straight from master themselves?
[00:28:16] Unknown:
So it's usually a combination of: okay, enough time has passed since the previous release, and there is enough interesting stuff in this release for people to upgrade. I was hoping to have regular releases every month, but it just so happens that it's really hard for us to maintain that schedule. So it's usually, I think, a release every 3 or 4 months. So basically, when we feel, okay, we have enough stuff there that it's worth upgrading, and it feels that it's stable enough, like we had this version running for quite some time and nothing major came up, we make a beta release. And then after the beta release, which basically helps with all the early adopters who might deploy on prem and then find issues that we don't experience on the SaaS version, we make another round of fixes, and we make the final release.
[00:29:07] Unknown:
The other thing that I'm interested in on the business side is the overall business model that you have and some of the ways that it has grown or evolved since you first started the company, and just your overall lessons learned in terms of managing the business behind the product?
[00:29:23] Unknown:
Yeah. So when I started, I researched what business models people use for open source projects. And what I learned is that basically people are doing everything, and the bigger companies definitely do all the stuff, like support, SaaS, different versions, and all this stuff. But I took inspiration from what Sentry has been doing, which is basically a SaaS offering of the same code base you have on the open source side, which I really like because it's very simple. There is no conflict of interest. And I figured, yeah, let's do that. Now, because I'm bootstrapping and I really quickly burned all my savings on that experience, and SaaS takes time to ramp up, I was hustling for any stream of income at the beginning. So we do have a few companies that pay us for support. Those are the bigger users that reached out and really wanted someone to be able to answer their questions when they need to. And at the beginning, it felt like, wow, SaaS is such a terrible business model, and support is so much better, because we were making so much more money from support, and that's from 4 customers, versus the SaaS platform. But luckily, I was patient enough to wait. And today, SaaS makes up most of our revenue; something like over 90% of our revenue is coming from SaaS. And I definitely see the benefits. It's a very stable business model, in a way, especially when you deal with lots of smaller customers versus the big ones. So that was nice. Every year I've been telling myself, yeah, this year we're gonna introduce some offering for the enterprise users. Because, basically, all the big companies that use Redash, they use the open source version. And they use that not because they want to save money or anything.
They just use the open source version because they're not going to trust some SaaS vendor with their database. So it made sense to offer them something they can pay for, which isn't support. Because support is not, like, I want to be a software company and I want to sell software. I don't want to sell support.
I want to make my product so easy to use that people don't really need support. So every year I've been telling myself, okay, this year we're gonna introduce something for the bigger customers that deploy on prem. And every year, it was pushed back because we were so busy with building the product itself and working on the SaaS stuff. And over time, I think that what happened is that the world changed a bit. More and more companies are more comfortable with SaaS offerings. Now obviously, I don't know, Bank of America will not adopt a SaaS offering anytime soon. But that's fine. I don't really need to serve all the customers in the world.
And more and more companies are definitely willing to use a SaaS offering, even for things like their database access. So I'm not sure if we will ever have some kind of an enterprise offering. But on the other hand, you never know.
[00:32:45] Unknown:
It's definitely good that you held out with the SaaS approach because, as you said, you can scale it much more readily, and you're much less susceptible to customer churn if somebody drops off, versus if you have a smaller number of support contracts where you're gaining more revenue per customer, but if one of them then decides to go with a different solution or they go out of business or whatever the reason is that they no longer maintain that support contract, it's a much bigger hit to you, and then you have to scramble to try and find somebody to replace them. And I would imagine too that by having that direct support contract, it's a much bigger burden of time on your end versus somebody who comes and signs up for the SaaS platform and then they just use the aggregate support network that you have built around that product.
[00:33:29] Unknown:
So to be honest, and I hope that none of our support customers is listening, the support contracts have been great. I mean, they don't reach out that much and usually their questions are very reasonable, but I don't think that scales. Like, if we tried to scale that, eventually we would need to scale people instead of servers to handle the load. Well, sometimes people would like support just for their peace of mind of knowing, hey, if we ever have a question, someone can answer us. But sometimes it's needed because the product isn't easy enough, and I don't want to be there. I want to make it super easy. Like, if today you want to deploy Redash, you go to our website, click Get Started, go to our setup page. There are links to the popular clouds like AWS, Google, DigitalOcean.
We should probably add support for Azure. And a few minutes later, you have Redash running. That's something that we might not want to have if we were building our business around support and people having to reach out to us and stuff like that. I don't want to have this conflict.
[00:34:33] Unknown:
And yeah, that's definitely a good point: if your business is built around support, then you end up making it harder to actually use the open source product, which is never gonna benefit anybody because it will just create a conflict between you and your users, whereas you want it to be as friction free as possible to help adoption, so that if somebody comes in on the open source channel and then decides that they don't wanna actually be in the business of running their own server, they can just easily switch over to the SaaS platform. So, yeah, I definitely appreciate your clarity on that point. Yep. In terms of the uses of the platform, you mentioned that you've seen some people abusing it for various cases. I'm curious what you have seen as some of the most interesting or innovative or unexpected ways that people are leveraging Redash.
[00:35:15] Unknown:
So people are a bit shy about sharing how they use Redash, so I don't really have good visibility into that, but I do hear stories from time to time. I think the one that I'm most proud of is an organization that does cancer research and uses Redash to support their efforts. That's awesome. I really hope they are successful in whatever they do. And I guess the most unexpected usage is the French Navy. The people that do sea rescues, they use Redash to analyze their efforts, and that's really unexpected.
[00:35:49] Unknown:
And then when is Redash the wrong platform to use and somebody would be better suited by going with a different solution?
[00:35:55] Unknown:
So right now, the first thing is when you don't have anyone in the organization who knows SQL or whatever query language your database uses. It doesn't have to be SQL; most of the time it is. So if there is no such person in your organization, then, yeah, Redash is not a good fit. It doesn't mean that everybody needs to know SQL to benefit from Redash, but there has to be at least a single person. Another case is when you want to support self-service, and then you might want to choose Looker and invest in defining your data models and stuff like that. But you need to be really honest with yourself and make sure that you really need that full self-service BI thing. Because many times, there are still people who create the reports, and in that case, you could just go with Redash. I guess another case is when people are trying to use Redash as some sort of an admin tool. We actually do that ourselves, because it's very easy for us. It's already connected to the database, so we can create different views for our support use cases.
But eventually, if you really need an admin tool, you will be better served with a full CRUD tool like Retool or Forest Admin and stuff like that. Also, people sometimes use Redash for operations, like infrastructure stuff. Sometimes Redash can be a good solution there. Again, we use Redash for that as well, but we are a bit biased. But I guess that for these use cases, Grafana would probably be a better solution, especially if you connect it with some time series database that they support.
[00:37:34] Unknown:
And for the future of Redash, what do you have planned both in terms of the business and the open source project?
[00:37:41] Unknown:
So I try not to make commitments because life is surprising. Right now, we are focused on finishing the two big efforts of migrating to Python 3 and RQ on the back end. So that takes our focus, to make sure that we deliver a stable version on that end. And then on the front end side, to finish the React migration. Once that's done, we will finally be free to really dig into developing some stuff that's been waiting for a long time. We actually did deliver some new features this year, but mostly we've been focused on the React migration.
I guess that when we come back from that effort, we'll just have to review the kind of feedback we have and try to assess what's really the next thing. There is some interesting stuff that we really want to experiment with and deliver, but I'm really trying not to make commitments, because you can see in our GitHub tracker where we have this "make email reports" issue from 6 years ago and people are like, hey, when's that gonna be available? So I kinda learned: let's commit only to stuff that we're actually working on.
[00:38:47] Unknown:
Alright. Well, are there any other aspects of the Redash product or the business you've built around it or the overall business intelligence market that we didn't discuss yet that you'd like to cover before we close out the show? Oh, wow. That's a big one. No, I don't know. We covered lots of stuff. If you have any questions, I'm happy to keep discussing stuff. Otherwise, I'm good. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And so with that, I'll move us into the picks. And because we've been talking about a lot of things having to do with business intelligence and data and managing it, I'm going to pick my other show, the Data Engineering Podcast. So you can listen to interviews about a number of the different projects and tools and topics that we've been talking about here in a little bit greater depth. So I'll plug that again. So with that, I'll pass it to you, Arik. Do you have any picks this week?
[00:39:52] Unknown:
Yeah. So one pick is Peewee. It's like p, double e, w, double e. It's a Python ORM that we were using before we switched to SQLAlchemy. And it's probably one of the decisions that I regret the most. I wish we had stayed with Peewee. I think it's the most Pythonic ORM there is. It's really great engineering, super easy to use, and I miss it. Another one is Amazon ECS. Everybody seems to use either serverless or Kubernetes these days, and ECS is sometimes overlooked, but it really matured a lot in the past years. And it makes it very easy to have a very resilient infrastructure. And it really helps us sleep better at night.
[00:40:19] Unknown:
Alright. Well, thank you very much for taking the time today to join me and discuss your experience of building and managing the Redash project and the business that you've built on top of it. It's definitely a useful tool that I have been using for a while. So I appreciate all of your efforts on that front, and I hope you enjoy the rest of your day. Sure. You too. Have a great day, and thank you for having me. Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Sponsor Message
Interview with Arik Fraimovich
Origin and Development of Redash
Primary Uses and Market Position of Redash
Challenges and Lessons Learned in Business Intelligence
Collaboration and Data Visualization
Business Model and Open Source Community
Interesting Use Cases and Future Plans
Closing Remarks and Picks