Summary
Notebooks have been a useful tool for analytics, exploratory programming, and shareable data science for years, and their popularity is continuing to grow. Despite their widespread use, there are still a number of challenges that inhibit collaboration and use by non-technical stakeholders. Barry McCardel and his team at Hex have built a platform to make collaboration on Jupyter notebooks a first-class experience, as well as allowing notebooks to be parameterized and exposing the logic through interactive web applications. In this episode Barry shares his perspective on the state of the notebook ecosystem, why it is such a powerful tool for computing and analytics, and how he has built a successful business around improving the end-to-end experience of working with notebooks. This was a great conversation about an important piece of the toolkit for every analyst and data scientist.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to pythonpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s pythonpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
- Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1:1 video calls, a tuition-back guarantee that means you don’t pay until you get a job, resume preparation, and interview assistance there’s no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost, exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level.
- Your host as usual is Tobias Macey and today I’m interviewing Barry McCardel about Hex, a managed platform to turn your notebooks into collaborative, interactive data apps and stories
Interview
- Introductions
- How did you get introduced to Python?
- Can you start by describing what you have built at Hex and your motivation for starting the business?
- Who are the primary users of the Hex platform?
- How has that focus influenced your product direction and the features that you prioritize?
- What are the biggest roadblocks that you see data analysts and data consumers running into?
- How have those roadblocks shifted in recent years?
- What is it about the concept of a notebook that has caused them to see such a massive rise in usage and popularity?
- What are the barriers to productivity and accessibility that still exist in the notebook ecosystem?
- What are the pieces for working in and with notebooks that are still missing?
- What does Hex add to the experience of working with notebooks?
- Can you describe how the Hex platform is implemented?
- How has the design of the platform changed or evolved since you first began working on it?
- Where does Hex sit in the lifecycle of notebook creation and usage?
- How does it compare to other services built to support users of notebooks such as Zepl, Saturn Cloud, Noteable, etc.?
- You focus on the Jupyter platform, but there are a number of other notebook frameworks that have sprung up in recent years. What do you see as being the relative strengths of the available options?
- What are the trends in the tooling, capabilities, and use cases for notebooks that you are keeping an eye on?
- What are the most interesting, innovative, or unexpected ways that you have seen the Hex platform used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while building Hex?
- When is Hex the wrong choice?
- What do you have planned for the future of the Hex business and product?
Keep In Touch
- @TheRealBarryM on Twitter
Picks
- Tobias
- Barry
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
Links
- Hex
- Palantir
- IPython
- Jupyter
- Mathematica
- IDE == Integrated Development Environment
- nbconvert
- Observable Javascript Notebooks
- React
- BlueprintJS
- Papermill
- Streamlit
- Shiny
- Redshift
- Snowflake
- BigQuery
- PostgreSQL
- Noteable
- Saturn Cloud
- Zepl
- Zeppelin Notebooks
- JupyterHub
- Binder
- Kubeflow
The intro and outro music is from Requiem for a Fish by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it. So take a look at our friends over at Linode. With the launch of their managed Kubernetes platform, it's easy to get started with the next generation of deployment and scaling powered by the battle tested Linode platform. Go to pythonpodcast.com/linode, that's l i n o d e, today and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host as usual is Tobias Macey. And today, I'm interviewing Barry McCardel about Hex, a managed platform to turn your notebooks into collaborative, interactive data apps and stories. So, Barry, can you start by introducing yourself?
[00:01:08] Unknown:
Thanks for having me on. I'm CEO of Hex. Along with my cofounders, I started the company just over a year ago. I've been working hard on it ever since.
[00:01:16] Unknown:
And do you remember how you first got introduced to Python?
[00:01:19] Unknown:
I first started using Python in undergrad. I was doing some network science research projects and got introduced to Python and a little bit of R through some of the postdocs in the lab I was working in. And I really kind of fell in love with it. Thought it was a lot of fun. It was a really cool exposure to a different way to use code than I had in the past, which was mostly just, you know, hacking up websites and HTML. So I graduated, went into consulting. And as you do as a consultant, you kind of become an Excel jockey. I was doing all my work in spreadsheets, building these really complex, horrible models with lots of nested functions and VBA macros. And, you know, it was impressive work in its own way, but I was kinda pushing Excel to the limits. And this guy on my project said, well, we should, you know, fire up Python and do this there instead. And I was going, oh, wow. Yeah. That was really cool. And we kind of went off, kinda went rogue a little bit and wrote this little Python script to help with some of the work, but then, you know, there was no real way to share it and make it useful. And that was pretty frustrating, but it kind of reminded me how much I liked more technical data analysis work. And so I then went to Palantir, a data analytics company, and that's where I really got a chance to go very deep on using languages like SQL and Python to analyze data at scale.
It really is kind of what fully showed me the potential for Python as a language that's very approachable and great for data work, but, you know, also very powerful and flexible.
[00:02:49] Unknown:
And so now you have started this company, Hex, to solve some of the problems with sharing and collaboration in data work. And I'm wondering if you can just give a bit of an overview about what you've built there and the motivation for turning it into a business.
[00:03:04] Unknown:
Through the experience I just described, and then subsequently working at another startup, I got a lot of exposure to how, you know, myself and folks I was working with were working with data and how Python was starting to really become the lingua franca for a lot of that work for us. And one of the consistent frustrations and pain points we'd have is we would do really great work, whether it was, you know, just writing a script or, more and more, adopting notebooks, first as IPython notebooks (this was a little while ago) and then Jupyter when it came on the scene. And that really consistent frustration was around the ability to take the work we were doing and turn it into something that was useful, usable, consumable for other people.
And, you know, a good example of this was we'd have a data analyst who would go off and build this whole, you know, really, really interesting, thoughtful model in a Python notebook that would answer a key question that the business was having. And then, you know, it was great, really impressive work. The charts looked great. The outputs were really great. But then the only option for how to take this and turn it into something that the rest of the business could consume would be like, well, I'm gonna go screenshot all the charts and put them in a Google Doc. Or, like, I'm gonna build a deck every Monday that I'll send around. Or, you know, sometimes we would be decomposing our notebooks into these Python scripts that we'd run on a cron job that would, you know, dump the outputs of the model into Snowflake so we could look at it in Looker. And it was kinda these, like, Rube Goldberg machines, and none of this really felt like the way that we wanted to be able to work, to both collaborate internal to our team, but really critically take that work and turn it into something that was actually impactful for the rest of the organization.
And so what we built with Hex is a notebook based platform. It takes a lot of the things we love about notebooks and runs with them, and it focuses on sharing and collaboration, really addressing these core issues. And the really unique superpower it has is its built in ability to quickly build an interactive app. That app could be a dashboard. It could be a report. It could be a really, you know, complex tool. But you can quickly build this on top of the logic you've built and then publish it and share it with others, most critically nontechnical people or, you know, people who don't know how to come in and use a notebook, in a way that they can consume, use, and interact with, and really take your work and turn it into something really usable for other folks. And in some ways, this is the tool I kind of always wished I had. You know, I told you a little bit of my background, and Hex is really the platform that, if I had had it at some of these different points in my career, I think I would have done better and more useful work in some of my roles.
[00:05:33] Unknown:
One of the things that you were talking about before was using Excel as the way to do this data analysis at some of your consultancies. The reason that Excel has been able to maintain such popularity and widespread use is the fact that it is ubiquitous, in that anybody who works in an office, for the most part, is going to have either a Microsoft Office license or these days access to something like Google Sheets, or even LibreOffice. And just the concept of a spreadsheet is kind of one of the foundational computing primitives that everybody who learns how to use a computer gets introduced to. And the means of delivery is also the means of actually executing computation, and it's all bundled together, which I know is one of the problems that you just highlighted with notebooks, where the unit of computation is great for a developer or a data scientist who's familiar with how to set up their environment. But the means of actually sharing all that information is much more complex than just sending an Excel spreadsheet with all of the macros and all the logic embedded within it.
And I'm wondering if you can just call out some of the major points of friction that you've seen for people who are trying to use notebooks as this means of doing more sophisticated analysis in a business context and just the stumbling blocks that they run into, both in terms of gaining access to the data that they need, being able to perform the analyses, and being able to share and collaborate with the logic that they've built on top of it.
[00:07:06] Unknown:
You're hitting the nail on the head there, really. And I think you've seen a lot of people who are traditionally using spreadsheets for their work, or the types of workflows that people traditionally have sort of used and abused spreadsheets for, wanting to graduate into notebooks and languages like Python. And you get all these really great advantages. You get, you know, code that is easier to test, and logic that is less brittle than what you often wind up with in a spreadsheet. With a notebook, you can connect directly to data in a way that's really challenging to do in spreadsheets. You can really go deep on customizing your visualization. There's all these great things.
But the end result, like just a .ipynb file, can actually be worse than a .xlsx or, you know, the equivalent Google Sheet because it's not immediately consumable and usable by other folks. And I really think that's sort of the gap that we're trying to bridge, this question of, great, you wanna move your work out of the spreadsheet for whatever reason, but preserving that ability to share and communicate it in a way that's widely accessible. I think that's really, really important. And then there's another aspect of this too that, you know, Excel is kind of like a universal run time for data. As you mentioned, everyone has access to some type of spreadsheet program, and that makes it really easy to get into. Like, you know, everyone has Excel on their computer, or a Google Sheet that they can just sort of get into and start working with. And you see a lot of people that become Excel power users.
You know, they start off small and they really get deep, and it sort of has this wonderful accessibility and then a ramp up to the really high end. And I think the scientific computing world and, you know, if you think more specifically in this space, Python and notebook workflows, I don't think it's ever had quite that level of accessibility. Like, if you're, you know, a person off the street, until very recently, it's been very difficult to just get started with that. You know, you would need to set up Jupyter locally and, you know, what the hell is pip? And there are all these sort of friction points between you and actually writing some code to do something interesting.
And I think there's just sort of a base opportunity to open up and make the editing part of this more accessible in addition to making the fruits of that labor more easily shareable and consumable.
[00:09:15] Unknown:
And for what you've built at Hex, who do you view as the primary consumers of the platform, both in terms of the producers of analyses and also the people who are interacting with the applications that are being built?
[00:09:28] Unknown:
The producers are data analysts, data scientists who are working in primarily Python and SQL. You know, if they've been working in notebooks before, if they've been doing their work in these languages before, Hex is gonna feel just extremely familiar and very easy to get started with. And then later on, obviously, there's a whole set of new superpowers that we think are great. And that's really, you know, our core users. And I think the folks that we get the most impact with are those that actually then have some other stakeholders that they're working with or communicating with. So those could be a product manager. Those could be the CFO. Those could be a customer or client that you've built an analysis for and wanna be able to share it with. You know, one cool thing you can do with Hex is, you know, you take the logic that you've written in a notebook, and it can be a really very fast process to turn that into an interactive application that looks really nice, and then you hit publish, share.
All of a sudden, what you've shared with someone, what you built and sort of deployed to someone, is a really sharp looking interactive web app. And to them, you know, you didn't send them a notebook. You didn't send them a bunch of extra Python code. You just sent them something that's easy for them to use and interact with and understand. That can be really powerful for folks in a lot of different roles, that ability to actually be hands on with the results of this work now.
[00:10:43] Unknown:
And going back to the roadblocks that exist for people who are using notebooks and trying to be able to go beyond the limitations of just working in isolation on their machine with their locally set up environment. How have those roadblocks changed over the past few years from when notebooks were first introduced with IPython and IPython Notebooks, and now moving to the stage where notebooks have become one of the default platforms for computing and data analysis?
[00:11:14] Unknown:
It's an interesting question, how the roadblocks have changed. I think, in some ways, the roadblocks are pretty consistent between when I was first firing up IPython Notebooks on my underpowered ThinkPad to now. Like, there's some of these consistent frustrations around getting everything set up and environment management and connecting to data. It can be extremely clunky depending on the type of data and what you're trying to do. But then there's that last mile problem. I shared a little bit of my experience with that over the years before. One of the things that kind of triggered us really starting to experiment with what became Hex was this observation of, like, this has just been a problem, like, the whole time that I've been working with data. Like, this has been really consistent. It doesn't feel like it's really changed.
And so in some ways, it's the staticness, the persistence of those roadblocks that kind of pushed us into really examining this problem space and building what we built. And there's other roadblocks as well that have maybe gotten some more love. There's some that have gotten lost, but that core thing was really what fired us up and inspired us.
[00:12:21] Unknown:
In terms of the actual notebook concept, it has definitely been gaining a lot of ground recently, partly because of the rise in the necessity of working with data. But I'm wondering what you see as the other driving forces for why the notebook interface has gained such popularity and why it feels so intuitive and obvious, particularly for data use cases?
[00:12:48] Unknown:
From when that format first sort of emerged, I think they first popped up in Mathematica, which is sort of the classic mathematics, scientific computing platform. I think the reason is the format just feels natural for people. I think there's just something very fundamental about the way it helps you form a path of thought, the way it helps you format your work. I was even observing when I was going back and having to do something in a spreadsheet a few months ago that I almost found myself formatting it a little bit in the spreadsheet itself, almost notebook like, because it's just a very natural way for me to think. And then I think one of the things that really helped propel it initially was that a lot of the work being done in notebooks was academic work, especially where there's sort of a narrative or a story or an experiment you're trying to walk through. And so I think the format just feels really natural. And there's an interesting debate, like, I don't know how plugged in folks are to this, but sort of an interesting debate over notebooks. And there's been conference talks about how notebooks suck and then conference talks about how notebooks are great and notebook flame wars on Internet forums. And, like, I just think that it's a format that obviously works and makes sense for a lot of people, and we're excited to take it and extend it and think about where it can go and how it can get even better versus trying to, like, change people's religion and talk them out of using a notebook. You know, there may be other formats down the road that people discover and experiment with, but it's something that always felt natural to us and has clearly clicked with a lot of people. So we're sort of running with it and trying to add some of our own thoughts there.
[00:14:18] Unknown:
As you said, there have been people who have kind of pushed back against the idea of the notebook as being such a revolutionary or useful means of doing analysis or doing programming because of some of the limitations, particularly around things like software engineering best practices or repeatability of executing notebooks because of the fact that you can execute cells out of order. And there have been some notebook platforms that have tried to add constraints to that, and tooling built around Jupyter notebooks for being able to do things like turn them into Python source code and just extract all of that information out of the notebook to make it easier for versioning, or turning the notebook into an IDE environment. I'm wondering what your thoughts are on just the overall growth and directions of the ecosystem and how it is that the limitations of the notebook interface are outweighed by the utility of the platform, which has driven so many people to innovate to try to drive down some of that debt.
[00:15:18] Unknown:
Yeah. I mean, you're calling out some of the big things that are potentially problematic about notebooks or at least historically problematic about notebooks. You know, especially more novice users can get into some state weirdness. Yes. You know, if you do things in a certain way, it can be difficult to reproduce. Yes. It's not as nice for someone who's maybe used to, like, a full featured IDE in terms of features. And these are all true. I guess when I've thought of this historically, it's been like, these are true, but they're also easy to offset with some combination of best practices and thoughtful features. And I don't think that these limitations of notebooks are necessarily set in stone. And so I think some of the extensions and plugins that have emerged in the Jupyter open source ecosystem can be really helpful with this.
And then in our own way, we're trying to contribute to this as well. And so, like, one thing with Hex is that the logic itself, when someone's running a Hex app, executes top to bottom every time. And because of that, the state is predictable, and it also sort of forces you as the editor, the creator, to know that your code can run top to bottom every time. And because we build that in, you kind of take some of those issues and those pain points and strip them away a little bit or ameliorate them in a way that, you know, I think makes it a format that works for what people are trying to accomplish. And so I don't subscribe to the idea that these are, like, immutable problems. I think there's thoughtful ways to approach them. And, you know, there's the don't throw the baby out with the bathwater adage, and I kinda feel like that's what people wind up doing, like, hey, there's this issue with how people use notebooks, or novice users can get themselves into trouble this way.
And it's true, but it doesn't make the format invalid. It doesn't mean it's not useful for people. There's clearly a lot to like. And so we are very much of the mind that we wanna, like, embrace it, push it forward, contribute some of our own ideas, contribute back to a lot of the energy in the ecosystem versus, like, try to change people's minds about whether this is a useful format or not.
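To make the out-of-order state problem and the top-to-bottom remedy concrete, here is a minimal sketch. It is not Hex's implementation: the cells are modeled as plain source strings, and the helper name `run_top_to_bottom` is a hypothetical chosen for illustration.

```python
# Hypothetical illustration: "cells" are plain source strings, not a real notebook model.
cells = [
    "threshold = 10",
    "values = [4, 12, 25]",
    "selected = [v for v in values if v > threshold]",
]


def run_top_to_bottom(cell_sources):
    """Execute every cell in order in a fresh namespace and return that namespace."""
    namespace = {}
    for source in cell_sources:
        exec(source, namespace)
    return namespace


# Interactive session: the author re-runs cell 0 with a new value
# but never re-runs cell 2, so `selected` is stale.
session = run_top_to_bottom(cells)
exec("threshold = 20", session)
stale = session["selected"]

# Re-running everything in order from a clean state gives a consistent result.
edited = ["threshold = 20"] + cells[1:]
fresh = run_top_to_bottom(edited)["selected"]

print(stale)  # [12, 25]
print(fresh)  # [25]
```

The stale result only exists because state survived between partial runs; starting from an empty namespace on every run removes that class of surprise at the cost of recomputing everything.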
[00:17:22] Unknown:
Another interesting element of this space is that the concept of notebooks, as you said, has been around for a while with things like Mathematica and Wolfram Alpha, and Jupyter has become kind of the canonical representation of notebooks. And particularly in the Python ecosystem, if you say notebook, then most people just assume you mean Jupyter. But there have been a number of other platforms that have been developed for various use cases of notebooks, either for language support in different ecosystems or better integration with certain platforms. I'm wondering if you can just give a bit of an overview of your perspective of the overall ecosystem of notebooks and what your particular affiliations are with Hex in terms of what platforms you're aiming to support.
[00:18:07] Unknown:
I think the ecosystem's been very cool to watch grow. And as I mentioned, I was sort of a user in the early days of the project that became Jupyter, so IPython Notebooks. And just seeing the amazing progress of how that format and that ecosystem has grown has been really cool. You know, JupyterCon was a couple months ago, and there's just a lot of energy there. I think for us, I predict Jupyter will always have a place, and that ecosystem and compatibility with that ecosystem has been really important to us. And so we have full import export compatibility with .ipynb files. So in that way, you know, we're kind of affiliated.
We actually sponsor the Jupyter project, but we're not formally affiliated. But we are most closely associated with the Jupyter format. So, you know, work that's done in Jupyter can come into Hex really easily and vice versa. The interesting thing, though, is, as you alluded to, there's been a lot of extending that popularity of the notebook format to other places. Like, Observable's a really cool example, for folks who haven't seen that, of a notebook that is JavaScript based, or it's based on a fork of JavaScript that they actually created, that really contributes some really unique ideas to how state and reactivity can work. And, you know, just by virtue of it being JavaScript based, it sort of appeals to a very different set of users and a very different set of workflows.
And so it's cool to see folks take that format and contribute some of their own ideas and stretch it and push it into different places. I just predict we'll see a lot more of that, and I'm, you know, excited to both be a part of that and excited to sort of watch the ecosystem unfold.
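For readers unfamiliar with the .ipynb format mentioned above: it is a JSON document with a list of cells, which is what makes import and export between tools practical. The following is a minimal sketch using only the standard library; the sample notebook content and the helper name `code_cells` are made up for illustration, and real tools would typically rely on the `nbformat` package instead.

```python
import json

# A tiny notebook in the nbformat 4 layout, built inline so the example is self-contained.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Revenue model"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["rate = 0.2\n", "revenue = 100 * rate"]},
    ],
})


def code_cells(raw):
    """Return the source text of each code cell, in document order."""
    nb = json.loads(raw)
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # "source" may be stored as one string or as a list of lines.
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources


extracted = code_cells(notebook_json)
print(extracted[0])
```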
[00:19:41] Unknown:
Digging more into HEX specifically now, can you give a bit of an overview about how the platform is implemented and some of the core features that you have built on top of notebooks and just some of the ways that the platform has evolved since you first began working on it?
[00:19:57] Unknown:
Yeah. So in terms of implementation, it's a sort of full stack web app that we've built. We wrote our own front end using React, JavaScript, and Blueprint, which is a UI library that some of us actually contributed to in the past. And, you know, our front end notebook editor, we think, takes a lot of the best of what we've been able to appreciate in some other tools and adds some of our own niceties. And there's some great things there in terms of, you know, type ahead and better shortcuts and all sorts of stuff. And then there's our own back end server, which can run in our environment. So, like, our hosted SaaS Hex Cloud environment that we fully manage. It can also run in a customer's environment, so your data never leaves your environment. We can get more into how we thought about that a little bit later. And then there's a compute cluster, which is effectively a pool of Jupyter kernels. So we're actually using the kernel part of the Jupyter open source system.
Under the hood, all of this, as I mentioned, can be deployed and used either as part of our Hex Cloud hosted offering, or it can be fully deployed to a customer's environment, which is great if you have really sensitive data or you want more control over the types of environment that Hex is running in. It's great if you need that level of flexibility.
[00:21:14] Unknown:
Python has become the default language for working with data, whether as a data scientist, data engineer, data analyst, or machine learning engineer. Springboard has launched their School of Data to help you get a career in the field through a comprehensive set of programs that are 100% online and tailored to fit your busy schedule. With a network of expert mentors who are available to coach you during weekly 1 on 1 video calls, a tuition back guarantee that means you don't pay until you get a job, resume preparation, and interview assistance, there's no reason to wait. Springboard is offering up to 20 scholarships of $500 towards the tuition cost exclusively to listeners of this show. Go to pythonpodcast.com/springboard today to learn more and give your career a boost to the next level.
And for people who are using Hex, can you talk through the overall workflow of going from an idea of I need to be able to perform this analysis to delivering the end application of I've performed this analysis, here's an interactive web app that you can use to be able to drill down into the details of it, and just where Hex sits in that overall life cycle of that project.
[00:22:28] Unknown:
To start, with Hex, you can either start a new project from scratch, or you can import an existing .ipynb, you know, Jupyter notebook file. Then you can edit it just like you could any other notebook. So, you know, one of the really cool things that we built is real time collaboration. So if you've used Google Docs or Figma or any of these products that have real time multiplayer editing, we fully support that. We have a pretty cool way of implementing that, where, if you're editing a cell, you kind of apply a lock to the cell so no one else can step on your work or, you know, conflict with your work. So we think that kind of uses that notebook cell based UI in a cool way. You can leave each other comments, and there's actually version control workflows. So it just makes the overall collaboration part of working together on these files much more, we think, intuitive and easy.
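The per-cell locking idea described above can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: the class name `CellLocks`, its methods, and the user and cell identifiers are all hypothetical, and a real multi-user system would also need persistence, timeouts, and network coordination that this sketch leaves out.

```python
class CellLocks:
    """Tracks which user, if any, is currently editing each cell (illustrative only)."""

    def __init__(self):
        self._owners = {}  # maps cell_id -> user holding the lock

    def acquire(self, cell_id, user):
        """Give `user` the lock unless someone else already holds it."""
        owner = self._owners.setdefault(cell_id, user)
        return owner == user

    def release(self, cell_id, user):
        """Release the lock only if `user` is the current holder."""
        if self._owners.get(cell_id) == user:
            del self._owners[cell_id]
            return True
        return False


locks = CellLocks()
first = locks.acquire("cell-1", "ana")    # ana starts editing cell-1
second = locks.acquire("cell-1", "ben")   # ben is blocked on the same cell
other = locks.acquire("cell-2", "ben")    # a different cell is still free
locks.release("cell-1", "ana")
retry = locks.acquire("cell-1", "ben")    # ben can take cell-1 once ana is done
print(first, second, other, retry)
```

Locking at the cell level keeps conflicts local: two people can work in the same notebook at once as long as they are in different cells.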
And then, you know, once you've gotten your work in a good place, you've written some logic, a model, an analysis that you're proud of, you can easily parameterize it. So you can literally add visual widgets: dropdowns, sliders, buttons, tables. There's all sorts of these visual components that we've built that help you parameterize your work and effectively start layering a UI on. And then you can compose those elements, the inputs and outputs of your notebook, into a web app. We offer a couple of different layout types depending on if you're trying to build, like, a report that's more linear and kind of embraces the notebook linearity, or if it's much more of a dashboard. Like, hey, I wanna be able to free-form arrange, you know, these charts next to each other and these dropdowns here.
And then you literally just click publish, and it effectively deploys this as an interactive web app. And you just go into the share dialogue, and you can share it with your whole organization or individuals and give them different levels of permissions. And then as I mentioned earlier, the net effect for a nontechnical person, or someone who, at the very least, you know, wouldn't be coming in and editing a notebook, is that they just got shared on an interactive web app. Like, to them, there's no setup, there's no overhead, there's no kernel to figure out what the hell is going on with. Like, you know, they move a slider, the logic reruns, they see updated outputs. It's pretty intuitive. And so we really think we're, like, the fastest, best way to go sort of end to end there, from the building and collaborating on the work phase to something that's fully deployed and useful and usable for other folks.
[00:24:47] Unknown:
One of the interesting aspects of what you've built at Hex that I've been able to gather from looking at the site and looking at the demo is the ability to parameterize the notebooks and turn certain variables into dynamic attributes that you can modify through dropdowns or input fields. And I'm wondering if you can talk through some of the ways that you've implemented that and how it compares to the capabilities of things like Papermill.
[00:25:15] Unknown:
Yeah. Papermill is a really cool project. And I think of Papermill when I think of, like, wanting to take a notebook and parameterize it to sort of have it be something that's running maybe as part of a pipeline or something that's run, for lack of a better term, kind of headlessly. I think that's mostly where the sort of parameterization that Papermill does really shines. We've taken a very different approach, which is that the parameterization you do in Hex is really tailored around building these sort of interactive experiences. And so, you know, if you watch the demo, you can literally go in and drag-select a piece of a string or a numeric or whatever, right-click it, hit add parameter, and basically replace a static value. Like, if you're doing something that drills down by state and you had California written in, you know, you could turn that from something static into a dropdown or into a free text field. And those parameters can be used, you know, throughout your work. There's a lot of flexibility there. So you have, like, you know, free text fields or dropdowns or sliders. You also have table inputs, which is really cool because it kind of starts eating into some of those more spreadsheet-like workflows where maybe your set of inputs isn't as simple. They're tabular, or maybe it's a forecast workflow where you want to present users the, you know, sort of base case assumptions, but then they can override them, and the logic will rerun and show the new forecast based on their new inputs.
And so there's a lot of flexibility and richness we sort of have layered onto that ability to parameterize your work. And it's just a really fast, really easy way to sort of effectively build a UI on top of your logic. You know, for us, that always felt like the missing piece of what we were trying to accomplish with our work.
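To make the "replace a static value with a parameter" idea concrete, here is a minimal sketch in plain Python. It does not use Hex's or Papermill's APIs; the `DropdownParam` class, the `ORDERS` sample data, and `total_for_state` are hypothetical names invented for illustration. The point is only that the analysis logic takes the state as an input, so changing the selection reruns the same logic with a new value.

```python
from dataclasses import dataclass

# Hypothetical sample data standing in for a query result.
ORDERS = [
    {"state": "California", "amount": 120.0},
    {"state": "California", "amount": 80.0},
    {"state": "Texas", "amount": 50.0},
]


@dataclass
class DropdownParam:
    """Hypothetical stand-in for a dropdown widget."""
    name: str
    options: list
    value: str

    def select(self, choice):
        # Reject anything that is not one of the offered options.
        if choice not in self.options:
            raise ValueError(f"{choice!r} is not a valid option for {self.name}")
        self.value = choice


def total_for_state(orders, state):
    # The logic that used to have "California" hard-coded now takes it as input.
    return sum(o["amount"] for o in orders if o["state"] == state)


state = DropdownParam("state", sorted({o["state"] for o in ORDERS}), "California")
california_total = total_for_state(ORDERS, state.value)

state.select("Texas")  # the user picks a different option; the logic reruns
texas_total = total_for_state(ORDERS, state.value)
```

In a headless, pipeline-style run, the same function could simply be called with a value supplied by the scheduler instead of a widget.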
[00:26:52] Unknown:
In terms of the capabilities of turning your analysis or your Python logic into an interactive web app, another project in that space that I'm curious to get your take on is Streamlit. And I'm wondering if you can give some compare and contrast between the capabilities that they provide and what you've built with Hex.
[00:27:10] Unknown:
Yeah. I think Streamlit's really cool. I used Shiny much earlier in my career and, you know, liked it for a lot of reasons and didn't for others. Shiny's a similar project in the R ecosystem. And I think Streamlit's sort of taken that and built a really neat version of it in the Python world. I think we do have pretty different focuses. I think, you know, maybe the net thing you can wind up with at the end might be similar. But, you know, as one big example, we sort of embrace notebooks, and we have an editor that's directly in our platform. We have a lot of other features, like built-in data connections. And I mentioned, you know, environment management and real-time collaboration. And so we're a little more end to end. We're a little bit more of a full workflow, which is great, we think, for a whole number of reasons. Streamlit's also really cool in that it's this very focused point thing that is easy to pull into an existing project. So, you know, we have different focuses, but I think they've done a great job. And it's cool for me to see the sort of richness and how dynamic, you know, the different projects in this ecosystem are. I think, for us, it's largely validating that this is a real problem we're solving and that there's probably a lot of different approaches that can work.
[00:28:19] Unknown:
And another interesting aspect of the problem that you've taken on is the aspect of dependency management and access to external systems and integration with the overall ecosystem of the data platform. So I'm wondering if you can talk through some of the ways that you handle those challenges.
[00:28:37] Unknown:
Yeah. This is one of the things that was, you know, among the list of frustrations I had personally in the past working with notebooks and doing data analysis. You know, one of the things that was always frustrating was how you wind up connecting to data. And it's like, you know, everyone's got their notebook locally, everyone's got maybe some, like, templatized, you know, SQLAlchemy connection, or whatever library you're using, that you're setting up, and then you're writing SQL in these, like, triple-quote strings. You know, it feels a little hacky while you're doing it in terms of how you're actually connecting to data in the notebook. And we thought that was a really good opportunity both to have a better workflow for creators, but also it's really important, if you're in the business of taking this work you're doing and publishing it as a web app, that you have a good story on being able to securely and reliably connect to the data that's sitting behind it. And so, you know, on the back end, we have a really cool data connection service that sets up data connections. It actually has some cool caching functionality, so you have control as a user over when a query refires versus when you're just using a cached version. You know, if you have a query that takes 5 minutes to run, you don't want that to rerun every time a user moves a slider. So this lets you have a lot of control. You can schedule updates of queries.
And then on the front end, in the notebook, we have SQL cells, which are just a much cleaner, more pleasant way to actually write your queries against these data sources. And then those SQL cells feed their outputs in as data frames that you can then just use in the rest of your work as you would any other data frame in Python. And so it's a very intuitive, much easier way to connect to and work with data. And, you know, one big change in the course of my career working in data is just the explosion of people embracing these cloud data warehouses like Redshift and Snowflake and BigQuery, and people running Postgres.
And so Hex just makes it super, super simple to connect to those and take advantage of all of this rich data that people have now.
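The caching behavior described above, where a slow query should not refire on every slider move, can be sketched with a small time-to-live cache. This is only an illustration under stated assumptions, not Hex's data connection service: the `CachedQuery` class and its `ttl_seconds` and `force_refresh` parameters are hypothetical, and an in-memory SQLite table stands in for a cloud warehouse.

```python
import sqlite3
import time


class CachedQuery:
    """Hypothetical wrapper that reruns a query only when its cache is stale."""

    def __init__(self, conn, sql, ttl_seconds):
        self.conn = conn
        self.sql = sql
        self.ttl_seconds = ttl_seconds
        self._rows = None
        self._fetched_at = None
        self.run_count = 0  # how many times the query actually hit the database

    def rows(self, force_refresh=False):
        now = time.monotonic()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at > self.ttl_seconds
        )
        if force_refresh or stale:
            self._rows = self.conn.execute(self.sql).fetchall()
            self._fetched_at = now
            self.run_count += 1
        return self._rows


# In-memory SQLite database standing in for a warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("California", 120.0), ("Texas", 50.0)],
)

query = CachedQuery(
    conn, "SELECT state, amount FROM orders ORDER BY state", ttl_seconds=300
)
first = query.rows()                        # runs the query
second = query.rows()                       # served from the cache
refreshed = query.rows(force_refresh=True)  # explicit refire
```

The returned rows play the role of the SQL cell's output; in a notebook they would typically be loaded into a data frame before being used downstream.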
[00:30:31] Unknown:
To your point of SQL, another interesting complexity there is how you manage things like mitigating SQL injection attacks, since you are exposing these applications to end users over the web.
[00:30:45] Unknown:
I think, generally, when you're building a product like we are, how you're thinking about security is really important. Like, not only are we building a cloud app that, you know, is on the Internet, whatever, but you're letting users run arbitrary code. So you have to be really thoughtful of what you're able to do there. So there's sort of the obvious best practices around input sanitization. Then there's also a lot that we do with the kernels themselves and how we have everything containerized to make sure that there's not, you know, undue risk that we're exposing ourselves or our users to. So we've put a lot of thought into that. And I think in general, you know, security is something that is really table stakes for anyone who wants to do interesting work in the data space, and it's something we focused a lot on. And so, yeah, like, SQL injection attacks are among the many things that we had to be very thoughtful about early on, making sure that we were avoiding.
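One widely used defense against SQL injection is to bind user input as a query parameter rather than splicing it into the SQL text. The sketch below shows that general technique with Python's built-in `sqlite3` module; it is not a description of how Hex handles user input, which the conversation only summarizes as input sanitization plus containerized kernels. The table and the `orders_for_state` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("California", 120.0), ("Texas", 50.0)],
)


def orders_for_state(conn, state):
    # The "?" placeholder binds `state` as data, so quote characters in the
    # input cannot change the structure of the query.
    return conn.execute(
        "SELECT state, amount FROM orders WHERE state = ?", (state,)
    ).fetchall()


normal = orders_for_state(conn, "California")

# A classic injection attempt is treated as a literal (and unmatched) state name.
malicious = orders_for_state(conn, "California' OR '1'='1")
```

Had the same input been concatenated into the SQL string, the `OR '1'='1'` clause would have matched every row in the table.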
[00:31:36] Unknown:
Going back to the broader level of what you've built at Hex, and the fact that it is a platform to help drive forward the use cases for notebooks and improve the accessibility and scalability of their use by an individual or across an organization, I'm wondering if you can give a bit of perspective as to how it compares to some of the other systems and platforms that have come up for handling notebook use cases. I'm thinking in terms of things like Zeppelin notebooks, or Noteable from some of the folks from Netflix where they were using Papermill, or Saturn Cloud, or things like JupyterHub and, you know, the Binder ecosystem.
[00:32:18] Unknown:
As you said, there's a few of these floating around, and I think it's really cool that there's so much energy in this space. Like, for us, that means, you know, it's a very interesting place to be working in, with a lot of cool ideas flowing in. We know folks at several of those teams. I think for us, we just felt like there was a whole set of use cases and users that weren't being served well, and a vision we had around the type of platform we wanted to build long term that I think even winds up transcending just, you know, hosted notebooks and what you could sort of think of in that scope. We have a lot of things we wanna do and try out and a lot of ideas that we think could make these types of data analysis workflows a lot better. And so I think you could sit down and do, like, a feature-by-feature comparison against different platforms, but, you know, we had used some of the things you mentioned in the past and felt like there was still room to experiment and try different things. But just to say again, you know, generally, I think the fact that there's so many people building and trying out new things in the space is great, and we love seeing more smart teams working on more interesting projects around this.
[00:33:20] Unknown:
In terms of the aspects of the ecosystem that you're keeping a close eye on as an individual user of notebooks, what are the trends in the tooling or capabilities or use cases for notebooks that you're most excited for?
[00:33:41] Unknown:
There's some projects happening in the Jupyter open source world that I think are really cool for specific use cases or needs. I mean, one that's kind of neat, you know, there's folks trying to build real-time collaboration into Jupyter. And that's something we've built in Hex. We know that's very challenging to do. And I think seeing people try to bring that into, like, sort of a legacy platform is very neat. And then I mentioned it earlier, but there's cool examples of taking that sort of notebook format and extending it to other spaces. Like, again, I think Observable is a really cool example of that if you're working with JavaScript. And so just in general, I think it's neat to see the sort of diversity and breadth of projects in the ecosystem. And there's a lot that I'm keeping my eye on, you know, either ones I'm really cheering to succeed because I think they're smart ideas that solve problems that maybe we're not getting to, or other projects that we kind of look to for inspiration and for ideas. And so it's a fun space to be part of right now.
[00:34:33] Unknown:
And in terms of the users of Hex, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:34:42] Unknown:
Yeah. One of the awesome parts about building a product that is so flexible and powerful is you see all of the really interesting ways and directions people take it. We have a kind of funny thing we've said for a while now, which is, like, "life finds a way." You know, if you have a really smart, motivated user who's trying to accomplish something, they're gonna find a way to do it. And so we've seen people take Hex and really run with it in a lot of interesting ways. A good example is someone taking this Google Sheets workflow where they, you know, had some Jupyter notebooks, and there's a step where they're exporting things into CSV and then putting it in Google Sheets and having other people go in and update and annotate the data, and then taking that and exporting it back to a CSV and uploading it back into Snowflake. They were basically able to take the whole thing and move it into Hex to build this cool workflow around it. And, you know, we obviously can't see what they're doing, didn't know what they were doing. I mean, this user sort of reached out with an issue and showed us what they were working on. I was like, oh my gosh. Like, I had never thought of this type of use case before.
And it's just cool to see how people extend things. And I think it's a challenge in some ways, right? Because you see users go and find edge cases or things that you hadn't thought of. But I think one of the most fun parts of building a product is letting your users sort of guide you and help you understand the ways in which your product is valuable that you may not have even thought of. And it's a real joy and also quite humbling to see people do that.
[00:36:09] Unknown:
And in your experience of building the business and building the technology for it, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:36:18] Unknown:
I think the big one that comes to mind here is the importance of security. I wouldn't say it's necessarily a surprise. We have a lot of interest in and exposure to this from our backgrounds, building software at Palantir and building tools in the health care space, which is what a couple of us were doing just before we started the company. And as I mentioned earlier, I think it's really table stakes to have invested deeply in security and to be able to give customers and users confidence that their work and their data are being taken good care of. And I think it's been important for a long time. I think people are only getting more mindful of this.
And so we've invested a lot here, I think, and embraced it not just as, like, a check-the-box thing, but as a true superpower that we wanna have around being a very secure, very reliable platform, you know, whether users are using it in our environment, in Hex cloud, or it's deployed to their environment. And so it's been a challenge, but it's one we embrace very eagerly and are just gonna continue to focus on over time.
[00:37:20] Unknown:
And for people who are using notebooks and they're considering ways that they want to extend their usability, what are the cases where Hex is the wrong choice?
[00:37:29] Unknown:
There's definitely a few. I think we have not focused to date on workflows that are really centered on very large deep learning, you know, sort of machine learning training workflows. You can do a lot of that in Hex. We have folks who do that in Hex. But if you start getting to a point where you want, like, GPU acceleration or a lot of fine-tuned control over the exact compute environment, we just haven't spent a lot of time on that yet. That's just not the type of problem that we thought we had a lot to add to early on. We'll add more of that over time. But right now, I typically point folks elsewhere, where I think there's a lot of interesting projects, like Kubeflow and others, that I think do a really great job with that. You know, some of those workflows we'll build in over time, but for others we'll probably just really eagerly integrate and look to partner with other tools. We don't feel like we have to be everything to everyone, which is important when you're building a product: to have a sense of, like, the types of problems you don't necessarily need to solve right away.
[00:38:24] Unknown:
And as you look to the future of the Hex platform and the business, what are some of the things that you have planned for the near to medium term?
[00:38:32] Unknown:
Oh, there's a lot. We just went through planning for the next few months. I think some of the things that really excite me are about just deepening the sharing and collaboration aspects. I think there's a lot more we wanna do in terms of helping users create more customization and richness in the apps they're building. There's a lot we want to do around helping people collaborate better. So, you know, one example is allowing you to share data connections across projects and sort of have shared secrets across projects. It just makes it very easy for people to utilize shared assets and resources. I think there's some really cool stuff we've been thinking about in terms of allowing people to share code and code snippets and have a library of those things that are useful across an organization. If you look, there are a lot of hacky ways people do that type of thing today.
And then there's a lot that we have in mind for starting to really scale up, including launching more flexibility around environment configuration and compute. So there's a lot to be done in this product space, and that's really what excites us. And one of the fun things, as I mentioned earlier, is just seeing what our users do and getting feedback from them. And so, you know, we ship code multiple times a week and, you know, almost everything we work on is something that's being directly asked for by our users. And so in some ways, we're really letting them guide our roadmap as much as we can, like, preconceive what it is. And then in terms of the business more generally, we've been growing our team. I don't know if there's folks listening that are interested in the space that are either engineers or are interested in sort of evangelism around products and data workflows.
There's a lot of folks that we know we'll need to partner with to build the business and take the product to where we want. So I'd love to talk to folks who are interested in that.
[00:40:14] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. And this week, I'm going to choose a couple of things. The first is a tool called FlakeHell that's a wrapper for Flake8 that makes it easier to manage the various plugins that you're using and manage the rules, and it lets you use the pyproject.toml file as your configuration location. So it just streamlines a lot of the workflow around linting and code quality in your projects. So I definitely recommend checking that out. And also, I've recently been revisiting some of the movies in the DC Extended Universe, watching them with my kids. So that's been a lot of fun. So if you're looking for something to keep you entertained, it's definitely worth checking out some of those. And so with that, I'll pass it to you. What do you have for picks this week?
[00:41:02] Unknown:
My pick this week is this board game called Wingspan. It is a really beautiful game with hand-drawn illustrations of birds. It's an engine-building game. And I just find it so delightfully zen and really fun to play. And it's one of those games that's competitive. Like, there's a winner, but, you know, you're not fighting each other over resources. You're really sort of doing your own thing. And so I think that's a lot of fun for playing with family. And I've been playing it a bunch with my family members and really loving it. So that's my recommendation for the week.
[00:41:33] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing at Hex. It's definitely a very interesting product and solving some of the problems that still exist in the overall notebook ecosystem, which is, as we've discussed, a growing and very important field. So I appreciate all the time and effort you've put into that, and I hope you enjoy the rest of your day.
My pleasure. Thanks for having me on.
Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast at dataengineeringpodcast.com for the latest on modern data management.
And visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Barry McCardel's Background and Introduction to Python
Early Career and Consulting Experience
Motivation Behind HEX
Challenges with Traditional Data Sharing Methods
Excel vs. Notebooks in Data Analysis
Primary Consumers of HEX
Evolution of Notebooks and Roadblocks
Popularity and Intuitiveness of Notebooks
Limitations and Innovations in Notebooks
HEX's Affiliation with Jupyter and Other Platforms
Implementation and Core Features of HEX
Workflow from Analysis to Interactive Web App
Parameterization in HEX vs. Papermill
Comparison with Streamlit
Dependency Management and Data Integration
Comparison with Other Notebook Platforms
Trends and Future Directions in Notebook Ecosystem
Interesting Use Cases of HEX
Lessons Learned in Building HEX
When HEX is Not the Right Choice
Future Plans for HEX
Contact Information and Picks