The Statistical Kings of Comedy | Stats + Stories Episode 348 / by Stats Stories

Sachin Date works for VitalEdge Technologies and has, over his career, worked in two research labs, three software companies including two product companies, and in a classroom. He has built and delivered all kinds of software including massively distributed discrete-time simulations, data science stacks, a new programming language, and dozens of mobile apps, including the world’s first Napster app for Blackberries. Along the way, Sachin taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch.

Episode Description

A journalist, statistician and sound engineer walk into a bar. Well, actually, into a studio to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? That’s the topic of this week’s episode of Stats+Stories with guest Sachin Date.

Full Transcript

John Bailer
A journalist, statistician and sound engineer walk into a bar…well, actually, into a studio, to record a podcast. Comedians have been a source of great amusement and delight over generations. Popular comedians can earn a great deal from their live shows. In 2023, Billboard reported that Kevin Hart earned 67 and a half million dollars from 82 shows with 631,000 tickets sold. Comedies are also a popular genre for television and movies. One of the most successful shows, Seinfeld, created by Jerry Seinfeld and Larry David, ran from 1989 to 1998. Have you ever noticed an echo of one of your favorite comedians from the past in the work of a comedian today? Who may have influenced Seinfeld or David? How would you know? Stay tuned, and you will get your question answered on this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm John Bailer. Stats and Stories is a production of Miami University's departments of statistics and media, journalism and film, as well as the American Statistical Association. Joining me is regular panelist Rosemary Pennington, chair of the department of media, journalism and film at Miami University. Our guest today is Sachin Date. Date works for VitalEdge Technologies. His career has included work in two research labs, three software companies, including two product companies, and in a classroom. He has built and delivered all kinds of software, including massively distributed discrete-time simulations, data science stacks, a new programming language and dozens of mobile apps, including the world's first Napster app for Blackberries. I remember Blackberries and Napster too. Along the way, he has also taught 100 liberal arts majors how to program in BASIC and built a mobile applications practice from scratch. Date’s recent Significance article on how Shakespeare influenced Seinfeld provides the background for our conversation today. Thank you so much for joining us today.

Sachin Date
Thank you for having me, John.

John Bailer
So what inspired you to embark on this project?

Sachin Date
So I didn't actually start with the intention of establishing the patterns of influence between specific comedians and their influences. What really happened was, I was browsing through the Wikipedia pages of some of the comedians I follow, and I quickly discovered that a lot of these pages have material on them that seems to indicate that the comedian was heavily influenced by other comedians, and sometimes not necessarily other comedians, but also writers and a lot of other, you know, kinds of people, like family members and friends and so forth. So I clicked on the links of some of these influences, particularly the influences that came from other comedians, and I discovered that the Wikipedia pages of those influencers also contained information about who influenced them. So I clicked on those links. And then I kind of kept on going back in time, until I ran into Wikipedia pages of writers in the 18th century, 17th century, 16th century. At one point, I opened the Wikipedia page of William Shakespeare, and I realized that I had basically followed the links through from someone who is alive today in the 21st century, and then kind of transported myself back in time all the way to William Shakespeare. So that made me wonder, well, how common is this pattern? Are there other comedians who also have influence data listed on their Wikipedia pages? So I kind of started clicking around, and I discovered that a lot of comedians actually have this kind of data on their Wikipedia pages. Additionally, the Wikipedia pages of very influential comedians like Richard Pryor, for example, or George Carlin, have legacy sections on them which contain information about whom they have influenced. That's kind of part of their legacy. So there are those backlinks also to be followed. So I figured, well, let me actually see if I can do a systematic study of this topic. 
But when I started doing that, I realized that, well, the number of comedians involved is very big. Wikipedia itself has about, I think, 50 to 100 different categories devoted to comedy. So I figured, well, let me kind of just put a circle around my research. I'll focus only on the comedians who are the most popular contemporary comedians in America today, and then I'll start tracing the links back from that set of comedians. And let me see how far back in time those links go and how widespread they get. And that's kind of, you know, what motivated the research on that topic.

Rosemary Pennington
How did you determine who were the top comedians working today?

Sachin Date
Yeah, so I was interested in a way of finding that information. What I thought I would do, and it actually worked remarkably well, was that I ran a couple of, well, actually, I ran three pretty straightforward Google searches. So the search text basically went: most popular American comedians in 202X, where X was either one or two or three. So basically, the most popular American comedians in 2021, 2022, 2023. I figured, well, the last three years could be considered as kind of the window for the most popular contemporary comedians. So sure enough, Google showed a lot of search results. So I tweaked those results by setting the time frame filter to include only the results that were published in the October through December timeframe. As soon as I did that, that brought forth results that were really more focused toward the end-of-the-year rankings and lists and ratings that were available on the internet. And then I started going through those results, and sure enough, there was a large amount of diversity in there. So for each one of those three years, 2021, 2022, 2023, what I did was I essentially identified about 10 different types of sources, and I tried to keep those sources as different from each other as possible, just to kind of, you know, reduce the bias and improve the diversity in the data. So that gave me essentially a mass of comedians to work with, and then I merged that data, and then kind of arrived at the list of what I consider to be the most popular contemporary American stand ups.

John Bailer
So let's name some names. So who are some of the comedians that you ended up including kind of from this, this three year window?

Sachin Date
Well, there was Jerry Seinfeld, of course, and then there was Hasan Minhaj. Well, let's see. There was John Mulaney and Taylor Tomlinson, Dave Chappelle. A lot of the same, you know, same set of people started repeating in those lists. So one thing that kind of was common amongst them was that a lot of them were very active in stand up comedy. I mean, not just now, but I mean just, you know, three years ago, four years ago, 10 years ago. So they've been doing stand up for a long period of time.

John Bailer
So how many different comedians did you identify in this collection? I mean, once you filtered it based on you said that they were American comedians in this time window that were identified in October through December of these three years. So what was the total number of comedians that you included to start building this connection of influence?

Sachin Date
So the three searches that I ran produced several hundred different comedians, and I kind of filtered out all the ones that were not US persons, because my focus was only on American stand ups. Then I also filtered out comedians who did not have Wikipedia pages, because my study was really kind of just focused on data that came from Wikipedia. I also filtered out comedians who had not really performed any kind of stand up or improv or sketch comedy. So once all those filters were applied, I narrowed the space down to about 175 to 200 comedians. So that was kind of the social network of comedians that I started with. Now, this was the set of the most popular contemporary comedians as of the end of 2023. Now, of course, a lot of those Wikipedia pages did not have the influence data on them. In fact, I think for over 100 of those 175 or so comedians, there was no good data available on Wikipedia on who influenced them. So those were really isolated nodes in the network, and then for the balance set of comedians who had that data, I kind of followed the links back in time and also across in space to build a social network. So in the end, I basically ended up with about 64 to 70 comedians who had a lot of influence data associated with them, and then the social network was kind of based off of that set. The overall network, once you kind of factored in all the influences on those comedians, ran up to about 250 to 260 nodes and around 700 edges of influence.
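
The merge-and-filter steps described in this answer can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the code used in the study: the `Candidate` record, its field names, and the placeholder entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; the field names are assumptions for illustration.
    name: str
    is_american: bool
    has_wikipedia_page: bool
    performs_live_comedy: bool  # stand up, improv, or sketch comedy


def filter_candidates(candidates):
    """Drop duplicate names, then apply the three filters from the interview."""
    seen = set()
    kept = []
    for c in candidates:
        if c.name in seen:
            continue  # same comedian listed by more than one source
        seen.add(c.name)
        if c.is_american and c.has_wikipedia_page and c.performs_live_comedy:
            kept.append(c)
    return kept


# Placeholder entries, not real data from the study.
raw = [
    Candidate("Comedian A", True, True, True),
    Candidate("Comedian A", True, True, True),   # duplicate from a second source
    Candidate("Comedian B", False, True, True),  # not a US person
    Candidate("Comedian C", True, False, True),  # no Wikipedia page
    Candidate("Comedian D", True, True, False),  # no stand up, improv, or sketch
    Candidate("Comedian E", True, True, True),
]

shortlist = filter_candidates(raw)
print([c.name for c in shortlist])
```

Only the two entries that pass all three filters survive; comedians without influence data would then remain in the network as isolated nodes.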

Rosemary Pennington
What concerns did you have about using Wikipedia data?

Sachin Date
Right. Yeah, so with Wikipedia, on one hand, most of the data that's mentioned on Wikipedia is referenced very nicely. So that's kind of one advantage you get from using Wikipedia data, that you can follow through the reference links and just kind of verify that the influence that is mentioned on the page actually does ring true. The text talking about the influence describes a valid influence, because it links through to some article somewhere that mentions how the comedian actually was influenced by someone else. On the other hand, with Wikipedia, there is really no way for you to know the strength, or the current strength, of the influence, so you're forced to consider that influence as a binary variable: either the influence is there or the influence is not there. But in reality, of course, influence is much more complex than that. Someone could have been influenced by someone else very heavily in the past, but not really so much anymore. And that character of the influence isn't really brought out very well. Actually, it's not brought out at all in most cases on Wikipedia. So that's another problem. Well, it's really not so much a problem with Wikipedia as much as it is with the nature of the influence itself. I mean, it's an inherently qualitative measure. And in fact, one of the goals of the study was to try to work around the qualitative nature of the influence. But yeah, back to your question about the limitations of Wikipedia data. So there was that, that the nature of the influence was entirely binary. You either assume that the influence was there or it was not there, depending on what was mentioned in the page. The other aspect of information on Wikipedia is that you have to interpret the text, the sentence, the context around the influence very carefully. So I mean, in fact, I'll give you an example. 
In one instance, I think this was on David Letterman's page, he talks about how Norm Macdonald has been one of the greatest comedians that he has run into, but that kind of a text is really more in the context of Letterman considering Norm Macdonald as really a great comedian, not so much an influence. So you have to be careful about interpreting the text around words such as great comedian or my hero, or anything like that. So, you know, there's a lot of subjectivity involved over there.

John Bailer
You're listening to Stats and Stories. Our guest today is Sachin Date. So you've talked a lot about this idea of an influence network. So help the audience picture this. You have a cloud out there, and each comedian is some, I don't know, some unique node itself that's connected potentially to others, with edges that connect them. Those nodes are comedians. The edges are there if one influences the other. There's direction here if one is influencing the other. So you've built this from the data. What kind of influences or influencers surprised you most after having built this network out?

Sachin Date
Well, okay, so let me kind of give you some examples here. So one of the interesting findings was that people such as Charlie Chaplin and Stan Laurel and Oliver Hardy of Laurel and Hardy fame, all three of them, in fact, individually seem to either directly or indirectly influence almost a third of the contemporarily most popular American stand ups who had influences listed on Wikipedia. So I kind of found that to be quite interesting. What that also pointed to was that a lot of the influence was coming from people who were not really stand ups in the currently understood definition of that term. A lot of the influencers were writers, comedic writers, or stage performers, or people like Charlie Chaplin, who were clearly not stand ups, not stage performers as such, but very accomplished comic actors and directors and producers. So that was one interesting thing I found. Another thing worth mentioning has to do with the data about the birth dates of the influenced comedians and their influencers. So as I was kind of tracing out this network, one of the things that I was doing was also capturing the dates of birth of the comedians and their influencers. And what I found was that an overwhelming volume, actually almost 100% of the volume, I think, like more than 95%, 95-point-something percent of direct influence volume came from individuals who were at most two generations older than the influenced comedian, and more than half of the direct influence volume on the contemporarily most popular American stand ups came from people within the same generation. So it just kind of seemed like an overwhelming majority of American stand ups are drawing their influence from people who are kind of roughly their age, or not really very much older than them. 
Now if you also factor in the indirect influences, meaning, let's say comedian A was influenced by comedian B and comedian B was influenced by comedian C, so comedian C indirectly influences comedian A (I guess that was kind of one of the fundamental assumptions of the paper over there), the birth year to birth year time spans naturally swept across a pretty vast period of time, and that period of time was, like, truly vast. I mean, it was 10 years to more than 400 years, with a median time span of like around three years. So overall, what it was pointing to was that, well, first of all, there was a very strong pattern of influences, like an 80-20 pattern, where a large fraction of the influence was coming from a very small fraction of influencers. And then if you combine that with the vast span of birth year to birth year time spans, if you kind of put those two things together, the conclusion to draw from that was that most of the contemporarily most popular American stand ups drew their inspiration from a small set of influencers who were themselves spread across multiple centuries. So that was kind of an interesting conclusion that I drew.
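
The idea of indirect influence (A influenced by B, B influenced by C, so C indirectly influences A) amounts to reachability in a directed graph. The sketch below is a hedged illustration of that idea in Python, assuming the network is stored as a hypothetical `influenced_by` dictionary; the names, birth years, and helper functions are invented placeholders, not data or code from the paper.

```python
def all_influencers(person, influenced_by):
    """Return everyone who directly or indirectly influenced `person`."""
    found = set()
    stack = list(influenced_by.get(person, ()))
    while stack:
        current = stack.pop()
        if current in found:
            continue  # already visited; also guards against cycles
        found.add(current)
        stack.extend(influenced_by.get(current, ()))
    return found


# Hypothetical toy network: each key maps to its direct influencers.
influenced_by = {
    "A": {"B"},
    "B": {"C"},
    "D": {"C"},
    "E": set(),  # no influence data: an isolated node
}
birth_year = {"A": 1985, "B": 1950, "C": 1890, "D": 1980, "E": 1990}
contemporary = ["A", "D", "E"]


def reach_share(influencer):
    """Fraction of contemporary comedians this influencer reaches, directly or not."""
    reached = [p for p in contemporary
               if influencer in all_influencers(p, influenced_by)]
    return len(reached) / len(contemporary)


# Birth year to birth year span for every (influencer, comedian) pair.
spans = {
    (src, p): birth_year[p] - birth_year[src]
    for p in contemporary
    for src in all_influencers(p, influenced_by)
}

print(all_influencers("A", influenced_by))
print(reach_share("C"))
print(spans)
```

In this toy network, "C" never influenced "A" directly, yet it still counts toward the share of contemporary comedians reached, and it contributes the longest birth year spans.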

Rosemary Pennington
I'm looking at your visualizations of the influence chains from William Shakespeare to first Jerry Seinfeld and then to Larry David. And the thing that I was struck by looking at these is that the chain of influence to Larry David seems a little more direct than it seems to have been to Jerry Seinfeld. And I wonder, you know, what do you make of that, given that Seinfeld and Larry David are so, you know, tightly connected as far as comedians and producers. But also, were there chains of influence that you found particularly interesting as you were combing through what must have been a vast bunch of data?

Sachin Date
That's right. So there's definitely a very large diversity in the structure of the influence chains. Now one thing to kind of keep in mind over there is that the data definitely has some degree of what we could consider as some form of, you know, non-response bias, and that's because a large number of comedians simply don't have influence data mentioned about them on their Wikipedia pages. So that's going to generate some kind of a bias, which is kind of similar to the sort of bias that one encounters on surveys, where people simply don't respond to the survey. So there's a bias associated with that kind of missing data. So there could very well be influences which are not represented accurately enough by the graphs that you see in the paper. And that's almost certainly because the data for them is simply not available. But at the same time, there is still, I think, enough data on Wikipedia to draw the conclusion that the influence networks of a lot of these comedians have a lot of diversity in them. Now going back to your question about some kind of interesting features about these graphs. Well, one of the things that I noticed fairly consistently was that Woody Allen seemed to be performing the role of what you might consider as a router of influence. So his position in the influence networks was such that he seemed to be routing over influences from what were essentially writers in the 1800s, 1700s, 1600s, all the way back to William Shakespeare, over to the set of modern day American stand ups. So on one side of the graph there were a bunch of writers and humorists and playwrights, and on the other side of the graph were people who were largely American stand up comedians, with the node representing Woody Allen kind of sitting in between. So I found that interesting in the way that, you know, this pattern repeated so often. 
The other thing, one other kind of interesting feature I ran into was just the lengths of some of these influence chains. So for instance, I observed like 20 really long chains of influence. And there were about, I think, 12 to 15 influences in each chain. And then, as you kind of go back in time, starting with present day comedians like Hasan Minhaj or Michelle Wolf or Taylor Tomlinson, if you kind of trace back the chains from comedians such as those, you slowly start hitting nodes that represent comedians of the American vaudeville era of the early 1900s to late 1800s, and then before that come the nodes that represent comic writers like James Joyce or Ken Jeong, and then you keep following through on those chains until you kind of finally reach people like William Shakespeare in one instance, and then in another instance, Miguel de Cervantes, the creator of Don Quixote. So that's more than 400 years ago. So that's like more than four centuries of influence carrying over from, well, Cervantes, all the way to the 21st century comedians.
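
Tracing a chain of influence back in time, and spotting a node that keeps showing up on those chains, can be sketched as a breadth-first search over the network. The Python below is an illustrative sketch only: the `influenced_by` dictionary and the generic node names are hypothetical, and breadth-first search is an assumed technique here, not necessarily the method used in the article.

```python
from collections import deque


def shortest_chain(start, target, influenced_by):
    """Breadth-first search backwards in time from `start` to `target`.

    Returns one shortest chain as a list of names, or None if no chain exists.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == target:
            return path
        # sorted() keeps the traversal order deterministic
        for influencer in sorted(influenced_by.get(last, ())):
            if influencer not in visited:
                visited.add(influencer)
                queue.append(path + [influencer])
    return None


# Hypothetical toy network: each key maps to its direct influencers.
influenced_by = {
    "Stand-up 1": {"Bridge"},
    "Stand-up 2": {"Bridge", "Vaudevillian"},
    "Bridge": {"Novelist"},
    "Vaudevillian": set(),
    "Novelist": {"Playwright"},
}

chain_1 = shortest_chain("Stand-up 1", "Playwright", influenced_by)
chain_2 = shortest_chain("Stand-up 2", "Playwright", influenced_by)
print(chain_1, len(chain_1) - 1)  # the chain and its number of influence links

# A node that sits on every chain from the stand ups to the playwright
# behaves like the "router" described in the interview.
routers = (set(chain_1) & set(chain_2)) - {"Playwright"}
print(routers)
```

Here both stand ups reach the playwright only through "Bridge" and "Novelist", which is the toy analogue of one figure routing older literary influences to modern performers.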

John Bailer
So what's next for you? I mean, you know, you've looked at this kind of connection here, of comedians, you mentioned some gaps that were in the Wikipedia study. And I think even in your article, you mentioned Lenny Bruce, not being within this influence graph. Do you have any thoughts of back filling some information that you thought were gaps, or are there sort of next projects that would be associated with these types of investigations?

Sachin Date
So with Lenny Bruce, one of the things I noticed was that a few previous studies on scholarly influence in general, not necessarily restricted to influences on comedians, but scholarly influence in general, those studies did mention Lenny Bruce. There, Lenny Bruce didn't really appear to be one of the major influences, but the moment you kind of look at Lenny Bruce's influence in the context of comedy, he kind of bubbles up to the top very quickly in terms of influence. The interesting thing about that is that there's simply, you know, not a whole lot of data available about some of these comedians, and in some cases, there's a lot of data available about others. So it's quite possible that Lenny's position in the influence structure is very heavily dependent on simply the availability of data associated with the comedian. Now, well, in terms of future work, one of the things I'd like to do is to essentially look at the influence structures of individual comedians and comic actors. So I mentioned Woody Allen. Woody Allen turned out to be a router of influences from writers to stand up comedians. So I'd like to inspect the influence structures around other famous personalities in this space to see if they are also routing over influences in a particular manner, from their influencers to the people whom they influence. And then the other kind of natural extension to this study is to go beyond the contemporary, most popular American stand ups, which is what the focus of this study was, and then study all American stand ups, or maybe all comedians who have performed stand up of some kind all over the world, and then inspect the influence structures associated with that much, you know, much more comprehensive set of comedians. So one of those things I've already done: there is a paper out recently from me, where I've extended this study out to include basically all American stand ups, and then studied the influence structures on that body of comedians. 
And one of the things I found was that a lot of the results of this paper in Significance actually carried through very nicely in that bigger body of American stand ups as well.

John Bailer
Well, I'm afraid that's all the time we have for this episode of Stats and Stories. Sachin, thank you so much for joining us today.

Rosemary Pennington
Yeah. Thank you for being here.

Sachin Date
Thank you for having me.

John Bailer
Stats and Stories is a partnership between Miami University's departments of statistics and media, journalism and film and the American Statistical Association. You can listen to us on Spotify, SoundCloud, Apple Podcasts, or other places you find podcasts, and follow us on LinkedIn and Twitter. If you'd like to share your thoughts on the program, send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.