Dani Gamerman is Emeritus Professor at the Federal University of Rio de Janeiro, where he was professor of statistics from 1996 to 2019. He is the author of numerous books and research papers, and the StatPop blog. He was one of two statisticians who worked on a Science paper showing evidence of manmade earthworks deep in the heart of Amazonia, and he is also the author of a Significance article examining the use of statistics to map this hidden history of the Amazon.
Episode Description
The Amazon has been imagined as a pristine wilderness, one in need of protection from development. This framing has often treated the Amazon as a place without history, practically untouched before the arrival of colonizers in South America. Statistics is helping show that the history is much more complicated than that, and it's the focus of this episode of Stats and Stories with guest Dani Gamerman.
Full Transcript
Rosemary Pennington
The Amazon has been imagined as a pristine wilderness, one in need of protection from development. This framing has often treated the Amazon as a place without history, practically untouched before the arrival of colonizers in South America. Statistics is helping show the history is much more complicated than that, and it's the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of the American Statistical Association in partnership with Miami University's departments of statistics and media, journalism and film. Joining me, as always, is regular panelist John Bailer, emeritus professor of statistics at Miami University. Our guest today is Dani Gamerman. He is emeritus professor at the Federal University of Rio de Janeiro, where he was professor of statistics from 1996 to 2019. He's the author of numerous books and research papers and the StatPop blog. He was one of two statisticians who worked on a Science paper showing evidence of manmade earthworks deep in the heart of Amazonia. Gamerman is also the author of a Significance article examining the use of stats to map this hidden history of the Amazon. Dani, thank you so much for joining us today.
Dani Gamerman
The pleasure is all mine.
John Bailer
Dani, this is just a great treat. I have to tell you, I love the title of your Science paper: more than 10,000 pre-Columbian earthworks are still hidden throughout the Amazon. There's mystery, there's drama, there's intrigue; it's all there. But that title has a lot of ideas that I'd like to unpack before we talk about it further. In particular, can you tell us a little bit about the time frame of what pre-Columbian is, and the scope of the area that is Amazonia?
Dani Gamerman
First, thank you for the invitation. As I said before, I'm delighted to be here chatting with you, and I hope I can bring some useful information to listeners. The time frame for the occupation of the Amazon is, I'd say, about 10,000 years, as far as my collaborators tell me. It's not something I delved into to begin with, although now I'm on a project that is interested in taking it into account. Another disclaimer that I think is important: I just got the data and trusted the data I got. I listened to the information my collaborators provided about the data, and then provided them with a solution that seemed fit to me and seemed to have pleased them as well. And it was successful, because it got accepted. So I hope to be able to shed some light on the intricacies of the data and its details, but I'm not the expert on it.
Rosemary Pennington
No, I was going to say, this is a vast territory that you and your collaborators were looking at. What is it, 7 million square kilometers, right? How do you identify an earthwork in a place like this, where so much of it is covered up? And what are you doing to help the collaborators do that?
Dani Gamerman
First of all, another clarification needs to be made right from the start: when we arrived in the project, the data had already been collected, so we didn't interfere with the data production in any way. During our conversations about how we should go about analyzing the data, I learned some details about the data itself. Like many ecological and archeological data sets, this one was collected in many different ways. There's the standard way, statistically speaking, of having cases and absences, a case-control scenario, which is what I see in much of the literature. But that is only one kind of data set available, one way of getting data. Another is what they call citizen science, which was entirely new to me a few years ago. It refers to interested people with some or a lot of knowledge of the subject who go into the field to ascertain some information: they measure whatever they can see and register it as best they can, which is not perfect. It is entirely the opposite of our standard case-control scenario of sampling, analyzed in the nice, air-conditioned room of statistical analysis. So it was exciting for me to have to deal with it, because you have to deal with it. You cannot pretend it's regular data without bias. It has lots of biases that must be acknowledged and taken into account. There are also other kinds of biased data, collected from satellites and from airplanes flying over the region, either to collect this data or just passing by to collect other kinds of data sets, and they may happen to pick some up. So there are lots of ways into this data collection process, and all of them are being used.
And there is another kind of data collection, which tries to filter out the cover of the jungle, of the forest, to uncover what is lying under the tree cover. I cannot quantify it, but they are making some progress in this kind of data collection as well, and some of the data we analyzed were obtained in this way. So it was a full bag of different ways of data collection that was assembled for us to analyze.
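To make the bias point concrete, here is a minimal Python sketch, assuming a simple model in which each true earthwork in a grid cell is recorded independently with a cell-specific detection probability (a "thinning" of the true pattern). The function names, intensities and detection probabilities are hypothetical illustration values, not figures from the study, and this is not presented as the authors' model.

```python
import math
import random


def draw_poisson(lam, rng):
    """Draw a Poisson(lam) count using Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_presence_only(intensity, detect_prob, seed=0):
    """Simulate true and recorded earthwork counts per grid cell.

    intensity[i]:   hypothetical expected number of earthworks in cell i
    detect_prob[i]: hypothetical chance that an earthwork in cell i is recorded
    Returns (true_counts, recorded_counts).
    """
    rng = random.Random(seed)
    true_counts, recorded = [], []
    for lam, p in zip(intensity, detect_prob):
        n = draw_poisson(lam, rng)
        # Thinning: keep each true site independently with probability p.
        k = sum(1 for _ in range(n) if rng.random() < p)
        true_counts.append(n)
        recorded.append(k)
    return true_counts, recorded


if __name__ == "__main__":
    # Hypothetical layout: 100 well-surveyed cells (e.g. near rivers), 900 rarely surveyed ones.
    intensity = [0.5] * 1000
    detect_prob = [0.6] * 100 + [0.05] * 900
    truth, seen = simulate_presence_only(intensity, detect_prob)
    print("true sites:", sum(truth), "recorded sites:", sum(seen))
```

Because the underlying intensity is the same everywhere in this toy setup, most of the shortfall between true and recorded sites comes from the rarely surveyed cells; a model that ignored detection would mistake survey effort for earthwork density.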
John Bailer
Let's talk a little bit about the data that you have. There were about 960 data points in this analysis. Each one of these was an earthwork, along with a bunch of characteristics of the site where it was found, ranging from its location to the plant species that were there to temperature, moisture and other things. Can you tell us what an earthwork is?
Dani Gamerman
First of all, let me go one step back. The study we entered into is situated in an area that tries to understand, or at least to evaluate and describe, how previous inhabitants of the region of interest, in our case the Amazon forest, influenced its current status and composition. One way, obviously, to evaluate how our predecessors influenced the forest is first to find remains, signs of intervention showing that humans were there. There are many different ways of getting them, and the intervention we used in our analysis is one of them: structures built into the land, like fans or small ponds to collect water for use. Any kind of construction; well, I must be careful here, because I don't know if it's any kind of construction, but those are the kinds of constructions that were being cited.
John Bailer
Okay. It seemed like in some of the pictures you included in the Science paper, there were often these regular, geometric shapes, which would suggest that these indigenous peoples were moving the earth around to create them. Did your colleagues talk at all about why these shapes were created?
Dani Gamerman
No. Well, the conversation that came to me was always that this is the sign of human intervention we were going to use in our understanding and representation of human presence.
Rosemary Pennington
As you were talking about this, I was thinking about where we are in Ohio. There are lots of mounds, and with a lot of them, if you didn't know it was a mound, you would just think it's a small hill, because over time the land has reclaimed it. Statistically, as you were combing through this data, how can you be sure that what you're looking at is man-made, an earthwork where the earth was moved around, and not just a natural formation?
Dani Gamerman
As I said, I'm going to disappoint you once again. So...
Rosemary Pennington
Stats were not involved.
Dani Gamerman
I was not involved in ascertaining whether that was a valid representation of our ancestors' intervention, and I'm assuming they know what they are doing. I didn't go into that part. As I said, we assumed that the data collected were a pure, genuine representation of human ancestors' presence. We assumed that because they assumed it. There were many queries I put to them, but this was not one of them; I assumed the data were as pristine as one can get.
John Bailer
So I bet one of the questions you asked your colleagues was: what kinds of variables do you think are important for determining whether or not an earthwork would be present? What were some of the variables you were considering that would help you predict the presence of an earthwork?
Dani Gamerman
The way you ask me this question is the way I used to think about my collaborators as well, but you don't have to go all that way. They know what they want. They have some statistical training. Ecologists, at least the ones I know (a minority; I don't know them all), have good training, and I think we have to thank R, our software, for that. Because they are trained in R in their graduate courses, they know how to do at least basic analyses, so the idea of collecting data and the general idea of regression are not strange objects for them. They know there is an explanation, and they know the explanation can be at least partially expressed in terms of variables of interest that they have to collect. So this is a piece of work I didn't need to get involved in, because the idea of using a regression is very natural for them.
John Bailer
So what variables did they tell you about?
Dani Gamerman
There are many kinds of variables that can affect the presence of earthworks: topographical, geological, meteorological, climatic. The content of the soil is very important as well, because it relates to how productive the land can be. They know all that, they know they need to use all that, and they have worked hard to have that kind of data available for their analyses. I'm not just talking about my research group; I'm talking about ecologists in general. Most of them don't know the details of how to perform a regression or how to perform precise inference, but they have a very good idea of what they need, not only in terms of data but in terms of methods as well.
Rosemary Pennington
You're listening to Stats and Stories, and we're talking about stats and Amazonia with Dani Gamerman. Dani, you were working on this project with some 200 people, and there were two statisticians, which sounds like a monumental bit of work for you and the other statistician. What was your intervention in all of this, and how were you working with this massive data set?
Dani Gamerman
First of all, let me explain a few details of what our group did. In fact, our paper has more than 200...
Rosemary Pennington
Oh, more than 200. Okay, great.
Dani Gamerman
220 or something like that. Most of these were collaborators because they provided us with information that was thought useful for our enterprise. We worked in a group of five people, not 205 people, which was called the core team of this project. Those are the people who really got involved in developing the paper. But in many other areas, unlike statistics, if people provide some information, data for example, they are included as co-authors. Data is very valuable. We know we don't do anything without data, but they value data maybe more than we do, because they know how important it is. We can think of theorems and things like that to get by; if they don't have data, they don't have anything, so they value it a lot. That's how most of the collaborators came into the project. But the discussions on the analysis, on how trustworthy the data were, and whatever else, were held in regular meetings of the group of five: the two statisticians, as you said, and three other collaborators, one ecologist and two people who analyze phenomena around the forest from a firm remote sensing background. So that's the group I saw; I didn't see any of the other 200 or 220. I was having a conversation with another ecologist I know for other reasons, and they said, "No, but we did that. I know; I am one of the co-authors." Oh, of course.
John Bailer
You know, I'm just trying to think: 960 data points is not a lot of data when you have 7 million square kilometers of area. And you describe the data set as presence-only data, so you found these 960 observations in a variety of different ways. When you're trying to analyze these data, what are the kinds of questions you want to be able to answer when you model them?
Dani Gamerman
Before answering your question, I need to clarify another issue, a boring issue, a technical issue that has to do with statistics.
John Bailer
That's not a boring issue. That's a key issue. You know who you're talking to. People are on the edge of their seats waiting for this answer now, Dani.
Dani Gamerman
Boy, okay. So the first message of disappointment is that our data is not only 960 long; it is infinitely long. That's because the basic approach to the problem, not ours but that of the field that analyzes this kind of data, which we followed, uses the background and tools of point process analysis. Point pattern data, which is the kind of data analyzed using point processes, is an infinite-dimensional object, where you see an infinite number of absences of occurrences and, with probability one, a finite number of occurrences. So our data is not only the 960 points where we saw occurrences; it also includes the infinitely many other points in the continuum of the 7-million-square-kilometer forest where we didn't see anything. That's our standpoint. After all that, I may have forgotten some of your questions, but I needed to make this clarification. You wanted to ask me about the relevance of...
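The idea that the "absences" everywhere are part of the data shows up directly in the log-likelihood of an inhomogeneous Poisson point process once the region is split into small cells, such as the one-square-kilometer pixels mentioned in this conversation. The sketch below assumes a log-linear intensity, log λ_i = x_i · β, which is a common textbook choice and not necessarily the specification used in the Science paper; the function name, covariates and coefficients are hypothetical. Every empty cell still contributes the term −λ_i A to the likelihood, so the empty forest informs the fit.

```python
import math


def grid_poisson_loglik(beta, covariates, counts, cell_area=1.0):
    """Log-likelihood of a gridded inhomogeneous Poisson process.

    beta:       coefficients of the log-linear intensity (hypothetical)
    covariates: one covariate vector per cell, e.g. [1.0, soil_ph, temperature]
    counts:     number of recorded earthworks in each cell (mostly zeros)
    cell_area:  area of each cell, e.g. 1.0 for one square kilometer
    """
    total = 0.0
    for x, y in zip(covariates, counts):
        log_intensity = sum(b * xj for b, xj in zip(beta, x))
        mu = math.exp(log_intensity) * cell_area  # expected count in this cell
        # Poisson log-probability of y; empty cells (y == 0) still contribute -mu.
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


if __name__ == "__main__":
    # Intercept plus one hypothetical standardized covariate per cell.
    cells = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.3]]
    print(grid_poisson_loglik([-2.0, 0.8], cells, [0, 0, 1]))
```

Maximizing this function over β (for example with a generic optimizer) would give a basic presence-only regression; it does not, on its own, correct for the sampling biases discussed earlier in the episode.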
John Bailer
Well, in reading your paper, I saw you had data on one-kilometer grid cells across this 7-million-square-kilometer total area, so there's a lot missing: a lot of places where you don't see any of these earthworks, or at least don't see them yet. So my question was, what are you trying to predict for this area? You have information where you've observed 960 earthworks, and a lot of places where you've not observed any yet. What's the goal of the analysis?
Dani Gamerman
One of the things we try to predict, as you correctly put it, is: are there any other earthworks we have not seen, where are they, and how many of them are there? I hope I can explain it well, but this was crucial for our take on the subject. As far as I understand, standard practice acknowledges that there are observability issues and takes sampling bias into account, as it obviously must, but does not include in its modeling structure a specific object that provides exactly the point process of unobserved occurrences of whatever you are interested in analyzing, alongside the standard goal of any regression, which is to evaluate the contribution of the explanatory variables included in the analysis. The other important issue is that, to perform the analysis well, we need as fine a grid over the area of interest as possible, because, remember, point pattern data is data on every single point in the region of interest. We thought what they had to offer was reasonable, and I still think so: one-square-kilometer grid cells, or pixels. That's fine enough: since you are considering 7 million of them, you can think of each square kilometer as a point, and that's what we did. And something important that surprised me: they have many quantities of relevance (not every quantity that might interest you), like temperature, soil pH and the amount of nutrients in the soil, for the whole of the Amazon region. Don't ask me how.
I don't know how, but they have information on every square-kilometer pixel in the jungle. Wow. I find it as hard to believe as you do that this is exactly the correct measurement, but they have it, and people use it, everybody, not just us. There are lots of data banks available for anyone's analysis, not only for the Amazon but for many other forests and many other regions of the world, which is amazing. I'm astonished by it, but I'm enjoying the assumption that this data is available; we used it, and everything we got was obtained through this data. But sometimes it's not well defined, because there are many data banks: some overlap, some cover disjoint regions, and you need to master the craft of assembling all that into a data set with 7 million rows for all the variables of interest in your analysis.
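The distinction drawn here between recorded earthworks and a separate process of unobserved ones can be illustrated with a simple accounting identity, assuming (as a simplification) that each cell has a true intensity λ_i and a probability p_i that a site there gets recorded. The expected recorded count in a cell is then λ_i p_i A and the expected hidden count is λ_i (1 − p_i) A, so summing the second term over all cells gives an estimate of how many sites remain unseen. The function name and inputs below are hypothetical; in practice both λ and p would be estimated with uncertainty, and this is not presented as the paper's estimator.

```python
def split_expected_counts(intensity, detect_prob, cell_area=1.0):
    """Split the expected number of sites into recorded and hidden parts.

    intensity[i]:   (hypothetical) fitted intensity per unit area in cell i
    detect_prob[i]: (hypothetical) probability that a site in cell i is recorded
    Returns (expected_recorded, expected_hidden) summed over all cells.
    """
    recorded = 0.0
    hidden = 0.0
    for lam, p in zip(intensity, detect_prob):
        expected_sites = lam * cell_area
        recorded += expected_sites * p        # thinned, observed process
        hidden += expected_sites * (1.0 - p)  # complementary, unobserved process
    return recorded, hidden


if __name__ == "__main__":
    # Two hypothetical one-square-kilometer cells.
    print(split_expected_counts([2.0, 4.0], [0.5, 0.25]))  # prints (2.0, 4.0)
```

In this toy example the model expects twice as many hidden sites as recorded ones, because the second cell has a high intensity but a low chance of detection.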
John Bailer
There were two things that came to mind immediately for me. One was that in your paper you also talked about the different types of plant species around where you found earthworks. But the question that came to mind was: have people taken your analysis and discovered new earthworks at some of the locations your model suggested were highly likely to contain them?
Dani Gamerman
I hope so, but I don't have the answer to that. The paper appeared not so long ago. I checked a few weeks ago, and the paper had already been cited in 30 other published papers. But it's not our paper that sparked the search for new sites of human intervention. That is something archeologists are very eager about these days, and not just because of our paper; it's an interest many people in archeology and ecology have, so they don't need my paper for that. What our paper did was give numbers. That's what I like about it: it gave numbers to the thoughts they were nurturing. I think that was the success of the project, being able to provide numbers. And the numbers we provided, thankfully, were in line with the numbers they had in mind without knowing how to get them, and we got them for them.
Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Dani, thank you so much for being here today. Stats and Stories is a partnership between the American Statistical Association and Miami University's departments of statistics and media, journalism and film. You can follow us on Spotify, Apple Podcasts or other places where you find podcasts. If you'd like to share your thoughts on the program, send your email to statsstories@amstat.org or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.