The South American Dilemma & the North American Answer. (This is a paper for those who think that data is important. If that's not you ... now you know.)

UPDATE: I can bring USA & CPS data into my GIS files! As well, the other data I'm looking to gather for international study may be available through a project that is no longer funded, though I still have access to the code base and database! 😃
I'm not sure when I'll get to that code, though it is accessible ... another chance to dream.

There is very little survey or census information on Central and South America, especially for the countries along the Andes Mountain Range.

This is not a problem that can be solved now, though it can be acknowledged that Latin America is on a unique economic trajectory, caught between a seemingly strengthening US dollar and an economically slowing China. And unlike Africa, with its geographic expanse, South America specifically has a unique cohesion among its people, provided by both the Andes Mountains and the Amazon Rainforest.

And so there is also an approach to one day gathering that information; though the question is, how might such data be put to the best use?

And the answer is, by having organized data that is ripe and prepared for comparison, which is what this paper will initiate.

The question regards the Hispanic identity: if one were to separate those who claim Hispanic origins from those who do not, is there a significant difference in how individuals fare, as seen through the wage return on education?

I'm not sure what other work has been done in this area. The point, though, is not to discern whether there is a significant difference, but rather to begin gathering data with GIS information, so as to map out trajectories from various geonomic (geographic & economic) locales, disaggregated by race and ethnicity.

How do wage returns increase for racial demographics within and outside of the Hispanic identity, across city lines, county lines and zip codes?
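To make "wage return on education" a bit more concrete, here is a minimal sketch in Python. It assumes a simple log-linear relationship (the slope of log wage on years of schooling), fit separately for each group. The function names, the group labels, and the toy records are all made up for illustration; nothing here comes from NHGIS or any IPUMS collection, and the real analysis may well end up using a different specification.

```python
import math
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("schooling does not vary; slope is undefined")
    return sxy / sxx


def wage_return_by_group(records):
    """Estimate the return to schooling separately for each group.

    records: iterable of (group, years_of_schooling, wage) tuples.
    Returns {group: slope of log(wage) on years_of_schooling}.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, schooling, wage in records:
        xs, ys = grouped[group]
        xs.append(schooling)
        ys.append(math.log(wage))
    return {group: ols_slope(xs, ys) for group, (xs, ys) in grouped.items()}


# Synthetic toy records (NOT real data): log wage is built to rise by
# 0.08 (group_a) and 0.05 (group_b) per extra year of schooling, so the
# estimates should recover roughly those values.
toy_records = (
    [("group_a", s, 10_000 * math.exp(0.08 * s)) for s in (8, 12, 16)]
    + [("group_b", s, 12_000 * math.exp(0.05 * s)) for s in (8, 12, 16)]
)
returns = wage_return_by_group(toy_records)
```

The same function could be run once per city, county, or zip code to compare slopes across locales, assuming the records carry a geographic identifier to split on.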

The Hypothesis

For the sake of the paper, though, what we would hope to see (that is, the hypothesis) is as follows:

Those within the "Hispanic" community, as disaggregated-demographic communities, fare better than other non-whites within the United States.

This would be due to intra-cultural identity cohesion, which enables the group as a whole to progress with greater internal momentum. Or does it?

In either case, we will be able to view the growth of the Hispanic community since the 1980s by looking at census data, in conjunction with GIS data, to follow the evolution.

We will also be able to compare population centers of varying sizes, along with urban and rural locations. This is important for comparison because of Latin America's growing urban population, along with its extensive expansion of fiber optic and mobile data reach into the countryside.

I recognize that the goal is slightly ambitious, though the idea is to create a solid foundation upon which to start asking these questions, and hopefully to inform others about these tools; harmonized data may be the key to unlocking some really fun research!

The Data

We will be using the National Historical Geographic Information System (NHGIS) data to obtain income and education data from the 1930s through the 2020 US Census, along with spatial data that will enable us to pinpoint various geonomic locations for comparison.

There is a possibility that we will expand to include the IPUMS-CPS collection as well, and maybe IPUMS-USA; though for the near term, likely only NHGIS.

It is important to note that part of the inspiration for this project is the work being done on the IHGIS project, which will enable one to compare datasets by mean population size and area across international datasets.

My goal is to achieve something similar within US datasets with regard to the Hispanic identity, and so observe the evolutions through various geonomic avenues.

The Variables

There are various measures of income and education.

Regarding income, we have the Aggregate Earnings in 1989 by Work Status in 1989 by Sex by Race and Hispanic Origin by Age.

Though we're only able to get household income data for 1980.

Curiously, 1970 has a few demarcations for people of Spanish-origin; but this is an ambiguous description, and so likely out of scope for this paper.

Regarding education, there is "Educational Attainment" for persons 25 and older as a distinct dataset. The struggle is to find a corresponding dataset for the other years!

Thankfully, it seems as though there are enough tables on the topic to go back to 1980. There are quite a few variations of the income measures, and we'll need to inquire about which are best for our use case.

As well, we'll need to distinguish between individual and household income data. There is also a per capita income variable that may be tracked.

The deciding factor on which variable to use will likely be which variable corresponds best to other available variables in the other census years.
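A rough way to make that choice systematic is to rank candidate variables by how many of the target census years they cover. The sketch below does only that; the variable names and the years listed are placeholders I made up for illustration, not a record of what NHGIS actually offers, and `rank_by_year_coverage` is a hypothetical helper.

```python
def rank_by_year_coverage(availability, target_years):
    """Rank variables by how many of the target census years they cover.

    availability: {variable_name: iterable of years the variable exists for}.
    Returns a list of (variable_name, covered_years) pairs, best coverage
    first; ties are broken alphabetically by name.
    """
    target = set(target_years)
    covered = {
        name: sorted(set(years) & target)
        for name, years in availability.items()
    }
    return sorted(covered.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical availability table: names and years are placeholders,
# NOT a statement of real NHGIS coverage.
availability = {
    "household_income": [1980, 1990, 2000, 2010],
    "per_capita_income": [1980, 1990, 2000, 2010, 2020],
    "aggregate_earnings": [1990],
}
ranking = rank_by_year_coverage(
    availability, target_years=[1980, 1990, 2000, 2010, 2020]
)
```

Coverage is only one criterion, of course; two variables that exist in the same years may still be defined differently from one census to the next.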

At present, I have 40 source variables. The next idea is a bit unusual.

I'm going to reach out to some co-workers, being that I work at IPUMS, to get their opinion on my research direction and my datasets/data tables, or on other possible data tables that I could use. As well, I'll ask how I might stitch together data across the years.

Ergo, I don't have any introductory data yet; though I'm hoping to have a solid collection by the end of the week.

(I'll update this post at that time.)
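One common way to stitch tables together across years is a crosswalk: each year's own categories get mapped onto a shared set, and the counts are summed. This is only a sketch of that idea, not the method my co-workers would necessarily recommend; the category labels, the counts, and the `harmonize_counts` helper are all hypothetical.

```python
from collections import Counter


def harmonize_counts(rows, crosswalks):
    """Re-bin year-specific categories into one shared set of categories.

    rows: iterable of (year, source_category, count).
    crosswalks: {year: {source_category: common_category}}.
    Returns a Counter keyed by (year, common_category).
    Raises KeyError if a year or category has no mapping, so gaps in the
    crosswalk surface immediately instead of being silently dropped.
    """
    totals = Counter()
    for year, source_category, count in rows:
        common = crosswalks[year][source_category]
        totals[(year, common)] += count
    return totals


# Hypothetical category labels and counts, for illustration only.
crosswalks = {
    1980: {
        "college_1_3_years": "some_college",
        "college_4_plus": "bachelors_plus",
    },
    1990: {
        "some_college_no_degree": "some_college",
        "associate": "some_college",
        "bachelors": "bachelors_plus",
        "graduate": "bachelors_plus",
    },
}
rows = [
    (1980, "college_1_3_years", 120),
    (1980, "college_4_plus", 80),
    (1990, "some_college_no_degree", 100),
    (1990, "associate", 40),
    (1990, "bachelors", 70),
    (1990, "graduate", 30),
]
harmonized = harmonize_counts(rows, crosswalks)
```

Failing loudly on an unmapped category is a deliberate choice here: with 40 source variables, a silently dropped category would be easy to miss.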

The Conclusion (for now.)

There might also be a way to pull in IPUMS-USA or CPS data and overlay the GIS data, while using zip codes as the smallest increment for geographic measurement. Either way, it's a solid pursuit, with sufficient (spatial) data!

Which is a great place to start.

It might be nice to compare only those 25 and older as well, since that gives a more economically stable baseline; though this also presents an interesting question ... does making it to the age of 25 indicate a higher likelihood of surviving into old age for marginalized people? In the US? Across the world?

Dr Tara McKay presented some interesting work on epigenetic aging and stress/trauma last December, which leads to a few more questions, I guess ... such is the adventure!!!

Because at day's end, the thesis is that the vast expanse of the United States, itself in economic transition, can still provide many varied and distinct economic topographies and trajectories that also follow comprehensible patterns.

I'm not the world's biggest fan, but Malcolm Gladwell's book Outliers talks about the logic of outliers such as Bill Gates, Michael Jordan, etc ... I never read the book to be honest, meh ... it's the ideas that were important.

The aim is to expand on that idea and seek to find sense in who succeeds, and how ... especially in light of the vast technology sector that has arisen since the 1980s.

And in this sense, we can compare the US in the 1980s to where developing countries may currently be poised, making this data ripe for investigation and comparison, or perhaps even a model for growth and census/survey collection in South America.


