Data Analysis

We use PowerBI, Tableau, or Python, or we can do it ourselves :>) We've been programming with data for decades.

Data analysis is a process of collecting, inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. (Wikipedia)


A simple example: While looking for a data set to build this page around, I went in search of a question I'd enjoy answering. I have an interest in economics, and while browsing the BLS website (bls.gov) I came across unemployment data for each US state. I then wondered whether it would correlate with state tax rates.


I searched but couldn't find anyone answering the question, so I looked for the data myself and eventually found a table giving the total tax rate by state. I then downloaded both tables as text files and saw that both lacked field separators, so I had to manually place commas after each data point. I then merged the two tables so that each row in the new table contained the state along with its tax and unemployment data. While a SQL query could have done the merge, I used MS Access. In doing so I was able to see issues with missing cells. (So here we have collecting, inspecting, cleaning, and transforming.)
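The cleaning and merging steps above could also be scripted. What follows is a minimal Python sketch, not the method actually used (that was manual editing plus MS Access). It assumes each downloaded line holds a state name followed by a single number, separated by runs of spaces. The sample rows, the values in them, and the helper names `parse_table` and `merge_by_state` are hypothetical placeholders, not the real BLS or tax figures.

```python
import re

# Placeholder rows standing in for the downloaded text files.
# The numbers are made up for illustration; they are NOT real figures.
UNEMPLOYMENT_TEXT = """\
Alabama        4.0
New York       5.0
North Dakota   3.0
"""

TAX_TEXT = """\
Alabama        8.0
New York       12.0
Wyoming        7.0
"""

# State name (may contain spaces), then whitespace, then one number.
ROW_PATTERN = re.compile(r"^(?P<state>.+?)\s+(?P<value>-?\d+(?:\.\d+)?)\s*$")


def parse_table(text):
    """Turn separator-less lines into a {state: value} dictionary."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = ROW_PATTERN.match(line)
        if match is None:
            raise ValueError(f"Unrecognized row: {line!r}")
        table[match.group("state").strip()] = float(match.group("value"))
    return table


def merge_by_state(unemployment, tax):
    """Full outer join on state; a missing cell is stored as None."""
    merged = []
    for state in sorted(set(unemployment) | set(tax)):
        merged.append((state, tax.get(state), unemployment.get(state)))
    return merged


rows = merge_by_state(parse_table(UNEMPLOYMENT_TEXT), parse_table(TAX_TEXT))
incomplete = [state for state, tax_rate, unemp in rows
              if tax_rate is None or unemp is None]

# Write the merged table as comma-separated rows: state, tax, unemployment.
for row in rows:
    print(",".join("" if cell is None else str(cell) for cell in row))
print("States with missing cells:", incomplete)
```

Keeping every state from either table (an outer join) rather than only the states present in both is what makes the missing cells visible, which is the same inspection step described above.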


I ran the data first through Microsoft's PowerBI, then Tableau, and then various Python charting tools in search of the best-looking visual.


From the Python Bokeh charting library


From PowerBI


And the same data in Tableau: here.


The chart suggested some correlation, but not a very strong one, so I ran Excel's `CORREL` (correlation) function, which reported a coefficient of around 0.25: positive but not very strong. Puzzled by this, I eventually found an article at the Brookings Institution here which suggested that prior research had also found a weak relationship between state tax and local economic success.
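The Excel correlation function used above returns the Pearson correlation coefficient: the sum of the products of each pair's deviations from their means, divided by the square root of the product of the two sums of squared deviations. As a rough sketch, the same calculation can be written in plain Python. The function name `pearson_r` and the sample values are assumptions for illustration only, not the real state data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT the real state data.
tax_rates = [7.0, 8.0, 9.0, 10.0, 12.0]
unemployment = [4.0, 3.0, 5.0, 3.5, 4.5]

print(f"r = {pearson_r(tax_rates, unemployment):.2f}")
```

The result always falls between -1 and 1. Values near 0 indicate a weak linear relationship, so a coefficient of about 0.25 is positive but weak.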


These steps are typical of data analysis, and they would be essentially the same on a data set many times larger.