My New Tech Adventure: Final BrainStation Week #4 and Week #5 Recap

Christina Brown
7 min read · Feb 22, 2022

Being a data analyst is harder than I thought.

Read up on Part I, Part II, and Part III of my BrainStation journey.

On February 12, I successfully completed my BrainStation course with a great final group presentation and several lessons going into the data analytics world. The final step: waiting for the arrival of my certificate!

Week #4

My instructors had certain requirements for my final group project. It had to be presented as a PowerPoint or Google Slides deck. The dataset (or Excel spreadsheet) had to contain a certain number of categorical and numerical columns. We had to form a hypothesis, summarize all of our data insights in fewer than a dozen slides, show whether our hypothesis was correct or not, and present a well-developed conclusion. In other words, my group had to learn how to communicate our data confidently and intelligently, in a way that excited our audience and answered all of their questions with little to no pushback. This is how data analysts, data scientists, and data engineers maintain respect and influence in the professional and academic worlds.

I thought that finding a dataset that met these requirements would be a walk in the park. So, my group of three initially decided to split up, individually comb through datasets and topics that interested us, and then come back together with our discoveries. I was personally researching Covid-19 and business-related datasets, given the global pandemic we are living through. What could possibly go wrong?

Me before I started this final project:

Photo From Giphy.com

Me searching through dozens of datasets on Kaggle.com and a few federal government databases to find the ‘One’:

Photo From Giphy.com

Yes, I got humbled very quickly. It probably took us nearly two weeks to find the right dataset because all the other datasets either had fewer columns than were required, had an uneven number of categorical and numerical columns, or were so poorly formatted that it would have taken hours to piece together a practical narrative. My Type A sensibilities were on high alert as I scrolled through dozens of pages to find the right one. How can data practitioners properly use public datasets to build predictive models when the data is most likely incomplete, unformatted, and perhaps biased? Does this defeat the purpose of real scholarship in the field?

Maybe I’m a bit clueless, but I think of professionalism the way PhD candidates do when they cite their sources in scholarly papers before publishing them out in the world. But let me stop rambling. :)
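A first pass over that kind of checklist could also be scripted. Here is a minimal sketch in Python, assuming a CSV export and treating a column as numerical only if every non-empty value parses as a number. The sample rows, the function names, and the minimum counts are all made up for illustration; they are not the course's actual requirements.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float (thousands separators allowed)."""
    try:
        float(text.replace(",", ""))
        return True
    except ValueError:
        return False


def classify_columns(rows):
    """Label each column 'numerical' if every non-empty value is a number,
    otherwise 'categorical'. Columns with no values count as categorical."""
    if not rows:
        return {}
    kinds = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] not in ("", None)]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[column] = "numerical" if is_numeric else "categorical"
    return kinds


def meets_requirements(kinds, min_categorical, min_numerical):
    """Check the column counts against hypothetical minimums."""
    labels = list(kinds.values())
    return (labels.count("categorical") >= min_categorical
            and labels.count("numerical") >= min_numerical)


# Made-up sample data, standing in for a downloaded dataset.
sample_csv = """company,sector,price,volume
Alpha Co,Tech,10.5,1000
Beta Inc,Health,20.0,
Gamma LLC,Energy,"1,250.75",300
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kinds = classify_columns(rows)
print(kinds)
print(meets_requirements(kinds, min_categorical=2, min_numerical=2))
```

A check like this only counts columns. It says nothing about whether the data is complete or unbiased, so it would narrow the search rather than replace reading the dataset.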

Days after going back and forth with my team members and my instructors, we finally reached a verdict: a sample S&P 500 index dataset that looked at the stock price, stock volume, earnings before interest, taxes, depreciation, and amortization (EBITDA), market cap, and revenue growth of the 500 large U.S.-listed companies that make up the index. Now that a feasible dataset was at our fingertips, we methodically cleaned it in Excel and later in Tableau. We merged our three spreadsheets into one comprehensive spreadsheet. Did I tell you that one of those spreadsheets contained over 750,000 rows of data that dated all the way back to December 2009?! Yeah, you heard that right! More about that later.

After we modified some of the column names and rows and saw what we were up against, I thought it would be smooth sailing uploading our dataset from Excel to Tableau.

Photo From Complex.com

Yes, you guessed it. More roadblocks emerged. But before I dive in, I want to say that Tableau is an amazing piece of software, with all its brilliant and not-so-brilliant quirks. For all the non-data folks out there, Tableau is a very popular data visualization tool that allows you to transform your data from Excel, SQL, Google Analytics, ServiceNow, and a slew of other data-oriented programs into complex visuals that can get your business stakeholders on your side. Think of it as Excel 2.0.

One of Tableau’s main drawbacks is that if the imported data file is too large, or the numerical and/or categorical data are not clearly labelled on the X-axis, Y-axis, or in the Marks section of Tableau, then you will spend additional time troubleshooting to get it to work properly. But if you have the patience and curiosity to sit down with your notes and watch some supplemental YouTube or Udemy tutorials as you poke around the dashboard, then Tableau is a worthwhile investment for you as a data practitioner. Just remember that Tableau is not for the single-minded.

Week #5

Remember those 750,000 rows of data? Well, they slowed down my computer a few times as we were trying to make sense of the data. During my group sessions, I would cancel the integration process every time we wanted to use data from that Excel tab. Big mistake. It took us about a week to realize that if we let the data load without interruption, we could create our amazing graphs. During this stage of our project, we decided to analyze 15 companies of the S&P 500, and out of that small subset we picked five random companies that we thought were the safest for retail investors to invest in: JPMorgan Chase, American Airlines, Tesla, 3M, and Pfizer.

Keep in mind that we had several limitations. First, we could only analyze financial data collected on one single trading day (February 3, 2022) rather than data accumulated over a monthly, annual, or multiyear timeframe. Second, the stock volume data that we were able to extract from those 750,000 rows covered only 8 of the 15 companies that we were surveying. Partial data = partial analysis = high probability that our hypothesis couldn’t be validated. Third, since we had very limited data, we could only populate certain graph and table types in Tableau, like the bar graph or scatter plot.
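The coverage gap in the second limitation (8 of 15 companies is roughly 53%) is the kind of thing worth measuring before building any charts. Here is a minimal sketch in Python of such a check. The tickers are placeholders, not the companies from our project, and the function name is my own.

```python
def coverage(surveyed, available):
    """Return (covered, missing, share): the surveyed items that do and do not
    appear in the available data, plus the covered fraction."""
    surveyed = set(surveyed)
    covered = surveyed & set(available)
    missing = surveyed - covered
    share = len(covered) / len(surveyed) if surveyed else 0.0
    return sorted(covered), sorted(missing), share


# Placeholder tickers for illustration only.
surveyed = ["AAA", "BBB", "CCC", "DDD", "EEE"]
with_volume = ["AAA", "CCC", "ZZZ"]  # "ZZZ" is in the data but not surveyed

covered, missing, share = coverage(surveyed, with_volume)
print(f"{len(covered)} of {len(surveyed)} covered ({share:.0%}); missing: {missing}")
```

Knowing up front which companies are missing makes it easier to decide whether to drop them from a comparison or find another source.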

Some Quick Analysis

Despite these roadblocks and stress, my team spent hours probing, tweaking, and packaging visual models into a neat presentation for our last day of class. The horizontal bar graph below is one of our prized possessions after learning the fundamentals of Tableau:

To give you a sneak preview of our research, this graph compares the February 3rd stock price and revenue growth rate of our 15 companies; the revenue growth rate is the percentage figure at the very end of each bar. Look at how Pfizer and American Airlines, for example, were some of the cheapest stocks on our list, yet had the first and second highest revenue growth rates respectively, outpacing big blue-chip giants Microsoft, Walmart, and Tesla. Revenue growth is a positive factor for a company’s success: it indicates that sales are rising from one period to the next, which can leave the company with more $$ to reinvest back into the organization to create better, more efficient products and services for its customers.

JPMorgan Chase, on the other hand, had the lowest (negative) revenue growth rate in the group. As a result, we could conclude that the firm brought in less revenue than in the prior period, although that alone doesn’t tell us whether it made a profit. With those two variables in mind, would it be enough for investors to decide to purchase American Airlines and Pfizer stock? Yes and no. Yes to the extent that negative revenue growth is a signal to potential investors that if they were to invest in JPMorgan Chase, they may not see the stock price increase enough to sell at a profit later down the line. And why would I buy one share of Tesla when I can buy nearly 60 shares of American Airlines for the same price and be more likely to see a bigger profit over time?
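For reference, a revenue growth rate is usually calculated as the change in revenue between two periods divided by the earlier period’s revenue. Here is a minimal sketch in Python. The revenue figures are made up for illustration and do not come from our dataset, and I am assuming this standard formula rather than knowing exactly how the dataset’s provider computed its figures.

```python
def revenue_growth_rate(prior_revenue, current_revenue):
    """Return growth as a fraction: (current - prior) / prior."""
    if prior_revenue <= 0:
        raise ValueError("prior_revenue must be positive")
    return (current_revenue - prior_revenue) / prior_revenue


# Made-up figures, in any consistent unit (for example, millions of dollars).
growing = revenue_growth_rate(200.0, 250.0)    # revenue went up
shrinking = revenue_growth_rate(200.0, 180.0)  # revenue went down

print(f"{growing:.1%}", f"{shrinking:.1%}")
```

A negative result means revenue fell compared with the earlier period. It does not say anything on its own about profit, which depends on costs as well.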

However, there are several other factors that investors have to take into consideration when researching any security. American Airlines’ stock price increased at some point these last two years due to strong demand for international travel after millions of people had been stuck at home quarantining for months on end during this pandemic. I wouldn’t be surprised if American Airlines and other airline stocks see an increase in stock price and volume once more countries remove or loosen their mask and vaccine mandates and the virus is less of a nuisance. However, we don’t know if deadlier COVID-19 variants are around the corner.

Photo from Ig.com

In addition, my group was given S&P 500 data from one particular date. To find out whether JPMorgan Chase’s negative revenue growth is a recurring trend, I would have to analyze financial data from the last 5–10 years to get a complete picture. I would also have to look at other variables, its balance sheets, and annual shareholder reports, and investigate how past and recent geopolitical trends have affected JPMorgan Chase’s stock price over time. I would also have to analyze the company on a technical chart and see how it fared during bull and bear markets and other periods of economic growth and downturn.

As an investor, doing your due diligence on any company or sector is still the law of the land. That same rule also applies to data analysts who work with company data and need to conduct thorough market analysis to help give organizations their competitive edge. I’ve learned a lot about the field through this course, and I hope to bring my new mindset to my current role at Microsoft as well as to my future side projects.
