CST383: Reflection on Data Distributions and Pandas Aggregation

This week, I learned more about data distributions, probability, and how Pandas can be used for data analysis and aggregation. At first, some of the concepts and coding exercises confused me, especially grouping, crosstabs, and probability distributions. After rewatching the course videos and practicing with the other lab files, the topics became much clearer. I realized that repetition and hands-on practice are what help me understand coding and data science concepts.

One concept that took me some time to understand was the difference between probability density functions (PDFs) and cumulative distribution functions (CDFs), especially how probabilities are interpreted for continuous variables. I was also initially confused about aggregation with grouping in Pandas, because many different functions, such as groupby(), value_counts(), and the aggregation methods, were being used together. After working through the labs several times, I started to understand how these functions combine to summarize and analyze datasets efficiently.
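The PDF/CDF distinction can be illustrated with a minimal sketch. It assumes a normal distribution, which is chosen here only for illustration and is not taken from the course labs; the helper names normal_pdf and normal_cdf are hypothetical. The point it tries to show is that a PDF value is a density rather than a probability, while the probability of an interval comes from a difference of two CDF values.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (a height, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# The density at 0 for the standard normal is about 0.399, but for a
# continuous variable the probability of hitting exactly 0 is zero.
density_at_zero = normal_pdf(0.0)

# A density can even exceed 1 when the distribution is narrow.
narrow_density = normal_pdf(0.0, sigma=0.1)

# Probabilities come from intervals: P(-1 <= X <= 1) = CDF(1) - CDF(-1).
prob_within_one_sd = normal_cdf(1.0) - normal_cdf(-1.0)

print(f"pdf(0) = {density_at_zero:.4f}")
print(f"pdf(0) with sigma=0.1 = {narrow_density:.4f}")
print(f"P(-1 <= X <= 1) = {prob_within_one_sd:.4f}")
```

The interval probability is the area under the PDF between the two endpoints, which is why the CDF, as the running total of that area, is the more convenient function for answering probability questions.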

I also found it interesting how data science uses distributions and probabilities to make predictions. The bike-sharing lab helped me see how real-world datasets can be analyzed with Pandas operations instead of explicit loops. Overall, this week improved my confidence in working with datasets, reading distributions, and using Pandas for analysis.
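As a rough sketch of loop-free aggregation, the example below uses a tiny made-up table in the spirit of a bike-sharing dataset. The column names (season, day_type, rentals) and all values are invented for illustration and are not the lab's actual data. It shows value_counts(), groupby() with agg(), and pd.crosstab() each summarizing the data in one call.

```python
import pandas as pd

# Hypothetical data; the real lab dataset has different columns and values.
rides = pd.DataFrame({
    "season":   ["spring", "spring", "summer", "summer", "summer", "fall"],
    "day_type": ["weekday", "weekend", "weekday", "weekday", "weekend", "weekday"],
    "rentals":  [120, 80, 200, 220, 150, 90],
})

# How many rows fall in each season (no loop needed).
season_counts = rides["season"].value_counts()

# Mean and total rentals per season in a single grouped aggregation.
season_stats = rides.groupby("season")["rentals"].agg(["mean", "sum"])

# Row counts for every season / day-type combination.
season_by_day = pd.crosstab(rides["season"], rides["day_type"])

print(season_counts)
print(season_stats)
print(season_by_day)
```

Each call replaces what would otherwise be a manual loop with a dictionary of running totals, which is the main reason these functions tend to be used together.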

My CS Online Journey
