This week, I learned more about data distributions, probability, and how Pandas can be used for data analysis and aggregation. At first, I was confused by some of the concepts and coding exercises, especially grouping, crosstabs, and probability distributions. However, after rewatching the course videos and practicing with the other lab files, the topics became much clearer to me. I realized that repetition and hands-on practice really help me understand coding and data science concepts. One concept that took me some time to understand was the difference between PDFs and CDFs, especially how probabilities are interpreted for continuous variables. I was also initially confused about aggregation with grouping in Pandas because many different functions, such as groupby(), value_counts(), and the aggregation methods, were being used together. After trying the labs multiple times, I started understanding how these functions work together to sum...
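To make the two ideas above concrete, here is a minimal sketch, not the lab's actual code. The first part assumes a normal distribution and uses two helper functions written for this example (normal_pdf and normal_cdf) to show that a PDF value is a density while probabilities come from differences of the CDF. The second part uses a small made-up DataFrame (the column names dept, level, and salary are hypothetical) to show value_counts(), groupby() with agg(), and pd.crosstab() side by side.

```python
import math

import pandas as pd


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution at x (a height, not a probability)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# A density can exceed 1 when the spread is small, so it is not a probability.
density_at_mean = normal_pdf(0.0, sigma=0.1)

# For a continuous variable, probabilities belong to intervals: CDF(b) - CDF(a).
prob_within_one_sd = normal_cdf(1.0) - normal_cdf(-1.0)

# Hypothetical data, made up for illustration only.
df = pd.DataFrame({
    "dept": ["A", "A", "B", "B", "B"],
    "level": ["junior", "senior", "junior", "junior", "senior"],
    "salary": [50, 70, 55, 65, 90],
})

# How many rows fall in each department.
counts = df["dept"].value_counts()

# Split rows by department, then compute several aggregates per group.
summary = df.groupby("dept")["salary"].agg(["mean", "sum", "count"])

# Count rows for every combination of department and level.
table = pd.crosstab(df["dept"], df["level"])

print(density_at_mean, prob_within_one_sd)
print(counts, summary, table, sep="\n\n")
```

In this sketch, value_counts() answers "how many rows per category", groupby() followed by agg() answers "what is the summary statistic per category", and crosstab() counts rows across two categorical columns at once.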
This week, I learned the basics of Python for data science and how tools like NumPy are used. I already have programming experience from my computer science classes, but Python feels different from languages like Java or C++. It is easier to write and more flexible because it is dynamically typed and does not require declaring data types. This makes coding faster, but I also need to be careful to avoid type-related mistakes. We also learned about the Python data science ecosystem, such as NumPy, Pandas, and tools like Google Colab and Jupyter Notebook. I liked using Google Colab because it is simple and runs in the browser, so I don’t need to install anything. However, I am curious about when it is better to use local tools like Spyder or Jupyter instead of Colab. The most important concept for me this week was NumPy. I learned that NumPy arrays are usually much faster than Python lists for numerical operations because they store data in a contiguous block of memory and use the same data type for every element. This connects to what I learned in my algorithms class, where performan...
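A minimal sketch of the NumPy point above, assuming an array of 100,000 integers (the size and variable names are chosen only for illustration). It checks that the array has a single dtype and a contiguous buffer, then runs the same computation with a Python-level loop over a list and with a vectorized array operation. The timings depend on the machine, so no specific numbers are claimed here.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Every element shares one dtype and is stored back to back in one buffer.
same_type = arr.dtype == np.int64
is_contiguous = arr.flags["C_CONTIGUOUS"]
bytes_used = arr.nbytes  # n elements * 8 bytes each for int64

# Same computation two ways: a Python-level loop vs. a vectorized operation.
list_total = sum(v * 2 for v in values)
array_total = int((arr * 2).sum())

# Time each approach over 10 repetitions; results vary by machine.
list_time = timeit.timeit(lambda: sum(v * 2 for v in values), number=10)
array_time = timeit.timeit(lambda: (arr * 2).sum(), number=10)
print(f"list: {list_time:.4f}s  array: {array_time:.4f}s")
```

The vectorized version does its looping inside compiled code over a fixed-size, single-type buffer, which is where the speed difference typically comes from.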