Posts

Showing posts from May, 2026

CST383: Growing My Skills in Python and Data Visualization

This week, I learned more about working with data using Python, Pandas, NumPy, and visualization tools. I already have some experience with coding, so some parts felt familiar, especially reading code, testing outputs, and understanding how variables work. However, this week helped me practice applying those skills specifically to data analysis and visualization. One important thing I learned was how to choose the correct type of plot based on the variables. For example, a histogram is useful for showing the distribution of one numeric variable, a boxplot is helpful when comparing a numeric variable across categories, and a bar chart or count plot works well for categorical data. I realized that making a graph is not just about writing the code correctly. It is also about understanding what the question is asking and choosing a visualization that clearly answers it. I also practiced problems involving discrete distributions, such as binomial probability and expected value. These proble...
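As a small illustration of the binomial probability and expected value problems mentioned in this excerpt, here is a minimal sketch using only the standard library. The numbers (10 trials with success probability 0.5) are assumptions chosen for illustration, not taken from the course labs.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 10 fair coin flips.
n, p = 10, 0.5

# Probability of each possible number of successes, 0 through n.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Expected value from the definition: sum of k * P(X = k).
expected_value = sum(k * prob for k, prob in enumerate(pmf))

print(f"P(X = 5) = {pmf[5]:.4f}")
print(f"E[X] = {expected_value:.2f}")
```

The expected value computed from the definition matches the shortcut n * p for a binomial variable, which is a handy check when working these problems by hand.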

CST383: Learning Probability Distributions and Data Visualization in Python

This week, I learned more about probability distributions, density plots, histograms, and how to visualize data using Python libraries such as Pandas, Matplotlib, Seaborn, and SciPy. I practiced creating density plots, box plots, cumulative distribution plots, and histograms using real datasets. I also learned how changing things like bin width, bandwidth, transparency, and sample size can affect the appearance and interpretation of graphs. Another important topic was understanding skewness and how transformations such as log10 can help make heavily skewed data easier to analyze. One thing I found interesting was how probability density functions (PDFs) and histograms can represent the same data differently. Before this week, I thought graphs mostly showed the same information in different styles, but now I understand that each type of plot has a different purpose and can make patterns easier or harder to notice. I also learned that larger sample sizes tend to reflect the true distribution...
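To make the point about skewness and the log10 transformation concrete, here is a rough sketch using only the standard library. The sample values are made up for illustration, and the `skewness` helper is a hypothetical name for the usual moment-based formula (mean of cubed z-scores); libraries such as SciPy provide their own versions.

```python
from math import log10
from statistics import fmean, pstdev


def skewness(values: list[float]) -> float:
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = fmean(values)
    sd = pstdev(values)
    return fmean(((v - mean) / sd) ** 3 for v in values)


# Made-up, heavily right-skewed sample (one very large value).
raw = [1, 2, 3, 5, 8, 10, 20, 50, 100, 1000]

# log10 compresses large values much more than small ones.
logged = [log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(f"skewness before log10: {skew_raw:.2f}")
print(f"skewness after log10:  {skew_log:.2f}")
```

On this sample the transformed values are much less skewed, which is why a histogram or density plot of the logged data is usually easier to read.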

CST383: Reflection on Data Distributions and Pandas Aggregation

This week, I learned more about data distributions, probability, and how Pandas can be used for data analysis and aggregation. At first, I was confused about some of the concepts and coding exercises, especially when working with grouping, crosstabs, and understanding probability distributions. However, after rewatching the course videos and practicing with the other lab files, the topics became much clearer to me. I realized that repetition and hands-on practice really help me understand coding and data science concepts better. One concept that took me some time to understand was the difference between PDFs and CDFs, especially how probabilities are interpreted for continuous variables. I was also initially confused about aggregation with grouping in Pandas because there were many different functions like groupby(), value_counts(), and aggregation methods being used together. After trying the labs multiple times, I started understanding how these functions work together to sum...
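The PDF versus CDF distinction mentioned in this excerpt can be sketched with the standard library's `statistics.NormalDist`. The normal distributions used here are assumptions picked for illustration. The key idea is that a PDF value is a density, not a probability, and probabilities for a continuous variable come from differences of the CDF over an interval.

```python
from statistics import NormalDist

# Assumed example: a standard normal distribution.
standard = NormalDist(mu=0, sigma=1)

# The PDF gives the height of the density curve at a point.
density_at_zero = standard.pdf(0)

# A probability is the area under the curve over an interval,
# computed as a difference of CDF values.
prob_within_one_sd = standard.cdf(1) - standard.cdf(-1)

# A density can exceed 1 when the distribution is narrow,
# which shows it cannot be read as a probability.
narrow = NormalDist(mu=0, sigma=0.1)
narrow_density_at_zero = narrow.pdf(0)

print(f"pdf(0) for sigma=1:   {density_at_zero:.4f}")
print(f"P(-1 <= X <= 1):      {prob_within_one_sd:.4f}")
print(f"pdf(0) for sigma=0.1: {narrow_density_at_zero:.4f}")
```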

CST383 Week 1: Python for Data Science

This week, I learned the basics of Python for data science and how tools like NumPy are used. I already have programming experience from my computer science classes, but Python feels different from languages like Java or C++. It is easier to write and more flexible because it is dynamically typed and does not require declaring variable types. This makes coding faster, but I also need to be careful to avoid mistakes. We also learned about the Python data science ecosystem, such as NumPy, Pandas, and tools like Google Colab and Jupyter Notebook. I liked using Google Colab because it is simple and runs in the browser, so I don’t need to install anything. However, I am curious about when it is better to use local tools like Spyder or Jupyter instead of Colab. The most important concept for me this week was NumPy. I learned that NumPy arrays are much faster than Python lists because they store data in a contiguous block of memory and use the same data type. This connects to what I learned in my algorithms class, where performan...
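A small sketch of the NumPy properties described in this excerpt, assuming NumPy is installed. The array contents are arbitrary. It shows that an array has one data type and one contiguous buffer, and that arithmetic applies to every element without an explicit Python loop.

```python
import numpy as np

# Arbitrary sample data: the integers 0 through 999.
values = list(range(1_000))
arr = np.array(values)

# Every element shares one data type, and the data sits in one buffer.
print("dtype:", arr.dtype)
print("contiguous:", arr.flags["C_CONTIGUOUS"])
print("bytes used:", arr.nbytes)

# Vectorized arithmetic: one expression instead of a Python-level loop.
squared = arr * arr

# The equivalent list version visits each element in Python.
squared_list = [v * v for v in values]
```

Both versions produce the same numbers; the array version tends to be faster on large inputs because the loop runs in compiled code over the homogeneous buffer.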