Python

How to: Use Virtual Environments with JupyterLab Desktop

March 27, 2023
Python
Data Science

Using virtual environments is a very good idea when working on Python projects. Using JupyterLab Desktop is a very good idea when working on data science projects. Using JupyterLab Desktop with virtual environments is an especially good idea. Unfortunately, the documentation for how to do this is rather lacking. After some messing around, I was able to load a virtual environment for a Python project from within JupyterLab Desktop by doing the following: ...

Mann-Whitney U Test in Python

March 19, 2023
Python
Data Science, Statistics

Let’s say you have a dataset stored as a Pandas dataframe, df, with a numerical column and a categorical column, and you want to determine whether the categories are statistically different from one another. A Mann-Whitney U test might be appropriate, especially if the fundamental assumptions of the more conventional t-test are not met (e.g., that the variance across the groups is similar and the distributions are mostly normal) ...
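As a rough illustration of the scenario described above, here is a minimal sketch using scipy.stats.mannwhitneyu. The column names "group" and "value", the category labels "a" and "b", and the numbers themselves are made up for the example; they are assumptions, not data from the post.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical data: "group" is the categorical column, "value" the numerical one.
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1.1, 2.3, 1.9, 2.8, 1.5, 3.9, 4.2, 3.1, 5.0, 4.4],
})

# Pull out the numerical values for each of the two categories.
values_a = df.loc[df["group"] == "a", "value"]
values_b = df.loc[df["group"] == "b", "value"]

# Two-sided test: the null hypothesis is that both groups
# come from the same distribution.
u_stat, p_value = mannwhitneyu(values_a, values_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

A small p-value suggests the two categories differ. With more than two categories, one option is to run the test on each pair of categories and correct for multiple comparisons.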

Train a model in Python from start to finish

January 12, 2022
Python
machine learning, Data Science

When it comes down to training and assessing a machine learning model in Python, the process tends to be pretty standard. The individual steps usually include: accessing the data; preparing the data as appropriate (this step is highly dependent on the particular situation); splitting the data into training, testing, and validation sets; training the model; tuning the model iteratively based on how it is performing against the testing data; assessing the model against the validation data by examining the accuracy, confusion matrix, and other metrics reflected in a detailed report; and identifying the most important features. ...
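The steps above can be sketched end to end with scikit-learn. This is only an illustration under stated assumptions: the iris dataset, the random forest classifier, the 60/20/20 split, and the small grid of n_estimators values are all choices made for the example, not taken from the post. The set names follow the text's usage, with tuning done against the testing set and the final assessment against the validation set.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# 1. Access the data (a built-in toy dataset, assumed for illustration).
X, y = load_iris(return_X_y=True, as_frame=True)

# 2. Prepare the data: nothing is needed for this clean dataset.

# 3. Split into training (60%), testing (20%), and validation (20%) sets.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# 4-5. Train the model and tune it against the testing data.
best_model, best_score = None, -1.0
for n_estimators in (10, 50, 100):
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
    model.fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_model, best_score = model, score

# 6. Assess the chosen model against the held-out validation data.
val_pred = best_model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_pred)
print(f"Validation accuracy: {val_accuracy:.3f}")
print(confusion_matrix(y_val, val_pred))
print(classification_report(y_val, val_pred))

# 7. Identify the most important features, highest importance first.
importances = sorted(
    zip(X.columns, best_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in importances:
    print(f"{name}: {importance:.3f}")
```

For a real project, the preparation step and the tuning loop would usually be more involved; scikit-learn's GridSearchCV is one common replacement for the hand-written loop.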