Programming Fundamentals | Project Structure | Python | March 30, 2026 | 12 min read

Python Project Structure for Data Science Projects | Best Practices

Learn how to structure your Python data science projects like a pro to avoid chaos, boost reproducibility, and improve your grades.

Structuring Python Projects for Data Science: Best Practices

1. Problem Introduction

It’s 11:53 PM, and your data science assignment is due in seven minutes. You’re desperately scrolling through a folder named final_project_v2. Inside, you find chaos: final_code.py, final_code_FINAL.py, data_clean.ipynb, data_clean_(copy).ipynb, and a mysterious output_final(1).csv. You know the code works because it ran ten minutes ago, but now you can’t remember which file creates the final chart, or where the cleaned dataset is hiding.

Sound familiar? We’ve all been there. You’re so focused on getting the code to run and the analysis to work that you completely forget to organize your files. This isn’t just about being tidy; this chaotic structure is actively working against you. It leads to lost points on assignments (professors can’t grade what they can’t find), wasted time, and a whole lot of unnecessary stress.

But it doesn’t have to be this way. Just like you wouldn’t start building a house without a blueprint, you shouldn’t start a data science project without a solid structure. In this guide, we’ll walk you through a simple, professional template for organizing your Python projects that will save your grades, your time, and your sanity.

2. Why It Matters

You might be thinking, “My professor only cares about the code, not my folder structure.” But a good project structure is more than just a fancy filing system. It’s a tool that directly impacts your success.

  • Higher Grades & Fewer Mistakes: A clear structure makes your work easy for a TA or professor to follow. When they can quickly find your methodology, your cleaned data, and your final results, they can focus on grading the quality of your work, not deciphering your file names. It also minimizes errors, like accidentally using raw data instead of cleaned data in your final analysis.
  • Boost Your Confidence: Imagine opening your project folder from last month and immediately understanding every single file. That confidence allows you to build on previous work, use it in your portfolio, and discuss it in job interviews without fear.
  • Essential for Collaboration & Your Career: In an internship or job, you will rarely work alone. You’ll be part of a team. If you push a messy, disorganized repository to GitHub, your colleagues will have to untangle it before they can help you. A standard structure is a shared language that lets teams work together smoothly, and it’s a skill employers expect.

     


     

3. Step-by-Step Breakdown

Here is the recommended structure we’ll build together. Don’t worry, it’s simpler than it looks.

Plain Text

your_project_name/
│
├── data/
│   ├── raw/          <- Immutable original data dumps.
│   └── processed/     <- Cleaned data ready for analysis.
│
├── notebooks/         <- Jupyter notebooks for exploration and explanation.
│
├── src/               <- Python source code for the project.
│   └── __init__.py
│
├── reports/           <- Generated analysis outputs (graphs, reports).
│   └── figures/
│
├── requirements.txt   <- List of project dependencies.
└── README.md          <- The top-level description of your project.
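
As a rough sketch, the layout above could be created with a few lines of Python instead of by hand. The function name scaffold and the two constant names are our own choices for illustration; only the folder and file names come from the tree.

```python
from pathlib import Path

# Folders and empty starter files taken from the tree above.
SUBDIRS = ["data/raw", "data/processed", "notebooks", "src", "reports/figures"]
STARTER_FILES = ["src/__init__.py", "requirements.txt", "README.md"]


def scaffold(root):
    """Create the project skeleton under `root` without overwriting file contents."""
    root = Path(root)
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in STARTER_FILES:
        (root / name).touch(exist_ok=True)  # existing content is left as it is
    return root
```

Calling scaffold("customer_churn_analysis") would build the skeleton in the current directory. Running it a second time is harmless because existing folders and file contents are kept.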

 

Let’s break this down step by step.

Step 1: Create Your Project Root and README.md

This is the top-level folder, the home base for your entire project. Name it something clear and descriptive, like customer_churn_analysis or housing_price_prediction.

Inside this folder, the very first file you should create is README.md. This is your project’s front door. It’s a simple text file written in Markdown that tells anyone (including your future self) what this project is about.

  • What it does: Explains the project’s purpose, how to set it up, and how to run the code.
  • Why it matters: It provides context. Without it, a folder is just a collection of files.
  • Example (README.md):

Markdown

# Customer Churn Prediction

This project analyzes customer data from a telecom company to predict churn.

## Project Goal
To build a classification model that identifies customers at high risk of churning, allowing the company to take proactive retention measures.

## Setup
1.  Clone this repository.
2.  Install dependencies: `pip install -r requirements.txt`
3.  Run the main script: `python src/build_model.py`

## Data Source
The raw data is in `data/raw/` and was sourced from the company's internal CRM.

 

Step 2: Organize Your data/ Directory

Data is the heart of your project, but the original files should never be modified by hand. This is why we split the folder into two subfolders.

  • data/raw/: This is your sacred, read-only archive. Any dataset you download or receive goes here, exactly as you got it. Never, ever edit these files directly.
  • data/processed/: This folder holds the transformed, cleaned, and engineered datasets that are ready to be fed into your machine learning models.
  • Why it matters: By keeping a pristine copy of the original data, you can always trace back your steps. If you make a mistake in your cleaning script, you can easily start over without having to re-download the data. It’s the foundation of reproducible research.
  • Concrete Example: Imagine you have a CSV file, customer_data.csv. You place the original in data/raw/. Your cleaning script, clean_data.py, reads from data/raw/customer_data.csv, removes duplicates, handles missing values, and saves the new, clean version to data/processed/cleaned_customer_data.csv.
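
The cleaning step described above might look something like this minimal sketch. It uses only the standard library csv module so that it runs anywhere. The function name clean_csv and the rule of dropping every row with an empty field are assumptions for illustration; a real project would more likely use pandas and a more careful strategy for missing values.

```python
import csv
from pathlib import Path


def clean_csv(raw_path, processed_path):
    """Write a cleaned copy of raw_path to processed_path.

    Drops exact duplicate rows and rows with an empty field.
    Returns the number of data rows kept. The raw file is only read.
    """
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    processed_path.parent.mkdir(parents=True, exist_ok=True)

    seen = set()
    kept = 0
    with raw_path.open(newline="") as src, processed_path.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to clean
        writer.writerow(header)
        for row in reader:
            key = tuple(row)
            if key in seen:
                continue  # exact duplicate of an earlier row
            if any(cell.strip() == "" for cell in row):
                continue  # at least one missing value
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept
```

Because the function only reads from the raw file and writes somewhere else, deleting everything in data/processed/ and rerunning it always gives the same result.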

    💡 Pro Tip: If your data is very large (multiple gigabytes), you might not want to store it directly in your project folder. In that case, you can keep a script in your src/ folder that downloads the data from a cloud source and saves it to the correct data/ subdirectory.

Step 3: Separate Exploration from Code with notebooks/ and src/

This is where most students go wrong. They do everything in a single Jupyter Notebook, resulting in a chaotic mix of exploration, data cleaning, and model building. We separate these concerns.

  • notebooks/: This folder is for Jupyter Notebooks (.ipynb files). Use notebooks for exploration and communication. This is where you poke and prod the data, visualize relationships, and experiment with different models. The final notebook you submit for an assignment should be a clean, well-documented narrative of your thought process, with a clear beginning (question), middle (analysis), and end (conclusion).
  • src/ (Source Code): This folder is for reusable Python code (.py files). Think of this as your project’s toolkit. This is where you put the functions and classes you write to perform specific tasks. For example:
      • src/load_data.py: Functions for reading data from various sources.
      • src/clean_data.py: Functions for handling missing values, encoding categories, etc.
      • src/build_features.py: Functions for creating new features from existing ones.
      • src/train_model.py: The script to train your final model.
      • src/__init__.py: This empty file marks the src folder as a Python package, so you can import your functions cleanly (e.g., from src import clean_data) once the project root is on Python’s import path.
  • Why it matters: This separation is crucial. Your reusable code lives in src/, so you can import and use it in any notebook. If you find a bug in your data cleaning function, you fix it once in src/clean_data.py, and every notebook that imports it picks up the fix the next time it runs (restart the kernel so the module is reloaded). Your notebooks become clean narratives, not tangled messes of code.

 

Python

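# Example: Inside a notebook in notebooks/, you'd import your own code
import sys
from pathlib import Path

# Put the project root on the import path so that `src` can be found
# (assumes the notebook's working directory is notebooks/)
sys.path.append(str(Path.cwd().parent))

from src import load_data, clean_data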

# Load the data
df = load_data.from_csv('../data/raw/customer_data.csv')

# Clean it using your custom function
df_clean = clean_data.remove_outliers(df)

print("Data ready for analysis!")

 

Step 4: Store Your Outputs in reports/

Your project will generate outputs: plots, charts, HTML reports, and maybe even the final trained model. These go into the reports/ folder.

  • reports/: This folder is for all generated outputs. It helps to have a subfolder like reports/figures/ specifically for images (.png, .jpg). This keeps the root directory clean and makes it easy to find that final plot for your report or presentation.
  • Why it matters: Imagine having to rerun an entire analysis just to find the one chart you want to put in your paper. By saving all outputs to a dedicated folder, you can access them instantly. It also makes it easy to clean up and start fresh by simply emptying this folder.
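
One way to make sure outputs always land in the right place is a small path helper, sketched below. The name report_path and its default arguments are hypothetical; the only thing taken from the structure above is the reports/figures/ location.

```python
from pathlib import Path


def report_path(filename, subfolder="figures", reports_dir="reports"):
    """Return reports/<subfolder>/<filename>, creating the folders if needed."""
    folder = Path(reports_dir) / subfolder
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename
```

With matplotlib, for instance, you could then write plt.savefig(report_path("churn_by_plan.png")) (the file name is made up), and the figure would be saved under reports/figures/ relative to wherever the code is run.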

Step 5: Track Your Dependencies with requirements.txt

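This file is a simple list of all the Python libraries your project needs to run. You create it by running pip freeze > requirements.txt in your terminal after you have installed all the necessary packages for your project. Run the command inside the project’s own virtual environment; otherwise the list will include every package installed on your machine, not just the ones this project uses.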

  • Why it matters: This is for collaboration and reproducibility. When someone else (or your professor, or your future self on a new computer) wants to run your project, they can create a new environment and run pip install -r requirements.txt to install the exact versions of every library you used. This ensures the code runs without those dreaded “ModuleNotFoundError” or version conflict issues.
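
If you want a quick sanity check that an environment matches the file, something like the sketch below could help. It is an illustration, not a replacement for pip: the function name check_requirements is our own, and it only understands plain name==version lines of the kind pip freeze usually writes, skipping everything else.

```python
from importlib import metadata
from pathlib import Path


def check_requirements(path="requirements.txt"):
    """Return (package, pinned_version, installed_version) for every mismatch.

    installed_version is None when the package is not installed at all.
    """
    problems = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and lines that are not simple pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems
```

An empty list means every pinned package is installed at the pinned version. Anything else tells you exactly which package to look at before you submit.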

     


     

4. Common Mistakes

Even with a blueprint, it’s easy to slip into old habits. Here are the most common pitfalls students face and how to avoid them.

1. The “One Giant Notebook” Trap:

  • What it looks like: A single project.ipynb file with 200+ cells. Code for loading, cleaning, analyzing, and modeling is all jumbled together. Variables defined at the top are used in cells at the bottom without explanation.
  • Why students make it: It’s the default. You open a notebook, start coding, and it grows organically. It feels easier in the moment.
  • How to avoid it: The moment you write a function that you might use again, cut it and paste it into a .py file in your src/ folder. Start refactoring your notebook into sections with clear markdown headings. Aim for your final notebook to read like a story, not like raw code output.
     

2. Messy Data Directories:

  • What it looks like: A data/ folder with files named data.csv, data_final.csv, data_final_2.csv, and old_data.csv. You have no idea which one is the actual source of truth.
  • Why students make it: It’s quick to save a new version of a file without thinking about the consequences for future-you.
  • How to avoid it: Strictly enforce the raw/ and processed/ rule. The only files in raw/ should be the original, untouched files. The files in processed/ should be generated by a script, not by manually saving a new version from Excel.
     

3. Ignoring the __init__.py File:

  • What it looks like: You have a great src/ folder with well-written .py files, but imports from it behave inconsistently, and some tools (such as packaging tools) do not recognize the folder as a package.
  • Why students make it: It’s not obvious why an empty file is so important.
  • How to avoid it: Create an empty file named __init__.py (two underscores on each side) inside your src/ folder. This marks the folder as a regular Python package, so imports like from src.my_functions import clean_data resolve predictably. Python still has to be able to find the project root, so run your code from there or add the root to sys.path in your notebooks. It’s a tiny step with a big payoff.
     

4. Forgetting the README.md:

  • What it looks like: You upload a zip file of your project to the learning management system. The TA has to guess what it does, how to run it, and what the results mean.
  • Why students make it: You think, “The code speaks for itself!” But it rarely does.
  • How to avoid it: Write the README.md first, at the very beginning of the project. Outline your goals. Then, as you build, update it. By the time you’re done, you have an accurate summary ready to go. Treat it as a living outline that grows with the project, not something you write in the last five minutes.
     

5. Hardcoding Paths:

  • What it looks like: In your code, you see lines like df = pd.read_csv('C:/Users/YourName/Downloads/project/data.csv').
  • Why students make it: It works on your machine right now.
  • How to avoid it: Use relative paths. If your project is structured as we’ve described, a notebook in notebooks/ can read the raw data from ../data/raw/data.csv. Relative paths are resolved against the current working directory, so this works on any computer as long as the folder structure stays the same and the code is run from the same place. For more complex projects, you can use the pathlib library for robust path management.
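
As a minimal sketch of the pathlib approach, the helper below walks upward from a starting folder until it finds the project root, so paths no longer depend on where the code happens to be run. The name find_project_root and the choice of README.md as the marker file are assumptions for illustration.

```python
from pathlib import Path


def find_project_root(start=None, marker="README.md"):
    """Walk upward from `start` (default: current directory) to the folder containing `marker`."""
    current = (Path.cwd() if start is None else Path(start)).resolve()
    for folder in [current, *current.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No {marker} found in {current} or any parent folder")
```

From a notebook or a script anywhere inside the project, find_project_root() / "data" / "raw" / "data.csv" would then point at the same file, with no drive letters or user names in the code.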

5. FAQ Section

1. Do I need to create all these folders for every tiny homework assignment?
 

For a simple, one-script assignment, a full-blown structure is overkill. A single well-organized notebook might be fine. However, for any project with multiple steps, data files, or that you plan to use in your portfolio, this structure is a lifesaver. Use your judgment, but when in doubt, lean towards more organization.

2. What if my data is a single CSV file? Do I still need raw and processed?
 

Yes! The principle is still valid. Keep a copy of the original, untouched CSV in raw/. Then, in your notebook or script, load it, make your changes, and if you need to save the cleaned version, save it to processed/. This makes your analysis completely reproducible.

3. How do I handle very large datasets (e.g., 10GB)?
 

You shouldn’t store large datasets in your project folder, especially if you’re using Git. Instead, use a .gitignore file to exclude the data/ folder from version control. Then, create a script in your src/ folder (e.g., download_data.py) that downloads the data from a public URL or cloud storage and places it in the correct data/raw/ directory.
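
A download_data.py script along those lines might look like the sketch below. The function name download_raw and its parameters are hypothetical, and you would supply your own URL. It uses only the standard library and refuses to overwrite a file that is already in data/raw/.

```python
import shutil
import urllib.request
from pathlib import Path


def download_raw(url, dest_dir="data/raw", filename=None):
    """Download `url` into dest_dir unless the file is already there; return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Default to the last part of the URL; pass `filename` if the URL has a query string.
    target = dest_dir / (filename or url.rsplit("/", 1)[-1])
    if target.exists():
        return target  # raw data is never overwritten
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of loading into memory
    return target
```

Anyone who clones the repository can then run the script once to fetch the data, while the data/ folder itself stays out of version control.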

4. What is the difference between a .py file in src/ and a Jupyter Notebook in notebooks/?
 

Think of .py files as your project’s functions and tools (the kitchen), and the notebooks as your story and experiments (the dining room). You build your tools in the kitchen (.py files) and then use them in the dining room (notebooks) to create and present your final meal (analysis). You can reuse the same tools in many different notebooks.

5. How do I import my own functions from the src folder into a notebook?
 

If you have an __init__.py file in your src folder, and your notebook is in the notebooks folder, you need to add the project root to Python’s import path. At the top of your notebook, you can write:

 

Python

import sys
import os
sys.path.append(os.path.abspath('..')) # Add the parent directory to the path
from src import your_module

 

A more robust way for complex projects is to install your project as a package, but for student work, this path adjustment is a common and simple solution.

6. Should I include my virtual environment (like venv or .env) in this structure?
 

No, absolutely not. You should never commit your virtual environment folder to version control. It’s large, system-specific, and can be recreated from your requirements.txt file. Make sure to add venv/ and .env/ to your .gitignore file.

7. Where do I put my trained machine learning model files (like .pkl files)?
 

It’s common to save trained models so you don’t have to retrain them every time. A good practice is to create a models/ folder at the same level as src/ and reports/ to store these serialized model files.
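
Saving and reloading could be wrapped in two small helpers, as in this sketch using the standard library pickle module. The names save_model and load_model are our own, and the model can be any picklable Python object.

```python
import pickle
from pathlib import Path


def save_model(model, name, models_dir="models"):
    """Serialize `model` to <models_dir>/<name>.pkl and return the file path."""
    folder = Path(models_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


def load_model(name, models_dir="models"):
    """Load a model previously written by save_model."""
    with (Path(models_dir) / f"{name}.pkl").open("rb") as f:
        return pickle.load(f)
```

Only load .pkl files you created yourself or otherwise trust, because unpickling can execute arbitrary code.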

8. Is this structure only for data science?
 

While it’s tailored for data science with folders like data/ and notebooks/, the core principles (separating source code from outputs, having a README, tracking dependencies) are universal in software development. Learning this now will set you up for success in any programming role.

6. Conclusion

A chaotic project folder is a silent grade-killer. It creates stress, wastes time, and makes your hard work look unprofessional. By adopting this simple, logical structure—with dedicated folders for data, notebooks, src, and reports, along with a README.md and requirements.txt—you’re not just organizing files. You’re building a foundation for reproducibility, collaboration, and professional growth.

Start with your next assignment. Create the folders, write the README first, and commit to keeping your raw data sacred. 

The 15 minutes it takes to set up will save you hours of confusion later and will undoubtedly impress your professors and future employers. You’ve got the skills; now give them the structure they deserve to shine.
