Technical Overview

PreppyData is a platform designed to make data preprocessing easily customizable. It adopts a modular architecture to provide flexibility and transparency.

Architecture

The application consists of the following key components:

  1. Preprocessing Engine
    • A backend system that performs data transformations based on the options selected by the user.
    • Handles preprocessing tasks such as data encoding, outlier detection, and missing value handling.

  2. Data Management Module
    • Manages data input and temporary storage while ensuring data security.
    • Supports data formats like CSV and XLSX.

Core Components

  1. Preprocessing Engine
    • Data Encoding Module
      • Offers encoding techniques such as One-Hot Encoding and Label Encoding.
      • Converts categorical data into numerical formats suitable for analysis.

    • Outlier Detection Module
      • Provides methods like Z-Score, Interquartile Range (IQR), and Local Outlier Factor (LOF).
      • Identifies and processes outliers in the dataset.

    • Missing Value Handling Module
      • Offers strategies like Mean Imputation, Median Imputation, and Deletion.
      • Fills in or removes missing entries so that the resulting dataset is complete.
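
The three modules above can be illustrated with a minimal sketch built on pandas and scikit-learn. The function names (encode, outlier_mask, handle_missing), their parameters, and the default thresholds are assumptions made for this example; they are not taken from the PreppyData codebase, and the actual engine may be organized differently.

```python
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import LabelEncoder


def encode(df, column, method="one_hot"):
    """Return a copy of df with one categorical column encoded (hypothetical helper)."""
    if method == "one_hot":
        # One new 0/1 column per category; the original column is dropped.
        return pd.get_dummies(df, columns=[column], dtype=int)
    if method == "label":
        out = df.copy()
        # Categories are mapped to integers in sorted order.
        out[column] = LabelEncoder().fit_transform(out[column])
        return out
    raise ValueError(f"unknown encoding method: {method}")


def outlier_mask(values, method="iqr", threshold=3.0):
    """Return a boolean Series that is True where a value is flagged (hypothetical helper).

    Expects a numeric Series without missing values.
    """
    if method == "z_score":
        z = (values - values.mean()) / values.std(ddof=0)
        return z.abs() > threshold
    if method == "iqr":
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        # Conventional 1.5 * IQR fences around the middle 50% of the data.
        return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    if method == "lof":
        lof = LocalOutlierFactor(n_neighbors=min(20, len(values) - 1))
        labels = lof.fit_predict(values.to_numpy().reshape(-1, 1))
        return pd.Series(labels == -1, index=values.index)  # -1 marks outliers
    raise ValueError(f"unknown outlier method: {method}")


def handle_missing(df, column, strategy="mean"):
    """Return a copy of df with missing values in one column handled (hypothetical helper)."""
    if strategy == "mean":
        return df.assign(**{column: df[column].fillna(df[column].mean())})
    if strategy == "median":
        return df.assign(**{column: df[column].fillna(df[column].median())})
    if strategy == "delete":
        return df.dropna(subset=[column])
    raise ValueError(f"unknown missing-value strategy: {strategy}")
```

Each helper returns a new object instead of modifying its input, so the options a user selects can be applied in any order and the uploaded data stays intact for comparison.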

  2. Data Management Module
    • File Handling System
      • Supports uploading data in CSV and XLSX formats.
      • Keeps uploaded data only for the duration of the session; nothing is stored permanently.

    • Security Measures
      • Processes data locally to protect user privacy.
      • Prevents unauthorized access or transmission of data.
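
One way to satisfy the session-only handling described above is to parse uploads directly from memory, so the file never touches the disk. The sketch below assumes the web layer hands over the file name and raw bytes; the load_upload function is hypothetical, and reading XLSX with pandas additionally requires the openpyxl package.

```python
import io
from pathlib import Path

import pandas as pd


def load_upload(filename, raw_bytes):
    """Parse an uploaded file into a DataFrame without writing it to disk.

    Hypothetical helper: the real File Handling System may differ.
    """
    buffer = io.BytesIO(raw_bytes)  # in-memory file object
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix == ".xlsx":
        return pd.read_excel(buffer)  # needs openpyxl installed
    raise ValueError(f"unsupported file type: {suffix or filename}")
```

Because the DataFrame lives only in process memory, it disappears when the session ends, which matches the "not stored permanently" behavior described for the module.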

Underlying Technologies

  • Programming Language
    • Developed using Python for the backend.

  • Frameworks and Libraries
    • Pandas, NumPy: Used for efficient data processing and numerical computations.
    • Scikit-learn: Utilized for implementing preprocessing algorithms.
    • Flask or Django: Employed as the web framework for building the application.

  • Data Visualization
    • Matplotlib, Seaborn: Used as visualization tools for data analysis and evaluation.

Workflow Summary

  1. Data Upload - Users upload CSV or XLSX files.

  2. Selection of Preprocessing Options - Users choose the desired preprocessing methods.

  3. Data Processing - The Preprocessing Engine transforms the data based on selected options.

  4. Result Download - Users download the processed data in their preferred format.
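
The four workflow steps can be sketched end to end as a single function. This is an illustration only: the run_workflow name and the shape of the options dictionary are assumptions, the example handles CSV input and output, and it covers just two of the available preprocessing options.

```python
import io

import pandas as pd


def run_workflow(raw_csv, options):
    """Upload -> select options -> process -> download, all in memory (hypothetical sketch)."""
    # 1. Data Upload: parse the uploaded bytes.
    df = pd.read_csv(io.BytesIO(raw_csv))

    # 2-3. Selection and Processing: apply only what the user selected.
    for column, strategy in options.get("missing", {}).items():
        if strategy == "delete":
            df = df.dropna(subset=[column])
        elif strategy in ("mean", "median"):
            fill = df[column].mean() if strategy == "mean" else df[column].median()
            df[column] = df[column].fillna(fill)
        else:
            raise ValueError(f"unknown missing-value strategy: {strategy}")

    one_hot_columns = options.get("one_hot", [])
    if one_hot_columns:
        df = pd.get_dummies(df, columns=one_hot_columns, dtype=int)

    # 4. Result Download: serialize the processed data for the response.
    return df.to_csv(index=False).encode("utf-8")
```

A call such as run_workflow(data, {"missing": {"price": "median"}, "one_hot": ["color"]}) would fill gaps in a price column and one-hot encode a color column before returning the processed CSV as bytes.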