Ideation - Srijan, Ronit, Soham, William,
Part 1: Predicting Titanic Survivability
Pull Dataset from Kaggle: Download the Titanic dataset from Kaggle or use the Kaggle API. Load the dataset into your Python environment using pandas. Check for missing values and outliers, and understand the dataset’s structure.
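A minimal loading-and-inspection sketch, assuming the training split was saved as `train.csv` (the filename is an assumption):

```python
import pandas as pd

# Assumes the Titanic training split was downloaded from Kaggle as train.csv.
df = pd.read_csv("train.csv")

# Understand the dataset's structure.
print(df.shape)
print(df.dtypes)

# Check for missing values per column.
print(df.isnull().sum())

# Quick outlier scan via summary statistics of the numeric columns.
print(df.describe())
```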
Clean Data: Preprocess the data by handling missing values (imputation or removal), encoding categorical variables, and scaling numerical features if necessary.
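One possible cleaning pass, continuing from the sketch above; the specific choices (median `Age`, mode `Embarked`, dropping the mostly-empty `Cabin` column) are illustrative, not final:

```python
from sklearn.preprocessing import StandardScaler

# Impute missing values: median for numeric Age, mode for categorical Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Cabin is mostly missing, so drop it rather than impute.
df = df.drop(columns=["Cabin"])

# Encode categorical variables.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], drop_first=True)

# Scale numerical features (optional for logistic regression, but harmless).
scaler = StandardScaler()
df[["Age", "Fare"]] = scaler.fit_transform(df[["Age", "Fare"]])
```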
Create Model using Logistic Regression: Import the logistic regression model from scikit-learn. Define the features (X) and target variable (y) based on the dataset. Split the data into training and testing sets.
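Roughly, with the feature list below being one reasonable starting point rather than a settled choice:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Features (X) and target (y); Embarked_Q/Embarked_S come from the
# one-hot encoding in the cleaning step above.
features = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare",
            "Embarked_Q", "Embarked_S"]
X = df[features]
y = df["Survived"]

# Hold out 20% of the rows for testing; stratify to keep class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
```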
Train + Optimize Model: Train the logistic regression model on the training set and optimize it by tuning hyperparameters using techniques like grid search or randomized search. Use cross-validation to evaluate the model’s performance.
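A grid-search sketch over the regularization strength `C` and penalty type; the grid values are placeholders:

```python
from sklearn.model_selection import GridSearchCV

# Small illustrative grid over regularization strength and penalty type.
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],  # liblinear supports both l1 and l2
}

# 5-fold cross-validation on the training set only.
search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
best_model = search.best_estimator_
```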
Run Model: Once the model is trained and optimized, evaluate its performance on the test set using metrics like accuracy, precision, recall, or F1-score. Use the model to make predictions on new data if needed.
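Evaluation on the held-out test set might look like:

```python
from sklearn.metrics import accuracy_score, classification_report

# Evaluate only on the held-out test set.
y_pred = best_model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

# Precision, recall, and F1 per class in one report.
print(classification_report(y_test, y_pred))
```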
Frontend + Backend: Accept user data from the frontend, send it to the backend where the model is applied to predict survival, and then send the output back to the frontend.
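A minimal backend sketch using Flask; the framework, the `/predict` route, and the saved-model filename are all assumptions, not decisions from the repo:

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumes the trained model was saved earlier with
# joblib.dump(best_model, "titanic_model.joblib").
model = joblib.load("titanic_model.joblib")
FEATURES = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare",
            "Embarked_Q", "Embarked_S"]

@app.route("/predict", methods=["POST"])
def predict():
    # The frontend POSTs a JSON object with one value per feature.
    # Note: inputs must be preprocessed (encoded/scaled) the same way
    # as the training data before reaching this point.
    payload = request.get_json()
    row = pd.DataFrame([payload])[FEATURES]
    prediction = int(model.predict(row)[0])
    return jsonify({"survived": prediction})

if __name__ == "__main__":
    app.run(port=5000)
```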
Part 2: Predicting Strokes
Pull Dataset from Kaggle: Use Kaggle’s API or download the dataset manually, then load it into your Python environment using pandas. Check for missing values and outliers, and understand the dataset’s structure.
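Using the Kaggle API from Python might look like the sketch below; the CSV filename inside the downloaded archive is an assumption:

```python
import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate with the Kaggle API (reads the token from ~/.kaggle/kaggle.json).
api = KaggleApi()
api.authenticate()
api.dataset_download_files(
    "fedesoriano/stroke-prediction-dataset", path=".", unzip=True
)

# The CSV name inside the archive is assumed here; adjust if it differs.
stroke_df = pd.read_csv("healthcare-dataset-stroke-data.csv")
print(stroke_df.isnull().sum())   # missing values
print(stroke_df.describe())       # structure and outlier scan
```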
Current Dataset: [Kaggle Link](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset/data)
Clean Data: Preprocess the data by handling missing values (imputation or removal), encoding categorical variables, and scaling numerical features if necessary. This step ensures the data is ready for modeling.
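A stroke-specific cleaning pass, continuing from the loading sketch above. The column names are taken from the dataset’s Kaggle page; `bmi` is the column with missing values there, and median imputation is just one option:

```python
from sklearn.preprocessing import StandardScaler

# bmi has missing values in this dataset; impute with the median.
stroke_df["bmi"] = stroke_df["bmi"].fillna(stroke_df["bmi"].median())

# Drop the row identifier; it carries no signal.
stroke_df = stroke_df.drop(columns=["id"])

# One-hot encode the categorical columns.
categorical = ["gender", "ever_married", "work_type",
               "Residence_type", "smoking_status"]
stroke_df = pd.get_dummies(stroke_df, columns=categorical, drop_first=True)

# Scale the continuous features.
scaler = StandardScaler()
numeric = ["age", "avg_glucose_level", "bmi"]
stroke_df[numeric] = scaler.fit_transform(stroke_df[numeric])
```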
Create Model using Logistic Regression: Import the logistic regression model from scikit-learn, define the features (X) and target variable (y), split the data into training and testing sets, then fit the model on the training data.
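Same pattern as Part 1, with one dataset-specific note: positive stroke cases are rare in this data, so `class_weight="balanced"` (an illustrative choice, not a final decision) helps keep the model from defaulting to “no stroke” for everyone:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = stroke_df.drop(columns=["stroke"])
y = stroke_df["stroke"]

# Stratify so the rare positive class appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# class_weight="balanced" counteracts the heavy class imbalance.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)
```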
Train + Optimize Model: Train the logistic regression model on the training set and optimize it by tuning hyperparameters using techniques like grid search or randomized search. Use cross-validation to evaluate the model’s performance.
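A randomized-search sketch as the alternative to grid search; the search space and the `scoring="f1"` choice are placeholders:

```python
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

# Randomized search over regularization strength instead of a fixed grid.
param_dist = {"C": loguniform(1e-3, 1e2)}
search = RandomizedSearchCV(
    model, param_dist, n_iter=20, cv=5, scoring="f1", random_state=42
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Best CV F1:", search.best_score_)
best_model = search.best_estimator_
```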
Run Model: Once the model is trained and optimized, evaluate its performance on the test set using metrics like accuracy, precision, recall, or F1-score. Use the model to make predictions on new data if needed.
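Because the classes are heavily imbalanced here, accuracy alone can look deceptively good; a sketch that also checks recall, F1, and ROC AUC:

```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score)

y_pred = best_model.predict(X_test)

# With so few positives, inspect the confusion matrix and per-class
# recall/F1, not just overall accuracy.
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1]))
```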
Frontend + Backend: We plan to accept user data from the frontend, send it to the backend where the trained model is applied to predict stroke risk, and then return the output to the frontend.
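The backend would mirror the Flask sketch from Part 1 (returning something like `{"stroke": 0}`); the sketch below shows the two pieces around it: persisting the trained model with joblib, and the client-side round trip. The filename and URL are assumptions:

```python
import joblib
import requests

# Persist the fitted model so the backend can load it once at startup
# (the filename is an assumption).
joblib.dump(best_model, "stroke_model.joblib")

# Client-side view of the round trip, assuming a backend like the Part 1
# Flask sketch is running locally. Cast values to plain floats so the
# payload is JSON-serializable.
payload = {name: float(value) for name, value in X_test.iloc[0].items()}
resp = requests.post("http://localhost:5000/predict", json=payload)
print(resp.json())
```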
Current Coding Repo: [GitHub Link](https://github.com/SrijDude3416/LLM-Detection)