Titanic Survival Prediction using Logistic Regression
Team Members
Problem Statement
The objective of this project is to predict whether a passenger survived the Titanic disaster using machine learning techniques. This is a binary classification problem where the output is either 1 (survived) or 0 (did not survive).
Dataset Description
We used the Titanic dataset from Kaggle, which contains information about passengers such as ticket class, sex, age, fare, and port of embarkation.
The dataset is divided into a labeled training set (train.csv) and an unlabeled test set (test.csv).
Data Preprocessing
Before training the model, we cleaned and prepared the data:
Removed unnecessary columns:
Handled missing values:
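The two steps above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the toy rows are made up, and both the choice of dropped columns and the imputation strategy (median for Age, most frequent value for Embarked) are assumptions based on the standard Kaggle column names.

```python
import pandas as pd

# Tiny stand-in for Kaggle's train.csv; the values are made up for illustration.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Pclass": [3, 1, 3, 2],
    "Name": ["A", "B", "C", "D"],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, None, 26.0, 40.0],
    "Ticket": ["T1", "T2", "T3", "T4"],
    "Fare": [7.25, 71.28, 7.92, 13.0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

# Assumption: identifier-like and mostly-empty columns are the ones dropped.
df = df.drop(columns=["PassengerId", "Name", "Ticket", "Cabin"])

# Assumption: median for numeric Age, most frequent value for Embarked.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

print(df.isna().sum())
```

Median imputation is a common default for Age because it is less sensitive to outliers than the mean.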
Feature Engineering
We created new features to improve model performance:
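As an illustration of what such features can look like, the sketch below derives two features that are commonly built from the Kaggle columns SibSp and Parch. The names FamilySize and IsAlone are hypothetical here and may differ from the features this project actually created.

```python
import pandas as pd

# Toy rows using the Kaggle column names SibSp (siblings/spouses aboard)
# and Parch (parents/children aboard).
df = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})

# Hypothetical feature: total family members aboard, counting the passenger.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

# Hypothetical feature: 1 if the passenger travelled alone, else 0.
df["IsAlone"] = (df["FamilySize"] == 1).astype(int)

print(df)
```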
Data Encoding
Categorical variables like gender and embarked location were converted into numerical values using:
This is required because most machine learning models, including Logistic Regression, operate only on numerical inputs.
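One possible way to encode these two columns with pandas is shown below. The specific encoders are an assumption: a 0/1 mapping for Sex and one-hot encoding for Embarked. The project may have used a different method, such as scikit-learn's LabelEncoder.

```python
import pandas as pd

df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# Binary category: map directly to 0/1.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})

# Unordered multi-class category: one-hot encode so that no artificial
# ordering between ports is implied.
df = pd.get_dummies(df, columns=["Embarked"], prefix="Embarked", dtype=int)

print(df)
```

One-hot encoding avoids suggesting that one port is "greater" than another, which an integer code such as S=0, C=1, Q=2 would imply to a linear model.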
Train-Test Split
The dataset was divided into a training set and a test set.
This helps evaluate how well the model performs on unseen data.
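A minimal sketch of such a split with scikit-learn follows. The 80/20 ratio, the fixed random seed, and the stratification are assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared feature matrix and survival labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.array([0, 1] * 50)

# Assumption: hold out 20% for testing; stratify keeps the class ratio
# the same in both parts, and random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```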
Feature Scaling
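We applied StandardScaler to standardize each feature to zero mean and unit variance, so that all features are on a similar scale. This helps Logistic Regression converge and keeps large-valued features such as fare from dominating the regularization.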
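The sketch below shows the usual pattern with StandardScaler: fit on the training data only, then reuse the learned statistics on the test data so that no test information leaks into training. The numbers are made up and the two columns stand in for features such as age and fare.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values standing in for two numeric features.
X_train = np.array([[22.0, 7.25], [38.0, 71.28], [26.0, 7.92], [35.0, 53.10]])
X_test = np.array([[30.0, 8.05]])

scaler = StandardScaler()

# Learn mean and standard deviation from the training set only.
X_train_scaled = scaler.fit_transform(X_train)

# Apply the same training statistics to the test set.
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```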
Model Used
We used Logistic Regression, a linear model that is well suited to binary classification problems such as this one.
Reasons:
Model Training & Validation
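A minimal sketch of training and validating a Logistic Regression model with scikit-learn is given here. It assumes 5-fold cross-validation on the training set and uses synthetic data from make_classification, so neither the validation scheme nor the scores reflect this project's actual run.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared Titanic training data.
X_train, y_train = make_classification(
    n_samples=200, n_features=6, random_state=0
)

# Putting the scaler in a pipeline means it is re-fit inside each
# cross-validation fold, so validation folds never influence scaling.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Assumption: 5-fold cross-validation scored by accuracy.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print("Mean CV accuracy:", cv_scores.mean())

# Fit the final model on the full training set.
model.fit(X_train, y_train)
```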
Model Evaluation
We evaluated the model using:
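As an illustration, the snippet below computes metrics that are commonly used for binary classifiers: accuracy, a confusion matrix, and a per-class precision/recall report. Which metrics this project reported is an assumption, and the labels and predictions are made up.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Made-up true labels and predictions (1 = survived, 0 = did not survive).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]

# Fraction of predictions that match the true labels.
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

print("Accuracy:", acc)
print(cm)
print(classification_report(y_true, y_pred))
```

The confusion matrix is worth reading alongside accuracy, since it shows whether errors fall mostly on survivors or on non-survivors.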
Results
This indicates the model performs well and generalizes properly.
GitHub Collaboration
Conclusion
This project demonstrates how machine learning can be used to solve real-world classification problems. By applying preprocessing, feature engineering, and Logistic Regression, we successfully built a model that predicts Titanic passenger survival with good accuracy.
Future Improvements