The ml-2 folder contains scripts for machine learning tasks, focusing on exploratory data analysis and classification.
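Description: Performs exploratory data analysis (EDA) on the Iris dataset. It reports missing values, fills any numeric gaps with column means, draws a species-colored pairplot, and writes an HTML profiling report.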
Dependencies: pandas, seaborn, matplotlib, ydata_profiling
Code:
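import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from ydata_profiling import ProfileReport

# Load the Iris dataset (seaborn downloads it from the seaborn-data repository on first use)
data = sns.load_dataset('iris')

# Report the number of missing values in each column
print("Missing Values:\n", data.isnull().sum())

# Fill missing numeric values with each column's mean (Iris has none, so this is a safeguard)
numeric_cols = data.select_dtypes(include='number').columns
data[numeric_cols] = data[numeric_cols].fillna(data[numeric_cols].mean())

# Pairwise scatter plots of the numeric features, colored and marked by species
sns.pairplot(data, hue='species', markers=["o", "s", "D"])
plt.show()

# Build the profiling report and save it as an HTML file
profile = ProfileReport(data, title="Iris Dataset EDA Report", explorative=True)
profile.to_file("Iris_EDA_Report.html")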
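Because the Iris data contains no missing values, the mean-imputation step in the EDA script never changes anything. The sketch below applies the same `select_dtypes` / `fillna` pattern to a small made-up DataFrame (the column values are hypothetical) so the effect is visible. It needs only pandas and NumPy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: gaps in two numeric columns and one in a text column.
df = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 6.3, 5.8],
    "petal_width": [0.2, 1.3, np.nan, 1.9],
    "species": ["setosa", "versicolor", "virginica", None],
})

print("Missing Values:\n", df.isnull().sum())

# Same pattern as the EDA script: select only numeric columns, then replace each
# column's NaNs with that column's mean (NaNs are skipped when the mean is computed).
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# The non-numeric "species" column is not touched, so its gap remains.
print(df)
```

Note that only numeric columns are imputed; a missing categorical label such as `species` would still need its own handling.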
Description: Trains logistic regression and random forest classifiers for binary classification on the Iris dataset. Class 2 (virginica) is dropped, so the task is setosa vs. versicolor.
Dependencies: scikit-learn, matplotlib, seaborn
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load Iris and keep only classes 0 (setosa) and 1 (versicolor) for a binary task
iris = load_iris()
X = iris.data[iris.target != 2]
y = iris.target[iris.target != 2]

# Hold out 30% of the samples as a test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a logistic regression classifier with default settings
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

# Fit a random forest of 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
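The classification script imports `confusion_matrix`, seaborn and matplotlib but ends right after fitting the two models. A minimal sketch of one possible evaluation step follows, assuming the goal is to compare both models on the held-out split. The `results` dictionary and the use of `accuracy_score` are additions for illustration and are not part of the original script.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Same binary setup as the script: classes 0 (setosa) and 1 (versicolor) only.
iris = load_iris()
mask = iris.target != 2
X, y = iris.data[mask], iris.target[mask]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Hypothetical container: model name -> (accuracy, confusion matrix).
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Rows are true classes and columns are predicted classes, in label order [0, 1].
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    acc = accuracy_score(y_test, y_pred)
    results[name] = (acc, cm)
    print(f"{name}: accuracy = {acc:.3f}")
    print(cm)
```

To visualize a matrix with the libraries the script already imports, each `cm` could be passed to `sns.heatmap(cm, annot=True, fmt="d")` followed by `plt.show()`.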