Pra Folder

Overview

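The pra folder contains Python scripts for two practical machine learning tasks: decision tree classification on the Iris dataset and Naive Bayes text classification on the 20 Newsgroups dataset. It also holds Jupyter checkpoint files under .ipynb_checkpoints.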

File: 1st.py

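Description: Trains a decision tree classifier on the Iris dataset using a 70/30 train/test split, then plots the fitted tree with matplotlib. The test split is created but not evaluated.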

Dependencies: scikit-learn, matplotlib

Code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
plt.figure(figsize=(10, 8))
plot_tree(dt, feature_names=iris.feature_names, class_names=iris.target_names, filled=True)
plt.show()
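
The held-out 30% can be used to sanity-check the fitted tree. The following is a minimal sketch, assuming accuracy is the metric of interest; the accuracy_score call and the names y_pred and accuracy are illustrative additions and do not appear in 1st.py.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data split and model settings as 1st.py.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Assumed addition: score the tree on the held-out split.
y_pred = dt.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing random_state in both the split and the classifier keeps the reported score reproducible between runs.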

File: textClassificationModel.py

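Description: Trains a text classification pipeline (token counts, TF-IDF weighting, multinomial Naive Bayes) on the 20 Newsgroups training split and saves the fitted model to text_classifier.pkl. The test split is loaded but not used.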

Dependencies: scikit-learn, joblib

Code:

from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
import joblib

train = fetch_20newsgroups(subset='train', shuffle=True)
test = fetch_20newsgroups(subset='test', shuffle=True)

model = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model.fit(train.data, train.target)
joblib.dump(model, 'text_classifier.pkl')
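
A saved pipeline is only useful if it can be loaded back and queried. The sketch below shows that round trip under stated assumptions: a four-document toy corpus (hypothetical, standing in for 20 Newsgroups so the example runs without a download), a temporary directory for the pickle file, and made-up query strings. Only the pipeline definition and the joblib.dump call mirror textClassificationModel.py.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for 20 Newsgroups (0 = space, 1 = cars).
docs = [
    "the rocket reached orbit around the moon",
    "nasa launched a new satellite into orbit",
    "the engine and brakes of the car were replaced",
    "the dealer sold a used car with new tires",
]
labels = [0, 0, 1, 1]

# Same three-step pipeline as textClassificationModel.py.
model = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
model.fit(docs, labels)

# Assumed location: a temporary directory instead of the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "text_classifier.pkl")
    joblib.dump(model, model_path)
    loaded = joblib.load(model_path)

# The reloaded pipeline accepts raw strings, just like the original.
queries = ["a satellite in orbit", "new tires for the car"]
predictions = loaded.predict(queries)
print(predictions)
```

Because the vectorizer, the TF-IDF transformer, and the classifier are stored together in one Pipeline object, the loaded model can be applied to raw text without repeating any preprocessing by hand.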

File: .ipynb_checkpoints/2nd-checkpoint.py

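Description: Tunes the Naive Bayes smoothing parameter alpha (candidate values 0.1 and 1.0) for a 20 Newsgroups text classification pipeline, using grid search with 5-fold cross-validation.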

Dependencies: scikit-learn

Code:

from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

train = fetch_20newsgroups(subset='train', shuffle=True)
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
parameters = {'clf__alpha': [0.1, 1.0]}
grid_search = GridSearchCV(pipeline, parameters, cv=5)
grid_search.fit(train.data, train.target)
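
The script stops after fitting, so the outcome of the search is never reported. The sketch below shows one way to read the results back from a fitted GridSearchCV object. It makes two assumptions so that it runs quickly and offline: an eight-document toy corpus (hypothetical) replaces 20 Newsgroups, and cv=2 replaces cv=5 because the toy corpus has only four documents per class. The parameter grid and pipeline match the script.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for 20 Newsgroups (0 = space, 1 = cars).
docs = [
    "the rocket reached orbit around the moon",
    "nasa launched a new satellite into orbit",
    "the shuttle docked with the space station",
    "astronauts repaired the orbit telescope in space",
    "the engine and brakes of the car were replaced",
    "the dealer sold a used car with new tires",
    "the car engine needs an oil change",
    "new brakes and tires improve car safety",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
parameters = {"clf__alpha": [0.1, 1.0]}

# cv=2 is an assumption for the toy corpus; the script uses cv=5.
grid_search = GridSearchCV(pipeline, parameters, cv=2)
grid_search.fit(docs, labels)

# Best setting and its mean cross-validated accuracy.
print("Best parameters:", grid_search.best_params_)
print("Best CV score:", grid_search.best_score_)

# Mean score for every candidate in the grid.
results = grid_search.cv_results_
for alpha, score in zip(results["param_clf__alpha"], results["mean_test_score"]):
    print(f"alpha={alpha}: mean accuracy={score:.3f}")

# With the default refit=True, the best pipeline is retrained on all data.
best_model = grid_search.best_estimator_
```

The clf__alpha key uses the step name from the pipeline followed by a double underscore, which is how scikit-learn routes a parameter to a named step.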

File: .ipynb_checkpoints/Untitled-checkpoint.ipynb

Description: Empty Jupyter notebook checkpoint.

Dependencies: None

Code:

# Empty notebook