The pra folder contains scripts for practical machine learning and text classification tasks.
Description: Decision tree classifier for the Iris dataset, with a plot of the fitted tree.
Dependencies: scikit-learn, matplotlib
Code:
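from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the samples for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit an unpruned decision tree and report accuracy on the held-out split
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
print(f"Test accuracy: {dt.score(X_test, y_test):.3f}")

# Draw the tree; filled=True colors each node by its majority class
plt.figure(figsize=(10, 8))
plot_tree(dt, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()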
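When no display is available for plt.show(), the same fitted tree can be inspected as text. The sketch below is one way to do that, assuming scikit-learn's export_text and accuracy_score; the max_depth=3 limit is an illustrative choice that is not part of the original script, added only to keep the printed rules short.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# max_depth=3 is an illustrative assumption (not in the original) to keep the rules readable
dt = DecisionTreeClassifier(max_depth=3, random_state=42)
dt.fit(X_train, y_train)

# Accuracy on the 30% held-out split
test_accuracy = accuracy_score(y_test, dt.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")

# Plain-text rendering of the learned if/else split rules
rules = export_text(dt, feature_names=list(iris.feature_names))
print(rules)
```

Each line of the printed rules is one split threshold or leaf, indented by depth, which is often easier to diff or log than the matplotlib figure.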
Description: Text classification pipeline using Naive Bayes on the 20 Newsgroups dataset; the fitted model is saved with joblib.
Dependencies: scikit-learn, joblib
Code:
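from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
import joblib

# Download (on first use) and load the train/test splits
train = fetch_20newsgroups(subset='train', shuffle=True)
test = fetch_20newsgroups(subset='test', shuffle=True)

# Raw text -> token counts -> TF-IDF weights -> multinomial Naive Bayes
model = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])
model.fit(train.data, train.target)

# Evaluate on the held-out test split
print(f"Test accuracy: {model.score(test.data, test.target):.3f}")

# Persist the whole fitted pipeline for later reuse
joblib.dump(model, 'text_classifier.pkl')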
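Because the saved file holds the entire pipeline, a reloaded model accepts raw strings directly. The sketch below illustrates the save/load round trip offline; the four-document corpus, its 0/1 labels, and the temporary file location are hypothetical stand-ins for 20 Newsgroups and text_classifier.pkl.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for 20 Newsgroups so the sketch runs offline
docs = [
    "the goalie made a great save in the hockey game",
    "the team scored a late goal to win the match",
    "the rocket launch was delayed by the space agency",
    "astronauts docked with the orbiting space station",
]
labels = [0, 0, 1, 1]  # hypothetical labels: 0 = sport, 1 = space

model = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
model.fit(docs, labels)

# Save and reload the fitted pipeline (temporary path used instead of text_classifier.pkl)
path = os.path.join(tempfile.mkdtemp(), "text_classifier.pkl")
joblib.dump(model, path)
loaded = joblib.load(path)

# The reloaded pipeline tokenizes and weights new raw text itself
new_docs = ["a space probe reached orbit", "the hockey team won the game"]
predictions = loaded.predict(new_docs)
print(predictions)
```

Pickled scikit-learn models are generally only safe to load with the same library version that saved them, and joblib.load should only be used on trusted files.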
Description: Naive Bayes text classification on 20 Newsgroups with grid-search hyperparameter tuning.
Dependencies: scikit-learn
Code:
from sklearn.datasets import fetch_20newsgroups
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

train = fetch_20newsgroups(subset='train', shuffle=True)

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

# Keys follow the <step name>__<parameter> convention; only the NB smoothing alpha is tuned
parameters = {'clf__alpha': [0.1, 1.0]}

# Score each candidate with 5-fold cross-validation on the training split
grid_search = GridSearchCV(pipeline, parameters, cv=5)
grid_search.fit(train.data, train.target)

print("Best parameters:", grid_search.best_params_)
print(f"Best mean CV accuracy: {grid_search.best_score_:.3f}")
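The same step-name__parameter convention reaches into the vectorizer as well as the classifier. The sketch below widens the search offline; the eight-document corpus is hypothetical, TfidfVectorizer is swapped in as scikit-learn's single-step equivalent of CountVectorizer followed by TfidfTransformer, and the ngram_range and use_idf options are illustrative additions not present in the original script.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (two classes, four documents each) so the sketch runs offline
docs = [
    "the goalie made a great save", "the team scored a late goal",
    "fans cheered the winning goal", "the coach praised the defence",
    "the rocket launch was delayed", "astronauts docked with the station",
    "the probe sent images of mars", "engineers tested the new rocket engine",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# TfidfVectorizer = CountVectorizer followed by TfidfTransformer, in one step
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# Illustrative grid: vectorizer options are addressed through the "tfidf" step name
parameters = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [0.1, 1.0],
}

# cv=2 only because the toy corpus is tiny; the original uses cv=5 on the full dataset
grid_search = GridSearchCV(pipeline, parameters, cv=2)
grid_search.fit(docs, labels)

print("Best parameters:", grid_search.best_params_)
print(f"Best mean CV accuracy: {grid_search.best_score_:.3f}")

# With the default refit=True, best_estimator_ is retrained on all data and predicts directly
print(grid_search.best_estimator_.predict(["a new rocket reached orbit"]))
```

The grid has 2 x 2 x 2 = 8 candidates, each fitted once per fold, so the cost grows multiplicatively with every added parameter; on the full 20 Newsgroups data that is worth keeping in mind before enlarging the grid.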
Description: Empty Jupyter notebook checkpoint.
Dependencies: None
Code:
# Empty notebook