Frosty-8 Poultry & Dairy Analytics

A comprehensive analytics solution for poultry health monitoring and dairy farm management using machine learning


Project Overview

This project provides a complete analytics pipeline for poultry health monitoring and dairy farm management. It includes:

The system analyzes poultry health parameters such as mortality rates, feed consumption, and environmental conditions, and predicts potential disease outbreaks with a reported test accuracy of 98.6%.

Project Goal and Problem Statement

Goal: To develop a predictive analytics solution to identify potential disease outbreaks in poultry flocks early, enabling proactive intervention and minimizing economic losses.

Problem: Disease outbreaks in poultry farms can lead to significant mortality rates, reduced productivity, increased veterinary costs, and substantial financial losses. Traditional methods of detecting outbreaks often rely on manual observation or post-facto analysis, which can delay intervention and exacerbate the impact. This project aims to provide an early warning system using machine learning to mitigate these risks.

Dataset Overview

Data Source: poultry_health_data.csv (simulated data)

Key Features (Input Data):

Target Variable (Output Prediction):

Methodology and Model Selection

Task Type: Binary Classification (predicting disease_outbreak).

Approach: Given the sequential nature of daily farm records, Recurrent Neural Networks (RNNs) are chosen to capture temporal dependencies and patterns over time.

Primary Model:

Alternative/Considered Models:

Key Steps in the Machine Learning Pipeline

  1. Data Loading and Preparation:
    • Reading the poultry_health_data.csv file.
    • Converting the date column to datetime objects and sorting the data by flock_id and date.
    • Feature Scaling: Normalizing numerical features (e.g., using MinMaxScaler) to a common range (0 to 1) for optimal neural network performance.
    • Sequence Generation: Transforming the tabular data into sequences. For each flock_id, fixed-length look-back windows (e.g., 7 days of features) are created as input (X), with the disease_outbreak status for a subsequent day (e.g., the 8th day) as the target (y).
    • Data Splitting: Dividing the sequences into training and testing sets (e.g., 80% training, 20% testing). Stratified splitting is used if the disease_outbreak class is imbalanced.
  2. Model Architecture (TensorFlow/Keras):
    • A Sequential Keras model is constructed.
    • Typically includes one or more LSTM layers to process the time-series data.
    • Dropout layers are added to prevent overfitting.
    • Dense (fully connected) layers follow the LSTM layers, leading to a final output layer.
    • The output layer uses a sigmoid activation function for binary classification (predicting the probability of an outbreak).
  3. Model Training:
    • The model is compiled with the adam optimizer and the binary_crossentropy loss function (suitable for binary classification).
    • Trained on the X_train and y_train data for a specified number of epochs with a defined batch_size.
    • Early Stopping: A callback is used to monitor val_loss and stop training if validation loss doesn't improve for a certain number of epochs (patience), restoring the best weights to prevent overfitting.
  4. Model Evaluation:
    • The trained model's performance is evaluated on the unseen X_test and y_test data.
    • Metrics: Beyond basic accuracy, critical metrics for imbalanced classification (like disease prediction) include:
      • Confusion Matrix: Provides counts of True Positives, True Negatives, False Positives, and False Negatives.
      • Precision: Of all predicted outbreaks, how many were correct.
      • Recall (Sensitivity): Of all actual outbreaks, how many were correctly detected (crucial for early warning).
      • F1-Score: Harmonic mean of precision and recall.
      • ROC AUC Score: Measures the model's ability to distinguish between outbreak and non-outbreak classes across various thresholds.
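
The evaluation metrics listed above can be illustrated with a small, framework-free sketch. The labels and scores below are hypothetical, not project results, and the helper names `binary_metrics` and `roc_auc` are introduced only for this example. In practice, scikit-learn's `sklearn.metrics` module provides equivalents (`confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`).

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced example: 8 healthy days, 2 outbreak days,
# with one of the two outbreaks missed by the classifier.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy case accuracy is 0.9 while recall is only 0.5, which is why recall and F1 matter for an early-warning system on an imbalanced target.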

Output and Predictions

The model outputs a probability (between 0 and 1) for a disease outbreak. This probability is then converted into a binary prediction (0 or 1) using a threshold (typically 0.5, but adjustable based on desired precision-recall trade-offs).
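
As a minimal sketch of the thresholding step described above, the helper below converts probabilities into classes. The function name `to_classes`, the use of `>=` at the boundary, and the probabilities are assumptions made for illustration only.

```python
def to_classes(probabilities, threshold=0.5):
    """Map outbreak probabilities to 0/1 predictions at a given threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical model outputs for four flock-days
probs = [0.01, 0.12, 0.42, 0.91]

default_classes = to_classes(probs)          # threshold 0.5
sensitive_classes = to_classes(probs, 0.3)   # lower threshold flags more days
```

Lowering the threshold flags the 0.42 case as an outbreak as well, trading some precision for higher recall.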

Key Visualizations for Insights

Potential Impact and Use Cases

Project Structure

frosty-8-poultry-dairy-analytics/
├── README.md                # Project documentation
├── codes.ipynb              # Jupyter notebook with model code
├── DairyFarm.pbix           # Power BI dashboard for dairy analytics
├── poultry_health_data.joblib # Serialized trained model
├── pyproject.toml           # Python project configuration
├── uv.lock                  # Dependency lock file
├── .python-version          # Python version
└── docs/                    # Documentation files
                

Key Features

Disease Prediction

Predicts poultry disease outbreaks with 98.6% accuracy using a Bidirectional LSTM model trained on historical farm data.

Real-time Monitoring

Analyzes key metrics such as daily mortality, feed consumption, water usage, and environmental conditions in real time.

Interactive Dashboards

Power BI dashboards provide visual insights into dairy farm operations and performance metrics.

Technical Implementation

Model Architecture

The poultry health prediction model uses a Bidirectional LSTM architecture with dropout layers for regularization:

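from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, Dense, Dropout, LSTM

# Input shape is (look_back timesteps, number of features)
model_bi = Sequential([
    Bidirectional(LSTM(units=50, return_sequences=True),
                  input_shape=(X_train.shape[1], X_train.shape[2])),
    Dropout(0.2),
    Bidirectional(LSTM(units=100)),
    Dropout(0.2),
    Dense(units=64, activation='relu'),
    Dropout(0.2),
    Dense(units=1, activation='sigmoid'),  # outbreak probability
])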
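
The early-stopping rule used during training can be illustrated without any framework. The sketch below is a simplified stand-in for the patience logic, not the project's training code. The function name `early_stopping_epoch` and the validation losses are hypothetical. In Keras, the corresponding callback is `EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True)`.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the weights from `best_epoch` would
    then be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran through every epoch without exhausting patience
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses per epoch
val_losses = [0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.39]

stop_short, best_short = early_stopping_epoch(val_losses, patience=3)
stop_long, best_long = early_stopping_epoch(val_losses, patience=5)
```

With a patience of 3 the run stops at epoch 5 and keeps the weights from epoch 2; a patience of 5 lets training reach the later improvement at epoch 6.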
                    

Data Preprocessing

The data is normalized and transformed into sequences for the LSTM model:

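import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Normalize features to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
df[features] = scaler.fit_transform(df[features])

# Create sequences: look_back days of features -> the following day's label
# (assumes df is already sorted by flock_id and date)
sequences = []
labels = []
look_back = 14

for flock_id in df['flock_id'].unique():
    flock_df = df[df['flock_id'] == flock_id].copy()

    # Skip flocks with too few rows to form a single window
    if len(flock_df) > look_back:
        for i in range(len(flock_df) - look_back):
            seq_x = flock_df[features].iloc[i:i + look_back].values
            seq_y = flock_df[target].iloc[i + look_back]
            sequences.append(seq_x)
            labels.append(seq_y)

# Stack into arrays: X has shape (samples, look_back, n_features)
X = np.array(sequences)
y = np.array(labels)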
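
To make the window and target alignment concrete, here is a toy version of the same sliding-window logic on plain lists. The helper name `make_windows` and the 10-day single-feature flock are hypothetical and chosen only so the indices are easy to follow.

```python
def make_windows(rows, targets, look_back):
    """Pair each look_back-day window with the label of the following day."""
    X, y = [], []
    for i in range(len(rows) - look_back):
        X.append(rows[i:i + look_back])   # days i .. i + look_back - 1
        y.append(targets[i + look_back])  # label of day i + look_back
    return X, y


# Hypothetical flock: 10 days, one feature per day (the day index itself),
# with an outbreak recorded only on the last day.
rows = [[day] for day in range(10)]
targets = [0] * 9 + [1]

X, y = make_windows(rows, targets, look_back=7)
```

A flock with n rows yields n - look_back sequences, so this 10-day flock produces three windows, and only the last one (days 2 to 8) is labeled with the day-9 outbreak.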
                    

Technology Stack

Python, TensorFlow, Keras, Pandas, NumPy, scikit-learn, Jupyter, Power BI, Joblib

Model Performance

The trained model achieves the following results on the test set:

Test Loss: 0.0741
Test Accuracy: 98.6111%
                

Sample Predictions

Actual: 0, Predicted Probability: 0.0096, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0115, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0100, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0089, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0090, Predicted Class: 0