Frosty-8 Poultry & Dairy Analytics

Project Overview

This project provides a complete analytics pipeline for poultry health monitoring and dairy farm management. It includes:

A Bidirectional LSTM model for predicting disease outbreaks in poultry farms
Data preprocessing and feature engineering pipelines
Interactive Power BI dashboards for dairy farm analytics
Jupyter notebooks for model development and experimentation

The system analyzes poultry health parameters such as daily mortality, feed consumption, and environmental conditions, and predicts potential disease outbreaks; the trained model reaches 98.6% accuracy on a held-out test set.

Project Goal and Problem Statement

Goal: To develop a predictive analytics solution to identify potential disease outbreaks in poultry flocks early, enabling proactive intervention and minimizing economic losses.

Problem: Disease outbreaks in poultry farms can cause high mortality, reduced productivity, increased veterinary costs, and substantial financial losses. Traditional outbreak detection often relies on manual observation or post-facto analysis, which delays intervention and worsens the impact. This project aims to provide an early warning system that uses machine learning to mitigate these risks.

Dataset Overview

Data Source: poultry_health_data.csv (simulated data)

Key Features (Input Data):

date: The specific date of the observation.
flock_id: Unique identifier for each poultry flock.
day_of_flock_cycle: The current day in the flock's life cycle.
number_of_birds_start: Initial number of birds in the flock.
daily_mortality: Number of birds that died on a given day.
avg_weight_g: Average weight of birds in the flock (in grams).
feed_consumption_kg: Total feed consumed by the flock (in kilograms).
water_consumption_liters: Total water consumed by the flock (in liters).
shed_temperature_c: Average temperature in the poultry shed (in Celsius).
shed_humidity_percent: Average humidity in the poultry shed (in percentage).
ammonia_level_ppm: Ammonia levels in the shed (in parts per million).

Target Variable (Output Prediction):

disease_outbreak: A binary indicator (0 for no outbreak, 1 for an outbreak).
symptoms_observed: Details on symptoms if an outbreak occurred (informational only; not a prediction target).

Methodology and Model Selection

Task Type: Binary Classification (predicting disease_outbreak).

Approach: Given the sequential nature of daily farm records, Recurrent Neural Networks (RNNs) are chosen to capture temporal dependencies and patterns over time.

Primary Model:

Long Short-Term Memory (LSTM) Network: A type of RNN designed to capture long-term dependencies and mitigate the vanishing gradient problem that affects traditional RNNs. This allows the model to learn from trends and changes in various metrics over the days or weeks leading up to a potential outbreak.
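For reference, the standard LSTM cell update (the common formulation with a forget gate, which Keras' LSTM layer follows with its default sigmoid recurrent activation) is shown below. Here $x_t$ is the feature vector for day $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication. The additive update of $c_t$ is what lets gradient signal survive across many days with less decay than in a plain RNN.

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$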

Alternative/Considered Models:

Gradient Boosting Machines (XGBoost, LightGBM): Strong baselines for tabular data, capable of high accuracy and providing feature importance insights.
Random Forest: Another robust ensemble method for classification.
Logistic Regression: A simple, interpretable baseline model.

Key Steps in the Machine Learning Pipeline

Data Loading and Preparation:
- Reading the poultry_health_data.csv file.
- Converting date column to datetime objects and sorting data by flock_id and date.
- Feature Scaling: Normalizing numerical features (e.g., using MinMaxScaler) to a common range (0 to 1) for optimal neural network performance.
- Sequence Generation: Transforming the tabular data into sequences. For each flock_id, fixed-length look-back windows (e.g., 14 days of features) are created as input (X), with the disease_outbreak status of the following day (e.g., the 15th day) as the target (y).
- Data Splitting: Dividing the sequences into training and testing sets (e.g., 80% training, 20% testing). Stratified splitting is used if the disease_outbreak class is imbalanced.
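A minimal sketch of the loading, sorting, and stratified 80/20 split steps, assuming pandas and scikit-learn. The helper names `load_and_sort` and `split_sequences` are hypothetical, and the toy frame and random arrays stand in for `poultry_health_data.csv` and the real sequence windows; in practice the frame would come from `pd.read_csv("poultry_health_data.csv")`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def load_and_sort(df: pd.DataFrame) -> pd.DataFrame:
    """Parse `date` to datetime and order rows by flock, then chronologically."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    return out.sort_values(["flock_id", "date"]).reset_index(drop=True)


def split_sequences(X: np.ndarray, y: np.ndarray, test_size: float = 0.2, seed: int = 42):
    """80/20 split of sequence windows, stratified on the outbreak label."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


# Hypothetical toy rows (not project data), deliberately out of order
raw = pd.DataFrame({
    "date": ["2024-01-02", "2024-01-01", "2024-01-01"],
    "flock_id": ["B", "B", "A"],
    "daily_mortality": [3, 1, 2],
})
ordered = load_and_sort(raw)

# Hypothetical windows: 50 samples x 14 days x 10 features, 20% positive
rng = np.random.default_rng(0)
X = rng.random((50, 14, 10))
y = np.array([0] * 40 + [1] * 10)
X_train, X_test, y_train, y_test = split_sequences(X, y)
```

Stratification keeps the 20% positive rate in both sets (8 of 40 training windows, 2 of 10 test windows). Because neighbouring windows from the same flock overlap by all but one day, a random split can place near-identical windows on both sides; splitting by flock_id or by date is a stricter alternative.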
Model Architecture (TensorFlow/Keras):
- A Sequential Keras model is constructed.
- Typically includes one or more LSTM layers to process the time-series data.
- Dropout layers are added to prevent overfitting.
- Dense (fully connected) layers follow the LSTM layers, leading to a final output layer.
- The output layer uses a sigmoid activation function for binary classification (predicting the probability of an outbreak).
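The sigmoid output and the binary_crossentropy loss it is trained with (see Model Training) can be written out by hand. This pure-Python sketch is for illustration only: the function names are hypothetical, and the `eps` clipping is an assumption added to keep `log()` finite, not something taken from the project code.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_crossentropy(y_true, y_prob, eps: float = 1e-7) -> float:
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A confident correct prediction costs little; a confident miss costs a lot
low = binary_crossentropy([1], [0.95])
high = binary_crossentropy([1], [0.05])
```

The asymmetry between `low` and `high` is why the loss pushes the network toward well-calibrated probabilities rather than just the right side of 0.5.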
Model Training:
- The model is compiled with the Adam optimizer (adam) and the binary_crossentropy loss function, which suits binary classification.
- Trained on the X_train and y_train data for a specified number of epochs with a defined batch_size.
- Early Stopping: A callback is used to monitor val_loss and stop training if validation loss doesn't improve for a certain number of epochs (patience), restoring the best weights to prevent overfitting.
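The early-stopping rule can be traced on a plain list of per-epoch validation losses. This is a simplified, hypothetical re-statement of the idea behind Keras' `EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`, not the callback's actual implementation; the loss values and patience in the example are made up.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of val_loss values.

    Training halts once val_loss has failed to improve by more than `min_delta`
    for `patience` consecutive epochs; the weights from `best_epoch` are kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran out of epochs without stopping


# Hypothetical losses: a plateau, then a late improvement at epoch 6
losses = [0.52, 0.41, 0.35, 0.36, 0.37, 0.38, 0.30]
stop, best = early_stopping_epoch(losses, patience=3)
```

With `patience=3` training stops at epoch 5 and restores the epoch-2 weights, never seeing the later 0.30; with `patience=5` it reaches epoch 6 and keeps that better model. Larger patience trades training time for the chance to escape a plateau.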
Model Evaluation:
- The trained model's performance is evaluated on the unseen X_test and y_test data.
- Metrics: Beyond basic accuracy, critical metrics for imbalanced classification (like disease prediction) include:
  - Confusion Matrix: Provides counts of True Positives, True Negatives, False Positives, and False Negatives.
  - Precision: Of all predicted outbreaks, how many were correct.
  - Recall (Sensitivity): Of all actual outbreaks, how many were correctly detected (crucial for early warning).
  - F1-Score: Harmonic mean of precision and recall.
  - ROC AUC Score: Measures the model's ability to distinguish between outbreak and non-outbreak classes across various thresholds.
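The metrics listed above can be computed from scratch, which makes their definitions concrete. In practice scikit-learn's `confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` compute the same quantities; the function names below are hypothetical and the labels and scores are made-up toy values.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed outbreaks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
scores = [0.1, 0.6, 0.8, 0.4, 0.2, 0.9]
metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC loop is quadratic in the number of samples and is meant only to show the definition; `roc_auc_score` is the practical choice.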

Output and Predictions

The model outputs a probability (between 0 and 1) for a disease outbreak. This probability is then converted into a binary prediction (0 or 1) using a threshold (typically 0.5, but adjustable based on desired precision-recall trade-offs).
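A sketch of the thresholding step, and of one way to lower the threshold when catching outbreaks matters more than avoiding false alarms. `apply_threshold` and `highest_threshold_for_recall` are hypothetical helpers, and treating a probability exactly equal to the threshold as an outbreak (`>=`) is an assumption.

```python
def apply_threshold(probs, threshold=0.5):
    """Map outbreak probabilities to 0/1 predictions."""
    return [1 if p >= threshold else 0 for p in probs]


def recall(y_true, y_pred):
    """Share of actual outbreaks that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return tp / positives if positives else 0.0


def highest_threshold_for_recall(y_true, probs, min_recall):
    """Largest observed probability usable as a threshold that still reaches min_recall."""
    for t in sorted(set(probs), reverse=True):
        if recall(y_true, apply_threshold(probs, t)) >= min_recall:
            return t
    return None  # only possible when there are no positives


# Hypothetical labels and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.6, 0.8, 0.4, 0.2, 0.9]
default_preds = apply_threshold(probs)                 # threshold 0.5
t_full = highest_threshold_for_recall(y_true, probs, 1.0)
```

Here the default 0.5 threshold misses one outbreak (recall 2/3); dropping to 0.4 catches all three at the cost of an extra false alarm, which is the precision-recall trade-off described above.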

Key Visualizations for Insights

Confusion Matrix Heatmap: To clearly show the number of correct predictions, false alarms, and missed outbreaks.
ROC Curve: To assess the model's discriminative power.
Precision-Recall Curve: Essential for understanding the trade-off between false alarms and missed detections, particularly for rare outbreak events.
Accuracy/Loss Over Epochs: To monitor training progress and diagnose overfitting/underfitting.
Predicted Outbreak Probability Trend per Flock: A line chart visualizing the model's estimated risk of an outbreak for each flock over time, allowing managers to identify rising risk.
Actual vs. Predicted Outbreak Events Timeline: A scatter plot or timeline showing when actual outbreaks occurred versus when the model predicted them, to assess the timeliness and correctness of alerts.
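The timeline view boils down to comparing, per flock, the first day the model raised an alert with the first day an outbreak was recorded. The sketch below computes that lead time; the function name, the row layout `(flock_id, day, actual, probability)`, and the 0.5 alert threshold are assumptions for illustration.

```python
def alert_lead_times(rows, threshold=0.5):
    """Days between the first alert and the first actual outbreak, per flock.

    rows: iterable of (flock_id, day, actual_outbreak, predicted_probability).
    Returns {flock_id: lead_days} for flocks that had an outbreak: a positive
    lead means the alert came first, a negative one means it came late, and
    None means the outbreak was never flagged.
    """
    first_alert, first_outbreak = {}, {}
    for flock_id, day, actual, prob in sorted(rows, key=lambda r: (r[0], r[1])):
        if prob >= threshold and flock_id not in first_alert:
            first_alert[flock_id] = day
        if actual == 1 and flock_id not in first_outbreak:
            first_outbreak[flock_id] = day
    return {
        flock_id: (outbreak_day - first_alert[flock_id]) if flock_id in first_alert else None
        for flock_id, outbreak_day in first_outbreak.items()
    }


# Hypothetical rows: F1 is flagged a day early, F2 is missed, F3 stays healthy
rows = [
    ("F1", 1, 0, 0.1), ("F1", 2, 0, 0.3), ("F1", 3, 0, 0.6), ("F1", 4, 1, 0.9),
    ("F2", 1, 0, 0.2), ("F2", 2, 1, 0.3),
    ("F3", 1, 0, 0.1), ("F3", 2, 0, 0.2),
]
leads = alert_lead_times(rows)
```

Flocks without an outbreak are left out of the result, so false alarms need to be counted separately (for example from the confusion matrix).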

Potential Impact and Use Cases

Early Warning System: Provides proactive alerts for potential disease outbreaks, enabling timely intervention.
Reduced Economic Losses: By preventing or mitigating the spread of disease, the system can reduce mortality rates and associated financial impact.
Optimized Resource Allocation: Farm managers can prioritize attention and resources (e.g., veterinary visits, medication) to high-risk flocks.
Data-Driven Decision Making: Moves from reactive responses to predictive, evidence-based farm management.
Improved Animal Welfare: Healthier flocks due to early detection and intervention.
Enhanced Farm Efficiency: Streamlines monitoring and decision processes related to flock health.

Project Structure

frosty-8-poultry-dairy-analytics/
├── README.md                # Project documentation
├── codes.ipynb              # Jupyter notebook with model code
├── DairyFarm.pbix           # Power BI dashboard for dairy analytics
├── poultry_health_data.joblib # Serialized trained model
├── pyproject.toml           # Python project configuration
├── uv.lock                  # Dependency lock file
├── .python-version          # Python version
└── docs/                    # Documentation files

Key Features

Disease Prediction

Predicts poultry disease outbreaks using a Bidirectional LSTM model trained on historical farm data, reaching 98.6% accuracy on a held-out test set.

Real-time Monitoring

Analyzes key metrics such as daily mortality, feed consumption, water usage, and environmental conditions in real time.

Interactive Dashboards

Power BI dashboards provide visual insights into dairy farm operations and performance metrics.

Technical Implementation

Model Architecture

The poultry health prediction model uses a Bidirectional LSTM architecture with dropout layers for regularization:

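from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dropout, Dense

model_bi = Sequential([
    # Each sample: (look_back timesteps, number of features)
    Input(shape=(X_train.shape[1], X_train.shape[2])),
    Bidirectional(LSTM(units=50, return_sequences=True)),  # pass the full sequence to the next LSTM
    Dropout(0.2),
    Bidirectional(LSTM(units=100)),  # keep only the final state
    Dropout(0.2),
    Dense(units=64, activation='relu'),
    Dropout(0.2),
    Dense(units=1, activation='sigmoid'),  # probability of disease_outbreak
])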
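To make the layer sizes concrete, the trainable-parameter count of `model_bi` can be worked out by hand. Each Keras LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, giving 4·(units·(input_dim + units) + units) weights; `Bidirectional` (default `merge_mode="concat"`) doubles that and doubles the output width. The feature count `n_features=9` below is a hypothetical value, not taken from the project.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable weights of one Keras LSTM layer (kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weights plus biases of a Dense layer."""
    return input_dim * units + units


def model_bi_params(n_features: int) -> dict:
    """Per-layer counts for the architecture above (Dropout layers have no weights)."""
    counts = {
        "bilstm_50": 2 * lstm_params(n_features, 50),  # forward + backward copies
        "bilstm_100": 2 * lstm_params(2 * 50, 100),    # input: 100-wide concat of layer 1
        "dense_64": dense_params(2 * 100, 64),         # input: 200-wide concat of layer 2
        "dense_1": dense_params(64, 1),
    }
    counts["total"] = sum(counts.values())
    return counts


counts = model_bi_params(n_features=9)  # hypothetical feature count
```

Most of the capacity sits in the second Bidirectional layer; `model_bi.summary()` reports the actual counts for the real feature count.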

Data Preprocessing

The data is normalized and transformed into sequences for the LSTM model:

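from sklearn.preprocessing import MinMaxScaler

# Assumes `df` is sorted by flock_id and date, `features` lists the numeric
# input columns, and `target` is 'disease_outbreak'

# Normalize features to the [0, 1] range
scaler = MinMaxScaler(feature_range=(0, 1))
df[features] = scaler.fit_transform(df[features])

# Create sequences: each sample is `look_back` consecutive days of one flock,
# labelled with the outbreak status of the following day
sequences = []
labels = []
look_back = 14

for flock_id in df['flock_id'].unique():
    flock_df = df[df['flock_id'] == flock_id].copy()

    if len(flock_df) > look_back:
        for i in range(len(flock_df) - look_back):
            seq_x = flock_df[features].iloc[i:i + look_back].values
            seq_y = flock_df[target].iloc[i + look_back]
            sequences.append(seq_x)
            labels.append(seq_y)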
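The loop above builds each window with `iloc`, which gets slow as the number of flocks and days grows. A vectorised alternative, sketched here as an assumption rather than the project's code, applies NumPy's `sliding_window_view` (NumPy 1.20+) to one flock's rows and yields the same windows and labels; `make_windows` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(flock_features: np.ndarray, flock_target: np.ndarray, look_back: int = 14):
    """Return (X, y) for one flock: X[i] = rows i..i+look_back-1, y[i] = target at row i+look_back."""
    n_rows, n_features = flock_features.shape
    if n_rows <= look_back:  # same guard as `len(flock_df) > look_back`
        return np.empty((0, look_back, n_features)), np.empty((0,), dtype=flock_target.dtype)
    # Shape (n_rows - look_back + 1, n_features, look_back): the window axis is appended last
    windows = sliding_window_view(flock_features, look_back, axis=0)
    # Drop the final window (no following day to label) and put time before features
    X = windows[:-1].transpose(0, 2, 1)
    y = flock_target[look_back:]
    return np.ascontiguousarray(X), y  # copy the read-only view into a normal array


# Hypothetical 20-day flock with 2 features and alternating labels
feats = np.arange(40, dtype=float).reshape(20, 2)
tgt = np.arange(20) % 2
X, y = make_windows(feats, tgt, look_back=14)
```

Applied per flock (e.g. to `flock_df[features].values` and `flock_df[target].values`), the results can be stacked with `np.concatenate` before the train/test split.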

Technology Stack

Python, TensorFlow, Keras, Pandas, NumPy, scikit-learn, Jupyter, Power BI, Joblib

Model Performance

On the held-out test set, the trained model reports:

Test Loss: 0.0741
Test Accuracy: 98.6111%
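Because accuracy rewards predicting the majority class, a useful sanity check is the accuracy of a model that always predicts "no outbreak". The sketch below computes that baseline; the helper name and the labels are hypothetical, and the project's actual test labels would need to be substituted.

```python
def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common class."""
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)


# Hypothetical test labels with 10% outbreak days
labels = [0] * 18 + [1] * 2
baseline = majority_baseline_accuracy(labels)
```

With 10% outbreak days, a constant "no outbreak" model already scores 90%, so the reported accuracy is most informative next to this baseline and the precision, recall, and ROC AUC figures.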

Sample Predictions

Actual: 0, Predicted Probability: 0.0096, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0115, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0100, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0089, Predicted Class: 0
Actual: 0, Predicted Probability: 0.0090, Predicted Class: 0