24 Commits (e9738886ac6a680511d30b72c89c74f0998b0179)

Author SHA1 Message Date
johnlockejrr f298643fcf
Fix `ReduceLROnPlateau` wrong logic
# Training Script Improvements

## Learning Rate Management Fixes

### 1. ReduceLROnPlateau Implementation
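- Fixed the learning rate reduction mechanism by replacing the manual epoch loop with a single `model.fit()` call
- Keras callbacks reset their state (best monitored value, patience counter) in `on_train_begin`, so calling `model.fit()` once per epoch discarded that state every epoch; a single call tracks validation metrics across all epochs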
- Configured with:
  ```python
  from tensorflow.keras.callbacks import ReduceLROnPlateau

  reduce_lr = ReduceLROnPlateau(
      monitor='val_loss',
      factor=0.2,        # Each reduction: new_lr = lr * 0.2
      patience=3,        # Epochs without improvement before reducing
      min_lr=1e-6,       # Lower bound for the learning rate
      min_delta=1e-5,    # Minimum change to count as an improvement
      verbose=1
  )
  ```
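
To make the reasoning behind this fix concrete, here is a minimal pure-Python sketch of the plateau rule with the settings above (mode `min`, no cooldown). `PlateauReducer` and `simulate` are hypothetical names; the class re-implements the rule for illustration and is not the Keras callback. Passing `reset_each_epoch=True` mimics a manual loop that calls `model.fit(epochs=1)` repeatedly, because Keras resets the callback's best value and patience counter in `on_train_begin`.

```python
import math


class PlateauReducer:
    """Simplified ReduceLROnPlateau rule for mode='min', cooldown=0 (illustration only)."""

    def __init__(self, lr, factor=0.2, patience=3, min_lr=1e-6, min_delta=1e-5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.min_delta = min_delta
        self.reset()

    def reset(self):
        # Keras does this in on_train_begin, i.e. once per model.fit() call.
        self.best = math.inf
        self.wait = 0

    def on_epoch_end(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: must beat the best value by more than min_delta.
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience and self.lr > self.min_lr:
                # Reduce, but never below min_lr, then start counting again.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


def simulate(val_losses, lr, reset_each_epoch):
    """Return the learning rate after each epoch."""
    reducer = PlateauReducer(lr)
    lrs = []
    for loss in val_losses:
        if reset_each_epoch:
            reducer.reset()  # what a manual loop of fit(epochs=1) effectively does
        lrs.append(reducer.on_epoch_end(loss))
    return lrs


losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
print(simulate(losses, 2e-5, reset_each_epoch=False))
print(simulate(losses, 2e-5, reset_each_epoch=True))
```

With persistent state the rate drops to about 4e-6 at the fifth epoch and to the 1e-6 floor at the eighth; with a reset every epoch, the first loss always beats `best = inf`, so the rate never changes.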

### 2. Warmup Implementation
- Added learning rate warmup using TensorFlow's native scheduling
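- Linearly increases the learning rate from 1e-6 to the target (2e-5) over 5 epochs
- Helps stabilize the initial training phase
- Implemented with a `PolynomialDecay` schedule:
  ```python
  import tensorflow as tf

  # warmup_start_lr, learning_rate, warmup_epochs and steps_per_epoch are defined elsewhere in train.py
  lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
      initial_learning_rate=warmup_start_lr,
      decay_steps=warmup_epochs * steps_per_epoch,
      end_learning_rate=learning_rate,
      power=1.0  # Linear ramp; it rises because end_learning_rate > initial_learning_rate
  )
  ```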
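
The schedule can be checked without TensorFlow. The function below is a pure-Python transcription of the `PolynomialDecay` formula given in the TensorFlow documentation (with `cycle=False`); `steps_per_epoch = 100` is a hypothetical value. Because `initial_learning_rate` is smaller than `end_learning_rate`, the "decay" runs upward, and `power=1.0` makes it a straight line.

```python
def polynomial_decay(step, initial_learning_rate, decay_steps,
                     end_learning_rate, power=1.0):
    """Pure-Python PolynomialDecay formula (cycle=False)."""
    step = min(step, decay_steps)  # held at end_learning_rate after decay_steps
    return ((initial_learning_rate - end_learning_rate)
            * (1 - step / decay_steps) ** power) + end_learning_rate


warmup_start_lr = 1e-6
learning_rate = 2e-5
warmup_epochs = 5
steps_per_epoch = 100  # hypothetical: depends on dataset size and batch size

decay_steps = warmup_epochs * steps_per_epoch
for step in (0, 250, 500, 1000):
    print(step, polynomial_decay(step, warmup_start_lr, decay_steps, learning_rate))
```

Under these assumptions the rate starts at 1e-6, is halfway (1.05e-5) at step 250, reaches 2e-5 at step 500 and stays there.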

### 3. Early Stopping
- Added early stopping to prevent overfitting
- Configured with:
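  ```python
  from tensorflow.keras.callbacks import EarlyStopping

  early_stopping = EarlyStopping(
      monitor='val_loss',
      patience=6,                 # Epochs without improvement before stopping
      restore_best_weights=True,  # Roll back to the best epoch's weights on stop
      verbose=1
  )
  ```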
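
A simplified pure-Python version of the stopping rule (mode `min`, `min_delta=0`, no baseline) shows how `patience=6` counts epochs. `early_stopping_epoch` is a hypothetical helper, not the Keras implementation.

```python
import math


def early_stopping_epoch(val_losses, patience=6, min_delta=0.0):
    """Simplified EarlyStopping rule for mode='min' (illustration only).

    Returns (stopped_epoch, best_epoch); stopped_epoch is None when
    training would run through every epoch.
    """
    best, best_epoch, wait = math.inf, None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # New best; with restore_best_weights=True Keras keeps a copy of the weights here.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.74, 0.73, 0.76, 0.9]
print(early_stopping_epoch(losses))
```

Here validation loss bottoms out at epoch 2 (0-based), training stops six epochs later at epoch 8, and `restore_best_weights=True` would roll the model back to the epoch-2 weights.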

## Model Saving Improvements

### 1. Epoch-based Model Saving
- Implemented custom `ModelCheckpointWithConfig` to save both model and config
- Saves the model after each epoch together with its corresponding `config.json`
- Maintains compatibility with original script's saving behavior
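
The commit does not show `ModelCheckpointWithConfig` itself. A minimal sketch of such a callback might look like the following; the folder naming (`model_1`, `model_2`, ...), the constructor arguments, and saving with TF 2.x `model.save()` to a directory (SavedModel format) are all assumptions. The fallback `Callback` class only exists so the sketch runs without TensorFlow installed.

```python
import json
import os

try:
    from tensorflow.keras.callbacks import Callback
except ImportError:
    class Callback:  # minimal stand-in so the sketch runs without TensorFlow
        def set_model(self, model):
            self.model = model


class ModelCheckpointWithConfig(Callback):
    """Hypothetical sketch: save the model plus config.json after every epoch."""

    def __init__(self, dir_output, config):
        super().__init__()
        self.dir_output = dir_output
        self.config = config

    def on_epoch_end(self, epoch, logs=None):
        # Keras epochs are 0-based; name folders model_1, model_2, ...
        model_dir = os.path.join(self.dir_output, f"model_{epoch + 1}")
        self.model.save(model_dir)  # assumed: creates the directory (TF 2.x SavedModel)
        with open(os.path.join(model_dir, "config.json"), "w") as fp:
            json.dump(self.config, fp)
```

It would be passed to `model.fit()` alongside the other callbacks, e.g. `callbacks=[reduce_lr, early_stopping, ModelCheckpointWithConfig(dir_output, config)]`.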

### 2. Best Model Saving
- Saves the best model at training end
- If early stopping triggers: saves the best model from training
- If no early stopping: saves the final model
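
One way to express this end-of-training step is sketched below. It relies on two Keras behaviors: `EarlyStopping.stopped_epoch` is greater than 0 only when the callback halted training, and with `restore_best_weights=True` the best weights are restored at that point, so a single save covers both cases. `save_final_model` and the folder names `model_best` / `model_final` are hypothetical.

```python
import os


def save_final_model(model, early_stopping, dir_output):
    """Hypothetical end-of-training save (illustration only)."""
    triggered = early_stopping is not None and early_stopping.stopped_epoch > 0
    # If triggered, the model already holds the restored best weights.
    name = "model_best" if triggered else "model_final"  # hypothetical names
    path = os.path.join(dir_output, name)
    model.save(path)
    return path


# Usage after training, e.g.:
#   model.fit(..., callbacks=[reduce_lr, early_stopping])
#   save_final_model(model, early_stopping, dir_output)
```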

## Configuration
All parameters are configurable through the JSON config file:
```json
{
    "reduce_lr_enabled": true,
    "reduce_lr_monitor": "val_loss",
    "reduce_lr_factor": 0.2,
    "reduce_lr_patience": 3,
    "reduce_lr_min_lr": 1e-6,
    "reduce_lr_min_delta": 1e-5,
    "early_stopping_enabled": true,
    "early_stopping_monitor": "val_loss",
    "early_stopping_patience": 6,
    "early_stopping_restore_best_weights": true,
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```
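
A hedged sketch of how these keys could be turned into callback arguments after loading the file with `json.load`. `callback_kwargs` is a hypothetical helper, and its fallback defaults (a callback stays off unless enabled; other values copied from the config above) are assumptions, not the script's actual defaults.

```python
import json


def callback_kwargs(config):
    """Map the JSON keys to ReduceLROnPlateau / EarlyStopping keyword arguments.

    Returns None for a callback that is disabled.
    """
    reduce_lr = None
    if config.get("reduce_lr_enabled", False):
        reduce_lr = dict(
            monitor=config.get("reduce_lr_monitor", "val_loss"),
            factor=config.get("reduce_lr_factor", 0.2),
            patience=config.get("reduce_lr_patience", 3),
            min_lr=config.get("reduce_lr_min_lr", 1e-6),
            min_delta=config.get("reduce_lr_min_delta", 1e-5),
            verbose=1,
        )
    early_stopping = None
    if config.get("early_stopping_enabled", False):
        early_stopping = dict(
            monitor=config.get("early_stopping_monitor", "val_loss"),
            patience=config.get("early_stopping_patience", 6),
            restore_best_weights=config.get("early_stopping_restore_best_weights", True),
            verbose=1,
        )
    return reduce_lr, early_stopping


config = json.loads('{"reduce_lr_enabled": true, "reduce_lr_factor": 0.2, "early_stopping_enabled": false}')
reduce_lr_kwargs, early_stopping_kwargs = callback_kwargs(config)
print(reduce_lr_kwargs, early_stopping_kwargs)
# In a training script these could be passed on as, e.g.:
#   callbacks = [ReduceLROnPlateau(**reduce_lr_kwargs)]
```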

## Benefits
1. More stable training with proper learning rate management
2. Better handling of training plateaus
3. Automatic saving of best model
4. Maintained compatibility with existing config saving
5. Improved training monitoring and control
7 days ago
johnlockejrr 7661080899
LR Warmup and Optimization Implementation
# Learning Rate Warmup and Optimization Implementation

## Overview
Added learning rate warmup to improve training stability, especially when starting from pretrained weights. The implementation uses TensorFlow's built-in learning rate schedules, which the optimizer evaluates at every training step, so no extra per-epoch callback is needed.

## Changes Made

### 1. Configuration Updates (`runs/train_no_patches_448x448.json`)
Added new configuration parameters for warmup:
```json
{
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```

### 2. Training Script Updates (`train.py`)

#### A. Optimizer and Learning Rate Schedule
- Replaced fixed learning rate with dynamic scheduling
- Implemented warmup using `tf.keras.optimizers.schedules.PolynomialDecay`
- Maintained compatibility with existing ReduceLROnPlateau and EarlyStopping

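```python
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

if warmup_enabled:
    # Ramp from warmup_start_lr to learning_rate over the warmup steps;
    # PolynomialDecay holds end_learning_rate once decay_steps is reached
    lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=warmup_start_lr,
        decay_steps=warmup_epochs * steps_per_epoch,
        end_learning_rate=learning_rate,
        power=1.0  # Linear; an increase here because end > initial
    )
    optimizer = Adam(learning_rate=lr_schedule)
else:
    optimizer = Adam(learning_rate=learning_rate)
```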

#### B. Learning Rate Behavior
- Initial learning rate: 1e-6 (configurable via `warmup_start_lr`)
- Target learning rate: 5e-5 (configurable via `learning_rate`)
- Linear increase over 5 epochs (configurable via `warmup_epochs`)
- After warmup, the learning rate stays at the target value until ReduceLROnPlateau triggers
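
The same behavior at epoch granularity, as a pure-Python sketch: with `power=1.0` the schedule is plain linear interpolation between `warmup_start_lr` and `learning_rate`, and `steps_per_epoch` is the number of batches per epoch. The dataset size (1000 samples) and batch size (8) are hypothetical values, and `lr_at_epoch_start` is a hypothetical helper.

```python
import math


def lr_at_epoch_start(epoch, warmup_start_lr=1e-6, learning_rate=5e-5,
                      warmup_epochs=5, num_samples=1000, batch_size=8):
    """Learning rate at the first step of a 0-based epoch under linear warmup."""
    steps_per_epoch = math.ceil(num_samples / batch_size)  # hypothetical data sizes
    decay_steps = warmup_epochs * steps_per_epoch
    step = min(epoch * steps_per_epoch, decay_steps)  # flat after warmup
    return warmup_start_lr + (learning_rate - warmup_start_lr) * step / decay_steps


for epoch in range(7):
    print(epoch, lr_at_epoch_start(epoch))
```

Under these assumptions the rate at the start of epochs 0 to 5 is 1e-6, 1.08e-5, 2.06e-5, 3.04e-5, 4.02e-5 and 5e-5, after which it stays at 5e-5.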

## Benefits
1. Improved training stability during initial epochs
2. Better handling of pretrained weights
3. Efficient implementation using TensorFlow's native scheduling
4. Configurable through JSON configuration file
5. Maintains compatibility with existing callbacks (ReduceLROnPlateau, EarlyStopping)

## Usage
To enable warmup:
1. Set `warmup_enabled: true` in the configuration file
2. Adjust `warmup_epochs` and `warmup_start_lr` as needed
3. The warmup will automatically integrate with existing learning rate reduction and early stopping

To disable warmup:
- Set `warmup_enabled: false` or remove the warmup parameters from the configuration file
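
Removing the warmup keys only disables warmup if a missing `warmup_enabled` is treated as false. A minimal sketch of that lookup, assuming the config is a plain dict; `learning_rate_setup` is a hypothetical helper, and the fallback values 5 and 1e-6 mirror the example config rather than documented defaults.

```python
def learning_rate_setup(config, steps_per_epoch):
    """Choose between a linear warmup schedule and a constant learning rate."""
    learning_rate = config["learning_rate"]
    if not config.get("warmup_enabled", False):  # missing key -> warmup off
        return {"type": "constant", "learning_rate": learning_rate}
    return {
        "type": "polynomial_decay",
        "initial_learning_rate": config.get("warmup_start_lr", 1e-6),
        "decay_steps": config.get("warmup_epochs", 5) * steps_per_epoch,
        "end_learning_rate": learning_rate,
        "power": 1.0,
    }


setup = learning_rate_setup({"learning_rate": 5e-5, "warmup_enabled": True}, steps_per_epoch=100)
print(setup)
# With TensorFlow, the schedule could then be built as, e.g.:
#   tf.keras.optimizers.schedules.PolynomialDecay(
#       **{k: v for k, v in setup.items() if k != "type"})
```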
1 week ago
johnlockejrr 451188c3b9
Changed deprecated `lr` to `learning_rate` and `model.fit_generator` to `model.fit` 7 months ago
vahidrezanezhad c502e67c14 adding foreground rgb to augmentation 9 months ago
vahidrezanezhad f31219b1c9 scaling, channels shuffling, rgb background and red content added to no patch augmentation 9 months ago
vahidrezanezhad 95bbdf8040 updating augmentations 9 months ago
vahidrezanezhad 743f2e97d6 Transformer+CNN structure is added to vision transformer type 12 months ago
vahidrezanezhad f1fd74c7eb transformer patch size is dynamic now. 12 months ago
vahidrezanezhad 2aa216e388 binarization as a separate task of segmentation 12 months ago
vahidrezanezhad 41a0e15e79 updating train.py nontransformer backend 12 months ago
vahidrezanezhad 815e5a1d35 updating train.py 12 months ago
vahidrezanezhad 4e4490d740 machine based reading order training is integrated 1 year ago
vahidrezanezhad a7e1f255f3
Update train.py
Avoid ensembling if no model weights meet the F1-score threshold in the case of classification
1 year ago
vahidrezanezhad 8d1050ec30 inference script is added 1 year ago
vahidrezanezhad 38db3e9289 adding enhancement training 1 year ago
vahidrezanezhad dbb84507ed integrating first working classification training model 1 year ago
vahidrezanezhad d27647a0f1 first working update of branch 1 year ago
cneud 02b1436f39 code formatting with black; typos 1 year ago
cneud 5f84938839 update parameter config docs (fix #11) 1 year ago
vahidrezanezhad 522f00ab99 adjusting to tf2 1 year ago
vahid 4bea9fd535 continue training, losses and etc 4 years ago
vahid 5fb7552dbe first updates, padding, rotations 4 years ago
vahidrezanezhad bb212daf0b
Update main.py 6 years ago
Gerber, Mike 4897fd3dd7 📝 howto: Be more verbose with the subtree pull 6 years ago