Commit graph

24 commits

Author SHA1 Message Date
johnlockejrr
f298643fcf
Fix ReduceLROnPlateau wrong logic
# Training Script Improvements

## Learning Rate Management Fixes

### 1. ReduceLROnPlateau Implementation
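- Fixed the learning rate reduction mechanism by replacing the manual epoch loop with a single `model.fit()` call
- Keras callbacks keep their state (best value, patience counter) only within one `fit()` call, so a single call lets `ReduceLROnPlateau` track `val_loss` across all epochs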
- Configured with:
  ```python
  reduce_lr = ReduceLROnPlateau(
      monitor='val_loss',
      factor=0.2,        # More aggressive reduction
      patience=3,        # Quick response to plateaus
      min_lr=1e-6,       # Minimum learning rate
      min_delta=1e-5,    # Minimum change to be considered improvement
      verbose=1
  )
  ```
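To make the effect of these settings concrete, here is a minimal pure-Python sketch of plateau-based reduction. It is a simplified model of the behaviour (no cooldown), not the Keras implementation; the function name `simulate_reduce_lr` and the sample loss values are hypothetical.

```python
def simulate_reduce_lr(val_losses, lr, factor=0.2, patience=3,
                       min_lr=1e-6, min_delta=1e-5):
    """Return the learning rate in effect after each epoch.

    Simplified model of plateau-based reduction (no cooldown):
    an epoch counts as an improvement only if val_loss drops by
    more than min_delta below the best value seen so far.
    """
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the rate, but never below the floor.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation losses: two improvements, then a plateau.
losses = [0.50, 0.40, 0.40, 0.40, 0.40, 0.39]
rates = simulate_reduce_lr(losses, lr=5e-5)
```

In this sketch the rate is cut only after `patience` consecutive epochs without an improvement larger than `min_delta`, which is why the counter has to survive from one epoch to the next.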

### 2. Warmup Implementation
- Added learning rate warmup using TensorFlow's native scheduling
- Gradually increases learning rate from 1e-6 to target (2e-5) over 5 epochs
- Helps stabilize initial training phase
- Implemented using `PolynomialDecay` schedule:
  ```python
  lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
      initial_learning_rate=warmup_start_lr,
      decay_steps=warmup_epochs * steps_per_epoch,
      end_learning_rate=learning_rate,
      power=1.0  # Linear ramp: end > initial, so the rate increases
  )
  ```

### 3. Early Stopping
- Added early stopping to prevent overfitting
- Configured with:
  ```python
  early_stopping = EarlyStopping(
      monitor='val_loss',
      patience=6,
      restore_best_weights=True,
      verbose=1
  )
  ```
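The stopping rule can be illustrated with a short pure-Python sketch. It is a simplified model (any strict decrease counts as an improvement), not the Keras code; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=6):
    """Return (stop_epoch, best_epoch), both zero-based.

    stop_epoch is None if training ran through every epoch.
    Simplified model: any strict decrease in val_loss counts
    as an improvement (min_delta = 0).
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical run: best loss at epoch 2, then no further improvement.
losses = [0.9, 0.7, 0.6] + [0.65] * 10
stop_epoch, best_epoch = early_stopping_epoch(losses)
```

With `restore_best_weights=True`, the weights kept after stopping are those from `best_epoch`, not from `stop_epoch`. Because the early-stopping patience (6) is larger than the reduction patience (3), a plateau should typically trigger a learning-rate reduction before training is stopped.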

## Model Saving Improvements

### 1. Epoch-based Model Saving
- Implemented custom `ModelCheckpointWithConfig` to save both model and config
- Saves after each epoch with corresponding config.json
- Maintains compatibility with original script's saving behavior
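
The config-saving half of this behaviour can be sketched in plain Python. This is not the code of `ModelCheckpointWithConfig`; the helper `save_epoch_config` and the `model_<epoch>` directory name are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_epoch_config(output_dir, epoch, config):
    """Write config.json into a per-epoch directory and return its path.

    The directory name 'model_<epoch>' is an assumption of this
    sketch; the model itself would be saved into the same directory
    by the training callback.
    """
    epoch_dir = os.path.join(output_dir, f"model_{epoch}")
    os.makedirs(epoch_dir, exist_ok=True)
    config_path = os.path.join(epoch_dir, "config.json")
    with open(config_path, "w") as fp:
        json.dump(config, fp, indent=4)
    return config_path


# Usage: write a config for epoch 0 into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = save_epoch_config(tmp, 0, {"warmup_enabled": True})
    with open(path) as fp:
        loaded = json.load(fp)
```

Keeping the config next to each saved model means any epoch's checkpoint can be loaded later together with the settings that produced it.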

### 2. Best Model Saving
- Saves one additional model when training ends
- If early stopping triggers: saves the best model from training (weights restored via `restore_best_weights`)
- If early stopping does not trigger: saves the model from the final epoch

## Configuration
All parameters are configurable through the JSON config file:
```json
{
    "reduce_lr_enabled": true,
    "reduce_lr_monitor": "val_loss",
    "reduce_lr_factor": 0.2,
    "reduce_lr_patience": 3,
    "reduce_lr_min_lr": 1e-6,
    "reduce_lr_min_delta": 1e-5,
    "early_stopping_enabled": true,
    "early_stopping_monitor": "val_loss",
    "early_stopping_patience": 6,
    "early_stopping_restore_best_weights": true,
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```
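
A minimal sketch of how such a file could be read, assuming that every feature is off unless the config enables it. The `DEFAULTS` table and `load_training_options` are hypothetical and not taken from `train.py`.

```python
import json

# Assumed defaults for this sketch: every feature is off unless the
# config file turns it on.
DEFAULTS = {
    "reduce_lr_enabled": False,
    "early_stopping_enabled": False,
    "warmup_enabled": False,
}


def load_training_options(config_text):
    """Parse a JSON config string and fill in missing feature flags."""
    options = dict(DEFAULTS)
    options.update(json.loads(config_text))
    return options


# A config that only sets the warmup keys; the other flags fall back to defaults.
example = '{"warmup_enabled": true, "warmup_epochs": 5, "warmup_start_lr": 1e-6}'
options = load_training_options(example)
```

Note that JSON accepts exponent notation, so `1e-6` is parsed directly as a floating-point number.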

## Benefits
1. More stable training with proper learning rate management
2. Better handling of training plateaus
3. Automatic saving of best model
4. Maintained compatibility with existing config saving
5. Improved training monitoring and control
2025-05-17 23:24:40 +03:00
johnlockejrr
7661080899
LR Warmup and Optimization Implementation
# Learning Rate Warmup and Optimization Implementation

## Overview
Added learning rate warmup functionality to improve training stability, especially when using pretrained weights. The implementation uses TensorFlow's native learning rate scheduling for better performance.

## Changes Made

### 1. Configuration Updates (`runs/train_no_patches_448x448.json`)
Added new configuration parameters for warmup:
```json
{
    "warmup_enabled": true,
    "warmup_epochs": 5,
    "warmup_start_lr": 1e-6
}
```

### 2. Training Script Updates (`train.py`)

#### A. Optimizer and Learning Rate Schedule
- Replaced fixed learning rate with dynamic scheduling
- Implemented warmup using `tf.keras.optimizers.schedules.PolynomialDecay`
- Maintained compatibility with existing ReduceLROnPlateau and EarlyStopping

```python
if warmup_enabled:
    lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=warmup_start_lr,
        decay_steps=warmup_epochs * steps_per_epoch,
        end_learning_rate=learning_rate,
        power=1.0  # Linear ramp: end > initial, so the rate increases
    )
    optimizer = Adam(learning_rate=lr_schedule)
else:
    optimizer = Adam(learning_rate=learning_rate)
```

#### B. Learning Rate Behavior
- Initial learning rate: 1e-6 (configurable via `warmup_start_lr`)
- Target learning rate: 5e-5 (configurable via `learning_rate`)
- Linear increase over 5 epochs (configurable via `warmup_epochs`)
- After warmup, learning rate remains at target value until ReduceLROnPlateau triggers
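
The ramp described above can be reproduced without TensorFlow. The sketch below assumes the schedule behaves like a polynomial schedule with `power=1.0` and no cycling; `warmup_lr` and the `steps_per_epoch` value are hypothetical.

```python
def warmup_lr(step, start_lr, target_lr, warmup_steps):
    """Learning rate at a given optimizer step under linear warmup.

    Mirrors a polynomial schedule with power=1.0 and no cycling:
    the step is clamped to warmup_steps, so the rate stays at
    target_lr once warmup is over.
    """
    step = min(step, warmup_steps)
    fraction = step / warmup_steps
    return start_lr + (target_lr - start_lr) * fraction


steps_per_epoch = 100  # hypothetical; depends on dataset size and batch size
warmup_steps = 5 * steps_per_epoch

# Print the rate at the start of each of the first seven epochs.
for epoch in range(7):
    lr = warmup_lr(epoch * steps_per_epoch, 1e-6, 5e-5, warmup_steps)
    print(f"epoch {epoch}: lr = {lr:.2e}")
```

The schedule is evaluated per optimizer step rather than per epoch, which is why `decay_steps` is expressed as `warmup_epochs * steps_per_epoch`.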

## Benefits
1. Improved training stability during initial epochs
2. Better handling of pretrained weights
3. Efficient implementation using TensorFlow's native scheduling
4. Configurable through JSON configuration file
5. Maintains compatibility with existing callbacks (ReduceLROnPlateau, EarlyStopping)

## Usage
To enable warmup:
1. Set `warmup_enabled: true` in the configuration file
2. Adjust `warmup_epochs` and `warmup_start_lr` as needed
3. The warmup will automatically integrate with existing learning rate reduction and early stopping

To disable warmup:
- Set `warmup_enabled: false` or remove the warmup parameters from the configuration file
2025-05-17 16:17:38 +03:00
johnlockejrr
451188c3b9
Changed deprecated lr to learning_rate and model.fit_generator to model.fit 2024-10-19 13:25:50 -07:00
vahidrezanezhad
c502e67c14 adding foreground rgb to augmentation 2024-08-28 02:09:27 +02:00
vahidrezanezhad
f31219b1c9 scaling, channels shuffling, rgb background and red content added to no patch augmentation 2024-08-21 19:33:23 +02:00
vahidrezanezhad
95bbdf8040 updating augmentations 2024-08-21 16:17:59 +02:00
vahidrezanezhad
743f2e97d6 Transformer+CNN structure is added to vision transformer type 2024-06-12 17:39:57 +02:00
vahidrezanezhad
f1fd74c7eb transformer patch size is dynamic now. 2024-06-12 13:26:27 +02:00
vahidrezanezhad
2aa216e388 binarization as a separate task of segmentation 2024-06-11 17:48:30 +02:00
vahidrezanezhad
41a0e15e79 updating train.py nontransformer backend 2024-06-10 22:15:30 +02:00
vahidrezanezhad
815e5a1d35 updating train.py 2024-06-07 16:24:31 +02:00
vahidrezanezhad
4e4490d740 machine based reading order training is integrated 2024-05-24 16:39:48 +02:00
vahidrezanezhad
a7e1f255f3
Update train.py
avoid ensembling if no model weights met the threshold f1 score in the case of classification
2024-05-08 14:47:16 +02:00
vahidrezanezhad
8d1050ec30 inference script is added 2024-05-07 13:34:03 +02:00
vahidrezanezhad
38db3e9289 adding enhancement training 2024-05-06 18:31:48 +02:00
vahidrezanezhad
dbb84507ed integrating first working classification training model 2024-04-29 20:59:36 +02:00
vahidrezanezhad
d27647a0f1 first working update of branch 2024-04-16 01:00:48 +02:00
cneud
02b1436f39 code formatting with black; typos 2024-04-10 22:20:23 +02:00
cneud
5f84938839 update parameter config docs (fix #11) 2024-04-10 21:40:23 +02:00
vahidrezanezhad
522f00ab99 adjusting to tf2 2024-04-04 11:26:28 +02:00
vahid
4bea9fd535 continue training, losses, etc. 2021-06-22 18:47:59 -04:00
vahid
5fb7552dbe first updates, padding, rotations 2021-06-22 14:20:51 -04:00
vahidrezanezhad
bb212daf0b
Update main.py 2019-12-10 14:01:55 +01:00
4897fd3dd7 📝 howto: Be more verbose with the subtree pull 2019-12-09 15:33:53 +01:00