CLI / WebUI
Overview
The CLI (Command Line Interface) runs model training, inference, and evaluation from the terminal. Each task is driven by a configuration file, which makes the CLI efficient, flexible, and easy to script.
The WebUI (Web User Interface) is a browser-based visual interface for model training, chatting, and deployment that requires no coding or complex commands. This makes it well suited to non-technical users and rapid prototyping.
Features
This document describes how to use the CLI tools and the WebUI of the ERNIE model toolkit (ERNIEKit), covering these core functions:
- 📈 Model Fine-tuning: SFT/LoRA/DPO fine-tuning with built-in/custom datasets
- 🗣️ Chat Interaction: Load models for multi-turn conversation testing
- 📊 Performance Evaluation: Validate models on built-in/custom datasets
- 📁 Model Export: Convert trained models to deployable formats
Both approaches are supported: script-based customization for developers, and a graphical interface for quick experimentation.
Quick Start
Installation
Run in the erniekit root directory:
Verify installation:
Expected output:
------------------------------------------------------------
| Usage: |
| erniekit train -h: model finetuning |
| erniekit export -h: model export |
| erniekit split -h: model split |
| erniekit eval -h: model evaluation |
| erniekit server -h: model deployment |
| erniekit chat -h: launch a chat interface in CLI |
| erniekit webui -h: launch webui |
| erniekit version: show version info |
| erniekit help: show helping info |
------------------------------------------------------------
GPU Configuration
By default, the CLI and WebUI use all available GPUs. To use specific devices, set CUDA_VISIBLE_DEVICES (or XPU_VISIBLE_DEVICES for XPUs, ASCEND_RT_VISIBLE_DEVICES for NPUs) before running the CLI or WebUI:
# Single GPU
export CUDA_VISIBLE_DEVICES=0
# Multi GPUs
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Single XPU
export XPU_VISIBLE_DEVICES=0
# Multi XPUs
export XPU_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
# Single NPU
export ASCEND_RT_VISIBLE_DEVICES=0
# Multi NPUs
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
- Note: In the Chat module, the number of GPUs set in CUDA_VISIBLE_DEVICES should equal tensor_parallel_degree in the config. Alternatively, unset CUDA_VISIBLE_DEVICES.
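The GPU-count rule in the note above can be checked before launching chat. The following is a minimal POSIX shell sketch. It assumes you copy the tensor_parallel_degree value from your YAML config by hand. `count_visible_devices` and `check_tp_matches` are hypothetical helpers and are not part of ERNIEKit.

```shell
# Hypothetical pre-flight helpers; not part of ERNIEKit.

# Print how many device IDs a comma-separated list such as "0,1,2,3" names.
# An empty value prints 0 (variable unset, so all GPUs are visible).
count_visible_devices() (
    if [ -z "$1" ]; then
        echo 0
        exit 0
    fi
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
)

# Print "ok" when the visible-GPU count equals tensor_parallel_degree, or when
# the device list is empty (unset). Otherwise report the mismatch and fail.
# Arguments: device list, tensor_parallel_degree copied from the YAML config.
check_tp_matches() (
    n=$(count_visible_devices "$1")
    if [ "$n" -eq 0 ] || [ "$n" -eq "$2" ]; then
        echo "ok"
    else
        echo "mismatch: $n visible GPU(s), tensor_parallel_degree=$2" >&2
        exit 1
    fi
)
```

Usage: `check_tp_matches "${CUDA_VISIBLE_DEVICES:-}" 1 && erniekit chat examples/configs/ERNIE-4.5-0.3B/run_chat.yaml`. Here `1` stands for whatever tensor_parallel_degree your config actually sets.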
1. CLI Usage
The examples below use the ERNIE-4.5-0.3B model:
1.1. Chat
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Load model and start service
erniekit server examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
# Launch CLI chat interface
erniekit chat examples/configs/ERNIE-4.5-0.3B/run_chat.yaml
- Note: CLI chat with VL (vision-language) models supports text input only.
1.2. Model Fine-tuning
1.2.1. SFT & LoRA Fine-tuning
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_8k.yaml
# Example 2: 32K seq length, SFT
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_32k.yaml
# Example 3: 8K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_8k.yaml
# Example 4: 32K seq length, SFT-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_32k.yaml
1.2.2. DPO & LoRA Fine-tuning
# download model from huggingface
huggingface-cli download baidu/ERNIE-4.5-0.3B-Paddle --local-dir baidu/ERNIE-4.5-0.3B-Paddle
# Example 1: 8K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_8k.yaml
# Example 2: 32K seq length, DPO
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_32k.yaml
# Example 3: 8K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_8k.yaml
# Example 4: 32K seq length, DPO-LoRA
erniekit train examples/configs/ERNIE-4.5-0.3B/dpo/run_dpo_lora_32k.yaml
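The eight example configs in 1.2.1 and 1.2.2 share one naming pattern. The sketch below rebuilds a path from the method, the LoRA choice, and the sequence length. The pattern is inferred only from the paths listed above, and `example_config` is a hypothetical helper.

```shell
# Hypothetical helper: rebuild one of the example config paths listed above.
# Pattern inferred from those paths only:
#   examples/configs/ERNIE-4.5-0.3B/<method>/run_<method>[_lora]_<length>.yaml
# Arguments: method (sft|dpo), use LoRA (yes|no), sequence length (8k|32k).
example_config() (
    method="$1"; lora="$2"; length="$3"
    case "$method" in
        sft|dpo) ;;
        *) echo "unknown method: $method" >&2; exit 1 ;;
    esac
    case "$length" in
        8k|32k) ;;
        *) echo "unknown length: $length" >&2; exit 1 ;;
    esac
    suffix=""
    if [ "$lora" = "yes" ]; then
        suffix="_lora"
    fi
    echo "examples/configs/ERNIE-4.5-0.3B/${method}/run_${method}${suffix}_${length}.yaml"
)
```

For example, `erniekit train "$(example_config dpo yes 8k)"` runs the same config as Example 3 in 1.2.2.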
1.3. Model Evaluation
1.4. Model Export
1.5. Multi-Node Training
NNODES={num_nodes} MASTER_ADDR={your_master_addr} MASTER_PORT={your_master_port} CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 erniekit train examples/configs/ERNIE-4.5-300B-A47B/sft/run_sft_lora_8k.yaml
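The command above passes NNODES, MASTER_ADDR, and MASTER_PORT as environment variables. A small format check before launching can catch typos. `check_multinode_env` is a hypothetical helper. It only validates the values' format, not whether the master node is reachable.

```shell
# Hypothetical pre-flight check for the multi-node launch variables.
# Prints "ok", or reports the first problem on stderr and fails.
# Arguments: NNODES, MASTER_ADDR, MASTER_PORT.
check_multinode_env() (
    nnodes="$1"; master_addr="$2"; master_port="$3"
    case "$nnodes" in
        ''|*[!0-9]*) echo "NNODES must be a positive integer" >&2; exit 1 ;;
    esac
    if [ "$nnodes" -lt 1 ]; then
        echo "NNODES must be >= 1" >&2; exit 1
    fi
    if [ -z "$master_addr" ]; then
        echo "MASTER_ADDR is empty" >&2; exit 1
    fi
    case "$master_port" in
        ''|*[!0-9]*) echo "MASTER_PORT must be numeric" >&2; exit 1 ;;
    esac
    if [ "$master_port" -lt 1 ] || [ "$master_port" -gt 65535 ]; then
        echo "MASTER_PORT must be in 1..65535" >&2; exit 1
    fi
    echo "ok"
)
```

Usage: `check_multinode_env "$NNODES" "$MASTER_ADDR" "$MASTER_PORT"` after exporting the three variables.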
2. WebUI Examples
Launch WebUI:
The WebUI contains five modules: Basic Info, Training, Chat, Evaluation, and Export.
2.1. Basic Info
2.1.1 Model
The default model name is Customization, which accepts a local model path (relative or absolute).
For a multimodal model, select Customization_VL instead.
2.1.2 Export Directory
If left empty, training auto-generates a path such as ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36. Evaluation, chat, and export default to ./output.
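The example path suggests that the auto-generated name joins the model name, the training method, and a `YYYY_MM_DD_HH_MM_SS` timestamp. The sketch below reproduces that pattern, for instance to predict where a run's output will land. The field order is inferred from the single example path, and `default_output_dir` is a hypothetical helper, not the WebUI's own code.

```shell
# Hypothetical helper reproducing the naming pattern of the example path
#   ./output/ERNIE-4.5-0.3B_SFT_LoRA_2025_06_29_12_03_36
# Arguments: model name, method tag (e.g. SFT_LoRA), optional timestamp.
# Without a third argument the current local time is used.
default_output_dir() (
    model="$1"; method="$2"
    stamp="${3:-$(date +%Y_%m_%d_%H_%M_%S)}"
    echo "./output/${model}_${method}_${stamp}"
)
```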
2.1.3 Available GPUs
Displays the number of available GPUs (read-only).
2.1.4 Training Method
| WebUI Param | Variable | Description |
|---|---|---|
| Fine-tuning | fine_tuning | LoRA or Full-parameter |
| Compute Type | compute_type | bf16, fp16, fp8 (NVIDIA H-series only), wint8, wint4/8 |
| AMP Master Grad | amp_master_grad | For AMP O2, uses fp32 weight gradients (default: keep unchanged) |
| Disable CKPT Quant | disable_ckpt_quant | Disables weight quantization |
| LoRA Rank | lora_rank | LoRA rank dimension |
| LoRA Alpha | lora_alpha | LoRA scaling factor |
| LoRA+ Scale | lora_plus_scale | LoRA B scale in LoRA+ |
| RSLoRA | rslora | Enable RSLoRA |
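LoRA Rank, LoRA Alpha, and RSLoRA interact through the scaling factor applied to the low-rank update. The sketch below computes the two standard definitions from the LoRA and rsLoRA papers. Whether ERNIEKit computes the factor exactly this way is an assumption, and `lora_scaling` is a hypothetical helper.

```shell
# Hypothetical helper showing the scaling factor applied to the LoRA update:
#   standard LoRA: lora_alpha / lora_rank
#   RSLoRA:        lora_alpha / sqrt(lora_rank)
# Arguments: lora_rank, lora_alpha, rslora (true|false).
lora_scaling() (
    awk -v r="$1" -v alpha="$2" -v rs="$3" 'BEGIN {
        r += 0; alpha += 0
        if (r <= 0) { print "lora_rank must be > 0" > "/dev/stderr"; exit 1 }
        s = (rs == "true") ? alpha / sqrt(r) : alpha / r
        printf "%.4f\n", s
    }'
)
```

With lora_alpha fixed at 32, raising lora_rank from 8 to 64 shrinks the standard factor from 4 to 0.5. RSLoRA still gives 32 / sqrt(64) = 4 at rank 64. Keeping higher ranks from being damped is RSLoRA's motivation.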
2.1.5 Distributed Parameters
| WebUI Param | Variable | Description |
|---|---|---|
| Tensor Parallel | tensor_parallel_degree | Tensor parallelism degree |
| Pipeline Parallel | pipeline_parallel_degree | Pipeline parallelism degree |
| Sharding Parallel | sharding_parallel_degree | Sharding parallelism degree |
| Pipeline Config | pipeline_parallel_config | Recommended: "disable_partial_send_recv enable_clear_every_step_cache enable_delay_scale_loss enable_overlap_p2p_comm best_unbalanced_scheduler" |
| PP Seg Method | pp_seg_method | Pipeline layer segmentation |
| Sharding | sharding | Sharding stage: stage1 shards optimizer states, stage2 also shards gradients, stage3 also shards model parameters |
| Use SP Callback | use_sp_callback | Skips redundant gradient calculations |
| MoE Group | moe_group | MoE communication group ("mp" or "dummy") |
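The parallel degrees in this table multiply together. The sketch below assumes, as in PaddleNLP-style trainers (not confirmed for ERNIEKit here), that the GPUs left over after tensor, pipeline, and sharding parallelism form the data-parallel group. Under that assumption, the total GPU count must be divisible by the product of the three degrees. `data_parallel_degree` is a hypothetical helper.

```shell
# Hypothetical sanity check for the distributed degrees in the table above.
# Assumption: total GPUs = tensor x pipeline x sharding x data-parallel.
# Arguments: world size (GPUs across all nodes), tp, pp, sharding degree.
data_parallel_degree() (
    world="$1"; tp="$2"; pp="$3"; sd="$4"
    used=$((tp * pp * sd))
    # The modulo is evaluated only when used > 0 (the || short-circuits).
    if [ "$used" -le 0 ] || [ $((world % used)) -ne 0 ]; then
        echo "world size $world is not divisible by tp*pp*sharding=$used" >&2
        exit 1
    fi
    echo $((world / used))
)
```

For example, 2 nodes with 8 GPUs each (world size 16) and tensor_parallel_degree 4, pipeline_parallel_degree 2, sharding_parallel_degree 1 gives `data_parallel_degree 16 4 2 1`, which prints 2.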
2.2. Training Module
Default SFT/DPO configurations for ERNIE-4.5-0.3B-Paddle are provided under "Switch SFT/DPO Presets".
After setting dataset paths and probabilities, click "Preview Dataset" to inspect the data. Click "Preview" to display the configuration, "Start" to begin training, and "Stop" to interrupt it.
2.2.1 Data Parameters
| WebUI Param | Variable | Description |
|---|---|---|
| Max Sequence Length | max_seq_len | Maximum tokens per sequence; lower it when using a larger global batch size to avoid OOM |
| Max Prompt Length | max_prompt_len | Maximum prompt tokens for DPO (at most max_seq_len - 10) |
| Virtual Epoch Size | num_samples_each_epoch | Number of samples per virtual epoch (keeping the default is recommended) |
| Recompute | recompute | Gradient checkpointing to save memory |
| Training Epochs | num_train_epochs | Overridden by max_steps if both set |
| Max Steps | max_steps | Total training steps |
| Batch Size | batch_size | Micro batch size |
| Gradient Accumulation | gradient_accumulation_steps | Steps for gradient accumulation |
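The batch parameters combine into a global batch size (GBS), which together with max_seq_len drives memory use. The sketch below assumes that each data-parallel and each sharding rank consumes its own micro-batch, while tensor- and pipeline-parallel ranks share one. It also encodes the DPO prompt limit from the table. All function names are hypothetical.

```shell
# Hypothetical helpers. Assumption: each data-parallel and each sharding rank
# consumes its own micro-batch; tensor- and pipeline-parallel ranks share one.
# Arguments: batch_size, gradient_accumulation_steps, dp degree, sharding degree.
global_batch_size() (
    echo $(( $1 * $2 * $3 * $4 ))
)

# Upper bound on tokens per optimizer step (every sequence at max_seq_len).
# Arguments: the four above, then max_seq_len.
max_tokens_per_step() (
    gbs=$(global_batch_size "$1" "$2" "$3" "$4")
    echo $(( gbs * $5 ))
)

# DPO rule from the table: max_prompt_len may be at most max_seq_len - 10.
# Arguments: max_seq_len, max_prompt_len.
check_dpo_prompt_len() (
    limit=$(( $1 - 10 ))
    if [ "$2" -gt "$limit" ]; then
        echo "max_prompt_len=$2 exceeds max_seq_len-10=$limit" >&2
        exit 1
    fi
    echo "ok"
)
```

For example, batch_size 1 and gradient_accumulation_steps 16 on 8 GPUs used purely for sharding give a global batch of 128 sequences. At max_seq_len 8192 that is at most 1,048,576 tokens per step.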
2.2.2 Training Dataset
Choose built-in datasets (demo or Hugging Face) or custom datasets. Multiple datasets are mixed according to their sampling probabilities:
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | train_dataset_path | Training dataset path |
| Dataset Probability | train_dataset_prob | Sampling probability |
| Data Type | train_dataset_type | Supported: erniekit, alpaca |
- Note: Multimodal models can also be configured with text-only datasets for mixed training on multimodal and text-only data. Adjust the mixing ratio with the slider.
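When several training datasets are listed, each one needs a matching probability and type. The sketch below checks that the three lists line up and prints each dataset's normalized sampling share. It makes two assumptions: that the lists are given as comma-separated strings, and that probabilities act as relative weights that get normalized. Neither is confirmed for ERNIEKit. `dataset_mix` is a hypothetical helper.

```shell
# Hypothetical check for a multi-dataset mix, assuming comma-separated lists:
#   train_dataset_path: "a.jsonl,b.jsonl"
#   train_dataset_prob: "0.7,0.3"
#   train_dataset_type: "erniekit,alpaca"
# Prints "path<TAB>type<TAB>normalized share" per dataset, or fails.
dataset_mix() (
    awk -v paths="$1" -v probs="$2" -v types="$3" 'BEGIN {
        np = split(paths, p, ","); nq = split(probs, q, ","); nt = split(types, t, ",")
        if (np != nq || np != nt) {
            print "length mismatch: " np " paths, " nq " probs, " nt " types" > "/dev/stderr"
            exit 1
        }
        total = 0
        for (i = 1; i <= np; i++) {
            if (t[i] != "erniekit" && t[i] != "alpaca") {
                print "unsupported type: " t[i] > "/dev/stderr"; exit 1
            }
            if (q[i] + 0 < 0) { print "negative probability: " q[i] > "/dev/stderr"; exit 1 }
            total += q[i]
        }
        if (total <= 0) { print "probabilities sum to zero" > "/dev/stderr"; exit 1 }
        for (i = 1; i <= np; i++) printf "%s\t%s\t%.3f\n", p[i], t[i], q[i] / total
    }'
)
```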
2.2.3 Evaluation Dataset
Same options as the training dataset:
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |
2.2.4 Dataloader
| WebUI Param | Variable | Description |
|---|---|---|
| Workers | dataloader_num_workers | Number of data-loading subprocesses (0 loads data in the main process) |
| Distributed | distributed_dataloader | Saves memory for large datasets |
2.2.5 Optimizer
| WebUI Param | Variable | Description |
|---|---|---|
| LR Scheduler | lr_scheduler_type | linear/cosine/polynomial/constant/constant_with_warmup |
| Learning Rate | learning_rate | Suggested: 3e-5 (SFT), 1e-6 (DPO), 3e-4 (SFT-LoRA), 1e-5 (DPO-LoRA) |
| Min LR | min_lr | For cosine scheduler only |
| Layerwise Decay | layerwise_lr_decay_bound | (0, 1], 1=no decay |
| Warmup Steps | warmup_steps | Typically 1-10% of max_steps |
| Optimizer | optim | Default: adamw |
| Offload Optim | offload_optim | Offload optimizer states to CPU memory |
| Release Grads | release_grads | Reduces peak memory (recommended: True) |
| Loss Scaling | scale_loss | Initial loss scale for float16 training |
| Weight Decay | weight_decay | AdamW parameter |
| Adam Epsilon | adam_epsilon | AdamW parameter |
| Adam Beta1 | adam_beta1 | AdamW parameter |
| Adam Beta2 | adam_beta2 | AdamW parameter |
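To see how learning_rate, min_lr, and warmup_steps interact under the cosine scheduler, the sketch below evaluates the common "linear warmup, then cosine decay to a floor" formula. It is a textbook form used here as an assumption, not taken from ERNIEKit's source. `cosine_lr` and `warmup_range` are hypothetical helpers.

```shell
# Hypothetical helper using the common "linear warmup, then cosine decay to
# min_lr" formula; ERNIEKit's scheduler may differ in details.
# Arguments: step, max_steps, warmup_steps, learning_rate, min_lr.
cosine_lr() (
    awk -v t="$1" -v T="$2" -v w="$3" -v lr="$4" -v lo="$5" 'BEGIN {
        t += 0; T += 0; w += 0; lr += 0; lo += 0
        pi = atan2(0, -1)
        if (w > 0 && t < w) {
            v = lr * t / w                    # linear warmup from 0 to lr
        } else {
            progress = (T > w) ? (t - w) / (T - w) : 1
            if (progress > 1) progress = 1
            v = lo + 0.5 * (lr - lo) * (1 + cos(pi * progress))
        }
        printf "%.3e\n", v
    }'
)

# Warmup range suggested in the table: 1% to 10% of max_steps (rounded down).
warmup_range() (
    echo "$(( $1 / 100 ))..$(( $1 / 10 ))"
)
```

Take max_steps 1000, warmup_steps 100, the suggested SFT rate of 3e-5, and min_lr 0. The rate rises linearly to 3e-5 at step 100, falls to 1.5e-5 at step 550, and ends at min_lr at step 1000.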
2.2.6 Output
| WebUI Param | Variable | Description |
|---|---|---|
| Logging Steps | logging_steps | Log interval |
| Eval Steps | eval_steps | Evaluation interval |
| Eval Strategy | evaluation_strategy | "steps" enables periodic evaluation |
| Save Steps | save_steps | Checkpoint interval (when save_strategy=="steps") |
| Save Strategy | save_strategy | Checkpoint saving method |
| Save Limit | save_total_limit | Max checkpoints to keep |
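Save Steps and Save Limit together decide which checkpoints survive a run. The sketch below assumes that a checkpoint is written every save_steps (save_strategy "steps") and that the oldest ones are deleted once more than save_total_limit exist. That rotation rule is an assumption, and `kept_checkpoints` is a hypothetical helper.

```shell
# Hypothetical helper: which checkpoint steps remain on disk, assuming a save
# every save_steps and deletion of the oldest once more than save_total_limit
# exist. A final save at a max_steps that is not a multiple of save_steps,
# and any best-model retention, are not modeled.
# Arguments: max_steps, save_steps, save_total_limit (<= 0 means unlimited).
kept_checkpoints() (
    awk -v n="$1" -v s="$2" -v k="$3" 'BEGIN {
        n += 0; s += 0; k += 0
        if (s <= 0) { print "save_steps must be > 0" > "/dev/stderr"; exit 1 }
        count = 0
        for (step = s; step <= n; step += s) saved[++count] = step
        first = (k > 0 && count > k) ? count - k + 1 : 1
        sep = ""
        for (i = first; i <= count; i++) { printf "%s%s", sep, saved[i]; sep = " " }
        printf "\n"
    }'
)
```

For example, max_steps 1000, save_steps 100, and save_total_limit 3 leave the checkpoints from steps 800, 900, and 1000.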
2.3. Chat Module
Load the model selected in the Basic Info section. Click "Verify Model Loading" to check its status, and "Unload" to release it.
- Note: Full-parameter checkpoints in output_dir take priority for deployment.
After the model loads successfully:
- Enter a prompt in the input box.
- Set the role and system prompt.
- [VL models] Select "Enable VL Thought Mode" to enable thinking mode.
- [VL models] Upload images or videos by drag-and-drop or by clicking, or enter a URL.
- Click "Submit" to start chatting.
- View past turns in "Chat History".
- Click "Clear" to reset the conversation.
- Click "Stop" to interrupt generation.
| WebUI Param | Variable | Description |
|---|---|---|
| Max Length | max_model_len | Input+output token limit |
| Port | port | Service port |
| Max New Tokens | max_new_tokens | Generation limit |
| Top-p | top_p | Nucleus sampling (higher=more diverse) |
| Temperature | temperature | Controls randomness (higher=more creative) |
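The effect of Temperature and Top-p can be illustrated on a toy distribution. The sketch below applies the standard temperature-scaled softmax followed by top-p (nucleus) filtering, and also checks the Max Length rule from the table. This is generic sampling math, not ERNIEKit's generation code. `nucleus_size` and `fits_context` are hypothetical helpers.

```shell
# Generic sampling math for illustration; not ERNIEKit's generation code.
# nucleus_size: apply temperature to space-separated logits, take the softmax,
# and print how many tokens remain after top-p (nucleus) filtering.
# Arguments: logits, temperature (> 0), top_p in (0, 1].
nucleus_size() (
    printf '%s\n' "$1" | awk -v T="$2" -v P="$3" '{
        n = NF; top = P + 0
        # Softmax of logits / T; subtracting the max logit avoids overflow.
        mx = $1 + 0
        for (i = 2; i <= n; i++) if ($i + 0 > mx) mx = $i + 0
        z = 0
        for (i = 1; i <= n; i++) { p[i] = exp(($i - mx) / T); z += p[i] }
        for (i = 1; i <= n; i++) p[i] /= z
        # Sort probabilities in descending order (insertion sort; n is small).
        for (i = 2; i <= n; i++) {
            v = p[i]; j = i - 1
            while (j >= 1 && p[j] < v) { p[j + 1] = p[j]; j-- }
            p[j + 1] = v
        }
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        cum = 0
        for (k = 1; k <= n; k++) { cum += p[k]; if (cum >= top) break }
        if (k > n) k = n
        print k
    }'
)

# fits_context: prompt tokens + max_new_tokens must not exceed max_model_len.
# Arguments: prompt tokens, max_new_tokens, max_model_len.
fits_context() (
    if [ $(( $1 + $2 )) -le "$3" ]; then echo yes; else echo no; fi
)
```

Take logits `2 1 0 -1` and top_p 0.9. Temperatures 0.5, 1, and 2 leave 2, 3, and 4 candidate tokens respectively: lower temperature sharpens the distribution, so fewer tokens are needed to reach the top_p mass.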
2.4. Evaluation Module
Select the model in Basic Info. By default, the latest checkpoint in the export directory is used.
Choose an evaluation dataset (built-in or custom), then click "Preview Eval Dataset" to inspect it.
Click "Preview Command" to display the configuration, "Start" to begin evaluation, and "Stop" to interrupt it.
| WebUI Param | Variable | Description |
|---|---|---|
| Dataset Path | eval_dataset_path | Evaluation dataset path |
| Dataset Probability | eval_dataset_prob | Sampling probability |
| Data Type | eval_dataset_type | Supported: erniekit, alpaca |
2.5. Export Module
The Export module provides two functions:
1. LoRA weight merging
2. Model weight splitting (safetensors format only)
LoRA Merging
Set the export directory to the training output directory, then click "Start Merge LoRA Weights" to merge the LoRA weights into the original model (saved in export_dir/export).
Weight Splitting
For large safetensors files, click "Start Split Model" to split weights (saved in export_dir/split_export).
| WebUI Param | Variable | Description |
|---|---|---|
| Max Shard Size (GB) | max_shard_size | Maximum size of each split file |
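To estimate how many files a split will produce, divide the checkpoint size by max_shard_size and round up. The sketch below treats that result as a lower bound, on the assumption that individual tensors are usually kept whole, so real splits can need more files. It also estimates checkpoint size from a parameter count, taking 1 GB as 10^9 bytes. Both helpers are hypothetical.

```shell
# Hypothetical helpers for planning a weight split; sizes in GB as in the table.
# min_shard_count: ceil(total_size / max_shard_size), a lower bound because a
# single tensor is usually kept whole, so real splits can need more files.
min_shard_count() (
    awk -v total="$1" -v limit="$2" 'BEGIN {
        total += 0; limit += 0
        if (limit <= 0) { print "max_shard_size must be > 0" > "/dev/stderr"; exit 1 }
        n = int(total / limit)
        if (n * limit < total) n++
        if (n < 1) n = 1
        print n
    }'
)

# checkpoint_gb: rough size from a parameter count and bytes per parameter
# (2 for bf16/fp16, 4 for fp32), with 1 GB taken as 10^9 bytes.
checkpoint_gb() (
    awk -v params="$1" -v bytes="$2" 'BEGIN { printf "%.2f\n", params * bytes / 1e9 }'
)
```

For example, a 21-billion-parameter bf16 checkpoint is about `checkpoint_gb 21000000000 2` = 42.00 GB. With max_shard_size 5 it needs at least 9 files.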