MindSpore Transformers Documentation
=====================================

The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, evaluation, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is expected to help users easily realize the full process of large model development.

Users can refer to `Overall Architecture `_ and `Model Library `_ to get a quick overview of the MindSpore Transformers system architecture and the list of supported features and foundation models. Further, refer to `Installation `_ and `Quick Start `_ to get started with MindSpore Transformers.

If you have any suggestions for MindSpore Transformers, please contact us via `issue `_ and we will handle them promptly.

MindSpore Transformers supports one-click start of single-card and multi-card training, fine-tuning, evaluation, and inference for any task, making the execution of deep learning tasks more efficient and user-friendly by simplifying operations, providing flexibility, and automating processes. Users can learn more from the following documents:

- `Development Migration `_
- `Pretraining `_
- `SFT Tuning `_
- `Evaluation `_
- `Inference `_
- `Quantization `_
- `Service Deployment `_
- `Multimodal Model Development `_

Code repository address:

Flexible and Easy-to-Use Personalized Configuration with MindSpore Transformers
---------------------------------------------------------------------------------

With its powerful feature set, MindSpore Transformers provides users with flexible and easy-to-use personalized configuration options. Specifically, it comes with the following key features:

1. `Weight Format Conversion `_

   Provides a unified weight conversion tool that converts model weights between the formats used by HuggingFace and MindSpore Transformers.

2. `Distributed Weight Slicing and Merging `_

   Flexibly slices and merges weights across different distributed scenarios.

3. `Distributed Parallel `_

   One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.

4. `Dataset `_

   Supports multiple types and formats of datasets.

5. `Weight Saving and Resumable Training After Breakpoint `_

   Supports step-level resumable training after a breakpoint, effectively reducing the time and resources wasted by unexpected interruptions during large-scale training.

6. `Training Metrics Monitoring `_

   Provides visualization services for the training phase of large models, for monitoring and analyzing various metrics and information during training.

7. `Training High Availability `_

   Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery.

8. `Safetensors Weights `_

   Supports saving and loading weight files in the safetensors format.

9. `Fine-Grained Activations SWAP `_

   Supports fine-grained selection of specific activations to enable SWAP and reduce peak memory overhead during model training.
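As a quick illustration of the one-click usage described above, the following is a minimal single-card inference sketch based on the ``mindformers`` ``pipeline`` API. The model identifier and call arguments are illustrative assumptions; refer to the `Inference `_ tutorial and `Model Library `_ for the models and options supported by your installed version.

.. code-block:: python

   from mindformers import pipeline

   # Build a text-generation pipeline from a built-in model configuration.
   # "gpt2" is only an illustrative model identifier; see the Model Library
   # for the identifiers supported by your installed version.
   text_generator = pipeline(task="text_generation", model="gpt2")

   # Run single-card inference on a prompt and print the generated text.
   result = text_generator("An increasing sequence: one,")
   print(result)

Training, fine-tuning, and evaluation follow the same configuration-driven pattern and are covered step by step in the tutorials listed above.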
Deep Optimizing with MindSpore Transformers
---------------------------------------------

- `Precision Optimizing `_
- `Performance Optimizing `_

Appendix
------------------------------------

- `Environment Variables Descriptions `_
- `Configuration File Descriptions `_

FAQ
------------------------------------

- `Model-Related `_
- `Function-Related `_
- `MindSpore Transformers Contribution Guide `_
- `Modelers Contribution Guide `_

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Start
   :hidden:

   start/overview
   start/models

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Quick Start
   :hidden:

   quick_start/install
   quick_start/source_code_start

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Usage Tutorials
   :hidden:

   usage/dev_migration
   usage/multi_modal
   usage/pre_training
   usage/sft_tuning
   usage/evaluation
   usage/inference
   usage/quantization
   usage/deployment
   usage/pretrain_gpt

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Function Description
   :hidden:

   function/weight_conversion
   function/transform_weight
   function/distributed_parallel
   function/dataset
   function/resume_training
   function/monitor
   function/high_availability
   function/safetensors
   function/fine_grained_activations_swap

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Precision Optimization
   :hidden:

   acc_optimize/acc_optimize

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Performance Optimization
   :hidden:

   perf_optimize/perf_optimize

.. toctree::
   :maxdepth: 1
   :caption: API
   :hidden:

   mindformers
   mindformers.core
   mindformers.dataset
   mindformers.generation
   mindformers.models
   mindformers.modules
   mindformers.pet
   mindformers.pipeline
   mindformers.tools
   mindformers.wrapper

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Appendix
   :hidden:

   appendix/env_variables
   appendix/conf_files

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: FAQ
   :hidden:

   faq/model_related
   faq/func_related
   faq/mindformers_contribution
   faq/modelers_contribution

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: RELEASE NOTES
   :hidden:

   RELEASE