Paper Title
Utilizing Linux Clusters for Efficient Training Data Preparation with DeepMD-kit: A Case Study of the High-Entropy Ceramic AgMnGeSbTe4

Abstract
This paper describes how to prepare training data for DeepMD-kit on a Linux cluster. DeepMD-kit, developed by Wang et al. (Computer Physics Communications, 2018, 228: p. 178-184), is a software package that uses deep learning to model atomic interactions in large-scale materials systems efficiently and accurately: neural networks are trained to represent the atomic interactions of a system and are then used to simulate that system's behavior. To showcase its capabilities, we used the high-entropy ceramic (HEC) AgMnGeSbTe4 as a benchmark material for training a deep learning potential for molecular dynamics (MD) simulation. HECs are disordered multicomponent systems with unique properties and promising applications. The first step in preparing training data is to generate a set of input structures that represent the material system of interest; we generated these structures using density functional theory (DFT) calculations. The structures are then passed through a pre-processing step that converts them into a format usable by DeepMD-kit. Because large amounts of training data are required to train the neural networks, this work is commonly carried out on a high-performance computing cluster. We demonstrate the scalability of the process by training multiple neural networks on different training data sets across multiple compute nodes. Our results show that a Linux cluster can significantly reduce the time required to prepare training data for DeepMD-kit, making DeepMD-kit a more efficient and practical tool for simulating large-scale materials systems. Additionally, we describe how to deploy a home-made PC cluster using our Perl script, available in our GitHub repository (https://github.com/jushinpon/Centos8ClusterScripts_20210404.git).
Keywords - Linux Cluster, Parallel Computing, High Entropy Ceramic, High Entropy Semiconductor, Molecular Dynamics, Deep Learning Potential.
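The pre-processing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' script: it assumes the cell vectors, atomic positions, total energies, and forces have already been parsed from the DFT output, and the `Frame` container, the `write_deepmd_raw` helper, and the Ag/Mn/Ge/Sb/Te element order are hypothetical choices made for illustration. The sketch writes DeepMD-kit's plain-text "raw" layout, in which `type.raw` lists one integer type per atom, `type_map.raw` lists one element name per line, and each line of `box.raw`, `coord.raw`, `energy.raw`, and `force.raw` holds one frame.

```python
import os
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    """One labeled configuration parsed from a DFT output (hypothetical container)."""
    box: Sequence[Sequence[float]]     # 3 x 3 cell vectors, Angstrom
    coords: Sequence[Sequence[float]]  # natoms x 3 Cartesian positions, Angstrom
    energy: float                      # total energy, eV
    forces: Sequence[Sequence[float]]  # natoms x 3 forces, eV/Angstrom


def _row(values) -> str:
    """Format an iterable of numbers as one whitespace-separated line."""
    return " ".join(f"{v:.10e}" for v in values)


def write_deepmd_raw(folder: str, type_map: List[str], atom_types: List[int],
                     frames: List[Frame]) -> None:
    """Write frames in DeepMD-kit's plain-text raw layout (one frame per line).

    Hypothetical helper: every frame must contain the same atoms, in the same
    order, as atom_types.
    """
    natoms = len(atom_types)
    os.makedirs(folder, exist_ok=True)
    # Element names, one per line; index i corresponds to type i in type.raw.
    with open(os.path.join(folder, "type_map.raw"), "w") as f:
        f.write("\n".join(type_map) + "\n")
    # Integer type of each atom, one per line.
    with open(os.path.join(folder, "type.raw"), "w") as f:
        f.write("\n".join(str(t) for t in atom_types) + "\n")
    rows = {"box.raw": [], "coord.raw": [], "energy.raw": [], "force.raw": []}
    for fr in frames:
        if len(fr.coords) != natoms or len(fr.forces) != natoms:
            raise ValueError("every frame must match the atoms listed in type.raw")
        # Flatten 3x3 cell to 9 numbers and natoms x 3 arrays to 3*natoms numbers.
        rows["box.raw"].append(_row(x for vec in fr.box for x in vec))
        rows["coord.raw"].append(_row(x for atom in fr.coords for x in atom))
        rows["energy.raw"].append(_row([fr.energy]))
        rows["force.raw"].append(_row(x for atom in fr.forces for x in atom))
    for name, lines in rows.items():
        with open(os.path.join(folder, name), "w") as f:
            f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    import tempfile

    # Toy two-atom frame (one Ag, one Te) with made-up numbers, for illustration only.
    frame = Frame(box=[[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0]],
                  coords=[[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]],
                  energy=-10.5,
                  forces=[[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0]])
    out = tempfile.mkdtemp()
    write_deepmd_raw(out, ["Ag", "Mn", "Ge", "Sb", "Te"], [0, 4], [frame])
    print(sorted(os.listdir(out)))
```

In practice, the raw files are usually converted to DeepMD-kit's NumPy `set.*` directories before training, and the dpdata library can perform both the DFT-output parsing and this conversion; the hand-written version above only makes the data layout explicit.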