The code smells catalogue assists data scientists and programmers in creating and maintaining high-quality machine learning code. Machine learning has grown in prominence in recent years.
Machine learning techniques have been intensively researched in academia and are utilised in industry to provide economic value. However, requirements for code quality in machine learning applications are lacking.
Code smells, in particular, have received little attention in this sector. Although machine learning code is typically implemented as a minor component of a larger system, it plays a vital role in the system's core functionality. As a result, ensuring the quality of this code is important to avoid problems in the long run.
A group of researchers from Delft University of Technology and AI for Fintech Research in the Netherlands, led by Haiyin Zhang, Luís Cruz, and Arie van Deursen, proposed and identified a list of 22 machine learning-specific code smells culled from papers, grey literature, GitHub commits, and Stack Overflow posts.
They described each smell with its background, the potential long-term difficulties it causes, and suggested solutions. They also linked each smell to the relevant pipeline stage, supported by evidence from both academic and non-academic sources.
The researchers used academic literature, grey literature, community-based coding Q&A platforms (such as Stack Overflow), and public software repositories (GitHub) to collect machine learning-specific code smells.
They mined articles and grey literature, reused existing bug datasets, and mined Stack Overflow as well. They then triangulated the gathered smells with the recommendations offered in the official documentation of the machine learning libraries. Finally, the code smell catalogue was validated.
A total of 22 machine learning-specific code smells were collected and described. The researchers presented a general explanation for each smell, followed by its context, the problem its presence causes, and the recommended solution.
Finally, they summarised all of the smells, including the references that support each smell, the stage of the machine learning pipeline where it matters most, and the major effect of its presence.
The catalogue investigates recurring code issues from many sources to assist in understanding prevalent errors in machine learning application development. Because many data scientists lack software engineering experience and are not up to date on software engineering best practices, the catalogue of smells helps to overcome this barrier by offering some guidance for designing machine learning applications.
New versions of machine learning libraries are released on a regular basis. They re-used the "TensorFlow Bugs" replication package and discovered that several instances had been deprecated due to TensorFlow's upgrade to version 2.
As a result, they anticipated that new API-specific code smells would emerge with new library versions and features. In fact, the findings suggest that the majority of API-related smells are recorded only in grey literature rather than in academic literature.
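To illustrate how a library upgrade can retire API-specific smells, the sketch below contrasts the TensorFlow 1.x session-based idiom with the eager-execution style that replaced it in TensorFlow 2. This is an illustrative example written for this article, not one of the deprecated instances from the "TensorFlow Bugs" dataset.

```python
import tensorflow as tf

# TensorFlow 1.x idiom: build a graph, then run it in an explicit Session.
# This style was removed from the top-level API in TensorFlow 2 and now
# survives only under tf.compat.v1, so smells tied to it lose relevance.
# x = tf.compat.v1.placeholder(tf.float32, shape=(None, 3))
# total = tf.reduce_sum(x, axis=1)
# with tf.compat.v1.Session() as sess:
#     result = sess.run(total, feed_dict={x: [[1.0, 2.0, 3.0]]})

# TensorFlow 2 idiom: eager execution, no placeholders or sessions needed.
x = tf.constant([[1.0, 2.0, 3.0]])
result = tf.reduce_sum(x, axis=1)
print(result.numpy())  # [6.]
```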
Compiling a catalogue of code smells aids in the promotion of a collaborative effort between practitioners and academics. Because the ecosystem of artificial intelligence frameworks is changing so quickly, some smells may become obsolete in the meantime.
They predicted that three code smells in their catalogue would be deemed temporary: Dataframe Conversion API Misused, Matrix Multiplication API Misused, and Gradients Not Cleared Before Backward Propagation.
Temporary smells may be deprecated after a few years, whilst other smells are expected to persist much longer. However, these three smells are significant and should be recognised to assist practitioners in preventing problems down the road.
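To make these three smells concrete, the sketch below shows a smelly and a preferred variant of each, assuming the Pandas, NumPy, and PyTorch APIs they are usually associated with. These are illustrative snippets written for this article, not excerpts from the catalogue.

```python
import numpy as np
import pandas as pd
import torch

# 1. Dataframe Conversion API Misused: df.values is ambiguous about the
#    returned type; the Pandas docs recommend the explicit df.to_numpy().
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
features = df.values          # smelly: behaviour depends on the dtypes
features = df.to_numpy()      # preferred: explicit, predictable conversion

# 2. Matrix Multiplication API Misused: np.dot() works on 2-D arrays, but
#    np.matmul() / the @ operator states the intent clearly and generalises
#    correctly to stacked (3-D+) inputs.
m1, m2 = np.ones((2, 3)), np.ones((3, 2))
product = np.dot(m1, m2)      # smelly: intent is unclear to readers
product = m1 @ m2             # preferred: explicit matrix multiplication

# 3. Gradients Not Cleared Before Backward Propagation: PyTorch accumulates
#    gradients, so forgetting zero_grad() silently mixes gradients from
#    previous iterations into the current update.
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
for x, y in [(torch.randn(4, 3), torch.randn(4, 1))] * 3:
    optimizer.zero_grad()     # without this line, gradients accumulate
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```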
They gathered the code smells from a variety of sources: mining 1750 publications, mining 2170 grey literature items, reusing existing bug datasets comprising 88 Stack Overflow posts and 87 GitHub commits, and gathering 403 complementary Stack Overflow posts.
They examined the pitfalls described in the blogs and assessed whether or not to consider them a code smell. They gathered 22 code smells in total, covering both generic and API-specific smells. They also classified each smell according to the pipeline stage where it occurs and its impact.
This has aided the machine learning community in improving code quality. It would be interesting to find out how common these code smells are in real-world machine learning systems and how useful a list of machine learning-specific code smells proves to be in practice.
Suleman Shah is a researcher and freelance writer. As a researcher, he has worked with MNS University of Agriculture, Multan (Pakistan) and Texas A&M University (USA). He regularly writes science articles and blogs for the science news website immersse.com and the open access publishers OA Publishing London and Scientific Times. He loves to keep himself updated on scientific developments and convert these developments into everyday language to inform readers about developments in the scientific era. His primary research focus is plant sciences, and he has contributed to this field by publishing his research in scientific journals and presenting his work at many conferences.
Shah graduated from the University of Agriculture Faisalabad (Pakistan) and started his professional career with Jaffer Agro Services and later with the Agriculture Department of the Government of Pakistan. His research interest compelled him to continue his career in plant sciences research, so he started his Ph.D. in Soil Science at MNS University of Agriculture, Multan (Pakistan). Later, he began working as a visiting scholar with Texas A&M University (USA).
Shah’s experience with big Open Access publishers like Springer, Frontiers, MDPI, etc., testifies to his belief in Open Access as a barrier-removing mechanism between researchers and the readers of their research. Shah believes that Open Access is revolutionizing the publication process and benefitting research in all fields.