Prof. Shaohua Wan
University of Electronic Science and Technology of China, China
Vol. 3, No. 1 (2025) (In Publishing)
Open Access
Article
Article ID: 1450
Pre-trained models for linking process in data washing machine by Bushra Sajid, Ahmed Abu-Halimeh, Nuh Jakoet
Computing and Artificial Intelligence, Vol.3, No.1, 2024
Entity Resolution (ER) has been investigated for decades across various domains as a fundamental task in data integration and data quality. The growing volume of heterogeneously structured and even unstructured data challenges traditional ER methods. This research focuses on the Data Washing Machine (DWM), developed under the NSF DART Data Life Cycle and Curation research theme, which automatically detects and corrects certain types of data quality errors and performs unsupervised entity resolution to identify duplicate records. However, the DWM relies on traditional methods driven by algorithmic pattern rules, such as Levenshtein edit distance and matrix comparators. The goal of this research is to assess the replacement of these rule-based methods with machine learning and deep learning methods, evaluated on 18 sample datasets, to improve the effectiveness of the DWM's processes. The DWM comprises several processes for improving data quality; this work focuses on the scoring and linking processes. To integrate a machine learning model into the DWM, different pre-trained models were tested to find one that produces accurate vector embeddings for computing the similarity between records. After evaluating several candidates, DistilRoBERTa was chosen to generate the embeddings, and cosine similarity was used to compute similarity scores. The resulting scores were close to those produced by the DWM's existing scoring matrix, suggesting that the embedding-based approach captures the important features needed for entity matching.
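A minimal sketch of the embedding-and-scoring step described above, assuming the sentence-transformers checkpoint all-distilroberta-v1 (the abstract names only "distilroberta") and illustrative record strings; the actual DWM integration may differ.

# Sketch: embed record strings with a DistilRoBERTa sentence encoder and
# score candidate pairs with cosine similarity, as described in the abstract.
# The checkpoint name and the sample records are assumptions for illustration.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    "JOHN A SMITH 123 MAIN ST LITTLE ROCK AR",    # hypothetical near-duplicate pair
    "JON SMITH 123 MAIN STREET LITTLE ROCK AR",
    "MARY JONES 45 OAK AVE CONWAY AR",            # unrelated record
]

# Encode each record string into a dense vector.
model = SentenceTransformer("all-distilroberta-v1")
embeddings = model.encode(records)

# Pairwise cosine similarities; values near 1.0 suggest likely duplicates.
scores = cosine_similarity(embeddings)
print(scores[0, 1])  # high score for the near-duplicate pair
print(scores[0, 2])  # lower score for the unrelated record

In the DWM pipeline, such embedding-based scores would stand in for the rule-based scoring matrix during the scoring and linking processes.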
Open Access
Article
Article ID: 1498
Generative artificial intelligence (GAI): From large language models (LLMs) to multimodal applications towards fine tuning of models, implications, investigations by Zarif Bin Akhtar
Computing and Artificial Intelligence, Vol.3, No.1, 2024
This research explores the transformative integration of artificial intelligence (AI), robotics, and language models, with a particular emphasis on the PaLM-E model. The exploration assesses PaLM-E’s decision-making processes and adaptability across various robotic environments, demonstrating its capacity to convert textual prompts into precise robotic actions. In addition, the research investigates Parameter-Efficient Fine-Tuning (PEFT) techniques such as Low-Rank Adaptation (LoRA) and Quantized Low-Rank Adaptation (QLoRA), providing a historical overview of PEFT and highlighting its significance in enhancing task performance while reducing the number of trainable parameters. The broader scope of Generative AI is examined through an analysis of influential models such as GPT-3, GPT-4, Copilot, Bard, LLaMA, Stable Diffusion, Midjourney, and DALL-E, with particular attention to their ability to process natural language prompts and generate a wide range of outputs. The research traces the historical evolution of AI, from its roots in science fiction to its practical applications today, with a focus on the rise of Generative AI in the 21st century. Furthermore, it examines the various modalities of Generative AI, covering applications in text, code, images, and more, and assesses their real-world impact on robotics, planning, and business intelligence. The implications of synthetic data generation for business analytics are also explored. The research examines both the software and hardware landscapes, comparing local deployment on consumer-grade hardware with cloud-based services, and underscores the benefits of local model deployment for privacy protection, intellectual property security, and censorship resistance. Ethical considerations are central to this research, addressing concerns related to privacy, security, societal impact, bias, and misinformation, and ethical guidelines are proposed for the responsible development and deployment of AI technologies. Ultimately, this work reveals the deep interconnections between vision, language, and robotics, pushing the boundaries of AI capabilities and providing crucial insights for future AI model development and technological innovation. These findings are intended to guide the field through the emerging challenges of the rapidly evolving Generative AI landscape.
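As a hedged illustration of the PEFT/LoRA idea discussed in the abstract, the sketch below attaches low-rank adapters to a small causal language model using the Hugging Face peft library; the base model name, target module, and hyperparameters are assumptions for illustration, not details from the paper.

# Illustrative LoRA setup with Hugging Face peft: only the small low-rank
# adapter matrices are trainable, which is the reduction in trainable
# parameters the abstract attributes to PEFT. Model and settings are assumed.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed small base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2 attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction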