AI-Driven AutoML for Analytics Workflow Acceleration on Distributed Data Lakes
Keywords:
Analytics Workflow, AutoML, Distributed Data Lakes
Abstract
As organisations generate ever-larger volumes of data from numerous sources, ranging from IoT devices to cloud applications, the traditional centralized data warehouse has increasingly given way to distributed data lakes. These repositories are cloud-based and offer flexible, scalable storage for both structured and unstructured data. However, while distributed data lakes address storage and access needs, they also introduce drawbacks: executing analytical workflows becomes more complex, more resource-intensive, and harder to control at scale. Developing and deploying machine learning models in such environments typically requires considerable time, specialist expertise, and close coordination between teams. Automated machine learning (AutoML) is an emerging approach that aims to address these problems. It automates the laborious stages of feature selection, model training, and hyperparameter tuning, which simplifies the implementation of analytics and shortens the time it takes, but it is not a complete solution. To get the full benefit of AutoML in distributed environments, AI-driven orchestration is necessary. With intelligent agents, the pipeline can make real-time decisions about resource distribution, routing, and failure recovery, so that static processes become dynamic and self-optimizing. This paper discusses the combination of AI-driven AutoML frameworks with distributed data lake architectures to make analytics pipelines simpler and faster.