You start by identifying the data source and working out how to collect the data. The data could come from one or more existing databases or from streaming data sources such as log files. It could also come from the public domain. The challenge here is to determine how large a dataset you need to gather. Depending on what you want to achieve with the analysis, this could be a smaller or a larger dataset. Some of the factors to consider for data collection are the following:
The next step is to explore the data and prepare it for further analysis. The following are some of the tasks you would want to perform in this step:
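As a minimal sketch of what exploration and preparation could look like in base R, the snippet below inspects a data frame and removes incomplete rows. The built-in `iris` dataset is only a stand-in for your own data, and dropping incomplete rows is one simple policy assumed here for illustration, not a recommendation from this post.

```r
# Illustrative only: the built-in iris data stands in for your own dataset.
data(iris)
df <- iris

# Explore: column types, per-column summaries, and missing values per column.
str(df)
print(summary(df))
missing_per_column <- colSums(is.na(df))
print(missing_per_column)

# Prepare: keep only rows without missing values (an assumed, simple policy).
clean_df <- df[complete.cases(df), ]
print(dim(clean_df))
```

Whether to drop, impute, or flag missing values depends on the dataset, so treat the `complete.cases()` step as a placeholder for whatever cleaning your data actually needs.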
The next step is to identify the machine learning algorithm that will be used to train the model. As you gain expertise, this could well become the first step, and the choice comes quite easily. Note that these algorithms are available as R packages that can be downloaded from CRAN and then installed and loaded for use.
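Installing and loading a package from CRAN could be sketched as follows. The helper name `load_or_install` is hypothetical, and `"stats"` is used only because it ships with R and therefore needs no download; you would substitute the package that provides your chosen algorithm.

```r
# Hypothetical helper: install a CRAN package only if it is missing, then load it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call only loads it.
# Replace it with the package for the algorithm you picked.
load_or_install("stats")
```

Checking with `requireNamespace()` first avoids reinstalling a package on every run of the script.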
Once the algorithm is identified, it is used to train the model on the training dataset. An important part of this step is identifying the R package and the related commands that you want to use. By the end of this step, you should understand the result well enough to evaluate the model's performance later on test datasets. Some of the following becomes clear at this stage:
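To make the training step concrete, here is a minimal sketch using linear regression (`lm()` from the `stats` package that ships with R). The `mtcars` dataset, the 70/30 split, and the choice of predictors are all assumptions made for illustration; the post itself does not prescribe a particular algorithm.

```r
# Illustrative only: built-in mtcars data and a linear model from base R.
data(mtcars)

# Reserve about 70% of the rows for training (the ratio is an assumption).
set.seed(42)
train_idx <- sample(seq_len(nrow(mtcars)), size = round(0.7 * nrow(mtcars)))
train_df <- mtcars[train_idx, ]

# Train: model fuel economy (mpg) from weight (wt) and horsepower (hp).
model <- lm(mpg ~ wt + hp, data = train_df)

# Inspect the fitted model: coefficients, residuals, and R-squared.
print(summary(model))
```

Reading the printed summary carefully at this point makes it easier to judge later whether performance on test data is in line with the fit on training data.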
The primary objective of this phase is to test the model's effectiveness, that is, to assess the identified machine learning algorithm on test datasets. There are different approaches to this phase. For various models, the summary() command comes in very handy. Depending on the package you use, there may be additional commands for evaluating model performance on test datasets.
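One possible way to evaluate a model on held-out data is sketched below. It assumes a linear regression on the built-in `mtcars` dataset and uses root mean squared error (RMSE) as the metric; both are illustrative choices, and other model types would call for other metrics.

```r
# Illustrative only: built-in mtcars data, 70/30 split, linear regression.
data(mtcars)
set.seed(42)
train_idx <- sample(seq_len(nrow(mtcars)), size = round(0.7 * nrow(mtcars)))
train_df <- mtcars[train_idx, ]
test_df  <- mtcars[-train_idx, ]

model <- lm(mpg ~ wt + hp, data = train_df)

# summary() describes the fit on the training data.
fit_summary <- summary(model)
print(fit_summary$r.squared)

# predict() scores the unseen test rows; RMSE summarizes the prediction error.
predicted <- predict(model, newdata = test_df)
rmse <- sqrt(mean((test_df$mpg - predicted)^2))
print(rmse)
```

Note that `summary()` only reflects the training data, so the test-set error from `predict()` is the number that speaks to how the model behaves on data it has not seen.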
The following is the objective of improving or optimizing the machine learning algorithm: