This article presents sample R code for extracting random training and test sets from a data frame. The code below can prove very handy when you are building a model with any machine learning algorithm and need to hold out part of the data for evaluation. Please feel free to comment or suggest if I missed any important points. Also, apologies for any typos.
# Read the data from a file. The command below assumes that the working
# directory has already been set; one can set it using the setwd() command.
sample_df <- read.csv("glass.data", header=TRUE, stringsAsFactors=FALSE)
# Build a vector of all row indices, from 1 up to the number of rows
index <- 1:nrow(sample_df)
# Draw n random indices (without replacement) from the index vector; in the
# command below, n is one third of the rows, computed as trunc(length(index)/3)
randindex <- sample(index, trunc(length(index)/3))
# Get the training set: all rows except those whose indices are in randindex
trainset <- sample_df[-randindex,]
# Get the test set: the rows whose indices are in randindex
testset <- sample_df[randindex,]
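The steps above can be wrapped in a small reusable function. The sketch below is a hedged example rather than the author's code: `split_train_test()` is a hypothetical helper name, the built-in `iris` data set stands in for `glass.data`, and the optional `seed` argument (passed to `set.seed()`) is an addition that makes the split reproducible. It also builds the training indices with `setdiff()` instead of negative indexing, because `df[-integer(0), ]` returns zero rows, which would silently empty the training set if the test sample were ever empty.

```r
# A minimal sketch: split a data frame into random training and test sets.
# split_train_test() is a hypothetical helper, not part of base R.
split_train_test <- function(df, test_frac = 1/3, seed = NULL) {
  # Optionally fix the random number generator so the split is reproducible
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(df)
  # Truncate AFTER multiplying so the test-set size is a whole number
  test_size <- trunc(n * test_frac)
  # Draw test_size distinct row indices (sample() draws without replacement)
  test_idx <- sample(seq_len(n), test_size)
  # Use setdiff() rather than -test_idx, which would select no rows
  # if test_idx happened to be empty
  train_idx <- setdiff(seq_len(n), test_idx)
  list(
    train = df[train_idx, , drop = FALSE],
    test  = df[test_idx, , drop = FALSE]
  )
}

# Usage with the built-in iris data set (150 rows)
parts <- split_train_test(iris, test_frac = 1/3, seed = 42)
nrow(parts$train)  # 100
nrow(parts$test)   # 50
```

Because `set.seed()` changes the global random number generator state, you may prefer to call it once at the top of your script instead of inside the function.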
Ajitesh Kumar
Ajitesh has recently been working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. In addition, he is passionate about various technologies, including programming languages such as Java/JEE and JavaScript, and areas such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms, and big data.
He has also authored the book, Building Web Apps with Spring 5 and Angular.