PART A
1)
Develop an R program to quickly explore a given dataset, including categorical analysis using the group_by() command, and visualize the findings using ggplot2.
What we will do
In this program, we will:
Load the required libraries and dataset.
Explore the structure of the dataset.
Convert a numerical variable into a categorical variable.
Perform categorical analysis using group_by() and summarize().
Visualize the result using ggplot2.
Step 1: Load required libraries and dataset
tidyverse is a collection of packages for data science.
dplyr is used for grouping and summarizing data.
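A minimal sketch of this step. The post does not name the dataset here, so the built-in mtcars dataset is an assumption, chosen because the later steps refer to its cyl and mpg columns. The variable name data is also an assumption.

```r
# Load the tidyverse, which attaches dplyr and ggplot2 among other packages
library(tidyverse)

# Assumption: the built-in mtcars dataset is the one being explored
data <- mtcars
```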
Step 2: Explore the dataset
Before performing any analysis, we should understand the dataset.
We will check:
Number of rows and columns
Column names
Data types
Summary statistics
First few rows
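The checks listed above map onto base R functions. A short sketch, assuming the dataset is the built-in mtcars stored in a variable named data:

```r
# Assumption: the built-in mtcars dataset is the one being explored
data <- mtcars

dim(data)      # number of rows and columns
names(data)    # column names
str(data)      # data type of each column
summary(data)  # summary statistics for each column
head(data)     # first six rows
```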
Step 3: Convert numeric variable to categorical
The variable cyl represents the number of cylinders in a car.
Although it is numeric (4, 6, 8), it represents categories.
For categorical analysis, we convert it into a factor.
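One way to do the conversion, again assuming the mtcars dataset in a variable named data:

```r
# Assumption: the built-in mtcars dataset is the one being explored
data <- mtcars

# Convert the numeric cylinder count into a factor (categorical variable)
data$cyl <- as.factor(data$cyl)

# The factor levels are the distinct cylinder counts
levels(data$cyl)
```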
Step 4: Perform categorical analysis
We calculate the average miles per gallon (mpg) for each cylinder category.
How these functions work together
%>% passes the output from one function to the next.
group_by(cyl) splits the dataset into groups.
summarize() calculates statistics per group.
mean(mpg) computes the average mileage.
.groups = "drop" removes the grouping afterward.
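The pipeline described above might look like the following. The dataset (mtcars) and the names summary_data and avg_mpg are assumptions made for illustration.

```r
library(dplyr)

# Assumption: the built-in mtcars dataset, with cyl converted to a factor
data <- mtcars
data$cyl <- as.factor(data$cyl)

# Group by cylinder category and compute the mean mpg within each group
summary_data <- data %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), .groups = "drop")

print(summary_data)
```

The result has one row per cylinder category and is no longer grouped, so later operations treat it as an ordinary data frame.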
Step 6: Visualize using a bar plot
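A possible bar plot of the per-cylinder averages. This is a sketch rather than the original code: the dataset (mtcars), the object names, and the label text are assumptions.

```r
library(dplyr)
library(ggplot2)

# Assumption: average mpg per cylinder category, computed from mtcars
summary_data <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg), .groups = "drop")

# geom_col() draws bars whose heights are the values already in the data
p <- ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(
    title = "Average MPG by number of cylinders",
    x = "Cylinders",
    y = "Average MPG"
  )

print(p)
```

geom_col() is used instead of geom_bar() because the bar heights are precomputed averages, not counts.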
Write an R script to create a scatter plot, incorporating categorical analysis through colour-coded data points representing different groups, using ggplot2.
Step 1: Load libraries
We load two libraries:
ggplot2 is used to build plots layer by layer (we will use it to create the scatter plot).
dplyr provides functions for exploring and summarizing data (we will use it to understand the categories in the dataset).
Step 2: Load the dataset (iris)
We use the built-in dataset iris.
What this dataset contains:
Each row is one flower sample (an observation).
There are 150 observations in total.
The column Species is a categorical variable with 3 groups: setosa, versicolor, virginica.
The columns Sepal.Length and Sepal.Width are numeric measurements that we will plot.
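A short sketch of loading the libraries and confirming the facts stated above. The name species_counts is an assumption made for illustration.

```r
library(ggplot2)
library(dplyr)

# iris ships with base R; data() makes the load explicit
data(iris)

nrow(iris)            # number of observations
levels(iris$Species)  # the three species groups

# Count how many observations fall in each species
species_counts <- iris %>% count(Species)
print(species_counts)
```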
Step 5: Create a basic scatter plot (no categories yet)
A scatter plot shows the relationship between two numerical variables.
Here we plot:
x-axis: Sepal.Length
y-axis: Sepal.Width
Important point:
Each dot represents one flower (one row in the dataset).
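A minimal version of this plot could be written as follows. The object name p_basic is an assumption.

```r
library(ggplot2)

# Map the two numeric columns to the axes and draw one point per row
p_basic <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

print(p_basic)
```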
Step 6: Add a categorical grouping using color = Species
Now we include the categorical variable:
color = Species tells ggplot to assign a different color to each species.
What changes?
The plot now visually separates the three species by color.
This is the main "categorical analysis" idea: we can see whether different groups cluster differently.
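Adding the color mapping changes only the aes() call. A sketch, with p_color as an assumed object name:

```r
library(ggplot2)

# color = Species inside aes() assigns one color per species level
p_color <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

print(p_color)
```

Because the mapping sits inside aes(), ggplot2 also builds the legend automatically.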
Step 7: Improve point visibility (size and transparency)
We adjust how the points look:
size = 3 makes each dot bigger, so it is easier to see.
alpha = 0.7 makes each dot slightly transparent, which helps when points overlap.
Why transparency helps:
If many points overlap in the same region, transparency makes dense areas more visible.
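Both settings are fixed values, so they go in geom_point() and not in aes(). A sketch, with p_styled as an assumed object name:

```r
library(ggplot2)

# size and alpha are constants here, so they are set outside aes()
p_styled <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7)

print(p_styled)
```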
Step 8: Add informative labels (title, axes, legend)
Good plots should clearly communicate what the viewer is seeing.
labs() adds:
title for the plot heading
x and y axis labels
color legend title (so the legend has a meaningful name)
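The labels can be attached with a single labs() call. The label text and the object name p_labeled below are assumptions chosen for illustration; the original wording is not available.

```r
library(ggplot2)

p_labeled <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Sepal length vs sepal width by species",  # plot heading
    x = "Sepal length (cm)",                           # x-axis label
    y = "Sepal width (cm)",                            # y-axis label
    color = "Species"                                  # legend title
  )

print(p_labeled)
```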
Step 9: Apply a clean theme and move the legend
Themes control the background, grid lines, and text styling.
theme_minimal() removes the heavy background and gives a clean look.
theme(legend.position = "top") moves the legend above the plot.
Why move the legend?
When the legend is at the top, it is often easier to notice and read, especially in presentations.
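Putting the steps together, the finished plot might look like this. The label text and the object name p_final are assumptions.

```r
library(ggplot2)

p_final <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Sepal length vs sepal width by species",
    x = "Sepal length (cm)",
    y = "Sepal width (cm)",
    color = "Species"
  ) +
  theme_minimal() +                # clean background
  theme(legend.position = "top")   # legend above the plot

print(p_final)
```

theme() is added after theme_minimal() because a complete theme replaces earlier theme settings, so the legend position would be lost if the order were reversed.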
Problem statement: Develop an R script to produce a bar graph displaying the frequency distribution of categorical data, grouped by a specific variable, using ggplot2.
## Step 1: Load required libraries
## Step 2: Load and inspect the dataset
We load the built-in dataset mtcars and view the first few rows to understand its structure.
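Steps 1 and 2 can be sketched as follows. The variable name data matches the str(data) and summary(data) calls mentioned in this walkthrough. Loading only ggplot2 is an assumption, since the original library list is not shown.

```r
# Step 1: load the plotting library (assumption: only ggplot2 is needed)
library(ggplot2)

# Step 2: load the built-in dataset and look at the first few rows
data <- mtcars
head(data)
```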
## Step 3: Exploratory data analysis
Before creating any visualization, we explore the dataset to understand the variables and their types.
str(data) helps us identify the data type of each variable.
summary(data) provides statistical summaries.
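The two calls named above, applied to mtcars:

```r
data <- mtcars

str(data)      # data type and a preview of each variable
summary(data)  # min, quartiles, mean, and max for each variable
```

In mtcars every column is stored as numeric, which is why the categorical columns need an explicit conversion before plotting.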
Step 4: Convert the numeric variables to factors
To correctly visualize categorical data, we convert the relevant variables into factors.
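The post does not say which variables are converted. The sketch below assumes cyl (number of cylinders) and gear (number of forward gears), two mtcars columns that take only a few distinct values.

```r
data <- mtcars

# Assumption: cyl and gear are the categorical variables of interest
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

levels(data$cyl)
levels(data$gear)
```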
Step 5: Examine the frequency distribution
Before plotting, we analyze how the data is distributed across categories. This:
Helps us understand the count of each category
Provides insight into relationships between variables
Prepares us for interpreting the visualization
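Frequency tables are one simple way to do this. The choice of cyl and gear as the categorical variables is an assumption, as are the names cyl_counts and cyl_gear_counts.

```r
data <- mtcars
data$cyl <- as.factor(data$cyl)    # assumption: cyl is the main category
data$gear <- as.factor(data$gear)  # assumption: gear is the grouping variable

# Count of cars in each cylinder category
cyl_counts <- table(data$cyl)
print(cyl_counts)

# Cross-tabulation: cylinder category by number of gears
cyl_gear_counts <- table(data$cyl, data$gear)
print(cyl_gear_counts)
```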
Step 6: Create the bar graph
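A sketch of a grouped bar graph of counts. The variables (cyl on the x-axis, gear as the grouping variable), the label text, and the object name p_bar are assumptions.

```r
library(ggplot2)

data <- mtcars
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

# geom_bar() counts the rows in each category;
# position = "dodge" places the bars for each gear group side by side
p_bar <- ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Number of cars by cylinders and gears",
    x = "Cylinders",
    y = "Count",
    fill = "Gears"
  )

print(p_bar)
```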
5)
Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2's fill aesthetic.
Step 1: Load required library
Step 2: Explore the built-in dataset
Step 3: Construct box plot with grouping
Step 3.1:
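The code for these steps is not shown, so the following is only a sketch of how Steps 1 to 3 could fit together. The dataset (iris), the variables (Sepal.Length grouped by Species), the label text, and the object name p_box are all assumptions.

```r
# Step 1: load the plotting library
library(ggplot2)

# Step 2: explore the built-in dataset (assumption: iris)
str(iris)
summary(iris)

# Step 3: box plot of a continuous variable grouped by a categorical one;
# fill = Species colors each box by its group
p_box <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_boxplot() +
  labs(
    title = "Sepal length by species",
    x = "Species",
    y = "Sepal length (cm)",
    fill = "Species"
  )

print(p_box)
```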
Program
Develop a function in R to plot a function curve based on a mathematical equation provided as input, with a different curve style for each function, using ggplot2.
Objectives
Step 1: Load required library
We use the ggplot2 package because it allows elegant and flexible plotting. It supports layering and grouping very well.
## Step 2: Create data for the functions
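One possible approach is to evaluate each function on a shared grid of x values, stack the results into a long data frame, and map the function name to both color and linetype. The helper plot_curves() and its arguments are hypothetical names introduced for this sketch, and the example functions are assumptions.

```r
library(ggplot2)

# Hypothetical helper: funs is a named list of R functions of one variable
plot_curves <- function(funs, from = -5, to = 5, n = 200) {
  x <- seq(from, to, length.out = n)

  # Evaluate every function on the same grid and stack the results
  curve_data <- do.call(rbind, lapply(names(funs), function(name) {
    data.frame(x = x, y = funs[[name]](x), func = name)
  }))

  # linetype = func gives each function its own curve style
  ggplot(curve_data, aes(x = x, y = y, color = func, linetype = func)) +
    geom_line() +
    labs(title = "Function curves", x = "x", y = "f(x)",
         color = "Function", linetype = "Function") +
    theme_minimal()
}

# Example usage with three assumed functions
p_curves <- plot_curves(list(
  "sin(x)" = sin,
  "cos(x)" = cos,
  "x^2 / 10" = function(x) x^2 / 10
))

print(p_curves)
```

Passing functions as a named list keeps the helper general: the names become the legend entries, and adding another curve only requires another list element.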