PART A

Develop an R program to quickly explore a given dataset, including categorical analysis using the group_by() command, and visualize the findings using ggplot2.

What we will do

In this program, we will:

Load the required libraries and dataset.
Explore the structure of the dataset.
Convert a numerical variable into a categorical variable.
Perform categorical analysis using group_by() and summarize().
Visualize the results using ggplot2.

Step 1: Load required libraries and dataset

tidyverse is a collection of packages for data science.
dplyr is used for grouping and summarizing data.

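{r}
library(tidyverse)  # attaches ggplot2, dplyr, and other core packages
library(dplyr)      # already attached by tidyverse; loaded explicitly for clarity

data <- mtcars      # built-in dataset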

Step 2: Explore the dataset

Before performing any analysis, we should understand the dataset.

We will check:

Number of rows and columns
Column names
Data types
Summary statistics
First few rows

{r}
# Dimensions (rows and columns)
dim(data)

# Column names
names(data)

# Structure of the dataset
str(data)

# Summary statistics
summary(data)

# First six rows
head(data)

Step 3: Convert numeric variable to categorical

The variable cyl represents the number of cylinders in a car.

Although it is numeric (4, 6, 8), it represents categories.
For categorical analysis, we convert it into a factor.

{r}
# Convert 'cyl' to a factor
data$cyl <- as.factor(data$cyl)

# Confirm the conversion
str(data$cyl)
levels(data$cyl)
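One detail worth knowing after this conversion: a factor stores integer level codes, so converting it straight back with as.numeric() returns the codes rather than the cylinder counts. The sketch below illustrates this on a small made-up vector; cyl_example and cyl_factor are hypothetical names that are not part of the program above.

```r
# Illustrative vector; 'cyl_example' is a hypothetical name used only here
cyl_example <- c(4, 6, 8, 4)
cyl_factor <- as.factor(cyl_example)

levels(cyl_factor)   # "4" "6" "8"

# Converting the factor directly returns the level codes
codes <- as.numeric(cyl_factor)                   # 1 2 3 1

# Going through character first recovers the original values
values <- as.numeric(as.character(cyl_factor))    # 4 6 8 4
```

If the numeric values are needed again later, converting through as.character() first is the safer route.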

Step 4: Perform categorical analysis

We calculate the average miles per gallon (mpg) for each cylinder category.

How these functions work together

%>% passes the output of one function to the next.
group_by(cyl) splits the dataset into groups.
summarize() calculates statistics per group.
mean(mpg) computes the average mileage.
.groups = "drop" removes the grouping afterward.

{r}
summary_data <- data %>%
  group_by(cyl) %>%
  summarize(
    avg_mpg = mean(mpg),
    .groups = "drop"
  )

summary_data
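As a cross-check on the grouped summary, the same per-group means can be computed with base R. This is only a sketch of an equivalent calculation, not a replacement for the dplyr pipeline; the object name check is a hypothetical name chosen here.

```r
# Base R equivalent of group_by(cyl) followed by summarize(avg_mpg = mean(mpg))
check <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(check)[2] <- "avg_mpg"
check
#   cyl  avg_mpg
# 1   4 26.66364
# 2   6 19.74286
# 3   8 15.10000
```

The averages fall as the cylinder count rises, which is the pattern the bar plot in the next step makes visible.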

Step 5: Visualize using a bar plot

{r}
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Average MPG by Cylinder Count",
    x = "Number of cylinders",
    y = "Average MPG"
  ) +
  theme_minimal()

--- title: "prg2" author: "kiran" format: docx editor: visual ---

Write an R script to create a scatter plot, incorporating categorical analysis through colour-coded data points representing different groups, using ggplot2.

Step 1: Load libraries

We load two libraries:

ggplot2 is used to build plots layer by layer (we will use it to create the scatter plot).
dplyr provides functions for exploring and summarizing data (we will use it to understand the categories in the dataset).

{r}
library(ggplot2)
library(dplyr)

Step 2: Load the dataset (iris)

We use the built-in dataset iris.

What this dataset contains:

Each row is one flower sample (an observation).
There are 150 observations in total.
The column Species is a categorical variable with 3 groups:
- setosa
- versicolor
- virginica
The columns Sepal.Length and Sepal.Width are numeric measurements that we will plot.

{r}
data <- iris
head(data, 10)

{r} tail(data)

{r} str(data)

{r} summary(data)

{r} names(data)

{r} dim(data)

{r} data[1]

{r} data$Sepal.Length

{r} typeof(data$Sepal.Length)

{r} typeof(data[1])

{r} data[][3]

{r} data[150,5]

{r} data[1:5,1:3]

{r} data[][5]

{r} data$Species

{r} table(data$Species)
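The indexing calls above return different kinds of objects, which is easy to miss when the output is only printed. The following sketch, using the same iris data, makes the distinction explicit: single brackets keep the data frame, while $ and double brackets extract the column itself.

```r
data <- iris

class(data[1])              # "data.frame": single brackets keep the data frame
typeof(data[1])             # "list": a data frame is stored as a list of columns
class(data[[1]])            # "numeric": double brackets extract the column vector
typeof(data$Sepal.Length)   # "double"

# $ and [[ ]] return the same vector
identical(data[[1]], data$Sepal.Length)   # TRUE

# Row 150, column 5 is a single factor value
as.character(data[150, 5])  # "virginica"
```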

Step 5: Create a basic scatter plot (no categories yet)

A scatter plot shows the relationship between two numerical variables.

Here we plot:

x-axis: Sepal.Length
y-axis: Sepal.Width

Important point:

Each dot represents one flower (one row in the dataset).

{r} ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

Step 6: Add a categorical grouping using color = Species

Now we include a categorical variable:

color = Species tells ggplot to assign a different color to each species.

What changes?

The plot now visually separates the three species based on color.
This is the main "categorical analysis" idea: we can see whether different groups cluster differently.

{r} ggplot(data , aes(x = Sepal.Length,y = Sepal.Width, color = Species)) + geom_point()
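The visual separation between species can be backed up with numbers. As a rough sketch, the group centers that the colored clusters sit around can be computed with base R; species_means is a hypothetical name introduced only for this example.

```r
# Mean sepal length and width per species (the approximate center of each colored cluster)
species_means <- aggregate(
  cbind(Sepal.Length, Sepal.Width) ~ Species,
  data = iris,
  FUN = mean
)
species_means
#      Species Sepal.Length Sepal.Width
# 1     setosa        5.006       3.428
# 2 versicolor        5.936       2.770
# 3  virginica        6.588       2.974
```

Setosa has the shortest but widest sepals on average, which matches its separate cluster in the upper left of the plot.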

Step 7: Improve point visibility (size and transparency)

We adjust how the points look:

size = 3 makes each dot bigger, so it is easier to see.
alpha = 0.7 makes each dot slightly transparent, which helps when points overlap.

Why transparency helps:

If many points overlap in the same region, the overlapping dots appear darker, so dense areas become more visible.

{r} ggplot(data , aes(x = Sepal.Length,y = Sepal.Width, color = Species)) + geom_point(size = 3,alpha = 0.7)
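To see why overlap matters for this particular dataset, one can count how many flowers share exactly the same (Sepal.Length, Sepal.Width) position. This is a small illustrative check; xy and n_overlap are hypothetical names that are not used elsewhere in the program.

```r
# Keep only the two plotted columns
xy <- iris[, c("Sepal.Length", "Sepal.Width")]

# Rows whose (x, y) pair already appeared in an earlier row
n_overlap <- sum(duplicated(xy))
n_overlap

# Largest number of flowers drawn at one identical position
max(table(paste(xy$Sepal.Length, xy$Sepal.Width)))
```

Any value above zero means some dots are drawn exactly on top of others, which a fully opaque plot would hide.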

Step 8: Add informative labels (title, axes, legend)

Good plots should clearly communicate what the viewer is seeing.

labs() adds:

title for the plot heading
x and y axis labels
color legend title (so the legend has a meaningful name)

{r}
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Scatter plot of sepal dimensions",
    x = "Sepal length",
    y = "Sepal width",
    color = "Species"
  )

Step 9: Apply a clean theme and move the legend

Themes control the background, grid lines, and text styling.

theme_minimal() removes the heavy background and gives a clean look.
theme(legend.position = "top") moves the legend above the plot. The value must be lowercase; "Top" is not a valid position.

Why move the legend?

When the legend is at the top, it is often easier to notice and read, especially in presentations.

{r}
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Scatter plot of sepal dimensions",
    x = "Sepal length",
    y = "Sepal width",
    color = "Species"
  ) +
  theme_minimal() +
  theme(legend.position = "top")

--- title: "prg4" author: "Kiran T L" format: docx editor: visual ---

Problem statement: Develop an R script to produce a bar graph displaying the frequency distribution of categorical data, grouped by a specific variable, using ggplot2.

Step 1: Load required libraries

{r}
# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)

Step 2: Load and inspect the dataset

We load the built-in dataset mtcars and view the first few rows to understand its structure.

{r}
data <- mtcars
head(data)

Step 3: Exploratory data analysis

Before creating any visualization, we explore the dataset to understand its variables and their types.

{r} str(data)

{r} summary(data)

str(data) helps us identify the data type of each variable.
summary(data) provides statistical summaries.

Step 4: Convert numeric variables to factors

To visualize categorical data correctly, we convert the relevant variables into factors.

{r} data$cyl

{r} table(data$cyl)

{r} data$gear

{r} table(data$gear)

{r} class(data$cyl)

{r} class(data$gear)

{r}
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

{r} class(data$cyl)

{r} data$cyl

{r} class(data$gear)

{r} data$gear

{r} summary(data)

{r} str(data)

Step 5: Examine the frequency distribution

Before plotting, we analyze how the data is distributed across categories.

{r} table(data$cyl)

{r} table(data$gear)

{r} table(data$cyl ,data$gear )

These frequency tables:

help us understand the count of each category
provide insight into relationships between variables
prepare us for interpreting the visualization
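The two-way table can also be reshaped into one row per cylinder and gear combination, which is close to the counts a grouped bar chart draws. The sketch below works on a separate copy of mtcars; cars and counts are hypothetical names used only here.

```r
# Work on a copy so the tutorial's 'data' object is left untouched
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)
cars$gear <- as.factor(cars$gear)

# One row per (cyl, gear) combination with its frequency
counts <- as.data.frame(table(cyl = cars$cyl, gear = cars$gear))
counts

# Combinations that never occur in the data
subset(counts, Freq == 0)   # 8 cylinders with 4 gears
```

The empty combination matters for the dodged bar chart: with position = "dodge", the 8-cylinder group has only two bars, and they are drawn wider than the bars in the other groups. If equal widths are preferred, position_dodge(preserve = "single") is one option to try.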

Step 6: Create the bar graph

{r} ggplot(data, aes(x=cyl, fill=gear ))

{r} ggplot(data, aes(x=cyl, fill=gear ))+geom_bar()

{r} ggplot(data, aes(x=cyl, fill=gear ))+geom_bar(position="dodge")

{r} ggplot(data, aes(x=cyl, fill=gear ))+geom_bar(position="dodge")+ theme_minimal()+ theme(legend.position = 'top')

{r}
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Frequency distribution of cylinders, grouped by gear",
    x = "Number of cylinders",
    y = "Count"
  ) +
  theme_minimal() +
  theme(legend.position = "top")

5)

{r}
# Load library
library(ggplot2)

# Use built-in dataset
data <- iris

# Species is the grouping variable (already a factor in iris; the conversion is kept for safety)
data$Species <- as.factor(data$Species)

# Create histogram with density curves
ggplot(data, aes(x = Sepal.Length, fill = Species, color = Species)) +
  # Histogram (density scaled)
  geom_histogram(aes(y = after_stat(density)),
                 position = "identity",
                 alpha = 0.4,
                 bins = 30) +
  # Density curves for each group
  geom_density(alpha = 0.8, linewidth = 1) +
  # Labels
  labs(
    title = "Distribution of Sepal Length with Density Curves by Species",
    x = "Sepal Length",
    y = "Density"
  ) +
  # Theme
  theme_minimal()
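The histogram is drawn on a density scale so that it shares a y-axis with the density curves: on that scale the bar areas sum to 1, just as the area under a density curve does. The base R sketch below illustrates the idea with hist() rather than ggplot2, so its bin edges need not match the plot; h and area are hypothetical names.

```r
# Compute a histogram without drawing it
h <- hist(iris$Sepal.Length, breaks = 30, plot = FALSE)

# Area of each bar = density * bin width; the areas sum to 1
area <- sum(h$density * diff(h$breaks))
area
```

In the ggplot2 version the density is computed per group, so each species' histogram has an area of 1 on its own.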

--- title: "prg6" author: "Kiran T L" format: docx editor: visual ---

Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2's fill aesthetic.

Step 1: Load required library

{r}
# Load ggplot2 package for visualization
library(ggplot2)

Step 2: Explore the built-in dataset

{r}
# Use the built-in 'iris' dataset
# 'Petal.Width' is a continuous variable
# 'Species' is a categorical grouping variable
str(iris)
head(iris)

Step 3: Construct box plot with grouping

Step 3.1:

{r}
# Initialize ggplot with data and aesthetic mappings
p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))
p

{r}
# Add the box plot layer
p <- p + geom_boxplot()
p

{r}
# Add theme, legend position, and labels (the box plot layer is already part of p)
p <- p +
  theme_minimal() +
  theme(legend.position = "top") +
  labs(title = "Iris box plot", x = "Species", y = "Petal width")
p
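To read the box plot with numbers in hand, the quartiles behind each box can be computed directly. This is a sketch in base R; box_stats is a hypothetical name, and the 25% and 75% rows correspond to the lower and upper edges of each box while the 50% row is the line inside it.

```r
# Quartiles of Petal.Width for each species
box_stats <- sapply(
  split(iris$Petal.Width, iris$Species),
  quantile,
  probs = c(0.25, 0.5, 0.75)
)
box_stats

# Medians only: setosa 0.2, versicolor 1.3, virginica 2.0
box_stats["50%", ]
```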

--- title: "prg7" author: "Kiran T L" format: docx editor: visual ---

Program

Develop a function in R to plot a function curve based on a mathematical equation provided as input, with a different curve style for each function, using ggplot2.

Objectives

{r}
# Load ggplot2 package for advanced plotting
library(ggplot2)

We use the ggplot2 package because it allows elegant and flexible plotting. It supports layering and grouping very well.

Step 2: Create data for the functions

{r}
# Create a sequence of x values ranging from -2*pi to 2*pi
x <- seq(-2 * pi, 2 * pi, length.out = 500)

# Evaluate sin(x) and cos(x) over the x range
y1 <- sin(x)
y2 <- cos(x)

# Combine the data into one data frame
df <- data.frame(
  x = rep(x, 2),
  y = c(y1, y2),
  group = rep(c("sin(x)", "cos(x)"), each = length(x))
)

Step 3.1: Initialize the ggplot Object

{r}
# Start building the ggplot using the data frame and aesthetics
p <- ggplot(df, aes(x = x, y = y, color = group, linetype = group))
p

Step 3.2: Add the line Geometry

{r}
# Add smooth lines to represent each function curve
# (linewidth replaces the older size argument for lines in ggplot2 3.4.0 and later)
p <- p + geom_line(linewidth = 1.2)
p

Step 3.3: Add title, axis labels, and legends

{r}
# Add title, axis labels, and legends
p <- p + labs(
  title = "Function Curves: sin(x) and cos(x)",
  x = "x",
  y = "y = f(x)",
  color = "Function",
  linetype = "Function"
)
p

Step 3.4: Apply a Clean Theme

{r}
# Use a clean and simple background theme
p <- p + theme_minimal()
p
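The steps above build the plot interactively, while the problem statement asks for a function that takes the equations as input. One possible way to wrap the same steps is sketched below. The names build_curve_data and plot_curves, and their arguments, are hypothetical choices made for this example; the sketch assumes the equations are supplied as a named list of R functions.

```r
# Hypothetical helper: evaluate each function in a named list over [from, to]
# and stack the results into one long data frame (one row per x value and function)
build_curve_data <- function(funs, from = -2 * pi, to = 2 * pi, n = 500) {
  x <- seq(from, to, length.out = n)
  pieces <- lapply(names(funs), function(label) {
    data.frame(x = x, y = funs[[label]](x), group = label)
  })
  do.call(rbind, pieces)
}

# Hypothetical wrapper: draw one curve per function, each with its own
# color and line type, using the same layers as the step-by-step version
plot_curves <- function(funs, ...) {
  df <- build_curve_data(funs, ...)
  ggplot2::ggplot(df, ggplot2::aes(x = x, y = y, color = group, linetype = group)) +
    ggplot2::geom_line(linewidth = 1.2) +
    ggplot2::labs(x = "x", y = "y = f(x)", color = "Function", linetype = "Function") +
    ggplot2::theme_minimal()
}

# Usage: the data step needs only base R
curves <- build_curve_data(list("sin(x)" = sin, "cos(x)" = cos))

# The plotting step is attempted only when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_curves <- plot_curves(list("sin(x)" = sin, "cos(x)" = cos))
}
```

Keeping the data step separate from the plotting step makes the function easy to check: a call such as plot_curves(list("x^2" = function(x) x^2), from = -3, to = 3) would reuse the same code for a different equation.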
