Categorical statistics is the branch of statistics concerned with analyzing and interpreting categorical data, that is, variables whose values fall into a finite set of categories rather than on a numeric scale. Examples include gender, occupation, and marital status. The field offers a range of techniques for describing and summarizing such data: frequency tables count how often each category occurs, contingency tables cross-tabulate two categorical variables, and bar charts display the counts visually. These techniques help reveal relationships between categories and expose patterns and trends in the data.
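As a minimal sketch of the tabulation ideas above, the following Python snippet builds a frequency table and a contingency table using only the standard library. The records are made up purely for illustration, and the variable names (`records`, `occupation_counts`, `pair_counts`) are assumptions rather than part of any particular dataset.

```python
from collections import Counter

# Made-up illustrative records: (occupation, marital_status)
records = [
    ("engineer", "married"),
    ("teacher", "single"),
    ("engineer", "single"),
    ("teacher", "married"),
    ("engineer", "married"),
    ("nurse", "single"),
]

# Frequency table: how often each occupation occurs
occupation_counts = Counter(occupation for occupation, _ in records)

# Relative frequencies (proportions) for the same variable
total = len(records)
occupation_shares = {name: count / total for name, count in occupation_counts.items()}

# Contingency table: counts for each (occupation, marital_status) pair;
# a Counter returns 0 for combinations that never occur
pair_counts = Counter(records)

# Print the contingency table with occupations as rows and statuses as columns
rows = sorted(occupation_counts)
cols = sorted({status for _, status in records})
print("occupation".ljust(12) + "".join(c.rjust(10) for c in cols))
for r in rows:
    print(r.ljust(12) + "".join(str(pair_counts[(r, c)]).rjust(10) for c in cols))
```

For real datasets a dataframe library is usually more convenient, but the counting logic is the same: a frequency table counts single values, while a contingency table counts pairs of values.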
Statistical models are also used in categorical statistics to test hypotheses and make predictions about categorical data. For example, logistic regression is commonly used to predict a binary outcome, such as whether a customer makes a purchase, from demographic and behavioral characteristics. Categorical statistics is widely used in fields such as the social sciences, marketing, healthcare, and finance. Understanding customer behavior, public opinion, and consumer preferences often relies on the analysis of categorical data.
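To make the prediction step concrete, here is a rough sketch of how a fitted logistic regression model turns predictor values into a probability for a binary outcome. The intercept and coefficients below are hypothetical values chosen for illustration, not estimates from real data, and `predict_probability` is a helper name introduced only for this example.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Return P(outcome = 1) under a logistic regression model."""
    # Linear predictor: intercept plus the weighted sum of the features
    linear = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    # The logistic (sigmoid) function maps the linear predictor into (0, 1)
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, not estimated from any real data
intercept = -3.0
coefficients = {"age": 0.05, "income": 0.00004}

# A hypothetical customer
customer = {"age": 40, "income": 50000}

p = predict_probability(intercept, coefficients, customer)
predicted_class = int(p >= 0.5)  # 0.5 is a conventional, adjustable threshold
print(f"P(buy) = {p:.3f}, predicted class = {predicted_class}")
```

In practice the coefficients come from fitting the model to data, and the classification threshold is chosen to suit the application.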
Example in Python (pandas, seaborn, statsmodels):

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    import statsmodels.api as sm

    # Load data from a CSV file
    data = pd.read_csv('my_data.csv')

    # Create a frequency table of a categorical variable
    freq_table = pd.crosstab(index=data['gender'], columns='count')
    print(freq_table)

    # Create a contingency table of two categorical variables
    cont_table = pd.crosstab(index=data['occupation'], columns=data['marital_status'])
    print(cont_table)

    # Visualize the frequency distribution of a categorical variable
    sns.countplot(x='gender', data=data)
    plt.show()

    # Perform logistic regression on a binary outcome variable
    # Define the independent variables; sm.Logit does not add an
    # intercept automatically, so add a constant column explicitly
    X = sm.add_constant(data[['age', 'income']])
    # Define the dependent variable (must be coded as 0/1)
    y = data['buy_or_not']

    # Fit the logistic regression model
    logit_model = sm.Logit(y, X).fit()

    # Print the model summary
    print(logit_model.summary())
The same analysis in R:

    library(ggplot2)

    # Load data from a CSV file
    data <- read.csv("my_data.csv")

    # Create a frequency table of a categorical variable
    freq_table <- table(data$gender)
    print(freq_table)

    # Create a contingency table of two categorical variables
    cont_table <- table(data$occupation, data$marital_status)
    print(cont_table)

    # Visualize the frequency distribution of a categorical variable
    ggplot(data, aes(x = gender)) +
      geom_bar() +
      labs(x = "Gender", y = "Count")

    # Perform logistic regression on a binary outcome variable
    # glm() is part of base R (stats package), so no extra library is needed;
    # the formula names the dependent variable and the predictors,
    # and an intercept is included by default
    logit_model <- glm(buy_or_not ~ age + income,
                       data = data,
                       family = binomial(link = "logit"))

    # Print the model summary
    print(summary(logit_model))