
Contains solutions and notes for the Machine Learning Specialization by Stanford University and DeepLearning.AI on Coursera (2022), taught by Prof. Andrew Ng

greyhatguy007/Machine-Learning-Specialization-Coursera


Contains solutions and notes for the Machine Learning Specialization by Andrew Ng on Coursera

Note: For a deeper understanding of the concepts, including all the math they require, have a look at Mathematics for Machine Learning and Data Science.

Course 1 : Supervised Machine Learning: Regression and Classification

  • Practice quiz: Regression
  • Practice quiz: Supervised vs unsupervised learning
  • Practice quiz: Train the model with gradient descent
  • Model Representation
  • Cost Function
  • Gradient Descent
  • Practice quiz: Gradient descent in practice
  • Practice quiz: Multiple linear regression
  • Numpy Vectorization
  • Multi Variate Regression
  • Feature Scaling
  • Feature Engineering
  • Sklearn Gradient Descent
  • Sklearn Normal Method
  • Linear Regression
  • Practice quiz: Cost function for logistic regression
  • Practice quiz: Gradient descent for logistic regression
  • Classification
  • Sigmoid Function
  • Decision Boundary
  • Logistic Loss
  • Scikit Learn - Logistic Regression
  • Overfitting
  • Regularization
  • Logistic Regression

Certificate Of Completion

Course 2 : Advanced Learning Algorithms

  • Practice quiz: Neural networks intuition
  • Practice quiz: Neural network model
  • Practice quiz: TensorFlow implementation
  • Practice quiz : Neural Networks Implementation in Numpy
  • Neurons and Layers
  • Coffee Roasting
  • Coffee Roasting Using Numpy
  • Neural Networks for Binary Classification
  • Practice quiz : Neural Networks Training
  • Practice quiz : Activation Functions
  • Practice quiz : Multiclass Classification
  • Practice quiz : Additional Neural Networks Concepts
  • Multiclass Classification
  • Neural Networks For Handwritten Digit Recognition - Multiclass
  • Practice quiz : Advice for Applying Machine Learning
  • Practice quiz : Bias and Variance
  • Practice quiz : Machine Learning Development Process
  • Advice for Applied Machine Learning
  • Practice quiz : Decision Trees
  • Practice quiz : Decision Trees Learning
  • Practice quiz : Decision Trees Ensembles
  • Decision Trees

Certificate of Completion

Course 3 : Unsupervised Learning, Recommenders, Reinforcement Learning

  • Practice quiz : Clustering
  • Practice quiz : Anomaly Detection
  • Anomaly Detection
  • Practice quiz : Collaborative Filtering
  • Practice quiz : Recommender systems implementation
  • Practice quiz : Content-based filtering
  • Collaborative Filtering RecSys
  • RecSys using Neural Networks
  • Practice quiz : Reinforcement learning introduction
  • Practice Quiz : State-action value function
  • Practice Quiz : Continuous state spaces
  • Deep Q-Learning - Lunar Lander

Specialization Certificate


Course Review :

This course is a great starting point for becoming a Machine Learning Engineer. Even if you're an expert, many algorithms, such as decision trees, are covered in depth, which may help you improve your skills further.

Special thanks to Professor Andrew Ng for structuring and tailoring this Course.

A glimpse of what you might be able to accomplish by the end of this specialization:

Write a reinforcement learning algorithm to land the Lunar Lander using Deep Q-Learning

  • The lander was trained to touch down on the surface between the two flags that mark the landing pad, after many unsuccessful attempts while learning how to do it.
  • The final landing after training the agent using appropriate parameters:

Write an algorithm for a Movie Recommender System

  • A movie database is assembled and organized by genre.
  • Content-based filtering and collaborative filtering algorithms are trained, and the movie recommender system is implemented.
  • It gives movie recommendations based on the movie genre.

  • And Much More !!

In conclusion, this is a course I would recommend to everyone, not just because you learn many new things, but also because the assignments are real-life examples that are exciting to complete.

Happy Learning :))


STATS191 - Home

Assignment 3


You may discuss homework problems with other students, but you have to prepare the written assignments yourself.

Please combine all your answers, the computer code and the figures into one file, and submit a copy in your dropbox on Gradescope.

Due date: 11:59 PM, May 10, 2024.

Grading scheme: 10 points per question, total of 40.

Building PDF

If you have not installed LaTeX on your computer, run the commands below (once is enough). After that, using either the Quarto or RMarkdown format should hopefully be sufficient to build directly to PDF.

RStudio: RMarkdown, Quarto

Question 1

We revisit Tomasetti and Vogelstein's study on cancer incidence across tissues from Assignment 2. The second part of their paper deals with the existence of two clusters in the dataset: according to the authors, D-tumours (D for deterministic) can be attributed to some degree to environmental and genetic factors, while the risk of R-tumours (R for replicative) is affected mainly by random mutations occurring during replication of stem cells.

The dataset also includes a column Cluster according to the classification of that tumour as Deterministic or Replicative. Fit a linear model as in Assignment 2, but with a different slope for D- and R-tumours.

Make a scatterplot including the two regression lines.

Conduct an F-test to compare the regression model above to the regression model which does not account for this classification. What is the p-value?
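As a rough illustration of the comparison in the last part, the F statistic for two nested least-squares fits can be computed from their residual sums of squares. The sketch below is in Python rather than R, uses made-up numbers (not the Tomasetti and Vogelstein data), and assumes the larger model lets both the intercept and the slope differ by cluster, which is the same as fitting one line per cluster. The helper names `fit_line` and `nested_f_statistic` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope, rss) of the least-squares line y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss


def nested_f_statistic(rss_reduced, df_reduced, rss_full, df_full):
    """F = ((RSS_0 - RSS_1) / (df_0 - df_1)) / (RSS_1 / df_1) for nested models."""
    return ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)


# Made-up illustration data: one predictor, two clusters (D and R).
x_d = [1.0, 2.0, 3.0, 4.0, 5.0]
y_d = [2.1, 2.9, 4.2, 4.8, 6.1]
x_r = [1.0, 2.0, 3.0, 4.0, 5.0]
y_r = [1.0, 1.4, 2.1, 2.4, 3.1]
n = len(x_d) + len(x_r)

# Full model: separate intercept and slope per cluster (4 parameters).
_, _, rss_d = fit_line(x_d, y_d)
_, _, rss_r = fit_line(x_r, y_r)
rss_full = rss_d + rss_r
df_full = n - 4

# Reduced model: a single line that ignores the cluster (2 parameters).
_, _, rss_reduced = fit_line(x_d + x_r, y_d + y_r)
df_reduced = n - 2

f_stat = nested_f_statistic(rss_reduced, df_reduced, rss_full, df_full)
print(f"F = {f_stat:.2f} on ({df_reduced - df_full}, {df_full}) degrees of freedom")
```

The p-value is the upper tail of an F distribution with those degrees of freedom. In R, calling `anova()` on the two `lm` fits reports the same statistic together with its p-value. If only the slope is allowed to differ, the full model has 3 parameters and the degrees of freedom change accordingly.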

Question 2

Use the Carseats dataset from the package ISLR2 for this problem.

1. Fit a multiple regression model to predict Sales using Advertising, CompPrice, Price, Urban and US.

2. Provide an interpretation of each coefficient in the model. Be careful: some of the variables in the model are categorical.

3. For which of the predictors can you reject the null hypothesis \(\beta_j=0\) at \(\alpha=0.05\)?

4. On the basis of the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

5. How well do the models in parts 1. and 4. fit the data?

6. Using the model from part 4., obtain 90% confidence intervals for the coefficients.

7. Is there any evidence of outliers or high leverage observations in the model from part 4.?

Question 3

The dataset state.x77 in R contains the following statistics (among others) related to the 50 states in the USA:

Population : population estimate (1975)

Income : per capita income (1974)

Illiteracy : illiteracy (1970, percent of population)

HS.Grad : percent high school graduates (1970)

Make it into a data frame using:

We are interested in the relation between Income and the other 3 variables.

1. Produce a 4 by 4 scatter plot of the variables above.

2. Fit a multiple linear regression model to the data with Income as the outcome and Population, Illiteracy, HS.Grad as the independent variables.

Compare this model to the model that uses only Population as a covariate.

5. Produce standard diagnostic plots of the multiple regression fit in part 2. Summarize the results.

6. Find states with outlying predictors by looking at the leverage values using hatvalues . Use a cutoff of 0.2.

7. Find outliers, if any, in the response. Remove them from the data and refit a multiple linear regression model and compare the result with the previous fit.
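To make the leverage cutoff in part 6 concrete, here is a minimal Python sketch for the simplified case of a single predictor, where the hat values have the closed form h_i = 1/n + (x_i - x̄)² / S_xx. This is a simplification: the assignment's model has three predictors, for which R's `hatvalues` does the equivalent matrix computation. The data and the helper name `leverages` are made up for illustration.

```python
from statistics import mean


def leverages(x):
    """Hat values h_i = 1/n + (x_i - xbar)^2 / Sxx for simple linear regression."""
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]


# Made-up predictor values; the last observation lies far from the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
h = leverages(x)

cutoff = 0.2  # the cutoff named in the assignment
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print("leverages:", [round(hi, 3) for hi in h])
print("indices above cutoff:", flagged)
```

The hat values always sum to the number of fitted coefficients (here 2, intercept and slope), which is a quick sanity check on any leverage computation.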

Question 4

The dataset iris in R gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.

Fit a multiple linear regression model to the data with sepal length as the dependent variable and sepal width, petal length and petal width as the independent variables.

Test the reduced model of \(H_0: \beta_{\texttt{sepal width}}=\beta_{\texttt{petal length}} = 0\) with an F-test at level \(\alpha=0.05\).

Test \(H_0: \beta_{\texttt{sepal width}} = \beta_{\texttt{petal length}}\) at level \(\alpha=0.05\).

Test \(H_0: \beta_{\texttt{sepal width}} < \beta_{\texttt{petal length}}\) at level \(\alpha=0.05\).
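For the last two tests, one standard route is a t statistic on the difference of the two coefficients, whose standard error needs the covariance between the estimates as well as their variances. The Python sketch below shows only the arithmetic; all numbers are hypothetical placeholders, not results from the iris fit, and `contrast_t` is an illustrative name. In R the variances and covariance can be read off `vcov()` of the fitted model.

```python
from math import sqrt


def contrast_t(b1, b2, var1, var2, cov12):
    """t statistic for H0: beta1 = beta2, i.e. (b1 - b2) / se(b1 - b2)."""
    # Var(b1 - b2) = Var(b1) + Var(b2) - 2 * Cov(b1, b2)
    se_diff = sqrt(var1 + var2 - 2 * cov12)
    return (b1 - b2) / se_diff


# Hypothetical estimates and (co)variances, for illustration only.
t_stat = contrast_t(b1=0.65, b2=0.71, var1=0.0044, var2=0.0032, cov12=-0.0010)
print(f"t = {t_stat:.3f}")
```

With 150 flowers and four fitted coefficients (intercept plus three slopes), the statistic is compared to a t distribution with 146 degrees of freedom: two-sided for the equality test, one-sided for the inequality.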

Shalabh, Department of Mathematics & Statistics, Indian Institute of Technology Kanpur, Kanpur - 208016 (India)

MTH 416 : Regression Analysis

Syllabus: Simple and multiple linear regression, Polynomial regression and orthogonal polynomials, Tests of significance and confidence intervals for parameters. Residuals and their analysis for tests of departure from the assumptions such as fitness of model, normality, homogeneity of variances, detection of outliers, Influential observations, Power transformation of dependent and independent variables. Problem of multicollinearity, ridge regression and principal component regression, subset selection of explanatory variables, Mallows' Cp statistic. Nonlinear regression, different methods for estimation (Least squares and Maximum likelihood), Asymptotic properties of estimators. Generalised Linear Models (GLIM), Analysis of binary and grouped data using logistic and log-linear models.

Grading Scheme : Quizzes: 20%, Mid semester exam: 30%, End semester exam: 50%

Books:

1. Introduction to Linear Regression Analysis by Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining (Wiley). A low-price Indian edition is available.

2. Applied Regression Analysis by Norman R. Draper, Harry Smith (Wiley). A low-price Indian edition is available.

3. Linear Models and Generalizations - Least Squares and Alternatives by C.R. Rao, H. Toutenburg, Shalabh, and C. Heumann (Springer, 2008)

4. A Primer on Linear Models by John F. Monahan (CRC Press, 2008)

5. Linear Model Methodology by Andre I. Khuri (CRC Press, 2010)

Assignments:

Assignment 1

Assignment 2

Assignment 3

Assignment 4

Assignment 5

Assignment 6

Assignment 7

Assignment 8

Lecture notes for your help (If you find any typo, please let me know)

Lecture Notes 1 : Introduction

Lecture Notes 2 : Simple Linear Regression Analysis

Lecture Notes 3 : Multiple Linear Regression Model

Lecture Notes 4 : Model Adequacy Checking

Lecture Notes 5 : Transformation and Weighting to Correct Model Inadequacies

Lecture Notes 6 : Diagnostic for Leverage and Influence

Lecture Notes 7 : Generalized and Weighted Least Squares Estimation

Lecture Notes 8 : Indicator Variables

Lecture Notes 9 : Multicollinearity

Lecture Notes 10 : Heteroskedasticity

Lecture Notes 11 : Autocorrelation

Lecture Notes 12 : Polynomial Regression Models

Lecture Notes 13 : Variable Selection and Model Building

Lecture Notes 14 : Logistic Regression Models

Lecture Notes 15 : Poisson Regression Models

Lecture Notes 16 : Generalized Linear Models


Assignment 3: Regression models

Introduction

Regression models are extremely flexible and the workhorse of statistics! We will attempt to understand them better in this course as they are very common in data analysis. This week's assignment is composed of several parts, one of which is optional (meaning you do not have to do it).

Part 1: t-tests and regression analysis

In the last assignment you were expected to perform a t-test to compare the HIGH and LOW group in some variable. Here I would like you to compare AVG_CSA_T1 between the HIGH and LOW group with a simple t-test and compare your results to a regression model using CLUSTER as a predictor.

Make sure that you use var.equal = TRUE in your t-test when comparing with the regression model. How do you interpret the results of the regression model? Do you get the same answer?
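The reason `var.equal = TRUE` matters can be checked numerically: with a 0/1 group indicator as the only predictor, the regression slope equals the difference in group means, and its t statistic equals the pooled-variance (Student) t statistic. The Python sketch below demonstrates this on made-up numbers (not the AVG_CSA_T1 values); the function names are hypothetical and the course itself works in R.

```python
from math import sqrt
from statistics import mean


def pooled_t(low, high):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n_low, n_high = len(low), len(high)
    m_low, m_high = mean(low), mean(high)
    ss = sum((v - m_low) ** 2 for v in low) + sum((v - m_high) ** 2 for v in high)
    pooled_var = ss / (n_low + n_high - 2)
    return (m_high - m_low) / sqrt(pooled_var * (1 / n_low + 1 / n_high))


def slope_and_t(x, y):
    """Least-squares slope of y on x and its t statistic."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se_slope


# Made-up measurements for two groups of unequal size.
low = [18.2, 20.1, 19.5, 21.0, 17.8]
high = [22.4, 24.0, 21.9, 23.5, 25.1, 22.8]

# Dummy coding: 0 = LOW (reference level), 1 = HIGH.
x = [0] * len(low) + [1] * len(high)
y = low + high

t_test = pooled_t(low, high)
slope, t_reg = slope_and_t(x, y)

print(f"difference in means: {mean(high) - mean(low):.4f}, slope: {slope:.4f}")
print(f"t (pooled t-test): {t_test:.4f}, t (regression slope): {t_reg:.4f}")
```

The sign of the t statistic depends on which group is treated as the reference level, so a t-test and a regression can agree in magnitude while differing in sign. Without the equal-variance assumption the t-test becomes Welch's test, which generally does not match the regression output.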

Write up the results with a short introduction including a question (and some background information if you like). The results from the regression model can be displayed as a table with the coefficients, SE, t- and p-values. See if you find inspiration in the broom package.

Part 2: Is there a relationship between muscle size and strength

The Haun et al. 2019 data set contains a variable for muscle strength, Squat_3RM_kg. In this assignment I want you to estimate the relationship between muscle mass and strength. There are several muscle mass measures in the data set at time 1 (T1). Select one or more of these variables and estimate the relationship between muscle mass and strength. Think of the relationship this way: a bigger muscle could produce more force, so muscle mass is the predictor and muscle strength is the dependent variable. Use the whole data set!

Part 3: Calculate lactate thresholds (Optional)

Using the data you collected in the physiology lab, estimate the exercise intensity at 4 mmol blood lactate. Make use of the code in the second lesson this week.
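The lesson code is not reproduced here, so the following is only a simple stand-in for the idea, written in Python: find the two measurements that bracket the target lactate value and interpolate linearly between them. A fitted curve, as a regression-based approach would use, will generally give a slightly different answer. The data and the name `intensity_at_lactate` are made up for illustration.

```python
def intensity_at_lactate(intensity, lactate, target=4.0):
    """Linearly interpolate the intensity at which lactate first reaches target."""
    points = list(zip(intensity, lactate))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Look for the first pair of consecutive measurements that bracket the target.
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target lactate is not bracketed by the measurements")


# Made-up incremental test: intensity in watts, blood lactate in mmol per litre.
watt = [100, 150, 200, 250, 300]
lactate = [1.0, 1.2, 1.9, 3.2, 5.6]

threshold = intensity_at_lactate(watt, lactate)
print(f"estimated intensity at 4 mmol: {threshold:.1f} W")
```

Interpolation uses only the two neighbouring points, so it is sensitive to measurement noise in either of them; that is one reason to prefer a model fitted to all the measurements.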

How to hand in?

As always, I prefer a GitHub submission, but an R Markdown file and an HTML file on Canvas are OK. I would like to have a report (HTML file) without any code, and the R Markdown file for reference (where I can read the code). Try to make the report and figures look elegant!