Exploring Exoplanets
Regression Analysis in RStudio
This research project applies regression analysis in RStudio to astronomical data from the Open Exoplanet Catalogue, exploring how characteristics of host stars relate to the orbital periods of their exoplanets. Using forward selection, backward elimination, and all-possible regressions, the project identifies the most statistically significant predictors while addressing the challenges of heavy-tailed distributions and large-scale astronomical measurements. The findings show how statistical modeling can help describe planetary systems beyond our own.
Key Metrics
- Analyzed a dataset of 9,000+ observations of exoplanets and host stars.
- Conducted data cleaning and restructuring to handle missing values and large-scale astronomical units.
- Built multiple regression models using forward selection, backward elimination, and all-possible regressions.
- Selected a final model with five significant predictors and an R² of 0.96.
- Evaluated residuals and diagnostics to account for outliers, kurtosis, and non-normality in the data.
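As a rough illustration of the forward selection step listed above, here is a minimal sketch in plain Python rather than the project's R code. It assumes adjusted R² as the entry criterion (the project may have used p-values or AIC instead), and the function names, predictor names, and toy values are all hypothetical, not taken from the Open Exoplanet Catalogue.

```python
def ols_rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(predictors, y):
    """Greedily add the predictor that most improves adjusted R^2; stop when none does."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    chosen, best_score = [], 0.0  # the intercept-only model has adjusted R^2 of 0
    remaining = sorted(predictors)
    while remaining:
        scores = {}
        for name in remaining:
            cols = [predictors[c] for c in chosen + [name]]
            p = len(cols)
            rss = ols_rss(cols, y)
            scores[name] = 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))
        best = max(remaining, key=scores.get)
        if scores[best] <= best_score:
            break
        chosen.append(best)
        best_score = scores[best]
        remaining.remove(best)
    return chosen, best_score


# Hypothetical toy data on a log10 scale (synthetic, for illustration only).
log_a = [-1.3, -1.0, -0.7, -0.3, 0.0, 0.2, 0.5, 0.7]
log_mass = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.15, 0.2]
filler = [0.3, -0.5, 0.8, -0.1, -0.9, 0.4, 0.6, -0.7]  # unrelated to the response
noise = [0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.005, -0.005]
log_period = [1.5 * a - 0.5 * m + e for a, m, e in zip(log_a, log_mass, noise)]

candidates = {"log_a": log_a, "log_mass": log_mass, "filler": filler}
chosen, score = forward_select(candidates, log_period)
print(chosen, round(score, 4))
```

Backward elimination runs the same loop in reverse, starting from the full model and dropping the weakest predictor at each step, while all-possible regressions scores every subset instead of following a greedy path.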
Insights
Modeling
How can regression diagnostics reveal limitations in astronomical data while still producing usable models?
Understanding
How do the mass and radius of a host star affect the orbital period of its exoplanets?
Predictions
Can semi-major axis and inclination reliably predict orbital timing across different planetary systems?
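For context on these questions, Kepler's third law gives a physical baseline: in solar units, the orbital period depends on the semi-major axis and the total mass of the system. The sketch below is an illustration in Python and not part of the project's regression. It assumes the planet's mass is negligible next to the star's, and the function names are hypothetical. Stellar radius and inclination do not appear in this relation, so any predictive weight they carry in a fitted model would come from correlations in the data.

```python
import math

DAYS_PER_YEAR = 365.25  # Julian year, in days


def orbital_period_days(semi_major_axis_au, stellar_mass_msun):
    """Kepler's third law in solar units: P[yr]^2 = a[AU]^3 / M[M_sun].

    Assumes the planet's mass is negligible compared with the star's.
    """
    return DAYS_PER_YEAR * math.sqrt(semi_major_axis_au ** 3 / stellar_mass_msun)


def log_slope(a1, a2, mass=1.0):
    """Slope of log10(period) against log10(semi-major axis) between two orbits."""
    p1 = orbital_period_days(a1, mass)
    p2 = orbital_period_days(a2, mass)
    return (math.log10(p2) - math.log10(p1)) / (math.log10(a2) - math.log10(a1))


# An Earth-like orbit around a Sun-like star, and the log-log slope of the relation.
print(orbital_period_days(1.0, 1.0))
print(log_slope(0.1, 10.0))
```

Because the relation is a power law, it becomes linear after a log transform, which is one common way to tame heavy-tailed variables before fitting a linear regression.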