Simply put, feature engineering is just a fancy term that means optimizing your dataset so that the machine learning models can perform at their absolute best. Whether it’s extracting hidden patterns, reshaping your values to fit better into the models, or getting rid of the useless junk so the models can focus on only what’s important, we’ve baked lots of goodies into our Feature Engineering section. We’ll help you squeeze every last drop of predictive power out of your data just like it’s a tube of toothpaste…you know exactly what we’re talking about.
Date Extraction
No, we aren’t talking about what you have to do when your friend sends you that “this is the worst date I’ve ever been on, please come get me immediately” text. We’re talking about calendar dates. Do you know how much information is in a simple date? A lot! You can pull out the day of the week, fiscal quarter, month, etc., all of which may help you better predict your target variable. If there is any sort of time-based trend in your data, our Date Extraction section will find it.
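For the curious, here is a minimal Python sketch of what pulling features out of a date can look like. It is an illustration only, not the code behind the section: the function name `extract_date_features` and the particular features are made up for this example, and the quarter shown is a plain calendar quarter, since fiscal quarters depend on each company's fiscal year.

```python
from datetime import date


def extract_date_features(d: date) -> dict:
    """Break one calendar date into a handful of model-friendly features.

    Hypothetical helper for illustration; the feature list is an assumption.
    """
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "day_of_week": d.weekday(),         # Monday = 0, Sunday = 6
        "is_weekend": d.weekday() >= 5,     # Saturday or Sunday
        "quarter": (d.month - 1) // 3 + 1,  # calendar quarter, not fiscal
    }


features = extract_date_features(date(2024, 11, 29))
print(features)
```

Each key becomes its own column, so a model can pick up on patterns like "Fridays are busy" or "the fourth quarter runs hot" without having to decode the raw date itself.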
Normalization
We’re all a little bit weird, and that’s OK. However, models tend to perform better when your target variable isn’t. If you’ve ever taken a statistics class, you may remember your professor talking about a “normal bell curve” ad nauseam. Whether you remember any of that stuff or not doesn’t matter, because our Normalization section will help transform your target to be more normal… but not you. Sorry, it can’t help make you (or, more importantly, the CitizenDS staff) any more normal.
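To make "more normal" a bit more concrete, here is a small sketch that assumes a log transform as the normalizing step. That choice is our assumption for the example (the section may well use other transforms), and the target values are invented. Skewness is used as a rough yardstick: a perfectly symmetric bell curve scores 0, and a long right tail scores positive.

```python
import math


def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail.

    Assumes the values are not all identical (otherwise this divides by zero).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Invented target with a couple of very large values dragging the tail right.
target = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

# log1p(x) = log(1 + x), which stays well-defined when a value is 0.
log_target = [math.log1p(v) for v in target]

print("skew before:", skewness(target))
print("skew after: ", skewness(log_target))
```

The transformed target is still skewed, just noticeably less so. If you train on a transformed target, remember to undo the transform (here, `math.expm1`) on the predictions.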
Optimization
You ever try to fit a square peg through a round hole? Yeah, it’s not easy unless you have a giant hammer and a lot of pent-up anger from the darn printer never working. DON’T LIE TO ME, I KNOW THERE’S INK IN THERE BECAUSE I JUST PUT SOME IN! Data is pretty similar, in that certain “shapes” of data will work better with your target than others. Our Optimization section will help you transform your data into the best possible shape to maximize predictive power.
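One way to picture "finding the best shape" is to try a few candidate transforms on a feature and keep whichever lines up best with the target. The sketch below does exactly that, under some loud assumptions: the candidate list, the use of absolute Pearson correlation as the score, and the names `best_shape` and `CANDIDATES` are all ours for illustration, and the data is invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical menu of shapes to try. log and sqrt assume positive values.
CANDIDATES = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def best_shape(feature, target):
    """Score every candidate transform and return the winner plus all scores."""
    scores = {
        name: abs(pearson([fn(x) for x in feature], target))
        for name, fn in CANDIDATES.items()
    }
    return max(scores, key=scores.get), scores


# Invented data: the target grows roughly with the logarithm of the feature.
feature = [1, 2, 4, 8, 16, 32, 64]
target = [0.1, 1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

winner, scores = best_shape(feature, target)
print(winner, scores)
```

Correlation only rewards straight-line relationships, so this is a crude scoring rule; it is here to show the idea of shopping for a shape, not to be the last word on how to pick one.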
Time Lags
Sometimes things that happened in the past can be really useful in prediction. If I’m trying to predict how much money I’ll have tonight, a pretty good indicator may be how much money I had last night (assuming today isn’t Black Friday and I’m not at Best Buy). Our Time Lags section is designed to help you determine whether previous values of different variables are useful in predicting your target.
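In code, a time lag is just the same column shifted down so each row can "see" an earlier value. Here is a minimal sketch using the wallet example; the helper name `add_lag` and the numbers are invented for illustration, and the data is assumed to already be sorted oldest to newest.

```python
def add_lag(values, lag):
    """Shift a series forward by `lag` steps.

    Row i of the result holds the value from row i - lag. The first `lag`
    rows have no history, so they get None.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return ([None] * lag + list(values))[: len(values)]


# Invented nightly wallet balances, oldest first.
cash_tonight = [120, 95, 95, 60, 210]
cash_last_night = add_lag(cash_tonight, 1)

for tonight, last_night in zip(cash_tonight, cash_last_night):
    print(tonight, last_night)
```

The lagged column can then sit next to the original as a new feature. Rows with `None` have no usable history, so they usually get dropped or filled before modeling.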
Moving Averages
Just like your dorm-room neighbors who prevented you from taking glorious midday naps, data can be noisy. But also like your dorm-room neighbors, just because it’s noisy doesn’t mean it can’t be helpful (especially when it comes to moving all that furniture around). Our Moving Averages section allows you to smooth out that noisy data to get to the heart of what’s going on.
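Here is a minimal sketch of the smoothing idea, assuming a simple trailing moving average (each point is replaced by the mean of itself and the few points before it). Other flavors exist, such as centered or weighted averages, and this example does not claim to match the one the section uses. The function name and data are invented.

```python
def moving_average(values, window):
    """Trailing simple moving average.

    Each output is the mean of the current value and the `window - 1` values
    before it. Positions without a full window of history get None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Invented noisy series that zigzags around a slow upward drift.
noisy = [10, 14, 9, 15, 11, 16, 12]
smooth = moving_average(noisy, 3)
print(smooth)
```

A bigger window gives a smoother line but reacts more slowly to real changes, so the window size is a trade-off worth experimenting with.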
PCA/Class Reduction/Weak Reduction/Redundant Reduction
Sometimes, less is more, and that’s what our PCA, Class Reduction, Weak Reduction, and Redundant Reduction sections are all about. These sections are like that garage sale you desperately need to have: they keep the useful and throw out the unnecessary things. You don’t want to clog up the machine learning models with junk, so you’ll definitely want to visit these sections before you go to model.
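As a rough illustration of the garage-sale idea, the sketch below covers two of the four. It assumes that "weak" means a feature barely correlates with the target and "redundant" means two features carry nearly the same information; those definitions, the thresholds, the function name `reduce_features`, and the data are all our assumptions for the example. PCA and Class Reduction are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def reduce_features(features, target, weak_below=0.2, redundant_above=0.95):
    """Return the names of features worth keeping.

    Thresholds are arbitrary illustrative defaults, not recommended values.
    """
    strength = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = []
    # Visit the strongest features first so they win any redundancy ties.
    for name in sorted(strength, key=strength.get, reverse=True):
        if strength[name] < weak_below:
            continue  # weak: barely related to the target
        if any(abs(pearson(features[name], features[k])) > redundant_above
               for k in kept):
            continue  # redundant: near-duplicate of a feature already kept
        kept.append(name)
    return kept


# Invented housing data: sqm repeats sqft in other units; lucky_number is junk.
features = {
    "sqft": [800, 1200, 1500, 2000, 2600],
    "sqm": [74, 111, 139, 186, 242],
    "lucky_number": [3, 1, 4, 1, 3],
}
price = [150, 210, 260, 340, 430]

kept = reduce_features(features, price)
print(kept)
```

Only one of the two size columns survives, and the junk column is tossed. Which size column wins depends on tiny differences in correlation, so treat the survivor as interchangeable with its twin.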