To steal a phrase from the late and great Kobe Bryant, CitizenDS Predictor is a “different animal and the same beast.” This is the whole enchilada built for a family of 12. This version has got ALLLLLLL the goods. You won’t be left wanting more, but you may want a napkin.

Starting with all the goodness of the Explorer and Explorer Pro Versions

It goes without saying that everything fun in the Explorer and Explorer Pro versions is included here. If you want a quick refresher on that awesomeness, review them both here:

Explorer Version

Explorer Pro Version

Upload Data

With CitizenDS Predictor, you can upload as many columns and as many rows as you want (but be careful: the more data you upload, the longer it will take to process).

Let’s start by talking about ALL the Automation

We’re just going to get right to the point here: Predictor has a lot of automation built into it…like…A LOT. When we started building this software, we had one goal in mind: give the power of data science to everyone, while still letting each employee do data science at a level that matches their skills. Out of that goal came our three levels of automation.

Do Everything

The first and highest level of automation is what we like to call the “do everything” button. If you’re entirely new to data science, you’re going to love this. You give us the data, tell us what you’re interested in predicting, and hit the giant “do everything” button. Go grab a cup of coffee or take a lap around the office while our software cleans, optimizes, and trains the best machine learning model to fit your data. Congratulations, you’re now a data scientist…except you didn’t have to spend years of your life reading crazy Egyptian hieroglyphics (i.e., math papers) or learning to code and spending HOURS looking for one stupid, misplaced semicolon. We’re envious… trust us.

Section Specific

Ok, so you’re not totally new to the whole data game, but you’re also not yet a wizard. That’s where our second level of automation comes into play. Let’s say you’re comfortable cleaning data but don’t have much experience optimizing it (or, as we data nerds would say, “feature engineering”). Not a problem: you can tell us to take care of that whole section so you don’t have to. Our second level of automation lets you pick which sections you handle yourself and which ones you hand off to us. Think of it like your high school cafeteria…except for data…and without the terrible food.

Variable Specific

The last level is for those data gurus who know exactly how their data should be handled and enjoy getting their hands dirty. Our software lets you tell us what you want done with each of your variables, and we’ll do it… but there may be times when you really aren’t sure what the best move is for a particular variable in a particular section, and that’s where our last level of automation shows up. Let’s say you’re deep in the feature engineering section, optimizing each and every variable using your domain knowledge, and then you get to variable X and hit a roadblock. Take the log? No, convert it to a z-score. No! Take the square root… Why not let us take care of this one for you?

Date Extraction

No, we aren’t talking about what you have to do when your friend sends you that “this is the worst date I’ve ever been on, please come get me immediately” text. We’re talking about calendar dates. Do you know how much information is in a simple date? A lot! You can pull out the day of the week, fiscal quarter, month, etc., all of which may help you better predict your target variable. If there is any sort of time-based trend in your data, our Date Extraction section is built to help you find it.
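For the curious, here’s a minimal Python sketch (standard library only) of what pulling features out of a single date can look like. It is not CitizenDS’s implementation: the function name `extract_date_features` and its `fiscal_year_start_month` parameter are hypothetical, and we assume your fiscal year starts on the first day of a month.

```python
from datetime import date


def extract_date_features(d: date, fiscal_year_start_month: int = 1) -> dict:
    """Break one calendar date into model-friendly features.

    fiscal_year_start_month is a hypothetical parameter: the month your
    fiscal year begins (1 = January, so fiscal and calendar quarters match).
    """
    # How many months into the fiscal year this date falls (0..11).
    fiscal_month_index = (d.month - fiscal_year_start_month) % 12
    return {
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.strftime("%A"),  # e.g. "Friday" (name depends on locale)
        "is_weekend": d.weekday() >= 5,   # Monday=0 ... Saturday=5, Sunday=6
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "fiscal_quarter": fiscal_month_index // 3 + 1,
    }


# Black Friday 2023, for an organization whose fiscal year starts in October.
features = extract_date_features(date(2023, 11, 24), fiscal_year_start_month=10)
print(features)
```

Each key becomes its own column, so a model can pick up patterns like “weekends are slow” or “Q4 spikes” that a raw date hides.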

Normalization

We’re all a little bit weird, and that’s ok. However, models tend to perform better when your target variable isn’t. If you’ve ever taken a statistics class, you may remember your professor talking about the “normal bell curve” ad nauseam. Whether you remember any of that stuff or not doesn’t matter, because our Normalization section will help transform your target to be more normal… but not you. Sorry, it can’t make you (or, more importantly, the CitizenDS staff) any more normal.
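The post doesn’t say which transforms Predictor tries, so this is only a sketch of one common option: a log transform, which tends to pull in the long right tail of a skewed target. Skewness near 0 means the values are roughly symmetric, like the bell curve. The target values below are made up for illustration.

```python
import math


def skewness(values):
    """Population skewness: near 0 means roughly symmetric, > 0 means a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_normalize(values):
    """log(1 + x) transform; assumes every value is >= 0."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed target: most values small, a few very large.
target = [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(f"skew before: {skewness(target):.2f}")
print(f"skew after:  {skewness(log_normalize(target)):.2f}")
```

If you model a log-transformed target, remember to undo it with `math.expm1` on the predictions before reporting them.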

Optimization

Have you ever tried to fit a square peg into a round hole? Yeah, it’s not easy unless you have a giant hammer and a lot of pent-up anger from the darn printer never working. DON’T LIE TO ME, I KNOW THERE’S INK IN THERE BECAUSE I JUST PUT SOME IN! Data is similar: certain “shapes” of a variable line up with your target better than others. Our Optimization section will help transform your data into the best possible shape to maximize predictive power.
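To make the “shapes” idea concrete, here’s a hedged sketch of one simple way to pick a shape: try a few candidate transforms on a feature and keep the one whose values correlate most strongly (in absolute Pearson correlation) with the target. The candidate list, the scoring rule, and the names `TRANSFORMS` and `best_transform` are our assumptions, not how Predictor necessarily decides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Candidate "shapes"; log and sqrt assume the feature is non-negative.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": math.log1p,
    "sqrt": math.sqrt,
    "square": lambda x: x * x,
}


def best_transform(feature, target):
    """Return (name, |r|) for the transform most linearly related to the target."""
    scores = {}
    for name, fn in TRANSFORMS.items():
        transformed = [fn(x) for x in feature]
        scores[name] = abs(pearson(transformed, target))
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical data where the target grows with the log of the feature.
feature = [1, 3, 7, 15, 31, 63, 127, 255]
target = [2 * math.log1p(x) + 1 for x in feature]
print(best_transform(feature, target))
```

A z-score isn’t in the candidate list on purpose: rescaling a variable that way doesn’t change its Pearson correlation with anything, so it would always tie with `identity` under this scoring rule.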

Time Lags

Sometimes things that happened in the past are really useful for prediction. If I’m trying to predict how much money I’ll have tonight, a pretty good indicator may be how much money I had last night (assuming today isn’t Black Friday and I’m not at Best Buy). Our Time Lags section is designed to help you determine whether previous values of different variables are useful in predicting your target.
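Under the hood, a “lag” is just a copy of a column shifted down by some number of rows, so row t sees the value from row t - k. Here’s a minimal Python sketch of that idea; the helper names `lag` and `add_lags` and the `lag_1`-style column names are hypothetical, and the nightly money figures are made up.

```python
def lag(series, k):
    """Shift a series down by k rows so row t holds the value from row t - k.

    The first k rows have no history to look back on, so they are None.
    """
    if k < 0:
        raise ValueError("lag k must be non-negative")
    return ([None] * k + list(series))[: len(series)]


def add_lags(series, lags):
    """Build one lagged column per requested lag, keyed like 'lag_1', 'lag_2'."""
    return {f"lag_{k}": lag(series, k) for k in lags}


money_each_night = [40, 35, 50, 20, 45, 60]
features = add_lags(money_each_night, [1, 2])
for t, tonight in enumerate(money_each_night):
    # Each row pairs tonight's value with the values from 1 and 2 nights earlier.
    print(t, tonight, features["lag_1"][t], features["lag_2"][t])
```

Rows whose lags are `None` are usually dropped (or filled) before training, since there is no past to learn from there.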

Moving Averages

Just like your dorm-room neighbors who prevented you from taking glorious midday naps, data can be noisy. But also like your dorm-room neighbors, just because it’s noisy doesn’t mean it can’t be helpful (especially when it comes to moving all that furniture around). Our Moving Averages section lets you smooth out noisy data to get to the heart of what’s going on.
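A moving average replaces each value with the mean of a small window of recent values, which flattens out the jumpy bits. Below is a minimal sketch of a trailing moving average (only the current and earlier rows are used, so nothing from the future leaks in); the name `moving_average` and the choice of a trailing window are our assumptions.

```python
def moving_average(series, window):
    """Trailing moving average: mean of the current value and the window - 1 before it.

    Rows that don't have a full window of history yet are None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(series[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(series))
    ]


noisy = [10, 30, 20, 40, 10]
print(moving_average(noisy, 3))
```

Larger windows smooth more aggressively but react more slowly to real changes, so the window size is a trade-off worth experimenting with.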

PCA/Class Reduction/Weak Reduction/Redundant Reduction

Sometimes less is more, and that’s what our PCA, Class Reduction, Weak Reduction, and Redundant Reduction sections are all about. These sections are like that garage sale you desperately need to have: they keep the useful stuff and get rid of the rest. You don’t want to clog up the machine learning models with junk, so you’ll definitely want to visit these sections before you start modeling.
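As one example of the “garage sale” idea, here’s a hedged sketch of redundant-variable reduction: walk through the columns in order and keep each one only if it isn’t almost perfectly correlated with a column you’ve already kept. The greedy order, the 0.95 threshold, and the name `drop_redundant` are assumptions for illustration; PCA, class reduction, and weak reduction are different techniques and aren’t shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Greedy redundancy filter over a dict of column name -> values.

    A column is kept only if its |correlation| with every already-kept
    column is at most `threshold` (an arbitrary, hypothetical cutoff).
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)


columns = {
    "temp_c": [10, 15, 20, 25, 30],
    "temp_f": [50, 59, 68, 77, 86],  # same information as temp_c, different units
    "rainfall": [5, 0, 3, 1, 4],
}
print(drop_redundant(columns))
```

Here `temp_f` is dropped because it carries exactly the same information as `temp_c`, while `rainfall` survives.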

Oh yeah, we’re bringing back the report card just like it’s middle school all over again. Before you send your data into the machine learning models, we “grade” your data for certain aspects of quality to let you know what’s good and what’s maybe not so good about your data. Unfortunately, unlike your dad who may have slipped you a couple of bucks for every ‘A’ you got, you’ll only get a thumbs up from us for those good grades.

Just because you have some data and a defined problem doesn’t mean the data is useful. If we asked you to predict the score of the next football game and gave you data about the chemical composition of the planets in our solar system to help you out, you probably wouldn’t get very far. Before you waste your time training the machine learning algorithms, you deserve to have realistic expectations of what’s to come, and that’s why we run a quick check to see whether your data has predictive power before you even get to modeling.
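The post doesn’t describe how Predictor’s check works, so here’s just one rough screen you could run yourself: for each variable, compute the squared Pearson correlation with the target, which equals the R² of a straight-line fit on that one variable. The function name `quick_signal_check`, the 0.05 cutoff, and the football numbers are all hypothetical. A low score doesn’t prove the data is useless (curved relationships and combinations of variables slip past this test), but it’s a cheap early warning.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def quick_signal_check(features, target, min_r2=0.05):
    """Return (best_feature, best_r2, passes) for a dict of name -> values.

    r**2 is the share of the target's variance that a straight line on that
    single feature explains; min_r2 is an arbitrary, hypothetical cutoff.
    """
    scores = {name: pearson(values, target) ** 2 for name, values in features.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] >= min_r2


points_scored = [21, 14, 28, 10, 35, 17]
features = {
    "yards_gained": [350, 280, 410, 220, 460, 300],
    "planet_iron_pct": [32.1] * 6,  # never changes from game to game
}
print(quick_signal_check(features, points_scored))
```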

We won’t lie to you, we’re pretty darn excited about this section. We’ve managed to cram more than 30 different types of ML models, spanning five major types of problems, into our software for you to play with. Just like the rest of our software, you can interact with the models as much or as little as you want: if you want us to do all of the training and tuning for every model, no problem, there’s a button for that. If you want us to optimize a specific model for you, sure, we can do that, too. If you want to sit in the driver’s seat and play with the levers and dials (uh, definitely no coding skills required) like you’re Han Solo flying the Millennium Falcon by yourself, well, you can do that, too!

With every model, we’ll visually show you how the model performs and give you some easy-to-understand performance metrics. Unlike some of our competitors that overload you with a bunch of hyper-technical jargon (hyperparameter tuning, F1 & F2 scores, blah blah blah), we’ll tell you what’s going on in plain English. We also know that you’re going to want to understand what’s driving the predictions, so we’ve included a lot of charts to help you see which variables were most important in generating them.

So let’s recap real quick: Automated machine learning? Check. Performance that’s easy to understand? Check. Explanations that tell you what variables were important in the model? Check. Fresh brewed coffee while the models are training? Well, that’s on you, sorry.

It’s been a long, strange journey, but you made it: The training data has been cleaned and optimized and the models have been trained. You’re ready to use the models to see into the future and predict the unknown! With our prediction engine, you can easily load new data into the models you’ve trained and get predictions fast.

But wait, there’s more! We’ve got a special treat for you if you train and save multiple models. We can take the predictions of every model and ensemble them together to create a prediction of predictions! Sure, we’ll give you a simple ensemble average (or mode, depending on the problem type), but we’ll also give you a custom “model of models” ensembled result that generates a prediction based on the performance and interactions of all your saved models… it’s getting pretty meta in here, someone call in the philosophy majors.
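The simple ensemble is easy to picture in code: for each row, average the models’ numeric predictions (regression) or take the most common label (classification). Here’s a minimal sketch of just that simple version; the custom “model of models” is not shown, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def ensemble_regression(predictions_by_model):
    """Average each row's predictions across models (one inner list per model)."""
    return [mean(row) for row in zip(*predictions_by_model)]


def ensemble_classification(predictions_by_model):
    """Majority vote (mode) per row across models.

    On a tie, Counter.most_common returns the label it encountered first,
    i.e. the one predicted by the earliest model in the list.
    """
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions_by_model)]


# Three saved models, each predicting two rows.
print(ensemble_regression([[100, 200], [110, 210], [120, 190]]))
print(ensemble_classification([["yes", "no", "yes"],
                               ["yes", "yes", "no"],
                               ["no", "no", "no"]]))
```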