Starting with all the goodness of the Explorer Version

Well of course the Explorer Pro version would include all of the fun found in the Explorer Version. If you forgot just how much fun we packed in there, go ahead and review it here.

Data is kind of like a diamond. When you first get it out of the ground, it’s dirty, it’s misshapen, and sometimes you’re not even sure if what you’re holding has any value in it. It’s not until you clean and refine it that it becomes something of value. We’ve added an entire cleaning section into the Explorer Pro version that has the right tools for you to take your dirty data and turn it into something shiny and useful. What’s in that section, you ask?

Uploading Data

With Explorer Pro, you get additional ways to upload your data! You can connect CitizenDS to your on-prem SQL server, cloud SQL server, or Microsoft Access database. You’ll also be allowed to upload up to 40 columns and 250K rows of data.

Duplicates

The name pretty much sums it up. We find and get rid of duplicated data for you… and that’s all we have to say about that.
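If you’re curious what “getting rid of duplicates” can look like under the hood, here is a minimal sketch in plain Python. It assumes a duplicate means a row that matches an earlier row exactly, and the function name and sample rows are made up for illustration; this is not a peek at the actual CitizenDS code.

```python
def drop_duplicate_rows(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # tuples are hashable, so they can live in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: (name, department, sales)
rows = [
    ("Ana", "Sales", 120),
    ("Ben", "Ops", 95),
    ("Ana", "Sales", 120),  # exact repeat of the first row
]
deduped = drop_duplicate_rows(rows)
```

Keeping the first occurrence and preserving row order is just one reasonable choice; near-duplicates (say, the same person with a typo in their name) would need fuzzier matching.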

Scrubbing Text

Oh boy, oh boy, have we seen it all in our day: whether it is the accidental fat-finger that causes something to be misspelled, mixed-and-matched data (like the dreaded “first name last name” vs “last name, first name”), or just that one employee who legitimately can’t spell (yes, Wilson, we’re positive that ‘cat’ only has one ‘t’ in it). Our text scrubbing section helps you find and convert those errors so that you don’t have to go line by line and manually convert things yourself. This isn’t just an ordinary spell-check system – oh no! It reads through your data and uses some brainpower to find issues.
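To make that a little more concrete, here is a rough sketch of two text-scrubbing tricks using only Python’s standard library. It assumes you already have a list of correct spellings to compare against, and every name in it (scrub_value, first_name_first, known_animals) is hypothetical. Our own scrubbing does more than this; treat it as an illustration of the idea, not the recipe.

```python
from difflib import get_close_matches


def scrub_value(value, known_values, cutoff=0.8):
    """Replace value with its closest known spelling, if one is close enough."""
    matches = get_close_matches(value.lower(), known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def first_name_first(name):
    """Turn 'Last, First' into 'First Last'; leave other formats untouched."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name


known_animals = ["cat", "dog"]  # hypothetical list of correct spellings
raw = ["cat", "catt", "dogg", "bird"]
scrubbed = [scrub_value(v, known_animals) for v in raw]

# Hypothetical names in the two dreaded formats
names = [first_name_first(n) for n in ["Doe, Jane", "Jane Doe"]]
```

Here “catt” and “dogg” get pulled back to “cat” and “dog”, while “bird” is left alone because nothing in the known list is close enough. The 0.8 cutoff is an arbitrary starting point: lower it and you catch more typos but risk “fixing” values that were fine.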

Outliers

Remember that one time you were an intern and everyone reported their sales numbers in millions and you submitted your numbers in thousands and your business unit stuck out like a sore thumb? No??? Oh wait, that was one of us. But that kind of stuff happens! Whether it’s an accidental incorrect unit conversion, a fat finger, or some faulty test equipment reading, sometimes you’ll have some ridiculous numbers that just don’t make sense in your data. Our outliers section will help you find and fix those values.
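For the curious, here is one common way to spot a number like that, sketched in plain Python. It uses a median-based “modified z-score” with the conventional 3.5 threshold. That is a swapped-in textbook technique chosen for illustration, not a description of what our outliers section actually runs, and the sales figures are invented.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Flag values whose median-based modified z-score exceeds threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)  # median absolute deviation
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical sales in millions; one intern reported thousands instead.
sales_in_millions = [1.2, 1.5, 1.1, 1.4, 1300.0, 1.3]
suspects = find_outliers(sales_in_millions)
```

The median is used instead of the mean because a single wild value drags the mean (and the standard deviation) toward itself, which can hide the very outlier you are hunting for.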

Missing Values

There’s nothing quite like data that looks like Swiss cheese. You know what we’re talking about… all those holes from missing values. While we may like Swiss cheese, sadly, the machines that learn don’t, and missing values can cause a whole host of problems when you go to train your models. Our missing values section will help you fill those holes with the best possible guess so your data looks more cheddar than it does Swiss.
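As a simple illustration of hole-filling, the sketch below plugs each missing value with the median of the values that are present. Median fill is only one basic stand-in for a “best possible guess”; it is an assumption made for this example, as are the function name and the sample column.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries with the median of the values that are present."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to base a guess on
    fill = median(present)
    return [fill if v is None else v for v in values]


# Hypothetical column with two holes in it
column = [3, None, 5, 4, None]
filled = fill_missing(column)
```

A smarter fill would look at the other columns in the same row before guessing, but the idea is the same: no more holes for the model to trip over.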

Full Auto Cleanse

So you’re short on time or you just aren’t yet comfortable cleaning the data yourself. No problem. We’ve built in a one-click option to automatically run all of the cleansing options above for all of your variables. Click and poof! Your data is now cleaner than our office desks (yeesh, we need to do some cord management). If you only want one or two sections to be performed automatically, that’s also doable. We let you pick what sections you do and don’t want us to do. Think of it like your high school cafeteria…except for data…and without the terrible food.

Variable Specific Auto Cleanse

The last level of automation is for those data gurus who know exactly how their data should be handled and enjoy getting their hands dirty. Our software will let you tell us what you want done for all of your variables, and we’ll do it… but there may be times where you really aren’t sure what the best move is in a particular section for a particular variable, and that’s where this level of automation comes in handy. Let’s say you’re deep in the outliers section and you’re cleaning each and every variable using your domain knowledge, but then you get to variable X, and you hit a roadblock. Oh gosh, is it an outlier at 17,532? No, it’s gotta be 18,493. Maybe 20,294? … Why don’t you let us go ahead and take care of this one for you.

In addition to cleansing, we thought it’d be nice to add the ability to create and alter some of your existing data.

Raw data is often really hard to interpret unless you’ve got the magic decoder ring. For example, maybe you’ve got data encoded numerically (0, 1, 2, 3, etc.) that represents a category (no degree, high school degree, undergraduate degree, etc.), but it’d be far easier to interpret the data if it was explicit. Well, with our “Realias” section, you can make those changes.
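In code terms, this kind of relabeling is basically a lookup table. Here is a minimal sketch in Python using the degree example; the function name, the dictionary, and the choice to leave unknown codes untouched are all assumptions made for illustration.

```python
# Lookup table for the codes whose meanings we know.
DEGREE_LABELS = {
    0: "no degree",
    1: "high school degree",
    2: "undergraduate degree",
}


def realias(codes, labels):
    """Swap each code for its label; codes without a label are left as-is."""
    return [labels.get(code, code) for code in codes]


education = [0, 2, 1, 3]  # hypothetical column; 3 has no label in the table
readable = realias(education, DEGREE_LABELS)
```

Leaving unmapped codes alone (rather than guessing or blanking them) makes it obvious which values still need a label.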

Your raw data isn’t always going to have the right information or shape. Sometimes, you’ll need to combine and/or transform your data to get it into the optimal form. Have a column for house price and another column for square footage? Why not create a new variable for price PER square foot? Have a column for GDP where every number has far too many zeros to read? Why not divide by 1M and change the unit base? Our “New Measures” section will let you combine your data, perform mathematical operations on existing data, annnnd we’ve even thrown in some cool features like extracting information from dates, such as the day of the week.
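Here is what those three new measures might look like as a quick Python sketch. The function names and the sample house are invented for the example, so read it as a picture of the arithmetic rather than how the “New Measures” section is built.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def price_per_sqft(price, sqft):
    """Combine two columns into a new measure."""
    return price / sqft


def in_millions(amount):
    """Change the unit base so the zeros stop hurting your eyes."""
    return amount / 1_000_000


def day_of_week(d):
    """Extract the weekday name from a date (weekday() returns 0 for Monday)."""
    return DAY_NAMES[d.weekday()]


# Hypothetical row of housing data
home = {"price": 300_000, "sqft": 1_500, "sold_on": date(2024, 1, 1)}

home["price_per_sqft"] = price_per_sqft(home["price"], home["sqft"])  # 200.0
home["sold_weekday"] = day_of_week(home["sold_on"])                   # "Monday"
gdp_in_millions = in_millions(25_000_000_000_000)                     # 25000000.0
```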