First, I would like to say that I am not the one who wrote this article; I copied it from Zapier. I found it on my favorite app, Zapier.com. I love the app, I use it daily, and I highly recommend it to anyone who wants to work less and make more money than Bill Gates.
How to Automatically Clean Up Spreadsheet Data with OpenRefine
Ever had to manually edit dusty, messy, years-old information from some obsolete software?
I once worked for a company that stored paperwork offsite for 60 years. Materials were indexed in a document table. Most records had a box number, storage date, storage vendor receipt number, and a rough idea of the contents. Most, mind you.
Over 60 years the list got … messy. Storage contracts changed several times, so the box codes and vendor receipts varied over time. Add in the random mistakes that accumulated along the way, and you had quite a mess.
My job was transferring everything to yet another contractor, which meant cleaning up thousands of records to play nice with the new vendor’s fancy online inventory. It was quite the chore, and it is one many of us face when trying to organize data.
The good news is, if you can get your messy data into a spreadsheet, you can clean up and reformat it. My favourite tool for this is called OpenRefine, and its specialty is “reconciling” or “normalizing”: making it easy to find typos, variations on phrases, formatting errors, extra spaces, and other things that are hard to spot in rows upon rows of information.
What is OpenRefine?
OpenRefine bills itself, simply, as “a powerful tool for working with messy data.” Originally released in 2010 as “Freebase Gridworks,” it was later called “Google Refine” after being acquired by the search giant. Today it’s a community-run, open-source project to, well, refine your data.
To you, this could mean a number of things. Your sales team could want to export old store data, reorganize it, and import it into a new eCommerce app.
Your accounting staff might have legacy data floating around from years ago. Your PR staff could have multiple email lists from past campaigns that you want to merge, modify, or de-duplicate. Maybe your survey results are messy, your app exports are confusing, or your analytics data needs to be combined from multiple sources.
OpenRefine was built especially with those types of bulk operations in mind. It may just be what you need to finally finish that data project you’ve been putting off.
Getting Started With OpenRefine
Getting started is easy. Just download OpenRefine (it works on Windows, Mac, and Linux) and start the program. It’ll open up a browser tab that looks much like other Google Apps, and will ask you to create a project or open a project you’ve already started.
You’ll need some data for OpenRefine to work with, and it can open any data in a spreadsheet format: CSV, XLS, or even a Google Sheets spreadsheet online. It can also take XML and JSON files, if that’s your jam. OpenRefine can also import your spreadsheet files directly from the web.
Let’s start a new project. This exercise is going to use a set of publicly available data from the Government of Ontario, which, like much public data, is a bit messy. Let’s go with a subject near and dear to my heart: beer.
Copy the link to the XLSX file, which includes details about Ontario microbrewers and brands. Switch to your OpenRefine tab, start a new project, select the Web Address option, and paste in your spreadsheet link.
As soon as you input a dataset, OpenRefine generates a preview to ensure it’s displayed properly. You can do some preliminary cleanup: remove empty rows, set the first row as a header with column names, or convert columns into specific data types (dates, integers, and so on). Click “Create Project” when you’ve made sure the data is displaying correctly, and you’ll be brought to the screen where all the magic happens.
The first thing you’ll notice is that OpenRefine doesn’t display your data like a spreadsheet with a long list of rows. Instead, it shows a maximum of 50 rows at a time, essentially just enough of a preview for you to think about what you’re working with. You can page through your data if you need to, but I think you’ll soon get comfortable with being less overwhelmed.
Clean Up Data with OpenRefine Facets
The first step is to learn about facets. These show precisely which values are used in a column, so you can find typos or variations in things that are supposed to be identical.
Let’s start with the manufacturer’s name. Click the dropdown button next to the header, select Facet, then Text Facet. You’ll be presented with a list of every value in the column, with a count of the times each one appears in the dataset.
We can see, for example, that Big Rig Brewery has 13 different beers and Big Rock Brewery has 6. We can already see some messy data here: “Black Swan Brewing Company” and “BLACK SWAN BREWING COMPANY INC.” are the same company, but with slightly different names in this spreadsheet. To fix this, hover your mouse over the name you want to change, click “edit,” and type in the new name. Click Apply and it automatically edits all the matching entries in the dataset.
Let’s speed up the process by automatically identifying all of the facets that are similar and merging them, without any typing, by clustering the data. Click the Cluster button at the top of the facet display, and you’ll see all of the similar entries identified by OpenRefine.
For some of these, the difference is just an extra space (as at the end of “Square Timber Brewing Company”), an extra comma (as in Blood Brothers Brewing), or liberal use of Caps Lock. As you can see in the “Bevin Palmateer” entry, OpenRefine also identifies words that are out of order. Check the Merge boxes for anything you want to fix.
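To make the clustering idea less mysterious, here is a rough Python sketch of a "key collision" approach similar in spirit to OpenRefine's fingerprint method. It is a simplified approximation, not OpenRefine's actual code: the function names `fingerprint` and `cluster` are hypothetical, and the sample names are made up to mirror the kinds of variations described above.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a key that ignores case, punctuation, spacing, and word order."""
    # Drop accents and any other non-ASCII characters.
    ascii_text = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Trim, lowercase, and strip punctuation.
    cleaned = ascii_text.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    # Sort the unique words so "Palmateer Bevin" matches "Bevin Palmateer".
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Group names that share a key; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Hypothetical sample values, not rows from the Ontario dataset.
sample_names = [
    "Square Timber Brewing Company",
    "Square Timber Brewing Company ",
    "Blood Brothers Brewing",
    "Blood Brothers, Brewing",
    "Palmateer Bevin",
    "Bevin Palmateer",
    "Big Rig Brewery",
]
clusters = cluster(sample_names)
for variants in clusters:
    print(variants)
```

A key like this would not group “Black Swan Brewing Company” with “BLACK SWAN BREWING COMPANY INC.”, because the extra word changes the key. That is one reason it pays to try more than one clustering method.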
If you don’t like the suggested new value (for example, the capitalized name suggested for NITA BEER), you can just click the lowercase option and it will change that field. If you don’t like any of the options, just type in your preferred name.
Click Merge Selected & Re-Cluster to do another check. When the check finds no results, try another clustering method to look for more (you should find “Walkervile” and “Walkerville”). It’s data-mining, but you don’t have to learn advanced data-mining theory to get results: just click through all the options. You’ll start to see false positives (for example, “Bell City” isn’t “River City”), which you can just ignore.
There are also some common transform tools you can use to clean things up, like eliminating all the spaces before and after text. Let’s also get rid of all the uppercase brewery names by transforming the whole column to titlecase. Click again on the dropdown menu for the column, go to Edit cells, and read through all the possibilities.
Categorize Data Automatically in OpenRefine
The next step is to do clever things with all this data. Let’s pretend these beers are our product data, and we want to add categories of beer to our catalogue. We don’t want to manually label each entry, so let’s save some time by identifying beer types from the beers’ names.
We can do a quick check for one type of beer using a Custom Text Facet. We’ll look for all cell values that contain “Porter” (the check is case-sensitive, but now that we’ve put everything in titlecase, the capital P should catch everything). A Custom Text Facet on the Manufacturer’s Brand column brings up a window, into which we enter a filter:
value.contains("Porter")
This function returns true or false for each cell, and here true matches 25 rows: there are 25 porters in the list. (There are also 79 breweries without any actual beers available, shown as the (blank) category, but let’s ignore that for now.)
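If it helps to see the same idea outside OpenRefine, here is a minimal Python sketch of what a true/false facet does. It is an illustration only, not how OpenRefine is implemented; the function name `contains_facet` and most of the sample values are hypothetical.

```python
from collections import Counter


def contains_facet(cells, needle):
    """Tally cells as "true", "false", or "(blank)" using a case-sensitive check."""
    counts = Counter()
    for cell in cells:
        if cell is None or cell.strip() == "":
            counts["(blank)"] += 1
        elif needle in cell:
            counts["true"] += 1
        else:
            counts["false"] += 1
    return counts


# Hypothetical sample values; only the last brand name appears in the article.
sample_brands = [
    "Example Robust Porter",
    "Example Pale Ale",
    "",
    None,
    "example porter",
    "More Portly Than Stout Porter",
]
facet = contains_facet(sample_brands, "Porter")
print(dict(facet))
```

Note that the lowercase “example porter” lands in the false bucket, which is why converting the column to titlecase first matters.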
These filters are great when you want to manipulate a subset of your spreadsheet without having to delete the rest or keep your focus rows selected. You can apply a filter, do a bunch of operations, and then remove it later.
OpenRefine even includes some common recipes to format data, such as standardizing date formats or transforming “Firstname Lastname” into “Lastname, Firstname.”
Let’s use that to transform our data into something useful. We’ll add a new column based on the “Manufacturer’s Brand” column, using text analysis to guess what type of beer it is. It won’t work on all entries, but for beers that have “IPA”, “lager”, “stout”, “lime”, “red”, “wheat”, and so on right in their name, we’ll have some success.
As with all bulk data work, sometimes mistakes happen. For example, there’s one beer in this list named “More Portly Than Stout Porter.” If we search for “Stout,” we’ll get a false positive. Keep that in mind, and always set aside time for quality control!
Start by clicking on “Manufacturer’s Brand.” Select Edit column, then choose Add column based on this column. To look for “Lager” and set the entire Beer types value to “lager” where applicable, we use an if statement:
if(value.contains("Lager"),"lager",value)
If statements here are straightforward: if the first part is true, the whole value becomes “lager”; otherwise, the cell keeps its original value (in other words, nothing changes).
If we want to categorize a big set of beer types at once, we nest a series of if statements inside each other.
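For readers who think better in a general-purpose language, the nesting idea boils down to "check keywords in order, and the first match wins." Here is a small Python sketch of that logic. It is not GREL and not something you would paste into OpenRefine; the names `BEER_KEYWORDS` and `categorize` are hypothetical, while the keywords themselves come from this walkthrough.

```python
# Keywords are checked in this order; the first one found in the name wins.
BEER_KEYWORDS = [
    "Lager", "IPA", "Wheat", "Pilsner", "Brown", "Kolsch",
    "Light", "Red", "English", "Stout", "Porter",
]


def categorize(brand: str) -> str:
    """Return the first matching keyword, or the original name if none match."""
    for keyword in BEER_KEYWORDS:
        if keyword in brand:  # case-sensitive, like the facet check earlier
            return keyword
    return brand


print(categorize("More Portly Than Stout Porter"))
```

Because “Stout” is checked before “Porter,” the beer named “More Portly Than Stout Porter” comes out as a stout, which is the false positive mentioned earlier. Reordering the list changes which keyword wins.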
The nested expression looks a bit silly, but gets the job done:
if(value.contains("Lager"),"Lager",if(value.contains("IPA"),"IPA",if(value.contains("Wheat"),"Wheat",if(value.contains("Pilsner"),"Pilsner",if(value.contains("Brown"),"Brown",if(value.contains("Kolsch"),"Kolsch",if(value.contains("Light"),"Light",if(value.contains("Red"),"Red",if(value.contains("English"),"English",if(value.contains("Stout"),"Stout",if(value.contains("Porter"),"Porter",value)))))))))))
Essentially, if “Lager” wasn’t found, then try “IPA,” then try “Wheat,” then try “Pilsner,” and so on. It’s not standard programming syntax, but it works.
Apply that transformation, then check the facets of the column to see our progress. While we’re at it, let’s clean the results up. Reconcile “I.P.A.” and “India Pale Ale” to “IPA” with the steps you learned above.
Also keep in mind that the operations work in order: you’ll want to convert “India Pale Ale” before you reformat “Pale Ale.” Because these transformations are also case-sensitive, transforming to lowercase “India pale ale” would also protect your work when you search for “Pale Ale” later on.
With a bit of categorizing, we can start to see the spread of beer types in Ontario. (Try them all today!) This is definitely faster than labelling them all by hand, and it should give you an idea of how to make OpenRefine filters work for you.
If this were a product list for our online store, we’d want to export our cleaned-up and value-added spreadsheet from OpenRefine and import it into our eCommerce store. The Export button’s your friend. You can export your data as a spreadsheet with a range of options and data forms. You can also upload the data directly to a new Google Sheets spreadsheet or Google Fusion table.
Do More with OpenRefine
There are a few other useful OpenRefine tools.
The Undo/Redo option gives you detailed information about all your activities instead of just undoing your mistakes, which is super helpful in learning how to get more out of OpenRefine. Also remember: OpenRefine is designed around databases, so you can use its records and rows separately to organize your data.
Now it’s your turn to try it out. Have messy data from an app export, or an old spreadsheet full of confusing data? One great way to start is to use OpenRefine to organize your contacts: find typos and formatting errors in email addresses, phone numbers, or company names before importing the data into a new app. I’ve used it to reformat old Mailchimp data when we changed the designs of our signup forms. Super handy.
Don’t spend hours formatting your data again. OpenRefine can do it for you in minutes.
About the Author
Allana Mayer is an archivist, writer, and media manager in the Toronto area.
The Little “Secret” of Process Color at Press: Printing to Gray Balance
Printing to Gray Balance was written by Dan Remaley, Senior Technical Consultant in Process Controls for PIA/GATF. He invites anyone who would like further information to contact him at 412-259-1814 (o), 412-889-7643 (c), or dremaley@piagatf.org
The idea has been gaining traction for months as some early-adopting publishers have touted audience engagement as a powerful tool for selling ads. Recent developments, including advertisers’ growing disdain for banner ads that are never seen or clicked on, suggest the idea may be catching on at an accelerating pace.
Last month, Google issued a report that said 56% of ads it serves aren’t “viewable,” a term that suggests ads are too far down on the site or that readers aren’t scrolling down far enough. The result reinforced the notion that display ads are deeply flawed, says Jonah Goodhart, CEO of ad tech firm Moat. (USA TODAY is a client of Moat.)
“Advertisers want the time and attention of the right audience to get the right message across,” says Tony Haile, CEO of web traction research firm Chartbeat, who’s been one of the loudest proponents of the quality-over-quantity approach. “To date, they’ve been given proxies like page views and click-through rates. We should value and deliver exactly what they want: (readers’) time and attention.”
SEEKING SUPER JOURNALISTS
This month, Gawker, known for aggregating and respinning highly viral stories with its own sharp voice, told readers its front page will be updated less often to give selected stories longer shelf life on its most prominent piece of real estate.
Borrowing a retro play from newspapers, Gawker said its main page will be used more like a front page, displaying what its editors consider to be the best reported or simply most engaging stories: the ones often lost in the shuffle because of a constant flow of new posts.
In the new arrangement, the highlighted front-page stories will be accompanied by larger images and headlines. Other stories will be placed on “sub-blogs” that pertain to the story topic (a tech story on Valleywag.com, for example).
“Traffic will take a hit,” Gawker editor in chief Max Read says.
“Page views is one measure, but it doesn’t tell the whole story.”
The purpose of a coating is to protect the printed piece from dirt, smudges, fingerprints, scratching, and so on. Coating also provides scuff resistance. And, yes, it can improve the visual appeal of the piece by providing a glossier and smoother finish. A protective coating gives that postcard you produced for your top account the surface protection it needs to arrive looking the way it did when it was mailed. Printers who are aware that the postal service delivers a licking that keeps on ticking to printed pieces, and who use that knowledge to sell against unaware competitors, have an advantage.
Paper bags are one of the most utilitarian items in our culture. They hold lunches, groceries, and the goodies we purchase at retail stores. Before there were paper bags (or sacks, if you live in the Midwest), shoppers either brought their own container, usually a basket, or the shop owner rolled and glued some paper into a cone, known as a cornucopia, to hold the purchase.
CDs are made by being pressed from a mold, while CD-Rs and CD-RWs are made by burning the information to the disc with a laser. Though they are not physically identical, they work just the same. However, you cannot record to pressed CDs, only to CD-R/RW discs.
iPad Mini Won’t Have Retina Display
Manufacturers in China have begun assembling a new, smaller tablet computer for Apple, the Wall Street Journal reported Wednesday. Unnamed executives from component suppliers said that production of the tablet is under way. Production typically begins several weeks before Apple brings its new devices to market.
Apple introduced iOS 6 along with the new phone last month. And the hardware understandably captured most of the hubbub and fan chatter, at least on the day of the phone’s announcement. Casual fans who weren’t paying close attention might have missed the fact that the new software also would be available to older phones even before the launch sale date of iPhone 5.
128,000 Dominoes – Falling into past – a journey around the world (2 Guinness World Records)
Sometimes we look at a project and wonder if it can ever be completed on time and within budget. This video is a great example that anything can be done with a great team effort.
Facebook users download free apps like kids used to collect baseball trading cards. But in raking in dozens, if not hundreds, of apps, some consumers may expose themselves to privacy risks. (This covers apps on smartphones and tablets, as well as those that only run on Facebook. It also includes websites that use Facebook Connect.)