One of my clients recently asked me if I could 'just simply add some column headings to the CSV'.
They were referring to a CSV file produced by the INTO OUTFILE feature of MySQL, and I naively wandered off to find the option to enabled that handy feature.
Alas, no option. No feature. No enabling. No headings. It turns out that lots of people (well, for small values of lots) ask how to do this periodically ... and the answer is seldom 'here, use this code'.
So ... here, use this code.
First, the caveats.
Obviously this is an 'as is' lump of code. If you can't understand what it's doing and the many, many ways that it may go horribly wrong, you probably shouldn't use it. This isn't stackoverflow.
Your SQL script needs to be stored in a file, and must meet these requirements:
- The SELECT and FROM must be on their own lines
- Every element you're selecting must be on its own line
- Every element you're selecting must have the AS sensible_name construct (the sensible_name is the csv heading)
And here's the code fragment.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
|
Explanation is provided in the code – it's not rocket surgery by any stretch, and it's easy to adjust to taste. Also I'm quite confident that filtering ugliness could be reduced in size, but I found it easier to debug and adjust in that format. There's little gain to be had from improving the performance of something like this, as you're typically dealing with a few hundred lines at most.
Obviously it's not immediately useful in this form. On my systems I've embedded this code in the cron script that does my extracts.
MySQL's INTO OUTFILE will choke if the OUTFILE already exists, so I produce both the data file, and this headings file, then cat them together into the actual output (and then bzip2 and biabam them off to the lucky winner).
A good-looking MySQL script will be formatted like this. Names have been changed to protect the crufty.
-- 2013-05-07
-- jedd
--
-- Present the ... data (from second tab on the WC web site) into
-- a format useful for ...
--
-- This will be accompanied by a vanilla dump (also in csv format) of ...
-- [ firstname , surname , postcode ] to uniquely identify a person.
-- Timestamp here bewilderingly isn't an actual timestamp, but a bigint,
-- so we convert on the fly with : from_unixtime(preferenceTimestamp/1000)
SELECT
person.id AS person_id,
PreferentialOption.id AS po_id,
PreferentialOption.reference AS po_reference,
-- PreferentialOption.feedbackTimestamp AS po_when_raw,
from_unixtime(PreferentialOption.feedbackTimestamp/1000) AS po_when_actual,
SecondTable.firstname AS firstname,
SecondTable.surname AS surname,
SecondTable.postcode AS address_postcode
FROM
...
The above will generate output that looks like this:
"person_id","po_id","po_reference","po_when_actual","firstname","surname","address_postcode"
Bish bash bosh. Job done.