Hi there
Last week we were given a very big file to translate: 18K lines, 1.2 million words. It was in Excel 2007 (xlsx) format. When we tried to open it with the Tag Editor, it just hung for ages and never seemed to finish loading. At least, we didn't have the patience to wait. So we decided to split it into manageable chunks. "What a drag!" I hear you say? Not at all.
First we saved it as a simple csv. Then we used the Unix command "split" to ... well ... to split it into files with a fixed number of lines. It can also split by other criteria, such as size in bytes; just type "man split" in a terminal.
Code:
pabloa$ split -d -l 1234 big_file.csv small_file
pabloa$ wc -l small_file*
1234 small_file00
1234 small_file01
1234 small_file02
1234 small_file03
1234 small_file04
1234 small_file05
1234 small_file06
1234 small_file07
1234 small_file08
1234 small_file09
1234 small_file10
1234 small_file11
1234 small_file12
1234 small_file13
1139 small_file14
18415 total
The -l 1234 sets the number of lines per file. The -d gives numbers instead of letters as suffixes; without it, the files would be called small_fileaa, small_fileab, etc. The last argument is the prefix we want the files to have. The last file gets the remainder of the lines, as can be seen above (1139 in this case).
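If you want to convince yourself that nothing gets lost, here is a minimal sketch that runs the same split on a tiny throwaway file and checks that the chunks glue back into the original. The 10-line sample and the temporary directory are made up for the demo, and it assumes a split that supports -d (GNU coreutils does).

```shell
#!/bin/bash
# Sketch: split a small sample file and check that the chunks
# reassemble into the original. All file names are demo values.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a 10-line stand-in for big_file.csv.
seq 1 10 > big_file.csv

# Same options as in the post, but with 4 lines per chunk.
split -d -l 4 big_file.csv small_file

# Count the chunks and the lines in the last one.
chunks=$(ls small_file* | wc -l | tr -d ' ')
last_lines=$(wc -l < small_file02 | tr -d ' ')

# Concatenating the chunks in name order should give back the original.
if cat small_file* | cmp -s - big_file.csv; then
    roundtrip=ok
else
    roundtrip=broken
fi

echo "chunks=$chunks last=$last_lines roundtrip=$roundtrip"
```

The glob small_file* sorts the numeric suffixes in order, which is why a plain cat is enough to rebuild the file.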
And then, to convert them back to xlsx, I created a macro in Excel from the output of "for x in small_file*; do csv2xlsx $x; done", where "csv2xlsx" is the following script:
Code:
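#!/bin/bash
# Print the VBA statements that open one csv chunk and save it as xlsx.
# Usage: csv2xlsx small_file00
source=$(pwd_windows)"$1"
target=$(pwd_windows)"$1.xlsx"
echo " Workbooks.Open Filename:=\"$source\""
echo " ActiveWorkbook.SaveAs Filename:=\"$target\" _"
# xlOpenXMLWorkbook is the xlsx format; xlCSV would just write csv again.
echo " , FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False"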
Here "pwd_windows" is our old friend, described in a previous post: http://www.english-spanish-translato...ows-style.html
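For the curious, here is a rough sketch of how the whole macro could be generated in one go, wrapped in a Sub so it can be pasted straight into the VBA editor. It is not the script we actually used: the function name make_macro, the Sub name ConvertChunks and the folder C:\work\ are made up for illustration, and the Windows folder is passed in by hand instead of calling pwd_windows. It saves with xlOpenXMLWorkbook, the Excel constant for the xlsx format, and adds a Close line (my addition) so Excel does not end up with every chunk open at once.

```shell
#!/bin/bash
# Sketch: emit one VBA Sub that converts every chunk to xlsx.
# make_macro, ConvertChunks and the folder below are hypothetical names.
make_macro() {
    # First argument: Windows-style folder, ending in a backslash.
    local win_dir=$1
    shift
    printf '%s\n' "Sub ConvertChunks()"
    local f
    for f in "$@"; do
        # printf '%s\n' prints the backslashes in the path untouched.
        printf '%s\n' "    Workbooks.Open Filename:=\"${win_dir}${f}\""
        printf '%s\n' "    ActiveWorkbook.SaveAs Filename:=\"${win_dir}${f}.xlsx\", _"
        printf '%s\n' "        FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False"
        printf '%s\n' "    ActiveWorkbook.Close SaveChanges:=False"
    done
    printf '%s\n' "End Sub"
}

# Example with two chunks and a made-up folder.
macro=$(make_macro 'C:\work\' small_file00 small_file01)
printf '%s\n' "$macro"
```

In real use you would call it as make_macro "$(pwd_windows)" small_file*, assuming pwd_windows prints the folder with a trailing backslash.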
Cheers.
P.