Ruby on Rails - How to edit docx with Nokogiri and rubyzip
I'm using a combination of rubyzip and Nokogiri to edit a .docx file. I use rubyzip to unzip the .docx file, then use Nokogiri to parse and change the body of the word/document.xml file, but every time I close rubyzip at the end it corrupts the file and I can't open or repair it. When I unzip the .docx file on my desktop and check word/document.xml, the content is updated to what I changed it to, but all the other files are messed up. Can someone help me with this issue? Here is my code:
    require 'rubygems'
    require 'zip/zip'
    require 'nokogiri'

    zip = Zip::ZipFile.open("test.docx")
    doc = zip.find_entry("word/document.xml")
    xml = Nokogiri::XML.parse(doc.get_input_stream)
    wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
    wt.content = "new text"
    zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
    zip.close
I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
    #!/usr/bin/env ruby

    require 'rubygems'
    require 'zip/zip' # rubyzip gem
    require 'nokogiri'

    class WordXmlFile
      def self.open(path, &block)
        self.new(path, &block)
      end

      def initialize(path, &block)
        @replace = {} # entry name => new contents
        if block_given?
          @zip = Zip::ZipFile.open(path)
          yield(self)
          @zip.close
        else
          @zip = Zip::ZipFile.open(path)
        end
      end

      # Fill each merge field in word/document.xml from the rec hash.
      def merge(rec)
        xml = @zip.read("word/document.xml")
        doc = Nokogiri::XML(xml) {|x| x.noent}
        (doc/"//w:fldSimple").each do |field|
          if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
            text_node = (field/".//w:t").first
            if text_node
              text_node.inner_html = rec[$1].to_s
            else
              puts "no text node for #{$1}"
            end
          end
        end
        @replace["word/document.xml"] = doc.serialize :save_with => 0
      end

      # Copy every entry into a new archive, swapping in replaced entries.
      def save(path)
        Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
          @zip.each do |entry|
            out.get_output_stream(entry.name) do |o|
              if @replace[entry.name]
                o.write(@replace[entry.name])
              else
                o.write(@zip.read(entry.name))
              end
            end
          end
        end
        @zip.close
      end
    end

    if __FILE__ == $0
      file = ARGV[0]
      out_file = ARGV[1] || file.sub(/\.docx/, ' merged.docx')
      w = WordXmlFile.open(file)
      # w.force_settings  # not defined in this proof of concept
      w.merge('first_name' => 'eric', 'last_name' => 'mason')
      w.save(out_file)
    end
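The merge step above depends on pulling a field name out of each field's instr attribute and looking it up in the record hash. As a rough sketch of just that matching, separate from rubyzip and Nokogiri: the helpers `merge_field_name` and `merge_value` are hypothetical names, not part of the answer's class, and the sample instr strings are assumed examples of what Word might write.

```ruby
# Hypothetical helper: extract the merge field name from an instr value,
# using the same idea as the MERGEFIELD regex in the merge method.
# Returns nil when the instruction is not a MERGEFIELD.
def merge_field_name(instr)
  match = /MERGEFIELD\s+(\S+)/.match(instr.to_s)
  match && match[1]
end

# Hypothetical helper: the text to write into the field's w:t node.
# A field with no entry in rec becomes an empty string (nil.to_s),
# and a non-merge field returns nil so the caller can skip it.
def merge_value(instr, rec)
  name = merge_field_name(instr)
  name ? rec[name].to_s : nil
end

rec = { 'first_name' => 'eric' }

# Assumed sample instr values; real documents may differ.
puts merge_value(' MERGEFIELD first_name \* MERGEFORMAT ', rec).inspect
puts merge_value(' MERGEFIELD last_name ', rec).inspect
puts merge_value(' PAGE ', rec).inspect
```

Both the MERGEFIELD keyword and the hash keys are matched case-sensitively here, so the keys passed to merge need to match the field names in the document exactly.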