Thursday, June 11, 2009

MS Word processing or parsing using ruby and Apache POI

I was looking for a way to extract text from MS Word documents using Ruby, but could not find a suitable gem. Some Win32 OLE libraries exist, but they do not work on Linux. So I decided to use the Apache POI Java API to process Word documents.

I downloaded the POI jar files from here:
http://www.apache.org/dyn/closer.cgi/poi/release/

Then I renamed the extracted directory to poi.

The following Ruby script returns the parsed text. It calls the Java class in WordSampleReader.java, which must be compiled and placed in the poi folder alongside the jar files.

class WordReader
  POI_DIR = File.join(Dir.getwd, 'poi')

  # Runs the compiled WordSampleReader class and returns the extracted text.
  def self.generate_text(filename)
    # The classpath holds the poi folder itself (for WordSampleReader.class)
    # plus every jar inside it. File::PATH_SEPARATOR is ";" on Windows and
    # ":" elsewhere, so no host check is needed.
    jars = Dir.glob(File.join(POI_DIR, '*.jar'))
    classpath = ([POI_DIR] + jars).join(File::PATH_SEPARATOR)

    # Passing the command as an array avoids shell quoting problems
    # with file names that contain spaces.
    command = ['java', '-cp', classpath, 'WordSampleReader', filename]
    IO.popen(command, 'rb') { |io| io.read }
  end
end

puts WordReader.generate_text('poi/test.doc')


I don't know Java, so I used sample code found through Google. The file is called WordSampleReader.java.

The source code is:

//package com.informit.poi;

// Import POI classes
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;

// Import Java classes
import java.io.*;
import java.util.*;

public class WordSampleReader
{
  public static void main( String[] args )
  {
   if( args.length == 0 )
   {
     System.out.println( "Usage: WordSampleReader <filename>" );
     System.exit( 0 );
   }

   String filename = args[ 0 ];
   try
   {
     // Create a POI File System object; this is the main class for the POIFS file system
     // and it manages the entire lifecycle of the file system
     POIFSFileSystem fs = new POIFSFileSystem( new FileInputStream( filename ) );

     // Create a document for this file
     HWPFDocument doc = new HWPFDocument( fs );

     // Create a WordExtractor to read the text of the word document
     WordExtractor we = new WordExtractor( doc );
    
     // Extract all paragraphs in the document as strings
     String[] paragraphs = we.getParagraphText();

     // Output the document paragraph by paragraph
     //System.out.println( "Word Document has " + paragraphs.length + " paragraphs" );
     //for( int i=0; i<paragraphs.length; i++ )
     //{
     //  System.out.println( paragraphs[ i ] );
     //}

     // Output the full text
     System.out.println( we.getText() );

   }
   catch( Exception e )
   {
     e.printStackTrace();
   }
  }
}
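
The Java file has to be compiled into the poi folder before the Ruby script can find it. Here is a minimal sketch of how that step could be driven from Ruby. The helper names poi_classpath and compile_command are made up for illustration, and it assumes javac is on the PATH and that WordSampleReader.java sits inside the poi folder.

```ruby
# Hypothetical helpers, not part of POI or of the script above.

# Classpath: the folder itself first, then every jar inside it.
def poi_classpath(dir)
  jars = Dir.glob(File.join(dir, '*.jar')).sort
  ([dir] + jars).join(File::PATH_SEPARATOR)
end

# javac command that writes WordSampleReader.class into the same folder.
def compile_command(dir, source = 'WordSampleReader.java')
  ['javac', '-cp', poi_classpath(dir), '-d', dir, File.join(dir, source)]
end

# To actually compile (assumes javac is installed):
# system(*compile_command('poi'))
```

Building the command as an array keeps the classpath as one argument, so no quoting is needed even when the path contains spaces.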

Monday, May 11, 2009

QuickBase authenticate method bug in the Ruby QuickBase client library

For subsequent calls through the Ruby QuickBase API, the client should reuse the ticket from the earlier authentication.

Existing method:
def authenticate( username, password, hours = nil )

      @username, @password, @hours = username, password, hours

      if username and password

         @ticket = nil
         xmlRequestData = toXML( :hours, @hours ) if @hours
         sendRequest( :authenticate )
         @userid = getResponseValue( :userid )
         return self if @chainAPIcalls
         return @ticket, @userid

      elsif username or password
         raise "authenticate: missing username or password"
      elsif @ticket
         raise "authenticate: #{username} is already authenticated"
      end
   end

The method above never stores the ticket, which is essential for the calls that follow.
So add this line to the method, after the sendRequest call:
@ticket = getResponseValue( :ticket )

Saturday, April 11, 2009

Import Gmail Contacts - Ruby on Rails

I tried writing my own Ruby script to fetch contact information from Google, but making the requests securely requires some extra work.

First, you need to register your domain with Google. This can be as easy as uploading a temporary file to your domain’s root directory to verify control of the domain. The steps for doing this are clearly outlined at Registration for Web-Based Applications.  

I found a good article on importing Google contact information. You can use the same code, but check it for errors first; you will need some patience to make it work :-)

http://blog.guitarati.com/2008/08/google-accounts-authentication-using.html


 

Wednesday, December 17, 2008

nil? problem on object methods in Ruby: use Object#try

Object#try is a solution that I like, although it relies on Object#respond_to?.
The Ruby community recently started using this version on GitHub:
class Object
  ##
  # @person ? @person.name : nil
  # vs
  # @person.try(:name)
  def try(method)
    send method if respond_to? method
  end
end

But it does not handle arguments and blocks.
A more generic try method is:
# By Jagan Reddy

class Object
  def try(method, *args, &block)
    send(method, *args, &block) if respond_to?(method, true)
  end
end
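
Here is a short usage sketch of the generic version. The Person struct and the sample values are made up for illustration, and the method is repeated so the snippet runs on its own.

```ruby
# Same generic try as above, repeated so the example is self-contained.
class Object
  def try(method, *args, &block)
    send(method, *args, &block) if respond_to?(method, true)
  end
end

Person = Struct.new(:name) # hypothetical class for illustration

person  = Person.new('Jagan')
missing = nil

name_present = person.try(:name)                  # "Jagan"
name_missing = missing.try(:name)                 # nil, no NoMethodError
with_args    = 'ruby'.try(:center, 10, '*')       # arguments are forwarded
with_block   = [1, 2, 3].try(:map) { |n| n * 2 }  # the block is forwarded

p name_present, name_missing, with_args, with_block
```

Note that passing true to respond_to? also lets try reach private methods, which may or may not be what you want.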

Difference between lambda and Proc.new (Closures)

lambda and Proc.new are similar, but they are not exactly the same.
1: First Difference
Proc.new silently ignores extra arguments, while lambda checks the argument count.

$ irb
irb(main):001:0> Proc.new { |x, y| }.call 1, 2, 3
=> nil
irb(main):002:0> lambda { |x, y| }.call 1, 2, 3
ArgumentError: wrong number of arguments (3 for 2)
        from (irb):2
        from (irb):2:in `call'
        from (irb):2
        from :0
irb(main):003:0>

2: Second Difference
A return inside Proc.new returns from the enclosing method:

def test_ret_procnew
  ret = Proc.new { return 'Returned' }
  ret.call
  "This is not reached"
end
# prints 'Returned'
puts test_ret_procnew


A return from a lambda acts more conventionally, returning to its caller:

def test_ret_lambda
  ret = lambda { return "Returned" }
  ret.call
  "This is printed"
end
# prints "This is printed"
puts test_ret_lambda
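
The first difference can also be checked from a script rather than irb. This is a small sketch; the variable names are made up for illustration.

```ruby
strict  = lambda   { |x, y| [x, y] }
lenient = Proc.new { |x, y| [x, y] }

# Proc#lambda? tells the two kinds of closure apart.
puts strict.lambda?   # true
puts lenient.lambda?  # false

# A plain proc drops extra arguments and pads missing ones with nil.
lenient_extra = lenient.call(1, 2, 3)
lenient_short = lenient.call(1)

# A lambda raises when the argument count does not match.
strict_error = begin
  strict.call(1, 2, 3)
rescue ArgumentError => e
  e.class
end

p lenient_extra, lenient_short, strict_error
```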

Friday, December 12, 2008

Sass & Haml AutoCompile option

If you are using Sass & Haml, you may want the files recompiled on every request.

Just add the following line to your development.rb file:

Sass::Plugin.options[:always_update] = true
 
 

Wednesday, December 10, 2008

Ruby language features

When I started learning Ruby in 2004, the API documentation helped me understand its features.
Though I mostly used regular expressions and classes, Rails gave me enough experience to make use of
many unique Ruby features, like the ones Jayson explained in the following article.

http://www.rubytips.org/2008/04/07/10-unique-ruby-language-features/

Also look at the Features of Ruby by Michael Neumann

http://www.ntecs.de/old-hp/s-direktnet/ruby_en.html