App Engine datastore uses too much CPU

Google App Engine is supposed to be a miracle solution for everything: you can build solutions as scalable as you want, and it is almost free. After investigating the technology more closely, I am not so thrilled anymore. I was working on importing IP geolocation data into the datastore, and it was really tricky. Inserting one IP range, which consists of a long fromip, a long toip and a string countrycode, takes around 40 ms, and serving the full request uses something like 700 ms of CPU.
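For reference, a record of this shape can be sketched in plain Java. This is only an illustration: the class name IpRange, the helper names and the assumed CSV column order (fromip, toip, countrycode, optionally quoted) are my own guesses, and the JDO annotations a real entity would need are left out so the snippet runs without the App Engine SDK.

```java
// Hypothetical plain-Java model of one geolocation row; not the actual JDO entity.
final class IpRange {
    final long fromIp;
    final long toIp;
    final String countryCode;

    IpRange(long fromIp, long toIp, String countryCode) {
        this.fromIp = fromIp;
        this.toIp = toIp;
        this.countryCode = countryCode;
    }

    // Converts a dotted IPv4 address such as "1.2.3.4" to its numeric value.
    static long toLong(String dottedIp) {
        String[] parts = dottedIp.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedIp);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Parses one CSV line, ASSUMING the column order fromip,toip,countrycode.
    static IpRange parse(String csvLine) {
        String[] cols = csvLine.split(",");
        if (cols.length < 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + csvLine);
        }
        return new IpRange(
                Long.parseLong(unquote(cols[0])),
                Long.parseLong(unquote(cols[1])),
                unquote(cols[2]));
    }

    // Trims whitespace and removes double quotes around a CSV field.
    private static String unquote(String field) {
        return field.trim().replace("\"", "");
    }

    // True if the numeric address falls inside this range (both ends inclusive).
    boolean contains(long ip) {
        return ip >= fromIp && ip <= toIp;
    }
}
```

Storing both ends as long values is what makes a lookup a simple numeric comparison, as in contains() above.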

The IP geolocation database has around 100k IP ranges, and the CSV file is 7 MB. On a separate server I used the Unix command split to cut the CSV file into smaller files of 500 lines each. App Engine then uses urlfetch to download one such file, parses it and stores all of its IP ranges with pm.makePersistentAll(). Processing one file takes around 4 s and uses 20-40 s of CPU. Importing the full database therefore takes about 200 × 30 s = 6,000 s, or roughly 1.7 hours of CPU, while you only get 6.5 hours free per day. And that's not all. The super-scalable App Engine should be able to process all these batches simultaneously, but it could not, failing with:
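The CPU budget is easy to check with a few lines of Java. This is just a back-of-the-envelope sketch using the figures from this post; the class and method names are made up for the example, and 30 s is simply the midpoint of the 20-40 s I measured.

```java
// Back-of-the-envelope CPU budget for the import, using the figures from this post.
final class ImportBudget {
    // Number of batches needed, rounding up for a partial last batch.
    static int batchCount(int totalRows, int rowsPerBatch) {
        return (totalRows + rowsPerBatch - 1) / rowsPerBatch;
    }

    // Total CPU time in hours for the given number of batches.
    static double cpuHours(int batches, double cpuSecondsPerBatch) {
        return batches * cpuSecondsPerBatch / 3600.0;
    }

    public static void main(String[] args) {
        int batches = batchCount(100_000, 500);  // about 100k ranges, 500 per file
        double hours = cpuHours(batches, 30.0);  // 30 s: assumed midpoint of 20-40 s
        double shareOfQuota = hours / 6.5;       // 6.5 free CPU hours per day
        System.out.println(batches + " batches, " + hours + " CPU hours, "
                + Math.round(shareOfQuota * 100) + "% of the daily free quota");
    }
}
```

With these inputs a single import comes to 200 batches and about 1.67 CPU hours, which is roughly a quarter of the daily free quota.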

Request was aborted after waiting too long to attempt to service your request. Most likely, this indicates that you have reached your simultaneous dynamic request limit. This is almost always due to excessively high latency in your app. Please see for more details.

It is impossible to process all IP ranges at once. Instead, the job had to be split into many smaller tasks and run through the asynchronous task queue. When you try to process everything at once, you get a most severe critical error:

com.sun.faces.context.ExceptionHandlerImpl log: JSF1073: javax.faces.event.AbortProcessingException caught during processing of INVOKE_APPLICATION 5 : UIComponent-ClientId=j_idt4:j_idt12, This request (efab013ed2a2d05f) started at 2010/02/06 20:06:34.903 UTC and was still executing at 2010/02/06 20:07:03.599 UTC.
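The splitting into smaller tasks can be sketched roughly like this. It is a generic plain-Java helper, not my actual import code: the name BatchSplitter and the chunk size are assumptions for illustration. On App Engine each chunk would then be handed to the task queue as its own task; that call is not shown here because it needs the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that cuts a big list of work items into fixed-size chunks.
final class BatchSplitter {
    // Returns consecutive chunks of at most chunkSize items; the last may be shorter.
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the slice so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

For example, BatchSplitter.partition(lines, 500) on 1,200 parsed lines gives three chunks of 500, 500 and 200 items, each small enough to fit in one short request.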

It is said that reading data from Bigtable is much faster, but I can't confirm that either. There are a bit over 200 countries, and loading and displaying them all takes some 3000 ms of CPU.

For comparison, on the local dev server persisting the same batch of 500 IP ranges takes on the order of 10x less time and resources.
