
Configuration settings for large catalog

Posted: Wed Mar 13, 2013 6:14 am
by fdboles
I am attempting to build a proof-of-concept solution in Broadleaf using the demo site and importing a large sample catalog. I am importing via an ETL that inserts rows into BLC_CATEGORY, BLC_CATEGORY_XREF, BLC_PRODUCT, BLC_SKU, and BLC_CATEGORY_PRODUCT_XREF. This is a very basic import with a single SKU per product and only name, description, and long description populated.

Initial attempts at > 20,000 products result in a long-running process (I believe this to be Solr indexing) that produces several 'GC overhead limit exceeded' exceptions, followed eventually by a complete system hang. I have completed an import of < 5,000 products using the demo configuration with increased -Xmx and -XX:MaxPermSize values. What would be a suitable configuration for a larger (100,000+) product catalog?
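For reference, the kind of values I have been experimenting with in Tomcat's bin/setenv.sh look like this (the sizes are my own trial-and-error, not a recommendation; the heap-dump flags are just there so I can inspect what fills the heap when the error strikes):

Code: Select all

```shell
# JVM settings tried in bin/setenv.sh; the sizes are trial-and-error
# values, and the heap-dump flags capture a dump for inspection when
# "GC overhead limit exceeded" occurs.
export CATALINA_OPTS="-Xmx2048M -XX:MaxPermSize=512M \
  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
```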

Thank you.

Re: Configuration settings for large catalog

Posted: Thu Mar 14, 2013 6:06 am
by RapidTransit
What database are you using? Also, are you using the built-in create-drop setting (hibernate.hbm2ddl.auto)?

The reason I ask is that I'm no Java expert (actually two weeks of experience), but Broadleaf uses an in-memory database for its demo, if I'm not mistaken. In other languages there are usually two ways to read a file: load it all into memory, or load it partially. I assume the demo does the former, because the demo data isn't too large.

Re: Configuration settings for large catalog

Posted: Thu Mar 14, 2013 9:31 am
by fdboles
Thanks for the suggestion. I am using the out-of-the-box development configuration. The HSQLDB used by default can run in memory or on disk; I'm not sure which is configured, but I can easily add 500,000+ products as long as I'm not setting any SKU data. After setting the SKU data, I receive the GC exceptions noted above. The stack trace indicates that the issue is in the index rebuild scheduled task (Solr?). I don't find anything about Solr configuration options in the Broadleaf documents. I will be switching the product database to MySQL to rule it out. Can anyone point me in the right direction regarding Solr config?
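For the MySQL switch, what I have in mind is the standard Hibernate/JDBC property set along these lines (the exact file location and any Broadleaf-specific property prefix depend on the build, so treat the keys below as a sketch):

Code: Select all

```
# Standard Hibernate/JDBC settings for pointing the persistence unit at
# MySQL. File location and any Broadleaf-specific prefix are assumptions;
# adjust to your build's runtime properties.
hibernate.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
hibernate.connection.driver_class=com.mysql.jdbc.Driver
hibernate.connection.url=jdbc:mysql://localhost:3306/broadleaf
hibernate.connection.username=broadleaf
hibernate.connection.password=changeme
# "update" (or "none") instead of "create-drop" so a large imported
# catalog survives restarts.
hibernate.hbm2ddl.auto=update
```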

Thanks

Re: Configuration settings for large catalog

Posted: Thu Mar 14, 2013 10:37 am
by phillipuniverse
Solr configuration docs are located here: http://docs.broadleafcommerce.org/curre ... earch.html

Re: Configuration settings for large catalog

Posted: Tue Mar 19, 2013 2:39 pm
by fdboles
After setting up a standalone instance of Solr I am able to import 1.2M+ products - very nice. However, the Solr index no longer seems to be updating. I have the Broadleaf instance configured to update every 10 minutes. Prior to the product import, the index appeared to be updating on schedule, with a new version number every 10 minutes and the last-modified timestamp advancing accordingly. After completing the import, though, the version number remains unchanged and the last-modified timestamp ages well past 10 minutes. Is there any way to determine whether this is a Broadleaf or a Solr issue?
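For what it's worth, the version number and last-modified I'm watching come from Solr's CoreAdmin STATUS call, which reports index version, lastModified, and document counts per core (the host, port, and core name "collection1" below are defaults for a standalone Solr; substitute your own):

Code: Select all

```shell
# Query Solr's CoreAdmin API for the index status of a core; the JSON
# response includes the index version, lastModified, and numDocs.
SOLR_STATUS_URL="http://localhost:8983/solr/admin/cores?action=STATUS&core=collection1&wt=json"
curl -s "$SOLR_STATUS_URL" || echo "Solr not reachable at $SOLR_STATUS_URL"
```

If version/lastModified advance after the scheduled rebuild fires, Solr is committing and the problem is on the Broadleaf side; if they never change, the rebuild job itself is not completing.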

Re: Configuration settings for large catalog

Posted: Tue Mar 19, 2013 3:46 pm
by fdboles
I noticed the following exception in the Tomcat log. Any suggestions for configuration changes to address it?

[ERROR] 20:30:19 JobRunShell - Job DEFAULT.rebuildIndexJobDetail threw an unhandled Exception:
org.springframework.scheduling.quartz.JobMethodInvocationFailedException: Invocation of method 'rebuildIndex' on target class [class $Proxy115] failed; nested exception is java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean$MethodInvokingJob.executeInternal(MethodInvokingJobDetailFactoryBean.java:320)
at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:113)
at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.hibernate.type.descriptor.sql.VarcharTypeDescriptor.getExtractor(VarcharTypeDescriptor.java:58)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:254)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:250)
at org.hibernate.type.AbstractStandardBasicType.nullSafeGet(AbstractStandardBasicType.java:230)
at org.hibernate.type.AbstractStandardBasicType.hydrate(AbstractStandardBasicType.java:331)
at org.hibernate.persister.entity.AbstractEntityPersister.hydrate(AbstractEntityPersister.java:2283)
at org.hibernate.loader.Loader.loadFromResultSet(Loader.java:1527)

Re: Configuration settings for large catalog

Posted: Tue Mar 19, 2013 5:15 pm
by phillipuniverse
How much memory are you giving to Tomcat (the -Xmx jvm flag)? This is the configuration we have for the Jetty maven plugin from build.xml:

Code: Select all

<target name="jetty-demo-no-db">
        <delete dir="war/WEB-INF/lib"/>
        <artifact:mvn mavenHome="${maven.home}" fork="true">
              <jvmarg value="-XX:MaxPermSize=256M" />
              <jvmarg value="-Xmx512M" />
              <jvmarg value="-Xdebug" />
              <jvmarg value="-Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n" />
              <arg value="compile"/>
              <arg value="war:exploded"/>
              <arg value="jetty:run"/>
        </artifact:mvn>
 </target>


This is equivalent to setting the following in bin/setenv.sh in your Tomcat deployment folder:

Code: Select all

export CATALINA_OPTS="-XX:MaxPermSize=256M -Xmx512M -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n"


The -Xmx512M sets the max heap size of the JVM to 512MB. You could try increasing this value to 1024M.

Re: Configuration settings for large catalog

Posted: Wed Mar 20, 2013 10:45 am
by fdboles
Thank you for the additional info. I increased the heap size to 1G, and it took longer to reach the exception. I then increased it further to 6G and raised max perm size to 1G. The exception did not occur after two hours of processing, but the index build still did not complete. I realized that I am using a configuration with a 10-minute index refresh interval, so I now believe the issue is that multiple index rebuilds are overlapping over time, leading to overload. Is there a guideline for how much time to allow between refreshes based on catalog size?
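For reference, I believe the trigger wiring in my applicationContext looks roughly like the following (reconstructed from the job name "rebuildIndexJobDetail" in the stack trace above, so the bean names are assumptions); stretching repeatInterval until one full rebuild fits inside it is what I plan to try next:

Code: Select all

```xml
<!-- Assumed bean names, based on "rebuildIndexJobDetail" from the
     stack trace; check the actual applicationContext for the real ones. -->
<bean id="rebuildIndexTrigger"
      class="org.springframework.scheduling.quartz.SimpleTriggerBean">
    <property name="jobDetail" ref="rebuildIndexJobDetail"/>
    <!-- Wait 60 s after startup before the first rebuild (ms). -->
    <property name="startDelay" value="60000"/>
    <!-- Every 6 hours instead of every 10 minutes (ms), so rebuilds
         cannot overlap on a large catalog. -->
    <property name="repeatInterval" value="21600000"/>
</bean>
```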

Re: Configuration settings for large catalog

Posted: Wed Mar 20, 2013 12:05 pm
by phillipuniverse
What version of Broadleaf are you on? We have a change in SolrSearchServiceImpl that pulls back paginated results from the database to avoid the problem you are having. It's possible that we did not backport this change to your version.
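The idea is roughly the following (a simplified sketch of the pattern, not the actual SolrSearchServiceImpl code): instead of hydrating the whole catalog in one query, the rebuild reads a bounded page at a time, so heap usage stays proportional to the page size rather than the catalog size.

Code: Select all

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Simplified sketch of paginated indexing: rather than loading every
 * product into memory at once (which triggers "GC overhead limit
 * exceeded" on large catalogs), read and process one bounded page at a
 * time. This is the pattern, not the actual Broadleaf code.
 */
public class PaginatedIndexer {

    /**
     * Pages through a data source in fixed-size batches.
     *
     * @param pageReader returns the page starting at the given offset with
     *                   at most pageSize items (empty list when exhausted);
     *                   in a JPA-backed implementation this would be a query
     *                   using setFirstResult/setMaxResults
     * @return total number of items processed
     */
    public static <T> int indexAll(BiFunction<Integer, Integer, List<T>> pageReader,
                                   int pageSize) {
        int offset = 0;
        int indexed = 0;
        while (true) {
            List<T> page = pageReader.apply(offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            // Each page would be converted to Solr documents and committed
            // before the next page is fetched, keeping heap usage bounded
            // by the page size.
            indexed += page.size();
            offset += pageSize;
        }
        return indexed;
    }

    public static void main(String[] args) {
        // Simulate a catalog of 1,050 products with a page size of 100.
        List<Integer> catalog = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            catalog.add(i);
        }
        int total = indexAll((offset, size) ->
                catalog.subList(Math.min(offset, catalog.size()),
                                Math.min(offset + size, catalog.size())),
                100);
        System.out.println("Indexed " + total + " products");
    }
}
```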

Re: Configuration settings for large catalog

Posted: Wed Mar 20, 2013 8:43 pm
by fdboles
I downloaded the Eclipse Workspace package; its POM states version 2.0.2-GA, and it appears the current Eclipse Workspace download is still at this version. I see that GitHub has version 2.2.0-GA. I have not attempted to work with the GitHub version yet, but I can try cloning it and building a configuration based on that.