|
Hi all,
I want to store the EHRs (electronic health records) of a state with 10,000,000 people in eXist-db. That means a very large number of XML files to manage, so I want to test eXist-db's performance on large numbers of small files (1-50 KB). Does anyone have experience to share, or good ideas?
Problems:
more than 100,000,000 files to manage.
Query performance?
Versioning or updating a person's EHR file: how is the update performance?
Index: how to index person info (name, ID, address, telephone, etc.), and how to index EHR sections (diagnosis, drug)? Based on my testing, reindexing the whole collection is impossible right now.
Backup: I plan to store the data on Linux Btrfs and use its snapshots for backup. Is that OK?
Thanks.
easy
_______________________________________________ Exist-open mailing list [hidden email] https://lists.sourceforge.net/lists/listinfo/exist-open |
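To put the numbers in the question in perspective, here is a rough sizing sketch. It uses only the figures stated in the post (100,000,000 files, 1-50 KB each, 10,000,000 people); the function name is made up for illustration, and the result is the raw XML volume only, without any allowance for indexes, journal files or database overhead.

```python
# Back-of-the-envelope sizing using only the figures from the post above.
FILE_COUNT = 100_000_000   # "more than 100000000 files"
MIN_KB, MAX_KB = 1, 50     # "small file (1-50kb)"
PEOPLE = 10_000_000        # population of the state


def total_gib(file_count: int, size_kb: float) -> float:
    """Raw XML volume in GiB for file_count files of size_kb KiB each."""
    return file_count * size_kb / (1024 * 1024)


low = total_gib(FILE_COUNT, MIN_KB)     # every file at the 1 KB minimum
high = total_gib(FILE_COUNT, MAX_KB)    # every file at the 50 KB maximum
files_per_person = FILE_COUNT / PEOPLE  # average documents per person

print(f"raw XML volume: {low:.0f} GiB to {high / 1024:.2f} TiB")
print(f"average files per person: {files_per_person:.0f}")
```

So the raw data alone lies somewhere between roughly 95 GiB and a bit under 5 TiB, which is why the real size distribution of the records matters a lot for planning.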
|
I will run the tests step by step and report the results to everyone:
1) I plan to test on eXist-db 1.4.2 and on 2.1-dev from the branch, using Ubuntu 11 on x64 hardware with 4 cores and 16 GB RAM.
Changed in conf.xml:
<db-connection cacheSize="888M" collectionCache="256M" database="native_cluster" (2.1-dev uses "native") files="e:\eXist\webapp\WEB-INF\data" pageSize="4096" nodesBuffer="2000" doc-ids="default">
The server is started with -Xmx2048m (should this be increased for 16 GB RAM?).
2) I will get the test data via the website of another XML database, basex.org, which reports having handled more than 1 billion files. BaseX publishes usage statistics: http://docs.basex.org/wiki/Statistics. By the way, does eXist-db have similar data?
I plan to use this data set for the test: http://www.mpi-inf.mpg.de/departments/d5/software/inex.
Statistics: 50.7 GiB uncompressed size, 2,666,190 articles.
I am staying with eXist-db because when I loaded the data into BaseX, loading started quickly but soon ran out of memory and the load broke off. Maybe I made a mistake, but I still like eXist-db, perhaps because it was the first one I got to know; I used it 5 years ago. :) |
|
Hi,
On Sunday, April 22, 2012 at 10:28, lin_xd wrote:
As with any database software, you will need to tune eXist-db. What is your strategy? Cheers, Dannes -- Dannes Wessels eXist-db Open Source Native XML Database w: http://www.exist-db.org |
|
In reply to this post by lin_xd
Load the files with the Java client in embedded mode; that is the fastest way to do it.
On Sun, Apr 22, 2012 at 1:28 PM, lin_xd <[hidden email]> wrote: > I will do the test step by step, reported for all: -- Dmitriy Shabanov |
|
In reply to this post by Dannes Wessels-3
I will run some tests first and try some basic configuration changes, such as the JVM settings, cache, index mask, collection structure, etc. Do you have any suggestions? -- Regards, easy At 2012-04-22 20:47:33, "Dannes Wessels-3 [via eXist]" <[hidden email]> wrote: |
|
> I will do some tests and try first, do some basic config changes, such as jvm,
> cache, index mask, collection structure, etc.
> Any suggestions from you?
I used half of the 50 GB data set you mentioned to test the performance of my recent changes to the Lucene indexing. Uploading and querying do work with a few tweaks.
Most important, the size calculation for the collection cache has a bug, so to store and query a larger number of documents and collections, you need to specify a high value for the collection cache in conf.xml. I used collectionCache="10000M". In reality it consumes a lot less, but the higher value is needed to work around the bug. I'll try to fix the issue.
Also, I set cacheSize="1024M" because I wanted to create a bunch of indexes. -Xmx was set to 8192m.
Wolfgang |
|
In reply to this post by lin_xd
I have successfully stored over 16,000,000 XML documents in eXist-db in the past; these were all small, between 1 KB and 4 KB each. Querying was still possible, but managing the database with the Java admin client was difficult. However, some improvements have been made since then, and there are other ways to manage the database. But I suggest you really do your own testing... -- Adam Retter eXist Developer { United Kingdom } [hidden email] irc://irc.freenode.net/existdb |
|
In reply to this post by Dmitriy Shabanov
Hi, thanks. But how do I do that in embedded mode?
|
|
Change the mode from "remote" to "embedded" in the startup form.
On Sat, Apr 28, 2012 at 2:54 PM, lin_xd <[hidden email]> wrote: > hi, thanks. but how to do in embedded mode? -- Dmitriy Shabanov |
|
Thanks.
When I ran it, I got this:
java.lang.NullPointerException
    at org.exist.client.InteractiveClient.getGroupName(InteractiveClient.java:356)
    at org.exist.client.InteractiveClient.getResources(InteractiveClient.java:390)
    at org.exist.client.InteractiveClient.run(InteractiveClient.java:2472)
    at org.exist.client.InteractiveClient.main(InteractiveClient.java:284)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.exist.start.Main.invokeMain(Main.java:137)
    at org.exist.start.Main.run(Main.java:463)
    at org.exist.start.Main.main(Main.java:59) |
|
I got it: the data has errors.
|
|
In reply to this post by lin_xd
On my laptop it takes more than 10 hours for about 600,000 files.
More than 2 hours in, it is at just 19%. I am using embedded mode with -Xmx512m on Windows 7 with 32-bit Java 1.6.0_31, reading from the d:\ disk and saving to the e:\ logical disk. |
|
~16.6 files per second ... not too bad....
On Sat, Apr 28, 2012 at 9:27 PM, lin_xd <[hidden email]> wrote: > In my laptop, more than 10hours for about 600000 files. -- Dmitriy Shabanov |
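The arithmetic behind the "~16.6 files per second" figure, and what that rate would mean for the full data set, can be sketched as follows. The inputs are the numbers reported in the thread (about 600,000 files in about 10 hours, 19% done after 2 hours, a target of 100,000,000 files); the helper names are made up for illustration, and the extrapolation assumes the rate stays constant, which is unlikely to hold exactly as the database grows.

```python
def files_per_second(files: int, hours: float) -> float:
    """Average ingest rate for a load of `files` documents over `hours`."""
    return files / (hours * 3600)


def estimated_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation of the total run time from the progress so far."""
    return elapsed_hours / fraction_done


rate = files_per_second(600_000, 10)         # the laptop run reported above
total_hours = estimated_total_hours(2, 0.19) # "more than 2 hours, just 19%"
days_for_target = 100_000_000 / rate / 86_400  # 100 million files at that rate

print(f"{rate:.1f} files/s")
print(f"estimated total for this run: {total_hours:.1f} h")
print(f"100,000,000 files at this rate: {days_for_target:.0f} days")
```

The two progress reports agree with each other (about 10.5 hours in total), and at the same rate the full 100,000,000 files would need roughly 69 days of continuous loading.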
|
Hi easy, I'm extremely interested in your results. Thanks for keeping us all posted! Cheers, Patrick
On Sat, Apr 28, 2012 at 12:50 PM, Dmitriy Shabanov <[hidden email]> wrote: > ~16.6 files per second ... not too bad.... -- Patrick Bosek Jorsek Software Cell (585) 820 9634 Office (877) 492 2960 Jorsek.com |
|
> I'm extremely interested in your results. Thanks for keeping us all posted!
+1 |
|
In reply to this post by lin_xd
I think you may need a bit more than 512m main memory. Even though eXist might be able to crawl along with a small configuration, you will see a major performance breakdown if the cache gets too small to keep the inner B-tree index pages in memory.
The type and number of indexes is very important as well. In particular, the Lucene index can be a bottleneck. The changes I committed yesterday (update to Lucene 3.2) fix this issue and the upload becomes a lot smoother.
In my last test yesterday I created range and Lucene indexes on some header fields plus the entire body of each article. This generates a rather comprehensive full-text index due to the amount of text. Uploading 665,000 articles (12 GB of XML) took about 4.5 hours using the client in embedded mode with -Xmx8192m on a Windows server. I also changed the following settings in conf.xml:
<db-connection cacheSize="768M" checkMaxCacheSize="true" collectionCache="10000M" database="native" files="webapp/WEB-INF/data" pageSize="4096" nodesBuffer="-1" cacheShrinkThreshold="10000" doc-ids="default">
<module id="lucene-index" buffer="768" class="org.exist.indexing.lucene.LuceneIndex" />
I guess this configuration would also work with 4096m or less main memory. The important settings are cacheSize, collectionCache and nodesBuffer. The cacheSize needs to be large enough or eXist will start crawling.
Note: I was using the latest eXist-db trunk.
Wolfgang
2012/4/28 lin_xd <[hidden email]>: > In my laptop, more than 10hours for about 600000 files. > more than 2hours now,just 19%. use embeded mode ,-X512mx, > win7 ,java x32. 1.6.0_31.read from d:\ disk,save to e:\ logical disk |
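To see how the cache settings in this post relate to the JVM heap, one can read the attributes straight out of the db-connection element. The sketch below copies the attribute values from the post and self-closes the element so that it parses on its own (an adjustment made here, not part of the original fragment). The `size_mb` helper is made up for illustration and only understands the `M` suffix used in this thread.

```python
import xml.etree.ElementTree as ET

# Attribute values copied from the post; the element is self-closed here
# only so that the fragment is well-formed XML by itself.
FRAGMENT = """<db-connection cacheSize="768M" checkMaxCacheSize="true"
    collectionCache="10000M" database="native" files="webapp/WEB-INF/data"
    pageSize="4096" nodesBuffer="-1" cacheShrinkThreshold="10000"
    doc-ids="default"/>"""

HEAP_MB = 8192  # -Xmx8192m, as stated in the post


def size_mb(value: str) -> int:
    """Convert a size such as '768M' to megabytes (only the M suffix is handled)."""
    if not value.upper().endswith("M"):
        raise ValueError(f"unsupported size value: {value!r}")
    return int(value[:-1])


conn = ET.fromstring(FRAGMENT)
cache_mb = size_mb(conn.get("cacheSize"))
coll_mb = size_mb(conn.get("collectionCache"))
cache_share = cache_mb / HEAP_MB

print(f"cacheSize: {cache_mb} MB = {cache_share:.1%} of the heap")
print(f"collectionCache: {coll_mb} MB, larger than the heap: {coll_mb > HEAP_MB}")
```

The nominal collectionCache value is larger than the whole heap, which fits the earlier remark in this thread that the high value only works around a size-calculation bug and that real consumption is much lower.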
|
Unfortunately, after updating to trunk I get:
org.exist.EXistException: database instance 'exist' is not available
    at org.exist.storage.BrokerPool.getInstance(BrokerPool.java:299)
    at org.exist.storage.BrokerPool.getInstance(BrokerPool.java:285)
    at org.exist.jetty.JettyStart.run(JettyStart.java:179)
    at org.exist.jetty.JettyStart.main(JettyStart.java:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.exist.start.Main.invokeMain(Main.java:137)
    at org.exist.start.Main.run(Main.java:463)
    at org.exist.start.Main.main(Main.java:59) |
|
> badly, I updated the trunk. but get :
> org.exist.EXistException: database instance 'exist' is not available
> at org.exist.storage.BrokerPool.getInstance(BrokerPool.java:299)
This usually means eXist could not start the database due to 1) another db already running, 2) lock files not being released, 3) other internal errors. The log files (exist.log) should contain more information.
Wolfgang |
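For cause 2), a quick way to look for leftover lock files is to scan the data directory. This is only a sketch: it assumes that lock files carry a `.lck` extension, and the file names used in the demo (`dbx_dir.lck`, `journal.lck`) are illustrative, so check what your own data directory actually contains. The demo runs against a throwaway temporary directory, not a real eXist-db installation.

```python
import tempfile
from pathlib import Path


def find_lock_files(data_dir):
    """Return all *.lck files below data_dir, sorted by path.

    The .lck extension is an assumption about how lock files are named.
    """
    return sorted(p for p in Path(data_dir).rglob("*.lck") if p.is_file())


# Demo on a temporary directory with illustrative file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "journal").mkdir()
    (root / "dbx_dir.lck").touch()
    (root / "journal" / "journal.lck").touch()
    (root / "dom.dbx").touch()  # a regular data file, must not be reported
    found = [p.relative_to(root).as_posix() for p in find_lock_files(root)]

print(found)
```

Only remove lock files that such a scan reports after making sure no database process is still running.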
|
> This usually means eXist could not start the database due to 1)
> another db already running, 2) lock files not being released, 3) other
> internal errors. The log files (exist.log) should contain more
> information.
Sorry, I just saw that rev 16318 is broken due to an incomplete commit. Rev 16317 is OK.
Wolfgang |
|
Thanks. There is no other instance running, and all lock files have been removed. The log:
2012-05-02 21:14:36,636 [main] DEBUG (ConfigurationHelper.java [getExistHome]:61) - Could not retrieve instance of brokerpool: database instance 'exist' is not available
2012-05-02 21:14:36,636 [main] DEBUG (ConfigurationHelper.java [getExistHome]:68) - Got eXist home from system property 'exist.home': E:\opt\exist\trunk\eXist
2012-05-02 21:14:36,683 [main] DEBUG (ConfigurationHelper.java [getExistHome]:61) - Could not .......
2012-05-02 21:14:45,159 [main] DEBUG (NativeStructuralIndex.java [open]:51) - Creating 'structure.dbx'...
2012-05-02 21:14:45,159 [main] INFO (IndexManager.java [initIndex]:99) - Registered index org.exist.storage.structural.NativeStructuralIndex as structural-index
2012-05-02 21:14:45,221 [main] ERROR (BrokerPool.java [configure]:247) - Unable to initialize database instance 'exist': no database defined
java.lang.RuntimeException: no database defined
    at org.exist.storage.BrokerFactory.getInstance(BrokerFactory.java:51)
    at org.exist.storage.BrokerPool.createBroker(BrokerPool.java:1340)
    at org.exist.storage.BrokerPool.get(BrokerPool.java:1445)
    at org.exist.storage.BrokerPool.initialize(BrokerPool.java:814)
    at org.exist.storage.BrokerPool.<init>(BrokerPool.java:690)
    at org.exist.storage.BrokerPool.configure(BrokerPool.java:228)
    at org.exist.xmldb.DatabaseImpl.configure(DatabaseImpl.java:108)
    at org.exist.xmldb.DatabaseImpl.getLocalCollection(DatabaseImpl.java:183)
    at org.exist.xmldb.DatabaseImpl.getCollection(DatabaseImpl.java:163)
    at org.exist.xmldb.DatabaseImpl.getCollection(DatabaseImpl.java:158)
    at org.xmldb.api.DatabaseManager.getCollection(Unknown Source)
    at org.exist.client.InteractiveClient.connect(InteractiveClient.java:315)
    at org.exist.client.InteractiveClient.connectToDatabase(InteractiveClient.java:2338)
    at org.exist.client.InteractiveClient.run(InteractiveClient.java:2425)
    at org.exist.client.InteractiveClient.main(InteractiveClient.java:284)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.exist.start.Main.invokeMain(Main.java:137)
    at org.exist.start.Main.run(Main.java:463)
    at org.exist.start.Main.main(Main.java:59)
-- Regards, easy
Do not worry that the road ahead holds no friends; who in the world does not know you?
At 2012-05-02 20:43:08, "Wolfgang Meier-2 [via eXist]" <ml-node+s2174344n4603089h43@n4.nabble.com> wrote:
> Sorry, I just see that rev 16318 is broken due to an incomplete commit. Rev 16317 is ok.
> Wolfgang
If you reply to this email, your message will be added to the discussion below: http://exist.2174344.n4.nabble.com/Using-existdb-to-manage-our-state-s-EHR-the-problem-to-resolve-tp4577618p4603089.html |
