***********************************
* XWHEP-5.6.3                     *
*---------------------------------*
* Release date : 28 April 2009    *
* Author : Oleg Lodygensky        *
* lodygens@lal.in2p3.fr           *
***********************************

-A- Corrections

-A.0- To avoid confusion between XtremWeb and XWHEP, the latter now installs everything in /opt/XWHEP-server, /opt/XWHEP-worker and /opt/XWHEP-bridge.
      XWHEP packages (Debian, RedHat, Apple) now install XWHEP-something packages and not xtremweb-something packages.
      All previously existing files (scripts, configuration files, etc.) keep their "old" names (xtremweb.something) so as not to disturb those who have already developed on top of XWHEP.

-A.1- The default socket timeout is now set to 0.

-A.2- HTTPLauncher, which tries to find the latest JAR version and start the worker, has been corrected:
      * it now uses the latest JAR on the local FS;
      * it now stops immediately if it cannot write the downloaded JAR to the local FS;
      * it has been reported that the "-server" Java option is not available on all JVMs; HTTPLauncher therefore first tries to launch the worker with that option and, on error, retries without it.

-A.3- On server side, inter-thread deadlocks were found in the communication layer; they led to "unreachable" or "timed out" communication errors on the worker and client sides.
-A.4- A bug corrected on server side: it was impossible to reuse an application name, even if the application had been deleted.

-A.5- The worker now stores its own UID in its config file so that it does not appear several times in the server DB.

-A.6- Installers corrected (RPM, DPKG, Apple PKG): the FS tree must belong to the worker so that it can write the downloaded JAR, if any.

-B- New feature

-B.1- The worker can now manage dynamically linked applications (and not only statically linked ones).

**********************************
* XWHEP-5.6.2                    *
*--------------------------------*
* Release date : 6 April 2009    *
* Author : Oleg Lodygensky       *
* lodygens@lal.in2p3.fr          *
**********************************

This version corrects "black hole computing" where, under certain circumstances, correctly inserted data were not downloadable.
This was due to the fact that InetAddress.getCanonicalHostName() may return an IP address under certain network security conditions.
I therefore replaced InetAddress.getCanonicalHostName() with InetAddress.getHostName().
This clearly means that server addresses (scheduler, data server...) MUST be fully qualified in config files (e.g. auger9 is not enough; it must be set to auger9.lal.in2p3.fr).

===================================================================
M src/xtremweb/common/util.java
===================================================================

**********************************
* XWHEP-5.6.1                    *
*--------------------------------*
* Release date : 12 March 2009   *
* Author : Oleg Lodygensky       *
* lodygens@lal.in2p3.fr          *
**********************************

===================================================================
Sometimes, MySQL is strict on comment headers and accepts only "-- " and nothing else (e.g.
neither "---" nor "--")
M src/scripts/xwsetupdb.sql.in
===================================================================
Several bugs corrected.
Now creates a ZIP file to ease client deployment; this ZIP file contains only a client configuration template, in which LOGIN and PASSWORD must be set.
A bug corrected on group and user creation:
 * Debian commands: addgroup and adduser
 * RedHat (Scientific Linux) commands: groupadd and useradd
Now tries to create all packages (RPM, Debian, Apple), depending on the platform running the script itself.
M src/scripts/xwconfigure

Several bugs corrected.
M src/scripts/xtremwebconf.sh
M src/scripts/xtremweb
===================================================================
The following corrects a bug in the HTTPLauncher: the launcher needs to know where its JAR file is.
M src/scripts/xtremweb.bridge.pl
===================================================================
Improved log outputs.
M src/xtremweb/dispatcher/DBInterface.java
M src/xtremweb/common/Cache.java
M src/xtremweb/worker/CommManager.java
===================================================================
1. Use TCP SO_TIMEOUT (default 60s) to be able to detect when a server hangs.
2. New command XMLRPCCommandVersion to retrieve the server version.
   With this new command we no longer need the alive signal to retrieve the server version.
   Using the alive signal made twice as many workers appear connected.
3.
Better exception usage for better feedback to the user (not finished yet).
   The user is now made aware of SSL handshake errors (wrong public key).
M src/xtremweb/communications/*
===================================================================
Use absolute paths only: this mainly solves path problems in config files.
M src/xtremweb/dispatcher/Data.java
M src/xtremweb/common/util.java
M src/xtremweb/common/XWConfigurator.java
M src/xtremweb/common/Zipper.java
M src/xtremweb/archdep/XWExecPureJava.java
M src/xtremweb/archdep/XWUtilSolaris.java
M src/xtremweb/archdep/ArchDepFactory.java
M src/xtremweb/worker/ClassLauncherANTLR.java
M src/xtremweb/worker/ThreadWork.java
M src/xtremweb/worker/Work.java
===================================================================
A new property: XWCP contains the original full path to xtremweb.jar (needed by HTTPLauncher).
SO_TIMEOUT default is 60s.
M src/xtremweb/common/XWPropertyDefs.java
===================================================================
A new property: XWCP default values depend on the OS.
M src/xtremweb/common/XWOSes.java
===================================================================
Uses XWCP in case of download error.
M src/xtremweb/common/HTTPLauncher.java
===================================================================
1. When submitting a job, --xwenv now accepts a URI, a UID, a zip file, a file or a directory.
   The last two are then zipped by the client.
2.
Cache usage bug resolved.
M src/xtremweb/client/Client.java
===================================================================
A bug corrected for the HTTPLauncher: the launcher needs to know where its JAR file is.
M build/installers/macosx/xtremweb.worker/installer/PckRoot/usr/local/bin/xtremweb.worker
===================================================================
Manage the OS user that runs the xtremweb.worker.
A build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/adduser.sh
A build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/rmuser.sh
A build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/del-group.sh
A build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/del-user4group.sh
===================================================================
Removed because it has never been used nor tested.
D build/installers/win32/innoSetup/server.iss
===================================================================
Version is now defined at installation time.
M build/installers/win32/innoSetup/worker.iss
===================================================================
Updated.
M build/installers/win32/innoSetup/README.txt
===================================================================
The following do not need to be in SVN.
D build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/xwhepworker.keys
D build/installers/macosx/xtremweb.worker/installer/PckRoot/private/etc/xtremweb.worker/xtremweb.jar
===================================================================
Debian DPKG to install and remove xtremweb.worker.
M build/dpkg/xtremweb-worker.control
A build/dpkg/xtremweb-worker.postinst
A build/dpkg/xtremweb-worker.postrm
A build/dpkg/xtremweb-worker.preinst
A build/dpkg/xtremweb-worker.prerm
===================================================================
A simple example of an rpmmacro file.
M build/rpm/rpmmacro
===================================================================
A bug corrected.
M build/rpm/xtremweb-worker.spec
===================================================================
Version is now defined at installation time.
M build/dpkg/xtremweb-worker.control
M build/rpm/xtremweb-worker.spec
M build/rpm/xtremweb-server.spec
M build/rpm/xtremweb-src.spec
===================================================================
The following defines the version as 5.6.1.
M build/VERSION
===================================================================
Exclude unnecessary files from tar.gz.
M build/xwtarexclude.txt
===================================================================
Introduces Debian DPKG and debugs xwconfigure.
Creates a client config template.
M build/install.xml
===================================================================
Updated.
M INSTALL

**********************************
* Feb 16th 2009 : XWHEP 5.6.0
**********************************
+ this version reintroduces the launcher to help deployment and upgrades;
+ this version introduces the "binary package".

**********************************
* Feb 4th 2009 : XWHEP 5.5.0
**********************************
+ a memory leak bug solved on server side.

**********************************
* Jan 14th 2009 : XWHEP 5.4.0
**********************************
+ a bug corrected on the I/O layer.

**********************************
* Dec 16th 2008 : XWHEP 5.3.0
**********************************
+ on server side, event dates are now correct (submission date, execution date, completion date, etc.);
+ a bug corrected on the bridge.

**********************************
* Dec 4th, 2008 : XWHEP 5.2.0
**********************************
+ a bug corrected on server side, on communication handling: handlers do not hang any more.

**********************************
* Dec 1st, 2008 : XWHEP 5.1.0
**********************************
+ the EDGeS XWHEP to EGEE bridge now periodically sends a signal to XWHEP servers to facilitate global monitoring;
+ a new script "xtremweb.jra2" has been implemented to monitor known XWHEP servers; this script has been developed specifically for the EDGeS JRA2 activity;
+ a bug corrected on server side, on communication handling; it seems that the management of synchronized methods is JVM implementation dependent;
+ client scripts cleaned;
+ the worker configuration page can be used to manage a local pool of workers.

**********************************
* November 21st, 2008 :
**********************************
+ the EDGeS bridge from XWHEP to EGEE, allowing EGEE resource usage, is operational; it is monitored from here.

**********************************
* November 17th, 2008 : XWHEP 5.0.0
**********************************
+ the server uses a connection pool to avoid memory starvation. The server can manage up to 500 simultaneous connections. Beyond the pool size, incoming connections wait for an available handler;
+ the client has a new option --xwxml to provide an XML description file;
+ the communication layer has been simplified: there is only one send message for all object kinds;
+ the EGEE bridge has been stabilized.

**********************************
* October 16th, 2008 : XWHEP 4.1.0
**********************************
+ a bug corrected on worker side regarding job directory setup.

**********************************
* October 15th, 2008 : XWHEP 4.0.0
**********************************
+ the client can connect to different servers:
  # the client does not include any passphrase and does not encode passwords;
  # passwords are no longer encoded in the config file, nor in the database;
  # this does not introduce a security hole since communications are encrypted; it is the user's responsibility to ensure config file security;
  # this lightens compilation, which does not require SQL access any more.
+ a bug has been corrected on worker side, regarding data download;
+ notions of groups and sessions are (re)introduced. Groups and sessions aggregate jobs. Sessions are automatically removed on client disconnection (the client disconnects at shut down or user switch);
+ there is a bug in the client GUI: deleting and downloading several rows is now disabled. This is due to a bug in the table sorter; we do not correct it since Java 6 introduces native table sorters. It will be corrected when our package is ported to Java 6;
+ our package is now Java 6 compatible (even if we don't use Java 6 specific features, see above) and 64-bit compatible.

**********************************
* Sept 25th 2008 : XWHEP 3.1.0
**********************************
+ in the configuration file, the SLKeystore variable can now contain a relative path;
+ resource owners can open http://localhost:4324 to configure their worker;
+ cache management improved and slightly modified:
  # in general, information stored in the cache is not downloaded again from the server. There are three exceptions: works, tasks and hosts are always redownloaded from the server since this information is subject to frequent change;
  # the client keeps its cache from run to run. A new command (xwclean) is introduced to clean the client cache; this command is also available in the Comm menu of the GUI;
  # the worker cleans its local disk on shut down. Hence, the worker does not keep its cache from run to run.
+ a bug solved on server side: memory consumption is more stable.

The JAR file is now also provided. To update you have to
+ copy the JAR file into the lib directory and restart your server;
+ copy the JAR file to where launcher.url, in the worker config file, points; on next reboot workers will automatically download it.

The following figures show server memory consumption. The first one shows a starving behaviour: after only 1h30, the amount of allocated memory is as high as 78 MB.
The second one shows a more stable consumption: the amount is still at its initial value of only 31 MB after two hours.

**********************************
* Sept 10th 2008 : XWHEP 3.0.0
**********************************
+ X509 certificate proxy usage to enable resource sharing with institutional grids;
+ synchronization improved: each message now expects an answer from the server;
+ performance degradation solved on server side.

The next two figures show 1000 submissions received by the server. We can see the performance degradation on the first one; the second figure shows that the degradation is now solved. Total execution time is 2.5 times higher because messages now expect an answer from the server: this increases synchronization.

**********************************
* September 1st, 2008 : XWHEP 2.1.0
**********************************
+ the scheduler has been modified to improve performance. It is no longer a simple round-robin: it now searches the full task set to try to find a task that fits the worker's needs;
+ the autotest script now submits group tasks too.

The following SQL command shows the results more readably:

    select apps.name as app,label,
           hex(works.accessrights),hex(apps.accessrights),
           works.status,users.login as worker_login,
           users.rights as worker_level
    from works,apps,tasks,hosts,users
    where works.appuid=apps.uid
      and works.uid=tasks.uid
      and tasks.hostuid=hosts.uid
      and hosts.owneruid=users.uid
    order by works.label;

We can see that the public worker (whose login is worker) has run public jobs only (whose labels are public...); private workers (whose logins are user...) have run only jobs of their own identity (whose labels end with their own login).

[Figure: Autotest results]

**********************************
* August 29th, 2008 : 2.0.0
**********************************
+ bugs solved on cache management;
+ bugs solved on users and application management, on server side;
+ bugs solved on client GUI;
+ bugs solved on server certificate management.

To install this version, you must reinstall the database. The server is now certified by a self-signed SSL key which must be generated by createKeys. The next version will use an X.509 certificate certified by a CA.

Installation and deployment need the following actions (in that order: createKeys must be executed before install).

    $> make removeDB
    $> make installDB
    $> make clean
    $> make
    $> make createKeys
    $> make install

For a production deployment, keys must be safely stored; otherwise (if you lose or accidentally regenerate the keys) a full re-deployment is necessary.

Electronic key usage has a cost in terms of communication. The figure on the left shows the number of TCP packets needed without SSL. The figure on the right shows the number with SSL: the packet count doubles.

[Figures: TCP packet count without SSL (left) and with SSL (right)]

A script to auto test the platform is now provided in the bin directory:

    $> xtremweb.tests.pl

You must have privileged rights on the platform to run the script (as provided by the default client config file). The script does the following:
+ insert a new public application
+ insert two new user groups
+ insert 6 new users: two users per group and two users with no group
+ insert a private application per user
+ insert 12 jobs: one private and one public per user
+ launch one public and 6 private workers on local host
+ monitor the jobs

At the end of the script, we can see that all jobs are COMPLETED. We can verify this with the following SQL command (we can't check this with the client because the client does not show the worker identity for each job):

    select works.status,works.label,hosts.name,users.login \
    from users,works,tasks,hosts \
    where tasks.uid=works.uid \
    and tasks.hostuid=hosts.uid and \
    hosts.owneruid=users.uid \
    order by users.login;

The next figure shows the auto test results. We can see that the public worker (whose login is worker) has run public jobs only (whose labels are public...); private workers (whose logins are user...)
have run only jobs of their own identity (whose labels end with their own login).

[Figure: Autotest results]

**********************************
* July 30th, 2008: you can download the MSDev deployment solution
**********************************

**********************************
* July 24th 2008 : 1.2.0
**********************************
A bug found on worker side, in standard input (stdin) management.

There is still a problem if the user application tests stdin availability. There is a good chance that the platform has not had time to set stdin correctly before the application tests it. This is due to the Java language used to develop the platform.

User application developers should not test stdin but just read it. If data availability from stdin is only an option for the application, please use a text file instead.

Example: the following works correctly if myApp reads from stdin, but does not test stdin

    $> myApp < aFile

Otherwise, if data from stdin is optional, please modify your application so that you can retrieve your optional data from a text file, as in:

    $> myApp -f aFile

Sorry for the inconvenience.

**********************************
* July 21st 2008 : 1.1.0
**********************************
+ introducing X.509 proxy management
+ a bug solved on client side, on data download

To install this version, you must reinstall the database

    $> make removeDB
    $> make installDB
    $> make install

**********************************
* July 17th 2008 : 1.0.31
**********************************
+ a bug found on worker side, on error management
+ a bug found on input/output
+ the worker HTML page is now customizable
+ performance improved thanks to better cache management

The following figures show 1000 job submissions. We can see that, with the I/O correction and cache usage, four times fewer internal calls are needed and execution is 14 times faster.

**********************************
* Jun 19th 2008 : 1.0.30
**********************************
A major bug found on task management, on server side.
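
Note on the stdin advice given in the 1.2.0 entry above: the recommended pattern can be sketched in Java as follows. This is only an illustration under assumptions, not code from XWHEP: the class name MyApp, the helper readAll and the exact handling of the -f option are hypothetical and simply mirror the two command lines shown in that entry. The application never asks whether stdin is ready; it either blocks on a plain read or takes its optional data from a text file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical user application (not part of XWHEP).
class MyApp {

    // Reads every line until end of stream. The read simply blocks until
    // data arrives; availability of the stream is never tested first.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader source;
        if (args.length == 2 && "-f".equals(args[0])) {
            // Optional data taken from a text file: myApp -f aFile
            source = Files.newBufferedReader(Paths.get(args[1]), StandardCharsets.UTF_8);
        } else {
            // Data taken from stdin: myApp < aFile
            // No System.in.available() call here: just read.
            source = new InputStreamReader(System.in, StandardCharsets.UTF_8);
        }
        System.out.print(readAll(source));
    }
}
```

Run without arguments, the sketch reads stdin to the end; run with -f aFile, it reads the file and never touches stdin.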

**********************************
* May 22nd 2008 : 1.0.29
**********************************
Three bugs corrected
+ config output when inserting a new user;
+ task downloads;
+ data URIs.

**********************************
* May 7th, 2008 : 1.0.28
**********************************
Minor clean up only.

**********************************
* Apr 28th, 2008 : version 1.0.27 successfully tested on Grid5000 : 12000 jobs executed over 200 workers
**********************************

**********************************
* Apr, 25th 2008 : 1.0.27
**********************************
Bug resolved in result storage protocol.

**********************************
* Apr, 23rd 2008 : 1.0.26
**********************************
Bug resolved in data submission on client side.

**********************************
* Apr, 17th 2008 : 1.0.25
**********************************
A database access bug resolved.

**********************************
* Apr, 15th 2008 : 1.0.24
**********************************
We don't use log4j any more, since we suspect memory leaks.

**********************************
* Apr, 8th 2008 : 1.0.23
**********************************
A bug resolved on result upload on worker side.

**********************************
* Apr, 4th 2008 : 1.0.22
**********************************
A bug resolved on communication layer.

**********************************
* Apr, 3rd 2008 : 1.0.21
**********************************
A bug resolved on worker side.

**********************************
* Apr, 2nd 2008 : 1.0.20
**********************************
A bug resolved on client side; result download works correctly now.

**********************************
* Apr, 1st 2008 : 1.0.19
**********************************
The client automatically creates data when submitting jobs with stdin and/or environment.

**********************************
* Mar 17th 2008 : a bug on communication delays resolved.
**********************************

**********************************
* Feb 20th 2008 : a bug on the HTTP layer is under investigation; the TCP layer is the default one until further notice.
**********************************

**********************************
* Feb 7th 2008 : two bugs solved :
**********************************
+ inter-thread synchronization on server side;
+ public/private IP resolution.

**********************************
* Jan 25th 2008 : the scheduler has been debugged; it now correctly manages private, group and public applications and workers:
**********************************
+ a public worker has "WORKER_USER" user rights; it can manage public jobs only (job access rights include o+rx);
+ a group worker is a public worker (with "WORKER_USER" user rights) belonging to a group; it can manage group jobs only (job access rights include g+rx);
+ a private worker is a non-public worker (without "WORKER_USER" user rights) and can manage its own user's jobs only (job access rights include u+rx).

**********************************
* Jan 16th 2008 : a bug found on client level;
**********************************

**********************************
* Jan 11th 2008 : a bug found on DB management;
**********************************

**********************************
* Jan 8th 2008 : installers ready; client debugged;
**********************************

**********************************
* Dec 21st 2007 : The middleware meets the requirements. Deeper tests in progress;
**********************************

**********************************
* Nov 26th 2007 : The middleware is in final testing.
**********************************

**********************************
* XWHEP-1.0.13.tar.bz2 : 22/11/2007
**********************************
- debugging: everything is fine with TCP, but file upload/download does not work with HTTP

**********************************
* XWHEP-1.0.12.tar.bz2 : 22/11/2007
**********************************
- cache removed from client; the cache is now at the communication level in CommClient.java
- the client GUI is in its last rewriting step

**********************************
* XWHEP-1.0.11.tar.bz2 : 22/11/2007
**********************************
- still rewriting client GUI

**********************************
* XWHEP-1.0.10.tar.bz2 : 19/11/2007
**********************************
- rewriting client GUI

**********************************
* XWHEP-1.0.9.tar.bz2 : 13/11/2007
**********************************
- the activator is not in the worker any more but in threadlaunch itself
- the activator can be changed by HTTPStatHadler
- any modification is saved to the config file

**********************************
* XWHEP-1.0.8.tar.bz2 : 8/11/2007
**********************************
- The new scheduling is in MatchingScheduler and not SimpleScheduler
- the worker owner can configure its worker through HTTP (to be continued)

**********************************
* XWHEP-1.0.7.tar.bz2 : 6/11/2007
**********************************
- The scheduling understands public/private workers

**********************************
* XWHEP-1.0.6.tar.bz2 : 26/10/2007
**********************************
- There is only one XMLRPCCommandGet replacing XMLRPCCommandGetApp, XMLRPCCommandGetData etc.
- Remove one-jar: using one-jar is not compatible with HTTPLauncher, nor with Java Web Start, since the one-jar jar file must be in the classpath

**********************************
* XWHEP-1.0.5.tar.bz2 : 22/10/2007
**********************************
- Introducing one-jar