products:duplicates [2009/11/26 21:14]
sebastian created
products:duplicates [2023/11/19 22:46] (current)
  
===== Usage =====
As soon as the duplicates aspect is assigned to a document, you can see the number of duplicates in the document's details.
Make a copy of a document that has the duplicates aspect. The number of duplicates should now be other than 0. Additionally, in the web client's folder view there is a blue arrow on each document that has duplicates. The blue arrow is also shown in the document's details view. Click it to see all of the document's duplicates.
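The page does not say which hash algorithm the Duplicate Finder uses or exactly how it computes the count. The following Python sketch models the behaviour described above under two assumptions. First, it assumes the count is the number of //other// documents whose content has the same hash. Second, it uses SHA-256 purely for illustration. The function names are hypothetical and do not come from the add-on.

<code python>
import hashlib
from collections import Counter


def content_hash(content: bytes) -> str:
    """Hash a document's content (SHA-256 is an assumption, not the add-on's documented algorithm)."""
    return hashlib.sha256(content).hexdigest()


def duplicate_counts(documents: dict[str, bytes]) -> dict[str, int]:
    """For each document name, count how many *other* documents share its content hash."""
    hashes = {name: content_hash(data) for name, data in documents.items()}
    per_hash = Counter(hashes.values())
    # Subtract 1 so that a document is not counted as its own duplicate.
    return {name: per_hash[h] - 1 for name, h in hashes.items()}


docs = {"report.pdf": b"quarterly numbers", "notes.txt": b"meeting notes"}
print(duplicate_counts(docs))  # {'report.pdf': 0, 'notes.txt': 0}

# Copying report.pdf raises the count of both copies above 0.
docs["Copy of report.pdf"] = docs["report.pdf"]
print(duplicate_counts(docs))  # {'report.pdf': 1, 'notes.txt': 0, 'Copy of report.pdf': 1}
</code>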
  
===== Initially assign the duplicates aspect to all documents =====
Using behaviours, the duplicates aspect (hashsum and duplicate count) is automatically assigned to a document as soon as the document is created or its content is changed. If you install the Duplicate Finder in a database that already contains many documents, however, all of these documents have to be processed by the Duplicate Finder once. This chapter describes how to do this initial processing of all documents already in the database. If you install the Duplicate Finder in a fresh database, you can skip this chapter.
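The paragraph above explains why a one-off pass is needed. A behaviour fires only when a document is created or its content changes, so documents that existed before installation never receive the aspect on their own. The following Python sketch is a simplified model of that gap and is not Alfresco code. The Repository and Document classes, the installation flag and the backfill method are hypothetical, and SHA-256 stands in for the add-on's undocumented hash.

<code python>
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    name: str
    content: bytes
    hashsum: Optional[str] = None  # None means the duplicates aspect is not assigned yet


@dataclass
class Repository:
    documents: list[Document] = field(default_factory=list)
    duplicate_finder_installed: bool = False

    def _assign_aspect(self, doc: Document) -> None:
        doc.hashsum = hashlib.sha256(doc.content).hexdigest()

    def create(self, name: str, content: bytes) -> Document:
        doc = Document(name, content)
        self.documents.append(doc)
        # Behaviour-style hook: only runs for documents created after installation.
        if self.duplicate_finder_installed:
            self._assign_aspect(doc)
        return doc

    def update_content(self, doc: Document, content: bytes) -> None:
        doc.content = content
        if self.duplicate_finder_installed:
            self._assign_aspect(doc)

    def backfill(self) -> int:
        """One-off pass over existing documents that never triggered the hook."""
        missing = [d for d in self.documents if d.hashsum is None]
        for doc in missing:
            self._assign_aspect(doc)
        return len(missing)


repo = Repository()
legacy = repo.create("legacy.doc", b"old content")  # created before installation
repo.duplicate_finder_installed = True
fresh = repo.create("new.doc", b"new content")      # the hook fires
print(legacy.hashsum is None, fresh.hashsum is None)  # True False
print(repo.backfill())                                # 1
</code>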
To initially process all documents in the database, you have to run the UpdateDuplicatesAction as a scheduled action. This is quite simple:
  * Install duplicates.amp into your alfresco.war (see previous chapter).
  * Go to the <TOMCAT_ROOT>/shared/classes/alfresco/extension folder.
  * Create a new file named scheduled-action-services-context.xml.
  * Put the following lines into this file:
<code>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

    <!--
    Define the model factory used to generate object models suitable for use with FreeMarker templates.
    -->
    <bean id="templateActionModelFactory" class="org.alfresco.repo.action.scheduled.FreeMarkerWithLuceneExtensionsModelFactory">
        <property name="serviceRegistry">
            <ref bean="ServiceRegistry"/>
        </property>
    </bean>

    <!-- An action that adds the duplicates aspect, the hashsum and the duplicates count to a node. -->
    <bean id="addDuplicatesTemplateActionDef"
        class="org.alfresco.repo.action.scheduled.SimpleTemplateActionDefinition">
        <property name="actionName">
            <value>updateDuplicatesAction</value>
        </property>
        <property name="parameterTemplates">
            <map>
            </map>
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory" />
        </property>
        <property name="dictionaryService">
            <ref bean="DictionaryService" />
        </property>
        <property name="actionService">
            <ref bean="ActionService" />
        </property>
        <property name="templateService">
            <ref bean="TemplateService" />
        </property>
    </bean>

    <bean id="addDuplicatesCron"
        class="org.alfresco.repo.action.scheduled.CronScheduledQueryBasedTemplateActionDefinition">
        <property name="transactionMode">
            <value>ISOLATED_TRANSACTIONS</value>
        </property>
        <property name="compensatingActionMode">
            <value>IGNORE</value>
        </property>
        <property name="searchService">
            <ref bean="SearchService" />
        </property>
        <property name="templateService">
            <ref bean="TemplateService" />
        </property>
        <property name="queryLanguage">
            <value>lucene</value>
        </property>
        <property name="stores">
            <list>
                <value>workspace://SpacesStore</value>
            </list>
        </property>
        <!-- Find all nodes except folders, links, categories, people, models, rules and other system nodes -->
        <property name="queryTemplate">
            <value>PATH:"//\*"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}systemfolder"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}folder"
                -TYPE:"{http://www.alfresco.org/model/application/1.0}folderlink"
                -TYPE:"{http://www.alfresco.org/model/application/1.0}filelink"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}category_root"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}category"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}dictionaryModel"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}link"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}person"
                -TYPE:"{http://www.alfresco.org/model/action/1.0}actioncondition"
                -TYPE:"{http://www.alfresco.org/model/content/1.0}authorityContainer"
                -TYPE:"{http://www.alfresco.org/model/rule/1.0}rule"</value>
        </property>
        <property name="cronExpression">
            <value>0 15 23 * * ?</value>
        </property>
        <property name="jobName">
            <value>jobDuplicates</value>
        </property>
        <property name="jobGroup">
            <value>jobGroup</value>
        </property>
        <property name="triggerName">
            <value>triggerDuplicates</value>
        </property>
        <property name="triggerGroup">
            <value>triggerGroup</value>
        </property>
        <property name="scheduler">
            <ref bean="schedulerFactory" />
        </property>
        <property name="actionService">
            <ref bean="ActionService" />
        </property>
        <property name="templateActionModelFactory">
            <ref bean="templateActionModelFactory" />
        </property>
        <property name="templateActionDefinition">
            <ref bean="addDuplicatesTemplateActionDef" />
        </property>
        <property name="transactionService">
            <ref bean="TransactionService" />
        </property>
        <property name="runAsUser">
            <value>System</value>
        </property>
    </bean>

</beans>
</code>
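The configuration above tells the scheduled job which nodes to process. It selects every node in workspace://SpacesStore whose type is not in the excluded list and then runs updateDuplicatesAction on each one. As I understand the ISOLATED_TRANSACTIONS mode, each node is handled in its own transaction. With compensatingActionMode set to IGNORE, a failure on one node is not compensated and does not stop the others. The following Python sketch is a simplified model of that selection and error isolation, not Alfresco's implementation. The node dictionaries and the update_duplicates callback are hypothetical.

<code python>
# Excluded type QNames, copied from the queryTemplate in the configuration above.
CM = "{http://www.alfresco.org/model/content/1.0}"
APP = "{http://www.alfresco.org/model/application/1.0}"
EXCLUDED_TYPES = {
    CM + "systemfolder", CM + "folder", APP + "folderlink", APP + "filelink",
    CM + "category_root", CM + "category", CM + "dictionaryModel", CM + "link",
    CM + "person", "{http://www.alfresco.org/model/action/1.0}actioncondition",
    CM + "authorityContainer", "{http://www.alfresco.org/model/rule/1.0}rule",
}


def select_nodes(nodes):
    """Mimic the query: keep every node whose type is not excluded."""
    return [n for n in nodes if n["type"] not in EXCLUDED_TYPES]


def run_isolated(nodes, action):
    """Run the action once per selected node; a failure is recorded and skipped."""
    succeeded, failed = [], []
    for node in select_nodes(nodes):
        try:
            action(node)  # each call stands in for one isolated transaction
            succeeded.append(node["name"])
        except Exception:
            failed.append(node["name"])  # IGNORE: no compensating action, carry on
    return succeeded, failed


def update_duplicates(node):
    """Hypothetical stand-in for updateDuplicatesAction."""
    if node["name"] == "broken.bin":
        raise OSError("content not readable")


nodes = [
    {"name": "Company Home", "type": CM + "folder"},
    {"name": "a.pdf", "type": CM + "content"},
    {"name": "broken.bin", "type": CM + "content"},
]
print(run_isolated(nodes, update_duplicates))  # (['a.pdf'], ['broken.bin'])
</code>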
  * Have a look at this property:
<code>
<property name="cronExpression">
    <value>0 15 23 * * ?</value>
</property>
</code>
That's where the start time of the action is set. The six fields are seconds, minutes, hours, day of month, month and day of week, so in my example the action will start at 15 minutes past 11 pm. To learn more about cron expressions, have a look at [[http://wiki.alfresco.com/wiki/Scheduled_Actions#Cron_Explained|Scheduled Actions - Alfresco Wiki]].
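The following Python sketch shows how a fixed daily expression such as 0 15 23 * * ? maps to a start time. It is not the Quartz parser that Alfresco uses. It handles only numeric seconds, minutes and hours with * or ? in the remaining fields, and it rejects anything else. The function names are hypothetical.

<code python>
from datetime import datetime, time, timedelta


def daily_fire_time(cron: str) -> time:
    """Read the time of day from a Quartz-style expression such as '0 15 23 * * ?'.

    Simplified: only fixed seconds/minutes/hours with '*' or '?' in all
    remaining fields (i.e. "every day at HH:MM:SS") are supported.
    """
    fields = cron.split()
    if len(fields) not in (6, 7):
        raise ValueError("Quartz expressions have 6 or 7 fields")
    seconds, minutes, hours, *rest = fields
    if any(f not in ("*", "?") for f in rest):
        raise ValueError("only daily schedules are handled by this sketch")
    # int() raises ValueError for ranges, steps or wildcards in these fields.
    return time(int(hours), int(minutes), int(seconds))


def next_run(cron: str, now: datetime) -> datetime:
    """Return the next moment strictly after `now` at which the daily job fires."""
    candidate = datetime.combine(now.date(), daily_fire_time(cron))
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(daily_fire_time("0 15 23 * * ?"))                           # 23:15:00
print(next_run("0 15 23 * * ?", datetime(2009, 11, 26, 21, 14)))  # 2009-11-26 23:15:00
print(next_run("0 15 23 * * ?", datetime(2009, 11, 26, 23, 30)))  # 2009-11-27 23:15:00
</code>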
  * Start Alfresco.
The processing of the nodes should start in the background at the given time. If you want some information about the processing done by the update duplicates action, set the log level of the action's logger to info. To do so, add this line
<code>
log4j.logger.de.hmedia.alfresco.actions=info
</code>
at the end of your <TOMCAT_ROOT>/webapps/alfresco/WEB-INF/classes/log4j.properties file.
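The log4j line above sets the level on a package-level logger rather than on a single class. Log4j loggers form a hierarchy by dotted name, so every logger below de.hmedia.alfresco.actions inherits the info level unless it is configured otherwise. Python's logging module follows the same naming rule, so it serves here as a stand-in model for how the inheritance works. This is not log4j itself, and the class name UpdateDuplicatesAction in the child logger is a hypothetical example.

<code python>
import logging

# Python equivalent of: log4j.logger.de.hmedia.alfresco.actions=info
logging.getLogger("de.hmedia.alfresco.actions").setLevel(logging.INFO)

# Hypothetical logger for a class inside that package; it has no level of its own.
child = logging.getLogger("de.hmedia.alfresco.actions.UpdateDuplicatesAction")

# The effective level is inherited from the nearest configured ancestor.
print(logging.getLevelName(child.getEffectiveLevel()))  # INFO
print(child.isEnabledFor(logging.INFO))                  # True
print(child.isEnabledFor(logging.DEBUG))                 # False
</code>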
products/duplicates.1259270083.txt.gz · Last modified: 2023/11/19 22:45 (external edit)