Index your Alfresco content using the new Amazon Kendra Alfresco connector
Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to index and search across multiple structured and unstructured repositories.
Alfresco Content Services provides open, flexible, highly scalable enterprise content management (ECM) capabilities with the added benefits of a content services platform, making content accessible wherever and however you work through easy integrations with the business applications you use every day. Many organizations use the Alfresco content management platform to store their content. One of the key requirements for enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the stored documents.
We're excited to announce that you can now use the new Amazon Kendra Alfresco connector to search documents stored in your Alfresco repositories and sites. In this post, we show how to use the new connector to retrieve documents stored in Alfresco for indexing and how to securely use the Amazon Kendra intelligent search function. In addition, ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.
What's new in the Amazon Kendra Alfresco connector
The Amazon Kendra Alfresco connector offers support for the following:
- Basic and OAuth2 authentication mechanisms for the Alfresco On-Premises (On-Prem) platform
- Basic and OAuth2 authentication mechanisms for the Alfresco PaaS platform
- Aspect-based crawling of Alfresco repository documents
Solution overview
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repositories and sites. The solution in this post demonstrates the following:
- Retrieval of documents and comments from Alfresco private sites and public sites
- Retrieval of documents and comments from Alfresco repositories using Amazon Kendra-specific aspects
- Authentication against the Alfresco On-Prem platform using Basic authentication, and against the Alfresco PaaS platform using Basic and OAuth2 authentication
- The Amazon Kendra search capability with access control across sites and repositories
If you're going to use only one of the platforms, you can still follow this post to build the example solution; just skip the steps for the platform that you're not using.
The following is a summary of the steps to build the example solution:
- Upload documents to the three Alfresco sites and the repository folder. Make sure the uploaded documents are unique across sites and repository folders.
- For the two private sites and the repository, use document-level Alfresco permission management to set access permissions. For the public site, you don't need to set up permissions at the document level. Note that permissions information is retrieved by the Amazon Kendra Alfresco connector and used for access control by the Amazon Kendra search function.
- For the two private sites and the repository, create a new Amazon Kendra index (you use the same index for both the private sites and the repository). For the public site, create a second, separate Amazon Kendra index.
- For the On-Prem private site, create an Amazon Kendra Alfresco data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the On-Prem repository documents with Amazon Kendra-specific aspects, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the PaaS private site, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
- For the PaaS public site, create a data source using OAuth2 authentication, within the Amazon Kendra index for public sites.
- Perform a sync for each data source.
- Run a test query in the Amazon Kendra index meant for private sites and the repository using access control.
- Run a test query in the Amazon Kendra index meant for public sites without access control.
Prerequisites
You need an AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies. You should have basic knowledge of AWS and how to navigate the AWS Management Console.
For the Alfresco On-Prem platform, complete the following steps:
- Create a private site or use an existing site.
- Create a repository folder or use an existing repository folder.
- Get the repository URL.
- Get Basic authentication credentials (user ID and password).
- Make sure that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.
- Get the public X509 certificate in .pem format and save it locally.
For the Alfresco PaaS platform, complete the following steps:
- Create a private site or use an existing site.
- Create a public site or use an existing site.
- Get the repository URL.
- Get Basic authentication credentials (user ID and password).
- Get OAuth2 credentials (client ID, client secret, and token URL).
- Confirm that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.
Step 1: Upload example documents
Each uploaded document must contain 5 MB or less of text. For more information, see Amazon Kendra Service Quotas. You can upload example documents or use existing documents within each site.
As shown in the following screenshot, we have uploaded four documents to the Alfresco On-Prem private site.
We have uploaded three documents to the Alfresco PaaS private site.
We have uploaded five documents to the Alfresco PaaS public site.
We have uploaded two documents to the Alfresco On-Prem repository.
Assign the aspect awskendra:indexControl to one or more documents in the repository folder.
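Assigning the aspect through the UI one document at a time gets tedious for larger folders. The following is a minimal sketch, assuming your On-Prem server exposes the Alfresco public v1 REST API and that the content model defining awskendra:indexControl is already deployed. It builds (but does not send) a node-update request that adds the aspect; the host, node ID, and credentials are placeholders. Because aspectNames in a node update replaces the node's whole aspect list, the function sends the node's existing aspects back together with the new one.

```python
import base64
import json
import urllib.request

# Path of the Alfresco public v1 REST API nodes endpoint (assumed default deployment).
NODES_API = "/alfresco/api/-default-/public/alfresco/versions/1/nodes/"


def build_add_aspect_request(base_url, node_id, existing_aspects, user, password,
                             aspect="awskendra:indexControl"):
    """Build a PUT request that adds `aspect` to a node without dropping its other aspects."""
    aspects = list(existing_aspects)
    if aspect not in aspects:
        aspects.append(aspect)
    body = json.dumps({"aspectNames": aspects}).encode("utf-8")
    # Basic authentication header from the (hypothetical) admin credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + NODES_API + node_id,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )
```

To apply it, read the node's current aspectNames with a GET on the same URL, pass them in as existing_aspects, and send the request with urllib.request.urlopen.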
Step 2: Configure Alfresco permissions
Use the Alfresco permission management feature to give example users access rights for viewing the uploaded documents. We assume that you have some example Alfresco user names, with email addresses, that can be used for setting permissions at the document level in private sites. These users are not used for crawling the sites.
In the following example for the On-Prem private site, we have given users My Dev User1 and My Dev User2 Site Consumer access to the example document. Repeat the same procedure for the other uploaded documents.
In the following example for the PaaS private site, we have given user Kendra User3 Site Consumer access to the example document. Repeat the same procedure for the other uploaded documents.
For the Alfresco repository documents, we have given user My Dev User1 Consumer access to the example document.
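The same document-level permissions can also be set programmatically. The following sketch, assuming the Alfresco public v1 REST API, builds the JSON body for a PUT to the node's URL (the nodes endpoint used to update a node, sent with Basic authentication) that grants one role to a list of users. The user IDs are hypothetical, and the role names (Consumer for repository documents, SiteConsumer for site documents) and turning off permission inheritance are our assumptions about how this example was configured, not something the connector requires.

```python
def build_permissions_body(user_ids, role="Consumer", inherit=False):
    """Build a v1 node-update body that grants `role` to each user on one document.

    user_ids: Alfresco user IDs (authority IDs), not display names.
    inherit: when False, the document stops inheriting permissions from its
             parent, so only the listed users (plus administrators) can read it.
    """
    return {
        "permissions": {
            "isInheritanceEnabled": inherit,
            "locallySet": [
                {"authorityId": user_id, "name": role, "accessStatus": "ALLOWED"}
                for user_id in user_ids
            ],
        }
    }
```

For a site document, you would call build_permissions_body(["devuser1", "devuser2"], role="SiteConsumer") and send the result with json.dumps as the request body.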
The following table lists the site or repository names, document names, and the users who have access to each document.
Platform | Site or Repository Name | Document Name | Users |
On-Prem | MyAlfrescoSite | ChannelMarketingBudget.xlsx | My Manager User3 |
On-Prem | MyAlfrescoSite | wellarchitected-sustainability-pillar.pdf | My Dev User1, My Dev User2 |
On-Prem | MyAlfrescoSite | WorkDocs.docx | My Dev User1, My Dev User2, My Manager User3 |
On-Prem | MyAlfrescoSite | WorldPopulation.csv | My Dev User1, My Dev User2, My Manager User3 |
PaaS | MyAlfrescoCloudSite2 | DDoS_White_Paper.pdf | Kendra User3 |
PaaS | MyAlfrescoCloudSite2 | wellarchitected-framework.pdf | Kendra User3 |
PaaS | MyAlfrescoCloudSite2 | ML_Training.pptx | Kendra User1 |
PaaS | MyAlfrescoCloudPublicSite | batch_user.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Amazon Simple Storage Service – User Guide.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | AWS Batch – User Guide.pdf | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Amazon Detective.docx | Everyone |
PaaS | MyAlfrescoCloudPublicSite | Pricing.xlsx | Everyone |
On-Prem | Repo: MyAlfrescoRepoFolder1 | Polly-dg.pdf (aspect awskendra:indexControl) | My Dev User1 |
On-Prem | Repo: MyAlfrescoRepoFolder1 | Transcribe-api.pdf (aspect awskendra:indexControl) | My Dev User1 |
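The table doubles as an answer key for the access-control test later in this post. The following sketch encodes the rows that feed the private index (the On-Prem private site, the PaaS private site, and the repository folder) and computes which documents a given user should be able to find. It treats visibility as plain set membership, which is enough here because the example grants access to individual users rather than groups.

```python
# Private-index rows from the permissions table: document name -> users granted access.
PRIVATE_INDEX_PERMISSIONS = {
    "ChannelMarketingBudget.xlsx": {"My Manager User3"},
    "wellarchitected-sustainability-pillar.pdf": {"My Dev User1", "My Dev User2"},
    "WorkDocs.docx": {"My Dev User1", "My Dev User2", "My Manager User3"},
    "WorldPopulation.csv": {"My Dev User1", "My Dev User2", "My Manager User3"},
    "DDoS_White_Paper.pdf": {"Kendra User3"},
    "wellarchitected-framework.pdf": {"Kendra User3"},
    "ML_Training.pptx": {"Kendra User1"},
    "Polly-dg.pdf": {"My Dev User1"},
    "Transcribe-api.pdf": {"My Dev User1"},
}


def expected_visible(user):
    """Return the private-index documents `user` should see, sorted by name."""
    return sorted(doc for doc, users in PRIVATE_INDEX_PERMISSIONS.items() if user in users)
```

Because Amazon Kendra matches users by the email address associated with the Alfresco user, a real check would key this mapping by email rather than by display name.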
Step 3: Set up Amazon Kendra indexes
You can create a new Amazon Kendra index or use an existing index for indexing documents hosted in Alfresco private sites. To create a new index, complete the following steps:
- On the Amazon Kendra console, create an index called Alfresco-Private.
- Create a new IAM role, then choose Next.
- For Access control, choose Yes.
- For Token type, choose JSON.
- Keep the user name and group as default.
- Choose None for user group expansion because we're assuming no integration with AWS IAM Identity Center (successor to AWS Single Sign-On).
- Choose Next.
- Choose Developer Edition for this example solution.
- Choose Create to create a new index.
The following screenshot shows the Alfresco-Private index after it has been created.
- You can verify the access control configuration on the User access control tab.
- Repeat these steps to create a second index called Alfresco-Public.
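The console choices above correspond to parameters of the Amazon Kendra CreateIndex API. The following sketch builds those parameters for a boto3 Kendra client (boto3.client("kendra")) and creates an index. The role ARN is a placeholder, and the JSON field names username and groups are our assumption about the console defaults, so confirm them on the User access control tab.

```python
def build_create_index_request(name, role_arn):
    """Keyword arguments for Kendra CreateIndex matching the console choices in Step 3."""
    return {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
        # Access control: Yes, with JSON user tokens.
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    # Assumed default field names for the user name and groups.
                    "UserNameAttributeField": "username",
                    "GroupAttributeField": "groups",
                }
            }
        ],
        # No IAM Identity Center integration, so no user group expansion.
        "UserGroupResolutionConfiguration": {"UserGroupResolutionMode": "NONE"},
    }


def create_index(kendra_client, name, role_arn):
    """Create the index with a boto3 Kendra client and return the new index ID."""
    response = kendra_client.create_index(**build_create_index_request(name, role_arn))
    return response["Id"]
```

Calling create_index twice, once for each index name, mirrors the "repeat these steps" instruction. Index creation is asynchronous, so wait for the index status to become ACTIVE before adding data sources.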
Step 4: Create a data source for the On-Prem private site
To create a data source for the On-Prem private site, complete the following steps:
- On the Amazon Kendra console, navigate to the Alfresco-Private index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-OnPrem-Private.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
To connect to the Alfresco On-Prem site, the connector needs access to the public certificate of the On-Prem server, which you saved locally as one of the prerequisites.
- In a different browser tab, upload the .pem file to an Amazon Simple Storage Service (Amazon S3) bucket in your account.
You use this S3 bucket name in the next steps.
- Return to the data source creation page.
- For Source, select Alfresco server.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
A pop-up window opens to create an AWS Secrets Manager secret.
- Enter a name for your secret, the user name, and the password, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler on.
- For IAM role, choose Create a new IAM role.
- Choose Next.
You can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we sync from the On-Prem private site.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoSite.
- Select Include comments to retrieve comments in addition to documents.
- For Sync mode, select Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
- On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
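If you script your setup, the certificate upload and the Basic authentication secret from Step 4 can be prepared with boto3 S3 and Secrets Manager clients. The following is a sketch under assumptions: the secret key names username and password are our guess at what the connector expects (check the Alfresco connector documentation), and the bucket, key, and secret name are placeholders. The data source's IAM role also needs permission to read the secret and the certificate object.

```python
import json


def build_basic_auth_secret(name, username, password):
    """create_secret kwargs for a Basic authentication secret (key names are assumptions)."""
    return {
        "Name": name,
        "SecretString": json.dumps({"username": username, "password": password}),
    }


def upload_certificate_and_create_secret(s3_client, secrets_client, pem_path, bucket, key,
                                         secret_name, username, password):
    """Upload the On-Prem server certificate, store the crawler credentials, return the secret ARN."""
    # The connector reads the server's public certificate from this S3 location.
    s3_client.upload_file(pem_path, bucket, key)
    # Store the Alfresco user name and password in AWS Secrets Manager.
    response = secrets_client.create_secret(
        **build_basic_auth_secret(secret_name, username, password)
    )
    return response["ARN"]
```

With real clients you would pass boto3.client("s3") and boto3.client("secretsmanager"), then select the resulting secret instead of choosing Create and add new secret.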
Step 5: Create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects
As in the previous steps, create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects:
- On the Amazon Kendra console, navigate to the Alfresco-Private index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-OnPrem-Aspects.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco server.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose the secret you created earlier.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
For this scope, the connector retrieves only those On-Prem repository documents that have been assigned the aspect awskendra:indexControl.
- For Content to sync, select Alfresco aspects sync.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
- On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
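Because the aspects sync picks up only documents that carry awskendra:indexControl, it can help to preview that set before syncing. The following sketch assumes the Alfresco Search API (v1) and its AFTS query language are available on your On-Prem server. It builds a search body that matches content nodes with the aspect and extracts document names from a response. You would POST the body as JSON to the endpoint below using the same Basic authentication credentials; the max_items default is an arbitrary choice.

```python
# Alfresco Search API v1 endpoint (assumed default deployment path).
SEARCH_API = "/alfresco/api/-default-/public/search/versions/1/search"


def build_aspect_search(aspect="awskendra:indexControl", max_items=100):
    """Search API body matching content nodes (not folders) that carry `aspect`."""
    return {
        "query": {
            "query": f'ASPECT:"{aspect}" AND TYPE:"cm:content"',
            "language": "afts",
        },
        "paging": {"maxItems": max_items, "skipCount": 0},
    }


def entry_names(response_json):
    """Pull the node names out of a Search API response body."""
    return [entry["entry"]["name"] for entry in response_json["list"]["entries"]]
```

For this example, the names returned should match the two repository documents marked with the aspect in the permissions table.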
Step 6: Create a data source for the PaaS private site
Follow steps similar to the previous sections to create a data source for the PaaS private site:
- On the Amazon Kendra console, navigate to the Alfresco-Private index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-Cloud-Private.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco cloud.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For Authentication, select Basic authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
- Enter a name for your secret, the user name, and the password, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
You can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we configure the data source to sync from the PaaS private site MyAlfrescoCloudSite2.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudSite2.
- Select Include comments.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
- On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
Step 7: Create a data source for the PaaS public site
We follow similar steps as before to create a data source for the PaaS public site:
- On the Amazon Kendra console, navigate to the Alfresco-Public index.
- Choose Data sources in the navigation pane.
- Choose Add data source.
- Choose Add connector for the Alfresco connector.
- For Data source name, enter Alfresco-Cloud-Public.
- Optionally, add a description.
- Keep the remaining settings as default and choose Next.
- For Source, select Alfresco cloud.
- For Alfresco repository URL, enter the repository URL (obtained as a prerequisite).
- For Alfresco user application URL, enter the same value as the repository URL.
- For Authentication, select OAuth2.0 authentication.
- For AWS Secrets Manager secret, choose Create and add new secret.
- Enter a name for your secret, the client ID, the client secret, and the token URL, then choose Save.
- For Virtual Private Cloud (VPC), choose No VPC.
- Turn the identity crawler off.
- For IAM role, choose Create a new IAM role.
- Choose Next.
We configure this data source to sync from the PaaS public site MyAlfrescoCloudPublicSite.
- For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudPublicSite.
- Optionally, select Include comments.
- For Sync mode, choose Full sync.
- For Frequency, choose Run on demand (or a different frequency option as needed).
- Choose Next.
- Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
- On the Review and Create page, verify all the information, then choose Add data source.
After the data source has been created, the data source page is displayed as shown in the following screenshot.
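A mistyped client ID, client secret, or token URL typically surfaces only when the first sync fails. As a quick sanity check before saving the secret, the following sketch builds an OAuth 2.0 client credentials token request as described in RFC 6749, section 4.4. It assumes your PaaS token endpoint accepts the client credentials grant with HTTP Basic client authentication; the connector's own token flow may differ, and the URL and credentials shown are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    # RFC 6749 section 2.3.1: form-urlencode the ID and secret before Basic encoding.
    user = urllib.parse.quote_plus(client_id)
    secret = urllib.parse.quote_plus(client_secret)
    credentials = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending the request with urllib.request.urlopen and finding an access_token field in the JSON response indicates that the three values are consistent.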
Step 8: Perform a sync for each data source
Navigate to each of the data sources and choose Sync now. Run only one synchronization at a time.
Wait for the synchronization to complete for all data sources. When the synchronization for a data source is complete, you see the status as shown in the following screenshot.
You can also view Amazon CloudWatch logs for a specific sync under Sync run history.
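If you prefer to script this step, the following sketch starts each sync with the Kendra StartDataSourceSyncJob API and polls ListDataSourceSyncJobs until that run reaches a terminal status before starting the next one, which follows the one-at-a-time guidance. It assumes a boto3 Kendra client; the index and data source IDs are placeholders, and the 60-second polling interval is an arbitrary choice.

```python
import time

# Statuses after which a sync run has stopped; SYNCING, SYNCING_INDEXING and STOPPING are still in progress.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def find_sync_job(kendra_client, index_id, data_source_id, execution_id):
    """Page through a data source's sync history and return one run, or None if not listed yet."""
    kwargs = {"Id": data_source_id, "IndexId": index_id}
    while True:
        page = kendra_client.list_data_source_sync_jobs(**kwargs)
        for job in page.get("History", []):
            if job["ExecutionId"] == execution_id:
                return job
        if not page.get("NextToken"):
            return None
        kwargs["NextToken"] = page["NextToken"]


def sync_one_at_a_time(kendra_client, index_id, data_source_ids, poll_seconds=60, sleep=time.sleep):
    """Sync each data source in turn and return {data_source_id: final status}."""
    results = {}
    for data_source_id in data_source_ids:
        started = kendra_client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
        execution_id = started["ExecutionId"]
        while True:
            job = find_sync_job(kendra_client, index_id, data_source_id, execution_id)
            if job is not None and job["Status"] in TERMINAL_STATUSES:
                results[data_source_id] = job["Status"]
                break
            sleep(poll_seconds)
    return results
```

The sleep parameter is injectable so the loop can be exercised without waiting; any status other than SUCCEEDED is worth checking in the CloudWatch logs for that run.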
Step 9: Run a test query in the private index using access control
Now it's time to test the solution. We first run a query in the private index using access control:
- On the Amazon Kendra console, navigate to the Alfresco-Private index and choose Search indexed content.
- Enter a query in the search field.
As shown in the following screenshot, Amazon Kendra didn't return any results, because no user token has been applied yet.
- Choose Apply token.
- Enter the email address of the My Dev User1 user and choose Apply.
Note that Amazon Kendra access control works based on the email address associated with an Alfresco user name.
- Run the search again.
The search returns a document list (containing wellarchitected-sustainability-pillar.pdf in the following example) based on the access control setup.
If you run the same query again and provide an email address that doesn't have access to these documents, you shouldn't see them in the results list.
- Enter another query to search the repository documents that were assigned the awskendra:indexControl aspect.
- Choose Apply token, enter the email address of the My Dev User1 user, and choose Apply.
- Rerun the query.
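The Apply token step in the console has an API equivalent: the Kendra Query API accepts a UserContext whose Token can be a JSON user token. The following sketch builds such a token and returns the titles a user can see. It assumes a boto3 Kendra client and that the index uses the JSON field names username and groups; those names must match the index's token configuration, and the email address is a placeholder.

```python
import json


def build_json_user_token(email, groups=()):
    """JSON user token; field names must match the index's JSON token configuration (assumed)."""
    return json.dumps({"username": email, "groups": list(groups)})


def query_as_user(kendra_client, index_id, query_text, email):
    """Query with access control applied and return the document titles this user may see."""
    response = kendra_client.query(
        IndexId=index_id,
        QueryText=query_text,
        UserContext={"Token": build_json_user_token(email)},
    )
    return [
        item.get("DocumentTitle", {}).get("Text", "")
        for item in response.get("ResultItems", [])
    ]
```

Running the same query with two different email addresses and comparing the returned titles reproduces the console check described above.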
Step 10: Run a test query in the public index without access control
Similarly, we can test our solution by running queries in the public index without access control:
- On the Amazon Kendra console, navigate to the Alfresco-Public index and choose Search indexed content.
- Run a search query.
Because this example Alfresco public site has not been set up with any access control, we don't use an access token.
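For the public index, the same query can be made through the API without any UserContext. The following sketch assumes a boto3 Kendra client and a placeholder index ID, and returns the title and URI of each result so you can confirm the public site documents are searchable.

```python
def query_public(kendra_client, index_id, query_text, page_size=10):
    """Query an index without a user token and return (title, URI) pairs."""
    response = kendra_client.query(IndexId=index_id, QueryText=query_text, PageSize=page_size)
    results = []
    for item in response.get("ResultItems", []):
        title = item.get("DocumentTitle", {}).get("Text", "")
        results.append((title, item.get("DocumentURI", "")))
    return results
```

Omitting UserContext here mirrors the console test, where no token is applied because the public site has no document-level permissions.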
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. Delete the newly added Alfresco data sources within the indexes. If you created new Amazon Kendra indexes while testing this solution, delete them as well.
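The cleanup can also be scripted with the Kendra DeleteDataSource and DeleteIndex APIs. The following sketch assumes a boto3 Kendra client and placeholder IDs. Both deletions run asynchronously, so the resources may remain visible in the console for a short time.

```python
def clean_up(kendra_client, index_id, data_source_ids, also_delete_index=False):
    """Delete the Alfresco data sources in an index, and optionally the index itself."""
    for data_source_id in data_source_ids:
        kendra_client.delete_data_source(Id=data_source_id, IndexId=index_id)
    if also_delete_index:
        # Only do this for indexes created specifically for this walkthrough.
        kendra_client.delete_index(Id=index_id)
```

Set also_delete_index only for indexes you created while testing; leave it off when you reused an existing index.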
Conclusion
With the new Alfresco connector for Amazon Kendra, organizations can securely tap into the information stored in their Alfresco repositories and sites using intelligent search powered by Amazon Kendra.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Alfresco, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
About the Authors
Arun Anand is a Senior Solutions Architect at Amazon Web Services based in the Houston area. He has 25+ years of experience in designing and developing enterprise applications. He works with partners in the Energy & Utilities segment, providing architectural and best practice recommendations for new and existing solutions.
Rajnish Shaw is a Senior Solutions Architect at Amazon Web Services, with a background as a Product Developer and Architect. Rajnish is passionate about helping customers build applications on the cloud. Outside of work, Rajnish enjoys spending time with family and friends, and traveling.
Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.