
A Year in the Desert of AEM: The Publish

You’re tired, thirsty and maybe a little cranky. I get it; Adobe Experience Manager can be a little frustrating and overwhelming. It might make you feel like you have been left in a desolate wasteland with nothing but your wits and a half empty—or half full—canteen. Never fear. With these articles you will have another tool in your belt to make it across the desert. Today we will cover the Publish server in AEM, and as mentioned in my introductory article, the Publish server is the workhorse of AEM. What do you mean you haven’t read my first two articles? The first article covers some basics about Adobe Experience Manager. The second article covers the Author server and its role within the AEM stack. I do sometimes reference back to those articles, and it could be helpful to read them before moving on, but I won’t make you.

Just like the Author, the Publish also runs as a Java application; in fact, most of the information about setup and startup from the Author applies to the Publish as well (they even use the same jar). This means that the steps to get a Publish server up and running are going to feel like déjà vu. You will obviously want to use the installation run mode of publish, and the default port changes as well (Author :4502, Publish :4503). Other than those changes, you are looking at the same process. Let’s move on and learn more about the Publish instance.
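For reference, here is a minimal sketch of what starting a local Publish instance can look like; the jar name, memory setting, and port are placeholders, so adjust them for your environment:

java -Xmx2048m -jar cq-quickstart.jar -r publish -p 4503

Alternatively, renaming the quickstart jar to something like cq-publish-p4503.jar before the first start lets the run mode and port be picked up from the filename.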

The Desert Oasis – A Return to Familiar Ground
The Publish server is your return to an oasis during your journey; you’re well equipped for the task, having already set up the Author. The Publish server in the AEM stack is a render host, which is the first way it differs from the Author instance. The Author pushes new content to the Publish server, and the Publish ingests that content. The Dispatcher sends requests to the Publish to be rendered and returned so it can cache and deliver them. The point I want to get across here is that the Publish is not starting these communications; instead it’s processing requests as they come in from the rest of the stack. Now, just because the Publish is zen regarding communication doesn’t mean it isn’t doing anything. In fact, most of the time the Publish will be the busier of the two Java applications. It has to render all that code, process the logic you or the developers created, and turn it into static content: HTML, CSS, JavaScript, etc. It also handles other tasks such as queries coming through the Dispatcher. The Publish is also where requests to flush the Dispatcher originate. These flush requests are normally only triggered after content has been ingested.

The second difference is that the Publish does not have the authoring UI. As such, you should not make changes directly on the Publish server. Normally I am not going to tell you how to manage your stack; instead I will give examples and suggestions. For this I am just going to be blunt and tell you—do not create anything directly on the Publish server. The Publish server has its purposes in the stack; manual configuration and changes are not among them. Instead, changes or config settings should be created on the Author and then replicated to the Publish. Remembering this will help you avoid situations where content or configs are mysteriously overwritten. I am going to cover a little bit about configs, and touch briefly on those run modes again, a little later on.

Scaling for Your Journey’s Load
Having regrouped at the oasis, it’s time to determine if you will need to bring on other resources to help with the load. As the Publish server normally deals with a larger load than the Author, it’s nice that it’s also easier to scale than the Author. In the Introduction article there was a graphic that showed the standard Adobe Experience Manager architecture we employ here at Axis41. If you refer back to that image, you will see that we have two Publish servers, which could just as easily be changed to three, four, or more depending on the load and needs of your project. As long as you remember to create a Publish agent on your Author, you can keep more than one Publish up to date and ready to ingest and serve content. Besides scaling for load, the other reason we recommend running with two or more Publish/Dispatcher servers is for higher availability. This allows you to load balance across different geographic locations. It also allows you to perform maintenance on one of the Publish/Dispatcher lines without reducing availability for end users. This same technique could also be used to have a disaster recovery option available as a hot standby.

Note: Scaling Publish servers, or creating a hot standby, may include additional costs for licensing and you should check with your Sales Rep to make sure you are in compliance.

The Refetching Mirage
In my Author article, I talked a little bit about letting the Publish handle flush requests. There are really two types of flush agents available to you on the Publish—the flush and re-fetch agents. In simple terms, the flush agent sends a request to the Dispatcher to invalidate its cache, so on the next request the Dispatcher will get a new render from the Publish. The re-fetch agent sends a POST request to the Dispatcher that will delete the cache and then trigger an immediate re-cache event so that there is no extra wait time when that resource is next requested. The re-fetch agent sounds pretty great; however, there is a cost associated with it. The major issue is that the request made by the Dispatcher isn’t the same as a request made from a browser, which can cause irregularities when a page is cached. The re-fetch also triggers an immediate re-cache; if many resources are being flushed, this can put a large amount of load on the Publish, creating a snowball effect and slowing the Publish down for all tasks.
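For context, a Dispatcher flush is just an HTTP request with a few CQ headers. A hedged example of invalidating a single page by hand (the hostname and content path are placeholders) looks something like this:

curl -H "CQ-Action: Activate" -H "CQ-Handle: /content/mysite/en/somepage" -H "Content-Length: 0" -H "Content-Type: application/octet-stream" http://dispatcher-host/dispatcher/invalidate.cache

The flush and re-fetch agents automate calls like this; the re-fetch variant additionally lists the paths to re-request in the POST body, which is what triggers the immediate re-cache described above.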

The other agent you may see or want to use on the Publish is the reverse replication agent. Remember that the Publish is a render host; it will not try to push content to the Author, nor should it. This would appear to cause issues with user-generated content such as comments, reviews, etc., getting back to the Author or the other Publish servers. AEM, therefore, has the option to enable reverse replication. The Publish stores user-generated content in a special outbox. The Author periodically requests from the Publish anything in the reverse replication outbox and pulls that content in. The Author can then pass this content through an approval workflow, if needed, and replicate the user-generated content to the other Publish instances.

Publish Configs along Your Route
Now that we’ve talked about scaling your load and visited the refetching mirage, it’s time to cover what the last leg of your Publish journey will entail: configurations. We’ll begin by discussing configurations that normally you’ll only want configured on the Publish instance, or that you want working differently on the Publish. The way to make these configs apply only to the Publish is by using those run modes we talked about in the Author article. Remember that there are installation run modes, two of which are author and publish. This means you can use these run modes to identify which servers to apply the configs to. Adobe Experience Manager has a feature where, if you place a directory named config.<runmode> under the app/project directory, it will automagically apply any configs in that directory only if the run modes match. Armed with this knowledge, you can take it one step further and use multiple run modes, such as config.publish.production, which will only apply to a Publish server also using the production run mode.
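As an illustration, a project’s config directories might look something like the sketch below (the project name is a placeholder); each folder only takes effect on instances whose run modes match its suffix:

/apps/myproject/config                       applies to every instance
/apps/myproject/config.author                Author instances only
/apps/myproject/config.publish               Publish instances only
/apps/myproject/config.publish.production    Publish instances that also run with the production run mode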

So if you have settings that only apply to production, such as mail server settings for contact forms, then you can make sure that only applies in the right situations. Creating these config nodes can be a little bit of a challenge, but luckily for you there is a post right here on AEM Podcast that covers this process.

“Etc” Map Rules to Your Destination
The final configurations that you normally apply only to Publish are etc map rules, so named because of the default path by which they are found in AEM: /etc/map. Think of them as markers along your desert map.

These configurations are officially called Resource Resolver Mapping, and they allow you to rewrite hrefs on pages to strip out paths that you may not want exposed to an end user. A classic example is removing /content/sitename/en from links for SEO as well as security purposes. I am not really going to cover all the specifics of setting up etc map rules; instead, I will cover how to make etc map rules apply only to the Publish, in the same way that other configs do—by leveraging a Sling OSGi configuration node inside of a run-mode-named config directory.

You will need to have a config node named "org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.sitename" created under config.publish.

Next you will need to add or modify a property on that JcrResourceResolver config, resource.resolver.map.location. Set it as a string, and then for the value enter a unique etc map path; I would suggest something like /etc/map.publish so as not to confuse it with anything else. What this does is tell the Resource Resolver that it should load its rules from the directory you defined.

You can then set up your etc map rules under that same directory and they will only apply to Publish servers. You can also take this one step further and create the JcrResourceResolver config under config.publish.production and create a map location of /etc/map.publish.production and have rules that apply only to a server that is both Publish and production.
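Pulled together, the node ends up looking roughly like this in the repository; the sitename portion and the map path are placeholders carried over from the example above:

/apps/sitename/config.publish/org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.sitename
    jcr:primaryType = "sling:OsgiConfig"
    resource.resolver.map.location = "/etc/map.publish"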

Like the Author article, we really only covered the basics of the Publish server and its configurations. I hope this has shed some light on how the Publish server works and some of the customizations you may need or want to set up. Our next, and final, article will cover the Dispatcher, which for some may be the most frustrating of the servers to set up and configure. Make sure you subscribe and come back as we finish our journey through the desert of AEM, and finally make it out.


A Year in the Desert of AEM: The Dispatcher


You know how sometimes, as you get close to the end of a journey, something short can feel like something long? This will not be one of those cases, as the last leg of this journey talks about the Dispatcher, and there is a lot to cover in a short space. For those of you who may have stumbled upon this article without reading the others in this series, let me first say welcome. You are joining us at the tail end of our journey, a four-part series about the Adobe Experience Manager infrastructure. To recap what you may have missed: the first article covers some AEM terms and basics, the second article covers the Author server, and the third article covers the Publish server. If you have no prior AEM knowledge it may be worthwhile to give those a look. It’s OK; we will be right here when you get back. Civilization is near, my friends. So sit down, strap in, and hold on; we’re taking our final ride.

What is the Dispatcher?
To start, the Dispatcher, unlike the Author and Publish servers, is not a Java application. It is actually an httpd module, and it’s proprietary. The Dispatcher pulls double duty in the AEM stack: it serves as both a caching server and something similar to a Web Application Firewall (WAF). When it comes down to it, the Dispatcher is similar to other Apache-based web servers; you can use Apache modules to change how traffic is handled and how static content is served. The special part is the Dispatcher handler, which is eventually passed the request and then goes to work. The Dispatcher first checks the cache root, which should be the same as the DocumentRoot (more on this later). If the file being requested is not cached or has been marked as invalid, then the Dispatcher will connect to a Publish instance and pass along the request to be rendered. Once this is done the Dispatcher takes the rendered asset, saves it to the cache root, and then serves this content to the end user.

The high level of how the Dispatcher works is fairly simple to understand; however, once you start digging into the configuration and customization of the Dispatcher it may seem a little overwhelming. Adobe has quite a bit of documentation on the Dispatcher, such as explanations and examples of the Dispatcher’s configuration. If you are unsure what something does, this documentation may shed some light on it. I am dropping this disclaimer here so I don’t have to state it multiple times below: the configurations I show you are how we at Axis41 do things; your situation or needs may vary. With that said, hopefully this will get you on the right path to success. Like the other articles, we are going to cover a base-level configuration as well as some added tips.

Dispatcher Install
As the Dispatcher is a module, there is some installation required; as a general rule you should try to use the latest version of the Dispatcher. The Dispatcher comes packaged as an archive and can be found for download here. As we use a Linux derivative of RHEL, I am going to use that as my example; obviously much of the same information will apply even if you are running a different platform. The way the Dispatcher is packaged, you can drop the tar.gz file into the httpd directory and extract it from that location. This will add Dispatcher-specific files to your conf directory and a few files at the httpd directory level. You can then move the dispatcher.so into the modules directory. For ease of use I would then recommend creating a symlink between the versioned .so and mod_dispatcher.so. Once this is done you can then look at the files in the conf directory. One of those files, httpd.conf.disp2, is an example httpd.conf file that contains the Dispatcher IfModule configuration.

You have a few options here; the first option is my strong recommendation, as it can save you a lot of frustration later on. I will also mention a couple of other ways this could be handled, but seriously, just choose number one.

  1. Create a custom conf file with just the Dispatcher LoadModule and IfModule inside and then have Apache include your custom conf file. This is fairly easy to do, and has the added benefit that you don’t need to go mucking about in your httpd.conf file.
  2. Copy out the LoadModule and IfModule lines from the above mentioned httpd.conf.disp2 example file and move those lines into your existing httpd.conf file.
  3. Replace your existing httpd.conf with the example file. A vanilla Apache installation is the only case where I would even consider this, and even then I would still choose option one.

Once you have decided how you are going to implement the Dispatcher configurations you can start customizing. To start you off, I would highly recommend setting DispatcherUseProcessedURL to "On"; this allows you to modify the request, such as with mod_rewrite, and then have the Dispatcher use the updated request.
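As a sketch of option one, a stand-alone conf file could look something like the following; the file location and log settings are assumptions based on a typical RHEL-style layout, so adjust them to your environment:

# /etc/httpd/conf.d/dispatcher.conf (example location)
LoadModule dispatcher_module modules/mod_dispatcher.so
<IfModule disp_apache2.c>
    DispatcherConfig          conf/dispatcher.any
    DispatcherLog             logs/dispatcher.log
    DispatcherLogLevel        1
    DispatcherDeclineRoot     0
    DispatcherUseProcessedURL On
    DispatcherPassError       0
</IfModule>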

Dispatcher.any
If you thought installing the Dispatcher and setting up the IfModule configuration was all that is needed, then I have a surprise for you. There is another, more comprehensive configuration file called dispatcher.any, and no, “any” is not a typo. This file is mainly responsible for how the Dispatcher will behave. Inside this file there are sections that are each responsible for a different part of this behavior. Again, in the interest of time, I am using the default dispatcher.any file, which is covered in Adobe’s documentation. What I am going to cover is some of the changes you might want to make as well as some of the configurations to pay specific attention to. The /website section is normally where you will start to configure things; you can even set up multiple websites, each using the /virtualhosts section contained within to tie specific domains to specific Dispatcher configurations. Just be aware that once you start down that road, things can become tricky very quickly. We are going to stick with one website using a wildcard for all domains. The /clientheaders section by default uses a wildcard, and I would recommend specifying the headers you expect the Dispatcher to see.
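For example, a tightened /clientheaders section might list only the headers you actually expect to forward; this list is illustrative rather than exhaustive:

/clientheaders
  {
  "CSRF-Token"
  "X-Forwarded-Proto"
  "Referer"
  "User-Agent"
  "Authorization"
  "From"
  "Content-Type"
  "Cookie"
  "Accept"
  "Host"
  }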

/renders
This section sets up the Publish backend that the Dispatcher will connect to. The Dispatcher does have the ability to set up connections to more than one Publish backend, which will allow the Dispatcher to load balance between the Publish servers. If you are only using one Dispatcher this may be the way to go; however, if you use two Dispatchers and two Publish servers I would not recommend setting connections to both Publish servers, and here is why. In theory, having the cross-talk between two Publishes and two Dispatchers seems like it would be a good thing. In practice, you can run into an issue where a Publish server tells the Dispatcher to invalidate a page, and then that Dispatcher, because of load balancing, requests the page from another Publish server which has not yet finished ingesting the content from the Author. This results in either out-of-date content, or content that is cached with errors on the page. In essence you can end up with bad cache on one Dispatcher, and then, to further complicate the matter, because of load balancing it can be difficult to diagnose which Publish served up the bad content. This is why we recommend using a one-to-one Publish to Dispatcher configuration.
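Following that one-to-one recommendation, a /renders section pointing at a single Publish backend (the hostname is a placeholder) would look something like this:

/renders
  {
  /rend01
    {
    /hostname "publish1.internal.example.com"
    /port "4503"
    /timeout "10000"
    }
  }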

/filter
This section is where the WAF behavior comes into play. It is fairly straightforward if you look at the rules in the default file. The concept is to deny everything first and then whitelist only the requests that make sense for your project. This section is also set up so that the last rule wins. This means if you place broad rules near the bottom, those rules may expose or open up paths that you previously meant to restrict. These filters can be expanded to address each part of the HTTP request, allowing you to move away from globs and wildcards to more secure multi-part filter rules. For example, using { /type "allow" /method "GET" /url "/content*" } tightens security when whitelisting the content path by only allowing GET requests to those paths. You can use a similar filter to only allow POST to a specific servlet: { /type "allow" /method "POST" /url "/bin/customservlet" }. If you are using a version earlier than 4.1.9, these expanded filters are not available, and I would strongly recommend upgrading to a more recent version of the Dispatcher.
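Putting those pieces together, a minimal deny-first /filter section could look like the sketch below; /bin/customservlet is a hypothetical path, so substitute the servlets and paths your project actually needs:

/filter
  {
  /0001 { /type "deny"  /glob "*" }
  /0002 { /type "allow" /method "GET"  /url "/content*" }
  /0003 { /type "allow" /method "GET"  /url "/etc/designs*" }
  /0004 { /type "allow" /method "POST" /url "/bin/customservlet" }
  }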

Caching
The /cache section, and caching on the Dispatcher in general, really deserve to be singled out as they are one of the major functions of the Dispatcher. Let’s start by talking about caching in general and then we will talk about the /cache section of the dispatcher.any file. First, it’s important to point out that anything that is not cached, or not able to be cached, is sent back to the Publish server each time it is requested. As a general rule you want the Dispatcher to cache as much as possible (markup as well as content). This not only removes excess load from the Publish servers, it also greatly speeds up the delivery of content to the end user. The Dispatcher by default will try to cache everything, with a few important exceptions:

  1. Missing extensions – If the requested path is missing an extension such as .html
  2. Method Type – If the method is not a GET request
  3. Errors – If the HTTP response from the Publish server contains an error code, it will not be cached
  4. HTTP Header – If response.setHeader("Dispatcher", "no-cache"); is used
  5. Authorization headers – If the request contains authentication/authorization headers
  6. Query strings – If the request contains a query string
  7. Rules – If the request does not match any /rules defined under the /cache section of the dispatcher.any file

A few of the items in the above list can be modified based on settings within the /cache section, which I will cover now, along with a couple of other important properties to pay attention to.

/allowAuthorized
This property allows the Dispatcher to cache content even if there is authorization being used in the headers. Most of the time you would not want this behavior, as you would not want content that requires authentication to be served to someone who is not authorized. If you do intend on using authorization headers but you still would like the benefits of cached content, there is the option of using /auth_checker, which may not be a configuration that was included with the default file.

/ignoreUrlParams
This property can allow you to cache query strings. It basically allows the Dispatcher to treat the query string as though it were a unique path to be cached. Even though this option is available to you, I would recommend avoiding query strings as much as possible, and instead using a selector such as page.string.html; this still allows you to modify what is cached when a different string is provided as a selector. If you end up using a query string for something like analytics tracking, which is fairly common, you can instead use something like mod_rewrite to leave the query string in the URL the browser sees but drop it as the request is passed through to the Dispatcher.

/rules section
By default this section is set to cache everything with a wildcard. If for some reason there is a specific resource you do not want cached, you can create a rule under this section that denies that behavior.

/docroot
This property sets where the Dispatcher will cache its assets on disk. This should end up being set to the same path as the Apache DocumentRoot, which lets Apache treat the cache created by the Dispatcher as the normal static assets of a website.

/statfileslevel
This property sets how deep in the directory structure the .stat files should be created, with 0 being the /docroot cache path.

/serveStaleOnError
This allows the Dispatcher to serve files from its cache, even if they have been invalidated, in the event that the Dispatcher is not able to reach the Publish backend.
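Here is a sketch of a /cache section that pulls the properties above together; the docroot path matches the one used later in this article, and the other values are examples rather than recommendations:

/cache
  {
  /docroot "/opt/aem/dispatcher/docroot"
  /statfileslevel "4"
  /serveStaleOnError "1"
  /allowAuthorized "0"
  /rules
    {
    /0000 { /glob "*" /type "allow" }
    }
  /invalidate
    {
    /0000 { /glob "*" /type "deny" }
    /0001 { /glob "*.html" /type "allow" }
    }
  }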

Flushing
If you remember, in the other articles we talked a little bit about flushing and the different types of flushing. As the following configurations go hand in hand, I have decided to include them in this section. We will start with the easier concept, /allowedClients. This configuration is near the end of the dispatcher.any file, and it controls who (what hosts or IPs) is allowed to send flush requests to the Dispatcher. By default this section comes commented out, which, if left unchanged, would allow anyone to send a flush request and clear your Dispatcher’s cache; this could be used as part of a denial-of-service attack. I strongly recommend you set this section to deny all and then only allow the hosts or IPs you trust, such as the Publish and Author servers.
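A locked-down /allowedClients section, using placeholder IPs for the Author and Publish servers, might look like this:

/allowedClients
  {
  /0001 { /glob "*" /type "deny" }
  # Author server (placeholder IP)
  /0002 { /glob "10.0.0.10" /type "allow" }
  # Publish server (placeholder IP)
  /0003 { /glob "10.0.0.11" /type "allow" }
  }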

Now we are going to talk about .stat files. In my opinion this method of cache invalidation is not very efficient, and it can be a little difficult to wrap your head around. So I will try to keep it as simple as I can. First off I mentioned the /statfileslevel just above, which determines how deep the .stat files are created. The .stat file itself is a zero byte file that the Dispatcher uses to help it determine if cached files are invalid. To do this when a file is requested the Dispatcher checks the last modified date of the .stat file as well as the cached file; if the .stat file is newer than the cached file the Dispatcher knows the cached file is stale. The Dispatcher will then check the /invalidate section, and anything stale that also matches a pattern in that section is then considered invalid by the Dispatcher, and requested again from the Publish. I will use an example to help explain what I mean.

Let’s say you have a site under the path /content/mysite/en and a content author updates the contact.html page located at /content/mysite/en/about-us/contact.html. When that page is activated the Dispatcher receives an invalidation for that same path located in the docroot, this file is then automatically considered invalid. Next for every level the Dispatcher traverses to reach the contact.html page it would touch a .stat file up to the number you define under /statfileslevel. So if you have that property set to 4, remembering that 0 is the root, then it would touch the following .stat files:
/.stat
/content/.stat
/content/mysite/.stat
/content/mysite/en/.stat
/content/mysite/en/about-us/.stat

Looking at the default /invalidate section rules we see { /glob “*.html” /type “allow” }. This means any .html file is allowed to be flushed, so any .html file located in the same directory where a .stat file was touched is now considered invalid by the Dispatcher. In our example you can see this would be the .html pages under /content/mysite/en/ that are mainly affected. Now if this only affected those files that would be one thing, however there are more files which will also be considered invalid. If the Dispatcher doesn’t have a .stat file at the same level as the page that was requested, it will traverse up the directory structure to find the nearest .stat file.

So, keeping with the example above, if the next request was for /content/mysite/en/about-us/investors/how-to-invest.html, the Dispatcher would see there is no .stat file under /content/mysite/en/about-us/investors/, as that would be at level 5. It would then look for the nearest .stat file available, which in our case is /content/mysite/en/about-us/.stat, and use that as its .stat file when determining if it should serve how-to-invest.html directly from cache or request it from the Publish again before serving. In our example this would mean the Dispatcher now also considers how-to-invest.html invalid. The same is true for any other document that lives in cache under the /content/mysite/en/about-us directory. No matter how deep they are, they will always refer back to /content/mysite/en/about-us/.stat to see if they are invalid or not.

I should also mention the flip side: if you had a page /content/mysite/en/blog/post1.html, for example, it would not be invalid, as its nearest .stat file would be /content/mysite/en/blog/.stat, which was not touched by updating /content/mysite/en/about-us/contact.html. You should put serious thought into your configuration when it comes to cache invalidation. Depending on your site structure, /statfileslevel, and /invalidate rules, you could end up wiping out large sections of your cache by accident with a simple page update.

Apache Config
When it comes to the Apache configuration, I am not really going to cover much in the way of Apache itself. Suffice it to say that this is an Apache server that runs a custom Dispatcher handler for serving content. Like I mentioned above this means that most of the configurations you might be used to with Apache will still apply. What I am going to cover here is some configurations you can use to make the Dispatcher work a little more smoothly.

First off we use a vhost configuration block and override the Apache DocumentRoot property as well as set the Apache handler to allow the dispatcher module to take over for serving content.

DocumentRoot /opt/aem/dispatcher/docroot
SetHandler dispatcher-handler

When we covered the Publish server we talked about setting up etc map rules; now we will talk about the Dispatcher side of that coin. For this we would be using mod_rewrite on the Dispatcher in combination with etc map rules to strip the /content/mysite/en path from URLs. To do this you could use something like I have listed below, added to that same vhost configuration I talked about above. This will add the content path back onto the request behind the scenes before the Dispatcher processes the request.

RewriteRule    ^/$                               /content/mysite/en.html      [PT,L]
RewriteRule    ^/index.html$               /                                          [R=301,L]

RewriteCond   %{REQUEST_URI}    !^/content
RewriteCond   %{REQUEST_URI}    !^/etc
RewriteCond   %{REQUEST_URI}    !^/bin
RewriteCond   %{REQUEST_URI}    !^/lib
RewriteCond   %{REQUEST_URI}    !^/apps
RewriteCond   %{REQUEST_URI}    !^/mysite
RewriteCond   %{REQUEST_URI}    !^/en
RewriteCond   %{REQUEST_URI}    !^/dam
RewriteCond   %{REQUEST_URI}    !^/assets
RewriteRule    ^/(.+)$                         /content/mysite/en/$1         [PT,L]

The other mod_rewrite you can do in the vhost file, that might make your life easier, is mapping your DAM paths. The above covers content paths, but anything under the DAM might have a path like /content/dam/mysite/. My suggestion is to set up an etc map rule to have this path changed to something like /assets/. Oh, so you noticed that I excluded /assets in my block above? That was intentional, as now you can set up a rewrite rule to handle DAM assets.

RewriteRule    ^/assets/(.+)$                         /content/dam/mysite/$1         [PT,L]

In the immortal words of Porky Pig, “Th-th-th-that’s all folks!” I know that this all can seem like an insurmountable task when you first start out, but hopefully these articles will speed you on your way. Now you have earned yourself an “I survived the desert of AEM” t-shirt, if such a thing existed. Make sure you take a look at some of the other articles available on this site, and as always, if you have questions or comments feel free to email info@aempodcast.com. Thank you for taking this journey with me.

Do not rely on Online Compaction in AEM 6.x


Most of you should probably already be aware that there are some challenges in AEM 6.x with repository disk growth. While this has improved marginally in more recent releases, it’s absolutely vital to a healthy Adobe Experience Manager stack that offline compaction be performed regularly. For our Managed Services clients, this is a service our Systems Engineers take care of, including watching the growth rates and knowing what kind of schedule a given environment needs.
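As a point of reference, offline compaction is typically run with the oak-run tool against a stopped instance. This is only a sketch; the oak-run version must match the Oak version of your AEM instance, and the repository path should be adjusted to your install:

java -Xmx4g -jar oak-run-<version>.jar checkpoints crx-quickstart/repository/segmentstore
java -Xmx4g -jar oak-run-<version>.jar checkpoints crx-quickstart/repository/segmentstore rm-unreferenced
java -Xmx4g -jar oak-run-<version>.jar compact crx-quickstart/repository/segmentstore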

Old hat, right? Everyone knew about this? Great. However, there is one piece you MAY NOT know about that we feel is important to point out.

Within Adobe Experience Manager itself, there is a JMX agent that implies online compaction is available. In fact, you might find some people using this in conjunction with the occasional offline compaction; however, Adobe has stated that this “online compaction” is unsupported:

“Q: What is the supported way of performing revision cleanup?

A: Offline revision cleanup is the only supported way of performing revision cleanup*. Online revision cleanup is present in AEM 6.2 under restricted support.”

This phrase “restricted support” is defined in the AEM documentation to have the following meaning:
“To ensure our customers project success, Adobe provides full support within a restricted support program, which requires that specific conditions are met. R-level support requires a formal customer request and confirmation by Adobe. For more informations, please contact Adobe Customer Care.”

From experience, we have found that online compaction will not fully compact the store, and it has sometimes been known to corrupt the data store. Although the JMX agent can still be found in Adobe Experience Manager, until Adobe has adjusted their position on it, we would recommend against using this on your production instances.

 

*emphasis mine

Maven Build Profiles and the Sling JCR Install Service


tl;dr
Doing both autoInstallPackage and autoInstallBundle at the same time can cause conflicts in the OSGi/Felix/System/Web Console as it installs the jars. The JCR config and install folders will not work correctly when the conflict happens. Just do autoInstallPackage. If there is a conflict, there will be folders that mark progress under /system/sling/installer/jcr/pauseInstallation—delete them, and then things should work.

I have been seeing this issue on some local environments, and recently we had issues on a customer’s servers that are built by Bamboo. In our builds, we have zips and jars. The zips go into the JCR and the jars go into the OSGi. autoInstallPackage triggers the content-package-maven-plugin, which picks up the zips and throws them into ${crx.serviceUrl}/crx/packmgr/service.jsp. Similarly, autoInstallBundle triggers the maven-sling-plugin, which picks up the jars and throws them into ${crx.serviceUrl}/system/console/install.

So autoInstallPackage is to zip, JCR, and packmgr, as autoInstallBundle is to jar, OSGi, and system/console.

One of the zips in most projects, here at Axis41, is called bundle-install. It embeds the jars from the project bundles, puts them in a zip package, which then gets installed to the JCR at /apps/[project]/install. The package should also include the OSGi configs in the /apps/site/config.[runmode] folders. The Sling JCR Install Service listens to those two folders, takes the jars and configs, and installs them into the OSGi container. We use this strategy because we have found that having the entire project build to zips to feed into the packmgr gives us more consistency and agility when moving code around.

If the jars in the bundle-install zip try to install at the same time that ${crx.serviceUrl}/system/console/install is catching the jars, there will be a conflict and the installation will enter a paused state, marked by nodes under /system/sling/installer/jcr/pauseInstallation. In this paused state, the Sling JCR Install Service stops listening to the config and install folders. The runmode dependent configs you build in the project will not get applied to the OSGi services, and new jars trying to get installed through the install folder will do nothing. To return the JCR Install Service to normal operation, delete the nodes under /system/sling/installer/jcr/pauseInstallation.

To prevent this from happening in the first place, always install bundles via the bundle-install content package by using the autoInstallPackage profile. Make sure your automated builds are doing that too. While you’re working within a single bundle, autoInstallBundle is meant to provide a way to more rapidly deploy and test that bundle, but it shouldn’t be used in combination with autoInstallPackage. Using both profiles at the same time to ensure the bundles install is usually the result of an improperly configured bundle-install pom, which needs both a dependency and an embed block for each of your bundles. If you rely on manual code deployments through zip files and the Package Manager, then the deployment process should not have any issues.
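For reference, the bundle-install package’s pom typically declares each bundle twice: once as a dependency and once as an embed for the content-package-maven-plugin. A trimmed sketch with placeholder coordinates might look like this:

<dependency>
  <groupId>com.myproject</groupId>
  <artifactId>myproject-core</artifactId>
  <version>${project.version}</version>
</dependency>

<plugin>
  <groupId>com.day.jcr.vault</groupId>
  <artifactId>content-package-maven-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <embeddeds>
      <embedded>
        <groupId>com.myproject</groupId>
        <artifactId>myproject-core</artifactId>
        <target>/apps/myproject/install</target>
      </embedded>
    </embeddeds>
  </configuration>
</plugin>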

Why are you in the Publish server on Prod?


I wrote a terse article last year about the fact that no one should ever need to be in CRXDE in Production. And they shouldn’t be! I kinda skated over another fact related to this that I want to address. You shouldn’t be on your Publish server, especially not in Production.

As Tyler mentioned in his article series about the entire Adobe Experience Manager infrastructure, “the Publish server is the workhorse”. It shares a similar setup to the Author server, as it also runs as a Java application. Most of the setup and startup steps performed for the Author server also apply to the Publish server (including using the same jar file). You just use the installation run mode of “publish”, and the default port changes as well.

The difference is that Publish is responsible for rendering all the content that is meant for consumption by the front-end user. As Tyler said, “it is a render host”. The nature of the Publish server is to handle serving up content and markup for caching to the Dispatcher. It takes all that and generates static data that is delivered to the browser. The Author server is where you input content using the tools, components, and templates that developers build. But until they hit that Publish (activate) button it is just staged content. Its job is to communicate between the Author and the Dispatcher, not to do actual authoring or configuration.

One of the reasons you shouldn’t be on the Publish server is server consistency. Most likely you will have more than one Publish server. The concern is that if someone were to go to one of the Publish servers to make an adjustment to the code, content, or configuration, they might forget to make that adjustment to the remaining servers as well. If you make whatever change is needed on Author and then replicate it, then all the Publish servers stay consistent. It’s not tenable for someone to have to hit each publish instance to make whatever change they need. You are ignoring the way that AEM was designed in the first place.

Another reason to avoid this is from a scaling perspective. Having all the logic, configurations, code, and content live on the Author server ensures that, when you scale up, everything gets replicated out to the Publish server. So as long as you have your replication properly set up you won’t have to worry about making those changes to a new Publish server, you can just replicate it from Author again. Without that, you have no guarantee.

Lastly, I know it sounds pessimistic, but if you are making changes directly on a server, rather than in a repo, there is a high probability that you are going to forget to get it into your repo. Then the next time you deploy your code it will get overridden and you will be left wondering why something is broken that was working before—all because you wanted to make a “quick fix”.

Someone might whine and say, “but something is wrong and this will be faster, just let me make this small change”. I would argue that your real problem is that something is not configured correctly, or that you have a blocked replication queue; in other words, the real problem is how the Author is communicating with the Publish. Getting on the Publish server only resolves a symptom and isn’t the cure. Don’t make your implementation of Adobe Experience Manager dependent on people needing to access the Publish server. Don’t do it!

AEM Spark: Install/Stop/Start CRXDE Lite in AEM 6


Should you be running CRXDE Lite or not? I think we all agree that the IDEAL situation is to have this disabled. However, there are times where you just really feel like you need to debug that one problem by looking something up in the CRXDE Lite. Although Axis41’s Managed Services team never debugs in production, we will occasionally download an Adobe Experience Manager instance locally to do some debugging. In such cases, we figure why not turn CRXDE Lite back on? It will likely save the Systems Engineer some time, and any change has to be deployed through the immutable infrastructure process anyway.

When looking at your error.log, if you try to load the CRXDE Lite while it is disabled, you will see an error message that looks something like this:
org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /crx/server/crx.default/jcr:root/.1.json not found

If this is a new instance 6.1 or later, and it was initialized with the nosamplecontent runmode, you must first create the missing SlingDavExServlet component:
curl -u admin:admin -F "jcr:primaryType=sling:OsgiConfig" -F "alias=/crx/server" -F "dav.create-absolute-uri=true" -F "dav.create-absolute-uri@TypeHint=Boolean" http://localhost:4502/apps/system/config/org.apache.sling.jcr.davex.impl.servlets.SlingDavExServlet

Now we may need to start the bundle. In any 6.x instance with the SlingDavExServlet component created, to START CRXDE Lite, use:
curl --user "admin:admin" -X POST 'http://localhost:4502/system/console/bundles/org.apache.sling.jcr.davex' --data 'action=start'

We’re looking here for a JSON response that shows the bundle as started:
{"fragment":false,"stateRaw":32}

On the off chance that someone started this in your production instance, it’s just as easy to stop it. In any 6.x instance with the SlingDavExServlet component created, to STOP CRXDE Lite, use:
curl --user "admin:admin" -X POST 'http://localhost:4503/system/console/bundles/org.apache.sling.jcr.davex' --data 'action=stop'

And confirm that the JSON response shows the bundle as stopped:
{"fragment":false,"stateRaw":4}

If you have a better solution to this problem that you’d like to share, contact us at info@aempodcast.com.

This Week in AEM… Akamai and AEM


We have had customers ask in the past if the Dispatcher is the same as a CDN. The answer: no. The Dispatcher is not the same thing as a CDN. Yes, you can use a CDN with your AEM infrastructure. The challenge is the scenario around cache invalidation and TTL. It requires some fine tuning, in addition to the understanding that your content cache likely won’t be immediately flushed when you author and publish something in AEM. The team at CodeBay wrote up a good explanation of Akamai, a CDN, and how it can be set up with AEM. If you or your customer is looking to use a CDN, then this would be a good introduction.

This Week in AEM… Setup an author cluster with MongoDB


If you are going to need multiple Author instances set up for your Adobe Experience Manager infrastructure (AEM 6 and higher), then you are going to need to cluster them with MongoDB. This can be a difficult process. Javier Reyes Alonso and Marco Pasini from CodeBay Innovations have put together some helpful instructions about how to cluster your Author instances with MongoDB. This isn’t a complete step-by-step process but rather some general guidelines. Their article also contains some helpful links that can give some deeper intel into certain areas of this process. Go check it out.


Online Compaction for AEM 6.3


I recently was reading a thread on the AEM Tech slack board about online garbage compaction (also known as online revision cleanup) from user “edgar.nielsen” where he was asking about online compaction with AEM 6.3. In essence, he said that online compaction works now but he was not seeing much in the way of measurable compaction. He was wondering if anyone had much experience with it. “I tested it during the beta phase several times and it didn’t seem to ever compact very much. I would basically spin up an instance, add assets then delete them, then do datastore GC and online compactions once a day for 4-5 days, the space recovered was quite small. Once I stopped the instance and did an offline compaction, the space recovered was quite large like one would expect. Haven’t had time to test with the released 6.3 to see if the behaviour has changed.”

I was pretty interested in this because online compaction was one of the improvements that the Adobe team promised had returned for AEM 6.3. For reference, we recommended (based on things Adobe had stated themselves) that you turn it off for version 6.2 and lower because it didn’t work so well (see our article http://aempodcast.com/2016/infrastucture/not-rely-online-compaction-aem-6-x/).

It seems that online compaction will never be able to do as much as offline compaction; as the documentation states: “The offline mode can usually reclaim more space because the online mode needs to account for AEM’s working set which retains additional segments from being collected.” Despite Adobe’s documentation stating: “Offline Revision cleanup should be used only on a exceptional basis – for example, before migrating to the new storage format or if you are requested by Adobe Customer Care to do so,” Axis41 continues to consider regular Offline Compaction an important part of the long-term health of an AEM deployment. As this feature matures in upcoming releases, we will continue to follow its effectiveness.

According to the documentation from Adobe: “In AEM 6.3 Online Revision Cleanup is turned on by default and it is the recommended way of performing a revision cleanup.” From their FAQ:
Q: How frequently should Online Revision Cleanup be executed?
A: Once per day. This is the default configuration in the Operations Dashboard.

You will need to configure it so that it runs when you want it to. Thankfully, there are notes as to how to configure the maintenance window. I think more testing also needs to be done, so let us know what you see in your environments.

AEM 6.3 Cumulative Fix Pack 2 Released


One of our favorite improvements in the 6.x line is less about AEM itself and more about how Adobe releases fixes. During the lifecycle of AEM 6.2, Adobe began to release “Cumulative Fix Packs,” which are aggregated content packages containing multiple bug fixes and even occasionally Feature Packs.

On August 8th, Cumulative Fix Pack 6.3.0.2 – a Cumulative Fix Pack for AEM 6.3 – was released (referred to as CFP 2). In addition to closing some issues deep in the product internals (such as some unclosed resource resolvers acquired by the product itself), it upgrades the Jackrabbit Oak version to 1.6.2, which provides several bug fixes for the repository maintenance tools.

As always, you should validate your code under this fix pack before applying it to your production servers.

This Week in AEM… How to Set Up a Dispatcher on macOS


Not long ago, Joey and I completed an Ask the AEM Community Experts presentation that explained how to Develop with the Dispatcher in Mind for AEM, and also why it is so important. You can watch the session here: http://bit.ly/ATACE0717. We genuinely think that Developers need to include a Dispatcher in their development stack when doing any development. It’s just good practice. After that presentation, Yuri Simione (from the Adobe AEM and Marketing Cloud Group on Linkedin) posted a link to one of the Adobe HelpX articles: Set up AEM Dispatcher on macOS. “This is an accelerated walk through of setting up AEM Dispatcher on macOS, using the macOS installation of Apache HTTPD Web Server.” The page includes several steps to set things up as well as a video to guide you through it. Check it out. And start using a Dispatcher in your development stack.

It’s Publish, not Publisher


This article is dumb and won’t improve your AEM development in the slightest. It’s a personal gripe. For me, it ranks right up there with people who mispronounce arthor, liberry, foilage, and nucular. I think that it’s just lazy and shows a lack of intelligence. What’s more, customers or product owners hear the incorrect name and just proliferate it in their communication. It’s the Publish server, not Publisher. I know it’s annoying to say “Publish server” all the time instead of shortening it to “Publisher”. Including the extra word server can get old.

To clarify…
A Publish server is part of the Adobe Experience Manager infrastructure. And from the Adobe doc page: “Publish: An AEM instance that serves the published content to the public.”

A Publisher is someone who publishes content from the Author environment. It is an actual person.

Certainly we all make mistakes. After all it is an easy slip of the tongue to make. I’ve just seen too many people using the improper vernacular intentionally, where it is clearly not a mistake. If you have managed to learn how to manipulate AEM and do so many things with this incredibly complex content management system, then you can use the right terminology. I know you can. Hopefully, my little rant will help some of you break a bad habit.

This Week in AEM… Creating an Akamai Replication Agent


I shared a post back in March 2017 about how to integrate Adobe Experience Manager and Akamai. Joey and I even did a podcast about CDNs in general the following week. Regardless, I recognize that you may need more help in getting your replication agents set up to actually delete the cache all the way out to the Akamai CDN layer. That is where this week’s article comes in. Nate Yolles wrote about this with his article “Creating a custom Akamai replication agent in AEM”. In it, he takes the time to walk you through the process, including code snippets, screenshots, and a link to his GitHub account where you can access the code for yourself. Please keep in mind that he wrote this in 2016, so some of the terminology has changed, as well as the products he references. After all, AEM 6.1 was the released version of AEM at that time. But it is still relevant and useful.

This Week in AEM… Andrew Khoury’s Dispatcher Webinar


Andrew Khoury is a tiny god in Joey’s world, specifically when it comes to the Dispatcher for Adobe Experience Manager. Being a self-proclaimed smart man, Joey was greatly surprised at the things he learned by watching the webinar about how to optimize the Dispatcher cache, presented by Andrew back in 2013. Since that presentation, we have done things differently in our AEM implementations and environments, for the better. The principles have not changed that much since this was given, so don’t be concerned about the timing. Learn these principles and start working accordingly. Your project, and your customer, will thank you for the solid infrastructure.

Multi Domain Dispatcher Configuration


Currently, most companies have multiple domains; they want to serve these domain requests from one single AEM instance, because of the obvious cost and management savings.
This article describes configuring multiple domains (/content/geometrixx-outdoors and /content/geometrixx-media) and their caching mechanisms in AEM.

The below configuration changes are validated in:

  • Linux
  • AEM 6.2 (no service packs or hot fixes)
  • Apache 2.4 (this article assumes port 80; you can use any port you want, but you will have to adjust the config files below)
  • Dispatcher version: 4.2.2

Set up the AEM Instances:
Install and configure the AEM Author and Publish instances. Please see the below link on how to install AEM in your local environment:
https://docs.adobe.com/docs/en/aem/6-2/deploy.html#Default Local Install

Dispatcher Setup:
The process to set up a Dispatcher for your local environment is considerably easier than you might think. If you need help, refer to Adobe’s Dispatcher documentation.

Caching Location:
Let’s set up separate caching locations for the two domains.
Create the below folders under /var/www/html:
geometrixx-outdoors
geometrixx-media
Make sure the user Apache runs as owns these folders and can write to them, so the Dispatcher can write cache files there. For example (the Apache user is www-data on Debian/Ubuntu, apache on RHEL):
sudo chown -R www-data:www-data /var/www/html/geometrixx-outdoors /var/www/html/geometrixx-media
sudo chmod -R 0755 /var/www/html/geometrixx-outdoors /var/www/html/geometrixx-media

The idea is to place the geometrixx-outdoors website cache under the /var/www/html/geometrixx-outdoors folder and the geometrixx-media cache under /var/www/html/geometrixx-media.

Virtual Hosts:
When you have multiple domains, you need to create a separate Dispatcher virtual host for each domain.
In this article, I am using the below URLs:
www.geometrixx-outdoors.com to hit the geometrixx-outdoors pages.
www.geometrixx-media.com to hit the geometrixx-media pages.
The default Dispatcher setup has only one virtual host. If you have two domains, you need to create two separate virtual host files.
1. Create geometrixx-outdoors.conf file under: /etc/apache2/sites-enabled
2. Open the above file and paste the below content:


 
DocumentRoot "/var/www/html/geometrixx-outdoors"
 
ServerName www.geometrix-outdoors.com
ServerAlias geometrix-outdoors.com
 

	
        ModMimeUsePathInfo On
    	SetHandler dispatcher-handler
	
 
	Options FollowSymLinks
	AllowOverride None

 
IncludeOptional /etc/apache2/conf/geometrixx-outdoors-redirects.conf
 

geometrixx-outdoors-redirects.conf can hold the URL shortening configs, rewrites, and redirect configurations.

You can include these rules directly in the virtual host as well; you don’t need a separate file. However, splitting them out comes in handy for maintenance.
3. Create geometrixx-media.conf file under: /etc/apache2/sites-enabled
4. Open the above file and paste the below content:


 
DocumentRoot "/var/www/html/geometrixx-media"
 
ServerName www.geometrix-media.com
ServerAlias geometrix-media.com
 

	
        ModMimeUsePathInfo On
    	SetHandler dispatcher-handler
	
 
	Options FollowSymLinks
	AllowOverride None

 
IncludeOptional /etc/apache2/conf/geometrixx-media-redirects.conf
 

geometrixx-media-redirects.conf can hold the URL shortening configs, rewrites, and redirect configurations.
Make sure you are loading/including these two conf files in apache2.conf:
IncludeOptional /etc/apache2/sites-enabled/geometrixx-outdoors.conf
IncludeOptional /etc/apache2/sites-enabled/geometrixx-media.conf

Farm files setup:
It is important to set up separate farm files for easy maintenance. You can have one single file for multiple domains. However, you can achieve the following goals if you have a separate farm file for each domain:
1. Restricting specific path(s) in one domain while allowing access in the other domain
2. Enabling caching for specific types of files in one domain, while denying caching in the other domain
3. Maintaining a separate .stat file depth (/statfileslevel) per domain
4. Enabling TTL on one domain and disabling it on the other domain
5. Maintaining separate headers for each domain
6. Maintaining a separate vanity URL file for each domain
I’ve listed only a few use cases to demonstrate the use of a separate farm file.
Create a separate farm file for geometrixx-outdoors and make sure you configure the correct virtual host. See below.

/virtualhosts
  	{
  	# Entries will be compared against the "Host" request header
  	# and an optional request URL prefix.
  	"www.geometrixx-outdoors.com"
  	}

Create a separate farm file for geometrixx-media and make sure you configure the correct virtual host. See below.

/virtualhosts
  	{
  	# Entries will be compared against the "Host" request header
  	# and an optional request URL prefix.
  	"www.geometrixx-media.com"
  	}

Make sure to include these two farm files in your dispatcher.conf file.
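As an illustration, the farms could be pulled into the main Dispatcher configuration with $include directives; the file names and locations below are placeholders:

/farms
  {
  $include "/etc/apache2/conf/geometrixx-outdoors-farm.any"
  $include "/etc/apache2/conf/geometrixx-media-farm.any"
  }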

Disabling the cross-domain access:
Users could access geometrixx-media pages from www.geometrixx-outdoors.com and vice versa. To avoid this, you should return an error page when users try to access another domain’s pages. You can use RewriteCond to achieve this.

Add the below conditions to geometrixx-outdoors-redirects.conf file.

RewriteCond %{REQUEST_URI} ^/content/geometrixx-media [OR]
RewriteCond %{REQUEST_URI} ^/content/dam/geometrixx-media
RewriteRule .* - [R=404,L,NC]

Add the below conditions to geometrixx-media-redirects.conf file.

RewriteCond %{REQUEST_URI} ^/content/geometrixx-outdoors [OR]
RewriteCond %{REQUEST_URI} ^/content/dam/geometrixx-outdoors
RewriteRule .* - [R=404,L,NC]

Setting the dispatcher flush agents:
You should set up a separate flush agent for each domain. Below is the dispatcher flush agent configuration for geometrixx-outdoors.
1. Transport tab settings:

2. Make sure this agent has the below headers.

3. Trigger this agent whenever it receives the replication events:

Similarly, create a new dispatcher flush agent for geometrixx-media and follow steps 1-3. Make sure you change the URI.
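Since the screenshots for those steps are not reproduced here, this is roughly what the geometrixx-outdoors flush agent settings amount to; the Dispatcher host is a placeholder:

Transport URI : http://www.geometrixx-outdoors.com:80/dispatcher/invalidate.cache
HTTP Method   : GET
Headers       : CQ-Action:{action}
                CQ-Handle:{path}
                CQ-Path:{path}
Trigger       : On Receive (fire on replication events)

The CQ-Path header is what the SetEnvIfNoCase rules shown below key off of to pick the right host to flush.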

Flushing Cache:
When authors replicate a geometrixx-outdoors page, it should touch the corresponding .stat files to invalidate the geometrixx-outdoors cache. Similarly, when authors replicate geometrixx-media pages, it should invalidate the geometrixx-media pages.

To achieve this, make sure you have the below configuration in your apache2.conf file:


	# Geometrixx-outdoors domain
	SetEnvIfNoCase CQ-Path ".*/content/geometrixx-outdoors/.*" FLUSH_HOST=www.geometrixx-outdoors.com
	RequestHeader set Host %{FLUSH_HOST}e env=FLUSH_HOST

	SetEnvIfNoCase CQ-Path ".*/content/dam/geometrixx-outdoors/.*" FLUSH_HOST=www.geometrixx-outdoors.com
	RequestHeader set Host %{FLUSH_HOST}e env=FLUSH_HOST

	# Geometrixx-media domain
	SetEnvIfNoCase CQ-Path ".*/content/geometrixx-media/.*" FLUSH_HOST=www.geometrixx-media.com
	RequestHeader set Host %{FLUSH_HOST}e env=FLUSH_HOST

	SetEnvIfNoCase CQ-Path ".*/content/dam/geometrixx-media/.*" FLUSH_HOST=www.geometrixx-media.com
	RequestHeader set Host %{FLUSH_HOST}e env=FLUSH_HOST

In the above config, Apache reads the CQ-Path header value set by the Dispatcher flush agents and determines which host the flush applies to.

Modify the hosts file:
You need to modify your local hosts file to test these changes. Add the below entries to your hosts file:
127.0.0.1 www.geometrixx-outdoors.com
127.0.0.1 www.geometrixx-media.com

Restart Apache and test your changes.

References:
http://www.cognifide.com/our-blogs/cq/multidomain-cq-mappings-and-apache-configuration/
https://www.netcentric.biz/blog/aem-dispatcher-cache-invalidation-for-multiple-dispatcher-farms.html

Author bio
Singaiah Chintalapudi is a Senior Developer who has worked on a multitude of AEM implementations. He has been working with AEM since 2012 and is heavily involved in designing and developing numerous AEM projects. His interests include performance optimization, security, scalability, and third-party integrations with AEM.


Programmatic Cache Deletion


Every AEM infrastructure leverages the Dispatcher for one or all of caching, security, and load balancing. As far as caching is concerned, many AEM customers rely on flush agents to keep things fresh. Flush agents are great, as they recognize activation requests for page content and then send a flush request to their configured Dispatcher, but oftentimes these flush agents are not enough. For example, consider a site that leverages MSM extensively. Unless you write your own custom activation flow that replicates all live copies whenever their blueprint is activated, you’ll find that only the blueprint’s page gets refreshed. This is as expected from a technical standpoint but often doesn’t jibe with the business’s thought process. Other times, the flush agent can just fail for whatever reason, leaving your massive production content deployment 90% new hotness and 10% old stodginess. Because of these and other situations that could arise when messing around with cache, many customers will rely on flush agents for one-off activations but, for production content deployments of any substance, will prefer to just blow it all away.

Recently I implemented a utility for a client such that they can clear a single or all dispatchers of their cache with a single click. To date they’d been using a shell script living directly on the dispatchers. During production deployments, they’d ssh into each dispatcher one by one and run the script. That script was basically rm -rf on that dispatcher’s docroot. So, to do the same in code, I threw together the following:

  1. Author-side TouchUI utility makes GET request to servlet on that same author instance, passing the relevant publish instance(s) as parameter(s).
  2. Servlet on above author instance makes GET request to servlet on each relevant publish instance requesting a cache deletion.
  3. Servlet on each publish instance makes POST request to relevant dispatchers to formally make the cache deletion request.
  4. Publish-side servlet responds with status code and message to author-side servlet.
  5. Author-side servlet checks the status code and message and builds a relevant response that then gets passed to the client.
  6. Client presents user with success or failure messaging
  7. ???
  8. Profit!

A few things to note about the above flow:

  • As is good practice, the dispatchers were configured to only allow cache invalidation/deletion requests from themselves or from their associated publish instance. As a result, I had to make the formal deletion request from publish and not from author.
  • Since I was already going to have to make the formal request from publish, that means I’d have to call publish from author in some way. I opted to go server-side first on author before calling over to publish, as going author client-side to publish server-side comes with its own security issues (not least of which is the good ol’ JavaScript same-origin policy).

The actual deletion request, which is made via POST from my publish-side servlet, looks as follows:

// Apache Commons HttpClient 3.x, which ships with AEM
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

HttpClient client = new HttpClient();
// Reuse the flush agent's transport URI so the request hits the right Dispatcher
PostMethod post = new PostMethod(agent.getConfiguration().getTransportURI());

// CQ-Action DELETE with a CQ-Handle of "/" asks the Dispatcher to delete everything under its docroot
post.setRequestHeader("CQ-Action", "DELETE");
post.setRequestHeader("CQ-Handle", "/");
post.setRequestHeader("Content-Length", "0");
post.setRequestHeader("Content-Type", "application/octet-stream");

client.executeMethod(post);

In the above, the variable agent is an instance of Agent, and I retrieve this agent via an ID that I passed over from author. That ID is the ID for the relevant flush agent that’s configured on this publish instance. I then use the transportURI of that flush agent to get the appropriate URL for the dispatcher for which the cache needs to be deleted. As for the deletion itself, passing a CQ-Handle of just “/” means “blow away everything under the configured docroot” – which is exactly what we want.
