Summary
How To . . .

Unique Visitors
Counting Visits
Summary, Summary Plus, Summary SP Lite, and Summary SP
Incremental Processing
Passwords and Sub-reports
Physical Servers
Goals
Value
Leakage
Service on Windows
Customizing Reports
Searching Within Reports
Translating CGI Arguments to be More Readable

 

Unique Visitors

Many people ask, 'How many people visited my web site?' Unfortunately, you can't find out exactly how many people visited your site. Usually you can make a reasonable guess, but you can never know for sure. Many factors, such as caching proxy servers, conspire to prevent you from ever having exact numbers.

Summary provides two numbers that are a good starting point for approximating how many people visited your site: Unique Hosts and Visits. Unique Hosts is generally lower than the number of unique people, and Visits is generally higher.

There are many different factors that conspire to make it impossible to exactly track human visitors. For example, many ISPs send their users through a caching proxy server. A caching proxy server can return a copy of your page that it has cached without accessing your server at all. In this case you wouldn't have any clue that the visit even occurred.

AOL uses a proxy clustering system, which means that a single AOL visitor will produce hits on your site from a cluster of ten or more computers. That means that a single person can appear as ten unique hosts.

A search engine indexing robot might access one page at your site every hour. It would appear as a single unique host with twenty-four different visits each day, yet no human is involved at all.

No log analysis program can give you an exact number of human visitors. That information simply isn't in the log files. The way the Internet works introduces far too many confounding factors, and many effects besides the three described above mask the true counts. Summary gives you a lot of power to explore the situation and make estimates, but no log analyzer can give you exact numbers.

There are several things you can do to get a feeling for how many people are visiting your site. One is to look at the Visitors: Robots report sorted by visits. Almost all of these visits come from web robots, not people. An occasional person may appear on that list, and some robots evade detection. Treat the report only as a rough guide to how much of your traffic comes from robots.

Another place to look is Problems: Hijacking. If a large amount of graphics hijacking is going on at your site, lower your estimates by the approximate number of hijacks.

Another check requires some knowledge of how your pages are designed. Suppose a graphic appears only on your home page, and only once there. You can then compare the number of hits on the home page with the number of hits on the graphic using the Paths: Images Loaded report. Robots are very unlikely to fetch the graphic. The number of hits on the graphic may therefore be closer to the number of human views of the page than the page's own hit count.
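You can also make the same comparison directly from a raw log. The sketch below assumes Apache Common Log Format lines. The page and graphic paths (/index.html and /images/home-logo.gif) are hypothetical placeholders. It counts hits answered with 200 or 304 for each path. This is only an illustration, not how Summary builds the Paths: Images Loaded report.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "METHOD path PROTOCOL" status bytes
CLF_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3}) \S+')

def count_hits(lines, paths):
    """Count hits answered with 200 or 304 for each path of interest."""
    counts = Counter({path: 0 for path in paths})
    for line in lines:
        match = CLF_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        _host, path, status = match.groups()
        if path in counts and status in ("200", "304"):
            counts[path] += 1
    return counts

# Hypothetical log lines: the second visitor (perhaps a robot) skips the graphic.
lines = [
    '10.0.0.1 - - [27/Feb/2004:10:00:00 -0800] "GET /index.html HTTP/1.0" 200 5120',
    '10.0.0.1 - - [27/Feb/2004:10:00:01 -0800] "GET /images/home-logo.gif HTTP/1.0" 200 812',
    '10.0.0.2 - - [27/Feb/2004:10:05:00 -0800] "GET /index.html HTTP/1.0" 200 5120',
]
counts = count_hits(lines, ["/index.html", "/images/home-logo.gif"])
print(counts["/index.html"], counts["/images/home-logo.gif"])  # prints: 2 1
```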

For most sites, most of the time, the number of Unique Hosts and the number of Visits will be similar when measured over periods of less than a week. Visits is almost always higher, by 10% to 100%. At most sites the number of human visitors falls between these two numbers, or at least near that range. Many effects can throw off the counts, both high and low. These effects generally more or less cancel out, so the Unique Hosts and Visits numbers are a good place to start when making estimates.
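The reasoning in this section can be written out as simple arithmetic. The sketch below starts from the Unique Hosts to Visits range. It subtracts the robot visits and graphics hijacks you have identified from both ends. The function name and the decision to subtract from both ends are assumptions made for illustration. The result is a heuristic range, not a count, and Summary itself does not compute it.

```python
def estimate_range(unique_hosts, visits, robot_visits=0, hijacks=0):
    """Rough (low, high) range for human visitors over a period.

    Heuristic only: takes the Unique Hosts..Visits range and lowers both
    ends by the robot visits and graphics hijacks you have identified.
    """
    low, high = sorted((unique_hosts, visits))
    adjustment = robot_visits + hijacks
    return max(low - adjustment, 0), max(high - adjustment, 0)

print(estimate_range(1200, 1800, robot_visits=150, hijacks=50))  # prints: (1000, 1600)
```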

Counting Visits

Intuitively, a "visit" is obvious. A person sits down at their computer, types your URL, wanders around your site reading various pages, and then goes to do something else. Because of the details of how the web works, however, it is very difficult to identify that sequence of events from the server's log file. No visit tracking system can ever tell you exactly how many times a person visited your site. Some visits may be served from caches without any of their hits reaching your server. Some visits will be by two or more people watching a single computer screen. Some robots simulate human behavior so well that they cannot be distinguished from people. No system can ever compensate for all of these effects.

Summary provides three different approaches to visit counting, and each one gives a different number of visits. Depending on how you use visit counts, these differences may not matter. Visits are most useful for comparing the level of activity on your server between two time periods. Whichever approach you use, suppose visit counts rise by ten percent between January and February. That very likely corresponds to a rise of close to ten percent in the number of people viewing your site. It's important, however, to always compare visit counts made with the same approach. Visit counts using host and agent names can be very different from visit counts using session IDs (cookies). Comparing a January count based on host and agent names with a February count based on session IDs is meaningless.

1  First Approach - Host and Agent Names

Summary counts a sequence of hits as a single visit when the hits come from a single host, all have the same agent name, and have no gap of more than 30 minutes between requests. This fits the intuitive definition of a visit well: the user is running a single browser on a single computer. Under normal circumstances the visit will appear in your log files that way, and Summary will recognize it correctly.
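The host-and-agent rule above can be sketched as follows. Each hit is keyed by its (host, agent) pair. A new visit starts whenever a pair has not been seen before, or when its previous hit was more than the idle limit earlier. The Hit type and function name are illustrative assumptions, not Summary's internals.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    host: str
    agent: str
    time: float  # seconds since the epoch

def count_visits(hits, idle_minutes=30):
    """Count visits: hits sharing host and agent, with no gap > idle_minutes."""
    last_seen = {}  # (host, agent) -> time of that pair's most recent hit
    visits = 0
    for hit in sorted(hits, key=lambda h: h.time):
        key = (hit.host, hit.agent)
        previous = last_seen.get(key)
        if previous is None or hit.time - previous > idle_minutes * 60:
            visits += 1  # first hit for this pair, or the pair was idle too long
        last_seen[key] = hit.time
    return visits

hits = [
    Hit("a.example.com", "Mozilla", 0),
    Hit("b.example.com", "Mozilla", 100),
    Hit("a.example.com", "Lynx", 200),              # same host, different agent
    Hit("a.example.com", "Mozilla", 600),           # 10 minutes later: same visit
    Hit("a.example.com", "Mozilla", 600 + 31 * 60), # 31 idle minutes: new visit
]
print(count_visits(hits))  # prints: 4
```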

2  Second Approach - Proxy Cluster Renaming

Sometimes a visit will go through a proxy cluster. The machines in the cluster work together, each requesting a different element of the current page, and each machine appears to Summary as a distinct host. Using the first approach, Summary would count each machine in the cluster as a separate visitor.

Summary has a built-in database of the most commonly used proxy clusters. If "Combined proxy clusters into one host" is checked, Summary renames the hosts in the clusters it knows about so that all of the machines in a single cluster have the same name. The host and agent name rules from the previous approach are then applied. This prevents a single visit from a proxy cluster from being counted as many different visits.
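The renaming step can be sketched as a lookup applied to each host name before the host-and-agent rules run. Summary's actual cluster database is not described here. The KNOWN_CLUSTERS table and its example.net/example.org suffixes are hypothetical stand-ins.

```python
# Hypothetical cluster table: host-name suffix -> single canonical host name.
KNOWN_CLUSTERS = {
    ".proxy.example.net": "proxy-cluster.example.net",
    ".cache.example.org": "cache-cluster.example.org",
}

def canonical_host(host, clusters=KNOWN_CLUSTERS):
    """Rename any member of a known proxy cluster to the cluster's name."""
    host = host.lower()  # DNS names are case-insensitive
    for suffix, cluster_name in clusters.items():
        if host.endswith(suffix):
            return cluster_name
    return host  # not a known cluster member: leave unchanged

hosts = ["c01.proxy.example.net", "c07.proxy.example.net", "C12.PROXY.EXAMPLE.NET"]
print(len({(h, "Mozilla/4.0") for h in hosts}))                  # prints: 3
print(len({(canonical_host(h), "Mozilla/4.0") for h in hosts}))  # prints: 1
```

Before renaming, the three cluster machines look like three visitors. After renaming, they share one (host, agent) key, so they form a single visit.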

3  Third Approach - Session IDs (cookies)

Some servers put a session ID into a cookie. The visitor's browser will return that cookie with each subsequent request. Each visitor is given a different session ID. By examining session IDs in the log file it is possible to determine which requests constitute a visit. This approach avoids the proxy cluster problem entirely. However, it doesn't handle robots or visitors who refuse cookies.

All requests that are missing cookie values are counted as a single large visit. Depending on how your server is set up, the very first request might not yet have a session ID. In that case it might not be counted as part of the visit. If requests without session IDs were instead tracked using the host and agent approach, every visit would be counted twice. It would be counted once for the first request that didn't have a session ID, and again for all the subsequent requests that do.

While nominally more accurate, visit tracking with cookies can present its own difficulties. The first request in a visit, which isn't normally included in the visit, can carry important information. For example, the first hit in a visit normally carries the referrer from the external site that led the visitor to your site. Because of that, some features, such as goal tracking, will be significantly less accurate when session IDs are used for visit tracking.

Visit tracking using session IDs is only available in Summary SP Lite and SP. It is enabled by checking "Use session IDs to determine visits" on the options configuration page. Most requests that don't have session IDs will be from robots. You can filter out requests that don't have a session ID by checking "Ignore requests with no session ID" on the filtering configuration page. If the session ID appears in your log file as part of the cookie value, you need to tell Summary which token name identifies the cookie value. This can be configured with "Cookie tokens used as session ID" on the miscellaneous configuration page. "Number of idle minutes to end visits" on the time units configuration page still applies when you are using session IDs.
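Session-ID visit counting, as described above, can be sketched like this. Requests are grouped by session ID and split when a session is idle longer than the idle limit. All requests without a session ID are lumped into one visit. Two points are assumptions: the (timestamp, session ID) input shape, and the choice not to split the cookie-less visit on idle gaps.

```python
def count_session_visits(requests, idle_minutes=30):
    """Count visits from (timestamp_seconds, session_id or None) pairs."""
    last_seen = {}  # session ID -> time of its most recent request
    visits = 0
    saw_anonymous = False
    for when, session in sorted(requests, key=lambda r: r[0]):
        if session is None:
            saw_anonymous = True  # assumption: all cookie-less requests are one visit
            continue
        previous = last_seen.get(session)
        if previous is None or when - previous > idle_minutes * 60:
            visits += 1  # new session, or the session was idle too long
        last_seen[session] = when
    return visits + (1 if saw_anonymous else 0)

requests = [(0, None), (10, "abc"), (20, "abc"), (30, "xyz"),
            (5000, None), (10 + 3600, "abc")]
print(count_session_visits(requests))  # prints: 4
```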

How to configure various servers - Cold Fusion, IIS and Apache

Cold Fusion uses the "CFTOKEN" and "CFID" cookie tags to identify a visitor.

Microsoft IIS ASP pages use "ASPSESSIONID" to identify a visitor.

Apache has a module called "mod_usertrack". When this module is enabled, Apache uses "Apache" as the cookie token name by default. The token name can be changed with Apache's CookieName directive. If you change it, make sure that the name configured in Apache matches the name configured in Summary.
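The sketch below shows how a session ID might be pulled out of a logged cookie field using configured token names, such as the ones listed above. Summary's exact matching rules are not documented here, so two behaviours are assumptions. Token names are matched as prefixes, because Classic ASP cookie names carry a random suffix after ASPSESSIONID. When several tokens are configured (CFID and CFTOKEN), their values are joined into one ID.

```python
def session_id(cookie_field, tokens=("CFID", "CFTOKEN")):
    """Build a session ID from the named cookie tokens, or None if absent.

    Assumption: token names are matched as prefixes, so "ASPSESSIONID"
    also matches a cookie named like "ASPSESSIONIDQSCTQBCR".
    """
    if not cookie_field or cookie_field == "-":
        return None  # no cookie was logged for this request
    values = {}
    for part in cookie_field.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep:
            continue  # not a name=value pair
        for token in tokens:
            if name.startswith(token) and token not in values:
                values[token] = value
    if not values:
        return None
    return "|".join(f"{t}={values[t]}" for t in tokens if t in values)

print(session_id("CFID=1234; CFTOKEN=99; lang=en"))  # prints: CFID=1234|CFTOKEN=99
```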

Summary vs. Summary Plus vs. Summary SP Lite vs. Summary SP

Summary can operate in four modes (or versions): Summary, Summary Plus, Summary SP Lite, and Summary SP. Which features are available depends on which registration code you are using. You get a registration code for a particular version either when you register your copy or by getting the appropriate 30-day free demo registration code. The 30-day demo registration codes activate the full functionality of the corresponding version for 30 days from when you get them.

A registered copy of Summary supports up to three sets of reports for virtual domains and downloading logs from a single remote directory.

A registered copy of Summary Plus provides all the functionality of Summary, plus:

  • Up to 50 sets of reports for virtual domains
  • Incremental log processing
  • Emailing of reports
  • Log downloading from multiple remote servers
  • Advanced user configurable filtering
  • Report disabling
  • Adding your logo or other information to the report heading
  • Value of visit
  • Content groups
  • Local search engine reports
  • Overviews of all sub-reports
  • Extra user customizable reports
  • Pattern replace on request strings
  • Customizable main page
  • The ability to write the first few pages of each report to static HTML files
  • A 'modem speed' proxy server - experience your web site as viewed through a typical 56K modem connection
  • Spreadsheet and text formatted versions of the reports - export your statistics to your favorite spreadsheet for easy graph creation
  • Search feature - searches the raw log files
  • Subsets of the Requests Report showing only items on a user configurable list
  • Option to count visits from proxy clusters as a single visit
  • Several additional reports, such as Search Phrases by Entry Point, Requests by Auth User, and Crawl Dates by Search Indexer.

Summary Plus is designed for professional users, people with several different domains (or virtual servers), or people who want several different sub-sets of their web site reported on separately. Most people with a single web site will do just fine with Summary.

A registered copy of Summary SP Lite or SP provides all the functionality of Summary Plus, plus:

  • Up to 100 sets of reports for virtual domains in Summary SP Lite
  • Up to 1000 sets of reports for virtual domains in Summary SP
  • Custom HTML headers and footers, including removing the Summary.Net logo
  • Custom email configuration
  • Disable different reports in each sub-report
  • Filter different requests in each sub-report
  • Per sub-report CGI token filtering
  • Option to count visits using session IDs
  • Use readable names for CGI arguments
  • Custom Overviews
  • Free form custom HTML reports
  • Additional user configurable report filters

Summary SP Lite and SP are designed for people who host many different domains or who want to provide individually customized reports for their clients.

Incremental Processing

When incremental processing is turned off, Summary reads all of the log files in the Logs folder every time it processes the logs. When it is turned on, Summary only reads log files that have been added or changed since the last time it processed the logs. This can dramatically speed up log processing, since Summary only has to process the new log entries. Information about the old log entries is stored in the report database.

Because the report database encodes many of the configuration settings, these settings cannot be changed while incremental processing is enabled. To change the configuration, you need to disable incremental processing, which deletes the report database and requires re-processing all of your logs. Incremental processing is only available in Summary Plus, SP Lite, and SP.

If you want to use incremental processing, set up Summary and make sure you are happy with all of your configuration settings before enabling it. It's simplest to experiment with configuration settings using only a couple of days' worth of log entries. When you are satisfied that everything is configured correctly, add in the remainder of your log files. Check that you are still happy with the results, and finally, enable incremental log processing.

Summary assumes that all log entries added between processing runs are dated after the most recent log entry it saw in the previous processing run. You can add older log files, but visits that were in progress at the end of the older log files will be counted twice. To get everything counted correctly, reset the database after adding all of your older log files. To reset the database, disable incremental processing and then re-enable it. Summary will then reset the database the next time it processes logs.

When incremental processing is enabled many of the configuration settings are locked. You can still see what they are set to on the configuration page but you can't change them. All of the configuration settings except "Stop processing after (MM/DD/YY)" and "Number of days to include (0 means all)" apply to incremental processing. Those two are ignored.

Reporting on logs covering long periods of time can take large amounts of memory. When using incremental log processing, occasionally check how much memory Summary is using. It should stay below the amount of memory physically installed in the machine. The settings on the memory configuration page can be used to keep the amount of memory Summary needs under control. "Expire hosts and failed requests after a month" is a particularly good way to save memory. See the FAQ entry for more information on saving memory.

When attempting to determine if it has seen a log file before, Summary will look at the file name, file size, modification date, and file contents. If a log file is exactly the same as the last time it was processed it will not be re-read. Summary can also detect that new log entries have been added to the end of an existing log file and it will only read the new log entries. Old log files do not even need to be present in the Logs folder if they have been read by a previous processing run. Summary will also detect most situations where log files have been renamed and handle them appropriately. This allows Summary to work with log rotation systems that rename old log files.
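The checks described above can be sketched using file size and content alone, ignoring file names, so that renamed files are still recognized. Summary's own algorithm also looks at modification dates and is not documented here. HEAD_BYTES, SeenLog, and start_offset are hypothetical names. For each file already processed, the sketch remembers a hash of its first few kilobytes and its size. A file whose prefix matches a remembered one is either unchanged (skip it) or has grown (read only the new bytes).

```python
import hashlib
import os
from dataclasses import dataclass

HEAD_BYTES = 4096  # assumption: how much of a file's start identifies it

@dataclass
class SeenLog:
    head_len: int   # number of bytes hashed when the file was processed
    head_hash: str  # SHA-256 of those bytes
    size: int       # file size when it was processed

def _hash_prefix(path, length):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(length)).hexdigest()

def fingerprint(path):
    """Record what a file looked like after it was processed."""
    size = os.path.getsize(path)
    head_len = min(size, HEAD_BYTES)
    return SeenLog(head_len, _hash_prefix(path, head_len), size)

def start_offset(path, seen):
    """Return None to skip the file, or the byte offset to start reading at."""
    size = os.path.getsize(path)
    for rec in seen:
        if rec.head_len == 0 or rec.head_len > size:
            continue  # empty or longer than this file: cannot be the same log
        if _hash_prefix(path, rec.head_len) != rec.head_hash:
            continue  # different content at the start
        if size == rec.size:
            return None      # unchanged (possibly renamed): nothing new to read
        if size > rec.size:
            return rec.size  # same log with new entries appended
    return 0  # never seen before (or truncated): read the whole file
```

Matching on content rather than on the name is what lets this approach cope with log rotation systems that rename old log files.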

If the machine crashes or loses power, Summary's report database may become corrupt. Summary automatically keeps backups from several recent processing runs. If the report database becomes corrupt, you can reload one of the backups using the "Restore Backups" link on the Tools page. A link to the Tools page can be found near the bottom of the main configuration page.

If you delete old log files after they have been processed, keep in mind the risk of data loss. A serious system crash, or the need to reset the database after a configuration change, may cause Summary to lose all of the data from those files.

Passwords and Sub-reports

There are two primary ways to let your clients access their sub-reports. You can configure an "Identifier for use in report URLs" and give your client a URL that includes the identifier. Alternatively, you can configure a report name/password for each sub-report. Summary will then automatically take clients to the right sub-report based on the name/password they enter.

Each sub-report has a number, and you can access sub-reports by number. However, the number associated with a specific sub-report changes if you delete a lower-numbered sub-report. Using the sub-report number to locate a specific sub-report is therefore not recommended.

"Identifier for use in report URLs" can be used to assign a short, easily remembered name to a sub-report that can be used in a URL to locate that sub-report quickly. For example, say your copy of Summary is accessed with the URL "http://www.hostingcompany.com:9000/". You have assigned an identifier of "fred" to the sub-report for the Fred Company website. You can then tell the Fred Company to use the URL "http://www.hostingcompany.com:9000/~fred/".

Alternatively, you can assign that sub-report a name and password. Then you can tell the Fred Company to go to the main URL for Summary, "http://www.hostingcompany.com:9000/" and they will be prompted for a name and password. Summary will search through all of the sub-reports for the first one that has the name and password that they entered and it will automatically display that sub-report.
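The two lookup methods just described can be sketched as follows. The SubReport fields mirror the configuration options named above, but the data structure and function names are hypothetical. Passwords are kept in plain text purely for illustration, and the comparison uses hmac.compare_digest so that its timing does not reveal how much of a password matched.

```python
import hmac
from dataclasses import dataclass

@dataclass
class SubReport:
    site_name: str        # "Site descriptive name"
    identifier: str = ""  # "Identifier for use in report URLs"
    login_name: str = ""  # report name; empty means no password protection
    password: str = ""

def find_by_url(path, reports):
    """Resolve a URL path such as '/~fred/' to the sub-report with that identifier."""
    if not path.startswith("/~"):
        return None
    ident = path[2:].split("/", 1)[0]
    return next((r for r in reports if r.identifier and r.identifier == ident), None)

def find_by_login(name, password, reports):
    """Return the first sub-report whose name and password both match."""
    for report in reports:
        if report.login_name and report.login_name == name and hmac.compare_digest(
                report.password.encode(), password.encode()):
            return report
    return None

reports = [SubReport("Fred Company", identifier="fred"),
           SubReport("Joe Company", login_name="joe", password="s3cret")]
print(find_by_url("/~fred/", reports).site_name)  # prints: Fred Company
```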

Things can get a little confusing when only some of your sub-reports have name/password configured. Anyone going to the main URL, including yourself, will be prompted for a name and password even though there are some sub-reports that they could view without a password. In order to not be prompted for name and password, they need to know the URL for one of the sub-reports that is not password protected.

The configuration name/password is separate from everything above. The configuration name/password is required to access any of the configuration screens or to initiate a log processing run. Anyone using the configuration name/password also has access to all sub-reports and sees all reports even if they are disabled.

Physical Servers

The Physical Server report is designed to show information for each machine in a cluster of servers that are all serving the same domain or set of domains. There aren't any real standards for identifying which physical machine a log file corresponds to. Summary normally gets the physical server name from the second-level folder/directory name (within the logs folder/directory). You can also use this feature to report on something other than physical servers. If "Always use subfolder name as server name", on the Options configuration page, is checked, Summary will always use the subfolder name instead of attempting to get the physical server name from a log entry.

Here's an example of how you might set up your logs folder:

Logs
   yourdomain.com
      machine1
         access_log
      machine2
         access_log

   mydomain.com
      machine1
         access_log
      machine2
         access_log

In this example yourdomain.com and mydomain.com are two different domains, each of which is being served by two physical servers, machine1 and machine2.
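Reading names from a layout like the one above can be sketched as follows. The sketch takes the first-level folder as the domain and the second-level folder as the physical server name. The function name and the "Logs" root default are assumptions. PurePosixPath is used so that the example behaves the same on any platform. For Windows-style paths, PureWindowsPath would be the equivalent.

```python
from pathlib import PurePosixPath

def domain_and_server(log_path, logs_root="Logs"):
    """Split a log path into (domain folder, physical server folder).

    Raises ValueError if log_path is not inside logs_root.
    """
    parts = PurePosixPath(log_path).relative_to(logs_root).parts
    domain = parts[0] if len(parts) >= 2 else None  # first-level folder
    server = parts[1] if len(parts) >= 3 else None  # second-level folder
    return domain, server

print(domain_and_server("Logs/yourdomain.com/machine1/access_log"))
# prints: ('yourdomain.com', 'machine1')
```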

Goals

Decide what the most valuable outcome of a visit to your site is, for example, someone ordering your product. The goal of your website is to get the visitor to do this. Summary can track which visits reach the goal. For each referring domain, search word, and search phrase, it will report the percentage of associated visits that reach the goal.

You need to select a URL which indicates that your goal/desired outcome has occurred. For example, a request to "/thanksforyourorder.html". Enter this request into "Request to count as Goal" on the Subreport Details page.

Once the goal is configured, the Referrers: Domains report shows, in the "Goals % of Hits" column, what percentage of visitors referred by each domain reached the goal. This can help you determine which external links are bringing qualified visitors to your site. The same information is available for search words and search phrases. It can help you decide which terms to emphasize at your site to get the most effective results from search engines.
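The per-domain goal percentage can be sketched as below. The sketch counts, for each referring domain, the share of its visits that requested the goal URL. The (domain, requested paths) input shape is an assumption, and this is an illustration of the idea rather than Summary's exact calculation.

```python
from collections import defaultdict

def goal_rates(visits, goal="/thanksforyourorder.html"):
    """visits: iterable of (referring_domain, list_of_requested_paths).

    Returns {domain: percent of that domain's visits that requested goal}.
    """
    total = defaultdict(int)
    reached = defaultdict(int)
    for domain, paths in visits:
        total[domain] += 1
        if goal in paths:
            reached[domain] += 1
    return {d: 100.0 * reached[d] / total[d] for d in total}

visits = [
    ("search.example.com", ["/", "/thanksforyourorder.html"]),
    ("search.example.com", ["/"]),
    ("links.example.org", ["/products.html"]),
]
print(goal_rates(visits))  # prints: {'search.example.com': 50.0, 'links.example.org': 0.0}
```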

Value

Summary Plus, SP Lite, and SP calculate the value of visits associated with each time period, referring domain, search phrase and search word. Each visit is tracked to see if it requested any item from each of the content groups. The value of a visit is the sum of the values of all the content groups that the visit requested content from.

This feature can be used to anticipate future income. Imagine that you sell two product lines: cheap widgets sell for $1 each and expensive widgets sell for $1,000 each. Suppose you estimate that one out of every 100 visitors to a widget information page eventually purchases a widget. You would configure the content groups as follows. The first content group contains the cheap widget information page and is valued at 1 cent ($1 × 1/100). The second content group contains the expensive widget page and is valued at $10 ($1,000 × 1/100).
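The value calculation for this example can be sketched as follows. A visit's value is the sum of the values of every content group it requested anything from, and each group counts once per visit. The page paths are hypothetical. Decimal is used so that cent values add up exactly.

```python
from decimal import Decimal

# Content groups from the example: value = price x estimated purchase rate (1/100).
CONTENT_GROUPS = {
    "cheap": {"pages": {"/cheap-widget.html"}, "value": Decimal("1") / 100},           # $0.01
    "expensive": {"pages": {"/expensive-widget.html"}, "value": Decimal("1000") / 100},  # $10
}

def visit_value(requested_paths, groups=CONTENT_GROUPS):
    """Sum the values of all content groups the visit requested content from.

    Each group counts once per visit, however many of its pages were seen.
    """
    paths = set(requested_paths)
    return sum((g["value"] for g in groups.values() if paths & g["pages"]), Decimal("0"))

print(visit_value(["/cheap-widget.html", "/cheap-widget.html", "/expensive-widget.html"]))
# prints: 10.01
```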

You can look at the Time: Monthly: Monthly Metrics report to see the estimated sales amount that you can expect from visitors during each month. You can look at the Referrers: Domains: By Value report to see the approximate dollar value of business generated by each site that has links to yours. You can also look at Referrers: Search Words: By Value to see how much business each search word is bringing in. This feature is only available in Summary Plus, SP Lite, and SP.

Leakage

Leakage information is only available in Summary Plus, SP Lite, and SP. Summary can track one page per sub-report to see what percentage of the time an associated graphic fails to get loaded. High leakage percentages may indicate a problem with either your page design or your server.

Users will frequently leave a page before it finishes loading if that page takes a very long time to load. Long load times can be caused by having too many graphics or graphics which are too large on the page or by server problems which slow down the serving of content. There are various other things that can cause the graphic to not get loaded such as when a visitor has the loading of graphics disabled or if a web robot is loading pages but not graphics.

The page and associated graphic to track are configured with "Leakage tracking page" and "Leakage tracking graphic" on the Subreport Details configuration page. Set the tracking page to the name of a page that is commonly used at your site. Set the tracking graphic to a graphic that is used only once on that page, preferably one near the bottom of the page. Results are reported in the Leakage column of the Time: Metrics reports.
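Leakage can be approximated as the share of tracking-page loads that were not followed by a load of the tracking graphic. The sketch below works from aggregate hit counts, which is an assumption. Summary may pair page and graphic requests more precisely, so treat the result as a rough figure.

```python
def leakage_percent(page_hits, graphic_hits):
    """Percent of tracking-page loads not matched by a tracking-graphic load."""
    if page_hits == 0:
        return 0.0
    loaded = min(graphic_hits, page_hits)  # extra graphic hits cannot offset more than all loads
    return 100.0 * (page_hits - loaded) / page_hits

print(leakage_percent(200, 150))  # prints: 25.0
```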

Installing Summary as a Windows Service

Summary can be installed to run as a service under Windows NT/2000/XP/2003. If you don't install Summary as a service it will quit automatically any time you log out. Installing it as a service allows it to remain running any time the machine is on, even if no one is logged in.

To install Summary as a service you should run the installer that you have downloaded from our website and select Custom Install. Make sure that you install all of the components including the Service component, which is not installed with a Typical install.

After Summary is installed, do the basic configuration using the normal application version of Summary. Then quit the application, locate the directory where Summary was installed, and double-click Install Service. This will install Summary as a service and start it up.

When the Summary service starts up it will post an event to the event log that gives the URL for connecting. This will normally be the same as the one it was using when run as a normal application. You can check the status of the Summary service using the Services Control Panel.

When running as a service, Summary will write status messages to a log file in the data folder.

You can stop the Summary service using the Services Control Panel or you can remove it completely by double-clicking Remove Service, which is in the same folder where you installed the Summary application.

Customizing Reports

Summary allows you to reconfigure which columns appear in a report and what the default sort order is, and to define custom filtering. Any changes to the report layout apply to all sub-reports. Any report can have its layout customized, but in Summary Plus, SP Lite, and SP, six reports in the custom section are provided specifically for this purpose.

To customize a report look at the report in your browser and click on the wrench icon in the right-hand corner of your screen. This will take you to the Configure Report page where you can make the changes you want.

Most of the reports have a large number of columns that can appear in the report including several that aren't used in Summary by default. For example, you can add line numbers. Because the browser tends to cache the old copy of a page you may not see the results of your change immediately. Reloading the page should solve the problem.

Searching Within Reports

Summary allows you to search most reports for specific items that you might be interested in. If the search string you enter matches any portion of the name of an item in the report it will be included in the results. Lowercase letters will give both lowercase and uppercase results. Uppercase letters will only give uppercase results.

There are a couple of special characters that you can use for more flexible searching. A question mark (?) matches any single character. An asterisk (*) matches any sequence of zero or more characters. A backslash (\) forces the next character in the search string to be matched exactly. This can be used to search for question marks, or to force lowercase letters to match only lowercase. If the search string starts with a tilde (~), Summary will display all of the items that don't match the search string.

You can also use regular expressions in search strings. If the first character of the search string is a single quote ('), the rest of the search string is interpreted as a regular expression. Summary uses Perl-style regular expressions. You can also use a tilde (~) together with a regular expression to look for everything that doesn't match it. In that case the tilde (~) must be the first character of the search string and the single quote (') must be the second.
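The search rules above can be modelled by translating a search string into a regular expression. The translation follows these rules: lowercase letters match either case, uppercase letters match only uppercase, ? matches one character, * matches any run, \ forces an exact match, a leading ~ negates, and a leading ' switches to regular-expression mode. It is an illustrative model, not Summary's code. Python's re module is close to, but not identical to, Perl-style regular expressions.

```python
import re

def compile_search(query):
    """Return a predicate implementing the report-search rules described above."""
    negate = query.startswith("~")
    if negate:
        query = query[1:]
    if query.startswith("'"):
        rx = re.compile(query[1:])  # regular-expression mode
    else:
        parts = []
        i = 0
        while i < len(query):
            c = query[i]
            if c == "\\" and i + 1 < len(query):
                i += 1
                parts.append(re.escape(query[i]))  # next character matched exactly
            elif c == "?":
                parts.append(".")                  # any single character
            elif c == "*":
                parts.append(".*")                 # zero or more characters
            elif c.islower():
                parts.append("[" + re.escape(c) + re.escape(c.upper()) + "]")  # either case
            else:
                parts.append(re.escape(c))         # uppercase and others: exact
            i += 1
        rx = re.compile("".join(parts), re.DOTALL)

    def matches(name):
        # A match anywhere in the item name counts; "~" inverts the result.
        return (rx.search(name) is not None) != negate
    return matches

print(compile_search("index")("/INDEX.html"), compile_search("Index")("/index.html"))
# prints: True False
```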

Translating CGI Arguments to be More Readable

Summary SP Lite and SP can translate the value portion of "name=value" pairs into more readable strings based on translations that you supply. These translations can be used to make the CGI Arguments reports more readable. For example, if you have a CGI argument "partnumber=37" you can have Summary translate that into "partnumber=blue widget" (assuming part 37 is a blue widget).

Summary scans its config folder for files whose names end with ".tran" and reads them for the translations that you have supplied. Lines in these files can be in several different formats:

<name>=<original value><TAB><translated to>
Translate the value <original value> into <translated to> for CGI argument <name>. <TAB> is one or more tab characters that you entered when you created the .tran file.
<original value><TAB><translated to>
Translate the value <original value> into <translated to> for the current CGI argument name. The default CGI argument name starts as the name of the .tran file with the .tran removed. It can also be set with the [tag <name>] (see below).
[tag <name>]
Set the current CGI argument name to <name>.
[sub-report-num <number>]
Subsequent translations apply only to the sub-report of the given number.
[sub-report <name-of-sub-report>]
Subsequent translations apply only to the named sub-report. Name is matched against the "Site descriptive name".
[sub-report-id <id-of-sub-report>]
Subsequent translations apply only to the sub-report with the given id. Id is matched against "Identifier for use in report URLs".
[sub-report-all]
Subsequent translations apply to all sub-reports.

If <original value> is "(default)", Summary will translate every value that has no explicit translation into your default.

For example, suppose you have a translation file named product.tran containing the following (the gap on each line is one or more tab characters):

product=1    small widget
2    large widget
product=(default)    unknown product
[sub-report Joes Widgets]
3    blue widget

Summary will translate "product=1" into "product=small widget" and "product=2" into "product=large widget". Only in the Joes Widgets sub-report will it translate "product=3" into "product=blue widget". It will translate "product=<anything else>" into "product=unknown product".
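A reader for the .tran format described above can be sketched as follows. Several behaviours are assumptions, since the text above does not specify them. A name=value line does not change the current name. Translations scoped to a particular sub-report take precedence over sub-report-all ones. An explicit translation in any scope beats a (default). Later .tran files override earlier ones.

```python
import re
from pathlib import Path

SECTION = re.compile(r"^\[(tag|sub-report-num|sub-report-id|sub-report-all|sub-report)\s*(.*)\]$")

def parse_tran(text, file_stem):
    """Parse one .tran file into {(scope, arg_name, original): translated}.

    scope is ("all", None), ("num", n), ("name", s) or ("id", s).
    """
    table = {}
    current_name = file_stem  # default CGI argument name: file name minus .tran
    scope = ("all", None)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            kind, arg = m.group(1), m.group(2).strip()
            if kind == "tag":
                current_name = arg
            elif kind == "sub-report-all":
                scope = ("all", None)
            elif kind == "sub-report-num":
                scope = ("num", int(arg))
            elif kind == "sub-report-id":
                scope = ("id", arg)
            else:
                scope = ("name", arg)
            continue
        left, sep, right = line.partition("\t")
        if not sep:
            continue  # no tab: not a translation line
        name, eq, original = left.partition("=")
        if not eq:
            name, original = current_name, left
        table[(scope, name, original)] = right.lstrip("\t")
    return table

def load_translations(config_dir):
    """Read every *.tran file in config_dir (later files override earlier ones)."""
    table = {}
    for path in sorted(Path(config_dir).glob("*.tran")):
        table.update(parse_tran(path.read_text(), path.stem))
    return table

def translate(table, name, value, report_num, report_name, report_id):
    """Translate one CGI value for a given sub-report; unknown values pass through."""
    scopes = [("num", report_num), ("name", report_name), ("id", report_id), ("all", None)]
    for key in [(s, name, value) for s in scopes] + [(s, name, "(default)") for s in scopes]:
        if key in table:
            return table[key]
    return value
```

With the product.tran example above loaded, translate(table, "product", "3", 2, "Joes Widgets", "joe") would give "blue widget", while the same value in any other sub-report falls back to "unknown product".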

 


Copyright 1998-2004 by Summary.Net - Updated 2/27/04