There are several mechanisms for monitoring (and troubleshooting) SFM.
For more information on troubleshooting, see Troubleshooting.
To reach the monitoring page, click “Monitor” in the header of any page in the SFM UI.
The monitor page provides status and queue lengths for harvesters and exporters.
The status is based on the most recent status reported back by each harvester or exporter (within the last 3 days). A harvester or exporter reports its status when it begins a harvest or export, and again when it completes the harvest or export. Harvesters also provide status updates periodically during a harvest.
Note that if there are multiple instances of a harvester or exporter (created with docker-compose scale), each instance will be listed.
The queue length lists the number of harvest or export requests that are waiting. A long queue length can indicate that additional harvesters or exporters are needed to handle the load (see Scaling up with Docker) or that there is a problem with the harvester or exporter.
It can be helpful to peek at the logs to get more detail on the work being performed by a harvester or exporter.
The logs for harvesters and exporters can be accessed using Docker’s log commands.
First, determine the name of the harvester or exporter using docker ps. In general, the name will be something like sfm_twitterrestharvester_1.
Second, get the log with docker logs <name>. Add -f to follow the log. For example, docker logs -f sfm_twitterrestharvester_1.
Side note: To follow the logs of all services, use docker-compose logs -f.
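The two steps above (find the container name, then read its log) can be combined in a small shell helper. This is a minimal sketch, not part of SFM: the function names sfm_container_names and sfm_follow_logs are hypothetical, and it assumes the container names contain a recognizable fragment such as twitterrestharvester.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with SFM) that wrap the steps above.

# Print the names of running containers whose name contains the fragment $1.
# `docker ps --format '{{.Names}}'` prints one container name per line.
sfm_container_names() {
    docker ps --format '{{.Names}}' | grep -- "$1"
}

# Follow the log of the first running container matching the fragment $1.
sfm_follow_logs() {
    name=$(sfm_container_names "$1" | head -n 1)
    if [ -z "$name" ]; then
        echo "no running container matches: $1" >&2
        return 1
    fi
    docker logs -f "$name"
}
```

For example, after sourcing these functions, sfm_follow_logs twitterrestharvester would follow the log of the first matching harvester. If the harvester has been scaled to several instances, list them all with sfm_container_names and pass the full name of the one you want.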
Twitter Stream Harvester logs
Since the Twitter Stream Harvester runs multiple harvests on the same host, accessing its logs is a bit different.
First, determine the name of the Twitter Stream Harvester as described above.
Second, determine the harvest id. This is available from the harvest’s detail page.
Third, get the log with docker exec -t <name> cat /var/log/sfm/<harvest id>.out.log.
To follow the log, use tail -f instead of cat. For example, docker exec -t sfm_twitterstreamharvester_1 tail -f /var/log/sfm/d7a900095efa449cb9a1460e70780ccc.out.log.
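The three steps above can be wrapped in a helper that builds the log path from the harvest id. This is an illustrative sketch only: the function names are hypothetical, and the path pattern /var/log/sfm/<harvest id>.out.log is taken from the command shown above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with SFM) for Twitter Stream Harvester logs.

# Build the in-container log path for harvest id $1.
sfm_stream_log_path() {
    printf '/var/log/sfm/%s.out.log\n' "$1"
}

# Print the log of harvest $2 from container $1.
# Pass "follow" as the third argument to use tail -f instead of cat.
sfm_stream_log() {
    container=$1
    harvest_id=$2
    mode=${3:-cat}
    path=$(sfm_stream_log_path "$harvest_id")
    if [ "$mode" = "follow" ]; then
        docker exec -t "$container" tail -f "$path"
    else
        docker exec -t "$container" cat "$path"
    fi
}
```

For example, sfm_stream_log sfm_twitterstreamharvester_1 d7a900095efa449cb9a1460e70780ccc follow runs the same command as the example above.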
Several of the services used by SFM offer management consoles that can be useful for monitoring.
For each of these, the username, password, and port are available from your .env file.
The RabbitMQ Admin is usually available on port 15672. For example, http://localhost:15672/.
The Heritrix management console is usually available on port 8443. For example, https://localhost:8443/.
Note that you must use HTTPS to reach the management console. You may be warned by your browser about the certificate; it is safe to proceed.
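Besides the browser console, the RabbitMQ management plugin also exposes an HTTP API on the same port, which can be handy for checking queue lengths from a script. The following is a sketch under assumptions: the function names are hypothetical, the host and port default to the example values above, and the username and password must be supplied from your .env file (the variable names there are not shown here).

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with SFM) for the RabbitMQ management HTTP API.

# Build the URL of the endpoint that lists queues.
# $1 = host (default localhost), $2 = port (default 15672).
rabbitmq_queues_url() {
    host=${1:-localhost}
    port=${2:-15672}
    printf 'http://%s:%s/api/queues\n' "$host" "$port"
}

# Fetch queue details as JSON using HTTP basic authentication.
# $1 = username, $2 = password, $3 = host (optional), $4 = port (optional).
# Each queue object in the response includes "name" and "messages" fields.
rabbitmq_queues_json() {
    curl -s -u "$1:$2" "$(rabbitmq_queues_url "${3:-localhost}" "${4:-15672}")"
}

# If jq is installed, the output can be reduced to name and message count:
#   rabbitmq_queues_json <username> <password> | jq -r '.[] | "\(.name) \(.messages)"'
```

The message counts reported this way should correspond to the waiting requests, though the monitor page remains the simpler place to check queue lengths.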