WebShot  

 
Documentation

Command Line Parameters
Command Line Usage
Batch Mode
Xml Configuration
Comma Separated Values Output
MessageBox Automation
Output Filename Masking
Watermarking Images
Implementation Considerations
Implementation Tips
Developer Implementation
Developer API Documentation
Frequently Asked Questions


Command Line Parameters

The command line version of WebShot is named webshotcmd.exe and is located in the WebShot installation folder. Below are the command line arguments that you can use to configure the screenshot generation process. Some parameters require double-quotes around their values.

Parameter Table

| Argument | Xml | Description | Default Value |
|---|---|---|---|
| /url | | Website url | |
| /in | BatchFile | Text file with urls on each line | |
| /out | ImagePath | Output image file | webshot.jpg |
| /width | ImageWidth | Image width | Browser width |
| /height | ImageHeight | Image height | Browser height |
| /xwidth | ImageWidthMaximum | Image width maximum | |
| /xheight | ImageHeightMaximum | Image height maximum | |
| /bwidth | BrowserWidth | Browser width | Automatically determined |
| /bheight | BrowserHeight | Browser height | Automatically determined |
| /timeout | Timeout | Maximum time to wait in sec for process to finish. Last ditch attempt to kill process, use other wait parameters first. Do not use with batch mode | Infinite / 0 |
| /timeoutpg | TimeoutPage | Maximum time to wait for page to load in seconds | Infinite / 0 (Recommended 85) |
| /timeoutmeta | TimeoutMeta | Maximum time in secs allowed for meta refresh | |
| /waitdoc | WaitDocument | Time to wait for scripts and controls to load after document is complete (seconds) | 0 (Recommended 10) |
| /waitdocfl | WaitDocumentFlash | Time to wait for scripts and controls to load after document is complete on a page with flash content (seconds). waitdoc value used if not specified | |
| /waitimg | WaitImage | Time to wait right before the image is captured | 0 (Recommended 2) |
| /waitimgloop | WaitImageLoop | Number of images to capture | 1 |
| /bmwidth | BrowserWidthMin | Minimum browser width | 0 |
| /bmheight | BrowserHeightMin | Minimum browser height | 0 |
| /bxwidth | BrowserWidthMax | Maximum browser width | 0 |
| /bxheight | BrowserHeightMax | Maximum browser height | 0 |
| /quality | ImageQuality | Image quality (0-100) | 100 |
| /type | ImageType | Image encoder (ie. png, gif, jpg, bmp) | jpg |
| /crop | ImageCrop | Amount to crop image l,t,r,b (ie. 100,0,50,0) | |
| /csv | CsvPath | Output results to specified csv file | |
| /username | HttpUsername | HTTP Authentication username | |
| /password | HttpPassword | HTTP Authentication password | |
| /link | LinkPath | Output the main page's links to specified file | |
| /html | HtmlPath | Output main html source to specified file. Specifying a .txt file will only output the text from the page | |
| /headers | HttpHeaders | Custom http headers, separated by a double pipe | |
| /postdata | HttpPostData | Custom post string | |
| /useragent | HttpUserAgent | Custom user agent string | |
| /redirectmax | RedirectMax | Maximum number of redirects to allow | 1 |
| /grayscale | ImageGrayscale | Make output image in grayscale | |
| /bprint | BrowserPrint | Prints to the default printer (CEF only) | |
| /bshow | BrowserVisible | Shows the web browser control window | |
| /wmfilename | WatermarkFilename | Set the watermark image. Only uses 24-bit bitmaps, blend mode is Multiply | |
| /wmposition | WatermarkPosition | Set the position of the watermark image | "0.0x0.0" |
| /wmopacity | WatermarkOpacity | Set the watermark image opacity | 100 |
| /threads | ThreadMax | Batch mode only, number of threads to use | 1 |
| /processes | | Batch mode only, number of processes to use. Use /ptimeout for main process timeout. | 1 |
| /clrcache | | Clears ALL Internet Explorer cache. Should only be used once every 5000-10000 screenshots. | |
| -nosave | ImageSaveToDisk | Sets whether to save images to disk | |
| -noactivex | DisableActiveX | Disables running of ActiveX controls | |
| -noscripts | DisableScripts | Disables running of scripts | |
| -ignoreerr | IgnoreErrors | Continues to capture image even if navigation error | |
| -file | Debug | Turns on debug logging to file | |
| | DebugFilename | Specifies the log filename (ie -file mine.log) | |
| -perthread | DebugPerThread | Turns on creation of debug logs per thread | |
| -append | Append | Turns on debug append logging | |
| -verbose | Verbose | Turns on verbose debug logging | |
| -noapp | | Turns off the pluggable protocol handler | |

By default the image width and height are taken from the browser width and height, which are determined automatically. On some pages it is difficult to determine the page's width and height correctly, so it is recommended that you specify a minimum browser width and height (/bmwidth and /bmheight).

Timing is everything

Just like a camera has a range of options that let you take the best possible picture, so does WebShot. Some of these options are wait parameters that specify how long to wait before or after certain browser events occur. Many websites have scripts or ActiveX controls that load after the page has been downloaded by the browser. Flash is a good example of an ActiveX control that takes time to load after the page has finished downloading, and content-heavy Flash and ActiveX controls may take longer to load than others. The wait parameters let you wait a specified amount of time for objects on the page to load.
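As a rough illustration of how the wait parameters fit together, the sketch below assembles a webshotcmd.exe argument list using the "Recommended" values from the parameter table (/timeoutpg 85, /waitdoc 10, /waitimg 2). The helper name build_wait_args is hypothetical and not part of WebShot; only the switches themselves come from the table.

```python
def build_wait_args(url, page_timeout=85, wait_doc=10, wait_img=2, wait_doc_flash=None):
    """Return a webshotcmd.exe argument list with explicit wait settings.

    The defaults are the 'Recommended' values from the parameter table.
    This helper is illustrative only; it is not shipped with WebShot.
    """
    args = [
        "webshotcmd.exe",
        "/url", url,
        "/timeoutpg", str(page_timeout),  # maximum seconds to wait for the page to load
        "/waitdoc", str(wait_doc),        # seconds to wait after the document is complete
        "/waitimg", str(wait_img),        # seconds to wait right before the capture
    ]
    if wait_doc_flash is not None:
        # When /waitdocfl is omitted, the /waitdoc value is used for flash pages.
        args += ["/waitdocfl", str(wait_doc_flash)]
    return args


print(" ".join(build_wait_args("http://www.microsoft.com/")))
```

Keeping the waits in one place makes it easier to tune them per site: a page with heavy Flash content might only need a larger wait_doc_flash, while the other values stay at their recommended settings.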

Command Line Usage

webshotcmd.exe /url "http://www.microsoft.com/"
Takes a full screenshot capture of a website

webshotcmd.exe /url "http://www.microsoft.com/" /bwidth 600
Takes a screenshot of a website and clips the image off at 600 pixels wide

webshotcmd.exe /url "http://www.microsoft.com/" /width 800 /height 600
Takes a full screenshot capture of a website and creates a thumbnail of it with the size 800x600

webshotcmd.exe /url "http://www.google.com/" /headers "Accept-Language: en||Referer: http://www.google.com/ig"
Takes a screenshot of a website with request headers

webshotcmd.exe /url "http://www.google.com/" /postdata "Username=TestUser&Password=TestPass&Submit=OK"
Takes a screenshot of a website after submitting custom post data

Batch Mode

When using batch mode it is also possible to pass command line parameters in the batch text file. There are two ways to format the batch text file as shown below.

Method 1 (one url per line):
http://www.google.com/
http://www.yahoo.com/

Method 2 (command line parameters on each line):
/url "http://www.google.com/" /out "google.gif"
/url "http://www.yahoo.com/" /out "yahoo.gif"
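A batch file in the second format is easy to generate from a list of jobs. The following is a minimal sketch; the function names and the urls.txt filename are hypothetical, and only the /url and /out switches are taken from the documentation.

```python
def batch_lines(jobs):
    """Format (url, output_file) pairs as Method 2 batch lines."""
    return ['/url "{}" /out "{}"'.format(url, out) for url, out in jobs]


def write_batch_file(path, jobs):
    """Write one Method 2 line per job to the given text file."""
    with open(path, "w") as handle:
        for line in batch_lines(jobs):
            handle.write(line + "\n")


jobs = [
    ("http://www.google.com/", "google.gif"),
    ("http://www.yahoo.com/", "yahoo.gif"),
]
for line in batch_lines(jobs):
    print(line)
```

The resulting file would then be passed to WebShot with the /in parameter, for example webshotcmd.exe /in "urls.txt".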

Xml Configuration

Instead of passing the same parameters on the command line each time you run WebShot, you can set up an xml configuration file that contains your most commonly used parameters. The element names are listed in the Xml column of the parameter table.

<WebShot>
	<Debug>FALSE</Debug>
	<ImagePath>\images\</ImagePath>
	<BrowserWidth>1024</BrowserWidth>
	<BrowserHeight>768</BrowserHeight>
	<BatchFile>urls.txt</BatchFile>
	<Verbose>TRUE</Verbose>
</WebShot>

The xml configuration file should be in the same directory as webshotcmd.exe and should be named webshotcmd.xml.
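If the configuration is produced by a script, it can be generated rather than edited by hand. This is a minimal sketch that assumes WebShot accepts any well-formed file using the element names from the Xml column of the parameter table; the build_config helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_config(settings):
    """Serialise a dict of Xml-column names and values as a <WebShot> document."""
    root = ET.Element("WebShot")
    for name, value in settings.items():
        child = ET.SubElement(root, name)
        if isinstance(value, bool):
            # The documented example writes booleans as TRUE / FALSE.
            child.text = "TRUE" if value else "FALSE"
        else:
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


config_xml = build_config({
    "Debug": False,
    "BrowserWidth": 1024,
    "BrowserHeight": 768,
    "BatchFile": "urls.txt",
    "Verbose": True,
})
print(config_xml)
```

The returned string would be saved as webshotcmd.xml next to webshotcmd.exe.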

Comma Separated Values Output

The success or failure of screenshot generation can be output to a csv file. By default, the csv file is named webshot.csv and is in the same directory as WebShot. You can specify your own filename with the /csv parameter. Results are always appended to the specified file in the following format:

Url, Image FileName, Error Message, Browser Width, Browser Height, Image Width, Image Height, Timestamp, Page Title, Meta Keywords, Meta Description, Redirect Url


All column values are enclosed in double quotes. The error message field may contain more than one error message. Each csv entry is separated by a carriage return and line feed sequence (\r\n). As a final check on whether the screenshot was successful, you may want to verify that the image file exists on disk.
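The results file can be read back with any csv parser. The sketch below uses the column order listed above and applies the suggested final check that the image exists on disk. It assumes the file has no header row and that an empty error field means success; neither is stated explicitly in this documentation, and the sample row is made up for illustration.

```python
import csv
import io
import os

COLUMNS = [
    "Url", "Image FileName", "Error Message", "Browser Width", "Browser Height",
    "Image Width", "Image Height", "Timestamp", "Page Title", "Meta Keywords",
    "Meta Description", "Redirect Url",
]


def read_results(handle):
    """Yield one dict per csv row, keyed by the documented column names."""
    for row in csv.reader(handle, skipinitialspace=True):
        if row:
            yield dict(zip(COLUMNS, row))


def succeeded(result, exists=os.path.exists):
    """Treat a row as successful if it has no error text and the image is on disk."""
    return not result["Error Message"] and exists(result["Image FileName"])


# Made-up sample row in the documented column order.
sample = io.StringIO(
    '"http://www.google.com/", "google.jpg", "", "1024", "768", "1024", "768", '
    '"20060130120505", "Google", "", "", ""\r\n'
)
for result in read_results(sample):
    print(result["Url"], succeeded(result))
```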

MessageBox Automation

Sometimes the web browser engine can pop up messages. When a message box appears, WebShot allows you to choose the best response for the dialog. It then stores your selection so that it knows how to deal with message boxes of that type in the future.

The settings for this automation are stored in webshotauto.xml in the same directory as webshotcmd.exe. UTF8 is supported because Internet Explorer message boxes can appear in multiple languages. Below is an example xml automation configuration file.

<WebShotAutomation>					
	<MessageBox>
		<Text>Stack overflow at line: 0</Text>
		<Caption>Windows Internet Explorer</Caption>
		<Result>Ok</Result>
	</MessageBox>
</WebShotAutomation>

| Section | Information |
|---|---|
| Text | Text to match against. Match occurs if Text is found in the text of the message box. UTF8 support. Case insensitive support for ASCII. |
| Caption | Caption to match against. Match occurs if Caption is found in the title of the message box. Not required. Case insensitive matching. |
| Result | Response to message box. Can be: Ok, Cancel, Yes, No, Retry, Close, Ignore, Abort, Help, Try Again, Continue. |
| DebugInfo | Information about the message box which is displayed in debug log. Not required. |
| DebugUrl | Information about the url that it occurred at. Only for debugging purposes, not required. |
| Match | Used to match all message boxes against a particular rule. Can be: TRUE or FALSE. |

WebShot comes with an automation configuration file that contains some of the common message boxes that I've encountered.
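To make the matching rules concrete, here is a small sketch that applies them to the example file above: a rule matches when its Text is contained in the message box text and, if a Caption is given, when that is contained in the title. This only illustrates the documented behaviour and is not WebShot's actual implementation. The find_result helper is hypothetical, the Match, DebugInfo and DebugUrl elements are ignored, and Python's lower() folds case beyond ASCII, which is a simplification.

```python
import xml.etree.ElementTree as ET

RULES_XML = """<WebShotAutomation>
	<MessageBox>
		<Text>Stack overflow at line: 0</Text>
		<Caption>Windows Internet Explorer</Caption>
		<Result>Ok</Result>
	</MessageBox>
</WebShotAutomation>"""


def find_result(rules_xml, text, caption):
    """Return the Result of the first rule that matches a message box, or None."""
    for rule in ET.fromstring(rules_xml).findall("MessageBox"):
        want_text = (rule.findtext("Text") or "").lower()
        want_caption = (rule.findtext("Caption") or "").lower()
        if want_text not in text.lower():
            continue  # the rule's Text must appear in the message text
        if want_caption and want_caption not in caption.lower():
            continue  # Caption is optional, but must match when present
        return rule.findtext("Result")
    return None


print(find_result(RULES_XML, "Stack overflow at line: 0", "Windows Internet Explorer"))
```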

Output Filename Masking

The following masks are supported and apply only to the file title portion of the resultant path. The graphical, command line, and dll interfaces all support this masking. The command line parameters that support it are /out, /html, and /link.

| Mask | Value |
|---|---|
| %m | Url Md5 Hash (Default) |
| %h | Url Hostname |
| %d | Url Domain name |
| %e | Url Domain name without Tld |
| %p | Url Path |
| %u | Page Title |
| %l | Literal Timestamp (20060130120505 / YMDHMS) |
| %t | Unix Timestamp |

The final output path should not exceed 256 characters. The following characters are stripped from the resultant output filename: ? \ / : & = %. When using these output filename masks in a command prompt batch file, note that you have to use %% instead of %, because batch files treat % as the prefix for batch file arguments.
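To predict which file a mask will produce, the expansion can be approximated in a script. The sketch below is an approximation under stated assumptions, not WebShot's own code: it assumes the MD5 hash is the lowercase hex digest of the UTF-8 url, that %t uses the local clock, and that the stripped characters are exactly the seven listed above. %d and %e are left out because the rules WebShot uses to find the domain and Tld are not documented here. The expand_mask name is hypothetical.

```python
import hashlib
import re
from datetime import datetime
from urllib.parse import urlsplit

STRIPPED = "?\\/:&=%"  # characters removed from the resulting file title


def expand_mask(mask, url, title="", when=None):
    """Approximate the file title produced by an output filename mask."""
    if re.search(r"%[de]", mask):
        raise NotImplementedError("%d and %e are not covered by this sketch")
    when = when or datetime.now()
    parts = urlsplit(url)
    values = {
        "m": hashlib.md5(url.encode("utf-8")).hexdigest(),  # assumed hash encoding
        "h": parts.hostname or "",
        "p": parts.path,
        "u": title,
        "l": when.strftime("%Y%m%d%H%M%S"),
        "t": str(int(when.timestamp())),
    }
    # Single pass, so a '%' inside a substituted value is never expanded again.
    expanded = re.sub(r"%([mhpult])", lambda match: values[match.group(1)], mask)
    return "".join(ch for ch in expanded if ch not in STRIPPED)


print(expand_mask("%h_%l", "http://www.google.com/ig", when=datetime(2006, 1, 30, 12, 5, 5)))
```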

Watermarking Images

All watermark images must be 24-bit bitmaps. You can position your watermark image using /wmposition, which takes x and y multiplier values that set the location of the watermark. For instance, to place the watermark at the bottom of the screenshot use /wmposition "0.0x1.0"; to place it all the way to the right use "1.0x0.0". Opacity is set with /wmopacity as a percentage from 0 to 100. Below is an example.


[Image: 24-bit bitmap watermark image]

[Image: Resulting screenshot]
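One plausible reading of the position multipliers is that each scales the free space left once the watermark is placed inside the screenshot, so 0.0 means left or top and 1.0 means right or bottom. The sketch below works through that arithmetic. The formula is an assumption made for illustration rather than WebShot's documented behaviour, and watermark_offset is a hypothetical helper.

```python
def watermark_offset(position, image_size, watermark_size):
    """Convert a /wmposition value such as "0.5x1.0" into a pixel offset.

    Assumes each multiplier scales the free space (image size minus
    watermark size) along its axis.
    """
    x_mult, y_mult = (float(part) for part in position.lower().split("x"))
    image_w, image_h = image_size
    mark_w, mark_h = watermark_size
    return (round(x_mult * (image_w - mark_w)), round(y_mult * (image_h - mark_h)))


# A 200x100 watermark on a 1024x768 screenshot, bottom left corner.
print(watermark_offset("0.0x1.0", (1024, 768), (200, 100)))
```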

Implementation Considerations

WebShot has been designed to have a low footprint and high performance.

If you are planning on using batch mode, it is important to know that WebShot uses the same engine as Internet Explorer to take screenshots of webpages. Occasionally the Internet Explorer web browser engine has been known to leak resources. These leaks occur inside the WebShot process space, and when the WebShot process closes, all leaked resources are reclaimed by the operating system. As a result, there is a limit to the number of screenshots you can take per batch in batch mode. You only need to be concerned about this if you use batch mode to take several thousand screenshots PER BATCH.
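One way to stay under that limit is to split a large url list into smaller batches and run each batch in its own webshotcmd.exe process, so the operating system reclaims any leaked resources between batches. A minimal sketch follows; the split_batches helper and the batch size used here are hypothetical, and a suitable real-world size is something to find by experiment.

```python
def split_batches(urls, batch_size):
    """Split a list of urls into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[start:start + batch_size] for start in range(0, len(urls), batch_size)]


urls = ["http://www.example.com/page{}".format(n) for n in range(5)]
for number, batch in enumerate(split_batches(urls, 2), start=1):
    # Each batch would be written to its own text file and passed to /in.
    print("batch", number, "has", len(batch), "urls")
```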

If you are planning to develop a service that uses the WebShot DLL, it is important to make your service recyclable.

The configuration of your screenshot harvesting machine makes a difference. It is important not to install unnecessary BHOs (browser helper objects) or add-ons for Internet Explorer. The more add-ons installed, the longer it takes to load the web browser control.

If you browse with Internet Explorer on the harvesting machine and its window hangs, it may hang all of the web browser controls in use on the system. In that case, killing the hung Internet Explorer process releases the rest.

It is recommended that you do some simple validation on the urls that you pass to the utility, in the interest of saving time.
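The kind of simple validation meant here can be done before a url ever reaches WebShot. The sketch below only checks the scheme, the presence of a host, and the overall length; the looks_like_url name is hypothetical, and the 2048 character limit is the figure given in the Frequently Asked Questions.

```python
from urllib.parse import urlsplit


def looks_like_url(candidate, max_length=2048):
    """Cheap sanity check to run before handing a url to webshotcmd.exe."""
    candidate = candidate.strip()
    if not candidate or len(candidate) > max_length:
        return False
    if any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
        host = parts.hostname
    except ValueError:
        return False  # for example a malformed IPv6 literal
    return parts.scheme in ("http", "https") and bool(host)


for candidate in ["http://www.google.com/", "not a url", "ftp://example.com/"]:
    print(candidate, looks_like_url(candidate))
```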

Implementation Tips

There are many ways to integrate the command line interface into your application. Below are some implementation tips that have been helpful for others.

Windows XP/2003 32-bit

In Windows there is a limit to the number of Internet Explorer windows that can be open at the same time. You may only be able to open 30-40 or fewer, depending on what else is running on the system. The solution to this problem is to increase the desktop heap size via the registry.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\
Control\Session Manager\SubSystems

Key: Windows
Value: %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,3072,512 Windows=On SubSystemType=Windows ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off MaxRequestThreads=24

http://support.microsoft.com/kb/126962

IIS7 / IIS6?

In the Internet Information Services Manager under Application Pool, the DefaultAppPool needs to be started by the LocalSystem identity. The default NetworkService identity does not have the permissions required to run WebShot.

PHP and IIS6

From the php.net website, relating to the exec function:

"When trying to run an external command-line application in Windows 2000 (Using IIS), I found that it was behaving differently from when I manually ran it from a DOS prompt.

Turned out to be an issue with the process protection. Actually, it wasn't the application itself that was having the problem but one it ran below it! To fix it, open computer management, right-click on Default Web Site, select the Home Directory tab and change Application Protection to 'Low (IIS Process)'. "

You might also want to create a new Application Pool and set the security account for the application pool from Local Service to Local System and use that.

PHP and Internet Explorer Permissions

The Internet Explorer web browser engine stores security settings separately for each user. If you run WebShot from the command line it runs as the user you are logged in as, but when you run it from PHP it runs under the SYSTEM user. In such cases Javascript may not have the permissions it needs to execute when taking a screenshot of a Javascript-enabled webpage.

Try adding the following registry value to make Internet Explorer use the HKLM security settings instead of the HKCU ones.

HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\
Windows\CurrentVersion\Internet Settings\

(DWORD) "Security_HKLM_only" = 1

http://support.microsoft.com/kb/182569

ColdFusion

You may need to configure the ColdFusion service to run under a more privileged account, such as Network Service or Administrator, in order to get it to work properly under Windows 2003.
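Whatever the host environment, the tips above all come down to launching webshotcmd.exe from another program. The sketch below shows one way that could look from a script. Because this documentation does not describe exit codes, it decides success by checking that the output image exists afterwards. The take_screenshot name, the 120 second timeout, and the injectable run argument (used to make the function testable without WebShot installed) are all hypothetical.

```python
import os
import subprocess


def take_screenshot(url, out_path, exe="webshotcmd.exe", run=subprocess.run, timeout=120):
    """Run webshotcmd.exe for one url and report whether the image exists afterwards."""
    command = [exe, "/url", url, "/out", out_path]
    try:
        # Passing a list avoids shell quoting problems with '&' in urls.
        run(command, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return False  # executable missing, not permitted to run, or hung
    return os.path.exists(out_path)


if __name__ == "__main__":
    print(take_screenshot("http://www.google.com/", "google.jpg"))
```

The existence check assumes out_path is a literal filename; if it contains output filename masks, the final name would have to be worked out separately.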

Developer Implementation

The included developer dll makes it possible to build screenshot generation into your own applications. It can be used to:

  • Create a Windows NT service that continually polls a database for urls that need to be screenshotted.
  • Create a Windows NT service that acts as an HTTP server, so that when an HTTP request is sent to the service, it responds with an image.
  • Create a COM service that can be used by your scripting language.

Below is an example on how to use the dll in C. Source code examples in other languages can be found in the Examples directory in the WebShot installation folder.

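/* Handle that identifies the WebShot instance. */
int32 WebShotHandle;

/* Initialise the dll with debug logging to webshot.log (window and file flags). */
WebShot_DllInit("webshot.log", DEBUG_FLAGWINDOW | DEBUG_FLAGFILE);

/* Create an instance and turn on verbose logging. */
WebShot_Create(&WebShotHandle);
WebShot_SetVerbose(WebShotHandle, TRUE);

/* Open the url and take the screenshot; FALSE indicates failure. */
if (WebShot_Open(WebShotHandle, "http://www.websitescreenshots.com/") == FALSE)
	printf("Error: Cannot take screenshot!\n");

/* Release the instance, then uninitialise the dll. */
WebShot_Destroy(&WebShotHandle);
WebShot_DllUninit();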

Frequently Asked Questions

General

Does WebShot have spyware or malware?

No.

Does WebShot work on Windows 95, 98, ME?

No.

When I try to run it on Windows 2000 it says gdiplus.dll is missing.

Download the GDI+ Platform SDK Redistributable and copy the gdiplus.dll file to your WebShot installation directory.

Browser

What rendering engine does WebShot use to take screenshots?

It uses Internet Explorer's rendering engine. WebShot is always developed and tested using the latest version of Internet Explorer. Internet Explorer versions 6 and lower are no longer supported.

Why do some pages look different than when I load them in Internet Explorer?

By default the web browser control runs in quirks mode. To enable standards mode you have to add a registry key:

HKEY_CURRENT_USER or HKEY_LOCAL_MACHINE

SOFTWARE\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION\

"webshot.exe"=(DWORD) 0x22b8 (IE8) or 0x2328 (IE9)
"webshotcmd.exe"=(DWORD) 0x22b8 (IE8) or 0x2328 (IE9)
"webshot64.exe"=(DWORD) 0x22b8 (IE8) or 0x2328 (IE9)
"webshotcmd64.exe"=(DWORD) 0x22b8 (IE8) or 0x2328 (IE9)


Does it support multiple browsers such as Firefox, Chrome or Safari?

It only supports Internet Explorer's rendering engine.

What browser settings are used when navigating pages?

It uses the settings specified by Control Panel >> Internet Options.

How do I prevent script popup windows when navigating to a url?

In Control Panel >> Internet Options >> Advanced, turn on "Disable script debugging" and turn off "Display a notification about every script error".

Why do CSS-styled comboboxes not render properly?

It is a bug in Internet Explorer 6's rendering engine. Upgrading to Internet Explorer 7 or higher resolves the problem.

Javascript

How do I enable or disable Javascript from loading on a page?

To enable or disable Javascript, go to Control Panel >> Internet Options >> Security >> Active Scripting. The -noscripts command line parameter can also be used to disable Javascript for a particular url.

ActiveX / Flash / Embedded Objects

Does WebShot capture flash pages?

Yes. In order to capture Flash pages successfully you need to have Adobe Flash Player installed.

Why does Flash not show correctly when I take a screenshot?

When you visit a website with a Flash object, it takes time to download the Flash object and run it. The /waitdoc parameter sets how long to wait after the page has finished loading, which gives the Flash objects on the page time to load. The /waitdocfl parameter sets this wait specifically for pages with Flash content.

How do I turn off sounds when navigating pages?

In Control Panel >> Internet Options >> Advanced >> Multimedia there is an option called "Play sounds in webpages". Turning this option off should disable sound from playing. You can also turn off all the sounds on a web page by disabling ActiveX controls from running. There is also the Windows Navigation Sound which can be disabled in Control Panel >> Sound >> Sounds.

How do I disable ActiveX controls from running?

They can be disabled in Control Panel >> Internet Options >> Security. The -noactivex command line parameter can also be used to disable ActiveX controls for a particular url.

Security

How secure is WebShot?

WebShot uses the Internet Explorer rendering engine that is a part of Microsoft Windows. The security of WebShot depends on the configuration found in Control Panel >> Internet Options >> Security.

What things can I do to make it more secure?

The -noscripts and -noactivex command line parameters disable scripts and ActiveX controls, giving you further security, although in some cases this will affect the resultant image. You can take extra precaution by not installing Office or any Internet Explorer toolbars or add-ons, which malware writers frequently exploit.

Batch Mode

How are images named in batch mode?

In batch mode the filenames of images are MD5 hashes of the complete input url.

Frames

Does WebShot capture sites with frames?

Yes. By default WebShot tries to determine the width and height of a page with frames automatically, which is very difficult and sometimes inaccurate. For best results with framed pages, set a minimum browser width and height in the options to ensure that the viewing area of the framed page is properly captured.

Command Line

When running the command line version long urls are cut off

The maximum length of a url is 2048 characters; longer urls are cut off.

What does the debug log mean when it says "File download window blocked"?

In order to stop all popups, WebShot cancels all events that fire the file download web browser notification. It cancels those events whether or not an actual file download window is shown.

How stable is WebShot?

WebShot is as stable as the sum of all its parts. WebShot uses its own internal memory and COM tracking to make sure it does not leak any memory or interfaces. However, the Internet Explorer rendering engine has been known to leak memory. Exiting the process allows the operating system to reclaim any memory that the rendering engine has leaked.

Miscellaneous

How long did it take you to make WebShot?

I started the project in 2004/2005 and then stopped. I restarted the work in early 2006 and have been actively working on it since.

Who made the icons that you use in WebShot?

All icons are provided by FamFamFam.

What was WebShot written with?

WebShot was written on Windows XP/Vista/7 in C.

What are some sites that use WebShot?

 
