Update Diagnosing EP06-A Drops After Attach

2025-11-24 08:51:49 +00:00
parent 44283f11d6
commit 410bdf7b77
1 changed files with 8 additions and 8 deletions
--- a/Attach.-.md
+++ b/Attach.-.md
@@ -26,13 +26,13 @@ Monitor QMI Connection Status: Use the QMI control utility to watch the link's s
 [forum.gl-inet.com](https://forum.gl-inet.com/t/e750-wan-not-reconnecting/19576#:~:text=Yes%2C%20when%20it%20happened%2C%20LTE,or%20reboot%20the%20whole%20router)
 . In that case the issue might be a silent network drop or a desync between the modem and host. (In summary, a carrier-initiated drop will usually be reflected in QMI status switching to disconnected relatively promptly, whereas a purely host-side glitch might not update the status correctly.)
-Poll PDP Context via AT Commands: In parallel with QMI, query the modem's own view of the PDP context. Using the AT command interface (e.g. via AT+CGACT? and AT+CGPADDR=1 on the modem's AT USB port) will show whether the PDP context is active and its IP address. Do this at startup, and around the 3-minute mark. Before the drop, you should see the context activated (e.g. +CGACT: 1,1 and an IP address from +CGPADDR: 1,<ip>). If the link drops, check these again: if you now see +CGACT: 1,0 (inactive) and no IP, that means the modem itself acknowledges the PDP context deactivated. In conjunction with QMI reporting a disconnect, this confirms the session truly ended (likely carrier or modem triggered it). Conversely, if the AT commands still show the context up (1,1 with an IP) even when you've lost connectivity or QMI says disconnected, that indicates a mismatch – the modem thinks it's still attached while the host/network layer is out of sync. For example, a QMI client might have dropped while the modem PDP is actually still active, or the context is “stuck” active in the modem despite the network path being gone. Detecting such a state means the drop was not cleanly handled: possibly a QMI client issue or a network drop that the modem firmware didn't report properly. Using AT+CGACT/CGPADDR polling basically lets you double-check the modem's truth vs. the QMI/OS view at the critical moment.
+Poll PDP Context via AT Commands: In parallel with QMI, query the modem's own view of the PDP context. Using the AT command interface (e.g. via AT+CGACT? and AT+CGPADDR=1 on the modem's AT USB port) will show whether the PDP context is active and its IP address. Do this at startup, and around the 3-minute mark. Before the drop, you should see the context activated (e.g. +CGACT: 1,1 and an IP address from +CGPADDR: 1,<ip>). If the link drops, check these again: if you now see +CGACT: 1,0 (inactive) and no IP, that means the modem itself acknowledges the PDP context deactivated. In conjunction with QMI reporting a disconnect, this confirms the session truly ended (likely carrier or modem triggered it). Conversely, if the AT commands still show the context up (1,1 with an IP) even when you've lost connectivity or QMI says disconnected, that indicates a mismatch - the modem thinks it's still attached while the host/network layer is out of sync. For example, a QMI client might have dropped while the modem PDP is actually still active, or the context is “stuck” active in the modem despite the network path being gone. Detecting such a state means the drop was not cleanly handled: possibly a QMI client issue or a network drop that the modem firmware didn't report properly. Using AT+CGACT/CGPADDR polling basically lets you double-check the modem's truth vs. the QMI/OS view at the critical moment.
-Inspect System Logs (netifd/QMI Events): Examine the logs on the router (e.g. logread or dmesg on OpenWrt) for any messages around the 180-second mark. You'll want to look for QMI or wwan interface related logs. For instance, OpenWrt's netifd might log events like “Interface 'wwan' is now down” when the cellular interface drops. You might also see messages from uqmi or the QMI driver if an error occurred (e.g. “uqmi[xxxx]: Failed to connect to service” or “Call failed” or QMI error codes). In some cases, the modem may output an unsolicited message on the AT log (e.g. a +QIND: PDP DEACT or similar) if the network initiated a cut – though uqmi should catch it. Key things to grep for: “wwan” (to see interface up/down changes), “qmi” (any QMI client or driver errors), and any obvious error or disconnect messages. If the drop is carrier-initiated, often you'll see a log around that time indicating loss of data service or network detach – for example, a message that the modem is no longer registered or that the PDP context was lost (on some systems, you might see a change of state logged, similar to how MikroTik logs show “not registered, state: 0” when the link drops)
+Inspect System Logs (netifd/QMI Events): Examine the logs on the router (e.g. logread or dmesg on OpenWrt) for any messages around the 180-second mark. You'll want to look for QMI or wwan interface related logs. For instance, OpenWrt's netifd might log events like “Interface 'wwan' is now down” when the cellular interface drops. You might also see messages from uqmi or the QMI driver if an error occurred (e.g. “uqmi[xxxx]: Failed to connect to service” or “Call failed” or QMI error codes). In some cases, the modem may output an unsolicited message on the AT log (e.g. a +QIND: PDP DEACT or similar) if the network initiated a cut - though uqmi should catch it. Key things to grep for: “wwan” (to see interface up/down changes), “qmi” (any QMI client or driver errors), and any obvious error or disconnect messages. If the drop is carrier-initiated, often you'll see a log around that time indicating loss of data service or network detach - for example, a message that the modem is no longer registered or that the PDP context was lost (on some systems, you might see a change of state logged, similar to how MikroTik logs show “not registered, state: 0” when the link drops)
 [forum.mikrotik.com](https://forum.mikrotik.com/t/lte-cat6-modem-disconnecting-every-2-3-minutes/135493#:~:text=23%3A09%3A22%20lte%2Cinfo%20WAN2,LTE%20link%20up)
-. If the logs clearly show the interface going down at 3m with no manual intervention, that implies something (network or device) triggered it – this is strong evidence. On the other hand, if nothing is logged at all and the interface only drops much later when you, say, manually restart it, that suggests the modem didn't inform the host immediately (which again points to a silent network drop or a host not listening for the event). Also check for any “client ID released” or “USB disconnect” in dmesg – if, for example, the USB interface reset (would hint the modem rebooted) or netifd closed the QMI client, you'd catch it here. The presence of a QMI error or timeout in logs at the drop would lean towards a QMI/host issue (e.g. uqmi might have crashed or given up), whereas a clean “network disconnected” type message would point to the carrier/network layer initiating it.
+. If the logs clearly show the interface going down at 3m with no manual intervention, that implies something (network or device) triggered it - this is strong evidence. On the other hand, if nothing is logged at all and the interface only drops much later when you, say, manually restart it, that suggests the modem didn't inform the host immediately (which again points to a silent network drop or a host not listening for the event). Also check for any “client ID released” or “USB disconnect” in dmesg - if, for example, the USB interface reset (would hint the modem rebooted) or netifd closed the QMI client, you'd catch it here. The presence of a QMI error or timeout in logs at the drop would lean towards a QMI/host issue (e.g. uqmi might have crashed or given up), whereas a clean “network disconnected” type message would point to the carrier/network layer initiating it.
-Use Heartbeat Traffic (Ping Tests): Sending periodic pings through the cellular interface is an excellent way to see the real-time connectivity. Set up a cron or script to ping -I wwan0 -c 1 8.8.8.8 every 15–30 seconds and timestamp the results. This will reveal exactly when connectivity is lost and whether it recovers. Interpretation: If you observe that pings respond normally for, say, 170 seconds and then consistently time out after ~180s (and continue failing), it means traffic can no longer get through – a strong sign the PDP context is down or the path is blocked. If those ping failures align with QMI reporting disconnect and a log event, the case for a carrier-drop is very strong. If instead you see one or two pings drop around 3 minutes and then ping replies resume on their own, that suggests the PDP context actually remained active and a NAT binding simply timed out and got re-established when you sent new traffic. In other words, a temporary outage with self-recovery points to NAT timeout rather than full PDP teardown – the ping you sent after the idle period effectively refreshed the NAT mapping and restored traffic flow
+Use Heartbeat Traffic (Ping Tests): Sending periodic pings through the cellular interface is an excellent way to see the real-time connectivity. Set up a looping script (cron alone cannot fire more often than once per minute) to run ping -I wwan0 -c 1 8.8.8.8 every 15-30 seconds and timestamp the results. This will reveal exactly when connectivity is lost and whether it recovers. Interpretation: If you observe that pings respond normally for, say, 170 seconds and then consistently time out after ~180s (and continue failing), it means traffic can no longer get through - a strong sign the PDP context is down or the path is blocked. If those ping failures align with QMI reporting disconnect and a log event, the case for a carrier-drop is very strong. If instead you see one or two pings drop around 3 minutes and then ping replies resume on their own, that suggests the PDP context actually remained active and a NAT binding simply timed out and got re-established when you sent new traffic. In other words, a temporary outage with self-recovery points to NAT timeout rather than full PDP teardown - the ping you sent after the idle period effectively refreshed the NAT mapping and restored traffic flow
 [blog.wirelessmoves.com](https://blog.wirelessmoves.com/2020/09/carrier-grade-nat-timeouts-and-how-to-configure-your-xmpp-server.html#:~:text=So%20a%20TCP%20keep%20alive,alive%20was)
 . On the flip side, if pings never actually fail (suppose you had a continuous ping running and it sails through the 3-minute mark without issues), yet around that time your management interface says “disconnected,” that would mean QMI/netifd thought the link dropped even though data was still flowing. That scenario would implicate a false drop indication by the software (a QMI client bug or mis-detection). Using pings in conjunction with the above checks not only helps detect the drop moment, but also can prevent drops due to inactivity. In fact, many implementations recommend periodic pings or keep-alive packets to keep the cellular link alive
 [docs.monogoto.io](https://docs.monogoto.io/getting-started/general-device-configurations/iot-devices/simcom-sim7600g-h#:~:text=When%20cellular%20modems%20are%20idle,device%20as%20being%20actively%20used)
@@ -42,16 +42,16 @@ Correlate Multi-Layer Data to Pinpoint the Cause: Finally, bring all the observa
 Carrier/Network Initiated: If you see the modem's PDP context go down (AT reports inactive) and QMI status shows disconnected right at ~3 minutes, and logs/netifd indicate the link dropped without your input, it's likely the carrier ended the session. This would align with an inactivity timeout or some network policy expiring the PDP context
 [networkengineering.stackexchange.com](https://networkengineering.stackexchange.com/questions/23810/in-which-cases-is-the-pdp-context-terminated#:~:text=Yes%2C%20simply%20not%20sending%20any,minutes)
-. The fact it stays up indefinitely after traffic is introduced reinforces this – essentially the network requires early traffic or it will assume the session isn't needed. In this case, focusing on keep-alives (or contacting the carrier about PDP timeout settings) is the solution. (Example cause: GGSN/PGW idle timer expired – the network sent a PDP deactivate, which the modem obeyed, dropping the link.)
+. The fact it stays up indefinitely after traffic is introduced reinforces this - essentially the network requires early traffic or it will assume the session isn't needed. In this case, focusing on keep-alives (or contacting the carrier about PDP timeout settings) is the solution. (Example cause: GGSN/PGW idle timer expired - the network sent a PDP deactivate, which the modem obeyed, dropping the link.)
-NAT Timeout (no explicit PDP drop): If the only symptom of the “drop” is that traffic stops after 3 minutes idle but the QMI/AT status still show as if connected, then the PDP context is still up but the path was broken by NAT. In this scenario, you might notice that sending a new ping or some data after the drop reanimates the connection (since it causes a new NAT mapping). The modem never indicated a disconnect in this case. The layer that “initiated” the apparent drop is the carrier's NAT gateway – it silently stopped forwarding traffic due to inactivity. The best detection here is the pattern of ping failures that recover with a new ping, combined with steady “connected” status in both QMI and AT. The remedy is to implement a periodic keepalive packet (ping or a UDP packet) to keep the NAT binding alive
+NAT Timeout (no explicit PDP drop): If the only symptom of the “drop” is that traffic stops after 3 minutes idle but the QMI/AT status still show as if connected, then the PDP context is still up but the path was broken by NAT. In this scenario, you might notice that sending a new ping or some data after the drop reanimates the connection (since it causes a new NAT mapping). The modem never indicated a disconnect in this case. The layer that “initiated” the apparent drop is the carrier's NAT gateway - it silently stopped forwarding traffic due to inactivity. The best detection here is the pattern of ping failures that recover with a new ping, combined with steady “connected” status in both QMI and AT. The remedy is to implement a periodic keepalive packet (ping or a UDP packet) to keep the NAT binding alive
 [blog.wirelessmoves.com](https://blog.wirelessmoves.com/2020/09/carrier-grade-nat-timeouts-and-how-to-configure-your-xmpp-server.html#:~:text=file%20takes%20immediate%20effect%2C%20even,kill%20the%20TCP%20session%20and)
 [docs.monogoto.io](https://docs.monogoto.io/getting-started/general-device-configurations/iot-devices/simcom-sim7600g-h#:~:text=When%20cellular%20modems%20are%20idle,device%20as%20being%20actively%20used)
 . This will prevent the illusion of a drop by ensuring the network sees the device as active.
-Device/QMI Initiated: If you find that the modem's PDP context was actually still active (AT says active, or it re-connects quickly without an OTA attach) but the host network interface went down around 3 minutes (for example, log shows “wwan down” or uqmi error), then the drop was triggered internally. It could be that netifd or the QMI driver decided the link was dead (perhaps due to a missed heartbeat or a mis-read state) and it issued a disconnect or reset. Or uqmi might have crashed/timeout, dropping the client ID. In this case, QMI status might show “disconnected” (because netifd closed it) while the modem was in fact still registered and reachable. The telltale signs would be a log message about QMI or the interface shutting down without a corresponding network deregistration. Another sign is if immediately after the drop you can manually query the modem (via AT or another QMI client) and find the data session still present. To double-check, you could try running a manual uqmi --get-data-status or even an AT+PING from the modem at that time – if it works despite the interface being marked down, definitely the host dropped the ball. In summary, a host-initiated drop means the issue lies in the device firmware, QMI software, or configuration. The solution might be updating firmware, using a more robust connection manager, or adding explicit watchdog logic to reconnect if this condition is detected. Logging the QMI client ID allocation and any “release” events would also help confirm this. (For instance, if you see a log like “<wizard> releasing QMI client” around 180s, you've found the smoking gun that the software closed it.)
+Device/QMI Initiated: If you find that the modem's PDP context was actually still active (AT says active, or it re-connects quickly without an OTA attach) but the host network interface went down around 3 minutes (for example, log shows “wwan down” or uqmi error), then the drop was triggered internally. It could be that netifd or the QMI driver decided the link was dead (perhaps due to a missed heartbeat or a mis-read state) and it issued a disconnect or reset. Or uqmi might have crashed or timed out, dropping the client ID. In this case, QMI status might show “disconnected” (because netifd closed it) while the modem was in fact still registered and reachable. The telltale signs would be a log message about QMI or the interface shutting down without a corresponding network deregistration. Another sign is if immediately after the drop you can manually query the modem (via AT or another QMI client) and find the data session still present. To double-check, you could try running a manual uqmi --get-data-status or even a modem-side ping (AT+QPING on Quectel modules) at that time - if it works despite the interface being marked down, the host side is almost certainly at fault. In summary, a host-initiated drop means the issue lies in the device firmware, QMI software, or configuration. The solution might be updating firmware, using a more robust connection manager, or adding explicit watchdog logic to reconnect if this condition is detected. Logging the QMI client ID allocation and any “release” events would also help confirm this. (For instance, if you see a log like “<wizard> releasing QMI client” around 180s, you've found the smoking gun that the software closed it.)
-By implementing the above logging and checks, you'll be able to catch the moment of failure in real time and see which layer's status changes first. In practice, a combination of QMI status change + modem PDP dropping is usually a carrier-triggered event, whereas a host interface drop with modem still saying connected points to a QMI/client issue. And if everything stays nominal except the ability to pass traffic, that points to a networking issue like NAT. Using a 3-minute marker in your logs (as you suggested) is wise – print out uqmi --get-data-status, uqmi --get-serving-system, AT+CGACT?, etc., at T+180s, and continue for a few minutes. This comprehensive view will definitively show whether the EP06-A is dropping due to the carrier's PDP context timing out (in which case you see a clean teardown from the network side) or due to something internal like a QMI client state mismatch or software reset (where the network was fine but the device dropped). Once you identify who initiates the drop, you can take targeted action – e.g. enable periodic keep-alive pings to appease a carrier idle timer
+By implementing the above logging and checks, you'll be able to catch the moment of failure in real time and see which layer's status changes first. In practice, a QMI status change together with the modem's PDP context dropping is usually a carrier-triggered event, whereas a host interface drop with the modem still saying connected points to a QMI/client issue. And if everything stays nominal except the ability to pass traffic, that points to a networking issue like NAT. Using a 3-minute marker in your logs (as you suggested) is wise - print out uqmi --get-data-status, uqmi --get-serving-system, AT+CGACT?, etc., at T+180s, and continue for a few minutes. This combined view should show whether the EP06-A is dropping due to the carrier's PDP context timing out (in which case you see a clean teardown from the network side) or due to something internal like a QMI client state mismatch or software reset (where the network was fine but the device dropped). Once you identify who initiates the drop, you can take targeted action - e.g. enable periodic keep-alive pings to appease a carrier idle timer
 [docs.monogoto.io](https://docs.monogoto.io/getting-started/general-device-configurations/iot-devices/simcom-sim7600g-h#:~:text=When%20cellular%20modems%20are%20idle,device%20as%20being%20actively%20used)
 , or fix the QMI driver/client usage if that's the culprit. This layered diagnostic approach ensures you catch the exact trigger of the 3-minute dropout and address the correct layer. 
 [networkengineering.stackexchange.com](https://networkengineering.stackexchange.com/questions/23810/in-which-cases-is-the-pdp-context-terminated#:~:text=Yes%2C%20simply%20not%20sending%20any,minutes)
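
The correlation rules above can be sketched as a small decision function. This is only an illustrative sketch, not part of the page's tooling: the names `parse_cgact`, `Snapshot`, and `classify` are hypothetical, and it assumes you have already captured the raw `AT+CGACT?` reply, the host-side QMI data status, and the latest heartbeat ping result by whatever means you use on the router.

```python
import re
from dataclasses import dataclass
from typing import Optional


def parse_cgact(response: str, cid: int = 1) -> Optional[bool]:
    """Return True/False for context `cid` from an AT+CGACT? reply, None if absent."""
    for match in re.finditer(r"\+CGACT:\s*(\d+),\s*([01])", response):
        if int(match.group(1)) == cid:
            return match.group(2) == "1"
    return None


@dataclass
class Snapshot:
    """One observation taken at a single timestamp (field names are illustrative)."""
    qmi_connected: bool  # host-side view, e.g. parsed from uqmi --get-data-status
    pdp_active: bool     # modem-side view, from AT+CGACT?
    ping_ok: bool        # did the heartbeat ping through wwan0 get a reply?


def classify(s: Snapshot) -> str:
    """Map one snapshot onto the scenarios described in the text."""
    if s.qmi_connected and s.pdp_active:
        # Both layers say "up": either all is well, or traffic is silently blocked.
        return "healthy" if s.ping_ok else "NAT timeout or silent network drop"
    if not s.qmi_connected and not s.pdp_active:
        # Modem and host agree the session ended: likely a network-side teardown.
        return "carrier/network initiated"
    if s.pdp_active:
        # Modem still has the context, but the host-side state went down.
        return "device/QMI initiated"
    # Modem reports inactive while the host still shows connected.
    return "host state stale (desync)"
```

Running `classify` on a snapshot taken every few seconds around T+180s, and logging the label with a timestamp, gives one line per sample showing which layer changed first. A single snapshot cannot tell a NAT timeout from a silent network drop; as described above, that distinction depends on whether later pings recover on their own.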